Nonlinear spectral geometry processing via the TV transform
We introduce a novel computational framework for digital geometry processing, based upon the derivation of a nonlinear operator associated to the total variation functional. Such an operator admits a generalized notion of spectral decomposition, yielding a convenient multiscale representation akin to Laplacian-based methods, while at the same time avoiding undesirable over-smoothing effects typical of such techniques. Our approach entails accurate, detail-preserving decomposition and manipulation of 3D shape geometry while taking an especially intuitive form: non-local semantic details are well separated into different bands, which can then be filtered and re-synthesized with a straightforward linear step. Our computational framework is flexible, can be applied to a variety of signals, and is easily adapted to different geometry representations, including triangle meshes and point clouds. We showcase our method through multiple applications in graphics, ranging from surface and signal denoising to enhancement, detail transfer, and cubic stylization.
Foreword to the Special Section on Smart Tools and Applications in Graphics (STAG 2021)
The Special Section contains extended and revised versions of the best papers presented at the 8th Conference on Smart Tools and Applications in Graphics (STAG 2021), held virtually on October 28–29, 2021. Five papers were selected by appointed members of the Program Committee; extended versions were submitted and further reviewed by external experts. The result is a collection of papers spanning different Visual Computing domains: from vector graphics on surfaces to image vectorization, from 3D geometric modelling of geological data to computation of the geometric kernel of polyhedra, up to lens-based scene exploration of annotated visual data.
Multimodal Neural Databases
The rise in loosely-structured data available through text, images, and other modalities has called for new ways of querying them. Multimedia Information Retrieval has filled this gap and has witnessed exciting progress in recent years. Tasks such as search and retrieval of extensive multimedia archives have undergone massive performance improvements, driven to a large extent by recent developments in multimodal deep learning. However, methods in this field remain limited in the kinds of queries they support and, in particular, their inability to answer database-like queries. For this reason, inspired by recent work on neural databases, we propose a new framework, which we name Multimodal Neural Databases (MMNDBs). MMNDBs can answer complex database-like queries that involve reasoning over different input modalities, such as text and images, at scale. In this paper, we present the first architecture able to fulfill this set of requirements and test it with several baselines, showing the limitations of currently available models. The results show the potential of these new techniques to process unstructured data coming from different modalities, paving the way for future research in the area.
Newton's fractals on surfaces via bicomplex algebra
We show how bicomplex numbers can be exploited for computing Newton's fractals in three and four dimensions.
The patterns derived from these fractals can be computed very efficiently on the GPU as pixel shaders, and they are well suited for surface decoration, material masking and volumetric rendering.
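As a rough illustration of the idea, and not the authors' shader code, the sketch below classifies a bicomplex starting point by the Newton basin it falls into. It relies on the standard idempotent decomposition of a bicomplex number z1 + z2 j into two independent complex components, z1 - i z2 and z1 + i z2, so that Newton's iteration can be run on each component separately. The polynomial p(w) = w^3 - 1, the function names, and the tolerances are assumptions chosen for the example.

```python
import cmath

# Roots of the example polynomial p(z) = z**3 - 1 (an assumed test case).
ROOTS = [cmath.exp(2j * cmath.pi * k / 3) for k in range(3)]


def _root_index(z, tol):
    """Index of the root within tol of z, or -1 if none is close enough."""
    for k, r in enumerate(ROOTS):
        if abs(z - r) < tol:
            return k
    return -1


def newton_basin(z, max_iter=50, tol=1e-9):
    """Run complex Newton iteration on z**3 - 1 and report the root reached."""
    for _ in range(max_iter):
        k = _root_index(z, tol)
        if k >= 0:
            return k
        dz = 3 * z * z
        if dz == 0:  # derivative vanishes: the Newton step is undefined
            return -1
        z = z - (z * z * z - 1) / dz
    return _root_index(z, tol)


def bicomplex_newton_basin(z1, z2, max_iter=50, tol=1e-9):
    """Classify the bicomplex point z1 + z2*j by a pair of basin indices.

    In the idempotent basis, bicomplex arithmetic acts componentwise, so the
    iteration splits into two ordinary complex Newton iterations.
    """
    a = z1 - 1j * z2
    b = z1 + 1j * z2
    return newton_basin(a, max_iter, tol), newton_basin(b, max_iter, tol)
```

Mapping each pair of indices to a color over a 2D slice of the four-dimensional domain would give one image of the fractal; a shader version would evaluate the same loop per pixel.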
3D Human Pose Estimation Using Möbius Graph Convolutional Networks
3D human pose estimation is fundamental to understanding human behavior. Recently, promising results have been achieved by graph convolutional networks (GCNs), which achieve state-of-the-art performance and provide rather light-weight architectures. However, a major limitation of GCNs is their inability to encode all the transformations between joints explicitly. To address this issue, we propose a novel spectral GCN using the Möbius transformation (Möbius-GCN). In particular, this allows us to directly and explicitly encode the transformation between joints, resulting in a significantly more compact representation.
Compared to even the lightest architectures so far, our novel approach requires 90–98% fewer parameters, i.e., our lightest Möbius-GCN uses only 0.042M trainable parameters. Besides the drastic parameter reduction, explicitly encoding the transformation of joints also enables us to achieve state-of-the-art results. We evaluate our approach on the two challenging pose estimation benchmarks, Human3.6M and MPI-INF-3DHP, demonstrating both state-of-the-art results and the generalization capabilities of Möbius-GCN.
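For readers unfamiliar with the transformation named in the abstract, a Möbius transformation maps a complex number z to (az + b) / (cz + d) with ad - bc ≠ 0, and composing two such maps corresponds to multiplying their 2×2 coefficient matrices. The sketch below shows only that transformation, not the network from the paper; the function names and the idea of passing a point as a single complex number are assumptions made for the example.

```python
def mobius(z, a, b, c, d):
    """Apply the Möbius transformation z -> (a*z + b) / (c*z + d)."""
    if a * d - b * c == 0:
        raise ValueError("degenerate transformation: a*d - b*c must be nonzero")
    # Note: the pole z = -d/c is not handled and raises ZeroDivisionError.
    return (a * z + b) / (c * z + d)


def compose(m1, m2):
    """Coefficients of m1 applied after m2, via the 2x2 matrix product."""
    a1, b1, c1, d1 = m1
    a2, b2, c2, d2 = m2
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
    )
```

With four complex coefficients, a single such map covers translation, rotation, scaling, and inversion, which is one way to see why it can represent a transformation compactly.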
Learning disentangled representations via product manifold projection
We propose a novel approach to disentangle the generative factors of variation underlying a given set of observations. Our method builds upon the idea that the (unknown) low-dimensional manifold underlying the data space can be explicitly modeled as a product of submanifolds. This definition of disentanglement gives rise to a novel weakly-supervised algorithm for recovering the unknown explanatory factors behind the data. At training time, our algorithm only requires pairs of non-i.i.d. data samples whose elements share at least one, possibly multidimensional, generative factor of variation. We require no knowledge of the nature of these transformations, and do not make any limiting assumption on the properties of each subspace. Our approach is easy to implement, and can be successfully applied to different kinds of data (from images to 3D surfaces) undergoing arbitrary transformations. In addition to standard synthetic benchmarks, we showcase our method in challenging real-world applications, where we compare favorably with the state of the art.
Generating Adversarial Surfaces via Band-Limited Perturbations
Adversarial attacks have demonstrated remarkable efficacy in altering the output of a learning model by applying a minimal perturbation to the input data. While increasing attention has been placed on the image domain, the study of adversarial perturbations for geometric data has been notably lagging behind. In this paper, we show that effective adversarial attacks can be concocted for surfaces embedded in 3D, under weak smoothness assumptions on the perceptibility of the attack. We address the case of deformable 3D shapes in particular, and introduce a general model that is not tailored to any specific surface representation, nor does it assume access to a parametric description of the 3D object. In this context, we consider targeted and untargeted variants of the attack, demonstrating compelling results in either case. We further show how discovering adversarial examples, and then using them for adversarial training, leads to an increase in both robustness and accuracy. Our findings are confirmed empirically over multiple datasets spanning different semantic classes and deformations.
A parametric analysis of discrete Hamiltonian functional maps
In this paper we develop an in-depth theoretical investigation of the discrete Hamiltonian eigenbasis, which remains quite unexplored in the geometry processing community. This choice is supported by the fact that Dirichlet eigenfunctions can be equivalently computed by defining a Hamiltonian operator, whose potential energy and localization region can be controlled with ease. We vary with continuity the potential energy and study the relationship between the Dirichlet Laplacian and the Hamiltonian eigenbases with the functional map formalism. We develop a global analysis to capture the asymptotic behavior of the eigenpairs. We then focus on their local interactions, namely the veering patterns that arise between proximal eigenvalues. Armed with this knowledge, we are able to track the eigenfunctions in all possible configurations, shedding light on the nature of the functional maps. We exploit the Hamiltonian-Dirichlet connection in a partial shape matching problem, obtaining state-of-the-art results, and provide directions where our theoretical findings could be applied in future research.
Generalized Multi-Source Inference for Text Conditioned Music Diffusion Models
Multi-Source Diffusion Models (MSDM) allow for compositional musical generation tasks: generating a set of coherent sources, creating accompaniments, and performing source separation. Despite their versatility, they require estimating the joint distribution over the sources, necessitating pre-separated musical data, which is rarely available, and fixing the number and type of sources at training time. This paper generalizes MSDM to arbitrary time-domain diffusion models conditioned on text embeddings. These models do not require separated data as they are trained on mixtures, can parameterize an arbitrary number of sources, and allow for rich semantic control. We propose an inference procedure enabling the coherent generation of sources and accompaniments. Additionally, we adapt the Dirac separator of MSDM to perform source separation. We experiment with diffusion models trained on Slakh2100 and MTG-Jamendo, showcasing competitive generation and separation results in a relaxed data setting.
Multimodal Feature Fusion and Knowledge-Driven Learning via Experts Consult for Thyroid Nodule Classification
Computer-aided diagnosis (CAD) is becoming a prominent approach to assist clinicians across multiple fields. These automated systems take advantage of various computer vision (CV) procedures, as well as artificial intelligence (AI) techniques, to formulate a diagnosis from a given image, e.g., a computed tomography or ultrasound scan. Advances in both areas (CV and AI) are enabling ever-increasing performance of CAD systems, which can ultimately avoid performing invasive procedures such as fine-needle aspiration. In this study, a novel end-to-end knowledge-driven classification framework is presented. The system focuses on multimodal data generated by thyroid ultrasonography, and acts as a CAD system by classifying thyroid nodules into the benign and malignant categories. Specifically, the proposed system leverages cues provided by an ensemble of experts to guide the learning phase of a densely connected convolutional network (DenseNet). The ensemble is composed of various networks pretrained on ImageNet, including AlexNet, ResNet, VGG, and others. The previously computed multimodal feature parameters are used to create ultrasonography domain experts via transfer learning, which also decreases the number of samples required for training. To validate the proposed method, extensive experiments were performed, providing detailed performance results for both the experts ensemble and the knowledge-driven DenseNet. As demonstrated by the results, the proposed system achieves relevant performance in terms of qualitative metrics for the thyroid nodule classification task, making it a valuable asset when formulating a diagnosis.
