Manifold-Valued Image Generation with Wasserstein Generative Adversarial Nets
Generative modeling over natural images is one of the most fundamental machine learning problems. However, few modern generative models, including Wasserstein Generative Adversarial Nets (WGANs), are studied on manifold-valued images that are frequently encountered in real-world applications. To fill the gap, this paper first formulates the problem of generating manifold-valued images and exploits three typical instances: hue-saturation-value (HSV) color image generation, chromaticity-brightness (CB) color image generation, and diffusion-tensor (DT) image generation. For the proposed generative modeling problem, we then introduce a theorem of optimal transport to derive a new Wasserstein distance of data distributions on complete manifolds, enabling us to achieve a tractable objective under the WGAN framework. In addition, we recommend three benchmark datasets: CIFAR-10 HSV/CB color images, ImageNet HSV/CB color images, and UCL DT image datasets. On the three datasets, we experimentally demonstrate that the proposed manifold-aware WGAN model can generate more plausible manifold-valued images than its competitors
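For orientation, the kind of objective the abstract above refers to can be written using the standard Kantorovich-Rubinstein duality. This is a sketch of the textbook form on a metric space, not necessarily the theorem the paper states. The symbols $\mathcal{M}$ (the complete manifold), $d_{\mathcal{M}}$ (its geodesic distance), $\mathbb{P}_r$ and $\mathbb{P}_g$ (real and generated distributions), and $f$ (the critic) are notation assumed here, not taken from the abstract.

```latex
% 1-Wasserstein distance with the geodesic distance as ground cost
W_1(\mathbb{P}_r, \mathbb{P}_g)
  = \inf_{\gamma \in \Pi(\mathbb{P}_r, \mathbb{P}_g)}
    \mathbb{E}_{(x, y) \sim \gamma}\bigl[ d_{\mathcal{M}}(x, y) \bigr]
  = \sup_{\|f\|_{L} \le 1}
    \mathbb{E}_{x \sim \mathbb{P}_r}[f(x)] - \mathbb{E}_{x \sim \mathbb{P}_g}[f(x)],
% where the Lipschitz constraint is measured on the manifold:
\qquad
\|f\|_{L} \le 1
  \iff
  |f(x) - f(y)| \le d_{\mathcal{M}}(x, y)
  \quad \text{for all } x, y \in \mathcal{M}.
```

In this form the only change from the Euclidean WGAN objective is that the critic is constrained to be 1-Lipschitz with respect to $d_{\mathcal{M}}$ rather than the Euclidean norm.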
Covariance pooling for facial expression recognition
Classifying facial expressions into different categories requires capturing regional distortions of facial landmarks. We believe that second-order statistics such as covariance are better able to capture such distortions in regional facial features. In this work, we explore the benefits of using a manifold network structure for covariance pooling to improve facial expression recognition. In particular, we first employ such manifold networks in conjunction with traditional convolutional networks for spatial pooling within individual image feature maps in an end-to-end deep learning manner. By doing so, we are able to achieve a recognition accuracy of 58.14% on the validation set of Static Facial Expressions in the Wild (SFEW 2.0) and 87.0% on the validation set of the Real-World Affective Faces (RAF) Database. Both of these results are the best results we are aware of. Besides, we leverage covariance pooling to capture the temporal evolution of per-frame features for video-based facial expression recognition. Our reported results demonstrate the advantage of pooling image-set features temporally by stacking the designed manifold network of covariance pooling on top of convolutional network layer
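As a rough illustration of the second-order statistic mentioned in the abstract above, the sketch below pools a set of local feature vectors into a single covariance matrix. The function name `covariance_pool`, the ridge term `eps`, and the random feature map are assumptions made for this example. The manifold network layers that the paper stacks on top of the pooled matrix are not shown.

```python
import numpy as np


def covariance_pool(features: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Pool an (n, d) set of feature vectors into a (d, d) covariance matrix.

    `eps` adds a small ridge to the diagonal so the result is strictly
    positive definite (an assumed, commonly used regularization).
    """
    n, d = features.shape
    centered = features - features.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (n - 1)  # unbiased sample covariance
    return cov + eps * np.eye(d)


# Hypothetical feature map: 64 spatial locations, 8 channels each.
rng = np.random.default_rng(0)
feature_map = rng.standard_normal((64, 8))
pooled = covariance_pool(feature_map)
```

The output is symmetric positive definite, which is why a network operating on it has to respect manifold geometry instead of treating it as a flat vector.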
Automatic Workflow Monitoring in Industrial Environments
Robust automatic workflow monitoring using visual sensors in industrial environments is still an unsolved problem. This is mainly due to the difficulties of recording data in work settings and the environmental conditions (large occlusions, similar background/foreground) which do not allow object detection/tracking algorithms to perform robustly. Hence approaches analysing trajectories are limited in such environments. However, workflow monitoring is especially needed due to quality and safety requirements. In this paper we propose a robust approach for workflow classification in industrial environments. The proposed approach consists of a robust scene descriptor and an efficient time series analysis method. Experimental results on a challenging car manufacturing dataset showed that the proposed scene descriptor is able to detect both human and machinery related motion robustly and the used time series analysis method can classify tasks in a given workflow automatically
Building deep networks on Grassmann manifolds
Learning representations on Grassmann manifolds is popular in quite a few visual recognition tasks. In order to enable deep learning on Grassmann manifolds, this paper proposes a deep network architecture by generalizing the Euclidean network paradigm to Grassmann manifolds. In particular, we design full rank mapping layers to transform input Grassmannian data to more desirable ones, exploit re-orthonormalization layers to normalize the resulting matrices, study projection pooling layers to reduce the model complexity in the Grassmannian context, and devise projection mapping layers to respect Grassmannian geometry and meanwhile achieve Euclidean forms for regular output layers. To train the Grassmann networks, we exploit a stochastic gradient descent setting on manifolds of the connection weights, and study a matrix generalization of backpropagation to update the structured data. The evaluations on three visual recognition tasks show that our Grassmann networks have clear advantages over existing Grassmann learning methods, and achieve results comparable with state-of-the-art approaches
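The layer types named in the abstract above can be sketched on a single input as follows. This is a minimal forward-pass illustration under stated assumptions: the use of QR decomposition for re-orthonormalization, the function names, the matrix sizes, and the random weight `W` are all choices made for this example and are not given in the abstract. Projection pooling and training are omitted.

```python
import numpy as np


def full_rank_map(X: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Linearly transform a (d, q) subspace basis with a (d_out, d) weight."""
    return W @ X


def reorthonormalize(X: np.ndarray) -> np.ndarray:
    """Restore orthonormal columns; QR is an assumed choice here."""
    Q, _ = np.linalg.qr(X)  # reduced QR: Q has the same shape as X
    return Q


def projection_map(X: np.ndarray) -> np.ndarray:
    """Map an orthonormal basis X to the projection matrix X X^T."""
    return X @ X.T


rng = np.random.default_rng(0)
# Hypothetical input: orthonormal basis of a 3-dimensional subspace of R^10.
X, _ = np.linalg.qr(rng.standard_normal((10, 3)))
# Hypothetical weight; a random Gaussian matrix is full rank with probability 1.
W = rng.standard_normal((6, 10))

Y = reorthonormalize(full_rank_map(X, W))  # basis of a subspace of R^6
P = projection_map(Y)                      # Euclidean-form output
```

The projection matrix `P` depends only on the subspace spanned by `Y`, not on the particular basis, which is what lets a regular Euclidean output layer consume it.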
Discrimination of locomotion direction at different speeds: A comparison between macaque monkeys and algorithms
status: Published
Learning continuous piecewise non-linear activation functions for deep neural networks
Activation functions provide the non-linearity in deep neural networks, which is crucial for optimization and performance. In this paper, we propose a learnable continuous piecewise nonlinear activation function (CPN for short), which improves the widely used ReLU in three directions: finer pieces, non-linear terms, and learnable parameterization. CPN is a continuous activation function with multiple pieces and incorporates non-linear terms in every interval. We give a general formulation of CPN and provide different implementations according to three key factors: whether the activation space is divided uniformly or not, whether the non-linear terms exist or not, and whether the activation function is continuous or not. We demonstrate the effectiveness of our method on image classification and single image super-resolution tasks by simply changing the activation function. For example, CPN improves top-1 accuracy over ReLU by 4.78% / 4.52% on MobileNetV2_0.25 / MobileNetV2_0.35 for ImageNet classification and achieves better PSNR on several benchmarks for super-resolution. Our implementation is available at https://github.com/xc-G/CPN
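To make the idea in the abstract above concrete, here is one possible way to build a continuous piecewise activation with a non-linear term in every interval. It is a hypothetical construction for illustration only and is not the CPN formulation from the paper or its repository. The parameter names (`knots`, `values`, `curvatures`, `left_slope`, `right_slope`) and the quadratic bump `t * (1 - t)` are assumptions. In a real network these arrays would be trainable parameters; here they are plain NumPy arrays.

```python
import numpy as np


def piecewise_activation(x, knots, values, curvatures,
                         left_slope=0.0, right_slope=1.0):
    """Continuous piecewise activation with a quadratic term per interval.

    Inside [knots[i], knots[i+1]] the output interpolates linearly between
    values[i] and values[i+1] and adds curvatures[i] * t * (1 - t), which
    vanishes at both ends, so the function stays continuous at every knot.
    Outside the knot range it continues linearly.
    """
    x = np.asarray(x, dtype=float)
    # Interval index for each input, clipped to the valid pieces.
    idx = np.clip(np.searchsorted(knots, x, side="right") - 1,
                  0, len(knots) - 2)
    width = knots[idx + 1] - knots[idx]
    t = np.clip((x - knots[idx]) / width, 0.0, 1.0)
    inside = ((1 - t) * values[idx] + t * values[idx + 1]
              + curvatures[idx] * t * (1 - t))
    below = values[0] + left_slope * (x - knots[0])
    above = values[-1] + right_slope * (x - knots[-1])
    return np.where(x < knots[0], below,
                    np.where(x > knots[-1], above, inside))


# Uniform division of the activation space, initialized to reproduce ReLU.
knots = np.linspace(-2.0, 2.0, 5)
values = np.maximum(knots, 0.0)
flat = np.zeros(len(knots) - 1)                # no non-linear terms
curved = np.array([0.3, -0.2, 0.5, 0.1])       # arbitrary non-linear terms

x = np.linspace(-3.0, 3.0, 25)
relu_like = piecewise_activation(x, knots, values, flat)
at_knots = piecewise_activation(knots, knots, values, curved)
```

With zero curvatures and these initial values the function coincides with ReLU, so training would start from a known baseline and only depart from it as the parameters are updated.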
