Accelerated low-rank sparse metric learning for person re-identification
Person re-identification is an open and challenging problem in computer vision. A surge of effort has been spent on designing the best feature representation and on learning either the transformation of such features across cameras or an optimal matching metric. Metric learning solutions, which are currently in vogue in the field, generally require a dimensionality reduction pre-processing stage to handle the high dimensionality of the adopted feature representation. Such an approach is suboptimal, and a better solution can be achieved by incorporating that step into the metric learning process. Towards this objective, a low-rank matrix is introduced that projects the high-dimensional vectors onto a low-dimensional manifold with a discriminative Euclidean distance. The goal is achieved with a stochastic accelerated proximal gradient method. Experiments on two public benchmark datasets show that the approach outperforms state-of-the-art methods.
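As a rough illustration of the distance described in the abstract above, the sketch below projects two feature vectors with a k x d matrix and measures the squared Euclidean distance in the projected space. The matrix `L`, the vectors, and the function names are hypothetical values chosen for the example, not learned parameters or code from the paper, and the stochastic accelerated proximal gradient training step is not shown.

```python
# Illustrative sketch only: L, x and y are made-up values, not learned
# parameters from the paper.

def project(L, x):
    """Map a d-dimensional vector x to k dimensions using a k x d matrix L."""
    return [sum(l_ij * x_j for l_ij, x_j in zip(row, x)) for row in L]

def low_rank_sq_distance(L, x, y):
    """Squared Euclidean distance between L x and L y.

    This equals (x - y)^T M (x - y) with M = L^T L, a matrix whose rank
    is at most k (the number of rows of L).
    """
    px, py = project(L, x), project(L, y)
    return sum((a - b) ** 2 for a, b in zip(px, py))

# Hypothetical 2 x 4 projection (k = 2, d = 4).
L = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
]
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 0.0, 9.0, 9.0]

dist = low_rank_sq_distance(L, x, y)
```

Because the projection and the metric are the same object here, no separate dimensionality reduction stage is needed before computing distances, which is the point the abstract makes.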
CE-VAE: Capsule Enhanced Variational AutoEncoder for Underwater Image Enhancement
Unmanned underwater image analysis for marine monitoring faces two key challenges: (i) degraded image quality due to light attenuation and (ii) hardware storage constraints limiting high-resolution image collection. Existing methods primarily address image enhancement with approaches that hinge on storing the full-size input. In contrast, we introduce the Capsule Enhanced Variational AutoEncoder (CE-VAE), a novel architecture designed to efficiently compress and enhance degraded underwater images. Our attention-aware image encoder projects the input image onto a latent space representation and can run online on a remote device. The only information that needs to be stored on the device or sent to a beacon is a compressed representation. A dual-decoder module performs offline, full-size enhanced image generation. One branch reconstructs spatial details from the compressed latent space, while the second branch utilizes a capsule-clustering layer to capture entity-level structures and complex spatial relationships. This parallel decoding strategy enables the model to balance fine-detail preservation with context-aware enhancements. CE-VAE achieves state-of-the-art performance in underwater image enhancement on six benchmark datasets, providing up to 3× higher compression efficiency than existing approaches. Code is available at https://github.com/iN1k1/ce-vae-underwater-image-enhancement
Self-Attention Agreement among Capsules
Capsule Networks (CapsNets) have been shown to be a promising alternative to Convolutional Neural Networks (CNNs) in many computer vision tasks, due to their ability to encode object viewpoint variations. Network capsules provide maps of votes that focus on the presence of entities in the image and on their pose. Each map is the point of view of a given capsule. To compute such votes, CapsNets rely on the routing-by-agreement mechanism. This computationally costly iterative algorithm is meant to select the most appropriate parent capsule so that every active capsule becomes a node in a parse tree, but the routing does not guarantee this behaviour, which can cause vanishing weights during training. We hypothesise that an attention-like mechanism will help capsules select the predominant regions among the maps to focus on, hence introducing a more reliable way of learning the agreement between the capsules in a single pass. We propose the Attention Agreement Capsule Networks (AA-Caps) architecture, which builds upon CapsNet by introducing a self-attention layer to suppress irrelevant capsule votes, thus keeping only the ones that are useful for capsule agreement on a specific entity. The generated capsule attention map is then passed to a classification layer responsible for emitting the predicted image class. The proposed AA-Caps model has been evaluated on five benchmark datasets to validate its ability to deal with the diverse and complex data that CapsNet often fails on. The achieved results demonstrate that AA-Caps outperforms existing methods without the need for more complex architectures or model ensembles.
Oriented Splits Network to Distill Background for Vehicle Re-Identification
Vehicle re-identification (re-id) is a challenging task due to the presence of high intra-class and low inter-class variations in the visual data acquired from monitoring camera networks. Unique and discriminative feature representations are needed to cope with several sources of variation, including color, illumination, orientation, background and occlusion. The orientations of the vehicles in the images prevent the learned models from capturing multiple parts of the vehicle and the relationships between them. Combining global and partial features is one way to improve the discriminative learning of deep learning models. Building on such solutions, we propose an Oriented Splits Network (OSN) for end-to-end learning of multiple features along with global features to form a strong descriptor for vehicle re-identification. To capture the orientation variability of the vehicles, the proposed network partitions the images into several oriented stripes to obtain local descriptors for each part/region. This scheme is then exploited by a camera-based feature distillation (CBD) training strategy to remove the background features. These are filtered out from the oriented vehicle representations, which yields a much stronger and more distinctive representation of the vehicles. We perform experiments on two benchmark vehicle re-id datasets to verify the performance of the proposed approach; the results show that it outperforms the state of the art by a margin.
Weakly-Supervised Domain Adaptation of Deep Regression Trackers via Reinforced Knowledge Distillation
Deep regression trackers are among the fastest tracking algorithms available, and therefore suitable for real-time robotic applications. However, their accuracy is inadequate in many domains due to distribution shift and overfitting. In this paper, we overcome such limitations by presenting the first methodology for domain adaptation of this class of trackers. To reduce the labeling effort, we propose a weakly-supervised adaptation strategy in which reinforcement learning is used to express weak supervision as a scalar, application-dependent and temporally-delayed feedback. At the same time, knowledge distillation is employed to guarantee learning stability and to compress and transfer knowledge from more powerful but slower trackers. Extensive experiments on five different domains demonstrate the relevance of our methodology. Real-time speed is achieved on embedded devices and on machines without GPUs, while accuracy reaches significant levels.
Collaborative image and object level features for image colourisation
Image colourisation is an ill-posed problem, with multiple correct solutions that depend on the context and object instances present in the input datum. Previous approaches attacked the problem either by requiring intensive user interaction or by exploiting the ability of convolutional neural networks (CNNs) to learn image-level (context) features. However, obtaining human hints is not always feasible, and CNNs alone are not able to learn entity-level semantics unless multiple models pre-trained with supervision are considered. In this work, we propose a single network, named UCapsNet, that takes into consideration the image-level features obtained through convolutions and the entity-level features captured by means of capsules. Then, by skip connections over different layers, we enforce collaboration between the convolutional and entity factors to produce a high-quality and plausible image colourisation. We pose the problem as a classification task that can be addressed by a fully unsupervised approach, thus requiring no human effort. Experimental results on three benchmark datasets show that our approach outperforms existing methods on standard quality metrics and achieves state-of-the-art performance on image colourisation. A large-scale user study shows that our method is preferred over existing solutions. Code is available at https://github.com/Riretta/Image_Colourisation_WiCV_2021
Lightweight Implicit Blur Kernel Estimation Network for Blind Image Super-Resolution
Blind image super-resolution (Blind-SR) is the process of leveraging a low-resolution (LR) image, with unknown degradation, to generate its high-resolution (HR) version. Most existing Blind-SR techniques use a degradation estimator network to explicitly estimate the blur kernel that guides the SR network, which requires supervision with ground-truth (GT) kernels. To remove this requirement, it is necessary to design an implicit estimator network that can extract a discriminative blur kernel representation without relying on the supervision of GT blur kernels. We design a lightweight Blind-SR approach that estimates the blur kernel and restores the HR image based on a deep convolutional neural network (CNN) and a deep super-resolution residual convolutional generative adversarial network. Since the blur kernel in Blind-SR is unknown, we follow the image formation model of the blind super-resolution problem and first introduce a neural network-based model to estimate the blur kernel. This is achieved by (i) a Super Resolver that, from a low-resolution input, generates the corresponding SR image; and (ii) an Estimator Network that generates the blur kernel from the input datum. The outputs of both models are used in a novel loss formulation. The proposed network is end-to-end trainable. The proposed methodology is substantiated by both quantitative and qualitative experiments. Results on benchmarks demonstrate that our computationally efficient approach (12x fewer parameters than state-of-the-art models) performs favorably with respect to existing approaches and can be used on devices with limited computational capabilities.
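The abstract above refers to the image formation model of blind super-resolution without stating it. A minimal sketch follows, assuming the commonly used model in which the LR image is the HR image blurred by a kernel and then subsampled (an additive noise term is often included as well and is omitted here). The function names, the toy image, and the kernels are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: assumes LR = downsample(blur(HR, kernel)), with
# no noise term. All names and values are made up for the example.

def blur(image, kernel):
    """Correlate a 2D image with a small odd-sized kernel.

    Border pixels are replicated. For a symmetric kernel, correlation and
    convolution give the same result.
    """
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ry, rx = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for u in range(kh):
                for v in range(kw):
                    # Clamp coordinates so that border pixels are replicated.
                    y = min(max(i + u - ry, 0), h - 1)
                    x = min(max(j + v - rx, 0), w - 1)
                    acc += kernel[u][v] * image[y][x]
            out[i][j] = acc
    return out

def downsample(image, scale):
    """Keep every `scale`-th pixel along both axes."""
    return [row[::scale] for row in image[::scale]]

def degrade(hr, kernel, scale):
    """Produce an LR image from an HR image under the assumed model."""
    return downsample(blur(hr, kernel), scale)

# Toy 4 x 4 HR image with pixel values 0..15.
hr = [[float(4 * r + c) for c in range(4)] for r in range(4)]

# An identity kernel leaves the image unchanged before subsampling.
identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
# A 3 x 3 box kernel stands in for an unknown blur.
box = [[1.0 / 9.0] * 3 for _ in range(3)]

lr_identity = degrade(hr, identity, 2)
lr_box = degrade(hr, box, 2)
```

In the blind setting only the LR image is observed, so the kernel passed to `degrade` is exactly what an estimator network would have to recover.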
Cloth-Changing Person Re-identification with Self-Attention
The basic assumption in the standard person re-identification (ReID) problem is that the clothing of the target person IDs remains constant over long periods. This assumption leads to errors in real-world deployments. In addition, most ReID methods use CNN-based networks and have found limited success because CNNs can exploit only local dependencies and suffer a loss of information due to downsampling operations. In this paper, we focus on the more challenging and realistic scenario of long-term cloth-changing ReID (CC-ReID). We aim to learn robust and unique feature representations that are invariant to clothing changes to address the CC-ReID problem. To overcome the limitations faced by CNNs, we propose a Vision Transformer (ViT)-based framework. We also propose to exploit unique soft-biometric-based discriminative information, such as gait features, and pair it with the ViT feature representation, allowing the model to generate long-range structural and contextual relationships that are crucial for the re-identification task in the long-term scenario. To evaluate the proposed approach, we perform experiments on two recent CC-ReID datasets, PRCC and LTCC. The experimental results show that the proposed approach achieves state-of-the-art results on the CC-ReID task.
Deep convolutional feature details for better knee disorder diagnoses in magnetic resonance images
Convolutional neural networks (CNNs) applied to magnetic resonance imaging (MRI) have demonstrated their ability in the automatic diagnosis of knee injuries. Despite the promising results, the currently available solutions do not take into account the particular anatomy of knee disorders. Existing works have shown that injuries are localized in small-sized knee regions near the center of MRI scans. Based on such insights, we propose MRPyrNet, a CNN architecture capable of extracting more relevant features from these regions. Our solution is composed of a Feature Pyramid Network with Pyramidal Detail Pooling, and can be plugged into any existing CNN-based diagnostic pipeline. The first module aims to enhance the CNN intermediate features to better detect the small-sized appearance of disorders, while the second one captures such evidence by maintaining its detailed information. An extensive evaluation campaign is conducted to understand in depth the potential of the proposed solution. The experimental results demonstrate that applying MRPyrNet to baseline methodologies improves their diagnostic capability, especially in the case of anterior cruciate ligament tear and meniscal tear, because of MRPyrNet's ability to exploit the relevant appearance features of such disorders. Code is available at https://github.com/matteo-dunnhofer/MRPyrNet
Where Did I See It? Object Instance Re-Identification with Attention
Existing methods dealing with object instance re-identification (OIRe-ID) look for the best visual feature match of a target object within a set of frames. Due to the nature of the problem, relying only on the visual appearance of object instances is likely to produce many false matches when there are multiple objects with similar appearance or multiple instances of the same object class present in the scene. We focus on a rigid scene setup and, to limit the negative effects of the aforementioned cases, propose to exploit the background information. We believe that this would be particularly helpful in a rigid environment with many recurring identical models of objects, since it would provide rich context information. We introduce an attention-based mechanism into the existing Mask R-CNN architecture so that we learn to encode the important and distinct information in the background jointly with the foreground features relevant to rigid real-world scenarios. To evaluate the proposed approach, we run experiments on the ScanNet dataset. Results demonstrate that our approach significantly outperforms different baselines and state-of-the-art (SOTA) methods.
