
    Self Attention based multi branch Network for Person Re-Identification

    Recent progress in person re-identification has shown promising improvements by designing neural networks that learn the most discriminative feature representations. Some efforts use similar parts from different locations to learn better representations with the help of soft attention, while others pursue part-based learning methods to strengthen the relationships between consecutive regions in the learned features. However, only a few attempts have been made to learn non-local similar parts directly for the person re-identification problem. In this paper, we propose a novel self-attention-based multi-branch (multi-classifier) network to directly model long-range dependencies in the learned features. Multiple classifiers help the model learn discriminative features, while the self-attention module encourages the learning to be independent of feature map locations. Spectral normalization is applied throughout the network to improve the training dynamics and the convergence of the model. Experimental results on two benchmark datasets show the robustness of the proposed work.
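
    As a rough illustration of how self-attention relates every feature-map position to every other one, the sketch below applies scaled dot-product attention with a residual connection to a flattened list of position vectors. It is not the paper's network: the function names are hypothetical, and the learned query/key/value projections a real module would use are replaced by the identity to keep the example short.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Non-local self-attention over N position vectors of equal length C.

    Assumption: identity query/key/value projections (a trained module
    would learn these, e.g. as 1x1 convolutions).
    """
    dim = len(features[0])
    scale = math.sqrt(dim)
    out = []
    for q in features:
        # Similarity of this position to every position, near or far.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in features]
        weights = softmax(scores)
        # Weighted sum of all position vectors.
        attended = [
            sum(w * v[c] for w, v in zip(weights, features)) for c in range(dim)
        ]
        # Residual connection keeps the original feature.
        out.append([x + a for x, a in zip(q, attended)])
    return out


feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(self_attention(feats))
```

    Because the attention weights depend only on feature similarity, shuffling the input positions shuffles the outputs in the same way, which is one way to read the abstract's remark about independence from feature map locations.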

    Visual tracking in camera-switching outdoor sport videos: Benchmark and baselines for skiing

    Skiing is a globally popular winter sport discipline with a rich history of competitive events. This domain offers ample opportunities for the application of computer vision to enhance the understanding of athletes’ performances. However, this potential has remained relatively untapped in comparison to other sports, primarily due to the limited availability of dedicated research studies and datasets. The present paper takes a significant stride towards bridging these gaps. It conducts a comprehensive examination of skier appearance tracking in videos capturing their entire performance—an essential step for more advanced performance analyses. To implement this investigation, we introduce SkiTB, the largest and most annotated dataset tailored for computer vision applications in skiing. We subject a range of visual object tracking algorithms to rigorous testing, including both well-established methodologies and a novel skier-specific baseline algorithm. The results yield valuable insights into the suitability of various tracking techniques for vision-based skiing analysis and into the generalization of state-of-the-art algorithms to complex target behaviors and conditions set by winter outdoor environments. To foster further development, we make SkiTB, the associated code, and the obtained results accessible through https://machinelearning.uniud.it/datasets/skitb

    Real image super-resolution using GAN through modeling of LR and HR process.

    Existing deep image super-resolution methods usually assume that a low-resolution (LR) image is a bicubically downscaled version of a high-resolution (HR) image. However, such an ideal bicubic downsampling process differs from real LR degradations, which usually arise from complicated combinations of different degradation processes, such as camera blur, sensor noise, sharpening artifacts, JPEG compression, further image editing, repeated transmission over the internet, and unpredictable noise. This leads to the highly ill-posed nature of the inverse upscaling problem. To address these issues, we propose a GAN-based SR approach with learnable adaptive sinusoidal nonlinearities incorporated in the LR and SR models, which directly learns degradation distributions and then synthesizes paired LR/HR training data to train an SR model that generalizes to real image degradations. We demonstrate the effectiveness of our proposed approach in quantitative and qualitative experiments.
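
    To make the gap between an idealized downscale and a realistic degradation concrete, here is a small hand-written sketch on a grayscale image stored as nested lists with values in [0, 1]. Everything in it is an assumption for illustration: box averaging stands in for bicubic downscaling, a 3x3 mean filter for camera blur, Gaussian noise for sensor noise, and coarse quantization for compression. The paper learns its degradations with a GAN rather than fixing them by hand.

```python
import random


def box_downscale(img, s):
    """Idealized downscale: average each s x s block (stand-in for bicubic)."""
    h, w = len(img), len(img[0])
    return [
        [
            sum(img[y * s + dy][x * s + dx] for dy in range(s) for dx in range(s))
            / (s * s)
            for x in range(w // s)
        ]
        for y in range(h // s)
    ]


def blur3(img):
    """3x3 mean blur with clamped borders (stand-in for camera blur)."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += img[yy][xx]
            row.append(acc / 9.0)
        out.append(row)
    return out


def realistic_degrade(img, s, noise_sigma, levels, rng):
    """Blur, downscale, add noise, then quantize to `levels` gray values."""
    lr = box_downscale(blur3(img), s)
    noisy = [[v + rng.gauss(0.0, noise_sigma) for v in row] for row in lr]
    step = 1.0 / (levels - 1)
    return [[min(max(round(v / step) * step, 0.0), 1.0) for v in row] for row in noisy]


# 8x8 diagonal gradient as a toy HR image.
hr = [[(x + y) / 14.0 for x in range(8)] for y in range(8)]
ideal_lr = box_downscale(hr, 2)
real_lr = realistic_degrade(hr, 2, 0.05, 16, random.Random(0))
print(ideal_lr[0])
print(real_lr[0])
```

    An SR model trained only on pairs like (`ideal_lr`, `hr`) never sees the blur, noise, or quantization present in `real_lr`, which is the mismatch the abstract describes.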

    Collaborative image and object level features for image colourisation

    Image colourisation is an ill-posed problem with multiple correct solutions that depend on the context and the object instances present in the input datum. Previous approaches attacked the problem either by requiring intensive user interaction or by exploiting the ability of convolutional neural networks (CNNs) to learn image-level (context) features. However, obtaining human hints is not always feasible, and CNNs alone are not able to learn entity-level semantics unless multiple models pre-trained with supervision are considered. In this work, we propose a single network, named UCapsNet, that takes into consideration the image-level features obtained through convolutions and the entity-level features captured by means of capsules. Then, through skip connections over different layers, we enforce collaboration between the convolutional and entity factors to produce a high-quality and plausible image colourisation. We pose the problem as a classification task that can be addressed by a fully unsupervised approach, thus requiring no human effort. Experimental results on three benchmark datasets show that our approach outperforms existing methods on standard quality metrics and achieves state-of-the-art performance on image colourisation. A large-scale user study shows that our method is preferred over existing solutions. Code available at https://github.com/Riretta/Image_Colourisation_WiCV_2021

    Lightweight Implicit Blur Kernel Estimation Network for Blind Image Super-Resolution

    Blind image super-resolution (Blind-SR) is the process of leveraging a low-resolution (LR) image, with unknown degradation, to generate its high-resolution (HR) version. Most existing Blind-SR techniques use a degradation estimator network that explicitly estimates the blur kernel to guide the SR network, under the supervision of ground-truth (GT) kernels. To remove this reliance on GT kernels, it is necessary to design an implicit estimator network that can extract a discriminative blur kernel representation without such supervision. We design a lightweight approach for Blind-SR that estimates the blur kernel and restores the HR image based on a deep convolutional neural network (CNN) and a deep super-resolution residual convolutional generative adversarial network. Since the blur kernel for blind image SR is unknown, following the image formation model of the blind super-resolution problem, we first introduce a neural network-based model to estimate the blur kernel. This is achieved by (i) a Super Resolver that, from a low-resolution input, generates the corresponding SR image; and (ii) an Estimator Network that generates the blur kernel from the input datum. The outputs of both models are used in a novel loss formulation. The proposed network is end-to-end trainable. The proposed methodology is substantiated by both quantitative and qualitative experiments. Results on benchmarks demonstrate that our computationally efficient approach (12x fewer parameters than the state-of-the-art models) performs favorably with respect to existing approaches and can be used on devices with limited computational capabilities.
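
    The abstract refers to the image formation model of blind super-resolution without writing it out. The form commonly assumed in the blind SR literature is given below; the notation is an assumption here, not taken from the paper.

```latex
% Classical degradation model commonly assumed for blind SR (notation assumed):
%   y : observed low-resolution image
%   x : unknown high-resolution image
%   k : unknown blur kernel
%   \downarrow_s : downsampling by scale factor s
%   n : additive noise
\[
  y = (x \otimes k)\downarrow_s + n
\]
```

    Under this model, a Super Resolver estimates x from y and an Estimator Network estimates k from y. One plausible consistency check is that blurring and downsampling the estimated x with the estimated k should reproduce y, though the abstract does not specify how its loss combines the two outputs.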

    An ensemble feature method for food classification

    In recent years, several works on automatic image-based food recognition have been proposed, often based on texture feature extraction and classification. However, there is still a lack of proper comparisons to evaluate which approaches are better suited to this specific task. In this work, we adopt a Random Forest classifier to measure the performance of different texture filter banks and feature encoding techniques on three different food image datasets. Comparative results are given to show the performance of each considered approach, as well as to compare the proposed Random Forest classifiers with other feature-based state-of-the-art solutions.