Visual tracking in camera-switching outdoor sport videos: Benchmark and baselines for skiing
Skiing is a globally popular winter sport discipline with a rich history of competitive events. This domain offers ample opportunities for the application of computer vision to enhance the understanding of athletes’ performances. However, this potential has remained relatively untapped in comparison to other sports, primarily due to the limited availability of dedicated research studies and datasets. The present paper takes a significant stride towards bridging these gaps. It conducts a comprehensive examination of skier appearance tracking in videos capturing their entire performance—an essential step for more advanced performance analyses. To implement this investigation, we introduce SkiTB, the largest and most annotated dataset tailored for computer vision applications in skiing. We subject a range of visual object tracking algorithms to rigorous testing, including both well-established methodologies and a novel skier-specific baseline algorithm. The results yield valuable insights into the suitability of various tracking techniques for vision-based skiing analysis and into the generalization of state-of-the-art algorithms to complex target behaviors and conditions set by winter outdoor environments. To foster further development, we make SkiTB, the associated code, and the obtained results accessible through https://machinelearning.uniud.it/datasets/skitb
CoCoLoT: Combining Complementary Trackers in Long-Term Visual Tracking
How to combine the complementary capabilities of an ensemble of different algorithms has been of central interest in visual object tracking. Significant progress has been made on this problem, but only in short-term tracking scenarios; long-term tracking settings have been largely ignored by existing solutions. In this paper, we explicitly consider long-term tracking scenarios and provide a framework, named CoCoLoT, that combines the characteristics of complementary visual trackers to achieve enhanced long-term tracking performance. CoCoLoT perceives whether the trackers are following the target object through an online learned deep verification model, and accordingly activates a decision policy that selects the best-performing tracker and corrects the performance of the failing one. The proposed methodology is evaluated extensively, and the comparison with several other solutions reveals that it competes favourably with the state of the art on the most popular long-term visual tracking benchmarks.
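The select-and-correct idea described in this abstract can be illustrated with a minimal sketch. It assumes two trackers and a verifier that assigns each output a confidence in [0, 1]; the names `TrackerOutput` and `select_and_correct`, the 0.5 threshold, and the rule of flagging the failing tracker for re-initialisation are all hypothetical choices for illustration, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


@dataclass
class TrackerOutput:
    name: str
    box: Box
    score: float  # hypothetical verifier confidence in [0, 1]


def select_and_correct(
    a: TrackerOutput, b: TrackerOutput, threshold: float = 0.5
) -> Tuple[Optional[Box], Optional[str]]:
    """Return (chosen_box, name_of_tracker_to_correct).

    chosen_box is None when the verifier rejects both trackers.
    The second element names the tracker whose score fell below the
    threshold, so the caller could re-initialise it on chosen_box.
    """
    if a.score < threshold and b.score < threshold:
        return None, None  # both outputs rejected: target treated as lost
    # Pick the output the verifier trusts most (ties go to tracker a).
    best, other = (a, b) if a.score >= b.score else (b, a)
    to_correct = other.name if other.score < threshold else None
    return best.box, to_correct
```

For example, with scores 0.9 and 0.2 the sketch returns the first tracker's box and flags the second tracker for correction.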
LowFormer: Hardware Efficient Design for Convolutional Transformer Backbones
Research in efficient vision backbones is evolving into models that are a mixture of convolutions and transformer blocks. A smart combination of both, architecture-wise and component-wise, is mandatory to excel in the speed-accuracy trade-off. Most publications focus on maximizing accuracy and utilize MACs (multiply-accumulate operations) as an efficiency metric. MACs, however, often do not accurately reflect how fast a model actually is, due to factors like memory access cost and degree of parallelism. We analyzed common modules and architectural design choices for backbones not in terms of MACs, but rather in actual throughput and latency, as the combination of the latter two is a better representation of the efficiency of models in real applications. We applied the conclusions drawn from that analysis to create a recipe for increasing hardware efficiency in macro design. Additionally, we introduce a simple slimmed-down version of Multi-Head Self-Attention that aligns with our analysis. We combine both macro and micro design to create a new family of hardware-efficient backbone networks called LowFormer. LowFormer achieves a remarkable speedup in terms of throughput and latency, while achieving similar or better accuracy than current state-of-the-art efficient backbones. In order to prove the generalizability of our hardware-efficient design, we evaluate our method on GPU, mobile GPU and ARM CPU. We further show that the downstream tasks object detection and semantic segmentation profit from our hardware-efficient architecture. Code and models are available at https://github.com/altair199797/LowFormer
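The distinction between counting MACs and measuring actual speed can be made concrete with a small timing sketch. It treats a model as any callable that takes a batch (a list of samples); `measure_latency`, `measure_throughput`, and `toy_model` are hypothetical names for illustration and are not taken from the LowFormer code base. On a GPU, a real benchmark would also need to synchronise the device before reading the clock, which this CPU-only sketch omits.

```python
import time
from typing import Callable, List, Sequence


def measure_latency(fn: Callable, sample, warmup: int = 3, runs: int = 20) -> float:
    """Mean wall-clock seconds for one call on a single-sample batch."""
    for _ in range(warmup):  # warm-up calls are not timed
        fn([sample])
    start = time.perf_counter()
    for _ in range(runs):
        fn([sample])
    return (time.perf_counter() - start) / runs


def measure_throughput(fn: Callable, batch: Sequence, warmup: int = 3, runs: int = 20) -> float:
    """Samples processed per second at the given batch size."""
    for _ in range(warmup):
        fn(batch)
    start = time.perf_counter()
    for _ in range(runs):
        fn(batch)
    elapsed = max(time.perf_counter() - start, 1e-12)  # guard against a zero reading
    return runs * len(batch) / elapsed


def toy_model(batch: Sequence[List[float]]) -> List[float]:
    """Stand-in for a backbone: one scalar per input sample."""
    return [sum(x * x for x in sample) for sample in batch]


sample = [0.5] * 1000
latency_s = measure_latency(toy_model, sample)
throughput = measure_throughput(toy_model, [sample] * 32)
print(f"latency: {latency_s * 1e3:.3f} ms, throughput: {throughput:.0f} samples/s")
```

Two models with the same MAC count can give different numbers here, which is the gap the abstract points to.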
Deep convolutional feature details for better knee disorder diagnoses in magnetic resonance images
Convolutional neural networks (CNNs) applied to magnetic resonance imaging (MRI) have demonstrated their ability in the automatic diagnosis of knee injuries. Despite the promising results, the currently available solutions do not take into account the particular anatomy of knee disorders. Existing works have shown that injuries are localized in small-sized knee regions near the center of MRI scans. Based on such insights, we propose MRPyrNet, a CNN architecture capable of extracting more relevant features from these regions. Our solution is composed of a Feature Pyramid Network with Pyramidal Detail Pooling, and can be plugged into any existing CNN-based diagnostic pipeline. The first module aims to enhance the CNN intermediate features to better detect the small-sized appearance of disorders, while the second one captures this kind of evidence by maintaining its detailed information. An extensive evaluation campaign is conducted to understand in depth the potential of the proposed solution. The experimental results demonstrate that applying MRPyrNet to baseline methodologies improves their diagnostic capability, especially in the case of anterior cruciate ligament tear and meniscal tear, because of MRPyrNet's ability to exploit the relevant appearance features of such disorders. Code is available at https://github.com/matteo-dunnhofer/MRPyrNet
Weakly-Supervised Domain Adaptation of Deep Regression Trackers via Reinforced Knowledge Distillation
Deep regression trackers are among the fastest tracking algorithms available, and therefore suitable for real-time robotic applications. However, their accuracy is inadequate in many domains due to distribution shift and overfitting. In this paper, we overcome such limitations by presenting the first methodology for domain adaptation of such a class of trackers. To reduce the labeling effort, we propose a weakly-supervised adaptation strategy, in which reinforcement learning is used to express weak supervision as a scalar application-dependent and temporally-delayed feedback. At the same time, knowledge distillation is employed to guarantee learning stability and to compress and transfer knowledge from more powerful but slower trackers. Extensive experiments on five different domains demonstrate the relevance of our methodology. Real-time speed is achieved on embedded devices and on machines without GPUs, while accuracy reaches significant results.
Visualizing Skiers' Trajectories in Monocular Videos
Trajectories are fundamental to winning in alpine skiing. Tools enabling the analysis of such curves can enhance the training activity and enrich broadcasting content. In this paper, we propose SkiTraVis, an algorithm to visualize the sequence of points traversed by a skier during their performance. SkiTraVis works on monocular videos and consists of a pipeline combining a visual tracker, which models the skier's motion, and a frame correspondence module, which estimates the camera's motion. The separation of the two motions enables the visualization of the trajectory according to the moving camera's perspective. We performed experiments on videos of real-world professional competitions to quantify the visualization error, the computational efficiency, as well as the applicability. Overall, the results achieved demonstrate the potential of our solution for broadcasting media enhancement and coach assistance.
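The separation of skier motion and camera motion described in this abstract can be sketched as follows. The sketch assumes that the camera motion between consecutive frames is available as a 3x3 homography mapping previous-frame pixels to current-frame pixels, and that the tracker yields one image point per frame for the skier; this representation and the names `warp_point` and `update_trajectory` are illustrative assumptions, not details confirmed by the abstract.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]  # 3x3 homography, row-major


def warp_point(h: Matrix, p: Point) -> Point:
    """Apply a homography to a 2D point using homogeneous coordinates."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (
        (h[0][0] * x + h[0][1] * y + h[0][2]) / w,
        (h[1][0] * x + h[1][1] * y + h[1][2]) / w,
    )


def update_trajectory(
    trajectory: List[Point], h_prev_to_curr: Matrix, skier_point: Point
) -> List[Point]:
    """Move past points into the current frame, then append the new one.

    Camera motion is compensated by warping every stored point with the
    previous-to-current homography; the skier's own motion enters only
    through the newly tracked point.
    """
    moved = [warp_point(h_prev_to_curr, p) for p in trajectory]
    moved.append(skier_point)
    return moved


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
shift = [[1.0, 0.0, 5.0], [0.0, 1.0, -2.0], [0.0, 0.0, 1.0]]  # camera pan as a pure translation

trajectory = update_trajectory([], identity, (10.0, 10.0))
trajectory = update_trajectory(trajectory, shift, (20.0, 20.0))
print(trajectory)  # [(15.0, 8.0), (20.0, 20.0)]
```

Drawing the stored points on each frame after this update gives a trajectory that stays attached to the scene while the camera moves.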
Young drivers’ pedestrian anti-collision braking operation data modelling for ADAS development
Smart cities and smart mobility come from intelligent systems designed by humans. Artificial Intelligence (AI) is contributing significantly to the development of these systems, and the automotive industry is the most prominent example of "smart" technology entering the market: there are Advanced Driver Assistance Systems (ADAS), Radar/LIDAR detection units and camera-based Computer Vision systems that can assess driving conditions. These technologies have become consumer goods and services in mass-produced vehicles, providing human drivers with tools for more comfortable and safer driving. Nevertheless, they need to be further improved for progress in the transition to fully automated driving or simply to increase vehicle automation levels. To this end, it becomes imperative to accurately predict drivers’ decisions, model human driving behaviors, and introduce more accurate risk assessment metrics. This paper presents a system that can learn to predict the future braking behavior of a driver in a typical urban vehicle-pedestrian conflict, i.e., when a pedestrian enters a zebra crossing from the curb and a vehicle is approaching. The algorithm proposes a sequential prediction of relevant operational indicators that continuously describe the encounter process. A car driving simulator was used to collect reliable data on the braking behaviours of a cohort of 68 licensed university students, who faced the same urban scenario. The vehicle speed, steering wheel angle, and pedal activity were recorded as the participants approached the crosswalk, along with the azimuth angle of the pedestrian and the relative longitudinal distance between the vehicle and the pedestrian: the proposed system employs the vehicle information as human driving decisions and the pedestrian information as explanatory variables of the environmental state.
In fact, the pedestrian’s polar coordinates are usually calculated by an on-board millimeter-wave radar which is typically used to perceive the environment around a vehicle. All mentioned information is represented in the form of time series data and is used to train a recurrent neural network in a supervised machine learning process. The main purpose of this research is to define a system of behavioral profiles in non-collision conditions that could be used for enhancing the existing intelligent driving systems, e.g., to reduce the number of warnings when the driver is not on a collision course with a pedestrian. Preliminary experiments reveal the feasibility of the proposed system
Is First Person Vision Challenging for Object Tracking?
Understanding human-object interactions is fundamental in First Person Vision (FPV). Tracking algorithms which follow the objects manipulated by the camera wearer can provide useful cues to effectively model such interactions. Visual tracking solutions available in the computer vision literature have significantly improved their performance in recent years for a large variety of target objects and tracking scenarios. However, despite a few previous attempts to exploit trackers in FPV applications, a methodical analysis of the performance of state-of-the-art trackers in this domain is still missing. In this paper, we fill the gap by presenting the first systematic study of object tracking in FPV. Our study extensively analyses the performance of recent visual trackers and baseline FPV trackers with respect to different aspects and considering a new performance measure. This is achieved through TREK-150, a novel benchmark dataset composed of 150 densely annotated video sequences. Our results show that object tracking in FPV is challenging, which suggests that more research efforts should be devoted to this problem so that tracking could benefit FPV tasks.
Visual Object Tracking in First Person Vision
The understanding of human-object interactions is fundamental in First Person Vision (FPV). Visual tracking algorithms which follow the objects manipulated by the camera wearer can provide useful information to effectively model such interactions. In recent years, the computer vision community has significantly improved the performance of tracking algorithms for a large variety of target objects and scenarios. Despite a few previous attempts to exploit trackers in the FPV domain, a methodical analysis of the performance of state-of-the-art trackers is still missing. This research gap raises the question of whether current solutions can be used “off-the-shelf” or whether more domain-specific investigations should be carried out. This paper aims to provide answers to such questions. We present the first systematic investigation of single object tracking in FPV. Our study extensively analyses the performance of 42 algorithms including generic object trackers and baseline FPV-specific trackers. The analysis is carried out by focusing on different aspects of the FPV setting, introducing new performance measures, and in relation to FPV-specific tasks. The study is made possible through the introduction of TREK-150, a novel benchmark dataset composed of 150 densely annotated video sequences. Our results show that object tracking in FPV poses new challenges to current visual trackers. We highlight the factors causing such behavior and point out possible research directions. Despite their difficulties, we prove that trackers bring benefits to FPV downstream tasks requiring short-term object tracking. We expect that generic object tracking will gain popularity in FPV as new and FPV-specific methodologies are investigated.
Tracking Skiers from the Top to the Bottom
Skiing is a popular winter sport discipline with a long history of competitive events. In this domain, computer vision has the potential to enhance the understanding of athletes' performance, but its application lags behind other sports due to limited studies and datasets. This paper makes a step forward in filling such gaps. A thorough investigation is performed on the task of tracking a skier in a video capturing their complete performance. Obtaining continuous and accurate skier localization is a prerequisite for further higher-level performance analyses. To enable the study, the largest and most annotated dataset for computer vision in skiing, SkiTB, is introduced. Several visual object tracking algorithms, including both established methodologies and a newly introduced skier-optimized baseline algorithm, are tested using the dataset. The results provide valuable insights into the applicability of different tracking methods for vision-based skiing analysis. SkiTB, code, and results are available at https://machinelearning.uniud.it/datasets/skitb
