GCK-Maps: A Scene Unbiased Representation for Efficient Human Action Recognition
Human action recognition from visual data is a popular topic in Computer Vision, applied in a wide range of domains. State-of-the-art solutions often include deep-learning approaches based on RGB videos and pre-computed optical flow maps. Recently, 3D Gray-Code Kernels (GCK) projections have been assessed as an alternative way of representing motion, being able to efficiently capture space-time structures. In this work, we investigate the use of GCK pooling maps, which we call GCK-Maps, as input for addressing Human Action Recognition with CNNs. We provide an experimental comparison with RGB and optical flow in terms of accuracy, efficiency, and scene-bias dependency. Our results show that GCK-Maps generally represent a valuable alternative to optical flow and RGB frames, with a significant reduction of the computational burden.
Food Image Classification: The Benefit of In-Domain Transfer Learning
Monitoring food intake and calories may be fundamental for a healthy lifestyle and preventing nutrition-related illnesses. Recently, deep-learning approaches have been extensively exploited to provide an automatic analysis of food images. However, food image datasets have peculiar challenges, including fine granularity with a high intra-class and low inter-class variability. In this work, we focus on training strategies considering the typical scenario where data availability and computational resources are limited. Exploiting convolutional neural networks, we show that in-domain source datasets provide a better representation with respect to only using ImageNet, bringing a significant increase in test accuracy. We finally show that ensembling different CNN models further improves the learned representation
Anomaly detection in feature space for detecting changes in phytoplankton populations
Plankton organisms are fundamental components of the earth’s ecosystem. Zooplankton feeds on phytoplankton and is predated by fish and other aquatic animals, being at the core of the aquatic food chain. Phytoplankton, in turn, has a crucial role in climate regulation: it has produced almost 50% of the total oxygen in the atmosphere and is responsible for fixing around a quarter of the total earth’s carbon dioxide. Importantly, plankton can be regarded as a good indicator of environmental perturbations, as it can react to even slight environmental changes with corresponding modifications in morphology and behavior. At a population level, the biodiversity and the concentration of individuals of specific species may shift dramatically due to environmental changes. Thus, in this paper, we propose an anomaly detection-based framework to recognize heavy morphological changes in phytoplankton at a population level, starting from images acquired in situ. Given that an initial annotated dataset is available, we propose to build a parallel architecture, training one anomaly detection algorithm for each available class on top of deep features extracted by a pre-trained Vision Transformer and further reduced in dimensionality with PCA. We then define global anomalies as the samples rejected by all the trained detectors, and propose to empirically identify a threshold on the global anomaly count over time as an indicator that field experts and institutions can use to investigate potential environmental perturbations. We use two publicly available datasets (WHOI22 and WHOI40) of grayscale microscopic images of phytoplankton collected with the Imaging FlowCytobot acquisition system to test the proposed approach, obtaining high performance in detecting both in-class and out-of-class samples.
Finally, we build a dataset of 15 classes acquired by the WHOI across four years, showing that the proposed approach’s ability to identify anomalies is preserved when tested on images of the same classes acquired across a timespan of years
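The per-class detector scheme described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the anomaly detection algorithm, so a toy centroid-distance detector is assumed here, the two-dimensional vectors stand in for PCA-reduced Vision Transformer features, and `CentroidDetector`, `margin`, and `ALERT_THRESHOLD` are hypothetical names and values.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class CentroidDetector:
    """Toy one-class detector (an assumption): accepts samples near the class centroid."""

    def __init__(self, margin=1.5):
        self.margin = margin  # hypothetical slack factor on the training radius

    def fit(self, vectors):
        self.center = centroid(vectors)
        self.radius = self.margin * max(distance(v, self.center) for v in vectors)
        return self

    def accepts(self, vector):
        return distance(vector, self.center) <= self.radius

def fit_detectors(features_by_class):
    """Train one detector per annotated class, as in the parallel architecture."""
    return {label: CentroidDetector().fit(vecs)
            for label, vecs in features_by_class.items()}

def is_global_anomaly(detectors, vector):
    """A sample is a global anomaly when every per-class detector rejects it."""
    return not any(d.accepts(vector) for d in detectors.values())

def global_anomaly_count(detectors, batch):
    """Number of global anomalies in one acquisition window."""
    return sum(is_global_anomaly(detectors, v) for v in batch)

# Stand-in features for two classes (hypothetical values).
train = {
    "class_a": [[0.0, 0.0], [0.2, 0.1], [-0.1, 0.2]],
    "class_b": [[5.0, 5.0], [5.2, 4.9], [4.8, 5.1]],
}
detectors = fit_detectors(train)

# One acquisition window: two in-class samples and one far from both classes.
batch = [[0.1, 0.1], [5.1, 5.0], [20.0, -20.0]]
count = global_anomaly_count(detectors, batch)

ALERT_THRESHOLD = 1  # hypothetical; the abstract proposes choosing it empirically
alert = count >= ALERT_THRESHOLD
```

Tracking `count` per window over time gives the indicator the abstract describes; any one-class method could replace the toy detector without changing the rest of the logic.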
Is In-Domain Data Beneficial in Transfer Learning for Landmarks Detection in X-Ray Images?
In recent years, deep learning has emerged as a promising technique for medical image analysis. However, this application domain is likely to suffer from a limited availability of large public datasets and annotations. A common solution to these challenges in deep learning is the usage of a transfer learning framework, typically with a fine-tuning protocol, where a large-scale source dataset is used to pre-train a model, further fine-tuned on the target dataset. In this paper, we present a systematic study analyzing whether the usage of small-scale in-domain x-ray image datasets may provide any improvement for landmark detection over models pre-trained on large natural image datasets only. We focus on the multi-landmark localization task for three datasets, including chest, head, and hand x-ray images. Our results show that using in-domain source datasets brings marginal or no benefit with respect to an ImageNet out-of-domain pre-training. Our findings can provide an indication for the development of robust landmark detection systems in medical images when no large annotated dataset is available
Computer vision and deep learning meet plankton: Milestones and future directions
Planktonic organisms play a pivotal role within aquatic ecosystems, serving as the foundation of the aquatic food chain while also playing a critical role in climate regulation and the production of oxygen. In recent years, the advent of automated systems for capturing in-situ images has led to a huge influx of plankton images, making manual classification impractical. This, at the same time, has opened up opportunities for the application of machine learning and deep learning solutions. This paper undertakes an extensive analysis of the broad range of computer vision techniques and methodologies that have emerged to facilitate the automatic analysis of small- to large-scale datasets containing plankton images. By focusing on different computer vision tasks, we present findings and limitations in order to offer a comprehensive overview of the current state-of-the-art, while also pinpointing the open challenges that demand further research and attention
Ensembles of Deep Neural Networks for the Automatic Detection of Building Facade Defects From Images
Preserving the value of buildings and ensuring performance levels within acceptable parameters throughout their lifespan necessitates constant monitoring. In recent years, artificial intelligence has provided a valuable supplement to conventional inspection practices, potentially offering a supporting tool for building maintenance in smart cities. Exploiting machine learning algorithms for detecting or classifying building facade defects from acquired images has emerged as a promising automatic building monitoring strategy. However, an effective approach should be capable of accurately classifying fine-grained defects, thus requiring ad-hoc solutions to maximize predictive accuracy. For this reason, in this work, we introduced a novel and effective classification protocol, based on different ensemble strategies of complex and recent deep neural networks, namely Vision Transformers and ConvNexts, for building facade defects automatic classification. First, we validated our method on a popular benchmark dataset with different damage classification tasks, outperforming the state-of-the-art available works. Then, we analyzed a custom dataset, named Facade Building Defects (FBD), containing building facade images labeled into four different defect classes, that we introduced in this work and released as open access. The proposed ensemble showed a test accuracy of 90.9%, achieving an improvement of 1.6% with respect to the best single model, thus empirically proving the benefit of model ensembling for the task of automatic building facade defects classification
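One common ensemble strategy is to average the class probabilities produced by each model and pick the class with the highest mean. The abstract mentions "different ensemble strategies" without listing them, so probability averaging is an assumption here, and the class names and probability values below are hypothetical.

```python
def average_probabilities(per_model_probs):
    """Average class probabilities across models (soft voting)."""
    n_models = len(per_model_probs)
    return [sum(col) / n_models for col in zip(*per_model_probs)]

def ensemble_predict(per_model_probs, class_names):
    """Return the class with the highest averaged probability, plus the averages."""
    avg = average_probabilities(per_model_probs)
    best = max(range(len(avg)), key=avg.__getitem__)
    return class_names[best], avg

# Four defect classes, as in the FBD dataset; the label names are placeholders.
class_names = ["defect_0", "defect_1", "defect_2", "defect_3"]

# Hypothetical softmax outputs of three models for one facade image.
probs = [
    [0.40, 0.35, 0.15, 0.10],  # model A alone would pick defect_0
    [0.20, 0.50, 0.20, 0.10],  # model B
    [0.30, 0.45, 0.15, 0.10],  # model C
]
label, avg = ensemble_predict(probs, class_names)
```

In this toy case the ensemble overrules model A's individual choice, which is the kind of effect that can let an ensemble improve on its best single member.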
Top-tuning: A study on transfer learning for an efficient alternative to fine tuning for image classification with fast kernel methods
The impressive performance of deep learning architectures is associated with a massive increase in model complexity. Millions of parameters need to be tuned, with training and inference time scaling accordingly, together with energy consumption. But is massive fine-tuning always necessary? In this paper, focusing on image classification, we consider a simple transfer learning approach exploiting pre-trained convolutional features as input for a fast-to-train kernel method. We refer to this approach as top-tuning since only the kernel classifier is trained on the target dataset. In our study, we perform more than 3000 training processes focusing on 32 small to medium-sized target datasets, a typical situation where transfer learning is necessary. We show that the top-tuning approach provides comparable accuracy with respect to fine-tuning, with a training time between one and two orders of magnitude smaller. These results suggest that top-tuning is an effective alternative to fine-tuning in small/medium datasets, being especially useful when training time efficiency and computational resources saving are crucial
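The top-tuning idea (keep the pre-trained feature extractor frozen and train only a kernel classifier on its outputs) can be sketched as below. This is an illustration under stated assumptions: the abstract does not specify the kernel method, so plain kernel ridge regression with a Gaussian kernel is used as a stand-in, the short vectors stand in for pre-extracted convolutional features, and `lam` and `sigma` are hypothetical hyperparameters.

```python
import math

def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq / (2.0 * sigma ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_kernel_ridge(features, targets, lam=1e-3, sigma=1.0):
    """Train only the kernel classifier: solve (K + lam * I) alpha = y."""
    n = len(features)
    K = [[gaussian_kernel(features[i], features[j], sigma) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(K, targets)

def predict(features, alphas, x, sigma=1.0):
    """Sign of the kernel expansion evaluated at x (binary labels +1 / -1)."""
    score = sum(a * gaussian_kernel(f, x, sigma) for a, f in zip(alphas, features))
    return 1 if score >= 0 else -1

# Stand-ins for features from a frozen pre-trained network (hypothetical values).
train_x = [[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [3.1, 2.9]]
train_y = [1.0, 1.0, -1.0, -1.0]

alphas = fit_kernel_ridge(train_x, train_y)
pred_near_first = predict(train_x, alphas, [0.1, 0.0])
pred_near_second = predict(train_x, alphas, [2.9, 3.0])
```

No backbone weights are updated: the only fitted quantities are the coefficients `alphas`, which is why this kind of training can be much cheaper than fine-tuning. A direct solve like this one scales cubically with the number of samples, so it is only meant to convey the idea on a toy problem.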
In Domain Transfer Learning for Prostate MRI Segmentation
In recent years, deep learning has been widely applied to different medical image analysis tasks. However, large-scale annotated datasets are typically unavailable in such a domain, potentially hindering deep learning applications. Transfer learning with a fine-tuning framework is a commonly adopted solution to this issue, exploiting large-scale natural image datasets (e.g., ImageNet) to pre-train a deep neural network, and fine-tuning the resulting model on the target dataset. A potential alternative could be gathering data coming from different specialized centers to increase the number of available training data. However, privacy issues as well as diverse acquisition modalities are important challenges to such a solution. In this paper, we investigate if small-scale datasets for in-domain fine-tuning can be beneficial over natural image datasets pre-training only. Using popular small-scale benchmark datasets of prostate MRI volumes and ImageNet pre-trained models, we show that there is always a benefit when using in-domain data to fine-tune the ImageNet pre-trained model, before fine-tuning it on the target dataset. Our results provide insights for a potential improvement of deep-learning-based prostate segmentation in MRI images, showing benefits when using data acquired in different specialized centers within a transfer learning framework
Establishing the baseline for using plankton as biosensor
Plankton is at the bottom of the food chain. Microscopic phytoplankton account for about 50% of all photosynthesis on Earth, corresponding to 50 billion tons of carbon each year, or about 125 billion tonnes of sugar[1]. Plankton is also the food for most species of fish, and therefore it represents the backbone of the aquatic environment. Thus, monitoring plankton is paramount to infer potentially dangerous changes to the ecosystem. In this work, we use a collection of plankton species extracted from a large dataset of images from the Woods Hole Oceanographic Institution (WHOI) to establish a basic set of morphological features for supporting the use of plankton as a biosensor. Using a perturbation detection approach, we show that it is possible to detect deviations from the average space of features for each species of plankton microorganisms, which we propose could be related to environmental threats or perturbations. Such an approach can open the way for the development of an automatic Artificial Intelligence (AI) based system for using plankton as a biosensor.
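Detecting a deviation from a species' average feature space can be illustrated with a per-feature z-score against a baseline. This is only a sketch: the abstract does not describe the actual perturbation detection method, so the z-score rule is an assumption, and the two features (imagined as area and eccentricity), their values, and `Z_THRESHOLD` are hypothetical.

```python
import statistics

def fit_baseline(samples):
    """Per-feature mean and sample standard deviation for one species."""
    cols = list(zip(*samples))
    means = [statistics.mean(c) for c in cols]
    stds = [statistics.stdev(c) for c in cols]
    return means, stds

def deviation_score(sample, baseline):
    """Largest absolute z-score of the sample across all features."""
    means, stds = baseline
    return max(abs(x - m) / s for x, m, s in zip(sample, means, stds))

# Hypothetical morphological features (area, eccentricity) for one species.
baseline = fit_baseline([
    [10.0, 0.50],
    [11.0, 0.55],
    [9.0, 0.45],
    [10.0, 0.50],
])

Z_THRESHOLD = 3.0  # hypothetical cut-off for flagging a deviation

typical = deviation_score([10.2, 0.51], baseline)
perturbed = deviation_score([15.0, 0.90], baseline)
flagged = perturbed > Z_THRESHOLD
```

A sample whose score exceeds the threshold lies far from the species' baseline in at least one feature and would be a candidate for closer inspection.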
- …
