Multi-Modal Learning for Cross-Domain Analysis of Egocentric Action and Object Recognition

Author: PLANAMENTE MIRCO
Publication date: 30/10/2023

The abstract is in the attachment.

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Recurrent convolutional fusion for RGB-D object recognition

Author: Caputo Barbara
Vincze Markus
Planamente Mirco
Loghmani Mohammad Reza
Publication date: 01/01/2019

Providing robots with the ability to recognize objects like humans has always been one of the primary goals of robot vision. The introduction of RGB-D cameras has paved the way for a significant leap forward in this direction thanks to the rich information provided by these sensors. However, the robot vision community still lacks an effective method to synergistically use the RGB and depth data to improve object recognition. In order to take a step in this direction, we introduce a novel end-to-end architecture for RGB-D object recognition called recurrent convolutional fusion (RCFusion). Our method generates compact and highly discriminative multi-modal features by combining RGB and depth information representing different levels of abstraction. Extensive experiments on two popular datasets, RGB-D Object Dataset and JHUIT-50, show that RCFusion significantly outperforms state-of-the-art approaches in both the object categorization and instance recognition tasks. In addition, experiments on the more challenging Object Clutter Indoor Dataset confirm the validity of our method in the presence of clutter and occlusion. The code is publicly available at: https://github.com/MRLoghmani/rcfusion.

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Toward human-robot cooperation: unsupervised domain adaptation for egocentric action recognition

Author: Goletto Gabriele
Averta Giuseppe
Planamente Mirco
Trivigno Gabriele
Caputo Barbara
Publication date: 01/01/2023

With the advent of collaborative manipulators, the community is pushing the limits of human-robot interaction with novel control, planning, and task allocation strategies. For a purposeful interaction, however, the robot is also required to understand and predict the actions of the human not only at a kinematic level (i.e., motion estimation), but also at a higher level of abstraction (i.e., action recognition), ideally from the human's own perspective. Dealing with egocentric videos comes with the benefit that the data source already embeds an intrinsic attention mechanism, driven by the focus of the user. However, the deployment of such technology in realistic use cases cannot ignore the large variability of background characteristics when changing environment, which results in a domain shift in feature space that cannot be learned from labels at training time. In this paper, we discuss a method to perform Domain Adaptation with no external supervision, which we test on the EPIC-Kitchens-100 UDA Challenge in Action Recognition. More specifically, we move from our previous work on Relative Norm Alignment and extend the approach to unlabelled target data, enabling a simpler adaptation of the model to the target distribution in an unsupervised fashion. To this purpose, we enhanced our framework with multi-level adversarial alignment and with a set of losses aimed at reducing the classifier’s uncertainty on the target data. Extensive experiments demonstrate how our approach is capable of performing Multi-Source Multi-Target Domain Adaptation, thus minimising both temporal (i.e., different recording times) and environmental (i.e., different kitchens) biases.

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PoliTO-IIT Submission to the EPIC-KITCHENS-100 Unsupervised Domain Adaptation Challenge for Action Recognition

Author: Plizzari Chiara
Alberti Emanuele
Planamente Mirco
Caputo Barbara
Publication date: 01/01/2021

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PoliTO-IIT-CINI Submission to the EPIC-KITCHENS-100 Unsupervised Domain Adaptation Challenge for Action Recognition

Author: Goletto Gabriele
Averta Giuseppe
Planamente Mirco
Trivigno Gabriele
Caputo Barbara
Publication date: 01/01/2022

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Domain generalization through audio-visual relative norm alignment in first person action recognition

Author: Caputo Barbara
Plizzari Chiara
Planamente Mirco
Alberti Emanuele
Publication date: 01/01/2022

First person action recognition is becoming an increasingly researched area thanks to the rising popularity of wearable cameras. This is bringing to light cross-domain issues that are yet to be addressed in this context. Indeed, the information extracted from learned representations suffers from an intrinsic "environmental bias". This strongly affects the ability to generalize to unseen scenarios, limiting the application of current methods to real settings where labeled data are not available during training. In this work, we introduce the first domain generalization approach for egocentric activity recognition, by proposing a new audio-visual loss, called Relative Norm Alignment loss. It rebalances the contributions from the two modalities during training, over different domains, by aligning their feature norm representations. Our approach leads to strong results in domain generalization on both EPIC-Kitchens-55 and EPIC-Kitchens-100, as demonstrated by extensive experiments, and can also be extended to domain adaptation settings with competitive results.
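
As a rough illustration of what "aligning feature norm representations" across two modalities could look like, the sketch below penalizes the squared deviation from 1 of the ratio between the mean L2 norms of visual and audio features. This formulation, the function names `mean_l2_norm` and `norm_alignment_penalty`, and the `eps` parameter are assumptions made for illustration; the abstract does not give the exact loss, and this should not be read as the authors' implementation.

```python
import math


def mean_l2_norm(features):
    """Average L2 norm over a batch of feature vectors (a list of lists)."""
    return sum(math.sqrt(sum(x * x for x in f)) for f in features) / len(features)


def norm_alignment_penalty(audio_feats, visual_feats, eps=1e-8):
    """Hypothetical penalty: zero when both modalities have the same mean norm.

    Assumed form: (mean_norm(visual) / mean_norm(audio) - 1) ** 2.
    eps guards against division by zero for all-zero audio features.
    """
    ratio = mean_l2_norm(visual_feats) / (mean_l2_norm(audio_feats) + eps)
    return (ratio - 1.0) ** 2


# Toy batch: the visual features have twice the mean norm of the audio features.
audio = [[3.0, 4.0]]    # L2 norm 5
visual = [[6.0, 8.0]]   # L2 norm 10
penalty = norm_alignment_penalty(audio, visual)
```

In a training loop, a term of this kind would typically be added to the classification loss with a weighting factor, so that neither modality dominates simply because its features have larger magnitude.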

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

DA4Event: towards bridging the Sim-to-Real Gap for Event Cameras using Domain Adaptation

Author: Cannici Marco
Plizzari Chiara
Strada Francesco
Matteucci Matteo
Ciccone Marco
Planamente Mirco
Bottino Andrea
Caputo Barbara
Publication date: 01/01/2021

Event cameras are novel bio-inspired sensors, which asynchronously capture pixel-level intensity changes in the form of "events". The innovative way they acquire data presents several advantages over standard devices, especially in poor lighting and high-speed motion conditions. However, the novelty of these sensors results in the lack of a large amount of training data capable of fully unlocking their potential. The most common approach implemented by researchers to address this issue is to leverage simulated event data. Yet, this approach comes with an open research question: how well do simulated data generalize to real data? To answer this, we propose to exploit, in the event-based context, recent Domain Adaptation (DA) advances in traditional computer vision, showing that DA techniques applied to event data help reduce the sim-to-real gap. To this purpose, we propose a novel architecture, which we call Multi-View DA4E (MV-DA4E), that better exploits the peculiarities of frame-based event representations while also promoting domain-invariant characteristics in features. Through extensive experiments, we prove the effectiveness of DA methods and MV-DA4E on N-Caltech101. Moreover, we validate their soundness in a real-world scenario through a cross-domain analysis on the popular RGB-D Object Dataset (ROD), which we extended to the event modality (RGB-E).

Archivio istituzionale della ricerca - Politecnico di Milano

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Leveraging over depth in egocentric activity recognition

Author: Planamente Mirco
Russo Paolo
Caputo Barbara
Publication date: 01/01/2019

Activity recognition from first person videos is a growing research area. The increasing diffusion of egocentric sensors in various devices makes it timely to develop approaches able to recognize fine-grained first person actions like picking up, putting down, pouring and so forth. While most previous work has focused on RGB data, some authors have pointed out the importance of leveraging depth information in this domain. In this paper we follow this trend and propose the first deep architecture that uses depth maps as an attention mechanism for first person activity recognition. Specifically, we blend together the RGB and depth data, so as to obtain an enriched input for the network. This blending puts more or less emphasis on different parts of the image based on their distance from the observer, hence acting as an attention mechanism. To further strengthen the proposed activity recognition protocol, we opt for a self-labeling approach. This, combined with a Conv-LSTM block for extracting temporal information from the various frames, leads to the new state of the art on two publicly available benchmark databases. An ablation study completes our experimental findings, confirming the effectiveness of our approach.

ZENODO

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Bringing Online Egocentric Action Recognition into the wild

Author: Goletto Gabriele
Averta Giuseppe
Planamente Mirco
Caputo Barbara
Publication date: 01/01/2023

To enable safe and effective human-robot cooperation, it is crucial to develop models for the identification of human activities. Egocentric vision seems to be a viable solution to this problem, and therefore many works provide deep learning solutions to infer human actions from first person videos. However, although very promising, most of these do not consider the major challenges that come with a realistic deployment, such as the portability of the model, the need for real-time inference, and the robustness to novel domains (i.e., new spaces, users, tasks). With this paper, we set the boundaries that egocentric vision models should consider for realistic applications, defining a novel setting of egocentric action recognition in the wild, which encourages researchers to develop novel, application-aware solutions. We also present a new model-agnostic technique that enables the rapid repurposing of existing architectures in this new context, demonstrating the feasibility of deploying a model on a tiny device (Jetson Nano) and of performing the task directly on the edge with very low energy consumption (2.4 W on average at 50 fps). The code is publicly available at: https://github.com/EgocentricVision/EgoWild. Comment: Accepted to RA-L; for the associated video, see https://www.youtube.com/watch?v=7rtynmoYnuw&t=9

arXiv.org e-Print Archive

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Egocentric zone-aware action recognition across environments

Author: Goletto Gabriele
Averta Giuseppe
Peirone Simone Alberto
Planamente Mirco
Bottino Andrea
Caputo Barbara
Publication date: 01/01/2025

Human activities exhibit a strong correlation between actions and the places where these are performed, such as washing something at a sink. More specifically, in daily living environments we may identify particular locations, hereinafter named activity-centric zones, which may afford a set of homogeneous actions. Knowledge of these zones can serve as a prior that helps vision models recognize human activities. However, the appearance of these zones is scene-specific, limiting the transferability of this prior information to unfamiliar areas and domains. This problem is particularly relevant in egocentric vision, where the environment takes up most of the image, making it even more difficult to separate the action from the context. In this paper, we discuss the importance of decoupling the domain-specific appearance of activity-centric zones from their universal, domain-agnostic representations, and show how the latter can improve the cross-domain transferability of Egocentric Action Recognition (EAR) models. We validate our solution on the Epic-Kitchens-100 and Argo1M datasets.

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)