On modelling humans: forecasting, synthesis and human-X interaction
Modeling human motion and interaction is fundamental to advancing technologies in robotics, virtual reality, autonomous systems, and behavioral analysis. The ability to understand and predict human movements and social dynamics opens up new possibilities for machines to interact with people in intuitive and effective ways, contributing to areas such as human-robot collaboration, motion synthesis, and anomaly detection. Research into human dynamics plays a pivotal role in developing systems that can anticipate future actions, generate realistic behaviors, and adapt to unpredictable environments.
This work addresses these challenges through a sequence of studies, starting with human pose forecasting using graph convolutional networks (STS-GCN), which models the intricate space-time correlations of body joints to predict future poses. It then extends this framework to collaborative environments, enabling robots to predict human movements in industrial settings (SES-GCN). The exploration continues with the prediction of motion between interacting people (2BODY) and expands further to scene-aware human trajectory and pose forecasting (STAG), focusing on the interaction between humans and their environments.
The study of human dynamics also delves into specific applications, such as detecting anomalies in human behavior (COSKAD) and forecasting player trajectories in sports, where role-based interactions within teams play a key role (NBA). The latter part of this research builds upon these foundational insights, proposing methods for egocentric 3D pose estimation in video sequences (SEE-ME) and tackling the generation of human motion sequences from textual descriptions with variable lengths (LADiff). Finally, my research addresses real-time human-robot collaboration, where robots learn to follow humans and adapt to social dynamics to avoid collisions in shared spaces (SDA).
Altogether, this body of research highlights the importance of human motion modeling and social interaction, paving the way for intelligent systems capable of seamlessly integrating into human environments and enhancing their ability to interact, collaborate, and coexist with people.
Space-time-separable graph convolutional network for pose forecasting
Human pose forecasting is a complex structured-data sequence-modelling task, which has received increasing attention, partly owing to its numerous potential applications. Research has mainly addressed the temporal dimension as time series and the interaction of human body joints with a kinematic tree or by a graph. This has decoupled the two aspects and leveraged progress from the relevant fields, but it has also limited the understanding of the complex structural joint spatio-temporal dynamics of the human pose. Here we propose a novel Space-Time-Separable Graph Convolutional Network (STS-GCN) for pose forecasting. For the first time, STS-GCN models the human pose dynamics only with a graph convolutional network (GCN), including the temporal evolution and the spatial joint interaction within a single-graph framework, which allows the cross-talk of motion and spatial correlations. Concurrently, STS-GCN is the first space-time-separable GCN: the space-time graph connectivity is factored into space and time affinity matrices, which bottlenecks the space-time cross-talk, while enabling full joint-joint and time-time correlations. Both affinity matrices are learnt end-to-end, which results in connections substantially deviating from the standard kinematic tree and the linear-time time series. In experimental evaluation on three complex, recent and large-scale benchmarks, Human3.6M [Ionescu et al. TPAMI'14], AMASS [Mahmood et al. ICCV'19] and 3DPW [Von Marcard et al. ECCV'18], STS-GCN outperforms the state-of-the-art, surpassing the current best technique [Mao et al. ECCV'20] by over 32% on average at the most difficult long-term predictions, while only requiring 1.7% of its parameters. We explain the results qualitatively and illustrate the graph interactions by the factored joint-joint and time-time learnt graph connections.
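A minimal sketch of the factored space-time mixing described above, in plain Python. It assumes a single joint-joint matrix shared across frames and a single time-time matrix shared across joints, which may be simpler than the published model. The function name, argument names and tensor layout are illustrative and are not taken from the authors' code.

```python
def separable_graph_conv(x, a_space, a_time, weight):
    """Apply one space-time-separable graph convolution (illustrative sketch).

    x       : pose features, nested lists of shape [T][V][C_in]
    a_space : joint-joint affinity matrix, shape [V][V] (assumed shared over time)
    a_time  : time-time affinity matrix, shape [T][T] (assumed shared over joints)
    weight  : channel projection, shape [C_in][C_out]
    """
    T, V, C_in = len(x), len(x[0]), len(x[0][0])
    C_out = len(weight[0])

    # Step 1: mix information across frames with the time-time matrix.
    mixed_t = [[[sum(a_time[t][s] * x[s][v][c] for s in range(T))
                 for c in range(C_in)]
                for v in range(V)]
               for t in range(T)]

    # Step 2: mix information across joints with the joint-joint matrix.
    mixed_s = [[[sum(a_space[v][u] * mixed_t[t][u][c] for u in range(V))
                 for c in range(C_in)]
                for v in range(V)]
               for t in range(T)]

    # Step 3: project the channels.
    return [[[sum(mixed_s[t][v][c] * weight[c][k] for c in range(C_in))
              for k in range(C_out)]
             for v in range(V)]
            for t in range(T)]
```

In a trained model both affinity matrices and the projection would be learnt parameters, and a non-linearity would normally follow each layer. Here they are plain inputs so that the factorisation itself stays visible.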
Contracting skeletal kinematics for human-related video anomaly detection
Detecting anomalies in human behavior is paramount for the timely recognition of dangerous situations, such as street fights or elderly falls. However, anomaly detection is complex since anomalous events are rare and because it is an open set recognition task, i.e., what is anomalous at inference has not been observed at training. We propose COSKAD, a novel model that encodes skeletal human motion by a graph convolutional network and learns to COntract SKeletal kinematic embeddings onto a latent hypersphere of minimum volume for Video Anomaly Detection. We propose three latent spaces: the commonly-adopted Euclidean and the novel spherical and hyperbolic. All variants outperform the state-of-the-art on the most recent UBnormal dataset, for which we contribute a human-related version with annotated skeletons. COSKAD sets a new state-of-the-art on the human-related versions of ShanghaiTech Campus and CUHK Avenue, with performance comparable to video-based methods. Source code and dataset will be released upon acceptance.
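The idea of contracting embeddings towards a small hypersphere can be sketched as follows for the Euclidean case only. The centre is taken here as the mean of embeddings of normal motion, and the anomaly score as the squared distance to that centre. Both choices are assumptions in the spirit of one-class objectives, not details stated in the abstract, and all function names are hypothetical.

```python
def centroid(embeddings):
    """Mean of a list of equally sized embedding vectors (assumed sphere centre)."""
    n = len(embeddings)
    dim = len(embeddings[0])
    return [sum(e[d] for e in embeddings) / n for d in range(dim)]


def anomaly_score(embedding, center):
    """Squared Euclidean distance to the centre; larger means more anomalous."""
    return sum((a - b) ** 2 for a, b in zip(embedding, center))


def contraction_loss(embeddings, center):
    """Average squared distance to the centre over a batch of normal samples.

    Minimising this quantity pulls the embeddings of normal motion together,
    which shrinks the sphere that encloses them.
    """
    return sum(anomaly_score(e, center) for e in embeddings) / len(embeddings)
```

At inference, a motion sequence whose embedding lies far from the centre would receive a high score. The spherical and hyperbolic variants would need a different distance function, which this sketch does not cover.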
Best Practices for 2-Body Pose Forecasting
The task of collaborative human pose forecasting consists of predicting the future poses of multiple interacting people, given those in previous frames. Predicting two people in interaction, instead of each separately, promises better performance, due to their body-body motion correlations. But the task has so far remained largely unexplored. In this paper, we review the progress in human pose forecasting and provide an in-depth assessment of the single-person practices that perform best for 2-body collaborative motion forecasting. Our study confirms the positive impact of frequency input representations, space-time separable and fully-learnable interaction adjacencies for the encoding GCN and FC decoding. Other single-person practices do not transfer to 2-body, so the proposed best ones do not include hierarchical body modeling or attention-based interaction encoding. We further contribute a novel initialization procedure for the 2-body spatial interaction parameters of the encoder, which benefits performance and stability. Altogether, our proposed 2-body pose forecasting best practices yield a performance improvement of 21.9% over the state-of-the-art on the most recent ExPI dataset, whereby the novel initialization accounts for 3.5%. See our project page at https://www.pinlab.org/bestpractices2bod
Mesoscale precipitation nowcasting from weather radar data using space-time-separable graph convolutional networks
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation
counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings
are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that
only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into
account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
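The two counting schemes compared above can be illustrated with a short Python sketch. It assumes each citing paper is given as a list of cited works, and each cited work as an ordered list of author names. The function name and data layout are hypothetical. One further assumption should be noted: co-authors of the same cited work are also counted as co-cited here, which the study may or may not do.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count, for each author pair, the citing papers that cite both authors.

    citing_papers : list of reference lists; each reference is an ordered
                    list of author names
    max_authors   : 1 gives first-author counting; 5 considers the first
                    five authors of every cited work
    """
    counts = Counter()
    for references in citing_papers:
        cited_authors = set()
        for authors in references:
            # Keep only the leading authors that the chosen scheme recognises.
            cited_authors.update(authors[:max_authors])
        # Every unordered pair of recognised authors is co-cited once per paper.
        for pair in combinations(sorted(cited_authors), 2):
            counts[pair] += 1
    return counts
```

Calling the function with `max_authors=1` and then with `max_authors=5` on the same reference data produces the two co-citation matrices whose mappings the study compares.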
About latent roles in forecasting players in team sports
Forecasting players in sports has grown in popularity due to the potential
for a tactical advantage and the applicability of such research to multi-agent
interaction systems. Team sports contain a significant social component that
influences interactions between teammates and opponents. However, it has yet
to be fully exploited. In this work, we hypothesize that each participant
has a specific function in each action and that role-based interaction is
critical for predicting players' future moves. We create RolFor, a novel
end-to-end model for Role-based Forecasting. RolFor uses a new module we
developed called Ordering Neural Networks (OrderNN) to permute the order of the
players such that each player is assigned to a latent role. The latent role is
then modeled with a RoleGCN. Thanks to its graph representation, it provides a
fully learnable adjacency matrix that captures the relationships between roles
and is subsequently used to forecast the players' future trajectories.
Extensive experiments on a challenging NBA basketball dataset back up the
importance of roles and justify our goal of modeling them using optimizable
models. When an oracle provides roles, the proposed RolFor compares favorably
to the current state-of-the-art (it ranks first in terms of ADE and second in
terms of FDE errors). However, training the end-to-end RolFor incurs the issues
of differentiability of permutation methods, which we experimentally review.
Finally, this work restates differentiable ranking as a difficult open problem
and highlights its great potential in conjunction with graph-based interaction models.
Project is available at: https://www.pinlab.org/aboutlatentrole
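The abstract does not expand ADE and FDE. In trajectory forecasting they usually denote the average and final displacement errors, and the sketch below follows that common reading for a single player's 2D trajectory. The function names are illustrative.

```python
import math


def average_displacement_error(predicted, ground_truth):
    """Mean Euclidean distance between predicted and true positions over all steps."""
    distances = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(distances) / len(distances)


def final_displacement_error(predicted, ground_truth):
    """Euclidean distance between predicted and true positions at the last step."""
    return math.dist(predicted[-1], ground_truth[-1])
```

For a whole team, these per-player values would typically be averaged over players and test sequences. The exact aggregation used in the paper is not stated in the abstract.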
Variations on the Author
“Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourses; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.
Appropriate Similarity Measures for Author Cocitation Analysis
We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.
Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
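As a small illustration of the two best-known measures named above, the following Python sketch computes the cosine and the Pearson correlation of two co-citation profiles. The profiles are made-up numbers, and the function names are illustrative. The two other measures the paper considers are not specified in the abstract and are therefore not shown.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two co-citation profiles."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson_correlation(x, y):
    """Pearson correlation, i.e. the cosine of the mean-centred profiles."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    centred_x = [a - mean_x for a in x]
    centred_y = [b - mean_y for b in y]
    return cosine_similarity(centred_x, centred_y)


# Hypothetical profiles: co-citation counts of two authors with three others.
profile_a = [1, 2, 3]
profile_b = [3, 2, 1]

# The same profiles after adding two authors co-cited with neither of them.
padded_a = profile_a + [0, 0]
padded_b = profile_b + [0, 0]
```

One property that can be checked with these profiles is that appending shared zeros leaves the cosine unchanged but alters the Pearson correlation, because the zeros shift the means. Whether this is the specific objection raised in the paper cannot be determined from the abstract.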
