
    Predictive perception for detecting human motion anomalies and procedural mistakes

    Computer Vision emerges as a cornerstone field within Artificial Intelligence, enabling digital systems to sense the world through images, mirroring the human ability to see and interpret their surroundings. This ability is paramount, as it allows autonomous systems to interact with humans, promising to reliably extend the applications of AI to productive systems. For example, in Human-Robot Collaboration (HRC), accurate vision-based techniques can prevent accidents by providing the cobot with the ability to interpret and swiftly respond to human worker actions. Similarly, in smart manufacturing, Computer Vision methods allow for the timely detection of errors and anomalies in production lines, enhancing quality control and safety; in video surveillance, they monitor environments for security threats, promptly identifying unusual behaviors or hazardous situations before they escalate. However, the deployment of Computer Vision technologies in real-world scenarios is hampered by significant challenges. These include the requirement for real-time responsiveness, the ability to function reliably in diverse and unpredictable environments, and the development of comprehensive metrics for assessing detection accuracy and system reliability. This thesis explores machine perception's role in enhancing safety and productive integrity across several domains. By leveraging cutting-edge methodologies such as Denoising Diffusion Probabilistic Models and Large Language Models in novel domains, we propose innovative solutions for applications that require a fine understanding of human behaviors and environments to promote effectiveness, safety, and efficiency. First, we delve into the HRC domain.
Aiming to improve the current methods' efficiency, we devise a lightweight Separable-Sparse Graph Convolutional model that we dub SeS-GCN. SeS-GCN bottlenecks the interaction of the GCN's spatial, temporal, and channel-wise dimensions and further learns sparse adjacency matrices by a teacher-student framework. These modeling choices lower the model's memory footprint, providing a practical solution that proves effective both in Human-Pose Forecasting and Collision Avoidance. Moreover, the Cobots and Humans in Industrial COllaboration (CHICO) dataset is proposed to foster research in this field. For the first time, CHICO encompasses 3D-synchronized views and recorded poses of humans and cobots while collaborating in a real industrial scenario, representing a precious resource for advancing safe human-robot collaboration. Safety often coincides with promptly detecting and responding to mistakes or anomalies, which otherwise risk worsening and potentially producing dangerous collisions or productive inefficiencies. Thus, following a review of the latest advancements in Video Anomaly Detection methodologies, this thesis builds on the established one-class classification framework, proposing two techniques for human-related Anomaly Detection. The first study investigates adopting non-Euclidean latent spaces to set the metric objective of one-class classification. We leverage the unique properties of the hyperbolic and spherical metric manifolds for improving human-related anomaly detection. The second proposal introduces a Motion Conditioned Diffusion-based approach for Anomaly Detection (MoCoDAD). For the first time, MoCoDAD introduces a method for video anomaly detection that exploits cutting-edge diffusion models for spotting anomalies in motion sequences.
We review the common reconstruction-based technique, coupling it with the generative ability of diffusion probabilistic models, extending the state-of-the-art in human-related Video Anomaly Detection, and providing relevant insights that serve as the foundation for online mistake detection. Next, this thesis deals with error anticipation in procedural activities. Acknowledging the absence of a proper benchmark for this task, we apply the insights from the one-class-classification paradigm and Video Anomaly Detection and propose two novel datasets, metrics, and baseline methods for detecting errors in industrial procedural videos. Moreover, we present an innovative technique that exploits the emerging reasoning capabilities of Large Language Models to detect mistakes in procedural video sequences. This results in a novel multimodal approach that leverages an action recognition module to classify the steps of egocentric procedural videos and couples it with a language model that analyzes the obtained procedural transcripts and detects mistakes. This work offers empirical validation through extensive testing on established and newly introduced datasets; bridging the gap between Video Anomaly Detection and Procedural Mistake Detection, it presents a robust foundation for future research and practical applications. We advance the understanding of procedural mistakes as open-set phenomena and emphasize the crucial need for online detection mechanisms, thus enhancing safety and operational efficiency in these environments. These findings lay the foundation for future research, shaping the development of safer, more adaptive industrial automatic systems.

    Contracting skeletal kinematics for human-related video anomaly detection

    Detecting anomalies in human behavior is paramount to the timely recognition of dangerous situations, such as street fights or elderly falls. However, anomaly detection is complex since anomalous events are rare and because it is an open set recognition task, i.e., what is anomalous at inference has not been observed at training. We propose COSKAD, a novel model that encodes skeletal human motion by a graph convolutional network and learns to COntract SKeletal kinematic embeddings onto a latent hypersphere of minimum volume for Video Anomaly Detection. We propose three latent spaces: the commonly-adopted Euclidean and the novel spherical and hyperbolic. All variants outperform the state-of-the-art on the most recent UBnormal dataset, for which we contribute a human-related version with annotated skeletons. COSKAD sets a new state-of-the-art on the human-related versions of ShanghaiTech Campus and CUHK Avenue, with performance comparable to video-based methods. Source code and dataset will be released upon acceptance.
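
The idea of scoring anomalies by how far an embedding falls from a compact region of normal data can be illustrated with a minimal sketch. It covers only a Euclidean latent space and assumes the embeddings are already available; the names `fit_center` and `anomaly_score`, and the choice of the mean of the normal embeddings as the hypersphere center, are assumptions made for illustration and are not taken from the COSKAD abstract.

```python
import math


def fit_center(embeddings):
    """Return the mean of the normal training embeddings (assumed center)."""
    dim = len(embeddings[0])
    n = len(embeddings)
    return [sum(e[i] for e in embeddings) / n for i in range(dim)]


def anomaly_score(embedding, center):
    """Euclidean distance to the center; larger means more anomalous."""
    return math.dist(embedding, center)


# Hypothetical 2-D embeddings of normal motion.
normal = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
center = fit_center(normal)

near_score = anomaly_score((1.0, 1.0), center)  # embedding at the center
far_score = anomaly_score((4.0, 5.0), center)   # embedding far from the center
```

In a learned model the encoder would be trained so that normal embeddings contract toward the center; here the embeddings are fixed toy values.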

    Multimodal Motion Conditioned Diffusion Model for Skeleton-based Video Anomaly Detection

    Anomalies are rare, so anomaly detection is often framed as One-Class Classification (OCC), i.e., trained solely on normalcy. Leading OCC techniques constrain the latent representations of normal motions to limited volumes and detect as abnormal anything outside, which accounts satisfactorily for the openset'ness of anomalies. But normalcy shares the same openset'ness property, since humans can perform the same action in several ways, which the leading techniques neglect. We propose a novel generative model for video anomaly detection (VAD), which assumes that both normality and abnormality are multimodal. We consider skeletal representations and leverage state-of-the-art diffusion probabilistic models to generate multimodal future human poses. We contribute a novel conditioning on the past motion of people and exploit the improved mode coverage capabilities of diffusion processes to generate different-but-plausible future motions. Upon the statistical aggregation of future modes, an anomaly is detected when the generated set of motions is not pertinent to the actual future. We validate our model on 4 established benchmarks: UBnormal, HR-UBnormal, HR-STC, and HR-Avenue, with extensive experiments surpassing state-of-the-art results.

    Going Beyond Counting First Authors in Author Co-citation Analysis

    The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based. We compare ACA results based on two different types of co-citation counting: the traditional type, which counts only the first author of a cited work, and a non-traditional type, which takes into account the first 5 authors of a cited work. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
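
The difference between the two counting schemes can be made concrete with a short sketch. It is a minimal illustration, not the study's procedure: the function name `cocitation_counts`, the `max_authors` parameter, and the toy reference lists are hypothetical, and the sketch makes the simplifying assumption that co-authors of the same cited work also count as co-cited.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count how often each pair of authors is cited together.

    citing_papers: one reference list per citing paper; each reference
                   is an ordered list of author names.
    max_authors:   number of leading authors credited per cited work
                   (1 = first-author counting, 5 = first five authors).
    """
    counts = Counter()
    for references in citing_papers:
        # Authors credited at least once in this citing paper.
        cited = set()
        for authors in references:
            cited.update(authors[:max_authors])
        # Each unordered pair is counted once per citing paper.
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return counts


# Hypothetical data: two citing papers with two references each.
papers = [
    [["Smith", "Lee"], ["Jones", "Kim"]],
    [["Smith", "Kim"], ["Lee"]],
]

first_only = cocitation_counts(papers, max_authors=1)
first_five = cocitation_counts(papers, max_authors=5)
```

With first-author counting, Lee and Smith are co-cited once; once the first five authors are credited, the same pair is co-cited in both papers and more author pairs appear overall.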

    Pose Forecasting in Industrial Human-Robot Collaboration

    Pushing back the frontiers of collaborative robots in industrial environments, we propose a new Separable-Sparse Graph Convolutional Network (SeS-GCN) for pose forecasting. For the first time, SeS-GCN bottlenecks the interaction of the spatial, temporal and channel-wise dimensions in GCNs, and it learns sparse adjacency matrices by a teacher-student framework. Compared to the state-of-the-art, it only uses 1.72% of the parameters and it is ∼4 times faster, while still performing comparably in forecasting accuracy on Human3.6M at 1 s in the future, which enables cobots to be aware of human operators. As a second contribution, we present a new benchmark of Cobots and Humans in Industrial COllaboration (CHICO). CHICO includes multi-view videos, 3D poses and trajectories of 20 human operators and cobots, engaging in 7 realistic industrial actions. Additionally, it reports 226 genuine collisions, taking place during the human-cobot interaction. We test SeS-GCN on CHICO for two important perception tasks in robotics: human pose forecasting, where it reaches an average error of 85.3 mm (MPJPE) at 1 s in the future with a run time of 2.3 ms, and collision detection, by comparing the forecasted human motion with the known cobot motion, obtaining an F1-score of 0.64.
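
For readers unfamiliar with the error metric quoted above, MPJPE (Mean Per Joint Position Error) is the average Euclidean distance between predicted and ground-truth joint positions. The sketch below computes it for a single pose; the function name `mpjpe` and the toy coordinates are hypothetical and are not drawn from CHICO or Human3.6M.

```python
import math


def mpjpe(predicted, target):
    """Mean Per Joint Position Error for one pose.

    predicted, target: sequences of (x, y, z) joint coordinates in the
    same unit (e.g. millimetres), with joints in the same order.
    """
    if len(predicted) != len(target):
        raise ValueError("poses must have the same number of joints")
    distances = [math.dist(p, t) for p, t in zip(predicted, target)]
    return sum(distances) / len(distances)


# Hypothetical two-joint pose: one joint is exact, the other is 5 mm off.
predicted_pose = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
target_pose = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
error_mm = mpjpe(predicted_pose, target_pose)
```

Benchmarks typically average this quantity over all frames at a given forecasting horizon and over all test sequences.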

    Variations on the Author

    “Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourse; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.

    Appropriate Similarity Measures for Author Cocitation Analysis

    We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.
    Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
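
To show how the two named measures can disagree on the same data, the sketch below computes the cosine and the Pearson correlation for a pair of cocitation profiles. The profiles are hypothetical toy vectors, and the example does not reproduce the paper's argument or its two additional measures; it only uses the standard definitions, with the Pearson correlation written as the cosine of the mean-centred vectors.

```python
import math


def cosine(x, y):
    """Cosine of the angle between two profile vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_x * norm_y)


def pearson(x, y):
    """Pearson correlation: the cosine of the mean-centred vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])


# Hypothetical cocitation profiles of two authors with four other authors.
# The two authors are never cocited with the same third author.
profile_a = [3, 1, 0, 0]
profile_b = [0, 0, 1, 3]

cos_ab = cosine(profile_a, profile_b)
r_ab = pearson(profile_a, profile_b)
```

For these non-overlapping profiles the cosine is zero, while the Pearson correlation is negative (about -0.67), because centring turns the shared zeros into negative deviations.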

    Dispelling the Myths Behind First-author Citation Counts

    We conducted a full-scale evaluative citation analysis study of scholars in the XML research field to explore how much author rankings produced by different citation counting methods actually differ from each other, and to demonstrate the capability of emerging data and tools on the Web in supporting more realistic citation counting methods. Our results contest some common arguments for the continued use of first-author citation counts in the evaluation of scholars, such as high correlations between author rankings by first-author citation counts and other citation counting methods, and high costs of using more realistic citation counting methods that are not well-supported by the ISI databases. It is argued that increasingly available digital full text research papers make it possible for citation analysis studies to go beyond what the ISI databases have directly supported and to employ more sophisticated methods.

    Author Index

    Not provided