
    RGB-W Dataset

    No full text
    Abstract

    Inspired by the recent success of RGB-D cameras, we propose the enrichment of RGB data with an additional quasi-free modality, namely, the wireless signal emitted by individuals' cell phones, referred to as RGB-W. The received signal strength acts as a rough proxy for depth and a reliable cue on a person's identity. Although the measured signals are noisy, we demonstrate that the combination of visual and wireless data significantly improves the localization accuracy. We introduce a novel image-driven representation of wireless data which embeds all received signals onto a single image. We then evaluate the ability of this additional data to (i) locate persons within a sparsity-driven framework and to (ii) track individuals with a new confidence measure on the data association problem. Our solution outperforms existing localization methods. It can be applied to the millions of currently installed RGB cameras to better analyze human behavior and offer the next generation of high-accuracy location-based services.

    Conference Paper PDF: http://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Alahi_RGB-W_When_Vision_ICCV_2015_paper.pdf

    Metadata

    +----------------+-----------------+-----------+-----------+--------------+----------+
    | Sequence Name  | Length (mm:ss)  | # Frames  | # People  | # W Devices  | Download |
    +----------------+-----------------+-----------+-----------+--------------+----------+
    | conference-1   | 01:53           | 1,697     | 5         | 5            | 116 MiB  |
    | conference-2   | 05:18           | 4,782     | 12        | 12           | 379 MiB  |
    | conference-3   | 23:31           | 21,165    | 1         | 2            | 1.3 GiB  |
    | conference-4   | 06:27           | 4,832     | 1         | 2            | 357 MiB  |
    | conference-5   | 06:03           | 4,525     | 2         | 2            | 290 MiB  |
    | patio-1        | 07:22           | 6,636     | 4         | 4            | 474 MiB  |
    | patio-2        | 04:36           | 4,144     | 2         | 2            | 258 MiB  |
    | Full Dataset   | 55:10           | 47,781    | --        | --           | 3.2 GiB  |
    +----------------+-----------------+-----------+-----------+--------------+----------+

    Citation

    If you would like to cite our work, please use the following.

    Alahi A, Haque A, Fei-Fei L. (2015). RGB-W: When Vision Meets Wireless. International Conference on Computer Vision (ICCV). Santiago, Chile. IEEE.

    @inproceedings{alahi2015rgb,
      title={RGB-W: When vision meets wireless},
      author={Alahi, Alexandre and Haque, Albert and Fei-Fei, Li},
      booktitle={International Conference on Computer Vision},
      year={2015}
    }
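    The abstract describes received signal strength as a rough proxy for depth but does not say how the two are related. As an illustration only, the sketch below inverts the textbook log-distance path-loss model; the model choice, the function name `rssi_to_distance`, and the default values for `rssi_at_1m` and `path_loss_exponent` are assumptions, not details from the paper.

```python
def rssi_to_distance(rssi_dbm, rssi_at_1m=-40.0, path_loss_exponent=2.0):
    """Invert the log-distance path-loss model (an assumed model, not the paper's).

    Model: rssi = rssi_at_1m - 10 * n * log10(d), with d in metres.
    The reference power and exponent n are hypothetical calibration values.
    """
    return 10 ** ((rssi_at_1m - rssi_dbm) / (10.0 * path_loss_exponent))


if __name__ == "__main__":
    # Weaker signals map to larger estimated distances.
    for rssi in (-40.0, -60.0, -80.0):
        print(f"{rssi:6.1f} dBm -> {rssi_to_distance(rssi):7.2f} m")
```

    In practice such estimates are noisy, which is consistent with the abstract's point that wireless data helps most when combined with the visual signal.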

    When Your AI Becomes a Target: AI Security Incidents and Best Practices

    No full text
    In contrast to vast academic efforts to study AI security, few real-world reports of AI security incidents exist. The incidents that have been released do not permit a thorough investigation of the attackers' motives, because crucial information about the company and AI application is missing. As a consequence, it often remains unknown how to avoid incidents. We tackle this gap and combine previous reports with freshly collected incidents into a small database of 32 AI security incidents. We analyze the attackers' target and goal, influencing factors, causes, and mitigations. Many incidents stem from non-compliance with best practices in security and privacy-enhancing technologies. In the case of direct AI attacks, access control may provide some mitigation, but there is little scientific work on best practices. Our paper is thus a call for action to address these gaps.

    CrossFeat: Semantic Cross-modal Attention for Pedestrian Behavior Forecasting

    Full text link
    Forecasting pedestrian behaviors is essential for autonomous vehicles to ensure safety in urban scenarios. Previous works addressed this problem based on motion alone, omitting several additional behavioral cues that help in understanding pedestrians' true intentions. We address the problem of forecasting pedestrian actions through joint reasoning about pedestrians' past behaviors and their surrounding environments. For this, we propose a Transformer-based feature fusion approach, where multi-modal inputs about pedestrians and environments are all mapped into a common space, then jointly processed through self- and cross-attention mechanisms to take context into account. We also use a semantic segmentation map of the current input frame, rather than the full temporal visual stream, to further focus on semantic reasoning. We experimentally validate and analyze our approach on two benchmarks about pedestrian crossing and Stop&Go motion changes, which rely on several standard self-driving datasets centered around interactions with pedestrians (JAAD, PIE, TITAN), and show that our semantic joint reasoning yields state-of-the-art results.
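    The cross-attention step mentioned in this abstract can be illustrated with plain scaled dot-product attention. The sketch below is not the CrossFeat architecture: it omits the learned query/key/value projections and multiple heads, and the function names and toy inputs are hypothetical. It only shows how features from one modality (queries) can be re-expressed as a weighted mix of features from another (keys and values).

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention without learned projections.

    queries: list of d-dim vectors from one modality (e.g. pedestrian features).
    keys, values: lists of vectors from another modality (e.g. scene features).
    Returns one output vector per query.
    """
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


if __name__ == "__main__":
    pedestrian = [[1.0, 0.0]]            # hypothetical query features
    scene_keys = [[1.0, 0.0], [0.0, 1.0]]
    scene_vals = [[10.0, 0.0], [0.0, 10.0]]
    print(cross_attention(pedestrian, scene_keys, scene_vals))
```

    The query here is more similar to the first key, so the output leans toward the first value vector.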

    A journey toward generalizable trajectory forecasting models

    No full text
    Autonomous driving is a revolutionary technology that has seen considerable advancements through the adoption of deep learning solutions. One of the major challenges in this field is the interaction with other road users. This interaction necessitates a "trajectory forecasting" component, i.e., to forecast the future positions of these users in the vicinity of the autonomous vehicle. There are two paradigms for trajectory forecasting models. The first one is knowledge-based models built upon the available domain-specific knowledge. These models, while generalizable to various environments, are limited in accuracy due to their lower capacity. The second paradigm is data-driven approaches, mostly in the form of deep learning-based models. These models have higher accuracy but often struggle to generalize to new, unseen environments. The first contribution of this thesis is to combine the two paradigms harnessing the strengths of both worlds. Through our research, we demonstrate that this hybrid approach yields superior generalization and accuracy compared to either knowledge-based or data-driven counterparts. Our experiments reveal that while the data-driven models' performance drops significantly in new environments, the existing evaluation pipelines are unable to demonstrate this shortcoming. The thesis then continues by investigating evaluation methodologies for trajectory forecasting and contributes to enhancing the evaluation pipeline in three significant ways. First, we introduce a framework that integrates multiple datasets, enabling cross-dataset generalization evaluation. Our evaluation shows that models cannot generalize to other datasets. This framework also opens the door to various research inquiries, including examining how data scaling affects model performance and analyzing different datasets. We then explore novel fine-grained evaluations to analyze models in more detail. 
As the second evaluation enhancement, we propose a methodology that focuses on evaluating the social understanding of forecasting models by employing an adversarial attack approach. Our findings reveal that existing models have a limited social understanding. Our third evaluation approach is a methodology that assesses models' scene understanding capabilities based on atomic scene generation functions. It reveals that the state-of-the-art forecasting models are still inefficient in scene reasoning, leaving room for further improvements. The final part of the thesis tackles a critical challenge in trajectory forecasting: dealing with imperfect perception systems. Errors in perception systems introduce noise into the inputs of trajectory forecasting models, leading to uncertainty in their error bounds. We provide certification methods for trajectory forecasting models which provide certified error bounds for the models given noisy input data.

    A Reinforcement Learning Approach to Train Routing and Scheduling

    No full text
    Good train scheduling for a big network with many trains is very hard to achieve. As the trains are competing for the tracks with one another, the number of constraints grows rapidly. Trying to take advantage of emerging technologies in the areas of optimization and machine learning, the Swiss Federal Railways have created a challenge to find the best algorithm to solve this problem. Two algorithms to solve this task were implemented in this project. First, a greedy strategy was reproduced that schedules all trains at their earliest possible time and then resolves conflicts one after another. It is shown that this algorithm is able to keep up with the best existing methods. With the objective of generalizing and improving the result, a policy gradient method is then added to take on the most critical decisions that have to be made in every iteration of the first algorithm. Even though this second enhanced algorithm does not obtain optimal solutions, it is able to outperform the first one in a generalized task. Hence, it can be shown that reinforcement learning is applicable to the train routing and scheduling task and, more generally, in environments with an open structure.
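    The greedy strategy described in this abstract (schedule every train as early as possible, then resolve one conflict at a time) can be sketched on a toy problem. The setting below is a strong simplification and an assumption: all trains share a single track section, a conflict is any overlap in occupancy, and the fixed rule "delay the later train" stands in for the decision that the abstract says a policy gradient method learns. The function name and data layout are hypothetical.

```python
def greedy_schedule(trains):
    """Toy greedy scheduler for trains sharing one track section.

    trains: list of (name, earliest_start, duration) with positive durations.
    Returns {name: (start, end)} with no overlapping occupancy.
    """
    # Step 1: place every train at its earliest possible start time.
    start = {name: earliest for name, earliest, _ in trains}
    duration = {name: dur for name, _, dur in trains}

    # Step 2: resolve one conflict per pass by delaying the later train.
    resolved = False
    while not resolved:
        resolved = True
        order = sorted(start, key=lambda n: (start[n], n))
        for first, second in zip(order, order[1:]):
            first_end = start[first] + duration[first]
            if start[second] < first_end:
                start[second] = first_end  # fixed rule; a learned policy could choose here
                resolved = False
                break
    return {n: (start[n], start[n] + duration[n]) for n in start}


if __name__ == "__main__":
    print(greedy_schedule([("A", 0, 5), ("B", 2, 3), ("C", 4, 2)]))
```

    Checking only neighbours in start-time order is enough in this toy setting, because with positive durations a conflict-free sorted sequence cannot contain overlaps between non-adjacent trains.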

    Future prediction with deep learning

    No full text
    Future prediction is a fundamental principle of intelligence in which the future state of an environment is predicted given its past states. Accurate future prediction is relevant for applications that require safety in planning such as autonomous driving, robot navigation, or surveillance systems. The complexity of the task stems from the integration of multiple sources of information such as the past states of dynamic agents, the interactions among agents and the semantic constraints of the environment. Moreover, as the future is uncertain to a large extent, modeling the uncertainty and multimodality of the future states is of great relevance. The first part of the thesis proposes a sampling-fitting framework with a fully probabilistic output allowing the system to predict multiple future modes along with the uncertainty of each mode. It achieves the best trade-off between ensuring diversity of the prediction and matching the underlying true distribution of the future. The framework is applied to different settings of the task including the bird’s eye and egocentric views. For the latter, the framework is further extended to a multi-stage pipeline in which a general prior is generated, propagated to the future and finally refined to obtain the desired output. On both synthetic and large-scale real datasets, our framework produces good estimates of multimodal distributions and avoids mode collapse. The second part of the thesis conducts an in-depth analysis of the task to uncover new challenges and seek better models. First, our analysis shows that easy scenarios dominate existing real datasets and the most critical ones are much less frequent, harder to learn and usually ignored by existing models. Therefore, we propose to reshape the learned feature space of existing models by pushing challenging scenarios closer to each other, which encourages the sharing of relevant information and consequently yields better results. 
Second, given the black-box nature of existing models, it remains unclear which features are used to make a prediction. Therefore, we propose a method to quantify the contribution of different cues to the prediction. On common benchmarks, our analysis shows that existing methods are unable to reason about the interaction features between agents and that the past state of the target agent is the only feature used for predicting its future.

    Uncertainty-aware Model Inversion Networks

    No full text
    In this thesis, we assess a new framework called UMIN on a data-driven optimization problem. Such problems recur in real life and can quickly become difficult to model when the input is high-dimensional, as with images. From the architecture of aircraft to the design of proteins, a great number of different techniques have already been explored. Based on former solutions, this work introduces a brand new Bayesian approach that updates previous frameworks. Former model architectures use generative adversarial networks on one side and a forward model on the other side to improve the accuracy of the results. However, employing a Bayesian network allows us to leverage its uncertainty estimates to enhance the accuracy of the results and also to reduce unrealistic samples output by the generator. By creating new experiments on a modern MNIST dataset and by reproducing former works taken as baseline, we show that the framework introduced in this work outperforms the previous method. The whole code is available at the following URL: https://github.com/RomainGratier/Black-box_Optimization_via_Deep_Generative-Exploratory_Networks.

    Precise Hand Finger Width Estimation via RGB-D Data

    No full text
    We propose a novel application of pose estimation to precisely measure finger width from a noisy RGB-D image. A framework is developed that estimates the finger width given data from a TrueDepth camera as well as the target finger measurement location. Moreover, handPifPaf, a new bottom-up 2D hand pose estimator, is introduced and integrated with the width estimation pipeline. This network performs on a par with state-of-the-art hand pose estimators on public hand datasets. An extensive 2D annotated RGB hand dataset is built for the real-time application of handPifPaf on the width estimation pipeline. Finally, one unique large-scale hand RGB-D dataset is acquired for the finger width estimation pipeline validation. This set contains real hand data from various subjects, configurations and camera-object distances with exact ground truth finger width measurements at a specific target location.

    Bridging Discrete Choice Modeling and Neural Networks: Unifying Statistical Rigor and Predictive Power

    No full text
    This thesis explores the innovative integration of Discrete Choice Modeling (DCM) and Machine Learning (ML), specifically Neural Networks (NN), within the transportation sector. DCM, with its highly interpretable and hand-designed models grounded in robust mathematics, contrasts with the data-driven and predictive capabilities of NN. By assessing the existing landscape of DCM and ML at the time, we identified a critical gap: while comparative studies of both fields were prevalent, few ventured into integrating their respective advantages. Our research participates in bridging this gap showcasing the complementary potential of DCM and NN. Central to our work is the development of a novel hybrid modeling framework, blending the structured clarity of DCM with the adaptive, data-driven nature of NNs. We elaborate a detailed set of conditions to preserve the interpretability of DCM parameters while leveraging the predictive power of NNs. Our research lays the groundwork for several studies building upon our methodology. We present the newfound literature as well as various parallel directions integrating ML methods for DCM, sharing our expert insights for hybrid modeling. We then proceed by studying our framework for Neural Network applications. Through three distinct investigative approaches, we have enhanced the explainability of NN predictions. Our topics include areas where a high degree of understanding may be critical, such as pedestrian trajectory forecasting and pedestrian crossing behavior. These methods demonstrate how the strategic incorporation of DCM into NNs can elucidate the various weighing factors in pedestrian decision-making. Finally, we contribute to initiating the era of high-dimensional data within Discrete Choice Modeling. We believe that by leveraging the strength of Neural Networks in handling various types of input, we can steer Discrete Choice Modeling towards promising new directions. 
To set the stage for future works, we study the case of multi-modal datasets with overlapping information. We discuss in which cases the integrity of interpretable parameters is at risk and how to circumvent this. Our research ends by sharing our future views and insights on this novel field. In summary, this thesis not only introduces a groundbreaking framework but also provides critical insights for the advancement of DCM and NN integration. It lays a foundation for future research, aimed at understanding and enhancing decision-making models in an age where data complexity and volume continue to grow exponentially.