Exploring Upper Limb Segmentation with Deep Learning for Augmented Virtuality
Sense of presence, immersion, and body ownership are among the main challenges in Virtual Reality (VR) and freehand-based interaction methods. With dedicated hand-tracking devices, freehand-based methods let users interact with the virtual environment (VE) using their hands. To visualize the hands and ease freehand interaction, recent approaches represent the user's hands in the VE with 3D meshes. However, this can reduce user immersion because the meshes do not correspond naturally to the real hands. To overcome this limit, we propose an augmented virtuality (AV) pipeline that allows users to see their own limbs in the VE. In particular, the limbs are captured by a single monocular RGB camera placed in an egocentric perspective, segmented using a deep convolutional neural network (CNN), and streamed into the VE. In addition, the hands are tracked through a Leap Motion controller to allow user interaction. We introduce two case studies as a preliminary investigation of this approach. Finally, we provide both quantitative and qualitative evaluations of the CNN, which highlight its effectiveness in several unconstrained real-life scenarios.
Solid and Effective Upper Limb Segmentation in Egocentric Vision
Upper limb segmentation in egocentric vision is a challenging and nearly unexplored task that extends the well-known hand localization problem. It can be crucial for a realistic representation of users' limbs in immersive and interactive environments, such as VR/MR applications designed for web browsers, which are a general-purpose solution suitable for any device. Existing hand and arm segmentation approaches require a large amount of well-annotated data, so different annotation techniques have been designed and several datasets created. Such datasets are often limited to synthetic and semi-synthetic data that do not include the whole limb and differ significantly from real data, leading to poor performance in many realistic cases. To overcome the limitations of previous methods and the challenges inherent in both egocentric vision and segmentation, we collected a large-scale comprehensive dataset and trained several segmentation networks based on the state-of-the-art DeepLabv3+ model. The dataset consists of 46 thousand real-life and well-labeled RGB images with a great variety of skin colors, clothes, occlusions, and lighting conditions. In particular, we carefully selected the best data from existing datasets and added our EgoCam dataset, which includes new images with accurate labels. Finally, we extensively evaluated the trained networks in unconstrained real-world environments to find the best model configuration for this task, achieving promising results in diverse scenarios. The code, the collected egocentric upper limb segmentation dataset, and a video demo of our work will be available on the project page.
Human segmentation in surveillance video with deep learning
Advanced intelligent surveillance systems can automatically analyze surveillance video without human intervention. These systems allow highly accurate human activity recognition and, in turn, high-level activity evaluation. To provide such features, an intelligent surveillance system requires a background subtraction scheme for human segmentation that separates the moving humans in a sequence of images from the reference background image. This paper proposes an alternative approach to human segmentation in videos that uses a deep convolutional neural network. Two specific datasets were created to train our network, using the shapes of 35 different moving actors arranged on background images of the area where the camera is located, which allows the network to take advantage of the entire site chosen for video surveillance. To assess the proposed approach, we compare our results with an Adobe Photoshop tool called Select Subject, the conditional generative adversarial network Pix2Pix, and Yolact, a fully convolutional model for real-time instance segmentation. The results show that the main benefit of our method is the ability to automatically recognize and segment people in videos without constraints on camera and people movements in the scene (video, code, and datasets are available at http://graphics.unibas.it/www/HumanSegmentation/index.md.html).
Egocentric upper limb segmentation in unconstrained real-life scenarios
The segmentation of bare and clothed upper limbs in unconstrained real-life environments remains little explored. It is a challenging task that we tackled by training a deep neural network based on the DeepLabv3+ architecture. We collected about 46 thousand real-life and carefully labeled RGB egocentric images with a great variety of skin tones, clothes, occlusions, and lighting conditions. We then extensively evaluated the proposed approach and compared it with state-of-the-art methods for hand and arm segmentation, e.g., Ego2Hands, EgoArm, and HGRNet. We used our test set and a subset of the EgoGesture dataset (EgoGestureSeg) to assess how well the model generalizes to challenging scenarios. Moreover, we tested our network on hand-only segmentation since it is a closely related task. We carried out a quantitative analysis using standard metrics for image segmentation and a qualitative evaluation by visually comparing the obtained predictions. Our approach outperforms all compared models in both tasks, proving its robustness to hand-to-hand and hand-to-object occlusions, dynamic user/camera movements, different lighting conditions, skin colors, clothes, and limb/hand poses.
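The abstract above refers to standard metrics for image segmentation without naming them. As an illustration only, the sketch below computes intersection over union (IoU), one commonly used metric, for a pair of binary masks. The function name `binary_iou`, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions made for this example, not details taken from the paper.

```python
def binary_iou(pred, target):
    """Intersection over union of two binary masks given as nested lists.

    Any truthy pixel counts as foreground (e.g. "limb"); the masks are
    assumed to have the same shape.
    """
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            intersection += bool(p) and bool(t)
            union += bool(p) or bool(t)
    # Assumed convention: two empty masks agree perfectly.
    return 1.0 if union == 0 else intersection / union


# Hypothetical 2x3 masks: predicted vs. ground-truth foreground pixels.
predicted = [[1, 1, 0],
             [0, 1, 0]]
ground_truth = [[1, 0, 0],
                [0, 1, 1]]
score = binary_iou(predicted, ground_truth)  # 2 shared pixels out of 4 in the union
```

In practice such a score is usually averaged over all test images and classes; how the authors aggregate their metrics is not stated in the abstract.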
A Preliminary Investigation on a Multimodal Controller and Freehand Based Interaction in Virtual Reality
In recent years, the synergy between VR and HCI has dramatically increased the user’s feeling of immersion within virtual scenes, improving the user experience and usability of VR applications. As these technologies have evolved, two main aspects of immersion in the 3D scene have emerged: locomotion within the scene and interaction with its components. Locomotion with classical freehand approaches based on hand tracking can be stressful for the user, who must keep the hand still in a specific position for a long time to activate the locomotion gesture. Likewise, using a classic Head Mounted Display (HMD) controller to interact with the components of the 3D scene can feel unnatural compared with using the hand to pinch and grab 3D objects. This paper proposes a multimodal approach that combines the Leap Motion and a 6-DOF controller to navigate and interact with the 3D scene. It aims to reduce the stress of locomotion gestures based on hand tracking and to increase the feeling of immersion and interaction by using freehand input to interact with the 3D scene objects.
Revolutionizing Media and Gaming with AI: Advancements in Body Measurement Calculation, Motion Tracking, Gesture Recognition, and Upper Limb Segmentation
This contribution discusses the potential of Artificial Intelligence (AI) to enhance Human-Computer Interaction (HCI) methods. Researchers at the Laboratory of Computer Graphics and Parallel Computing at the University of Basilicata have developed several AI-based systems for HCI applications. These systems include an upper limb segmentation system, an XR gesture recognition system, and a virtual dressing room that utilizes Body Tracking and Anthropometric Measurement Systems. The systems use deep learning algorithms to accurately track body movements and interpret hand gestures in real time, creating a more natural and intuitive interaction with XR environments. The virtual dressing room enables users to create a 3D model of themselves and try on virtual clothing and accessories, ensuring a perfect fit through Anthropometric Measurement System calculations. These AI-based systems have significant potential to enhance user experience and interaction in the HCI field.
A Validation Approach for Deep Reinforcement Learning of a Robotic Arm in a 3D Simulated Environment
In recent years, deep reinforcement learning has increasingly contributed to the development of robotic applications and boosted research in robotics. Deep learning and model-free, off-policy, value-based reinforcement learning algorithms have enabled agents to learn complex robotic skills from visual inputs through a trial-and-error process. This paper aims to train a robot in a simulation environment by designing a Deep Q-Network (DQN) that processes images acquired by an RGB vision sensor inside a 3D simulated environment and outputs a value for each action the robotic arm can execute given the current state. In particular, the robot has to push a ball into a soccer net without any knowledge of the environment or of its own location. A further goal was to validate the agent during training and assess how well it generalizes. Despite the many advances in reinforcement learning, agent validation is still a challenge. Therefore, we devised a validation strategy similar to the method applied in supervised learning and tested the agent on both known and unknown experiences, achieving promising results.
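The abstract above describes a network that outputs one value per action for the current state. As a hedged illustration of how such outputs are typically used in value-based methods, the sketch below picks the greedy action and computes the standard one-step Q-learning target. The function names, the three-action example, the reward, and the discount factor are all hypothetical; the abstract does not give the paper's reward design, hyperparameters, or network architecture, and a full DQN would also use a replay buffer and a separate target network.

```python
def greedy_action(q_values):
    """Return the index of the action with the highest estimated value."""
    return max(range(len(q_values)), key=lambda action: q_values[action])


def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a').

    When the episode has ended (done is True) there is no next state,
    so the target is just the reward.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)


# Hypothetical value estimates for three arm actions in some state.
q_values = [0.5, 2.0, -1.0]
best = greedy_action(q_values)
target = td_target(reward=1.0, next_q_values=q_values, done=False, gamma=0.9)
```

During training, the value predicted for the action actually taken is regressed toward `target`; at evaluation time the agent simply executes `greedy_action` on the network output.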
A Deep Learning approach for the Motion Picture Content Rating
The film industry brings thousands of films to life every year. Not all of them are suitable for everyone, especially those with violent content. A content rating system is designed to evaluate the content and report its suitability for children, teenagers, or adults. It assists content providers in assigning rating levels to movies and can also help users block violent content directly on their devices. However, assigning content ratings to movies can be tedious and prone to personal judgment, and it becomes impossible if we also consider the videos on video-sharing websites. This work provides a motion picture content rating model that automatically classifies and censors violent scenes using a Deep Learning (DL) approach. We collect a large amount of data by searching for visual elements, such as blood or weapons, and manually label them according to a rating scale. Then we employ the Inception v3 Convolutional Neural Network (CNN) for training and validation. The CNN is modified, and additional regularization techniques are adopted to avoid overfitting during the training step. Finally, we design a video post-processing algorithm to refine the network output. Preliminary results demonstrate the effectiveness of our automatic classifier in supporting content providers in assigning ratings and encourage further investigation of the use of DL.
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
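The two counting schemes compared in the abstract above can be sketched as a single function with a cutoff on how many authors of each cited work are considered. This is a minimal illustration, not the study's procedure: the function name, the input layout, the author names, and the choice to count each author pair at most once per citing paper are assumptions for the example.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count how often two authors are cited together by the same paper.

    citing_papers: one reference list per citing paper; each reference is
    a list of author names in byline order.
    max_authors: 1 gives first-author counting; 5 considers the first
    five authors of every cited work.
    """
    counts = Counter()
    for references in citing_papers:
        cited_authors = set()
        for authors in references:
            cited_authors.update(authors[:max_authors])
        # Assumed convention: each pair counts once per citing paper.
        for pair in combinations(sorted(cited_authors), 2):
            counts[pair] += 1
    return counts


# Hypothetical reference lists of two citing papers.
papers = [
    [["White", "Griffith"], ["Small"]],
    [["Small", "Garfield"], ["White"]],
]
first_author = cocitation_counts(papers, max_authors=1)
first_five = cocitation_counts(papers, max_authors=5)
```

With first-author counting only the pair (Small, White) appears, whereas the five-author cutoff also brings in pairs involving the second authors, which is the kind of additional data the non-traditional scheme feeds into the later analysis.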
