Detecting and indexing moving objects for behavior analysis by video and audio interpretation
Advisors: Mario Vento, Luc Brun. Date and location of PhD thesis defense: 24 February 2014, University of Salerno. In this thesis, a system for analyzing the behavior of moving objects in surveillance applications is proposed: videos are processed to extract and analyze the trajectories of moving objects in order to identify abnormal trajectories, which are associated with abnormal behaviors. When the information extracted from the videos is insufficient or not sufficiently reliable, the system relies on an additional module that recognizes audio events of interest, such as gunshots, screams or breaking glass. Finally, all the extracted information is stored so that the human operator can retrieve it efficiently. Five different standard datasets have been used to test the modules proposed in this thesis; the obtained results, in terms of both accuracy and computational efficiency, confirm the effectiveness and the practical applicability of the proposed approach.
Designing Huge Repositories of Moving Vehicles Trajectories for Efficient Extraction of Semantic Data
The rapid development of digital cameras equipped with video analytics software is making available large amounts of traffic data describing the trajectories traced by each vehicle and person within a scene. These data offer enormous potential when coupled with a querying system able to extract synthetic but meaningful information, such as that obtained by spatio-temporal queries; the latter allow, for instance, selecting all the trajectories passing through some parts of the scene, even in a given sequence, while adding restrictions on the properties of the objects (the category of the vehicles, their color and size, and so on). In this paper we propose a novel system for efficiently storing and querying large amounts of 3D data (trajectories over time), specifically designed to support a wide variety of spatio-temporal 3D queries. The method is based on a novel 3D data schema that is mapped onto a set of 2D schemata, since the latter are the only ones available in current ready-to-use database environments. An implementation of the system over PostGIS is presented in this paper, together with a performance assessment on a huge trajectory database. The obtained results confirm the effectiveness of the proposed approach and its applicability to real applications.
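The spatio-temporal queries described above (trajectories crossing given parts of the scene in a given order, filtered by object properties) can be illustrated with a minimal in-memory sketch. It does not reproduce the paper's 3D-to-2D schema or its PostGIS implementation; the rectangular regions, the `Trajectory` record and the `query` function are hypothetical names chosen only to show the query semantics.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, t): scene coordinates plus timestamp


@dataclass
class Rect:
    """Axis-aligned region of the scene (hypothetical region model)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def contains(self, x: float, y: float) -> bool:
        return self.xmin <= x <= self.xmax and self.ymin <= y <= self.ymax


@dataclass
class Trajectory:
    obj_id: int
    category: str          # e.g. "car", "truck", "person"
    points: List[Point]    # assumed sorted by timestamp


def visits_in_order(points: Sequence[Point], regions: Sequence[Rect]) -> bool:
    """True if the points enter every region, in the given order."""
    next_region = 0
    for x, y, _t in points:
        if next_region < len(regions) and regions[next_region].contains(x, y):
            next_region += 1
    return next_region == len(regions)


def query(trajectories: Sequence[Trajectory],
          regions: Sequence[Rect],
          category: Optional[str] = None,
          t_range: Optional[Tuple[float, float]] = None) -> List[int]:
    """Ids of trajectories crossing `regions` in sequence, optionally
    restricted to an object category and a time window."""
    result = []
    for tr in trajectories:
        if category is not None and tr.category != category:
            continue
        pts = tr.points
        if t_range is not None:
            t0, t1 = t_range
            pts = [p for p in pts if t0 <= p[2] <= t1]
        if visits_in_order(pts, regions):
            result.append(tr.obj_id)
    return result
```

Scanning every point of every trajectory like this does not scale to a huge repository; avoiding that full scan is the efficiency problem the proposed storage schema is meant to address.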
Real-time Fire Detection for Video Surveillance Applications using a Combination of Experts based on Color, Shape and Motion
In this paper, we propose a method that is able to detect fires by analyzing videos acquired by surveillance cameras. Two main novelties are introduced. First, complementary information, based on color, shape variation and motion analysis, is combined by a multi-expert system. The main advantage of this approach is that the overall performance of the system increases significantly with relatively little design effort. Second, a novel descriptor based on a bag-of-words approach is proposed for representing motion. The proposed method has been tested on a very large dataset of fire videos, acquired both in real environments and from the web. The obtained results confirm a consistent reduction in the number of false positives, without sacrificing accuracy or the possibility of running the system on embedded platforms.
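To make the multi-expert idea concrete, here is a minimal sketch in which three experts (color, shape variation, motion) each vote on a frame and an alarm is raised only by majority. The abstract does not state the combination rule, so majority voting is an assumption here; the feature names, thresholds and expert functions are hypothetical placeholders, not the paper's detectors.

```python
from typing import Callable, Dict, List

# Hypothetical per-frame measurements; a real system would compute them from
# pixel colors, blob shape changes and a motion descriptor.
Frame = Dict[str, float]
Expert = Callable[[Frame], bool]


def color_expert(f: Frame) -> bool:
    # Hypothetical rule: enough pixels fall in a "fire-like" color range.
    return f["fire_color_ratio"] > 0.05


def shape_expert(f: Frame) -> bool:
    # Hypothetical rule: the candidate region changes shape quickly, as flames do.
    return f["shape_variation"] > 0.3


def motion_expert(f: Frame) -> bool:
    # Hypothetical rule: a motion classifier score (for instance one fed by a
    # bag-of-words motion descriptor) exceeds a threshold.
    return f["motion_score"] > 0.5


def majority_vote(experts: List[Expert], frame: Frame) -> bool:
    """Raise an alarm only if more than half of the experts agree."""
    votes = sum(1 for expert in experts if expert(frame))
    return votes * 2 > len(experts)


EXPERTS = [color_expert, shape_expert, motion_expert]
```

The design choice this illustrates is how complementary experts can cut false positives: a static red object may fool the color expert alone, but it is outvoted when the shape and motion experts disagree.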
A human-like description of scene events for a proper UAV-based video content analysis
In the age of video surveillance, monitoring activity, especially from unmanned vehicles, requires some degree of autonomy in interpreting the scenario. Video analysis tasks are crucial for target tracking and recognition; however, it would be desirable for a further level of understanding to provide a comprehensive, high-level scene description, reflecting the human cognitive capability of describing a scene concisely by analyzing the relationships and actions of the objects involved. This paper presents a smart system that automatically identifies mobile scene objects, such as people and vehicles, by analyzing the videos acquired by drones in flight, along with the activities those objects carry out, so as to depict what happens in the scene from a high-level perspective. The system uses artificial vision methods to detect and track the mobile objects and the area where they move, and Semantic Web technologies to provide a high-level description of the scenario. Spatio-temporal relations among the tracked objects, as well as simple object activities (events), are described. Through semantic reasoning, the system connects the simple activities into more complex ones that better reflect a human-like description of a portion of the scenario. Tests conducted on several videos, showing scenarios set in different environments, return convincing results that confirm the effectiveness of the proposed approach.
Gender recognition in the wild: a robustness evaluation over corrupted images
In the era of deep learning, methods for gender recognition from face images achieve remarkable performance on most of the standard datasets. However, common experimental analyses do not take into account that the face images given as input to the neural networks are often affected by strong corruptions that are not always represented in standard datasets. In this paper, we propose an experimental framework for gender recognition “in the wild”. We produce corrupted versions of the popular LFW+ and GENDER-FERET datasets, which we call LFW+C and GENDER-FERET-C, and evaluate the accuracy of nine different network architectures in the presence of specific, suitably designed corruptions; in addition, we perform an experiment on the MIVIA-Gender dataset, recorded in real environments, to analyze the effects of the mixed image corruptions that occur in the wild. The experimental analysis shows that the robustness of the considered methods can still be improved, since all of them suffer a performance drop on images collected in the wild or manually corrupted. Based on the experimental results, we provide useful insights for choosing the best currently available architecture for specific real conditions. The proposed experimental framework, whose code is publicly available, is general enough to be applied to other datasets as well; thus, it can act as a forerunner for future investigations.
A multi-task network for speaker and command recognition in industrial environments
In industrial environments, it is crucial to establish a strong collaboration between humans and robots to enhance productivity. However, the nature of the work requires that workers have the authority to give specific instructions to the robots. The scientific community has extensively investigated these two requirements, aiming to develop advanced systems capable of recognizing voice commands and authenticating speakers. Nevertheless, in the industrial context, these tasks should be executed simultaneously on low-cost, low-power embedded devices that can be mounted on board the robotic platform. To overcome this challenge, we propose a multi-task network for Speech-Command Recognition and Speaker Identification. Additionally, we employ the GradNorm adaptive algorithm to address the issue of task imbalance. To evaluate the proposed system, we introduce a new dataset, MIVIA-ISC, consisting of 20,857 samples uttered by 562 speakers for 31 distinct commands. Compared with the commonly used methodology, which employs one network for each task, our approach reduces the network size by 47% and the execution time by 48%. Furthermore, it significantly improves the accuracy of the Speaker Identification task, achieving an 11% increase over the corresponding single-task network. Importantly, this improvement comes at only a small cost for the Speech-Command Recognition task, whose performance decreases by just 3%.
Vehicles Detection for Smart Roads Applications on Board of Smart Cameras: A Comparative Analysis
Video analytics can be profitably adopted in smart road environments to automatically detect abnormal situations. Within this context, vehicle detection is the first and foremost stage, and its accuracy is crucial, since any detection error will affect the performance of every subsequent step. Furthermore, in smart road environments it is often preferable to perform the video analysis directly on board smart surveillance cameras, in order to reduce bandwidth usage and eliminate the cost of setting up and maintaining powerful processing servers; on the other hand, processing on board smart cameras requires the detection algorithm to be fast and lightweight, since the resources available on this kind of embedded device are limited. In the era of deep learning, it may seem that the question "what is the best method for vehicle detection?" has a trivial answer, since this class of methods includes some very accurate ones. However, given the above considerations, the best-suited method for this application is not necessarily the most accurate one in absolute terms, but rather the most accurate one that runs on the available hardware at a given resolution and frame rate. Starting from these considerations, in this paper we analyze the vehicle detection methods available in the literature, comparing them in terms of accuracy and computational burden, with the aim of answering the following question: what is the best method for vehicle detection when working with smart cameras?
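The selection criterion stated above (the most accurate method among those that run on the available hardware at the required resolution and frame rate) amounts to a constrained maximization, sketched below. The `DetectorProfile` record and `best_detector` function are hypothetical, and the accuracy and FPS figures are assumed to have been measured beforehand on the target camera; they are not results from the paper.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class DetectorProfile:
    name: str
    accuracy: float  # e.g. mAP measured on a validation set
    fps: float       # throughput measured on the target smart camera at the chosen resolution


def best_detector(profiles: Sequence[DetectorProfile],
                  required_fps: float) -> Optional[DetectorProfile]:
    """Most accurate detector among those meeting the frame-rate constraint,
    or None if no detector is fast enough on this hardware."""
    feasible = [p for p in profiles if p.fps >= required_fps]
    if not feasible:
        return None
    return max(feasible, key=lambda p: p.accuracy)
```

With made-up profiles such as A (accuracy 0.90, 2 FPS), B (0.80, 12 FPS) and C (0.70, 30 FPS), a 10 FPS requirement selects B rather than the more accurate A, which is exactly the trade-off the comparative analysis is about.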
