    CBPRS: A City Based Parking and Routing System

    Navigational systems assist drivers in finding a route between two locations that is time optimal in theory but seldom in practice, because of delays the system is unaware of, such as traffic jams. Upon arrival at the destination the service of the system ends and the driver is forced to locate a parking place without further assistance. We propose a City Based Parking Routing System (CBPRS) that monitors and reserves parking places for CBPRS participants within a city. The CBPRS guides vehicles to their reserved parking place using an ant based distributed hierarchical routing algorithm. Through experiments in a simulation environment we found that reductions of travel times for participants were significant in comparison to a situation where vehicles relied on static routing information generated by the well known Dijkstra’s algorithm. Furthermore, we found that the CBPRS was able to increase city wide traffic flows and decrease the number and duration of traffic jams throughout the city once the number of participants increased.
    information systems; computer simulation; dynamic routing
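The static baseline mentioned in this abstract, Dijkstra's algorithm, can be sketched as follows. This is only an illustration of static shortest-time routing, not the CBPRS simulation itself: the `roads` network, its node names and its travel times are hypothetical, and edge weights are assumed to be fixed free-flow travel times in minutes.

```python
import heapq


def dijkstra(graph, source):
    """Return the minimum travel time from source to every reachable node."""
    dist = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for neighbour, cost in graph.get(node, {}).items():
            candidate = d + cost
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return dist


# Hypothetical toy road network (assumption): weights are static travel times.
roads = {
    "A": {"B": 4.0, "C": 2.0},
    "B": {"D": 5.0},
    "C": {"B": 1.0, "D": 8.0},
    "D": {},
}
travel_times = dijkstra(roads, "A")
```

Because the weights never change, a route computed this way cannot react to a traffic jam that forms after planning, which is the limitation the abstract contrasts with dynamic routing.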

    Multi-modal aggression detection in trains

    In many public places multiple sensing devices, such as cameras, are installed to help prevent unwanted situations such as aggression and violence. At the moment, the best solution to reach a safe environment requires human operators to monitor the camera images and take appropriate actions when necessary. In the wake of the terrorist attacks of September 11, 2001, there has been a rapid growth in the volume of security cameras and other sensing devices for anti-terrorism and other security purposes. The increased application of these, often multi-modal, sensors has caused a digital data explosion that human operators have difficulty keeping up with. The need for a fully or partially automated system becomes all the more pressing. The main aim of this thesis is to report on our work to address the complex challenges that arise within the context of multi-modal automatic surveillance applications. In this thesis work, a multi-modal aggression detection system was built that fuses audio and video data from sensors located in a train compartment. Compared to previous work, we adopt a more human centered approach to the detection problem by extracting knowledge and rules from security experts. The aggression detection system is based on many hours of observing and studying professional operators at work as they analyze and respond to surveillance data. Our aggression detection approach is essentially divided into two models: (1) the observation model, which describes how low level features from observations are combined into high level concepts, and (2) the reasoning model, in which high level concepts are reasoned with in order to infer the presence of aggression. In the observation model, feature extraction algorithms are used to transform audio and video signals into features, which are combined by classification algorithms into high level concepts.
In the thesis, an analysis is made of the train compartment, in particular of the objects and situations that may be encountered there. This analysis is formalized in a train aggression ontology. In addition, an overview of relevant audio and video feature extraction and classification algorithms is given. The JDL model is also introduced as a way to structure the wide range of available algorithms. In the reasoning model, knowledge of the human expert and high level reasoning are used to infer the presence of aggression. In essence this boils down to combining the results of the observation model into a description of the current scenario, and comparing this to known scenarios. If the current scenario is similar to a known unwanted scenario or if the current scenario deviates too much from a known normal scenario, an alarm situation may be announced. There are a number of different approaches to accomplish the inference. In this thesis, three different inference methods are explored for their merits in aggression detection: expert system based reasoning, Bayesian reasoning and self organization/emergent reasoning. To test and verify the results, several experiments were conducted in a real train. During the experiments, actors had to perform scenarios as described in storyboards. The storyboards were previously validated by security experts for their realism. As the actors performed the scenarios, data was captured using multiple cameras and microphones. The acquired data was annotated using the vocabulary from the train aggression ontology and used as ground truth for the evaluation of the aggression detection system.
Mediamatics; Electrical Engineering, Mathematics and Computer Science

    Knowledge driven facial modelling

    This research aims to support users who are not involved in computer graphics, facial physiology, or psychology but who need to generate realistic facial animations. Realism is to be understood in terms of the visual appeal of a single rendered image and is focused on believable behaviour of the animated face. Our goal is to develop a system enabling semi-automatic facial animation, allowing an average user to generate facial animation in a simple manner. We envision a system with knowledge about the communicative functions of facial expressions that would support an average user in generating facial animation that is valid from a psychological and physiological point of view.
    Electrical Engineering, Mathematics and Computer Science

    Dynamic Routing using Ant-Based Control

    Currently most car drivers use static routing algorithms based on the shortest distance between start and end position. However, the shortest route is not necessarily the fastest route. Because existing routing algorithms lack the ability to react to dynamic changes in the road network, drivers are not optimally routed. In this thesis we present a multi-agent approach for routing vehicle drivers using historically-based traffic information. The general workings of our solution bear strong similarities to Ant Based Control (ABC) and AntNet, but an important modification has been made, namely the adaptation of ant-like agents for spatio-temporal routing. The proposed dynamic routing algorithm routes self-interested drivers on an intersection to intersection basis via the fastest path between a proposed source and a destination. For this to happen, a time-expanded graph encodes variable road network costs. Ant-like agents are launched in this graph. They use a technique of collective learning based on locally dependent pheromone tables. Finally, we report results obtained for part of The Netherlands' GIS-based road network. In the established experiment setting, the new ABC makes a positive difference for drivers. An important reduction of the travelling time was observed in 53% of the cases. The experimental results also showed that ABC clearly outperforms Static Dijkstra's algorithm and Dynamic Dijkstra with updates.
    Media & Knowledge Engineering; Man Machine Interaction; Electrical Engineering, Mathematics and Computer Science
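To make the idea of a pheromone table concrete, the sketch below shows a normalising reinforcement rule of the kind commonly described for Ant Based Control. It is an assumption-laden illustration, not the thesis's spatio-temporal variant, whose exact update is not given in the abstract; the function name `reinforce`, the neighbour labels and the value of `delta` are all hypothetical.

```python
def reinforce(table, chosen, delta):
    """Reinforce `chosen` by `delta` and renormalise so the row still sums to 1.

    `table` maps each neighbouring intersection to the probability of
    forwarding traffic to it for one particular destination.
    """
    return {
        hop: (p + delta) / (1.0 + delta) if hop == chosen else p / (1.0 + delta)
        for hop, p in table.items()
    }


# Hypothetical pheromone row at one intersection for one destination (assumption).
row = {"north": 0.5, "east": 0.3, "south": 0.2}

# An ant reports that travelling via "east" was fast, so "east" is reinforced.
updated = reinforce(row, "east", 0.1)
```

The chosen neighbour gains probability while all others lose a little, so repeated reports from many ants gradually shift traffic towards the routes that are currently fastest.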

    Building a visual speech recognizer

    This thesis describes how an automatic lip reader was realized. Visual speech recognition is a precondition for more robust speech recognition in general. The development of the software comprised the following steps: gathering of training data, extracting meaningful features from the obtained video material, training the speech recognizer and finally evaluating the resulting product. First, research was done to gain insight into the theoretical aspects of automatic lip reading, the state of the art, speech corpus development, face tracking and feature extraction. Gathering training data came down to the recording and composing of a new audio-visual speech corpus for Dutch. With frontal and side images of 70 different speakers recorded at a frame rate of 100 frames per second, this is the most diverse corpus currently in existence. Analysis of the new data corpus shows an increase in quality compared to other corpora. Visual information is obtained by searching the video footage. Using Active Appearance Models, points of an a priori defined model of the lower half of the face are tracked over time. Based on the model point coordinates, distance and area features are computed that are used as input to the speech recognizer. Training was accomplished by presenting labeled training data to viseme-based Hidden Markov Models that model speech production. In a few steps the model parameters were adjusted, so that the models could be used to perform recognition of visual speech signals from then on. The recognizer was implemented using tools from the Hidden Markov Model Toolkit. The results of a visual speech recognizer based on training data from a single person depend on the utterance type of the unlabeled data. For the simple word-level task of digit recognition, 78% was recognized correctly with a word recognition rate of 68%.
For letter recognition tasks it did not perform nearly as well, but considering the limitations that the use of visemes over phonemes imposes, these results are at the expected level. The data corpus and visual speech recognizer will be a valuable asset to future research.
Mediamatics; Electrical Engineering, Mathematics and Computer Science
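The distance and area features mentioned in this abstract can be illustrated with a small sketch. It assumes the tracked model points form an ordered lip contour; the four-point `lip` contour, its pixel coordinates and the feature names are hypothetical and do not come from the thesis.

```python
import math


def polygon_area(points):
    """Area of a simple polygon given as ordered (x, y) vertices (shoelace formula)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def distance(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


# Hypothetical lip contour in pixels (assumption):
# left corner, top of upper lip, right corner, bottom of lower lip.
lip = [(0.0, 0.0), (20.0, 6.0), (40.0, 0.0), (20.0, -6.0)]

features = {
    "mouth_width": distance(lip[0], lip[2]),
    "mouth_height": distance(lip[1], lip[3]),
    "mouth_area": polygon_area(lip),
}
```

Computing such a feature vector for every video frame would yield the kind of time series that a Hidden Markov Model based recognizer takes as input.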

    Towards Robust Visual Speech Recognition: Automatic Systems for Lip Reading of Dutch

    In the last two decades we witnessed a rapid increase in computational power, governed by Moore's Law. As a side effect, faster CPUs also became more affordable. As a result, many new “smart” devices flooded the market and made information systems widespread. The number of users of information systems has also increased manyfold, and the user population has changed to include not only a small number of initiates but also a majority of non-technical people. To make this transition possible, system developers had to change computer user interfaces to make them simpler and more intuitive. However, the interaction was still based on rather artificial devices such as mouse and keyboard. Since Moore's Law continues to hold, we have come to a critical moment at which current systems can easily cope with other input streams, video and audio being the most important. We can, therefore, envision systems with which we can communicate through speech and body movements and that can automatically and transparently adapt to the environment and user. This can be done for instance by recognizing the user's affective state, by understanding the task of the user and by recognizing the context of the interaction. Automatic speech recognition by capturing and processing the audio signal is one development in this direction. Even though automatic speech recognition has achieved spectacular results in controlled settings, its performance is still dependent on the context (for instance on the level of the background noise). Automatic lip reading has appeared in this context as a way to enhance automatic speech recognition in noisy environments.
Even though it is still a relatively novel research domain, other applications were found which employ lip reading on its own: interfaces for hearing impaired persons, security applications, speech recovery from mute or deteriorated films, and silent interfaces. With the advances in visual signal processing the research in lip reading was also revitalized. However, at the time of writing of this thesis lip reading was still waiting for its great leap. This thesis investigates several techniques for directing lip reading towards more robust performance. The thesis starts by introducing the relevant methodologies that govern automatic lip reading. Next it introduces all the concepts needed to understand the technologies, experiments, results and discussions presented later on. It is, therefore, one of the most important parts of the thesis. The presentation of the state of the art in lip reading sets the starting point of the research presented. Before continuing to follow the lip reading process, the thesis introduces the mathematical foundations that give the theoretical support for the analysis. All our systems are based on the Hidden Markov Models approach. This paradigm has proved to be very useful in similar problems and we successfully employed it for lip reading. The main idea behind it is the Bayesian rule, which says that starting from some a priori knowledge we can always improve our understanding of a system through observation. Observation translates into processing the video stream into a set of features that describe what is being said by the speaker. However, in order to appropriately train lip reading systems, a large amount of data is needed. The first important contribution of our research is a large data corpus for the Dutch language. This corpus, named “New Delft University of Technology Audio Visual Speech Corpus”, is at the date of writing this thesis one of the largest corpora for lip reading in Dutch.
The corpus contains dual view high speed recordings (i.e. 100 Hz) of continuous speech in Dutch. During the building of the corpus, we also produced an initial set of guidelines for building a data corpus for lip reading, which we hope will be useful for other researchers. However, the core of this thesis consists of the data parametrization. Data parametrization is the process that transforms the input video data into a set of features that are used later on for training and testing the resulting recognizer. The parametrization should reduce the size of the input data while preserving the most important information related to what the speaker says. We investigated three data parametrization techniques, each coming from a different category of algorithms. We used Active Appearance Models, which generate a combined geometric and appearance based set of features; we used optical flow analysis, which is an appearance based approach that directly accounts for the apparent movement on the speaker's face; and we used a statistical approach, which generates the geometry of the lips without starting from an a priori fixed model. During the research presented in this thesis we investigated the performance of these data parametrization techniques and we pointed out their strengths and weaknesses. We also analysed the performance of lip reading from other points of view. We analysed the influence of the sampling rate of the video data, the performance of the lip readers as a function of the recognition task, and the performance as a function of the size of the corpus used. Answering all these questions improved our understanding of the limitations and the possible improvements of lip reading.
Mediamatics; Electrical Engineering, Mathematics and Computer Science
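The Bayesian rule that this abstract names as the main idea behind the Hidden Markov Models approach is commonly written as below for recognition tasks. The symbols are assumptions introduced here for illustration: $W$ denotes a candidate word or viseme sequence and $O$ the sequence of visual feature vectors extracted from the video; the thesis may use different notation.

```latex
\hat{W}
  = \arg\max_{W} P(W \mid O)
  = \arg\max_{W} \frac{P(O \mid W)\, P(W)}{P(O)}
  = \arg\max_{W} P(O \mid W)\, P(W)
```

The last step holds because $P(O)$ does not depend on $W$. In this reading, $P(W)$ is the a priori knowledge and $P(O \mid W)$ is the observation likelihood that the Hidden Markov Models supply.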

    Modelling context in automatic speech recognition

    Speech is at the core of human communication. Speaking and listening come so naturally to us that we do not have to think about them at all. The underlying cognitive processes are very rapid and almost completely subconscious. It is hard, if not impossible, not to understand speech. For computers, on the other hand, recognising speech is a daunting task. They have to deal with a large number of different voices (influenced, among other things, by emotion, moods and fatigue), the acoustic properties of different environments, dialects, a huge vocabulary and an unlimited creativity of speakers to combine words and to break the rules of grammar. Almost all existing automatic speech recognisers use statistics over speech sounds (what is the probability that a piece of audio is an a-sound?) and statistics over word combinations to deal with this complexity. The results of those systems are impressive but unfortunately not good enough for most applications of speech recognition. This thesis proposes to put context information in the models of speech recognition to achieve better recognition results. Context is defined as knowledge of the speaker, such as gender and dialect, knowledge of the conversation and knowledge of the world. The influence of each of those categories is investigated using data analysis and case studies, and new models for speech recognition are defined. In particular, a model is presented that dynamically adapts the vocabulary of the recogniser to the topic of a conversation, which it can determine automatically.
    Electrical Engineering, Mathematics and Computer Science

    Multimodal recognition of emotions

    This thesis proposes algorithms and techniques to be used for automatic recognition of six prototypic emotion categories by computer programs, based on the recognition of facial expressions and emotion patterns in voice. Considering the applicability in real-life conditions, the research is carried out in the context of devising person independent methods that should be robust to various factors given the specificity of the considered modalities. An immediate focus is the development of audio-visual algorithms and their implementation in the form of software applications for automatic recognition of emotions.
    Mediamatica; Electrical Engineering, Mathematics and Computer Science

    Human Handheld-Device Interaction: An Adaptive User Interface

    The move to smaller, lighter and more powerful (mobile) handheld devices, whether PDAs or smart-phones, looks like a trend that is building up speed. With numerous embedded technologies and wireless connectivity, this trend opens up unlimited opportunities in daily activities that are both more efficient and more exciting. Despite all these advancing possibilities, the shrinking size and the mobile use impose challenges for both technical and usability aspects of the devices and their applications. An adaptive user interface that is able to autonomously adjust its display and available actions to the current goals, contexts and emotions of its user offers solutions for limited input options, various constraints of the output presentation, and user requirements due to mobility and attention shifting in human handheld-device interaction. The present work made preliminary steps in proposing a framework for the rapid construction of adaptive user interfaces that are multimodal, context-aware and affective, on handheld devices. The framework consists of predefined modules that are able to work in isolation but can also be connected in an ad hoc way as part of the framework. The modules deal with human handheld-device interaction, the interpretation of the user's actions, knowledge structure and management, the selection of appropriate responses and the presentation of feedback. Human language and visual perception models have been studied in formulating concepts or ideas as both text and visual language-based messages. An adaptive circular on-screen keyboard and visual language-based interfaces have been proposed as alternative input options for fast interaction. In particular, sentences in the visual language can be constructed using spatial arrangements of visual symbols, such as icons, lines, arrows and ellipses. As icons have the potential to cross language barriers, any interaction using the visual language is suitable for language-independent contexts.
Personalized predictive and language-based features have also been added to accelerate both input methods. An ontology has been chosen to represent knowledge of the user, the task and the world. The modeling and structure of the knowledge representation have been designed for sharing common semantics, integrating inter-module communication, and fulfilling the context-awareness requirement. This enables the framework to be developed into a widespread application for different domains. Context awareness is approached by interpreting both verbal and non-verbal aspects of user inputs to update the system's belief about the user, the task and the world. Methods and techniques have been developed to fuse multiple input modalities for multiple messages from multiple users into a coherent and context-dependent interpretation. A simple approach to emotion analysis has been proposed to interpret the non-verbal aspect of the inputs. It is based on keyword spotting and categorizes the emotional state into a certain valence orientation with intensity. The approach is suitable for input recognition with high uncertainty. Template-based interaction management and output generation methods have been developed. The templates have a direct link to concepts in the ontology-based knowledge representation. This approach supports common semantics with other modules within the framework. It allows the development of a larger scale system with consistent and easy to verify knowledge repositories. A multimodal, multi-user, and multi-device communication system in the field of crisis management has been built on the framework as a proof of the proposed concepts. This system consists of selected modules for reporting and collaborating on observations using handheld devices in mobile ad-hoc network-based communication. It supports communication using the combination of text, visual language and graphics.
The system is able to interpret user messages, construct knowledge of the user, the task and the world, and develop a crisis scenario. User tests were aimed at an assessment of whether or not users are capable of expressing their messages using the provided modalities. The tests also addressed usability issues on interacting with an adaptive user interface on handheld devices. The experimental results indicated that the adaptive user interface is able to support communication between users and between users and their handheld devices. Moreover, an explorative study within this research has also generated knowledge regarding (technical, social and usability aspects of) user requirements in adaptive user interfaces and (generally) human handheld-device interaction. The rationale behind our approaches, designs, empirical evaluations and implications for research on the framework for an adaptive user interface on handheld devices are also described in this thesis.
Man Machine Interaction, Mediamatics; Electrical Engineering, Mathematics and Computer Science

    Electricity load modelling using computational intelligence

    As a consequence of the liberalisation of the electricity markets in Europe, market players have to continuously adapt their future supply to match their customers' demands. This poses the challenge of obtaining a predictive model that accurately describes electricity loads, which is central to this thesis. Kernel machines are considered to be the state of the art of supervised learning methods. A Bayesian-framework based kernel machine is extended to represent data in a way that is sparse in feature space and smooth in output space. It is argued that this leads to a higher degree of generalisation. Kernel machines can be tailored to better suit one's demands; electricity-demand-specific representations are designed for day types and for emphasising twilight periods. A multi-component setup is proposed to increase the orthogonality between input variables. For wind-power production forecasting, data from several weather stations is combined to refine the coarse resolution of wind-speed measurements. To put theory into practice, a kernel-machine library has been developed. It offers its users efficiency and flexibility. All proposed representations are implemented and tested for their embeddability in a real-world environment. The multi-component structure is filled with calendar, trend, temperature, radiation, and wind components. These components enable the electricity demands to be unravelled; several new explicit facts are discovered, such as the influence of Sinterklaas and a cloudburst. The resulting systems produce competitively accurate and detailed predictions of past and future electricity loads.
    Electrical Engineering, Mathematics and Computer Science