
    XML-Based Languages for Multimodality in Mobile Environments

    The development of multimodal tools, and of mobile devices in particular, is generating great interest, especially for accessing Web information, performing transactions, and using services in general. This article considers the different markup languages proposed by the working groups of the W3C (World Wide Web Consortium) to manage multimodal interaction, and the perspectives of multimodal applications and services. The trend toward the convergence of various methodologies and technologies has produced new devices that provide complex services, contribute to the sharing of experiences, and promote the inclusion of people as community members (Paternò, 2004). This trend is based on the development of mobile devices and their usability, accessibility, portability, and versatility (Kvale, Warakagoda, & Knudsen, 2003). The usefulness and usability of services, and the ability to access them and their information, are the basic elements in the diffusion of Web systems and the development of Web multimodal languages. The diffusion and implementation of multimodal services is supported by the activities of the World Wide Web Consortium, which aim at extending interaction modes for different devices and are particularly devoted to solving problems connected with: (1) multimodal Web interaction through different devices, and (2) practical Web navigation from different devices. Some W3C working groups focus their activities on issues such as device independence, multimodal Web access, and types of content for multimodal messaging. These specifications allow rich multimodal content to be transmitted, and are based on the power and extensibility of XML (eXtensible Markup Language) (Bray, Paoli, Sperberg-McQueen, Maler, & Yergeau, 2004). XML is highly important in a mobile application environment, as many applications have to manage multimedia content and need dedicated tools for this.
SMIL (Synchronized Multimedia Integration Language) (Solon, McKevitt, & Curran, 2004) was proposed to achieve this goal. In the early years, W3C-MMI (W3C MultiModal Interaction) focused on multimodal interaction modes such as speech and pen interaction, and on providing users with W3C technologies. W3C develops these technologies by orienting individual interaction modes in order to create mixed-namespace XML documents, such as SVG (Scalable Vector Graphics) (Chatty, Lemort, Sire, & Vinot, 2005) and XHTML (Extensible HyperText Markup Language) (Musciano & Kennedy, 2003) for visual interaction, and VoiceXML (Voice Extensible Markup Language) (Lucas, 2000) for voice interaction. However, many other XML-derived languages have helped in the development of mobile services. The next target is the consideration of the mobile network as an extension of the global Internet network. This article explains the importance of XML and its dialects in a mobile application environment, where they enable use by the “various applications/services” available on the Web today. In fact, different dialects may be needed for different mobile devices depending on their characteristics (such as memory, CPU speed, and integrated software engine). For example, two SVG profiles are defined for cellular phones and PDAs (personal digital assistants): SVG Tiny (SVGT) is especially suitable for the next generation of cellular phones, while SVG Basic (SVGB) is aimed at high-tech devices such as PDAs or smart phones (Andersson et al., 2003). The pervasive use of mobile devices will be the target for the near future (Branco, 2001), given the trend towards considering the mobile network as an extension of the global Internet network. This scenario promotes the development of new dialects for multimodal interaction through mobile devices. The dialects developed for speech, sketch, and visual interaction are discussed next. An area for future development might focus on interaction through gestures.
XML (eXtensible Markup Language) is a simple, flexible, and powerful text-based markup language that allows the development of a potentially unlimited number of innovative multimodal services and applications. It was derived from the more complex and complete SGML (Standard Generalized Markup Language, ISO 8879) (Chamberlin & Goldfarb, 1987), which was designed for more general purposes. However, XML is easier to manage, and is genuinely Web oriented and mobile oriented. In other words, XML is an optimal subset of SGML, constructed with the possible Web services and applications in mind. XML can be used to develop several languages that take the specific working context into account. It also plays an important role in the exchange of a wide variety of data, making them available and accessible via the Web from computers and mobile devices.
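
The idea of a mixed-namespace XML document mentioned above can be illustrated with a minimal sketch. The document below is a hypothetical example (not taken from the article) that embeds SVG graphics inside XHTML markup; it is parsed with Python's standard `xml.etree.ElementTree` module, and the helper `count_by_namespace` is an illustrative name introduced here.

```python
import xml.etree.ElementTree as ET

# Hypothetical mixed-namespace document: XHTML for structure, SVG for graphics.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:svg="http://www.w3.org/2000/svg">
  <body>
    <p>Route map</p>
    <svg:svg width="100" height="50">
      <svg:circle cx="20" cy="25" r="5"/>
      <svg:circle cx="80" cy="25" r="5"/>
    </svg:svg>
  </body>
</html>"""

# Prefix-to-URI map used for namespace-aware queries.
NS = {
    "xhtml": "http://www.w3.org/1999/xhtml",
    "svg": "http://www.w3.org/2000/svg",
}


def count_by_namespace(xml_text):
    """Count the elements belonging to each namespace URI."""
    root = ET.fromstring(xml_text)
    counts = {}
    for elem in root.iter():
        # ElementTree stores qualified tags as "{namespace-uri}localname".
        uri = elem.tag[1:].split("}")[0] if elem.tag.startswith("{") else ""
        counts[uri] = counts.get(uri, 0) + 1
    return counts


counts = count_by_namespace(DOC)
circles = ET.fromstring(DOC).findall(".//svg:circle", NS)
```

A client that only understands one dialect could use such a namespace count to decide which parts of a document it is able to render.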

    MAGI: Multistream aerial segmentation of ground images with small-scale drones

    In recent years, small-scale drones have been used in heterogeneous tasks, such as border control, precision agriculture, and search and rescue. This is mainly due to their small size, which allows for easy deployment, their low cost, and their increasing computing capability. The latter aspect allows researchers and industries to develop complex machine- and deep-learning algorithms for several challenging tasks, such as object classification, object detection, and segmentation. Focusing on segmentation, this paper proposes a novel deep-learning model for semantic segmentation. The model follows a fully convolutional multistream approach to perform segmentation on different image scales. Several streams perform convolutions by exploiting kernels of different sizes, making segmentation tasks robust to flight altitude changes. Extensive experiments were performed on the UAV Mosaicking and Change Detection (UMCD) dataset, highlighting the effectiveness of the proposed method.
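
The multistream idea, in which parallel streams filter the same image with kernels of different sizes, can be sketched in a few lines. This is only a toy illustration under strong assumptions: it uses fixed mean filters instead of learned convolutions, fuses streams by simple averaging, and is not the MAGI architecture. The names `box_filter` and `multistream` are hypothetical.

```python
def box_filter(image, k):
    """Apply a k x k mean filter (k odd) with zero padding; output keeps the input size."""
    h, w = len(image), len(image[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        total += image[yy][xx]
            out[y][x] = total / (k * k)
    return out


def multistream(image, kernel_sizes=(1, 3, 5)):
    """Run one stream per kernel size and fuse the streams by averaging."""
    streams = [box_filter(image, k) for k in kernel_sizes]
    h, w = len(image), len(image[0])
    return [
        [sum(s[y][x] for s in streams) / len(streams) for x in range(w)]
        for y in range(h)
    ]


# Toy 5 x 5 input with constant intensity.
image = [[1.0] * 5 for _ in range(5)]
fused = multistream(image)
```

Small kernels respond to fine detail and large kernels to coarse structure, which is the intuition behind combining them when the apparent object size varies with altitude.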

    A Novel Multimodal Framework To Support Human-Computer Interaction

    The new tendency in human-computer interaction is to exploit all human expressive forms to enable natural interaction with different applications and devices. Usually, the hand gesture and speech modalities are the best suited to implementing suitable interfaces in every application context. Developing a framework to define and recognise any set of hand gestures (with or without physical controllers) and to associate it with a set of vocal sentences is still challenging. In fact, on the one hand, different sets of gestures are characterised by different recognition approaches derived from context, device, or application needs. On the other hand, the matching process between a specific gesture and a particular sentence is prone to ambiguous interpretations, errors, and coarse simplifications. This paper describes a novel gesture- and speech-based framework to generate a set of bi-modal interfaces designed to be plugged in to XML-compatible devices.

    Design of a 3D platform for immersive neurocognitive rehabilitation

    In recent years, advancements in human-computer interaction (HCI) have enabled the development of versatile immersive devices, including Head-Mounted Displays (HMDs). These devices are usually used for entertainment activities such as video gaming, or for augmented/virtual reality applications for tourism or learning purposes. However, HMDs, together with the design of ad-hoc exercises, can also be used to support rehabilitation tasks, including neurocognitive rehabilitation following strokes, traumatic brain injuries, or brain surgeries. In this paper, a tool for immersive neurocognitive rehabilitation is presented. The tool allows therapists to create and set up 3D rooms that simulate home environments in which patients can perform tasks of their everyday life (e.g., find a key, set a table, do numerical exercises). The tool allows therapists to implement the different exercises on the basis of a random mechanism by which different parameters (e.g., object positions, task complexity) can change over time, thus stimulating the problem-solving skills of patients. The latter aspect plays a key role in neurocognitive rehabilitation. Experiments conducted with 35 real patients, together with comparative evaluations by five therapists of the proposed tool against traditional neurocognitive rehabilitation methods, highlight remarkable results in terms of motivation, acceptance, and usability, as well as recovery of lost skills.

    FaceVision-GAN: A 3D Model Face Reconstruction Method from a Single Image Using GANs

    Generative algorithms have been very successful in recent years. This phenomenon derives from the strong computational power that even consumer computers can provide. Moreover, a huge amount of data is available today for feeding deep learning algorithms. In this context, human 3D face mesh reconstruction is becoming an important but challenging topic in computer vision and computer graphics. It could be exploited in different application areas, from security to avatarization. This paper provides a 3D face reconstruction pipeline based on Generative Adversarial Networks (GANs). It can generate high-quality depth and correspondence maps from 2D images, which are exploited for producing a 3D model of the subject’s face

    SIRe-Networks: Convolutional neural networks architectural extension for information preservation via skip/residual connections and interlaced auto-encoders

    Improving existing neural network architectures can involve several design choices such as manipulating the loss functions, employing a diverse learning strategy, exploiting gradient evolution at training time, optimizing the network hyper-parameters, or increasing the architecture depth. The latter approach is a straightforward solution, since it directly enhances the representation capabilities of a network; however, the increased depth generally incurs the well-known vanishing gradient problem. In this paper, borrowing from different methods addressing this issue, we introduce an interlaced multi-task learning strategy, called SIRe, to reduce the vanishing gradient in relation to the object classification task. The presented methodology directly improves a convolutional neural network (CNN) by preserving information from the input image through interlaced auto-encoders (AEs), and further refines the base network architecture by means of skip and residual connections. To validate the presented methodology, a simple CNN and various implementations of famous networks are extended via the SIRe strategy and extensively tested on five collections, i.e., MNIST, Fashion-MNIST, CIFAR-10, CIFAR-100, and Caltech-256, where the SIRe-extended architectures achieve significantly increased performance across all models and datasets, thus confirming the effectiveness of the presented approach.

    Design of an Efficient Framework for Fast Prototyping of Customized Human-Computer Interfaces and Virtual Environments for Rehabilitation

    Rehabilitation is often required after stroke, surgery, or degenerative diseases. It has to be specific for each patient and can be easily calibrated if assisted by human-computer interfaces and virtual reality. Recognition and tracking of different human body landmarks represent the basic features for the design of the next generation of human-computer interfaces. The most advanced systems for capturing human gestures are focused on vision-based techniques which, on the one hand, may require compromises on real-time performance and spatial precision and, on the other hand, ensure a natural interaction experience. The integration of vision-based interfaces with thematic virtual environments encourages the development of novel applications and services regarding rehabilitation activities. The algorithmic processes involved in gesture recognition, as well as the characteristics of the virtual environments, can be developed with different levels of accuracy. This paper describes the architectural aspects of a framework supporting real-time vision-based gesture recognition and virtual environments for fast prototyping of customized exercises for rehabilitation purposes. The goal is to provide the therapist with a tool for fast implementation and modification of specific rehabilitation exercises for specific patients during functional recovery. Pilot examples of designed applications and a preliminary system evaluation are reported and discussed. © 2013 Elsevier Ireland Ltd

    Homography vs similarity transformation in aerial mosaicking: which is the best at different altitudes?

    Aerial image mosaicking of an area of interest is the process of combining multiple images of an area, with overlapping regions, into a single comprehensive view. In this process, image registration, i.e., the geometric transformation that aligns and overlays two or more images of the same scene taken from different viewpoints, starting from their common parts, plays a key role in terms of artifact reduction. In the current state of the art, image registration of aerial images is usually performed through the homography transformation. This is because these images are frequently acquired at high altitudes (more than 100 meters), where the homography has always provided excellent performance. The recent widespread adoption of Unmanned Aerial Vehicles (UAVs) has enabled the development of several applications where mosaics are used as reference images for high-precision tasks, including Detection, Recognition, and Identification (hereinafter DRI) of people and objects. These tasks need images acquired at very low altitudes (below 50 meters), at which the homography tends to introduce artifacts during the registration process. Therefore, a different transformation that limits how an image can be morphed, i.e., the similarity transformation, is necessary to perform the image registration, thus improving the overall accuracy of the obtained mosaics. In this paper, for the first time in the literature, a comparison between the homography and similarity transformations is performed. In particular, the comparison is carried out by using three recently released public datasets, i.e., NPU Drone-Map, senseFly, and UAV Mosaicking and Change Detection (UMCD), containing challenging aerial video sequences acquired at high and low altitudes. The experimental tests have pointed out the direct relationship among the best image transformation, UAV altitude, and spatial resolution required to accomplish the DRI tasks reported above.
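
The difference between the two transformations can be made concrete with a small sketch. A similarity transformation has four degrees of freedom (scale, rotation, and a 2D translation) and preserves shape, whereas a general homography has eight and can introduce perspective distortion. The matrices below are hypothetical values chosen only for illustration; they do not come from the paper or its datasets.

```python
import math


def similarity_matrix(scale, angle_rad, tx, ty):
    """Build a 3x3 similarity transform: uniform scale, rotation, translation."""
    c = scale * math.cos(angle_rad)
    s = scale * math.sin(angle_rad)
    return [[c, -s, tx], [s, c, ty], [0.0, 0.0, 1.0]]


def apply_transform(m, point):
    """Apply a 3x3 projective matrix to a 2D point, including the perspective divide."""
    x, y = point
    w = m[2][0] * x + m[2][1] * y + m[2][2]
    return (
        (m[0][0] * x + m[0][1] * y + m[0][2]) / w,
        (m[1][0] * x + m[1][1] * y + m[1][2]) / w,
    )


def side_lengths(m, polygon):
    """Return the side lengths of a polygon after transforming its vertices."""
    pts = [apply_transform(m, p) for p in polygon]
    n = len(pts)
    return [math.dist(pts[i], pts[(i + 1) % n]) for i in range(n)]


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

# Hypothetical similarity: scale 2, rotate 30 degrees, translate by (5, -3).
S = similarity_matrix(2.0, math.pi / 6, 5.0, -3.0)

# Hypothetical homography with non-zero perspective terms in the last row.
H = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.2, 0.1, 1.0]]

sim_sides = side_lengths(S, square)  # all sides scaled uniformly
hom_sides = side_lengths(H, square)  # sides become unequal
```

Under the similarity, the unit square stays a square with every side scaled by the same factor; under the homography, the sides take different lengths, which is the kind of morphing the abstract associates with low-altitude registration artifacts.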

    Basis for the implementation of an EEG-based single-trial binary brain computer interface through the disgust produced by remembering unpleasant odors

    In order to implement an EEG-based brain computer interface (BCI), a very large number of strategies (ranging over sensorimotor, P300, auditory, and visual paradigms) can be used. However, no technique exists that is based on olfactory stimulation or, better, on the imagination of olfactory stimuli. The present paper describes an innovative paradigm, namely voluntary brain activation through the disgust produced by remembering unpleasant odors, and a simple and robust classification method on which a single-trial binary BCI can be implemented. To classify the signal, mainly the channels P4, C4, T8, and P8 have been used, spanning the frequency band between 32 and 42 Hz, which is a subset of the gamma band outside the bands usually occupied by other tasks (the interval between 1 and 30 Hz), and the alpha band between 8 and 12 Hz. The right hemisphere of the brain and the gamma band are particularly sensitive when experiencing negative emotions, such as the disgust produced by smelling or remembering unpleasant odors, while the alpha band is usually modified by concentration. This is an advantage for the proposed classification technique, which is made intrinsically easy by the localization into particular positions and frequencies: different features are mostly based on different frequency bands. The reason for choosing the disgust produced by remembering unpleasant odors is twofold: smelling is an ancestral sensation so strong that its EEG signal is also produced in persons affected by hyposmia when they imagine an olfactory situation; and it can be used without external stimulation, that is, the user can decide freely when and whether to activate it. The proposed method and the experimental setup are described, and a series of experimental measurements are presented and discussed. The accuracy of the proposed method is also evaluated, and the levels reached are about 90%.
The proposed system can be a useful communication alternative for disabled people who cannot use other BCI paradigms.

    Overall Design and Implementation of the Virtual Glove

    Post-stroke patients and people suffering from hand diseases often need rehabilitation therapy. The recovery of original skills, when possible, is closely related to the frequency, quality, and duration of rehabilitative therapy. Rehabilitation gloves are tools used both to facilitate rehabilitation and to monitor improvements through an evaluation system. Mechanical gloves are costly, are often cumbersome, and are not re-usable; hence, they cannot be used with the healthy hand to collect the patient-specific hand mobility information toward which rehabilitation should tend. The approach we propose is the virtual glove, a system that, unlike tools based on mechanical haptic interfaces, uses a set of video cameras surrounding the patient's hand to collect a set of synchronized videos used to track hand movements. The hand tracking is associated with a numerical hand model that is used to calculate physical, geometrical, and mechanical parameters, and to implement some boundary constraints such as joint dimensions, shape, joint angles, and so on. Besides being accurate, the proposed system aims to be low cost, not bulky (touch-less), easy to use, and re-usable. Previous works described the general concepts of the virtual glove, the hand model, and its characterization, including the system calibration strategy. The present paper provides the overall design of the virtual glove, in both real-time and off-line modalities. In particular, the real-time modality is described and implemented, and a marker-based hand tracking algorithm, including a marker positioning, coloring, labeling, detection, and classification strategy, is presented for the off-line modality. Moreover, model-based hand tracking experimental measurements are reported, discussed, and compared with the corresponding poses of the real hand. An error estimation strategy is also presented and used for the collected measurements. System limitations and future work for system improvement are also discussed. © 2013 Elsevier Ltd