
    Advantages and Limitations of Fully on-Chip CNN FPGA-Based Hardware Accelerator

    Convolutional Neural Networks are a class of deep neural networks commonly used in audio and video processing. Implementing them on the edge is a complex task because of the limited computational power and the low power consumption requirements that characterize these applications. In this paper, a fully on-chip Convolutional Neural Network hardware accelerator based on a Field Programmable Gate Array is presented. This approach reduces the power consumption caused by off-chip memory accesses and aims to reduce design time. Advantages and limitations of the proposed architecture are discussed, and a trade-off analysis is provided to give insight into the feasibility of this method.

    A low power Voice Activity Detector for portable applications

    Voice Activity Detectors (VADs) are used to enhance the performance and to reduce the activation rate of speech recognition and keyword spotting applications. The latter aspect is crucial for portable applications because it saves energy and thus increases battery life. Over the last decades, VADs have been realized in hardware to increase their processing speed and to reduce their power consumption. However, a hardware implementation often restricts the choice of features, which limits recognition performance. This paper presents a low-power and low-area serial logistic regression classifier which uses the frame energy, the maximum absolute signal finite difference and the maximum absolute squared signal finite difference over a frame as features. The system has been implemented on an IGLOO nano Field Programmable Gate Array (FPGA), leading to a power consumption of 0.559 mW and offering acceptable performance for use as a preprocessor for speech recognition systems or for a more sophisticated software VAD.
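    The three features named in this abstract can be sketched in a few lines. This is only an illustration, not the authors' implementation: the function names, the weights and the bias are hypothetical, and "squared signal finite difference" is assumed here to mean the finite difference of the squared signal.

```python
import math

def vad_features(frame):
    """Return (frame energy, max |x[n]-x[n-1]|, max |x[n]^2-x[n-1]^2|)."""
    pairs = list(zip(frame, frame[1:]))
    energy = sum(x * x for x in frame)
    max_diff = max(abs(b - a) for a, b in pairs)
    # Assumption: finite difference taken on the squared signal.
    max_sq_diff = max(abs(b * b - a * a) for a, b in pairs)
    return energy, max_diff, max_sq_diff

def logistic_vad(features, weights, bias):
    """Logistic regression score in (0, 1); weights and bias are placeholders."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))
```

    A frame would be flagged as speech when the score exceeds a chosen threshold; the trained coefficients are not given in the abstract.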

    Area and power consumption trade-off for Σ-Δ decimation filter in mixed signal wearable IC

    Area and power consumption are important design metrics for integrated circuits (ICs), in particular for those targeted at wearable devices. Σ-Δ Analog to Digital Converters (ADCs) are increasingly popular in those devices thanks to the low bandwidth of a great number of sensors, which makes it possible to increase converter performance through oversampling and noise shaping techniques. One of the most important parts of the Σ-Δ ADC is the decimation filter, usually implemented as a Cascaded Integrator-Comb (CIC) filter. The various CIC architectures, in particular the recursive and the non-recursive polyphase ones, are well known in the literature. However, on-chip filter performance is strictly related to the actual implementation. The aim of this paper is to evaluate the two architectures, with different values of the characteristic parameters, optimizing the design, in a 180 nm CMOS standard-cell technology, for reduced area occupation or power consumption. Results show that, contrary to theoretical analysis, polyphase implementations are generally more power efficient than the recursive one only in a clock-gated design, despite a higher area occupation. In addition, an estimation of the power consumption is provided using least squares regression.
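    As a rough behavioural sketch of the recursive CIC structure mentioned above (integrators at the input rate, decimation, then combs at the output rate), assuming hypothetical parameter names `rate`, `stages` and `delay`. Python integers are unbounded, whereas a hardware design would rely on fixed-width wraparound arithmetic; the polyphase variant is not shown.

```python
def cic_decimate(samples, rate, stages, delay=1):
    """Recursive CIC decimator: `stages` integrators, keep every `rate`-th
    sample, then `stages` combs with differential delay `delay`."""
    integrators = [0] * stages
    comb_delays = [[0] * delay for _ in range(stages)]
    out = []
    for n, x in enumerate(samples):
        acc = x
        for i in range(stages):          # integrator cascade, input rate
            integrators[i] += acc
            acc = integrators[i]
        if (n + 1) % rate == 0:          # decimation
            y = acc
            for i in range(stages):      # comb cascade, output rate
                delayed = comb_delays[i].pop(0)
                comb_delays[i].append(y)
                y -= delayed
            out.append(y)
    return out
```

    For a constant input the output settles to the DC gain `(rate * delay) ** stages`, which is a quick sanity check on the structure.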

    Design and quantization limits of root raised cosine digital filter

    The Root Raised Cosine digital filter is a widely used pulse-shaping FIR filter in digital baseband communication systems. The design parameters of the filter implementation are strongly bound to the overall performance of the communication system. In this paper, we focus on a design analysis of the filter that takes into account the filter band attenuation, the oversampling symbol interpolation, the roll-off factor, the span truncation and the fixed-point quantization of the coefficients, in order to outline a strategy for the filter implementation and to show the design performance bounds as a function of the design parameters. To verify the design limits of the filter, a MATLAB numerical investigation is presented, showing the main results. Finally, results of the synthesis of a polyphase implementation of the filter on a Xilinx Artix-7 xc7a15t FPGA are presented.
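    The design parameters listed above (roll-off factor, oversampling, span truncation, coefficient quantization) can be illustrated with a small sketch. It uses the standard root raised cosine impulse response with the symbol period normalized to 1; the function names, the peak normalization and the rounding scheme are assumptions of this sketch, not the paper's method.

```python
import math

def rrc_taps(beta, sps, span):
    """RRC taps for roll-off 0 < beta <= 1, `sps` samples per symbol and a
    truncation span of `span` symbols (span * sps assumed even)."""
    half = span * sps // 2
    taps = []
    for k in range(-half, half + 1):
        t = abs(k) / sps                 # the response is even in t
        if t == 0.0:
            h = 1.0 - beta + 4.0 * beta / math.pi
        elif abs(t - 1.0 / (4.0 * beta)) < 1e-12:
            a = math.pi / (4.0 * beta)   # removable singularity
            h = (beta / math.sqrt(2.0)) * ((1.0 + 2.0 / math.pi) * math.sin(a)
                                           + (1.0 - 2.0 / math.pi) * math.cos(a))
        else:
            num = (math.sin(math.pi * t * (1.0 - beta))
                   + 4.0 * beta * t * math.cos(math.pi * t * (1.0 + beta)))
            h = num / (math.pi * t * (1.0 - (4.0 * beta * t) ** 2))
        taps.append(h)
    return taps

def quantize_taps(taps, bits):
    """Scale the peak tap to full scale and round to signed integers."""
    full_scale = 2 ** (bits - 1) - 1
    peak = max(abs(h) for h in taps)
    return [round(h / peak * full_scale) for h in taps]
```

    Comparing the frequency response of the quantized taps with the floating-point ones for several word lengths is one way to explore the kind of bounds the abstract refers to.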

    Towards a deep learning based ASR system for users with dysarthria

    In this paper, we investigate the benefits of deep learning approaches for the development of personalized assistive technology solutions for users with dysarthria, a speech disorder that leads to low speech intelligibility. It prevents these people from using automatic speech recognition (ASR) solutions on computers and mobile devices. To address this issue, we leverage convolutional neural networks to build a speaker-dependent ASR software solution for users with dysarthria, which can be trained according to a particular user's needs and preferences.

    An FPGA-Based Hardware Accelerator for CNNs Using On-Chip Memories Only: Design and Benchmarking with Intel Movidius Neural Compute Stick

    In recent years, convolutional neural networks have been used for different applications, thanks to their ability to carry out tasks with a reduced number of parameters compared with other deep learning approaches. However, the power consumption and memory footprint constraints typical of edge and portable applications usually collide with accuracy and latency requirements. For these reasons, commercial hardware accelerators have become popular, thanks to architectures designed for the inference of general convolutional neural network models. Nevertheless, field-programmable gate arrays are an interesting alternative, since they make it possible to implement a hardware architecture tailored to a specific convolutional neural network model, with promising results in terms of latency and power consumption. In this article, we propose a fully on-chip field-programmable gate array hardware accelerator for a separable convolutional neural network designed for a keyword spotting application. We started from the model implemented in a previous work for the Intel Movidius Neural Compute Stick. We quantized this model through a bit-true simulation and realized a dedicated architecture that exclusively uses on-chip memories. We then benchmarked the results on different field-programmable gate array families by Xilinx and Intel against the implementation on the Neural Compute Stick. The analysis shows that the FPGA solution achieves better inference time and energy per inference with comparable accuracy, at the expense of a higher design effort and development time.
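    To give a feel for why a separable network suits an on-chip-only design, the sketch below compares weight counts for a standard and a depthwise separable convolution layer and checks them against a memory budget. The layer sizes, bit width and budget are purely hypothetical, biases are ignored, and none of the numbers come from the article.

```python
def conv_params(k, c_in, c_out):
    """Weights of a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    """Weights of a depthwise k x k convolution plus a 1 x 1 pointwise one."""
    return k * k * c_in + c_in * c_out

def fits_on_chip(num_params, bits_per_weight, on_chip_bits):
    """True if the quantized weights fit in the given on-chip memory budget."""
    return num_params * bits_per_weight <= on_chip_bits
```

    For example, with `k=3` and 64 input and output channels the separable layer needs 4,672 weights instead of 36,864, so a smaller on-chip memory can hold it at the same word length.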

    Machine learning in assistive technology: A solution for people with dysarthria

    Dysarthric speech processing is a challenge in assistive technology contexts. In this paper, we investigate the use of machine learning, in the form of convolutional neural networks, to implement a speaker-dependent solution that is capable of detecting a small number of predefined keywords. The proposed system has been trained with utterances from Italian users with severe and mild dysarthria, and it is configurable according to specific users' preferences.

    Design Optimization for High Throughput Recursive Systematic Convolutional Encoders

    Recursive Systematic Convolutional (RSC) codes are building blocks of modern communication systems. In this paper we propose a new analytical model to manipulate the modulo-2 algebraic operations, together with a finite state machine model describing the single-cycle RSC architecture, in order to design high-throughput RSC encoders with special emphasis on parallel implementation and with a puncturing scheme embedded in the design. The new design approach is suitable for any RSC code and for almost any degree of parallelism. We also present some case studies of the RSC code architecture and some simulation results for the Bit Error Rate, comparing commonly used RSC codes with different constraints on the length, redesigned with the proposed methodology.
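    For reference, a bit-serial RSC encoder, the one-bit-per-cycle baseline that a parallel design would speed up, can be sketched as a small finite state machine. The choice of the (1, 5/7) octal code with two memory elements is an assumption made for illustration; the abstract does not name a specific code, and the paper's parallel and punctured architecture is not reproduced here.

```python
def rsc_encode(bits):
    """Rate-1/2 RSC encoder with feedback 1 + D + D^2 (octal 7) and
    feedforward 1 + D^2 (octal 5). Returns (systematic, parity) lists."""
    s1 = s2 = 0                      # shift register state, initially zero
    systematic, parity = [], []
    for u in bits:
        a = u ^ s1 ^ s2              # modulo-2 feedback sum
        p = a ^ s2                   # feedforward taps: 1 and D^2
        systematic.append(u)
        parity.append(p)
        s1, s2 = a, s1               # shift the register
    return systematic, parity
```

    Because the code is systematic, the first output stream is the input itself; only the parity stream depends on the encoder state.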