1,720,974 research outputs found

    Free Bits: Latency Optimization of Mixed-Precision Quantized Neural Networks on the Edge

    Full text link
    Mixed-precision quantization, where a deep neural network’s layers are quantized to different precisions, offers the opportunity to optimize the trade-offs between model size, latency, and statistical accuracy beyond what can be achieved with homogeneous-bit-width quantization. To navigate the in- tractable search space of mixed-precision configurations for a given network, this paper proposes a hybrid search methodology. It consists of a hardware-agnostic differentiable search algorithm followed by a hardware-aware heuristic optimization to find mixed-precision configurations latency-optimized for a specific hardware target. We evaluate our algorithm on MobileNetV1 and MobileNetV2 and deploy the resulting networks on a family of multi-core RISC-V microcontroller platforms with different hardware characteristics. We achieve up to 28.6 % reduction of end-to-end latency compared to an 8-bit model at a negligible accuracy drop from a full-precision baseline on the 1000-class ImageNet dataset. We demonstrate speedups relative to an 8-bit baseline, even on systems with no hardware support for sub-byte arithmetic at negligible accuracy drop. Furthermore, we show the superiority of our approach with respect to differentiable search targeting reduced binary operation counts as a proxy for latency

    J/inference end-to-end gesture recognition from dynamic vision sensor data using ternarized hybrid convolutional neural networks

    Full text link
    Dynamic vision sensor (DVS) cameras enable energy-activity proportional visual sensing by only propagating events produced by changes in the observed scene. Furthermore, by generating these events asynchronously, they offer s-scale latency while eliminating the redundant data transmission inherent to classical, frame-based cameras. However, the potential of DVS to improve the energy efficiency of IoT sensor nodes can only be fully realized with efficient and flexible systems that tightly integrate sensing, processing, and actuation capabilities. In this paper, we propose a complete end-to-end pipeline for DVS event data classification implemented on the Kraken parallel ultra-low power (PULP) system-on-chip and apply it to gesture recognition. A dedicated on-chip peripheral interface for DVS cameras aggregates the received events into ternary event frames. We process these video frames with a fully ternarized two-stage temporal convolutional network (TCN). The neural network can be executed either on Kraken’s PULP cluster of general-purpose RISC-V cores or on CUTIE, the on-chip ternary neural network accelerator. We perform extensive ablations on network structure, training, and data generation parameters. We achieve a validation accuracy of 97.7 % on the DVS128 11-class gesture dataset, a new record for embedded implementations. With in-silicon power and energy measurements, we demonstrate a classification energy of 7 J at a latency of 0.9 ms when running the TCN on CUTIE, a reduction of inference energy by when compared to the state of the art in embedded gesture recognition. The processing system consumes as little as 4.7 mW in continuous inference, enabling always-on gesture recognition and closing the gap between the efficiency potential of DVS cameras and application scenarios

    ColibriUAV: An Ultra-Fast, Energy-Efficient Neuromorphic Edge Processing UAV-Platform with Event-Based and Frame-Based Cameras

    No full text
    The interest in dynamic vision sensor (DVS)powered unmanned aerial vehicles (UAV) is raising, especially due to the microsecond-level reaction time of the bio-inspired event sensor, which increases robustness and reduces latency of the perception tasks compared to a RGB camera. This work presents ColibriUAV, a UAV platform with both frame-based and event-based cameras interfaces for efficient perception and near-sensor processing. The proposed platform is designed around Kraken, a novel low-power RISC-V System on Chip with two hardware accelerators targeting spiking neural networks and deep ternary neural networks.Kraken is capable of efficiently processing both event data from a DVS camera and frame data from an RGB camera. A key feature of Kraken is its integrated, dedicated interface with a DVS camera. This paper benchmarks the end-to-end latency and power efficiency of the neuromorphic and event-based UAV subsystem, demonstrating state-of-the-art event data with a throughput of 7200 frames of events per second and a power consumption of 10.7 mW, which is over 6.6 times faster and a hundred times less power-consuming than the widely-used data reading approach through the USB interface. The overall sensing and processing power consumption is below 50 mW, achieving latency in the milliseconds range, making the platform suitable for low-latency autonomous nano-drones as well

    ColibriES: A Milliwatts RISC-V Based Embedded System Leveraging Neuromorphic and Neural Networks Hardware Accelerators for Low-Latency Closed-loop Control Applications

    Full text link
    End-to-end event-based computation has the potential to push the envelope in latency and energy efficiency for edge AI applications. Unfortunately, event-based sensors (e.g., DVS cameras) and neuromorphic spike-based processors (e.g., Loihi) have been designed in a decoupled fashion, thereby missing major streamlining opportunities. This paper presents ColibriES, the first-ever neuromorphic hardware embedded system platform with dedicated event-sensor interfaces and full processing pipelines. ColibriES includes event and frame interfaces and data processing, aiming at efficient and long-life embedded systems in edge scenarios. ColibriES is based on the Kraken system-on-chip and contains a heterogeneous parallel ultra-low power (PULP) processor, frame-based and event-based camera interfaces, and two hardware accelerators for the computation of both event-based spiking neural networks and frame-based ternary convolutional neural networks. This paper explores and accurately evaluates the performance of event data processing on the example of gesture recognition on ColibriES, as the first step of full-system evaluation. In our experiments, we demonstrate a chip energy consumption of 7.7 \si{\milli\joule} and latency of 164.5 \si{\milli\second} of each inference with the DVS Gesture event data set as an example for closed-loop data processing, showcasing the potential of ColibriES for battery-powered applications such as wearable devices and UAVs that require low-latency closed-loop control

    TCN-CUTIE: A 1,036-TOp/s/W, 2.72-μJ/Inference, 12.2-mW All-Digital Ternary Accelerator in 22-nm FDX Technology

    No full text
    Tiny machine learning (TinyML) applications impose μJ/inference constraints, with a maximum power consumption of tens of megawatt. It is extremely challenging to meet these requirements at a reasonable accuracy level. This work addresses the challenge with a flexible, fully digital ternary neural network (TNN) accelerator in a reduced instruction set computer-five (RISC-V)-based System-on-Chip (SoC). Besides supporting ternary convolutional neural networks, we introduce extensions to the accelerator design that enable the processing of time-dilated temporal convolutional neural networks (TCNs). The design achieves 5.5-μJ/inference, 12.2 mW, 8,000 inferences/s at 0.5 V for a dynamic vision sensor (DVS)-based TCN and an accuracy of 94.5%, and 2.72-μJ/inference, 12.2 mW, 3,200 inferences/s at 0.5 V for a nontrivial 9-layer, 96 channels-per-layer convolutional network with CIFAR-10 accuracy of 86%. The peak energy efficiency is 1,036 TOp/s/W, outperforming the state-of-the-art silicon-proven TinyML quantized accelerators by 1.67× while achieving competitive accuracy

    Going Beyond Counting First Authors in Author Co-citation Analysis

    Full text link
    The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed

    Variations on the Author

    Full text link
    “Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer that rather than express opinions propels discourses; an interviewer that is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent form the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship

    Appropriate Similarity Measures for Author Cocitation Analysis

    Full text link
    We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.information science;Pearson correlation;cosine;similarity measure;author cocitation analysis
    corecore