Fast and Accurate Inference on Microcontrollers with Boosted Cooperative Convolutional Neural Networks (BC-Net)
Arithmetic precision scaling is mandatory to deploy Convolutional Neural Networks (CNNs) on resource-constrained devices such as microcontrollers (MCUs), and fixed-point quantization and binarization are the most widely adopted techniques today. Although both stem from the same concept of bit-width lowering, the two strategies differ substantially from each other and hence are often conceived and implemented separately. However, their joint integration is feasible and, if properly implemented, can bring large savings and high processing efficiency. This work elaborates on this aspect by introducing a boosted collaborative mechanism that pushes CNNs towards higher performance and greater predictive capability. Referred to as BC-Net, the proposed solution consists of a self-adaptive conditional scheme in which a lightweight binary net and an 8-bit quantized net are trained to cooperate dynamically. Experiments conducted on four different CNN benchmarks deployed on off-the-shelf boards powered by MCUs of the ARM Cortex-M family show that BC-Nets outperform classical quantization and binarization applied as separate techniques (up to 81.49% speed-up and up to 3.8% accuracy improvement). The comparative analysis with a previously proposed cooperative method also demonstrates that BC-Nets achieve substantial gains in both performance (+19%) and accuracy (+3.45%).
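The abstract does not spell out how the two nets cooperate, so the following is only a minimal sketch of one plausible conditional scheme: the cheap binary net answers first, and the 8-bit net is invoked only when the binary net looks unsure. The function name, the top-2 score margin, and the `margin_threshold` parameter are assumptions made for illustration, not details taken from the paper.

```python
from typing import Callable, Sequence, Tuple

Scores = Sequence[float]


def cooperative_predict(
    x: object,
    binary_net: Callable[[object], Scores],
    int8_net: Callable[[object], Scores],
    margin_threshold: float = 0.2,  # hypothetical confidence threshold
) -> Tuple[int, str]:
    """Return (predicted class, name of the net that produced it)."""
    scores = binary_net(x)
    ranked = sorted(scores, reverse=True)
    # Assumed confidence measure: gap between the two highest class scores.
    margin = ranked[0] - ranked[1]
    if margin >= margin_threshold:
        return max(range(len(scores)), key=scores.__getitem__), "binary"
    # Low confidence: fall back to the slower, more accurate 8-bit net.
    scores = int8_net(x)
    return max(range(len(scores)), key=scores.__getitem__), "int8"
```

With this kind of gating, average latency depends on how often the fallback path is taken, which is why the threshold would have to be tuned per benchmark.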
Optimization Tools for ConvNets on the Edge
The shift of Convolutional Neural Networks (ConvNets) into low-power devices with limited compute and memory resources calls for cross-layer strategies spanning from hardware to software optimization. This work answers this need by presenting a collection of tools for the efficient deployment of ConvNets on the edge.
On The Efficiency of Sparse-Tiled Tensor Graph Processing For Low Memory Usage
The memory space taken to host and process large tensor graphs is a limiting factor for embedded ConvNets. Even though many data-driven compression pipelines have proven their efficacy, this work shows there is still room for optimization at the intersection with compute-oriented optimizations. We demonstrate that tensor pruning via weight sparsification can cooperate with a model-agnostic tiling strategy, leading ConvNets towards a new feasible region of the solution space. The collected results show, for the first time, fast versions of MobileNets deployed at full scale on an ARM M7 core with 512 KB of RAM and 2 MB of FLASH memory.
Inferential Logic: A Machine Learning Inspired Paradigm for Combinational Circuits
Machine learning (ML) theories and tools suggest alternative ways to conceive and represent relationships among data. The same theories find their application in the Boolean domain, where logic functions can be described as inference rules. This paper introduces Inferential Logic, a novel paradigm that leverages the ML concept of statistical inference for the design of combinational logic circuits, the Inferential Logic Circuits (ILCs). This new design concept is conceived for low-power circuits that run quasi-exact computation in error-resilient applications, but it also provides an exact run-mode that can be dynamically enabled when accuracy scaling is not an option.
Energy-efficient and Privacy-aware Social Distance Monitoring with Low-resolution Infrared Sensors and Adaptive Inference
Low-resolution infrared (IR) sensors combined with machine learning (ML) can be leveraged to implement privacy-preserving social distance monitoring solutions in indoor spaces. However, the need to execute these applications on Internet of Things (IoT) edge nodes makes energy consumption critical. In this work, we propose an energy-efficient adaptive inference solution consisting of the cascade of a simple wake-up trigger and an 8-bit quantized Convolutional Neural Network (CNN), which is only invoked for difficult-to-classify frames. Deploying such an adaptive system on an IoT microcontroller, we show that, when processing the output of an 8×8 low-resolution IR sensor, we are able to reduce the energy consumption by 37-57% with respect to a static CNN-based approach, with an accuracy drop of less than 2% (83% balanced accuracy).
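The energy argument behind such a cascade can be made concrete with a small calculation. The sketch below assumes a simple model in which the trigger runs on every frame and the CNN runs only on a fraction of them; the function name and all numeric values are hypothetical and are not the measurements reported in the abstract.

```python
def cascade_energy_saving(e_trigger: float, e_cnn: float, invoke_rate: float) -> float:
    """Fractional energy saving of a trigger+CNN cascade versus always running the CNN.

    e_trigger:   energy per frame of the wake-up trigger (assumed to run on every frame)
    e_cnn:       energy per frame of the CNN
    invoke_rate: fraction of frames forwarded to the CNN, in [0, 1]
    """
    if not 0.0 <= invoke_rate <= 1.0:
        raise ValueError("invoke_rate must lie in [0, 1]")
    # Average per-frame energy of the cascade under the assumed model.
    e_avg = e_trigger + invoke_rate * e_cnn
    return 1.0 - e_avg / e_cnn


# Hypothetical numbers: trigger costs 5 units, CNN costs 100 units,
# and half of the frames are hard enough to wake the CNN.
saving = cascade_energy_saving(e_trigger=5.0, e_cnn=100.0, invoke_rate=0.5)
```

Under this model the saving is bounded by the share of easy frames, and the trigger's own cost is paid on every frame, so a trigger that is too expensive or too eager erodes the benefit.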
Efficacy of Topology Scaling for Temperature and Latency Constrained Embedded ConvNets
Embedded Convolutional Neural Networks (ConvNets) are driving the evolution of ubiquitous systems that can sense and understand the environment autonomously. Due to their high complexity, aggressive compression is needed to meet the specifications of portable end-nodes. A variety of algorithmic optimizations are available today, from custom quantization and filter pruning to modular topology scaling, which enable fine-tuning of the hyperparameters and the right balance between quality, performance and resource usage. Nonetheless, the implementation of systems capable of sustaining continuous inference over a long period is still a primary source of concern, since the limited thermal design power of general-purpose embedded CPUs prevents execution at maximum speed. Neglecting this aspect may result in substantial mismatches and the violation of the design constraints. The objective of this work was to assess topology scaling as a design knob to control the performance and the thermal stability of inference engines for image classification. To this end, we built a characterization framework to inspect both the functional (accuracy) and non-functional (latency and temperature) metrics of two ConvNet models, MobileNet and MnasNet, ported onto a commercial low-power CPU, the ARM Cortex-A15. Our investigation reveals that different latency constraints can be met even under continuous inference, yet with a severe accuracy penalty forced by thermal constraints. Moreover, we empirically demonstrate that thermal behavior does not benefit from topology scaling, as the on-chip temperature still reaches critical values affecting reliability and user satisfaction.
Dataflow Restructuring for Active Memory Reduction in Deep Neural Networks
The volume reduction of the activation maps produced by the hidden layers of a Deep Neural Network (DNN) is a critical aspect in modern applications, as it affects the on-chip memory utilization, the most limited and costly hardware resource. Despite the availability of many compression methods that leverage the statistical nature of deep learning to approximate and simplify the inference model, e.g., quantization and pruning, there is room for deterministic optimizations that instead tackle the problem from a computational view. This work belongs to this latter category, as it introduces a novel method for minimizing the active memory footprint. The proposed technique, which is data-, model-, compiler-, and hardware-agnostic, implements a function-preserving, automated graph restructuring in which the memory peaks are suppressed and distributed over time, leading to flatter profiles with less memory pressure. Results collected on a representative class of Convolutional DNNs with different topologies, from VGG16 and SqueezeNetV1.1 to the recent MobileNetV2, ResNet18, and InceptionV3, provide clear evidence of applicability, showing remarkable memory savings (62.9% on average) with low computational overhead (8.6% on average).
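To make the notion of an active memory footprint concrete, the sketch below computes the peak amount of live activation memory for a given execution order of a small graph. It illustrates only the quantity being minimized, not the restructuring method itself, which the abstract does not detail. The graph, the tensor sizes, and the liveness model (a tensor is freed right after its last consumer runs) are assumptions.

```python
from typing import Dict, List, Set, Tuple

# Each op: (output tensor name, output size in bytes, names of the tensors it reads).
Op = Tuple[str, int, List[str]]


def peak_active_memory(schedule: List[Op]) -> int:
    """Peak sum of live tensor sizes for ops executed in the given (topological) order."""
    # Record the last step at which each tensor is needed.
    last_use: Dict[str, int] = {}
    for step, (name, _, inputs) in enumerate(schedule):
        last_use.setdefault(name, step)  # unread tensors die right after being produced
        for t in inputs:
            last_use[t] = step
    sizes = {name: size for name, size, _ in schedule}

    live, peak = 0, 0
    alive: Set[str] = set()
    for step, (name, size, _) in enumerate(schedule):
        # The output is allocated while the inputs are still held.
        alive.add(name)
        live += size
        peak = max(peak, live)
        # Free every tensor whose last consumer has now run.
        for t in list(alive):
            if last_use[t] <= step:
                alive.discard(t)
                live -= sizes[t]
    return peak


# Hypothetical two-branch graph: the same ops in two different orders.
breadth_first = [
    ("in", 100, []), ("a", 400, ["in"]), ("b", 400, ["in"]),
    ("a2", 50, ["a"]), ("b2", 50, ["b"]), ("out", 50, ["a2", "b2"]),
]
depth_first = [
    ("in", 100, []), ("a", 400, ["in"]), ("a2", 50, ["a"]),
    ("b", 400, ["in"]), ("b2", 50, ["b"]), ("out", 50, ["a2", "b2"]),
]
```

In this toy case the breadth-first order keeps both large intermediate tensors alive at once, while the depth-first order releases one before producing the other, which flattens the memory profile without changing the computed function.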
CoopNet: Cooperative Convolutional Neural Network for Low-Power MCUs
Fixed-point quantization and binarization are two reduction methods adopted to deploy Convolutional Neural Networks (CNNs) on end-nodes powered by low-power micro-controller units (MCUs). While most existing works use them as stand-alone optimizations, this work aims to demonstrate that there is room for joint cooperation that leads to inference engines with lower latency and higher accuracy. Called CoopNet, the proposed heterogeneous model is conceived, implemented and tested on off-the-shelf MCUs with small on-chip memory and few computational resources. Experiments conducted on three different CNNs, using the low-power RISC core of the ARM Cortex-M family as a test bench, validate the CoopNet proposal by showing substantial improvements w.r.t. designs where quantization and binarization are applied separately.
Energy-Driven Precision Scaling for Fixed-Point ConvNets
Data precision scaling is a well-known technique for power/energy minimization in error-resilient applications. It has proven particularly suited for embedded Convolutional Neural Networks (ConvNets) that run on fixed-point arithmetic coprocessors. The key observation is that methods that only account for accuracy during the precision assignment process may lead to sub-optimal energy minimization. This work introduces an energy-driven optimization that delivers per-layer quantization under a user-defined accuracy constraint. The tool is conceived for accelerators that dynamically adapt their energy and accuracy through software-programmable multiprecision Multiply&Accumulate (MAC) units. Simulation results collected on different ConvNets trained with public datasets show substantial energy savings and improved energy-accuracy tradeoffs w.r.t. conventional fixed-point methods.
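One way to picture an energy-driven, per-layer precision assignment is a greedy search that repeatedly lowers the bit-width of whichever layer saves the most energy while the accuracy constraint still holds. The abstract does not describe the actual algorithm, so the greedy strategy, the per-MAC energy table, and the `evaluate` callback below are all assumptions used purely for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple


def assign_precisions(
    macs: List[int],                        # MAC count of each layer
    energy_per_mac: Dict[int, float],       # hypothetical energy cost per MAC, keyed by bit-width
    evaluate: Callable[[List[int]], float], # returns accuracy for a per-layer bit-width list
    min_accuracy: float,                    # user-defined accuracy constraint
) -> List[int]:
    """Greedy sketch: start at the widest precision and step layers down one width at a time."""
    widths = sorted(energy_per_mac, reverse=True)  # e.g. [16, 8, 4]
    bits = [widths[0]] * len(macs)
    while True:
        best: Optional[Tuple[float, int, int]] = None  # (energy saved, layer, new width)
        for i, b in enumerate(bits):
            pos = widths.index(b)
            if pos + 1 == len(widths):
                continue  # layer is already at the narrowest width
            nb = widths[pos + 1]
            trial = bits[:i] + [nb] + bits[i + 1:]
            if evaluate(trial) < min_accuracy:
                continue  # this step would violate the accuracy constraint
            saved = macs[i] * (energy_per_mac[b] - energy_per_mac[nb])
            if best is None or saved > best[0]:
                best = (saved, i, nb)
        if best is None:
            return bits  # no admissible step is left
        bits[best[1]] = best[2]
```

Ranking candidate steps by energy saved rather than by accuracy lost is what makes the search energy-driven: a large, insensitive layer is narrowed before a small one, even if both cost the same accuracy.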
Axp: A HW-SW Co-Design Pipeline for Energy-Efficient Approximated ConvNets via Associative Matching
The reduction in energy consumption is key for deep neural networks (DNNs) to ensure usability and reliability, whether they are deployed on low-power end-nodes with limited resources or high-performance platforms that serve large pools of users. By leveraging the over-parametrization shown by many DNN models, convolutional neural networks (ConvNets) in particular, energy efficiency can be improved substantially while preserving the model accuracy. The solution proposed in this work exploits the intrinsic redundancy of ConvNets to maximize the reuse of partial arithmetic results during the inference stages. Specifically, the weight-set of a given ConvNet is discretized through a clustering procedure such that the largest possible number of inner multiplications fall into predefined bins; this allows an off-line computation of the most frequent results, which in turn can be stored locally and retrieved when needed during the forward pass. Such a reuse mechanism leads to remarkable energy savings with the aid of a custom processing element (PE) that integrates an associative memory with a standard floating-point unit (FPU). Moreover, the adoption of an approximate associative rule based on a partial bit-match increases the hit rate over the pre-computed results, maximizing the energy reduction even further. Results collected on a set of ConvNets trained for computer vision and speech processing tasks reveal that the proposed associative-based hw-sw co-design achieves up to 77% in energy savings with less than 1% in accuracy loss.
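The reuse idea can be mimicked in software with a lookup table of precomputed products. The sketch below snaps each weight to its nearest centroid, looks up the product when the activation matches a stored level, and falls back to an ordinary multiplication otherwise. It is a rough software analogy under stated assumptions: the centroids and activation levels are given rather than learned by clustering, matching is exact rather than a partial bit-match, and all function names are hypothetical.

```python
from typing import Dict, List, Tuple


def nearest(value: float, centroids: List[float]) -> int:
    """Index of the centroid closest to value."""
    return min(range(len(centroids)), key=lambda i: abs(centroids[i] - value))


def build_table(centroids: List[float], act_levels: List[float]) -> Dict[Tuple[int, int], float]:
    """Precompute, off-line, every centroid-times-level product."""
    return {
        (wi, ai): w * a
        for wi, w in enumerate(centroids)
        for ai, a in enumerate(act_levels)
    }


def dot_with_reuse(
    weights: List[float],
    acts: List[float],
    centroids: List[float],
    act_levels: List[float],
    table: Dict[Tuple[int, int], float],
) -> Tuple[float, int]:
    """Approximate dot product; returns (result, number of table hits)."""
    level_index = {a: i for i, a in enumerate(act_levels)}
    total, hits = 0.0, 0
    for w, a in zip(weights, acts):
        wi = nearest(w, centroids)  # discretize the weight to its cluster centroid
        ai = level_index.get(a)
        if ai is not None:
            total += table[(wi, ai)]    # hit: reuse the stored product
            hits += 1
        else:
            total += centroids[wi] * a  # miss: compute with the regular multiplier
    return total, hits


# Hypothetical values chosen so that three of the four products hit the table.
centroids = [-0.5, 0.0, 0.5]
act_levels = [0.0, 1.0, 2.0]
table = build_table(centroids, act_levels)
result, hits = dot_with_reuse([0.45, -0.52, 0.03, 0.6], [1.0, 2.0, 1.0, 1.5],
                              centroids, act_levels, table)
```

The result is an approximation of the exact dot product because weights are replaced by their centroids; the hit count is the fraction of multiplications that a hardware associative memory could, in principle, serve without activating the multiplier.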
