    An End-to-End Embedded Neural Architecture Search and Model Compression Framework for Healthcare Applications and Use-Cases

    Deep learning has had a major impact in a wide range of research domains across the world, including healthcare and medicine. From aiding radiologists as clinical assistants to analyzing electronic health records, deep learning models have proved beneficial in identifying health abnormalities and aiding diagnostics. This chapter discusses a framework that can be used to explore the design space of embedded neural network models for healthcare applications and use-cases, given the user quality requirements, such as accuracy or precision, and the hardware constraints of the target execution platform. The models explored by the framework reduce the hardware overhead of the network by a factor of 53× while incurring a quality loss of <0.2% compared to the state of the art.

    Embedded Neuromorphic Using Intel’s Loihi Processor

    Recently, spiking neural networks (SNNs) have demonstrated great success due to their high performance and low energy consumption, which makes them suitable for implementation on embedded devices, such as neuromorphic chips. This chapter presents an overview of event-based SNNs on neuromorphic hardware and their applications. It provides an outlook on neuromorphic computing platforms, with a special focus on the Intel Loihi research chip. Afterward, a case study on a “car vs. background” classifier implemented on Loihi is discussed in detail.

    A Design Methodology for Energy-Efficient Embedded Spiking Neural Networks

    Spiking Neural Networks (SNNs) bear the potential for achieving high accuracy in unsupervised learning settings with ultra-low energy consumption due to their bio-plausible sparse computations. The unsupervised learning capabilities enable SNNs to learn efficiently from unlabeled data, which is desirable for real-world applications, as gathering unlabeled data is cheaper than gathering labeled data. These advantages make SNNs suitable for solving Machine Learning (ML) tasks on resource- and energy-constrained embedded platforms. However, state-of-the-art SNN models require large memory and high energy consumption to achieve high accuracy, thereby making it challenging to employ SNNs on embedded platforms. In this chapter, we discuss our design methodology to improve the energy efficiency of SNNs for enabling their embedded implementations, while maintaining accuracy in unsupervised learning settings and meeting the memory and energy constraints. The key ideas of our design methodology are reducing the neuron operations, improving the learning quality, quantizing the network parameters, and employing approximate DRAM while considering the memory and energy budgets.

    Adversarial ML for DNNs, CapsNets, and SNNs at the Edge

    Recent studies have shown that Machine Learning (ML) algorithms suffer from several vulnerabilities. Among them, adversarial attacks represent one of the most critical issues. This chapter provides an overview of the ML vulnerability challenges, with a focus on the security threats for Deep Neural Networks, Capsule Networks, and Spiking Neural Networks. Moreover, it discusses the current trends and outlooks on the methodologies for enhancing the robustness of ML models.

    An Off-Chip Memory Access Optimization for Embedded Deep Learning Systems

    Implementations of Deep Neural Networks (DNNs) or Deep Learning (DL) for embedded applications may improve the users’ quality of life, as DL has become a prominent solution for many machine learning (ML) tasks, like personalized healthcare assistance. Such implementations require high energy efficiency since embedded applications usually have tight operational constraints, such as small memory and low operational power/energy. Therefore, specialized hardware accelerators are typically employed to expedite the DL inference. However, previous works have shown that DL accelerators still suffer from high energy consumption from the DRAM-based off-chip memory accesses, thereby hindering the embedded DL implementations. In this chapter, we discuss our design methodology for optimizing the energy consumption of DRAM accesses for DL accelerators targeting embedded applications. Our design methodology employs an exploration technique to find the data partitioning and scheduling that offer minimum DRAM accesses for the given DNN model, and exploits low-latency DRAMs to efficiently perform data accesses that incur minimum DRAM access energy.

    Considering the Impact of Noise on Machine Learning Accuracy

    Modern-day smart cyber-physical systems (CPS) and Internet of Things (IoT) systems, including those deployed in critical devices such as wearables, often use embedded machine learning (ML). Owing to the consistent improvement in the overall performance of artificial neural networks (ANNs), the reliance of these systems on ANNs as an integral component has risen steadily. However, ANNs are known to be considerably vulnerable to noise. Because noise is ubiquitous in real-world environments, this vulnerability jeopardizes the accuracy of embedded ML-based systems. This calls for analyzing the impacts of noise on ANNs prior to their deployment in real-world ML-based systems, to ensure acceptable ML accuracy. This chapter deals with the issue of analyzing the impacts of noise on trained ANNs. Multiple approaches for studying the impacts and possible noise models are discussed. Various impacts of noise on trained ANNs, along with their formalization, are elaborated. The chapter also provides a suitable framework for analyzing the impacts of noise. To demonstrate quantitatively the impact of noise on an ANN trained on real-world data, the framework is then used for the analysis of a binary classifier trained on genetic attributes of Leukemia patients.

    Massively Parallel Neural Processing Array (MPNA): A CNN Accelerator for Embedded Systems

    Many hardware accelerators for Convolutional Neural Networks (CNNs) focus on accelerating only the convolutional layers but do not prioritize accelerating the fully connected layers. Therefore, they lack a synergistic optimization of the hardware architecture and various dataflows for the complete CNN model, which hinders the accelerators from achieving higher performance/energy efficiency. Such problems are more challenging when the CNN acceleration is performed for resource- and energy-constrained embedded systems. Toward this, we propose a novel Massively Parallel Neural Processing Array (MPNA) accelerator that integrates two heterogeneous systolic arrays and highly optimized dataflows to expedite both the convolutional and fully connected layers. Our optimized dataflows fully exploit the available off-chip memory bandwidth and data reuse of all data types (i.e., weights, input and output activations), thereby enabling our MPNA to operate under low power, while achieving high performance and energy efficiency. We synthesize our MPNA accelerator using the ASIC design flow for a 28-nm technology and perform functional and timing validation using real-world CNNs. Our MPNA achieves 149.7 GOPS/W at 280 MHz and consumes 239 mW. The experimental results show that our MPNA accelerator provides up to 2× performance improvement and 51% energy saving compared to the baseline accelerator, thereby making our MPNA suitable for embedded systems.

    AccelAT: A Framework for Accelerating the Adversarial Training of Deep Neural Networks through Accuracy Gradient

    Adversarial training is used to develop Deep Neural Network (DNN) models that are robust against maliciously altered data. Adversarial attacks may have catastrophic effects on DNN models, yet the alterations are indistinguishable to a human being. For example, an external attack can modify an image by adding noise invisible to the human eye, yet the DNN model misclassifies the image. A key objective for developing robust DNN models is to use a learning algorithm that is fast but also yields a model that is robust against different types of adversarial attacks. Adversarial training in particular needs enormously long training times to obtain high accuracy under many different types of adversarial samples generated using different adversarial attack techniques. This paper aims at accelerating the adversarial training to enable fast development of robust DNN models against adversarial attacks. The general method for improving the training performance is hyperparameter fine-tuning, where the learning rate is one of the most crucial hyperparameters. By modifying its shape (the value over time) and value during the training, we can obtain a model robust to adversarial attacks faster than standard training. First, we conduct experiments on two different datasets (CIFAR10, CIFAR100), exploring various techniques. Then, this analysis is leveraged to develop a novel fast training methodology, AccelAT, which automatically adjusts the learning rate for different epochs based on the accuracy gradient. The experiments show comparable results with the related works, and in several experiments, the adversarial training of DNNs using our AccelAT framework is conducted up to 2× faster than the existing techniques. Thus, our findings boost the speed of adversarial training in an era in which security and performance are fundamental optimization objectives in DNN-based applications.
To facilitate reproducible research, the AccelAT framework is open source: https://github.com/Nikfam/AccelAT
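
The idea of adjusting the learning rate from the accuracy gradient can be illustrated with a minimal sketch. This is not the AccelAT algorithm itself, whose exact rule is not given in the abstract: the function name `adjust_learning_rate`, the `threshold`, `decay`, and `min_lr` parameters, and the "halve the rate when accuracy stalls" rule are all assumptions made for illustration.

```python
def adjust_learning_rate(lr, acc_history, threshold=0.005, decay=0.5, min_lr=1e-5):
    """Return the learning rate to use for the next epoch.

    Hypothetical rule (an assumption, not taken from the paper): the
    "accuracy gradient" is the change in accuracy between the two most
    recent epochs. If it falls below `threshold`, the rate is multiplied
    by `decay`, but never drops below `min_lr`.
    """
    if len(acc_history) < 2:
        # Not enough epochs yet to estimate a gradient.
        return lr
    acc_gradient = acc_history[-1] - acc_history[-2]
    if acc_gradient < threshold:
        return max(lr * decay, min_lr)
    return lr
```

In a training loop, one would append the validation accuracy (on clean or adversarial samples) to `acc_history` after each epoch and call the function before starting the next epoch.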

    ISMatch: A real-time hardware accelerator for inexact string matching of DNA sequences on FPGA

    Since DNA strings are subject to variations caused by mutation, noisy sampling, and transmission, inexact string matching (ISM) of DNA sequences is preferred over searching for an exact match. Due to the large amount of data and massive data dependency, the ISM algorithm is not well suited to implementation on general-purpose hardware. To address this, we propose ISMatch, a novel specialized hardware architecture for computing the ISM in a fast and energy-efficient way. Our implementation on a Xilinx Ultrascale+ FPGA reduces the clock cycles by up to 70× and 2.2× compared to the ARM-based and the HLS implementations, respectively.
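
For readers unfamiliar with inexact string matching, the following software sketch shows one common formulation, the k-mismatch search, which reports every position where a pattern matches the text with at most `k` substituted characters. The abstract does not state which ISM formulation ISMatch implements (for example, it may also allow insertions and deletions), so the function name `k_mismatch_search` and the substitution-only distance are assumptions for illustration.

```python
def k_mismatch_search(text, pattern, k):
    """Return the start indices where `pattern` matches `text`
    with at most `k` mismatching characters (substitutions only)."""
    hits = []
    m = len(pattern)
    for start in range(len(text) - m + 1):
        mismatches = 0
        for a, b in zip(text[start:start + m], pattern):
            if a != b:
                mismatches += 1
                if mismatches > k:
                    break  # too many mismatches, abandon this window
        else:
            # Inner loop finished without exceeding k mismatches.
            hits.append(start)
    return hits
```

Every window is compared independently of the others, which is one reason this kind of search is attractive to parallelize in hardware.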

    LaneSNNs: Spiking Neural Networks for Lane Detection on the Loihi Neuromorphic Processor

    Autonomous Driving (AD) related features represent important elements for the next generation of mobile robots and autonomous vehicles focused on increasingly intelligent, autonomous, and interconnected systems. The applications involving the use of these features must provide, by definition, real-time decisions, and this property is key to avoiding catastrophic accidents. Moreover, all the decision processes must consume little power, to increase the lifetime and autonomy of battery-driven systems. These challenges can be addressed through efficient implementations of Spiking Neural Networks (SNNs) on Neuromorphic Chips and the use of event-based cameras instead of traditional frame-based cameras. In this paper, we present a new SNN-based approach, called LaneSNN, for detecting the lanes marked on the streets using the event-based camera input. We develop four novel SNN models characterized by low complexity and fast response, and train them using an offline supervised learning rule. Afterward, we implement and map the learned SNN models onto the Intel Loihi Neuromorphic Research Chip. For the loss function, we develop a novel method based on the linear composition of Weighted binary Cross Entropy (WCE) and Mean Squared Error (MSE) measures. Our experimental results show a maximum Intersection over Union (IoU) measure of about 0.62 and very low power consumption of about 1 W. The best IoU is achieved with an SNN implementation that occupies only 36 neurocores on the Loihi processor while providing a low latency of less than 8 ms to recognize an image, thereby enabling real-time performance. The IoU measures provided by our networks are comparable with the state of the art, but at a much lower power consumption of 1 W.
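
A linear composition of WCE and MSE, together with the IoU metric, can be sketched as follows. The mixing coefficient `alpha`, the positive-class weight `pos_weight`, and all function names are assumptions for illustration; the abstract does not give the coefficients or the exact weighting scheme used in LaneSNN.

```python
import math

def wce(pred, target, pos_weight=2.0, eps=1e-7):
    """Weighted binary cross entropy; `pos_weight` (assumed value)
    scales the penalty on lane pixels (target == 1)."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= pos_weight * t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(pred)

def mse(pred, target):
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def combined_loss(pred, target, alpha=0.5, pos_weight=2.0):
    """Linear composition alpha * WCE + (1 - alpha) * MSE (alpha is assumed)."""
    return alpha * wce(pred, target, pos_weight) + (1 - alpha) * mse(pred, target)

def iou(pred_mask, target_mask):
    """Intersection over Union of two flattened binary masks."""
    inter = sum(1 for p, t in zip(pred_mask, target_mask) if p and t)
    union = sum(1 for p, t in zip(pred_mask, target_mask) if p or t)
    return inter / union if union else 1.0
```

Here `pred` holds per-pixel probabilities in [0, 1] and `target` holds binary labels, both flattened to one-dimensional sequences; the IoU is computed on thresholded (binary) predictions.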