1,721,234 research outputs found
Towards efficient code generation for exposed datapath architectures
Coarse-grained reconfigurable architectures and other exposed datapath architectures such as transport-triggered architectures come with a high energy efficiency promise for accelerating data oriented workloads. Their main drawback results from the push of complexity from the architecture to the programmer; compiler techniques that allow starting from a higher-level programming language and generate code efficiently to such architectures robustly is still an open research area. In this article we survey the known main sources of challenges and outline a generic processor architecture template that covers the most common architecture variations along with a proposal for a common code generation framework for such challenging architectures
Reviewing inference performance of state-of-the-art deep learning frameworks
Deep learning models have replaced conventional methods for machine learning tasks. Efficient inference on edge devices with limited resources is key for broader deployment. In this work, we focus on the tool selection challenge for inference deployment. We present an extensive evaluation of the inference performance of deep learning software tools using state-of-the-art CNN architectures for multiple hardware platforms. We benchmark these hardware-software pairs for a broad range of network architectures, inference batch sizes, and floating-point precision, focusing on latency and throughput. Our results reveal interesting combinations for optimal tool selection, resulting in different optima when considering minimum latency and maximum throughput
Exploiting specification modularity to prune the optimization-space of manufacturing systems
In this paper we address the makespan optimization of industrial-sized manufacturing systems. We introduce a framework which species functional system requirements in a compositional way and automatically computes makespan optimal solutions respecting these requirements. We show the optimization problem to be NP-Hard. To scale towards systems of industrial complexity, we propose a novel approach based on a subclass of compositional requirements which we call constraints. We prove that these constraints always prune the worst-case optimization-space thereby increasing the odds of nding an optimal solution (with respect to the additional constraints). We demonstrate the applicability of the framework on an industrial-sized manufacturing system
CIM-SIM: computation in Memory SIMuIator
Computation-in-memory reverses the trend in von-Neumann processors by bringing the computation closer to the data, to even within the memory array, as opposed to introducing new memory hierarchies to keep (frequently used) data closer to a central processing unit (CPU). In recent years, new non-volatile memory (NVM) technologies, e.g., memristor, PCM, etc., have proven that they can function as memories and perform computations on the stored data as well. In particular, when they are combined with a modest set of (digital) peripheral modules, a wider range of operations can be supported, e.g., vector matrix multiply and Boolean logic. In this paper, we are introducing the CIM-SIM, an open source simulator written in SystemC, which is capable of simulating the functional behaviour of such architectures. The architecture includes the definition of a set of technology-agnostic nano-instructions
Memory and parallelism analysis using a platform-independent approach
Emerging computing architectures such as near-memory computing (NMC) promise improved performance for applications by reducing the data movement between CPU and memory. However, detecting such applications is not a trivial task. In this ongoing work, we extend the state-of-the-art platform-independent software analysis tool with NMC related metrics such as memory entropy, spatial locality, data-level, and basic-block-level parallelism. These metrics help to identify the applications more suitable for NMC architectures
Real-time audio processing for hearing aids using a model-based Bayesian inference framework
Development of hearing aid (HA) signal processing algorithms entails an iterative process between two design steps, namely algorithm development and the embedded implementation. Algorithm designers favor high-level programming languages for several reasons including higher productivity, code readability and, perhaps most importantly, availability of state-of-the-art signal processing frameworks that open new research directions. Embedded software, on the other hand, is preferably implemented using a low-level programming language to allow finer control of the hardware, an essential trait in real-time processing applications. In this paper we present a technique that allows deploying DSP algorithms written in Julia, a modern high-level programming language, on a real-time HA processing platform known as openMHA. We demonstrate this technique by using a model-based Bayesian inference framework to perform real-time audio processing
Data dependent energy modeling for worst case energy consumption analysis
Safely meeting Worst Case Energy Consumption (WCEC) criteria requires accurate energy modeling of software. We investigate the impact of instruction operand values upon energy consumption in cacheless embedded processors. Existing instruction-level energy models typically use measurements from random input data, providing estimates unsuitable for safe WCEC analysis. We examine probabilistic energy distributions of instructions and propose a model for composing instruction sequences using distributions, enabling WCEC analysis on program basic blocks. The worst case is predicted with statistical analysis. Further, we verify that the energy of embedded benchmarks can be characterised as a distribution, and compare our proposed technique with other methods of estimating energy consumption
- …
