    On-Chip Transparent Wire Pipelining (invited paper)

    Wire pipelining has been proposed as a viable means of bridging the gap between decreasing gate delays and increasing wire delays in deep-submicron technologies. Far from being a straightforwardly applicable technique, this methodology requires a number of design modifications before it can be integrated seamlessly into the current design flow. In this paper, we briefly survey the methods presented by other researchers in the field and then thoroughly analyze the solutions we recently proposed, ranging from system-level wire pipelining to physical design aspects.
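    Wire pipelining cuts a long global wire into segments separated by registers so that each segment can be traversed within one clock period. As a rough illustration of why the number of inserted stages grows as wires get relatively slower, the sketch below estimates that count. It is a first-order model under stated assumptions (equal-delay segments, register overhead ignored); the function name and the delay figures are hypothetical and not taken from the paper.

```python
import math


def pipeline_stages(wire_delay_ps: float, clock_period_ps: float) -> int:
    """Registers needed so that every wire segment fits in one clock period.

    Simplifying assumptions (illustrative only): the wire can be cut into
    equal-delay segments, and register setup / clock-to-Q overhead is ignored.
    """
    if wire_delay_ps <= 0 or clock_period_ps <= 0:
        raise ValueError("delays must be positive")
    # Number of clock cycles the signal needs to traverse the whole wire.
    segments = math.ceil(wire_delay_ps / clock_period_ps)
    # One register sits between each pair of consecutive segments.
    return segments - 1


# Hypothetical figures: a 1300 ps global wire with a 500 ps (2 GHz) clock.
print(pipeline_stages(1300, 500))  # 2
```

    Each inserted register adds one cycle of latency to the wire, which is why the surrounding design must be modified to tolerate it.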

    Multi-objective Framework for Training and Hardware Co-optimization in FPGAs

    Although several works have recently addressed the performance co-optimization of hardware and network training for Convolutional Neural Networks, most of them consider either a fixed network or a given hardware architecture. In this work, we propose a new framework for the joint optimization of network architecture and hardware configuration based on Bayesian Optimization (BO) on top of High-Level Synthesis. The multi-objective nature of the framework allows the definition of various hardware and network performance goals as well as multiple constraints, and multi-objective BO makes it easy to obtain a set of Pareto points. We evaluate our methodology on a network optimized for an FPGA target and show that the Pareto set obtained by the proposed joint optimization outperforms methods based on separate optimization or random search.
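    The framework's output is a set of Pareto points: configurations that no other evaluated configuration beats on every objective at once. The sketch below shows only that final non-dominated filtering step over already-evaluated candidates, not the Bayesian Optimization loop itself. It assumes every objective is to be minimized; the objective pairs (latency, error) and their values are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in one.

    All objectives are assumed to be minimized (e.g. latency, area, error).
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated subset of the evaluated configurations."""
    # A point never dominates itself (no strict improvement), so no self-check is needed.
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (latency_ms, error_pct) results for five candidate configurations.
candidates = [(4.0, 2.5), (3.0, 3.0), (5.0, 2.0), (4.5, 2.6), (3.5, 3.5)]
print(pareto_front(candidates))  # [(4.0, 2.5), (3.0, 3.0), (5.0, 2.0)]
```

    This quadratic filter is adequate for the small number of points a BO run typically evaluates; the point (4.5, 2.6) is discarded because (4.0, 2.5) is better on both objectives.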

    Adaptive Latency Insensitive Protocols

    Latency-insensitive design copes with the excessive delays typical of global wires in current and future IC technologies. It achieves this by encapsulating synchronous logic blocks in wrappers that communicate through a latency-insensitive protocol (LIP) over pipelined interconnects. Previously proposed solutions suffer either from an excessive throughput penalty or from a lack of generality. This article presents an adaptive LIP that outperforms previous static implementations, as demonstrated on two relevant cases, a microprocessor and an MPEG encoder, whose components we made insensitive to the latencies of their interconnections through a newly developed wrapper. We also present an informal exposition of the theoretical basis of adaptive LIPs, as well as implementation details.
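    To make the wrapper idea concrete, the sketch below models the basic firing rule commonly used by latency-insensitive wrappers: the wrapped block advances only when every input holds a valid token, and otherwise stalls. It is a simplified, non-adaptive model, not the adaptive protocol the article proposes. The wrapped block (an adder), the function name, and the latency values are hypothetical, and back-pressure is ignored (input queues are unbounded).

```python
from collections import deque
from typing import List, Optional


def simulate_shell(a: List[int], b: List[int],
                   lat_a: int, lat_b: int, cycles: int) -> List[Optional[int]]:
    """Cycle-by-cycle model of a wrapper around an adder (basic firing rule).

    Token i of input x reaches the wrapper at cycle i + lat_x, where lat_x
    models the pipelined wire. The wrapper fires only when every input holds
    a valid token; otherwise it stalls and emits None (a 'void' token).
    """
    qa: deque = deque()
    qb: deque = deque()
    out: List[Optional[int]] = []
    for t in range(cycles):
        # Deliver the tokens whose wire latency has elapsed in this cycle.
        if 0 <= t - lat_a < len(a):
            qa.append(a[t - lat_a])
        if 0 <= t - lat_b < len(b):
            qb.append(b[t - lat_b])
        if qa and qb:  # all inputs valid: fire the wrapped block
            out.append(qa.popleft() + qb.popleft())
        else:          # at least one input void: stall this cycle
            out.append(None)
    return out


trace = simulate_shell([1, 2, 3], [10, 20, 30], lat_a=0, lat_b=2, cycles=6)
print(trace)  # [None, None, 11, 22, 33, None]
```

    Whatever the wire latencies, the sequence of valid outputs (11, 22, 33) matches the zero-latency computation; only its timing changes. The stall cycles are the throughput cost that smarter protocols try to reduce.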

    A Reconfigurable 2D-Convolution Accelerator for DNNs Quantized with Mixed-Precision

    Mixed-precision quantization uses, in each layer of a Deep Neural Network, the minimum bit-width that preserves accuracy. In this context, our new Reconfigurable 2D-Convolution Module (RCM) computes N = 1, 2, or 4 Multiply-and-Accumulate operations in parallel with configurable precision from 1 to 16/N bits. Our design-space exploration via high-level synthesis obtains the best points in the latency vs. area space by varying the size of the tensor tile handled by our RCM and its parallelism. A comparison with a non-configurable module on a 28-nm technology shows many reconfigurable Pareto points for low bit-width configurations, making our RCM a promising mixed-precision accelerator for inference.
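    The relation between parallelism N and maximum precision 16/N suggests a 16-bit operand word split into N lanes. The behavioral sketch below illustrates that packing idea: one step performs N multiply-accumulates, each on operands of up to 16/N bits. It is not the RCM's actual datapath; the lane layout (lane i starting at bit i * 16/N), the use of unsigned operands, and all names are assumptions made for illustration.

```python
from typing import List

WORD_BITS = 16  # packed operand width, assumed from the 16/N precision bound


def unpack(word: int, n: int, bits: int) -> List[int]:
    """Split a packed word into n unsigned lanes, keeping the low `bits` bits of each."""
    lane_width = WORD_BITS // n
    mask = (1 << bits) - 1
    return [(word >> (i * lane_width)) & mask for i in range(n)]


def packed_mac(acc: List[int], x_word: int, w_word: int, n: int, bits: int) -> List[int]:
    """One step of n parallel multiply-accumulates at the given precision."""
    if n not in (1, 2, 4):
        raise ValueError("N must be 1, 2 or 4")
    if not 1 <= bits <= WORD_BITS // n:
        raise ValueError("precision must be between 1 and 16/N bits")
    xs = unpack(x_word, n, bits)  # activations, one per lane
    ws = unpack(w_word, n, bits)  # weights, one per lane
    return [a + x * w for a, x, w in zip(acc, xs, ws)]


# N = 4 lanes at 4-bit precision: activations 1, 2, 3, 4 times weights 1, 1, 1, 1.
print(packed_mac([0, 0, 0, 0], 0x4321, 0x1111, n=4, bits=4))  # [1, 2, 3, 4]
```

    Lower bit-widths allow more lanes per word, which is where a reconfigurable module can gain over a fixed 16-bit MAC on layers quantized to few bits.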

    A new system design methodology for wire pipelined SoC

    Wire Pipelining (WP) has been proposed to limit the impact of increasing wire delays. In general, the added pipeline elements alter the system in such a way that architectural changes are needed to preserve functionality. We illustrate a proposal that, while allowing IP blocks to be used without modification, exploits minimal knowledge of each IP's communication profile to dramatically increase performance. We show the formal equivalence between the wire-pipelined and the original system, and demonstrate the higher achievable performance through a relevant case study.
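    Why pipeline elements cost performance can be seen with a first-order model commonly used in the latency-insensitive design literature: in a feedback loop with B stateful blocks and S inserted pipeline stages, at most B valid tokens circulate per B + S cycles, so the slowest loop bounds system throughput. The sketch below computes that bound for a wrapper that knows nothing about its IP's communication profile; it does not model the proposal above, and the loop figures are hypothetical.

```python
from fractions import Fraction
from typing import List, Tuple


def loop_throughput(loops: List[Tuple[int, int]]) -> Fraction:
    """First-order throughput bound for a wire-pipelined system with feedback loops.

    Each tuple is (blocks, stages) for one loop of the communication graph:
    the number of IP blocks on the loop (each assumed to add one cycle of
    latency) and the number of wire-pipeline stages inserted on it.
    """
    rates = []
    for blocks, stages in loops:
        if blocks < 1 or stages < 0:
            raise ValueError("need at least one block and a non-negative stage count")
        # At most `blocks` valid tokens per `blocks + stages` clock cycles.
        rates.append(Fraction(blocks, blocks + stages))
    return min(rates)  # the slowest loop limits the whole system


# Hypothetical system with three loops: (2 blocks, 1 stage), (3, 3), (1, 0).
print(loop_throughput([(2, 1), (3, 3), (1, 0)]))  # 1/2
```

    Under this model, a single heavily pipelined loop can halve throughput even when the rest of the system is unaffected, which is the kind of loss that knowledge of the IP's communication profile can help avoid.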