1,720,980 research outputs found

    3D NoCs — Unifying inter & intra chip communication

    No full text
    Networks-on-chip have been developed in the last few years to address the scalability challenges of global on-chip communication. VLSI technology is now rapidly moving into vertical stacking to overcome fundamental communication and integration bottlenecks, however this technology is not mature yet, and significant reliability challenges must be overcome. In this paper we describe our effort in establishing a 3DNoC design flow and in designing circuits and architectural solutions for variability and reliability characterization and tolerance

    MemPool: A Shared-L1 Memory Many-Core Cluster with a Low-Latency Interconnect

    Full text link
    A key challenge in scaling shared-L1 multi-core clusters towards many-core (more than 16 cores) configurations is to ensure low-latency and efficient access to the L1 memory. In this work we demonstrate that it is possible to scale up the shared-L1 architecture: We present MemPool, a 32 bit many-core system with 256 fast RV32IMA 'Snitch' cores featuring application-tunable execution units, running at 700 MHz in typical conditions (TT/0.80 V/25 °C). MemPool is easy to program, with all the cores sharing a global view of a large L1 scratchpad memory pool, accessible within at most 5 cycles. In MemPool's physical-aware design, we emphasized the exploration, design, and optimization of the low-latency processor-to-L1-memory interconnect. We compare three candidate topologies, analyzing them in terms of latency, throughput, and back-end feasibility. The chosen topology keeps the average latency at fewer than 6 cycles, even for a heavy injected load of 0.33 request/core/cycle. We also propose a lightweight addressing scheme that maps each core private data to a memory bank accessible within one cycle, which leads to performance gains of up to 20 % in real-world signal processing benchmarks. The addressing scheme is also highly efficient in terms of energy consumption since requests to local banks consume only half of the energy required to access remote banks. Our design achieves competitive performance with respect to an ideal, non-implementable full-crossbar baseline

    Mr.Wolf: An Energy-Precision Scalable Parallel Ultra Low Power SoC for IoT Edge Processing

    Full text link
    This paper presents Mr. Wolf, a parallel ultra-low power (PULP) system on chip (SoC) featuring a hierarchical architecture with a small (12 kgates) microcontroller (MCU) class RISC-V core augmented with an autonomous IO subsystem for efficient data transfer from a wide set of peripherals. The small core can offload compute-intensive kernels to an eight-core floating-point capable of processing engine available on demand. The proposed SoC, implemented in a 40-nm LP CMOS technology, features a 108-mu W fully retentive memory (512 kB). The IO subsystem is capable of transferring up to 1.6 Gbit/s from external devices to the memory in less than 2.5 mW. The eight-core compute cluster achieves a peak performance of 850 million of 32-bit integer multiply and accumulate per second (MMAC/s) and 500 million of 32-bit floating-point multiply and accumulate per second (MFMAC/s) -1 GFlop/s-with an energy efficiency up to 15 MMAC/s/mW and 9 MFMAC/s/mW. These building blocks are supported by aggressive on-chip power conversion and management, enabling energy-proportional heterogeneous computing for always-on IoT end nodes improving performance by several orders of magnitude with respect to traditional single-core MCUs within a power envelope of 153 mW. We demonstrated the capabilities of the proposed SoC on a wide set of near-sensor processing kernels showing that Mr. Wolf can deliver performance up to 16.4 GOp/s with energy efficiency up to 274 MOp/s/mW on real-life applications, paving the way for always-on data analytics on high-bandwidth sensors at the edge of the Internet of Things

    Automatic synthesis of near-threshold circuits with fine-grained performance tunability

    No full text
    Near-Threshold Circuits achieve ultra-low energy operating with significant performance improvement and noise immunity as compared to sub-threshold circuits. However, near-threshold circuit performance is highly sensitive to static and dynamic threshold voltage variations. This makes designing circuits for a target performance very difficult, and post-silicon tunability is required to achieve performance targets without taking huge design margins. In this work, we tackle this problem by proposing a novel dual-Vdd technique for near-threshold operation and show that one can tune the performance of a circuit in a fine-grained manner by powering an optimal sub-set of rows with a slightly higher supply voltage than the rest, without incurring the large cost of distributed level shifters. By varying the percentage of rows at a slightly higher voltage, one can trade-off performance and power in a fine-grained manner. Experimental results show that by employing our dual-Vdd technique, we can improve the performance of several benchmarks up-to 45% while achieving more than 50% lower power as compared to single-Vdd implementations

    Independent body-biasing of P-N transistors in an 28nm UTBB FD-SOI ULP near-threshold multi-core cluster

    No full text
    Advanced Ultra-Low Power (ULP) computing platforms can be affected by large performance variations. This phenomenon is mainly caused by process and ambient temperature variations, and it is magnified by the strong Temperature Effect Inversion (TEI) that characterizes devices when operating Near-Threshold (NT) in highly scaled nodes. 28nm UTBB FD-SOI technology supports an extended range of both forward and reverse Body-Bias (BB) voltage. This feature can be efficiently used to reduce margins at design time and compensate variations at runtime. In this paper we propose a BB voltage controller capable to independently probe the maximum frequency of P and N transistors, and leverage a BB voltage adjustment to achieve a user-specified target frequency, minimizing the leakage current. Compared to the case where zero BB is applied to the transistors, the controller achieves up to 23% power reduction exploiting the performance increase originated by TEI, further reducing power by 12% with respect to a symmetric BB approach

    Going Beyond Counting First Authors in Author Co-citation Analysis

    Full text link
    The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed

    A Floorplan-aware Interactive Tool Flow for NoC Design and Synthesis

    No full text
    In this paper we present a floorplan-aware toolchain for NoC design and synthesis integrated with a graphical front-end. The resulting design methodology is highly automated yet entails rich interaction with the user, spanning across traffic flow specification, topology synthesis and physical floorplanning, with back-annotation capabilities and opportunities for incremental design. We exploit the proposed tool to implement some NoC-based case studies. We show that not only a great amount of time and effort can be saved thanks to the easy-to-use proposed environment, but also that the quality of the final netlist improves due to the optimizations unlocked by the early-stage interaction among the designer and the proposed toolchain

    Variations on the Author

    Full text link
    “Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer that rather than express opinions propels discourses; an interviewer that is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent form the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship

    Appropriate Similarity Measures for Author Cocitation Analysis

    Full text link
    We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.information science;Pearson correlation;cosine;similarity measure;author cocitation analysis
    corecore