    New Adaptive Encoding Schemes for Switching Activity Balancing in On-Chip Buses

    Thermal Spreading has been shown to be a successful approach to bus temperature minimization. The idea behind this technique is to periodically permute the routing of input bitstreams to the bus lines, so that transitions are distributed temporally and spatially over the entire bit-width. This prevents high switching activity from always concentrating on a few lines, which would cause a localized increase in temperature. In this paper, we propose new encoding schemes that improve the ability of the Thermal Spreading approach to balance switching activity across the bus wires. The proposed solutions are adaptive and dynamic: they select which bitstream is routed to which bus line based on the actual bus traffic, using on-line monitoring provided by ad-hoc hardware units that run in parallel at the transmitting and receiving ends of the bus. Experimental results show that, on average, the proposed encoding schemes significantly improve the transition-balancing capability of the Thermal Spreading technique.
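
    A minimal Python sketch of adaptive, traffic-driven routing, assuming a simple greedy policy that is not necessarily the one proposed in the paper: at the end of each monitoring window, the bitstream that toggled most is routed to the physical line with the lowest cumulative toggle count. The function names, the window length, and the greedy rule are hypothetical. Both bus ends can compute the same mapping because they observe the same decoded traffic.

```python
from typing import List, Sequence


def count_transitions(words: Sequence[int], width: int) -> List[int]:
    """Count toggles per bit position across consecutive words."""
    counts = [0] * width
    for prev, cur in zip(words, words[1:]):
        diff = prev ^ cur
        for b in range(width):
            counts[b] += (diff >> b) & 1
    return counts


def adaptive_mapping(stream_activity: List[int], line_history: List[int]) -> List[int]:
    """Hypothetical greedy rule: most active stream -> least-used line.

    Returns mapping[s] = physical line that carries logical bit s.
    """
    streams = sorted(range(len(stream_activity)), key=lambda s: -stream_activity[s])
    lines = sorted(range(len(line_history)), key=lambda ln: line_history[ln])
    mapping = [0] * len(streams)
    for s, ln in zip(streams, lines):
        mapping[s] = ln
    return mapping


def permute(word: int, mapping: List[int]) -> int:
    """Transmitter side: route logical bit s of `word` onto line mapping[s]."""
    out = 0
    for s, line in enumerate(mapping):
        out |= ((word >> s) & 1) << line
    return out


def unpermute(bus: int, mapping: List[int]) -> int:
    """Receiver side: recover the logical word from the bus value."""
    word = 0
    for s, line in enumerate(mapping):
        word |= ((bus >> line) & 1) << s
    return word


def simulate(words: Sequence[int], width: int, window: int) -> List[int]:
    """Return cumulative toggles per physical line under adaptive remapping."""
    mapping = list(range(width))  # start with identity routing
    line_totals = [0] * width
    prev_bus = None
    for start in range(0, len(words), window):
        chunk = words[start:start + window]
        for w in chunk:
            bus = permute(w, mapping)
            if prev_bus is not None:
                diff = prev_bus ^ bus  # also counts toggles caused by a remap
                for line in range(width):
                    line_totals[line] += (diff >> line) & 1
            prev_bus = bus
        # Re-plan the routing for the next window from this window's activity.
        mapping = adaptive_mapping(count_transitions(chunk, width), line_totals)
    return line_totals
```

    For a stream that toggles only on bit 0, `simulate([0, 1] * 4, 2, 2)` spreads the 7 transitions as `[3, 4]`, whereas fixed routing would place all 7 on line 0. Note that remapping can itself introduce extra transitions at window boundaries, which the simulation counts.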

    Exploring the Impact of Architectural Parameters on Energy Efficiency of Application-Specific Block-Enabled SRAMs

    Application-Specific Block-Enabled (ASBE) SRAMs represent a viable solution for reducing energy consumption in embedded memories. The basic idea behind ASBE architectures is to partition the memory array into a number of non-uniformly sized blocks so as to reduce memory access cost. The number and sizes of the partitions that yield a minimum-power implementation of the SRAM macro are determined by a partitioning algorithm, based on the memory access profile of the application (or application mix) executed by the processor. Given the complexity of the design space, the partitioning engine may exploit several degrees of freedom to arrive at the most energy-efficient memory architecture. In this paper, we investigate how the quality of the partitioned memory depends on the architectural parameters that define the memory structure (e.g., minimum and maximum number of lines per partition, minimum and maximum number of words per line, partition granularity); these parameters, in turn, are constrained by the chosen technology and process. We believe that the results presented in this work provide useful guidelines for the successful adoption of the ASBE approach in practice, as this design paradigm is gaining considerable attention for new generations of embedded systems.
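
    To make the role of the architectural parameters concrete, here is a small dynamic-programming sketch that splits an address range into contiguous blocks under minimum-size, maximum-size, and granularity constraints. The linear per-access energy model (`e_fixed + e_per_word * size`) and the fixed per-block overhead are hypothetical placeholders, not the cost model used by the ASBE partitioning engine.

```python
from itertools import accumulate
from math import inf
from typing import List, Sequence, Tuple


def partition_memory(
    counts: Sequence[int],   # accesses per word address (the access profile)
    e_fixed: float,          # hypothetical per-access energy, size-independent
    e_per_word: float,       # hypothetical per-access energy per word in the block
    block_overhead: float,   # hypothetical fixed cost charged per block
    min_size: int,
    max_size: int,
    granularity: int,
) -> Tuple[float, List[Tuple[int, int]]]:
    """Split addresses [0, n) into contiguous blocks minimizing total cost.

    Block sizes must be multiples of `granularity` within [min_size, max_size].
    """
    n = len(counts)
    prefix = [0] + list(accumulate(counts))
    best = [inf] * (n + 1)   # best[j]: minimum cost to cover addresses [0, j)
    best[0] = 0.0
    cut = [-1] * (n + 1)     # cut[j]: start of the last block in the optimum
    for j in range(1, n + 1):
        for size in range(granularity, max_size + 1, granularity):
            i = j - size
            if size < min_size or i < 0 or best[i] == inf:
                continue
            accesses = prefix[j] - prefix[i]
            cost = best[i] + accesses * (e_fixed + e_per_word * size) + block_overhead
            if cost < best[j]:
                best[j], cut[j] = cost, i
    if best[n] == inf:
        raise ValueError("no partition satisfies the size constraints")
    blocks = []
    j = n
    while j > 0:
        blocks.append((cut[j], j))
        j = cut[j]
    return best[n], blocks[::-1]
```

    With the toy profile `[100, 100, 1, 1, 1, 1, 1, 1]`, unit energies, an overhead of 10, and block sizes restricted to even values between 2 and 8, the sketch isolates the two hot words in their own block and reaches a cost of 656, versus 1864 for a single monolithic block. Tightening `min_size` or coarsening `granularity` shrinks the search space, which is the kind of sensitivity the paper studies.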

    Design and Implementation of a Memory Generator for Low-Energy Application-Specific Block-Enabled SRAMs

    Memory partitioning has proved to be a promising solution for reducing energy consumption in complex SoCs. It comes in different flavors, depending on the application domain and the design constraints to be met. In this paper, we consider a technique that customizes the architecture of physically partitioned SRAM macros to the application to be executed. We present design solutions for the various components of the partitioned memory architecture and develop a memory generator that automatically produces layouts and schematics of the optimized memory macros. Experimental results, collected for two case studies, demonstrate the efficiency of the architecture and the usability of the prototype memory generator: compared with monolithic implementations, the energy savings are around 43% for a 1 KByte memory macro and around 45% for an 8 KByte macro.

    Design Exploration of a Thermal Management Unit for Dynamic Control of Temperature-Induced Clock Skew

    Power densities and temperatures in today's high-performance circuits have reached alarmingly high levels as a result of aggressive feature-size scaling. Moreover, the techniques used to keep them under control have created "zones" of varying temperature, contributing to temperature gradients inside the chip. These gradients have detrimental effects on wire delay, because the resistance of metals increases with temperature. Clock nets are extremely susceptible to this effect, since they run across the entire chip. Several techniques have been proposed to counter the impact of temperature on clock speed. They range from re-designing the clock network under the assumption of a stationary thermal profile to more adaptive solutions that compensate clock skew dynamically by replacing the original buffers with specially designed counterparts, called tunable delay buffers (TDBs). Dynamic skew management based on TDBs requires an on-chip thermal management unit (TMU), which periodically chooses the delay each TDB must provide in order to minimize skew. Preliminary implementations of such a unit, under basic assumptions on the distribution and accuracy of the thermal sensors, have shown a negligible impact on the original design. This work explores several issues related to TMU design in detail, building on the observation that sensor distribution and accuracy could, depending on the design, impact it significantly. We report the results of a careful exploration performed on a meaningful case study, quantifying area and power consumption.
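
    The periodic decision the TMU makes can be sketched as follows, assuming each sink's current clock arrival time has already been estimated from sensor readings and that each TDB offers `levels` discrete settings spaced `step` apart. The nearest-setting rule that aligns every sink with the slowest one is a hypothetical simplification, not the TMU algorithm from the paper.

```python
from typing import List, Sequence


def choose_tdb_settings(sink_delays: Sequence[float], step: float, levels: int) -> List[int]:
    """Pick a discrete TDB setting per sink so arrivals align with the slowest sink.

    sink_delays: estimated clock arrival times per sink (hypothetical input,
    e.g. in ps, derived from thermal-sensor readings).
    step: delay added per TDB setting; levels: number of available settings.
    """
    target = max(sink_delays)
    settings = []
    for d in sink_delays:
        k = round((target - d) / step)               # nearest available setting
        settings.append(min(max(k, 0), levels - 1))  # clamp to the TDB range
    return settings


def skew(sink_delays: Sequence[float], settings: Sequence[int], step: float) -> float:
    """Skew after adding each sink's TDB delay to its arrival time."""
    arrivals = [d + k * step for d, k in zip(sink_delays, settings)]
    return max(arrivals) - min(arrivals)
```

    For arrivals `[100.0, 92.0, 96.0]` and a 2.0 step, the uncompensated skew is 8.0; eight settings per TDB reduce it to 0.0, while only three settings leave 4.0, because the slowest-to-fastest gap exceeds the buffer's tuning range. This is one way in which sensor accuracy and TDB resolution feed into TMU design choices.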

    Thermal Resilient Bounded-Skew Clock-Tree Optimization Methodology

    Non-uniform thermal gradients on the substrate of high-performance ICs can significantly impact the performance of global on-chip interconnects. This issue is further exacerbated by aggressive scaling and by other factors, such as dynamic power management schemes and non-uniform gate-level switching activity. In high-performance systems, clock skew minimization is one of the most important problems, since skew has a direct impact on the maximum operating frequency of the system. Because clocks are routed across the entire chip, thermal gradients can significantly alter their characteristics, as wire resistance increases linearly with temperature. This often results in failure to meet the original timing constraints, rendering the original topology unusable. A temperature-aware re-embedding of the original topology is therefore necessary to meet timing under these temperature effects. This work explores these issues by proposing two algorithms that restructure an existing clock-tree topology to compensate for such temperature effects and, as a result, meet the timing constraints.
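
    The mechanism described above (resistance rising linearly with temperature, hence delay and skew shifting) can be illustrated with an Elmore-delay sketch of a single root-to-sink path. The linear model R(T) = R_ref (1 + beta (T - T_ref)), the approximate copper coefficient of about 0.39%/°C, and the segment values are illustrative assumptions; buffer and driver delays are ignored.

```python
from typing import Sequence, Tuple

BETA_CU = 3.9e-3  # approximate temperature coefficient of resistance of copper, 1/°C
T_REF = 25.0      # °C, temperature at which nominal resistances are specified


def wire_resistance(r_ref: float, temp_c: float, beta: float = BETA_CU) -> float:
    """Linear model: R(T) = R_ref * (1 + beta * (T - T_ref))."""
    return r_ref * (1.0 + beta * (temp_c - T_REF))


def elmore_path_delay(segments: Sequence[Tuple[float, float, float]], sink_cap: float) -> float:
    """Elmore delay (s) from the clock root to one sink.

    segments: (nominal resistance [ohm], capacitance [F], local temperature [°C]),
    ordered from root to sink; each segment is a distributed RC line, so it sees
    half of its own capacitance plus everything downstream of it.
    """
    downstream = sink_cap + sum(c for _, c, _ in segments)
    delay = 0.0
    for r_ref, c, temp in segments:
        downstream -= c  # capacitance strictly beyond this segment
        delay += wire_resistance(r_ref, temp) * (c / 2.0 + downstream)
    return delay


# Two nominally identical branches (toy values: 100 ohm, 1 pF per segment,
# 1 pF sink); the second segment of the "hot" branch sits at 85 °C.
cold = elmore_path_delay([(100.0, 1e-12, 25.0), (100.0, 1e-12, 25.0)], 1e-12)
hot = elmore_path_delay([(100.0, 1e-12, 25.0), (100.0, 1e-12, 85.0)], 1e-12)
```

    In this toy case, `cold` is 400 ps and `hot` is about 435 ps, so a tree balanced at 25 °C acquires roughly 35 ps of skew purely from the thermal gradient, which is the effect a temperature-aware re-embedding has to absorb.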

    Implications of Ultra Low-Voltage Devices on Design Techniques for Controlling Leakage in NanoCMOS Circuits

    Enabled by technology scaling, ultra low-voltage devices are now widely used in modern VLSI circuits. While a lower supply voltage reduces dynamic power, it also increases leakage power, because lower supply voltages are usually paired with lower threshold voltages to preserve circuit speed. The resulting increase in sub-threshold leakage current is today one of the most serious bottlenecks to further technology and supply-voltage scaling. The need to control leakage power in nanometric devices is imposing a significant shift in the way integrated circuits are designed and manufactured. Devices with nanometric feature sizes are much more sensitive to parameters, such as the operating temperature of the circuit, that were neglected in the past. In this paper, we quantitatively analyze the leakage-control capabilities of some well-established circuit-level design techniques and assess how their effectiveness scales with decreasing supply voltage (as induced by technology scaling) and with temperature variations, thus providing insight into how applicable today's leakage-control solutions will be in future designs.
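
    The threshold-voltage and temperature sensitivity mentioned above can be seen in the standard first-order subthreshold model, I = I0 exp((Vgs - Vth)/(n vT)) (1 - exp(-Vds/vT)) with vT = kT/q. The sketch below uses normalized current, holds I0 constant, and assumes a slope factor n = 1.5; real devices add further temperature dependence (I0 grows with T and Vth falls as T rises), so it understates the thermal effect.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
Q_E = 1.602176634e-19  # elementary charge, C


def thermal_voltage(temp_k: float) -> float:
    """vT = kT/q in volts."""
    return K_B * temp_k / Q_E


def subthreshold_current(vgs: float, vds: float, vth: float, temp_k: float,
                         n: float = 1.5, i0: float = 1.0) -> float:
    """First-order subthreshold current in normalized units.

    I0 is held constant here for simplicity (an assumption); only the
    explicit vT dependence on temperature is modeled.
    """
    vt = thermal_voltage(temp_k)
    return i0 * math.exp((vgs - vth) / (n * vt)) * (1.0 - math.exp(-vds / vt))


# Off-state leakage (Vgs = 0, Vds = 1 V) for two threshold voltages at 300 K.
ratio = subthreshold_current(0.0, 1.0, 0.2, 300.0) / subthreshold_current(0.0, 1.0, 0.3, 300.0)
```

    Under these assumptions, lowering Vth by 100 mV multiplies off-state leakage by about 13 at 300 K, and raising the temperature to about 85 °C increases it further even with Vth held fixed.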

    Thermal-Aware Design Techniques for Nanometer CMOS Circuits

    Increased chip power density results in higher operating temperatures, and thermal gradients (spatial and temporal) arise because different areas of the die consume different amounts of power. Thermal variations affect the normal operation of nanoelectronic circuits in several respects, including reliability, leakage power, and delay, and the picture will become more complicated (and possibly worse) for CMOS devices with feature sizes below 45 nm. This paper provides an overview of some of the most recent design and synthesis techniques that help reduce run-time temperature and manage the effects of on-chip thermal gradients in future generations of CMOS integrated circuits.

    Thermal-Aware Clock Tree Design to Increase Timing Reliability of Embedded SoCs

    Chip heating and the nonuniform distribution of hot and cool zones on the die negatively affect the reliability and failure robustness of nanometer integrated circuits. Signal propagation on interconnects slows down as temperature rises; for long wires crossing regions at different temperatures, such as the clock network, thermally induced changes in delay and skew may result in timing faults. Failures of this kind are difficult to address because of their transient nature. This paper focuses on clock tree design for the class of embedded systems-on-chip with spatially nonuniform but temporally stationary thermal profiles. We contribute two algorithms for thermal-aware clock network design that take on-chip temperature variations of this nature into account. Experimental results collected on a number of examples and for different thermal profiles show that, in the presence of on-chip spatial temperature gradients, clock trees designed with a standard methodology incur very significant skew violations, thus causing circuit failures. In contrast, clock networks designed with the algorithms presented in this paper always satisfy the initial skew bound.