1,720,973 research outputs found

    Driving the Network-on-Chip Revolution to Remove the Interconnect Bottleneck in Nanoscale Multi-Processor Systems-on-Chip

    No full text
    The sustained demand for faster, more powerful chips has been met by the availability of chip manufacturing processes allowing for the integration of increasing numbers of computation units onto a single die. The resulting outcome, especially in the embedded domain, has often been called SYSTEM-ON-CHIP (SoC) or MULTI-PROCESSOR SYSTEM-ON-CHIP (MP-SoC). MPSoC design brings to the foreground a large number of challenges, one of the most prominent of which is the design of the chip interconnection. With a number of on-chip blocks presently ranging in the tens, and quickly approaching the hundreds, the novel issue of how to best provide on-chip communication resources is clearly felt. NETWORKS-ON-CHIPS (NoCs) are the most comprehensive and scalable answer to this design concern. By bringing large-scale networking concepts to the on-chip domain, they guarantee a structured answer to present and future communication requirements. The point-to-point connection and packet switching paradigms they involve are also of great help in minimizing wiring overhead and physical routing issues. However, as with any technology of recent inception, NoC design is still an evolving discipline. Several main areas of interest require deep investigation for NoCs to become viable solutions: • The design of the NoC architecture needs to strike the best tradeoff among performance, features and the tight area and power constraints of the onchip domain. • Simulation and verification infrastructure must be put in place to explore, validate and optimize the NoC performance. • NoCs offer a huge design space, thanks to their extreme customizability in terms of topology and architectural parameters. Design tools are needed to prune this space and pick the best solutions. • Even more so given their global, distributed nature, it is essential to evaluate the physical implementation of NoCs to evaluate their suitability for next-generation designs and their area and power costs. This dissertation performs a design space exploration of network-on-chip architectures, in order to point-out the trade-offs associated with the design of each individual network building blocks and with the design of network topology overall. The design space exploration is preceded by a comparative analysis of state-of-the-art interconnect fabrics with themselves and with early networkon- chip prototypes. The ultimate objective is to point out the key advantages that NoC realizations provide with respect to state-of-the-art communication infrastructures and to point out the challenges that lie ahead in order to make this new interconnect technology come true. Among these latter, technologyrelated challenges are emerging that call for dedicated design techniques at all levels of the design hierarchy. In particular, leakage power dissipation, containment of process variations and of their effects. The achievement of the above objectives was enabled by means of a NoC simulation environment for cycleaccurate modelling and simulation and by means of a back-end facility for the study of NoC physical implementation effects. Overall, all the results provided by this work have been validated on actual silicon layout

    Variation tolerant NoC design by means of self-calibrating links

    No full text
    We present the implementation and analysis of a variation tolerant version of a switch-to-switch link in a NoC. The goal is to tolerate the effects of process variations on NoC architectures using self-correcting links that automatically detect delay variations and compensate them. The correction is applied without increasing the switch-to-switch latency by substituting the output flip-flops of the sending switch with a self-correcting flip-flop followed by an adaptive voltage swing selector. Higher delay variations will result in a smaller slack in the switch-to-switch path, but the adaptive voltage swing selector could mitigate its impact on the NoC communication by increasing the voltage swing on the link, thus allowing a compensation of the delay variation. As a result, it is possible to tolerate delay variations at the cost of additional power consumption

    Performance, Area and Power Breakdown Analysis for NoC switches in 65nm technology

    No full text
    Leveraging a development effort of the synthesis backend for Networks-on-Chip (NoCs), this work contributes an analysis of the performance, area and power/energy breakdown of NoC switches in 65nm technology. In particular, an architecture-level technique (control- and data-path decoupling) is deployed to derive switch implementation variants optimized for different design objectives, and is then validated against placement-aware logic synthesis

    Tight Integration of GALS Interfaces into the NoC Architecture

    No full text
    This poster illustrates deep integration of of the synchronizer in the switch architecture of networks-on-chip, thus merging key tasks such as synchronization, buffering and flow control into a unique architecture block. The poster compares the integrated and the loosely coupled solutions from a performance and area viewpoint, while devoting special attention to their robustness with respect to physical design parameters

    Efficient Implementation of Distributed Routing Algorithms for NoCs

    No full text
    Chip multiprocessors (CMPs) are gaining momentum in the high-performance computing domain. Networks-on-chip (NoCs) are key components of CMP architectures, in that they have to deal with the communication scalability challenge while meeting tight power, area and latency constraints. 2D mesh topologies are usually preferred by designers of general purpose NoCs. However, manufacturing faults may break their regularity. Moreover, resource management frameworks may require the segmentation of the network into irregular regions. Under these conditions, efficient routing becomes a challenge. Although the use of routing tables at switches is flexible, it does not scale in terms of latency and area due to its memory requirements. Logic-based distributed routing (LBDR) is proposed as a new routing method that removes the need for routing tables at all. LBDR enables the implementation of many routing algorithms on most of the practical topologies we may find in the near future in a multi-core system. From an initial topology and routing algorithm, a set of three bits per switch/output port is computed. Evaluation results show that, by uysing a small logic, LBDR mimics the performance of routing algorithms when implemented with routing tables, both in regular and irregular topologies. LBDR implementation in a real NoC switch is also explored, proving its smooth integration in the architecture and its negligible hardware and performance overhead

    Network Interface Sharing Techniques for Area Optimized NoC Architectures

    No full text
    Although preliminary analysis frameworks point out the performance speed-ups achievable by on-chip networks with respect to state-of-the-art interconnects, the area concern remains one of the most daunting challenges to make this interconnect technology mainstream. A common approach to relieve the problem consists of sharing most of network interface resources among a number of processor cores. However, buffering resources need to be replicated and control logic reaches a complexity that limits maximum achievable frequency. This paper proposes full sharing of network interface resources, including buffers, thus trading performance for area. While area improvements are significant, a number of physical and system-level effects might mitigate performance degradation, making our technique a promising solution for area efficient network-on-chip realizations across a range of operating conditions

    Improved Utilization of NoC Channel Bandwidth by Switch Replication for Cost-Effective Multi-processor Systems-on-Chip

    No full text
    Virtual channels are an appealing flow control technique for on-chip interconnection networks (NoCs), in that they can potentially avoid deadlock and improve link utilization and network throughput. However, their use in the resource constrained multi-processor system-on-chip (MPSoC) domain is still controversial, due to their significant overhead in terms of area, power and cycle time degradation. This paper proposes a simple yet efficient approach to VC implementation, which results in more area- and power-saving solutions than conventional design techniques. While these latter replicate only buffering resources for each physical link, we replicate the entire switch and prove that our solution is counter intuitively more area/power efficient while potentially operating at higher speeds. This result builds on a well-known principle of logic synthesis for combinational circuits (the area-performance trade-off when inferring a logic function into a gate-level netlist), and proves that when a designer is aware of this, novel architecture design techniques can be conceived

    Power-Optimal RTL Arithmetic Unit Soft-Macro Selection Strategy for Leakage-Sensitive Technologies

    No full text
    With the advent of nanoscale technologies, developing power efficient ASICs increasingly requires consideration of static power. An effective approach to make RTL synthesis algorithms and tools leakage-aware consists of the smart inference of RTL macros based on design constraints and optimization directives. This involves exploring the new trade-offs spanned by the design of RTL functional units, as an effect of the features of nanoscale technologies and ofthe power optimizations performed by commercial synthesis tools. This work explores these new trade-offs and proves that making RTL macro selection strategies aware of them results in power savings as high as 43%

    Designing Regular Network-on-Chip Topologies under Technology, Architecture and Software Constraints

    No full text
    Regular multi-core processors are appearing in the embedded system market as high performance software programmable solutions. The use of regular interconnect fabrics for them allows fast design time, ease of routing, predictability of electrical parameters and good scalability. k-ary n-mesh topologies are candidate solutions for these systems, borrowed from the domain of off-chip interconnection networks. However, the on-chip integration has to deal with unique challenges at different levels of abstraction. From a technology viewpoint, interconnect reverse scaling causes critical paths to go across global links. Poor interconnect performance might also impact IP core speed depending on the synchronization mechanism at the interface. Finally, this might also conflict with the requirements that communication libraries employed in the MPSoC domain pose on the underlying interconnect fabric. This paper provides a comprehensive overview of these topics, by characterizing physical feasibility of representative k-ary n-mesh topologies and by providing silicon-aware system-level performance figures

    Interactive presentation: Capturing the interaction of the communication, memory and I/O subsystems in memory-centric industrial MPSoC platforms

    No full text
    Industrial MPSoC platforms exhibit increasing communication needs while not yet reverting to revolutionary solutions such as networks-on-chip. On one hand, the limited scalability of shared busses is being overcome by means of multi-layer communication architectures, which are stressing the role of bridges as key contributors to system performance. On the other hand, technology limitations, data footprint and cost constraints lead to platform instantiations with only few on-chip memory devices and with a global performance bottleneck: the memory controller for access to the off-chip SDRAM memory. The complex interaction among system components and the dependency of macroscopic performance metrics on fine-grain architectural features stress the importance of highly accurate modelling and analysis tools. This paper takes its steps from an extensive modelling effort of a complete industrial MPSoC platform for consumer electronics, including the off-chip memory sub-system. Based on this, relevant design issues concerning the communication, memory and I/O architecture and their interaction are addressed, resulting in guidelines for designers of industry-relevant MPSoCs
    corecore