1,720,973 research outputs found
Driving the Network-on-Chip Revolution to Remove the Interconnect Bottleneck in Nanoscale Multi-Processor Systems-on-Chip
The sustained demand for faster, more powerful chips has been met by the
availability of chip manufacturing processes allowing for the integration of increasing
numbers of computation units onto a single die. The resulting outcome,
especially in the embedded domain, has often been called SYSTEM-ON-CHIP
(SoC) or MULTI-PROCESSOR SYSTEM-ON-CHIP (MP-SoC).
MPSoC design brings to the foreground a large number of challenges, one of
the most prominent of which is the design of the chip interconnection. With a
number of on-chip blocks presently ranging in the tens, and quickly approaching
the hundreds, the novel issue of how to best provide on-chip communication
resources is clearly felt.
NETWORKS-ON-CHIPS (NoCs) are the most comprehensive and scalable
answer to this design concern. By bringing large-scale networking concepts to
the on-chip domain, they guarantee a structured answer to present and future
communication requirements. The point-to-point connection and packet switching
paradigms they involve are also of great help in minimizing wiring overhead
and physical routing issues. However, as with any technology of recent inception,
NoC design is still an evolving discipline. Several main areas of interest
require deep investigation for NoCs to become viable solutions:
• The design of the NoC architecture needs to strike the best tradeoff among
performance, features and the tight area and power constraints of the onchip
domain.
• Simulation and verification infrastructure must be put in place to explore,
validate and optimize the NoC performance.
• NoCs offer a huge design space, thanks to their extreme customizability in
terms of topology and architectural parameters. Design tools are needed
to prune this space and pick the best solutions.
• Even more so given their global, distributed nature, it is essential to evaluate
the physical implementation of NoCs to evaluate their suitability for
next-generation designs and their area and power costs.
This dissertation performs a design space exploration of network-on-chip architectures,
in order to point-out the trade-offs associated with the design of
each individual network building blocks and with the design of network topology
overall. The design space exploration is preceded by a comparative analysis
of state-of-the-art interconnect fabrics with themselves and with early networkon-
chip prototypes. The ultimate objective is to point out the key advantages
that NoC realizations provide with respect to state-of-the-art communication
infrastructures and to point out the challenges that lie ahead in order to make
this new interconnect technology come true. Among these latter, technologyrelated
challenges are emerging that call for dedicated design techniques at all
levels of the design hierarchy. In particular, leakage power dissipation, containment
of process variations and of their effects. The achievement of the above
objectives was enabled by means of a NoC simulation environment for cycleaccurate
modelling and simulation and by means of a back-end facility for the
study of NoC physical implementation effects. Overall, all the results provided
by this work have been validated on actual silicon layout
Variation tolerant NoC design by means of self-calibrating links
We present the implementation and analysis of a variation tolerant version of a switch-to-switch link in a NoC. The goal is to tolerate the effects of process variations on NoC architectures using self-correcting links that automatically detect delay variations and compensate them. The correction is applied without increasing the switch-to-switch latency by substituting the output flip-flops of the sending switch with a self-correcting flip-flop followed by an adaptive voltage swing selector. Higher delay variations will result in a smaller slack in the switch-to-switch path, but the adaptive voltage swing selector could mitigate its impact on the NoC communication by increasing the voltage swing on the link, thus allowing a compensation of the delay variation. As a result, it is possible to tolerate delay variations at the cost of additional power consumption
Performance, Area and Power Breakdown Analysis for NoC switches in 65nm technology
Leveraging a development effort of the synthesis backend for Networks-on-Chip (NoCs), this work
contributes an analysis of the performance, area and power/energy breakdown of NoC switches in 65nm
technology. In particular, an architecture-level technique (control- and data-path decoupling) is deployed
to derive switch implementation variants optimized for different design objectives, and is then
validated against placement-aware logic synthesis
Tight Integration of GALS Interfaces into the NoC Architecture
This poster illustrates deep integration of of the synchronizer in the switch architecture of networks-on-chip, thus merging key tasks such as synchronization, buffering and flow control into a unique architecture block. The poster compares the integrated and the loosely coupled solutions from a performance and area viewpoint, while devoting special attention to their robustness with respect to physical design parameters
Efficient Implementation of Distributed Routing Algorithms for NoCs
Chip multiprocessors (CMPs) are gaining momentum in the high-performance computing domain. Networks-on-chip (NoCs) are key components of CMP architectures, in that they have to deal with the communication scalability challenge while meeting tight power, area and latency constraints. 2D mesh topologies are usually preferred by designers of general purpose NoCs. However, manufacturing faults may break their regularity. Moreover, resource management frameworks may require the segmentation of the network into irregular regions. Under these conditions, efficient routing becomes a challenge. Although the use of routing tables at switches is flexible, it does not scale in terms of latency and area due to its memory requirements. Logic-based distributed routing (LBDR) is proposed as a new routing method that removes the need for routing tables at all. LBDR enables the implementation of many routing algorithms on most of the practical topologies we may find in the near future in a multi-core system. From an initial topology and routing algorithm, a set of three bits per switch/output port is computed. Evaluation results show that, by uysing a small logic, LBDR mimics the performance of routing algorithms when implemented with routing tables, both in regular and irregular topologies. LBDR implementation in a real NoC switch is also explored, proving its smooth integration in the architecture and its negligible hardware and performance overhead
Network Interface Sharing Techniques for Area Optimized NoC Architectures
Although preliminary analysis frameworks point out the performance speed-ups achievable by on-chip networks with respect to state-of-the-art interconnects, the area concern remains one of the most daunting challenges to make this interconnect technology mainstream. A common approach to relieve the problem consists of sharing most of network interface resources among a number of processor cores. However, buffering resources need to be replicated and control logic reaches a complexity that limits maximum achievable frequency. This paper proposes full sharing of network interface resources, including buffers, thus trading performance for area. While area improvements are significant, a number of physical and system-level effects might mitigate performance degradation, making our technique a promising solution for area efficient network-on-chip realizations across a range of operating conditions
Improved Utilization of NoC Channel Bandwidth by Switch Replication for Cost-Effective Multi-processor Systems-on-Chip
Virtual channels are an appealing flow control technique for on-chip interconnection networks (NoCs), in that they can potentially avoid deadlock and improve link utilization and network throughput. However, their use in the resource constrained multi-processor system-on-chip (MPSoC) domain is still controversial, due to their significant overhead in terms of area, power and cycle time degradation. This paper proposes a simple yet efficient approach to VC implementation, which results in more area- and power-saving solutions than conventional design techniques. While these latter replicate only buffering resources for each physical link, we replicate the entire switch and prove that our solution is counter intuitively more area/power efficient while potentially operating at higher speeds. This result builds on a well-known principle of logic synthesis for combinational circuits (the area-performance trade-off when inferring a logic function into a gate-level netlist), and proves that when a designer is aware of this, novel architecture design techniques can be conceived
Power-Optimal RTL Arithmetic Unit Soft-Macro Selection Strategy for Leakage-Sensitive Technologies
With the advent of nanoscale technologies, developing power efficient ASICs increasingly requires consideration of static power. An effective approach to make RTL synthesis algorithms and tools leakage-aware consists of the smart inference of RTL macros based on design constraints and optimization directives. This involves exploring the new trade-offs spanned by the design of RTL functional units, as an effect of the features of nanoscale technologies and ofthe power optimizations performed by commercial synthesis tools. This work explores these new trade-offs and proves that making RTL macro selection strategies aware of them results in power savings as high as 43%
Designing Regular Network-on-Chip Topologies under Technology, Architecture and Software Constraints
Regular multi-core processors are appearing in the
embedded system market as high performance software programmable
solutions. The use of regular interconnect fabrics
for them allows fast design time, ease of routing, predictability
of electrical parameters and good scalability. k-ary n-mesh
topologies are candidate solutions for these systems, borrowed
from the domain of off-chip interconnection networks. However,
the on-chip integration has to deal with unique challenges at
different levels of abstraction. From a technology viewpoint,
interconnect reverse scaling causes critical paths to go across
global links. Poor interconnect performance might also impact
IP core speed depending on the synchronization mechanism at the
interface. Finally, this might also conflict with the requirements
that communication libraries employed in the MPSoC domain
pose on the underlying interconnect fabric. This paper provides
a comprehensive overview of these topics, by characterizing
physical feasibility of representative k-ary n-mesh topologies and
by providing silicon-aware system-level performance figures
Interactive presentation: Capturing the interaction of the communication, memory and I/O subsystems in memory-centric industrial MPSoC platforms
Industrial MPSoC platforms exhibit increasing communication needs while not yet reverting to revolutionary solutions such as networks-on-chip. On one hand, the limited scalability of shared busses is being overcome by means of multi-layer communication architectures, which are stressing the role of bridges as key contributors to system performance. On the other hand, technology limitations, data footprint and cost constraints lead to platform instantiations with only few on-chip memory devices and with a global performance bottleneck: the memory controller for access to the off-chip SDRAM memory. The complex interaction among system components and the dependency of macroscopic performance metrics on fine-grain architectural features stress the importance of highly accurate modelling and analysis tools. This paper takes its steps from an extensive modelling effort of a complete industrial MPSoC platform for consumer electronics, including the off-chip memory sub-system. Based on this, relevant design issues concerning the communication, memory and I/O architecture and their interaction are addressed, resulting in guidelines for designers of industry-relevant MPSoCs
- …
