Dynamically Reconfigurable NoC using a deadlock-free flexible routing algorithm with a low hardware implementation cost
NoC-based Dynamic Reconfigurable Systems (DRSs) implemented on FPGA devices change their configuration at run time by repositioning or replacing the existing processing modules in the network. Several Dynamically Reconfigurable NoCs (DRNoCs) in the literature propose adaptive routing algorithms to handle the alteration of the network structure. Nevertheless, their implementation cost is severe in terms of chip area and the time required to reconfigure the routing scheme, which results in poorly scalable solutions for DRSs. In this work, we propose an alternative DRNoC approach, based on a traditional 2-D mesh, using a logic-based implementation of the Flexible Direction Order Routing (FDOR) algorithm, thus inheriting its simplicity and deadlock freedom. Several scenarios are considered to demonstrate the applicability of the FDOR algorithm in the context of a DRNoC, accompanied by performance and synthesis results. In conclusion, we demonstrate that FDOR is a suitable solution for DRNoCs.
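For intuition about how a routing decision on a 2-D mesh can be computed by simple logic from coordinates alone, here is a minimal sketch of plain XY dimension-order routing. It is not the FDOR algorithm from the abstract; it is the simpler, well-known scheme swapped in for illustration. XY routing is deadlock-free on a full mesh because a packet never turns from the Y dimension back into the X dimension. The coordinate convention, port names, and function names are all assumptions made for this example.

```python
# Illustrative XY dimension-order routing on a 2-D mesh (NOT the FDOR algorithm).
# Coordinates, port names, and function names are hypothetical.

def xy_next_port(cur, dst):
    """Return the output port for a packet at router `cur` heading to `dst`."""
    cx, cy = cur
    dx, dy = dst
    if cx != dx:                      # correct the X offset first
        return "EAST" if dx > cx else "WEST"
    if cy != dy:                      # then correct the Y offset
        return "NORTH" if dy > cy else "SOUTH"
    return "LOCAL"                    # arrived: deliver to the attached module


def xy_route(src, dst):
    """Return the list of routers visited from `src` to `dst`, inclusive."""
    step = {"EAST": (1, 0), "WEST": (-1, 0), "NORTH": (0, 1), "SOUTH": (0, -1)}
    path = [src]
    cur = src
    while (port := xy_next_port(cur, dst)) != "LOCAL":
        mx, my = step[port]
        cur = (cur[0] + mx, cur[1] + my)
        path.append(cur)
    return path
```

Calling `xy_route((0, 0), (2, 1))` walks along X to column 2 and only then moves along Y. Each router needs only a pair of comparators per dimension, which is why logic-based schemes of this kind tend to be cheap in hardware.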
A SystemC-based Simulator for Design Space Exploration of Smart Wireless Systems
Smart wireless techniques are at the core of many of today's telecommunication and networked embedded systems, where performance is enhanced by intertwining radio frequency (RF) and digital aspects. Their design therefore requires attention to both domains. Traditional approaches to their simulation rely either on different domain-specific tools or on analog-mixed-signal modeling languages. In the former case, the whole platform cannot be simulated in a single run, while in the latter case, simulation performance is limited by the most computationally intensive domain (usually RF). We present an extension of the SystemC Network Simulation Library that allows antenna details and node position to be simulated together with digital hardware and software. Validation on a real wearable system shows that the proposed simulation approach achieves a good trade-off between accuracy and speed, thus allowing fast exploration of various configurations in the early phase of the design flow without resorting to the expensive and time-consuming creation of physical prototypes.
Increasing Impartiality and Robustness in High-Performance N-Way Asynchronous Arbiters
Arbiters are the most critical element in managing a shared resource. Many arbiters in the literature are asynchronous, in order to improve concurrency and make performance independent of the working frequency of the requesting clients. However, in asynchronous designs, architectural imbalances or variability can affect impartiality, in aspects such as latency equalization and arbitration fairness. This problem has largely been overlooked in previous designs and experimental results. This work aims to perform an accurate rebalancing for N-way arbiters, using a new architecture based on a tree structure. The proposed architecture drastically mitigates various forms of partiality and enhances overall robustness. The design is also scalable and highly performance-oriented. A detailed comparison of several post-layout N-way arbiter models is included. Results show significant benefits across the most critical design costs.
A built-in self-testing framework for asynchronous bundled-data NoC switches resilient to delay variations
Most multi- and many-core integrated systems are currently designed following a globally asynchronous locally synchronous paradigm. Asynchronous interconnection networks are promising candidates to interconnect IP cores operating at potentially different frequencies. Nevertheless, post-fabrication testing is a major obstacle to bringing asynchronous NoCs to the market, due to a lack of testing methodologies and support for them. In particular, the unpredictable delay variability introduced by the manufacturing process may differentiate the delay of nominally balanced I/O timing paths, thus making the order of the input patterns unpredictable and precluding the correct behaviour of signature-based test compactors. This paper tackles this challenge by proposing a testing framework for asynchronous NoCs which works effectively despite delay variations in and across timing paths of the NoC under test. Moreover, in order to mitigate the growing test application costs in modern ICs, we propose a built-in self-testing infrastructure which automatically controls the testing process and delivers its outcome without the intervention of external automatic test equipment (ATE).
Cost-Effective and Flexible Asynchronous Interconnect Technology for GALS Networks-on-Chip
Fine-grained power management of largely integrated manycore systems is becoming mainstream in order to deal with tight power budgets. As a result, some level of asynchrony is becoming inevitable for efficient system-level operation. Asynchronous interconnection networks naturally provide such asynchrony; however, their wide industrial uptake depends on the capability to overcome two fundamental barriers: their area and dynamic power overhead, and the limited computer-aided design (CAD) tool support for their automated design. This paper presents a novel design point (i.e., a switch architecture and a hierarchical synthesis toolflow for network assembly) for on-chip asynchronous communication, combining design flexibility with small footprint and cost effectiveness.
System interconnect extensions for fully transparent demand paging in low-cost MMU-less embedded systems
MMU-less embedded systems are the state-of-the-art solution for deeply embedded computing environments. Thanks to the rapid evolution of such devices, the applications that run on them are evolving from simple control tasks to more complex applications that involve an Operating System (OS). At the same time, the cost budget remains unchanged in spite of the growing performance requirements. For this reason, traditional code loading and execution techniques such as full code shadowing or execute-in-place may lead to a performance bottleneck. Even demand paging strategies lack consensus, due to the customization and complexity of the software infrastructure that deals with memory management. The objective of this work is to implement a transparent hardware-based demand paging strategy for code loading and execution, targeting MMU-less embedded systems. The approach consists of making the system interconnect aware of the memory map, without burdening the legacy OS code, the application code, or the compilation framework. It yields lower boot-up latency and shorter application execution time than traditional loading and execution schemes.
An asynchronous NoC router in a 14nm FinFET library: Comparison to an industrial synchronous counterpart
An asynchronous high-performance low-power 5-port network-on-chip (NoC) router is introduced. The proposed router integrates low-latency input buffers using a circular FIFO design, and a novel end-to-end credit-based virtual channel (VC) flow control for a replicated switch architecture. This asynchronous router is then compared to an AMD synchronous router, in a realistic advanced 14nm FinFET library. This is the first such comparison, to the best of our knowledge, using a real synchronous router baseline already fabricated in several commercial products. Initial post-synthesis pre-layout experiments show that the asynchronous router dominates the synchronous router. In particular, 55% less area and a 28% latency improvement are observed for the asynchronous implementation. Also, savings of 88% and 58% in idle and active power, respectively, are obtained.
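The general idea behind credit-based flow control can be sketched in a few lines: a sender holds one credit per free downstream buffer slot, spends a credit for each flit it sends, and gets the credit back when the receiver drains that slot. The sketch below is a generic software model under that assumption only. It does not reproduce the router's end-to-end scheme, its virtual channels, or any asynchronous handshaking, and the class and method names are hypothetical.

```python
from collections import deque

# Generic credit-based flow control model (hypothetical names; not the
# router's actual design). One credit corresponds to one free buffer slot.

class CreditLink:
    def __init__(self, depth):
        self.credits = depth          # sender-side credit counter
        self.buffer = deque()         # receiver-side buffer of `depth` slots

    def send(self, flit):
        """Send a flit if a credit is available; return True on success."""
        if self.credits == 0:
            return False              # no free slot downstream: sender stalls
        self.credits -= 1
        self.buffer.append(flit)
        return True

    def consume(self):
        """Drain one flit at the receiver and return its credit to the sender."""
        if not self.buffer:
            return None
        flit = self.buffer.popleft()
        self.credits += 1
        return flit
```

With `depth=2`, a third `send` fails until `consume` frees a slot, so the receiver buffer can never overflow.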
Crossbar replication vs. sharing for virtual channel flow control in asynchronous NoCs: A comparative study
In on-chip interconnection networks, performance optimization can often be achieved in two opposite ways: by making control logic more complex inside switches, or by pushing design complexity to the switch boundaries. The implementation of virtual channel (VC) flow control is an important application domain of this design trade-off. The data path of VC switches typically exhibits replicated buffers. The underlying philosophy (i.e., resource replication) can be pushed to the limit, thus incurring an apparently high area cost, while simplifying the switch control path. On the other hand, unreplicated resources require complex control logic for the sake of their efficient sharing among virtual networks. Investigating this design trade-off is especially important for asynchronous networks, where the synthesis of complex control circuits is a challenge. This paper is a first step toward a design space exploration of VC implementation techniques for transition-signalling bundled-data asynchronous NoCs, and contrasts a VC switch with replicated crossbars against a unified-crossbar architecture relying on multistage switch allocation.
DyAFNoC: Dynamically Reconfigurable NoC Characterization Using a Simple Adaptive Deadlock-Free Routing Algorithm with a Low Implementation Cost
NoC-based Dynamic Reconfigurable Systems (DRSs) implemented on FPGA devices change their configuration during operation by placing or replacing processing modules on the network structure; such networks are known as Dynamically Reconfigurable NoCs (DRNoCs). The literature contains several DRNoC proposals that implement adaptive routing algorithms to handle the alteration of the network structure. Nevertheless, their implementation cost is high in terms of chip area and the time required to reconfigure the routing algorithm, which results in poorly scalable solutions for DRSs. In this work, we propose an alternative DRNoC, based on a traditional 2-D mesh, using a logic-based implementation of the Flexible Direction Order Routing (FDOR) algorithm, characterized by its simplicity, low complexity and deadlock freedom. Simulations were run to test the feasibility of the FDOR algorithm for a DRNoC, accompanied by performance and synthesis results.
- …
