Bridging the Gap between Software and Hardware Designers Using High-Level Synthesis
Modern Systems-on-Chip (SoC) architectures and CPU+FPGA computing platforms are moving towards heterogeneous systems featuring an increasing number of hardware accelerators. These specialized components can deliver energy-efficient high performance, but their design from high-level specifications is usually very complex. Therefore, it is crucial to understand how to design and optimize such components to implement the desired functionality. This paper discusses the challenges between software programmers and hardware designers, focusing on the state-of-the-art methods based on high-level synthesis (HLS). It also highlights the future research lines for simplifying the creation of complex accelerator-based architectures
A Simulation-Based Framework for the Exploration of Mapping Solutions on Heterogeneous MPSoCs
Performance Estimation of Task Graphs Based on Path Profiling
Correctly estimating the speed-up of a parallel embedded application is crucial to efficiently compare different parallelization techniques, task graph transformations or mapping and scheduling solutions. Unfortunately, especially in the case of control-dominated applications, task correlations may heavily affect the execution time of the solutions, and this is usually not properly taken into account during performance analysis. We propose a methodology that combines a single profiling of the initial sequential specification with different decisions in terms of partitioning, mapping, and scheduling in order to better estimate the actual speed-up of these solutions. We validated our approach on a multi-processor simulation platform: experimental results show that our methodology, by effectively identifying the correlations among tasks, significantly outperforms existing approaches for speed-up estimation. Indeed, we obtained an absolute error of less than 5% on average, even when compiling the code with different optimization levels
Bambu: A Modular Framework for the High Level Synthesis of Memory-Intensive Applications
The Case for Polymorphic Registers in Dataflow Computing
Heterogeneous systems are becoming increasingly popular, delivering high performance through hardware specialization. However, sequential data accesses may have a negative impact on performance. Data parallel solutions such as Polymorphic Register Files (PRFs) can potentially accelerate applications by facilitating high-speed, parallel access to performance-critical data. This article shows how PRFs can be integrated into dataflow computational platforms. Our semi-automatic, compiler-based methodology generates customized PRFs and modifies the computational kernels to efficiently exploit them. We use a separable 2D convolution case study to evaluate the impact of memory latency and bandwidth on performance compared to a state-of-the-art NVIDIA Tesla C2050 GPU. We improve the throughput up to 56.17X and show that the PRF-augmented system outperforms the GPU for 9×9 or larger mask sizes, even in bandwidth-constrained systems
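For readers unfamiliar with the case study named in this abstract, the following is a minimal sketch of what "separable 2D convolution" generally means; it is not taken from the paper and says nothing about PRFs or the authors' implementation. All function names are hypothetical, and the kernel is applied without flipping (strictly a correlation, which coincides with convolution for symmetric masks). It shows that a mask formed as the outer product of two 1D kernels can be applied as a row pass followed by a column pass, giving the same result as the direct 2D pass.

```python
def convolve_rows(image, kernel):
    """Apply a 1D kernel along every row, keeping only fully covered positions."""
    k = len(kernel)
    return [
        [sum(row[x + i] * kernel[i] for i in range(k))
         for x in range(len(row) - k + 1)]
        for row in image
    ]


def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]


def separable_convolve(image, col_kernel, row_kernel):
    """Two 1D passes: along rows first, then along columns."""
    horizontal = convolve_rows(image, row_kernel)
    return transpose(convolve_rows(transpose(horizontal), col_kernel))


def full_convolve(image, col_kernel, row_kernel):
    """Reference: one direct 2D pass with the outer-product mask."""
    kh, kw = len(col_kernel), len(row_kernel)
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [sum(image[y + j][x + i] * col_kernel[j] * row_kernel[i]
             for j in range(kh) for i in range(kw))
         for x in range(out_w)]
        for y in range(out_h)
    ]


# Illustrative input only: a 4x4 integer image and a 3-tap kernel.
image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
kernel = [1, 2, 1]

result = separable_convolve(image, kernel, kernel)
```

For a k×k mask, the two-pass form needs about 2k multiply-accumulates per output element instead of k², which is one reason mask size matters when comparing platforms.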
System-Level Optimization of Accelerator Local Memory for Heterogeneous Systems-on-Chip
In modern system-on-chip architectures, specialized accelerators are increasingly used to improve performance and energy efficiency. The growing complexity of these systems requires the use of system-level design methodologies featuring high-level synthesis (HLS) for generating these components efficiently. Existing HLS tools, however, have limited support for the system-level optimization of memory elements, which typically occupy most of the accelerator area. We present a complete methodology for designing the private local memories (PLMs) of multiple accelerators. Based on the memory requirements of each accelerator, our methodology automatically determines an area-efficient architecture for the PLMs to guarantee performance and reduce the memory cost based on technology-related information. We implemented a prototype tool, called Mnemosyne, that embodies our methodology within a commercial HLS flow. We designed 13 complex accelerators for selected applications from two recently-released benchmark suites (Perfect and CortexSuite). With our approach we are able to reduce the memory cost of single accelerators by up to 45%. Moreover, when reusing memory IPs across accelerators, we achieve area savings that range between 17% and 55% compared to the case where the PLMs are designed separately
Black-Hat High-Level Synthesis: Myth or Reality?
Hardware Trojans are a major concern for integrated circuits. All parts of the electronics supply chain are vulnerable to this threat. Trojans can be inserted directly by a rogue employee or through a compromised computer-aided design tool at each step of the design cycle, including an alteration of the design files in the early stages and the fabrication process in a third-party malicious foundry. While Trojan insertion during the latter stages has been largely investigated, we focus on high-level synthesis (HLS) tools as a likely attack vector. HLS tools are used to generate intellectual property blocks from high-level specifications. To demonstrate the threat, we compromised an open-source HLS tool to inject three examples of HLS-aided hardware Trojans with functional and nonfunctional effects. Our results show that a black-hat HLS tool can be successfully used to maliciously alter electronic circuits to add latency, drain energy, or undermine the security of cryptographic hardware cores. This threat is an important security concern to address
Combined architecture and hardening techniques exploration for reliable embedded system design
