System-Level Optimization of Accelerator Local Memory for Heterogeneous Systems-on-Chip
In modern system-on-chip architectures, specialized accelerators are increasingly used to improve performance and energy efficiency. The growing complexity of these systems requires the use of system-level design methodologies featuring high-level synthesis (HLS) for generating these components efficiently. Existing HLS tools, however, have limited support for the system-level optimization of memory elements, which typically occupy most of the accelerator area. We present a complete methodology for designing the private local memories (PLMs) of multiple accelerators. Based on the memory requirements of each accelerator, our methodology automatically determines an area-efficient architecture for the PLMs to guarantee performance and reduce the memory cost based on technology-related information. We implemented a prototype tool, called Mnemosyne, that embodies our methodology within a commercial HLS flow. We designed 13 complex accelerators for selected applications from two recently-released benchmark suites (Perfect and CortexSuite). With our approach we are able to reduce the memory cost of single accelerators by up to 45%. Moreover, when reusing memory IPs across accelerators, we achieve area savings that range between 17% and 55% compared to the case where the PLMs are designed separately
Accelerators for Breast Cancer Detection
Algorithms used in microwave imaging for breast cancer detection require hardware acceleration to speed up execution and reduce power consumption. In this paper we present the hardware implementation of two accelerators for two alternative imaging algorithms that we obtain entirely from SystemC specifications via high-level synthesis (HLS). The two algorithms present opposite characteristics that stress the design process and the capabilities of commercial HLS tools in different ways: the first is communication-bound and requires overlapping and pipelining of communication and computation in order to maximize the application throughput; the second is computation-bound and uses complex mathematical functions that HLS tools do not directly support. Despite these difficulties, thanks to HLS we were able, in only four months, to explore a large design space and derive about one hundred implementations with different cost-performance profiles, targeting both an FPGA platform and a 32-nm standard-cell ASIC library. In addition, we could obtain results that outperform a previous RTL implementation, which confirms the remarkable progress of HLS tools
Acceleration of Microwave Imaging Algorithms for Breast Cancer Detection via High-Level Synthesis
We present the system-level design of two accelerators for two microwave imaging algorithms for breast cancer detection. The accelerators were designed in SystemC and optimized via High-Level Synthesis (HLS). The two algorithms stress the capabilities of commercial HLS tools in different ways: the first is communication-bound and requires careful pipelining of communication and computation; the second is computation-bound and requires the implementation of mathematical functions that are not properly supported by HLS tools. Still, in the span of four months we were able to design and validate about one hundred alternative implementations, targeting a Zynq SoC platform. Furthermore, we were pleased to obtain results that are superior to a previous RTL implementation, which confirms the remarkable progress of HLS tools
System-level memory optimization for high-level synthesis of component-based SoCs
The design of specialized accelerators is essential to the success of many modern Systems-on-Chip. Electronic system-level design methodologies and high-level synthesis tools are critical for the efficient design and optimization of an accelerator. Still, these methodologies and tools offer only limited support for the optimization of the memory structures, which are often responsible for most of the area occupied by an accelerator. To address these limitations, we present a novel methodology to automatically derive the memory subsystems of SoC accelerators. Our approach enables compositional design-space exploration and promotes design reuse of the accelerator specifications. We illustrate its effectiveness by presenting experimental results on the design of two accelerators for a high-performance embedded application. Copyright 2014 ACM
Enhanced Machine-Learning Flow for Microwave-Sensing Systems for Contaminant Detection in Food
Combining data-driven machine learning (ML) with microwave sensing (MWS) makes it possible to analyze packaged food in real time without any contact and spot low-density contaminants (e.g., plastics or glass splinters) that current industrial food safety methods cannot detect. This is achieved by training ML classifiers on the scattered signal reflected by the target food product exposed to MWs. In this article, we present an enhanced ML flow to boost the foreign body detection accuracy of ML classifiers in MWS systems. Innovations include assessing the performance of a multiclass classifier, training it with MW frequency pairs as features, data augmentation, a new preprocessing scaler suitable for the feature distributions in the datasets, quantization, and pruning. The final test results, obtained using our previously designed MWS system and collected dataset of contaminated hazelnut-cocoa spread jars, show a multiclass accuracy for the floating-point model of 96.5%, which corresponds to a binary-equivalent accuracy of 97.3%. This is an improvement of +3.3% over the binary classifier of the previous work. The quantized and pruned model, in turn, reached a multiclass accuracy of 94.2%, or a binary accuracy of 95.4%, thus still improving on the previous work by +1.4%. Also, we achieved a latency of 26 µs on an AMD/Xilinx Kria K26 field-programmable gate array (FPGA), a result which is ideal for high-throughput food production lines. Furthermore, we expand our latest work with supplementary details and experiments to further validate the proposed ML flow, including a comparative analysis against our prior results. Lastly, we share our datasets publicly on OpenML
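The relation between the multiclass and binary-equivalent accuracies quoted above can be illustrated with a minimal sketch. The abstract does not define "binary-equivalent", so the code assumes it means collapsing all contaminant classes into a single "contaminated" class before scoring; the function names, the `clean_label` parameter, and the toy labels are hypothetical and are not taken from the article.

```python
def multiclass_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the true class exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_equivalent_accuracy(y_true, y_pred, clean_label=0):
    """Fraction of samples correctly labeled as clean vs. contaminated.

    Assumption: every label other than `clean_label` is some contaminant
    class, and confusing one contaminant with another still counts as a hit.
    """
    hits = sum(
        (t == clean_label) == (p == clean_label)
        for t, p in zip(y_true, y_pred)
    )
    return hits / len(y_true)


# Toy labels (made up): 0 = clean, 1 and 2 = two contaminant types.
y_true = [0, 0, 1, 2, 2, 1]
y_pred = [0, 1, 1, 1, 2, 0]

multi = multiclass_accuracy(y_true, y_pred)
binary = binary_equivalent_accuracy(y_true, y_pred)
```

Under this assumption the binary-equivalent score can never be lower than the multiclass score, because every exact match is also a clean-versus-contaminated match.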
A design methodology for compositional high-level synthesis of communication-centric SoCs
Systems-on-chip are increasingly designed at the system level by combining synthesizable IP components that operate concurrently while interacting through communication channels. CAD-tool vendors support this System-Level Design approach with high-level synthesis tools and libraries of interface primitives implementing the communication protocols. These interfaces absorb timing differences in the hardware-component implementations, thus enabling compositional design. However, they also introduce new challenges in terms of functional correctness and performance optimization. We propose a methodology that combines performance analysis and optimization algorithms to automatically address the issues that SoC designers may accidentally introduce when assembling components that are specified at the system level. Copyright 2014 ACM
Handling large data sets for high-performance embedded applications in heterogeneous systems-on-chip
Local memory is a key factor for the performance of accelerators in SoCs. Despite technology scaling, the gap between on-chip storage and memory footprint of embedded applications keeps widening. We present a solution to preserve the speedup of accelerators when scaling from small to large data sets. Combining specialized DMA and address translation with a software layer in Linux, our design is transparent to user applications and broadly applicable to any class of SoCs hosting high-throughput accelerators. We demonstrate the robustness of our design across many heterogeneous workload scenarios and memory allocation policies with FPGA-based SoC prototypes featuring twelve concurrent accelerators accessing up to 768MB out of 1GB-addressable DRAM
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed
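The difference between the two counting schemes can be sketched in a few lines. This is an illustrative reading, not the study's actual procedure: it assumes two authors are co-cited once for each citing paper whose reference list names both of them, and that authors of the same cited work also count as a co-cited pair. The function name, the `max_authors` parameter, and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count author co-citations over a collection of citing papers.

    citing_papers: list of reference lists; each reference is the ordered
                   list of author names of one cited work.
    max_authors:   how many leading authors of each cited work to count
                   (1 = first-author counting, 5 = first-five counting).
    """
    counts = Counter()
    for references in citing_papers:
        # Collect the authors this paper cites, limited to the leading
        # `max_authors` names of every cited work.
        cited = set()
        for authors in references:
            cited.update(authors[:max_authors])
        # Every unordered pair of cited authors is co-cited once per paper.
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return counts


# Two citing papers with made-up reference lists.
papers = [
    [["Smith", "Lee"], ["Garcia", "Chen"]],
    [["Smith", "Lee"], ["Chen"]],
]

first_author = cocitation_counts(papers, max_authors=1)
first_five = cocitation_counts(papers, max_authors=5)
```

With first-author counting, Lee never appears in any pair; with first-five counting, the pair (Lee, Smith) is counted in both citing papers, which shows how the non-traditional scheme surfaces authors and links that first-author counting omits.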
On the design of scalable and reusable accelerators for big data applications
Accelerators are becoming key elements of computing platforms for both data centers and mobile devices as they deliver energy-efficient high performance for key computational kernels. However, the design and integration of such components is complex, especially for Big Data applications, which have very large workloads to process. Properly customizing the accelerators' private local memories (PLMs) is of critical importance. To analyze this problem we design an accelerator for Collaborative Filtering by applying a system-level design methodology that allows us to synthesize many alternative micro-architectures as we vary the PLM sizes. We then evaluate the resulting accelerators in terms of resource requirements for both embedded architectures and data centers as we vary the size and density of the workloads
