1,721,052 research outputs found
Modeling DVFS and Power-Gating Actuators for Cycle-Accurate NoC-Based Simulators
Networks-on-chip (NoCs) are a widely recognized viable interconnection paradigm to support the multi-core revolution. One of the major design issues of multicore architectures is still the power, which can no longer be considered mainly due to the cores, since the NoC contribution to the overall energy budget is relevant. To face both static and dynamic power while balancing NoC performance, different actuators have been exploited in literature, mainly dynamic voltage frequency scaling (DVFS) and power gating. Typically, simulation-based tools are employed to explore the huge design space by adopting simplified models of the components. As a consequence, the majority of state-of-the-art on NoC power-performance optimization do not accurately consider timing and power overheads of actuators, or (even worse) do not consider them at all, with the risk of overestimating the benefits of the proposed methodologies. This article presents a simulation framework for power-performance analysis of multicore architectures with specific focus on the NoC. It integrates accurate power gating and DVFS models encompassing also their timing and power overheads. The value added of our proposal is manyfold: (i) DVFS and power gating actuators are modeled starting from SPICE-level simulations; (ii) such models have been integrated in the simulation environment; (iii) policy analysis support is plugged into the framework to enable assessment of different policies; (iv) a flexible GALS (globally asynchronous locally synchronous) support is provided, covering both handshake and FIFO re-synchronization schemas. To demonstrate both the flexibility and extensibility of our proposal, two simple policies exploiting the modeled actuators are discussed in the article
HLS-based acceleration of the BIKE post-quantum KEM on embedded-class heterogeneous SoCs
An effective transition to post-quantum cryptography mandates its deployment on embedded-class devices, guaranteeing adequate performance while satisfying their strict area constraints. This work accelerates BIKE, a QC-MDPC code-based post-quantum KEM, through HLS on embedded-class heterogeneous SoCs that couple a CPU with FPGA programmable logic. The proposed methodology implements HLS-generated accelerators to compute the most time-consuming operations of BIKE, identified by analyzing the software-only execution. The mix of accelerators instantiated in hardware and operations executed in software, as well as the configurable architectural parameters of the former, are then determined, depending on the resources available on the target SoC, to minimize BIKE’s execution time. Experiments on AMD Zynq-7000 SoCs highlight a speedup of up to 3.34 times compared to the reference software execution and up to 1.98 times over state-of-the-art HW/SW implementations targeting the same chips
Hound: Locating Cryptographic Primitives in Desynchronized Side-Channel Traces using Deep-Learning
Blink: Fast Automated Design of Run-Time Power Monitors on FPGA-Based Computing Platforms
The current over-provisioned heterogeneous multicores require effective run-time optimization strategies, and the run-time power monitoring subsystem is paramount for their success. Several state-of-the-art methodologies address the design of a run-time power monitoring infrastructure for generic computing platforms. However, the power model's training requires time-consuming gate-level simulations that, coupled with the everincreasing complexity of the modern heterogeneous platforms, dramatically hinder the usability of such solutions. This paper introduces Blink, a scalable framework for the fast and automated design of run-time power monitoring infrastructures targeting computing platforms implemented on FPGA. Blink optimizes the time-to-solution to deliver the run-time power monitoring infrastructure by replacing traditional methodologies' gate-level simulations and power trace computations with behavioral simulations and direct power trace measurements. Applying Blink to multiple designs mixing a set of HLS-generated accelerators from a state-of-the-art benchmark suite demonstrates an average time-to-solution speedup of 18 times without affecting the quality of the run-time power estimates
An FPGA-Based Open-Source Hardware-Software Framework for Side-Channel Security Research
Attacks based on side-channel analysis (SCA) pose a severe security threat to modern computing platforms, further exacerbated on IoT devices by their pervasiveness and handling of private and critical data. Designing SCA-resistant computing platforms requires a significant additional effort in the early stages of the IoT devices’ life cycle, which is severely constrained by strict time-to-market deadlines and tight budgets. This manuscript introduces a hardware-software framework meant for SCA research on FPGA targets. It delivers an IoT-class system-on-chip (SoC) that includes a RISC-V CPU, provides observability and controllability through an ad-hoc debug infrastructure to facilitate SCA attacks and evaluate the platform's security, and streamlines the deployment of SCA countermeasures through dedicated hardware and software features such as a DFS actuator and FreeRTOS support. The open-source release of the framework includes the SoC, the scripts to configure the computing platform, compile a target application, and assess the SCA security, as well as a suite of state-of-the-art attacks and countermeasures. The goal is to foster its adoption and novel developments in the field, empowering designers and researchers to focus on studying SCA countermeasures and attacks while relying on a sound and stable hardware-software platform as the foundation for their research
A Prototype-Based Framework to Design Scalable Heterogeneous SoCs with Fine-Grained DFS
Frameworks for the agile development of modern system-on-chips are crucial to dealing with the complexity of designing such architectures. The open-source Vespa framework for designing large, FPGA-based, multi-core heterogeneous system-on-chips enables a faster and more flexible design space exploration of such architectures and their run-time optimization. Vespa, built on ESP, introduces the capabilities to instantiate multiple replicas of the same accelerator in a single network-on-chip node and to partition the system-on-chips into frequency islands with independent dynamic frequency scaling actuators, as well as a dedicated run-time monitoring infrastructure. Experiments on 4-by-4 tile-based system-on-chips demonstrate the possibility of effectively exploring a multitude of solutions that differ in the replication of accelerators, the clock frequencies of the frequency islands, and the tiles' placement, as well as monitoring a variety of statistics related to the traffic on the interconnect and the accelerators' performance at run time
A DVFS Cycle Accurate Simulation Framework with Asynchronous NoC Design for Power-Performance Optimizations
Network-on-Chip (NoC) is a flexible and scalable solution to interconnect multi-cores, with a strong influence on the performance of the whole chip. On-chip network affects also the overall power consumption, thus requiring accurate early-stage estimation and optimization methodologies. In this scenario, the Dynamic Voltage Frequency Scaling (DVFS) technique have been proposed both for CPUs and NoCs. The promise is to be a flexible and scalable way to jointly optimize power-performance, addressing both static and dynamic power sources. Being simulation a de-facto prime solution to explore novel multi-core architectures, a reliable full system analysis requires to integrate in the toolchain accurate timing and power models for the DVFS block and for the resynchronization logic between different Voltage and Frequency Islands (VFIs). In such a way, a more accurate validation of novel optimization methodologies which exploit such actuator is possible, since both architectural and actuator overheads are considered at the same time. This work proposes a complete cycle accurate framework for multi-core design supporting Global Asynchronous Local Synchronous (GALS) NoC design and DVFS actuators for the NoC. Furthermore, static and dynamic frequency assignment is possible with or without the use of the voltage regulator. The proposed framework sits on accurate analytical timing model and SPICE-based power measures, providing accurate estimates of both timing and power overheads of the power control mechanisms
All-digital control-theoretic scheme to optimize energy budget and allocation in multi-cores
The Internet-of-Things (IoT) revolution fueled new challenges and opportunities to achieve computational efficiency goals. Embedded devices are required to execute multiple applications for which a suitable distribution of the computing power must be adapted at run-time. Such complex hardware platforms have to sustain the continuous acquisition and processing of data under severe energy budget constraints, since most of them are battery powered. The state-of-the-art offers several ad-hoc contributions to selectively optimize the performance considering aspects like energy, power, thermal or reliability. However, there is a need for a generic coordinated management strategy able to cope with all of these dimensions, while allowing the Operating System (OS) and the applications to “suggest” or constrain the actuation. This paper proposes a unified control-theoretic scheme to coordinate the design of energy-budget and energy allocation solutions for multi-cores. The proposed controller can work with any actuator and it can interact, at run-time, with both the applications and the OS to optimize the actuation signals steering the computing platform. Such control scheme offers the possibility to integrate any performance related policy in the form of an energy-allocation strategy, still ensuring the theoretic exponential stability of the overall controller if the actuation of the policy, coming from the OS and the applications, “is not too fast”. To demonstrate the feasibility of our solution, we implemented the controller into a RISC multi-core running on the Xilinx Artix 100t FPGA device, available in the the Digilent Nexys4-DDR board. Results considering two actuators and both the quad- and the eight-core version of the considered computing platform, highlight the scalability of the proposed solution as well as an area overhead for the –all digital, on chip– controller limited to 0.86% (FFs) and 5.3% (LUTs) of the FPGA chip. We also considered a dynamic scenario validating the speed of the controller, where our framework has to face with modifications to the energy-allocation control policy carried out by the OS and the applications. The obtained results are collected by executing a huge mix of benchmarks and the statistical significance is accounted by executing each scenario 30 times. Such results are analyzed considering three quality metrics. First, the efficiency in exploiting the imposed budget (EFFg) that is on average 98.27%. Second, the overflow of the actual average power consumption with respect to the assigned budget (OV Fg), which is limited to 1.43 mW on average. Last, the performance utility loss due to the control scheme that is limited to 1.87% on average
A Control-based Methodology for Power-performance Optimization in NoCs Exploiting DVFS
Networks-on-Chip (NoCs) are considered a viable solution to fully exploit the computational power of multi- and many-cores, but their non negligible power consumption requires ad hoc power-performance design methodologies. In this perspective, several proposals exploited the possibility to dynamically tune voltage and frequency for the interconnect, taking steps from traditional CPU-based power management solutions. However, the impact of the actuators, i.e. the limited range of frequencies for a PLL (Phase Locked Loop) or the time to increase voltage and frequency for a Dynamic Voltage and Frequency Scaling (DVFS) modules, are often not carefully accounted for, thus overestimating the benefits. This paper presents a control-based methodology for the NoC power-performance optimization exploiting the Dynamic Frequency Scaling (DFS). Both timing and power overheads of the actuators are considered, thanks to an ad hoc simulation framework. Moreover the proposed methodology eventually allows for user and/or OS interactions to change between different high level power-performance modes, i.e. to trigger performance oriented or power saving system behaviors. Experimental validation considered a 16-core architecture comparing our proposal with different settings of threshold-based policies. We achieved a speedup up to 3 for the timing and a reduction up to 33.17% of the power ∗ time product against the best threshold-based policy. Moreover, our best control-based scheme provides an averaged power-performance product improvement of 16.50% and 34.79% against the best and the second considered threshold-based policy setting
- …
