
    Dataset supporting "Fused: Closed-Loop Performance and Energy Simulation of Embedded Systems"

    This dataset supports the article entitled "Fused: Closed-Loop Performance and Energy Simulation of Embedded Systems", accepted for publication in the 2020 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS '20).

    Complementary dataset to "BRB: Mitigating Branch Predictor Side-Channels."

    This dataset supports the article "BRB: Mitigating Branch Predictor Side-Channels", accepted for publication at HPCA '19, the 25th International Symposium on High-Performance Computer Architecture.

    Dataset supporting the journal article "Pragmatic Memory-System Support for Intermittent Computing using Emerging Non-Volatile Memory"

    Sivert T. Sliper, William Wang, Nikos Nikoleris, (2022) Pragmatic Memory-System Support for Intermittent Computing using Emerging Non-Volatile Memory. (Accepted/In press) In: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. 14 p. All files are in CSV or ODS format, both of which can be opened in spreadsheet programs such as LibreOffice Calc or proprietary alternatives such as Microsoft Excel.

    Pragmatic memory-system support for intermittent computing using emerging non-volatile memory

    Intermittent computing (IC) is a key enabler for the vision of a trillion Internet of Things devices. By harvesting energy from the environment, and leveraging non-volatile memory (NVM) to retain computational progress across power cycles, IC enables untethered and battery-free devices to perform computation whenever ambient energy is available. The backbone of state retention is NVM, and recent advances in energy-efficient NVM have the potential to expand the application domain of IC significantly. Utilizing emerging NVM at the level of bitcells, researchers have proposed non-volatile processors. However, these do not leverage hardware-software co-design, which can be used to overcome hardware limitations and to provide support for application-level constraints such as atomicity. In this paper, we propose MEMIC, a memory architecture tailored for IC devices with byte-addressable NVM. A core focus of MEMIC is to combine volatile and non-volatile memory in such a way that the operations of IC are as efficient as possible, while also maximizing computational performance per joule. MEMIC uses volatile memory for energy efficiency, and non-volatile memory for data retention. To avoid double-buffered checkpoints and costly roll-backs when code needs to be re-executed, MEMIC is designed to track and minimize writes to non-volatile memory during failure-atomic sections. Our evaluation shows that MEMIC's instruction cache reduces workload completion time under intermittent operation by 41-70% and its data cache provides a further reduction of 13-39%.

    Dataset supporting "Efficient State Retention through Paged Memory Management for Reactive Transient Computing"

    This dataset supports the article entitled "Efficient State Retention through Paged Memory Management for Reactive Transient Computing", accepted for publication in the 56th ACM/IEEE Design Automation Conference, DAC 2019.

    BRB: mitigating branch predictor side-channels

    Modern processors use branch prediction as an optimization to improve processor performance. Predictors have become larger and increasingly sophisticated in order to achieve the higher accuracies needed in high-performance cores. However, branch prediction can also be a source of side-channel exploits, as one context can deliberately change the branch predictor state and alter the instruction flow of another context. Current mitigation techniques either sacrifice performance for security, or fail to guarantee isolation when retaining the accuracy. Achieving both has proven to be challenging. In this work we address this by (1) introducing the notions of steady-state and transient branch predictor accuracy, and (2) showing that current predictors increase their misprediction rate by as much as 90% on average when forced to flush branch prediction state to remain secure. To solve this, (3) we introduce the branch retention buffer, a novel mechanism that partitions only the most useful branch predictor components to isolate separate contexts. Our mechanism makes thread isolation practical, as it stops the predictor from executing cold, with little if any added area and no warm-up overheads. At the same time, our results show that, compared to the state-of-the-art, average misprediction rates are reduced by 15-20% without increasing area, leading to a 2% performance increase.

    Fused: closed-loop performance and energy simulation of embedded systems

    Energy-driven computing is an emerging paradigm that aims to fuel the proliferation of tiny and low-cost IoT sensing and monitoring devices. Energy-driven computers are generally powered by energy harvesting sources, and adapt their operation at runtime according to energy availability; thus, they must be designed and tested according to the expected dynamics of their power source. However, today's processor simulators and debuggers typically assume that power is always available, so they are unable to correctly model the interactions between power supply, power consumption and energy-driven execution. To address this shortcoming, we propose Fused, an open source full-system simulator for energy-driven computers. Fused models execution, power consumption, and power supply in a closed loop, and thus correctly models the interaction between them. It targets energy-driven embedded systems, and employs SystemC for digital and mixed-signal simulation to model a microcontroller and mixed-signal circuitry, enabling hardware-software codesign and design space exploration. Fused includes a high-level power modelling methodology, whereby events recorded during simulation are correlated to power measurements of real hardware to extract features for power modelling. Results show that Fused can model the execution time and power consumption of a commercially available microcontroller with a geometric mean error of 0.2% and 3.4% respectively, across a wide range of workloads. Through a case study, we demonstrate that Fused can accurately model a state-of-the-art intermittent computing system, where execution is heavily dependent on energy availability: although up to 70 power cycles were needed to complete the tested workload on the constrained energy supply, Fused modelled the completion time with less than 7% error.

    Efficient state retention through paged memory management for reactive transient computing

    Reactive transient computing systems preserve computational progress despite frequent power failures by suspending (saving state to non-volatile memory) when detecting a power failure, and restoring once power returns. Existing methods inefficiently save and restore all allocated memory. We propose lightweight memory management that applies the concept of paging to load pages only when needed, and save only modified pages. We then develop a model that maximises available execution time by dynamically adjusting the suspend and restore voltage thresholds. Experiments on an MSP430FR5994 microcontroller show that our method reduces state retention overheads by up to 86.9% and executes algorithms up to 5.3 times faster than the state-of-the-art.
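
    The paging idea in the abstract above (load pages only when needed, save only modified pages) can be illustrated with a toy model. This is a minimal sketch and not the authors' implementation: the class `PagedMemory`, the constant `PAGE_SIZE`, and the write counter are all hypothetical names introduced here, and the voltage-threshold model is not covered.

```python
PAGE_SIZE = 4  # words per page; a hypothetical, deliberately tiny value


class PagedMemory:
    """Toy model: volatile working pages backed by non-volatile storage."""

    def __init__(self, num_pages):
        # Non-volatile backing store: survives a simulated power failure.
        self.nvm = [[0] * PAGE_SIZE for _ in range(num_pages)]
        self.resident = {}        # page index -> volatile copy of the page
        self.dirty = set()        # pages modified since the last suspend
        self.nvm_page_writes = 0  # counts pages written back to NVM

    def _page(self, addr):
        index = addr // PAGE_SIZE
        if index not in self.resident:
            # Load on demand: only pages that are touched become resident.
            self.resident[index] = list(self.nvm[index])
        return index

    def read(self, addr):
        index = self._page(addr)
        return self.resident[index][addr % PAGE_SIZE]

    def write(self, addr, value):
        index = self._page(addr)
        self.resident[index][addr % PAGE_SIZE] = value
        self.dirty.add(index)

    def suspend(self):
        # Save only modified pages; clean pages already match NVM.
        for index in sorted(self.dirty):
            self.nvm[index] = list(self.resident[index])
            self.nvm_page_writes += 1
        self.dirty.clear()

    def power_failure(self):
        # All volatile state is lost; NVM contents remain.
        self.resident.clear()
        self.dirty.clear()


mem = PagedMemory(num_pages=8)
mem.write(5, 42)   # touches and dirties page 1
mem.read(13)       # touches page 3 but leaves it clean
mem.suspend()
mem.power_failure()
```

    In this sketch, suspending after one write and one read to different pages writes back a single page rather than all eight, which is the saving the abstract attributes to tracking modified pages.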

    Efficient Memory Modeling During Simulation and Native Execution

    Application performance on computer processors depends on a number of complex architectural and microarchitectural design decisions. Consequently, computer architects rely on performance modeling to improve future processors without building prototypes. This thesis focuses on performance modeling and proposes methods that quantify the impact of the memory system on application performance. Detailed architectural simulation, a common approach to performance modeling, can be five orders of magnitude slower than execution on the actual processor. At this rate, simulating realistic workloads requires years of CPU time. Prior research uses sampling to speed up simulation. Using sampled simulation, only a number of small but representative portions of the workload are evaluated in detail. To fully exploit the speed potential of sampled simulation, the simulation method has to efficiently reconstruct the architectural and microarchitectural state prior to the simulation samples. Practical approaches to sampled simulation use either functional simulation at the expense of performance or checkpoints at the expense of flexibility. This thesis proposes three approaches that use statistical cache modeling to efficiently address the problem of cache warm up and speed up sampled simulation, without compromising flexibility. The statistical cache model uses sparse memory reuse information obtained with native techniques to model the performance of the cache. The proposed sampled simulation framework evaluates workloads 150 times faster than approaches that use functional simulation to warm up the cache. Other approaches to performance modeling use analytical models based on data obtained from execution on native hardware. These native techniques allow for better understanding of the performance bottlenecks on existing hardware. Efficient resource utilization in modern multicore processors is necessary to exploit their peak performance. 
    This thesis proposes native methods that characterize shared resource utilization in modern multicores. These methods quantify the impact of cache sharing and off-chip memory sharing on overall application performance. Additionally, they can quantify scalability bottlenecks for data-parallel, symmetric workloads.
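
    To give a sense of how memory reuse information relates to cache performance, the sketch below computes the miss ratio of a fully associative LRU cache from the number of distinct cache lines touched between two uses of the same line. It is an exact, unsampled calculation for illustration only and is not the statistical cache model the thesis proposes, which works from sparse reuse samples. The function name `lru_miss_ratio` and the trace format are assumptions made here.

```python
def lru_miss_ratio(trace, cache_lines):
    """Miss ratio of a fully associative LRU cache holding `cache_lines` lines.

    `trace` is a sequence of cache-line identifiers in access order.
    """
    if not trace:
        return 0.0
    stack = []  # LRU stack, most recently used line at the end
    misses = 0
    for line in trace:
        if line in stack:
            # Distinct lines accessed since the previous use of `line`.
            distance = len(stack) - 1 - stack.index(line)
            if distance >= cache_lines:
                misses += 1  # line would have been evicted before reuse
            stack.remove(line)
        else:
            misses += 1  # first access: cold miss
        stack.append(line)
    return misses / len(trace)


cyclic = ["a", "b", "c", "a", "b", "c"]
small_cache = lru_miss_ratio(cyclic, cache_lines=2)
large_cache = lru_miss_ratio(cyclic, cache_lines=3)
```

    For the cyclic trace above, a two-line cache misses on every access, while a three-line cache misses only on the three cold accesses. A single pass over the reuse information therefore answers the question for any cache size.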