INESC-ID RCAAP Portal
    15883 research outputs found

    DataMaestro: A Versatile and Efficient Data Streaming Engine Bringing Decoupled Memory Access To Dataflow Accelerators

    No full text
    This research introduces DataMaestro, a versatile and efficient data streaming engine that brings decoupled memory access to DNN dataflow accelerators. By separating data access and computation into two independent streams, DataMaestro achieves nearly 100% PE array utilization, outperforming state-of-the-art solutions by 1.05-21.39× while minimizing area and energy consumption. The work demonstrates DataMaestro's effectiveness by integrating it with a Tensor Core-like GeMM accelerator and a quantization accelerator in a RISC-V host system, showing its potential to address performance and energy challenges in DNN inference.

    Real-Time ORB Accelerator with ROS Integration for Embedded FPGA SoCs

    No full text
    This research paper proposes a novel real-time ORB (Oriented FAST and Rotated BRIEF) accelerator designed for low-power embedded systems, addressing the limitations of existing hardware implementations. The proposed system achieves energy efficiency improvements of up to 16.2× while requiring fewer hardware resources than state-of-the-art solutions. By introducing a resource-efficient architecture that exploits quantization of the feature orientation angle, the authors demonstrate the feasibility of real-time ORB acceleration in low-power embedded environments.

    Stream-Driven Acceleration for Embedded RISC-V SoCs

    No full text
    This research paper proposes a stream-driven computational model that expands the recent stream vectorization paradigm into a full dataflow-driven computing model, exploiting spatial computation and time-multiplexing to manage data access patterns. The proposed architecture abstracts kernel loops into stream data-flow graphs and maps them onto a processing element array, leveraging both spatial and temporal parallelism across various computational tasks. Experimental results demonstrate the potential of this approach to develop high-efficiency accelerators in data-intensive applications, achieving performance gains of up to 6× compared with an ARM Cortex-A53 CPU.

    Dynamic Reconfigurable FPU for Next-Generation Transprecision Computing

    No full text
    This research presents a novel dynamically reconfigurable Floating-Point Unit (FPU) architecture that supports all IEEE 754 data types and lower-precision formats such as bfloat16 and DLFloat. The proposed FPU enables dynamic precision tuning of operands, allowing for increased throughput through vectorization and improved energy efficiency. By dynamically adjusting operand precision, the FPU achieves a peak energy efficiency of 152 GOPS/W, demonstrating its potential to optimize hardware utilization in next-generation transprecision computing applications.

    0 full texts
    15,883 metadata records
    Updated in last 30 days.