INESC-ID RCAAP Portal
15883 research outputs found
DataMaestro: A Versatile and Efficient Data Streaming Engine Bringing Decoupled Memory Access To Dataflow Accelerators
This research introduces DataMaestro, a versatile and efficient data streaming engine that brings decoupled memory access to DNN dataflow accelerators. By separating data access and computation into two independent streams, DataMaestro achieves nearly 100% PE array utilization and outperforms state-of-the-art solutions by 1.05–21.39× while minimizing area and energy consumption. The work demonstrates DataMaestro's effectiveness by integrating it with a Tensor Core-like GeMM accelerator and a quantization accelerator in a RISC-V host system, showcasing its potential to address the performance and energy challenges of DNN inference.
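The abstract does not describe DataMaestro's internal design. As a rough illustration of the general decoupled access/execute idea it refers to, the Python sketch below uses an affine address generator to fill a bounded FIFO, which a separate compute step then drains. The generator walks nested loops defined by bounds and strides, a common streamer abstraction that is assumed here and not taken from the paper. `affine_addresses`, `run_decoupled`, and `fifo_depth` are hypothetical names.

```python
from collections import deque
from itertools import product

def affine_addresses(base, bounds, strides):
    """Access stream (hypothetical): yield base + sum(i_k * stride_k) over a
    nested loop. `bounds` and `strides` are ordered outermost loop first."""
    for idx in product(*(range(b) for b in bounds)):
        yield base + sum(i * s for i, s in zip(idx, strides))

def run_decoupled(memory, base, bounds, strides, fifo_depth=4):
    """Toy decoupled access/execute loop (illustrative, not DataMaestro).

    The access side runs ahead and fills a bounded FIFO with operands; the
    execute side drains one operand per step. Returns the operands in the
    order the execute side consumed them."""
    fifo = deque()
    addrs = affine_addresses(base, bounds, strides)
    consumed = []
    exhausted = False
    while not exhausted or fifo:
        # Access side: prefetch until the FIFO is full or addresses run out.
        while not exhausted and len(fifo) < fifo_depth:
            try:
                fifo.append(memory[next(addrs)])
            except StopIteration:
                exhausted = True
        # Execute side: consume one operand per iteration, no address math.
        if fifo:
            consumed.append(fifo.popleft())
    return consumed
```

With `memory = list(range(16))` treated as a 4×4 row-major matrix, `bounds=(4, 4)` and `strides=(1, 4)` stream it in column-major order. That is a transposed read, and the compute side never issues any address arithmetic.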
A Review of Multimodal AI in Veterinary Diagnosis: Current Trends, Challenges, and Future Directions
Real-Time Orb Accelerator with ROS Integration for Embedded FPGA SoCs
This research paper proposes a novel real-time ORB (Oriented FAST and Rotated BRIEF) accelerator designed for low-power embedded systems, addressing the limitations of existing hardware implementations. The proposed system improves energy efficiency by up to 16.2× while requiring fewer hardware resources than state-of-the-art solutions. By introducing a resource-efficient architecture that exploits quantization of the feature orientation angle, the authors demonstrate the feasibility of real-time ORB acceleration in low-power embedded environments.
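The abstract does not specify how the orientation angle is quantized. For reference, standard ORB estimates keypoint orientation with the intensity centroid, θ = atan2(m01, m10). The original ORB formulation then discretizes θ into 30 bins of 12°, so that pre-rotated BRIEF sampling patterns can be fetched from a lookup table. The plain-Python sketch below shows only that generic software step. The function names and the default bin count are illustrative assumptions, not the paper's hardware architecture.

```python
import math

def intensity_centroid_angle(patch):
    """ORB-style orientation of a square patch (list of rows of intensities).
    Coordinates are measured from the patch centre; y grows downward."""
    n = len(patch)
    c = (n - 1) / 2
    m10 = m01 = 0.0
    for y, row in enumerate(patch):
        for x, v in enumerate(row):
            m10 += (x - c) * v  # first-order moment along x
            m01 += (y - c) * v  # first-order moment along y
    return math.atan2(m01, m10)

def quantize_angle(theta, bins=30):
    """Map an angle in radians to one of `bins` equal sectors of 2*pi
    (30 bins = 12 degrees, as in the original ORB lookup-table scheme)."""
    step = 2 * math.pi / bins
    return int(round((theta % (2 * math.pi)) / step)) % bins
```

Quantizing θ to a small index means only a fixed set of rotated sampling patterns has to be stored, instead of computing trigonometric rotations per keypoint.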
Stream-Driven Acceleration for Embedded RISC-V SoCs
This research paper proposes a stream-driven computational model that extends the recent stream vectorization paradigm into a full dataflow-driven computing model, exploiting spatial computation and time-multiplexing to manage data access patterns. The proposed architecture abstracts kernel loops into stream dataflow graphs and maps them onto a processing element (PE) array, leveraging both spatial and temporal parallelism across various computational tasks. Experimental results demonstrate the potential of this approach for building high-efficiency accelerators for data-intensive applications, with performance gains of up to 6× over an ARM Cortex-A53 CPU.
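The abstract does not describe the paper's graph format or mapping algorithm. The sketch below only illustrates the general idea under simple assumptions. A loop body such as `y[i] = a*x[i] + b` is written as a small dataflow graph with nodes listed in topological order, and each element of the input stream is fired through that graph once per iteration. Compute nodes are assigned round-robin to a hypothetical array of `num_pes` processing elements, so any nodes beyond the array size are time-multiplexed into later slots. `Node`, `map_to_pes`, and `run_stream` are illustrative names, and the round-robin mapping is a placeholder, not the authors' scheduler.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    op: str            # "in", "const", "add" or "mul" (hypothetical op set)
    args: tuple = ()   # names of input nodes
    value: float = 0.0 # used only by "const" nodes

def map_to_pes(nodes, num_pes):
    """Placeholder mapping: compute node k -> (PE k % P, time slot k // P)."""
    compute = [n for n in nodes if n.op in ("add", "mul")]
    return {n.name: (k % num_pes, k // num_pes) for k, n in enumerate(compute)}

def run_stream(nodes, streams):
    """Fire the graph once per stream element; `nodes` must be in topological
    order and the last node is taken as the output."""
    out = []
    length = len(next(iter(streams.values())))
    for i in range(length):
        vals = {}
        for n in nodes:
            if n.op == "in":
                vals[n.name] = streams[n.name][i]
            elif n.op == "const":
                vals[n.name] = n.value
            elif n.op == "add":
                vals[n.name] = vals[n.args[0]] + vals[n.args[1]]
            elif n.op == "mul":
                vals[n.name] = vals[n.args[0]] * vals[n.args[1]]
        out.append(vals[nodes[-1].name])
    return out
```

For example, the graph `x (in)`, `a (const 2)`, `b (const 1)`, `m = x*a`, `y = m+b` turns the stream `[0, 1, 2, 3]` into `[1.0, 3.0, 5.0, 7.0]`. On a single PE, `m` and `y` share PE 0 in slots 0 and 1. On two PEs, they run side by side in slot 0.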
A 0.5V Programmable Voltage Reference with Integrated Differential-Pair Trimming achieving 28.1 ppm/°C Temperature Coefficient
A Forward-Looking Assessment of Robotized Operation and Maintenance Practices for Offshore Wind Farms
The Federated European Genome–Phenome Archive as a global network for sharing human genomics data
Dynamic Reconfigurable FPU for Next-Generation Transprecision Computing
This research presents a novel dynamically reconfigurable Floating-Point Unit (FPU) architecture that supports all IEEE 754 data types as well as lower-precision formats such as bfloat16 and DLFloat. The proposed FPU allows operand precision to be tuned dynamically, increasing throughput through vectorization and improving energy efficiency. By dynamically adjusting operand precision, the FPU achieves a peak energy efficiency of 152 GOPS/W, demonstrating its potential to optimize hardware utilization in next-generation transprecision computing applications.
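The abstract does not detail the FPU's datapath. As a point of reference for one of the formats it mentions: bfloat16 keeps the sign bit and 8-bit exponent of IEEE 754 binary32 and shortens the fraction to 7 bits. A float32 can therefore be converted by rounding away its low 16 bits. Two bfloat16 values then fit in one 32-bit word, which is the kind of packing sub-word vectorization exploits. The Python sketch below shows this conversion with round-to-nearest-even. The function names and lane order are assumptions, inputs are assumed to be representable as float32, and NaN inputs are not handled.

```python
import struct

def f32_bits(x: float) -> int:
    """Raw binary32 bit pattern of x (x is first rounded to float32)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def to_bf16_bits(x: float) -> int:
    """Round float32 to bfloat16 (1 sign, 8 exponent, 7 fraction bits)
    using round-to-nearest-even on the discarded low 16 bits.
    NaN handling is omitted for brevity."""
    b = f32_bits(x)
    lsb = (b >> 16) & 1  # ties go to the even bfloat16 value
    return ((b + 0x7FFF + lsb) >> 16) & 0xFFFF

def from_bf16_bits(h: int) -> float:
    """Widen bfloat16 back to float32 by appending 16 zero bits."""
    return struct.unpack("<f", struct.pack("<I", (h & 0xFFFF) << 16))[0]

def pack2(a: float, b: float) -> int:
    """Two bfloat16 lanes in one 32-bit word: lane 0 (a) in the low half."""
    return (to_bf16_bits(b) << 16) | to_bf16_bits(a)
```

For example, `to_bf16_bits(1.0)` is `0x3F80`, and `1.0 + 3 * 2**-8` is an exact tie that rounds up to the even neighbour `1.015625`.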