    Flexible Precision Vector Extension for Energy Efficient Coarse-Grained Reconfigurable Array AI-Engine

    The rapid development of Artificial Intelligence (AI) algorithms has created a need for resource-optimised hardware accelerators. Among various platforms, Coarse-Grained Reconfigurable Arrays (CGRAs) have gained importance as on-edge accelerators. They comprise a heterogeneous matrix of Processing Elements (PEs), which allows for high flexibility and parallelisation of calculations, and are mainly used to speed up Data Flow Graph (DFG) execution. We aim to provide a general-purpose, highly parameterised, and flexible architecture for AI on-edge data crunching. We propose a CGRA with a vector extension that allows the precision of calculations to be adjusted dynamically while maintaining the desired performance-power-area trade-off. It targets 4-bit integer (INT4) and 8-bit integer (INT8) quantization for fast and efficient Neural Network (NN) processing. In this paper, we examine the hardware cost of supporting the vector extension functionality. We synthesised the design in TSMC 40 nm standard-cell technology. The results show that the proposed extension achieves, on average, a 28.2% decrease in power consumption and a 21.6% decrease in area compared with a reference design of the same computational power.
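    The abstract does not describe the datapath of the vector extension. As an illustration only, the Python sketch below shows the general sub-word idea behind flexible INT4/INT8 precision, under our own assumption of an 8-bit lane: in INT8 mode the lane holds one signed operand, in INT4 mode it holds two signed 4-bit operands, so one multiply-accumulate pass over the same storage processes twice as many values. The lane width, nibble order, and the names `pack_int4`, `unpack_lane`, and `packed_dot` are hypothetical and not taken from the paper.

```python
from typing import List

# Hypothetical model: 8-bit lanes, INT4 pairs packed low nibble first.
# This is an illustration of sub-word parallelism, not the paper's design.


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def pack_int4(lo: int, hi: int) -> int:
    """Pack two signed 4-bit values (-8..7) into one byte, low nibble first."""
    if not (-8 <= lo <= 7 and -8 <= hi <= 7):
        raise ValueError("INT4 operands must be in -8..7")
    return (lo & 0xF) | ((hi & 0xF) << 4)


def unpack_lane(byte: int, mode: str) -> List[int]:
    """Split one 8-bit lane into its operands: one INT8, or two INT4 (low nibble first)."""
    if mode == "int8":
        return [sign_extend(byte, 8)]
    if mode == "int4":
        return [sign_extend(byte, 4), sign_extend(byte >> 4, 4)]
    raise ValueError(f"unsupported mode: {mode}")


def packed_dot(a: List[int], b: List[int], mode: str) -> int:
    """Multiply-accumulate over packed 8-bit lanes; INT4 mode doubles the operand count."""
    acc = 0
    for byte_a, byte_b in zip(a, b):
        for x, y in zip(unpack_lane(byte_a, mode), unpack_lane(byte_b, mode)):
            acc += x * y
    return acc
```

    For example, `packed_dot([pack_int4(3, -2), pack_int4(7, -8)], [pack_int4(1, 5), pack_int4(-1, 2)], "int4")` evaluates 3*1 + (-2)*5 + 7*(-1) + (-8)*2 = -30 from just two bytes per operand vector. The sketch models only the arithmetic; how the proposed CGRA switches its functional units between the two precisions is not described in the abstract.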

    Exploring Key Aspects of Soft GPGPU Computing for On-board Acceleration of Artificial Intelligence Algorithms in Space Applications

    Artificial Intelligence has gained widespread adoption across industrial sectors as a versatile tool for a diverse range of tasks, from image classification and traffic forecasting to natural language processing and speech recognition. In the space domain, however, particular attention must be paid to area overhead, power consumption, and fault tolerance. In this scenario, soft General-Purpose Computing on Graphics Processing Units (GPGPU) has the potential to revolutionise space-related activities. By combining Field-Programmable Gate Array (FPGA) technology with Graphics Processing Unit (GPU) computing, it becomes feasible to achieve high performance without compromising either power consumption or radiation tolerance. Moreover, reconfigurable hardware can accelerate a wide range of Machine Learning algorithms while avoiding the drawbacks of excessive specialisation. This paper surveys the state of the art in hardware platforms for on-the-edge acceleration of Artificial Intelligence algorithms and compares it with a possible System-on-Chip implementation based on a soft GPU. The focus then shifts to key aspects for future space missions, such as reliability and Dynamic Partial Reconfiguration. We point out the lack of European technological solutions and emphasise the promising potential of NanoXplore devices. We also discuss the importance of fault detection and mitigation techniques in space applications, covering the most commonly employed hardware methods for reliability enhancement and highlighting the lack of such work for General-Purpose Computing on Graphics Processing Units, especially in the space sector. Furthermore, we briefly examine the implementation of Dynamic Partial Reconfiguration on a System-on-Chip featuring a soft GPU IP core. Finally, we outline future developments of the project and conclude the work.

    SystemVerilog UVM-based Verification Environment for a SpaceFibre Router

    The number of space missions has grown steadily in recent years, and satellite communication traffic and onboard spacecraft technologies have grown accordingly. To handle high data rates over high-bandwidth links, the European Cooperation for Space Standardization released the SpaceFibre protocol in 2019, whose data-handling features and routing capabilities increase the complexity of the infrastructure in such networks. Consequently, a crucial step towards realising a high-speed satellite data-handling network is the design and verification of a routing switch at different levels. At the network layer, however, both the literature and the market lack verification environments for this type of device. This work proposes a reusable and customisable network-level verification environment for SpaceFibre routing switches that leverages SystemVerilog and the Universal Verification Methodology (UVM) standard. In particular, the environment can be connected to any network topology and can verify any router individually. Furthermore, it only requires network-level compatibility with the hardware interfaces, making it independent of the implemented data layer.

    Inference and Evaluation of Deep Convolutional Neural Networks on Microchip's Hardware Accelerator VectorBlox

    The exponential growth of artificial intelligence in recent years has created new opportunities across various industries, including the space sector. Convolutional Neural Networks (CNNs) have played a pivotal role in shaping modern AI advances. As CNNs become more complex, efficient and automated toolflows are needed to deploy them. FPGA-based solutions offer a promising avenue for acceleration thanks to their balance of performance, power efficiency, and programmability. This is even more relevant for space applications, where radiation-tolerant FPGAs can support AI tasks despite the unique challenges of the space environment. Toolflows automate complex design tasks, substantially reducing design complexity and effort. Within this context, this paper explores CNN-to-FPGA toolflows, with particular emphasis on VectorBlox, a toolflow developed by Microchip. The study compares VectorBlox with similar toolflows, evaluating key metrics to assess its performance, adaptability, and efficiency. Tests were then conducted to evaluate VectorBlox's performance. First, the inference times of some of the most common convolutional and fully connected neural network patterns are evaluated. Next, a comprehensive analysis of the inference of four NNs is reported, three of which were created by transfer learning from well-known deep CNN architectures (MobileNet, ResNet, and Inception) and trained on the EuroSAT dataset from ESA's Sentinel-2 mission. The concluding section presents a summary of the major limitations encountered when trying to run inference on unsupported NNs.

    Exploiting FPGA Dynamic Partial Reconfiguration for a Soft GPU-based System-on-Chip

    For many years, General-Purpose Computing on Graphics Processing Units has been widely exploited in different fields of application. The hardware architectures enabling this kind of computation are increasingly complex, and their use in on-the-edge applications is often constrained by the limited resources of the systems involved. Implementing Graphics Processing Units as soft architectures on Field-Programmable Gate Arrays can make it possible to tune their size, performance, and resource usage to the application requirements. Dynamic Partial Reconfiguration allows part of the system architecture to be specialised, creating heterogeneous computing systems with better resource utilisation and lower power consumption. In this work, we describe the Field-Programmable Gate Array implementation of a System-on-Chip featuring a soft Graphics Processing Unit whose size and performance are tuned by means of Partial Reconfiguration. Using the Sobel filter as a reference kernel, we discuss results for reconfiguration time and throughput. Furthermore, we identify the minimum task sizes for which initiating the reconfiguration process pays off in terms of execution time.
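    The minimum-task-size question can be framed as a break-even calculation. The Python sketch below assumes a simple linear model of our own, not necessarily the one used in the paper: a task of n items (for instance, pixels processed by the Sobel kernel) runs at throughput `thr_current` on the current configuration, or at `thr_reconfigured` after a reconfiguration taking `reconfig_time` seconds. Reconfiguring pays off when reconfig_time + n / thr_reconfigured < n / thr_current. The function names and the numbers in the usage note are hypothetical, not measurements from the paper.

```python
import math


def exec_time(task_size: float, throughput: float, reconfig_time: float = 0.0) -> float:
    """Total time = optional reconfiguration overhead + work / throughput (linear model)."""
    return reconfig_time + task_size / throughput


def min_task_size(reconfig_time: float, thr_current: float, thr_reconfigured: float) -> int:
    """Smallest integer task size for which reconfiguring first is strictly faster.

    Solves reconfig_time + n / thr_reconfigured < n / thr_current for n.
    """
    if thr_reconfigured <= thr_current:
        raise ValueError("reconfiguration never pays off without a throughput gain")
    saved_per_item = 1.0 / thr_current - 1.0 / thr_reconfigured  # seconds saved per item
    break_even = reconfig_time / saved_per_item
    return math.floor(break_even) + 1
```

    With illustrative figures of 10 ms reconfiguration time, 1e6 items/s before and 4e6 items/s after reconfiguring, `min_task_size(0.01, 1e6, 4e6)` returns 13334; below that size, staying on the current configuration is faster under this model. The model ignores effects such as any overlap between reconfiguration and computation, which a real measurement would capture.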

    Highly Parameterised CGRA Architecture for Design Space Exploration of Machine Learning Applications Onboard Satellites

    The adoption of Machine Learning solutions directly onboard satellites is becoming increasingly attractive for the space sector. Among the various kinds of hardware accelerators, ranging from highly efficient yet inflexible commercial off-the-shelf (COTS) devices to more versatile FPGA-based solutions, Coarse-Grained Reconfigurable Array (CGRA) architectures are gaining importance in the field. CGRAs find applications in various domains, including digital signal processing, image and video processing, and cryptography, and are therefore also being considered for space-related applications. They comprise an array of Processing Elements, whose complexity lies between that of FPGA logic cells and that of general-purpose processors, interconnected through a Network-on-Chip. They excel at executing data-flow graphs and can be more efficient than FPGAs for specific tasks. Their versatility depends on several architectural aspects of the coarse-grained array of Processing Elements: among them, the supported operations, the possible interconnections, and the pipeline stages affect the functionality, area, power consumption, and maximum frequency of the accelerator. In this work, we present a highly parameterised CGRA-based accelerator that we developed for an extensive Design Space Exploration of these architectures. The description starts from the CGRA building blocks, the Functional Units, and progresses to the top level of the architecture, the Node component, which is composed of an N×M matrix of Processing Elements. For each level of the hierarchy, we describe the HDL design parameters affecting the run-time reconfigurability of the accelerator, delving deeper into the functionality of the architecture. In the last section, we present synthesis results in TSMC 40 nm standard-cell technology, highlighting performance, power consumption, and area occupation for many different combinations of design parameters.

    VLSI Design of Advanced-Features AES Cryptoprocessor in the Framework of the European Processor Initiative

    This article presents a cryptographic hardware (HW) accelerator supporting multiple advanced encryption standard (AES)-based block cipher modes, including the more advanced cipher-based MAC (CMAC), counter with CBC-MAC (CCM), Galois/Counter Mode (GCM), and XOR-encrypt-XOR-based tweaked-codebook mode with ciphertext stealing (XTS) modes. The proposed design implements advanced features in HW, such as secure AES key management, on-chip clock randomization, and access privilege mechanisms. The system has been tested in a RISC-V-based system-on-chip (SoC) specifically designed for this purpose on a Xilinx UltraScale+ FPGA, analyzing resource usage and power consumption together with system performance. The cryptoprocessor was then synthesized in a 7-nm CMOS standard-cell technology, and its performance, complexity, and power consumption are analyzed and compared with the state of the art. The proposed cryptoprocessor is ready to be embedded within the European Processor Initiative (EPI) chip.
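    Several of the listed modes need arithmetic in GF(2^128) on top of the AES rounds. As one concrete example, XTS (IEEE Std 1619) derives the tweak for each successive 16-byte block by multiplying the previous tweak by the primitive element alpha. The Python sketch below shows only that operation, following the little-endian convention of IEEE Std 1619; it is not taken from the article, and the abstract does not say how the cryptoprocessor implements it in hardware. The function name is hypothetical.

```python
def xts_mul_alpha(tweak: bytes) -> bytes:
    """Multiply a 128-bit XTS tweak by alpha (the polynomial x) in GF(2^128).

    Follows the IEEE Std 1619 convention: byte 0 holds the least significant
    bits, and the field polynomial is x^128 + x^7 + x^2 + x + 1 (hence 0x87).
    """
    if len(tweak) != 16:
        raise ValueError("XTS tweak must be 16 bytes")
    value = int.from_bytes(tweak, "little")
    carry = value >> 127                      # bit shifted out of the top
    value = (value << 1) & ((1 << 128) - 1)   # multiply by x, keep 128 bits
    if carry:
        value ^= 0x87                         # reduce modulo the field polynomial
    return value.to_bytes(16, "little")
```

    Starting from the encrypted sector number T0 = AES-Enc(K2, i), the tweak for block j is obtained by applying this function j times; in hardware the same step reduces to a one-bit shift and a conditional XOR.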

    Going Beyond Counting First Authors in Author Co-citation Analysis

    The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based. We compare ACA results obtained with two types of co-citation counting: the traditional type, which counts only the first author of a cited work, and a non-traditional type, which takes into account the first five authors of a cited work. Results indicate that the picture produced by the non-traditional counting contains more coherent author groups and is therefore considerably clearer. However, when the same number of top-ranked authors is selected and analyzed, this picture represents fewer specialties in the research field under study than the one produced by traditional first-author counting. Reasons for these effects are discussed.
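    To make the two counting schemes concrete, here is a minimal Python sketch under our own simplifying assumptions: each citing paper is given as a list of cited works, each cited work as its ordered author list, and a citing paper adds one co-citation to every unordered pair of distinct authors it cites. With `max_authors=1` this is first-author counting; with `max_authors=5` it follows the five-author variant. The abstract does not state how details such as co-authors of the same cited work are handled; here they are counted as co-cited. The function name and data layout are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List

# Hypothetical data model: a cited work is represented only by its ordered author list.
CitedWork = List[str]


def cocitation_counts(
    citing_papers: List[List[CitedWork]], max_authors: int = 1
) -> Dict[FrozenSet[str], int]:
    """Count author co-citations.

    Each citing paper contributes +1 to every unordered pair of distinct authors
    found among the first `max_authors` authors of the works it cites.
    max_authors=1 gives first-author counting; max_authors=5 the five-author variant.
    """
    counts: Counter = Counter()
    for references in citing_papers:
        # Set of authors cited by this paper (each counted once per citing paper).
        cited_authors = {author for work in references for author in work[:max_authors]}
        for pair in combinations(sorted(cited_authors), 2):
            counts[frozenset(pair)] += 1
    return dict(counts)
```

    For instance, with one paper citing [Smith, Lee] and [Kim], and another citing [Lee, Smith] and [Kim, Park], first-author counting yields only the pairs Kim-Smith and Kim-Lee (one co-citation each), whereas five-author counting yields six pairs, with Kim-Lee, Kim-Smith, and Lee-Smith each co-cited twice.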

    A RISC-V Post Quantum Cryptography Instruction Set Extension for Number Theoretic Transform to Speed-Up CRYSTALS Algorithms

    In recent years, public-key cryptography has become a fundamental component of digital infrastructures. This scenario faces a new and growing threat: quantum computers, which are expected in the coming years to be able to run algorithms capable of breaking the security of the public-key cryptographic schemes in widespread use today. Post-quantum cryptography aims to define algorithms that run on classical computer architectures yet withstand attacks from quantum computers. The National Institute of Standards and Technology (NIST) is currently running a selection process to define one or more quantum-resistant public-key algorithms, and lattice-based cryptographic constructions are considered among the leading candidates. However, such algorithms require non-negligible computational resources. One viable solution is to accelerate them fully or partially in hardware, relieving the main processing unit of part of its workload. In this paper, we investigate a solution that trades off performance and complexity to execute the lattice-based algorithms CRYSTALS-Kyber and CRYSTALS-Dilithium: we introduce a dedicated Post-Quantum Arithmetic Logic Unit embedded directly in the pipeline of a RISC-V processor. This results in an almost negligible area overhead, a large speed-up of the algorithms, and a substantial reduction in the energy required per operation.
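    The abstract names the Number Theoretic Transform (NTT) as the operation being accelerated but does not detail the unit's instructions. The sketch below is a plain Python reference for the kind of arithmetic involved: a recursive radix-2 NTT built from Cooley-Tukey butterflies over Z_q with Kyber's modulus q = 3329 and zeta = 17, the primitive 256-th root of unity fixed by the Kyber specification. For simplicity it computes a cyclic NTT (multiplication modulo x^n - 1); Kyber and Dilithium actually use negacyclic transforms (modulo x^n + 1), Kyber with an incomplete NTT, and Dilithium with a different modulus. Which of these steps the proposed ALU executes as custom instructions is not stated here, so all function names are hypothetical.

```python
from typing import List, Tuple

Q = 3329   # Kyber modulus
ZETA = 17  # primitive 256-th root of unity mod Q, as fixed by the Kyber specification


def butterfly(a: int, b: int, w: int, q: int = Q) -> Tuple[int, int]:
    """Cooley-Tukey butterfly: returns (a + w*b, a - w*b) mod q."""
    t = (w * b) % q
    return (a + t) % q, (a - t) % q


def ntt(a: List[int], omega: int, q: int = Q) -> List[int]:
    """Recursive radix-2 cyclic NTT: A[k] = sum_j a[j] * omega^(j*k) mod q.

    len(a) must be a power of two and omega a primitive len(a)-th root of unity mod q.
    """
    n = len(a)
    if n == 1:
        return [a[0] % q]
    omega_sq = omega * omega % q
    even = ntt(a[0::2], omega_sq, q)  # transform of even-indexed coefficients
    odd = ntt(a[1::2], omega_sq, q)   # transform of odd-indexed coefficients
    out = [0] * n
    w = 1
    for k in range(n // 2):
        out[k], out[k + n // 2] = butterfly(even[k], odd[k], w, q)
        w = w * omega % q
    return out


def intt(a_hat: List[int], omega: int, q: int = Q) -> List[int]:
    """Inverse NTT: forward transform with omega^-1, then scale by n^-1 mod q."""
    inv_n = pow(len(a_hat), -1, q)
    return [x * inv_n % q for x in ntt(a_hat, pow(omega, -1, q), q)]


def cyclic_mul(a: List[int], b: List[int]) -> List[int]:
    """Multiply two polynomials modulo (x^n - 1, Q) via pointwise products in the NTT domain."""
    n = len(a)
    if len(b) != n or n == 0 or 256 % n != 0:
        raise ValueError("lengths must be equal and a power of two dividing 256")
    omega = pow(ZETA, 256 // n, Q)  # primitive n-th root of unity mod Q
    prod = [x * y % Q for x, y in zip(ntt(a, omega), ntt(b, omega))]
    return intt(prod, omega)
```

    Each butterfly costs one modular multiplication and two modular additions, which is why such operations are natural targets for a dedicated arithmetic unit; a production implementation would also replace the `%` reductions with Montgomery or Barrett reduction.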