Exploring Single Source Shortest Path Parallelization on Shared Memory Accelerators
Single Source Shortest Path (SSSP) algorithms are widely used in embedded systems for several applications. The emerging trend towards the adoption of heterogeneous designs in embedded devices, where low-power parallel accelerators are coupled to the main processor, opens new opportunities to deliver superior performance/watt, but calls for efficient parallel SSSP implementations. In this work we provide a detailed exploration of the Δ-stepping algorithm performance on a representative heterogeneous embedded system, TI Keystone II, considering the impact of several parallelization parameters (threading, load balancing, synchronization)
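For readers unfamiliar with Δ-stepping, the following is a minimal sequential sketch of the general technique, not the parallel Keystone II implementation studied in the paper. The function name `delta_stepping`, the adjacency-dict graph format, and the assumption that every node appears as a key with non-negative edge weights are all illustrative choices. Nodes are grouped into buckets of width `delta`; light edges (weight ≤ `delta`) are relaxed repeatedly within a bucket, and heavy edges are relaxed once after the bucket has drained. In a parallel version, the relaxations inside each phase are what would typically be distributed across threads.

```python
import math


def delta_stepping(graph, source, delta):
    """Sequential Δ-stepping sketch.

    graph: dict mapping node -> list of (neighbor, weight), weights >= 0.
           Assumption: every node appears as a key.
    Returns a dict of shortest distances from source.
    """
    dist = {v: math.inf for v in graph}
    buckets = {}  # bucket index -> set of nodes with tentative distance in that range

    def relax(v, d):
        # Move v to the bucket matching its improved tentative distance.
        if d < dist[v]:
            if dist[v] != math.inf:
                buckets.get(int(dist[v] // delta), set()).discard(v)
            dist[v] = d
            buckets.setdefault(int(d // delta), set()).add(v)

    relax(source, 0.0)
    while True:
        live = [i for i, b in buckets.items() if b]
        if not live:
            break
        i = min(live)
        settled = set()
        # Phase 1: relax light edges until bucket i stays empty.
        while buckets.get(i):
            frontier = buckets[i]
            buckets[i] = set()
            settled |= frontier
            requests = [(w, dist[v] + c)
                        for v in frontier
                        for w, c in graph[v] if c <= delta]
            for w, d in requests:
                relax(w, d)
        # Phase 2: relax heavy edges once from every node settled in bucket i.
        for v in settled:
            for w, c in graph[v]:
                if c > delta:
                    relax(w, dist[v] + c)
    return dist


example = {
    "a": [("b", 1), ("c", 4)],
    "b": [("c", 2), ("d", 6)],
    "c": [("d", 3)],
    "d": [],
}
distances = delta_stepping(example, "a", delta=2)
```

The choice of `delta` trades off work against available parallelism: a small value behaves like Dijkstra's algorithm, while a large value approaches Bellman-Ford.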
3D CV descriptor on parallel heterogeneous platforms
Embedded three-dimensional (3D) Computer Vision (CV) is considered a technology enabler for future consumer applications, attracting wide interest in academia and industry. However, 3D CV processing is a computation-intensive task. Its high computational cost is directly related to the processing of 3D point clouds, with the 3D descriptor computation representing one of the main bottlenecks. Understanding the main computational challenges of 3D CV applications, as well as the key characteristics, enabling features, and limitations of current computing platforms, is clearly strategic to identify the directions of evolution for future embedded processing systems targeting 3D CV. In this work, an innovative and complex 3D descriptor (called SHOT) has been ported to a high-end and an embedded computing platform. The high-end system is composed of a high-performance Intel CPU coupled with an NVIDIA GPU. The embedded platform is, instead, composed of an ARM-based processor coupled with the STHORM accelerator. STHORM is a many-core low-power accelerator developed by STMicroelectronics, featuring up to 64 computational units. The SHOT descriptor has been parallelized using the OpenCL programming model for both platforms. Finally, we have performed an in-depth performance comparison and analysis between general-purpose processors and accelerators in both high-end and embedded domains, discussing and highlighting the main differences in the Hardware/Software (HW/SW) design methodologies and approaches between high-end and embedded systems targeting 3D CV applications
Ultra low-power visual odometry for nano-scale unmanned aerial vehicles
One of the fundamental functionalities for autonomous navigation of Unmanned Aerial Vehicles (UAVs) is the hovering capability. State-of-the-art techniques for implementing hovering on standard-size UAVs process the camera stream to determine position and orientation (visual odometry). Similar techniques are considered unaffordable in the context of nano-scale UAVs (i.e., a few centimeters in diameter), where the ultra-constrained power envelopes of tiny rotorcraft limit the onboard computational capabilities to those of low-power microcontrollers. In this work we study how the emerging ultra-low-power parallel computing paradigm could enable the execution of complex hovering algorithmic flows on nano-scale UAVs. We provide insight into the software pipeline, the parallelization opportunities and the impact of several algorithmic enhancements. Results demonstrate that the proposed software flow and architecture can deliver unprecedented GOPS/W, achieving 117 frames per second within a power envelope of 10 mW
Accelerated Visual Context Classification on a Low-Power Smartwatch
Data produced by wearable sensors is key in contexts such as performance enhancement and training help for sports and fitness, continuous monitoring for aging people and for chronic disease management, and in gaming and entertainment. Unfortunately, wearable devices currently on the market are either incapable of complex functionality or severely impaired by short battery lifetime. In this work, we present a smartwatch platform based on an ultra-low power (ULP) heterogeneous system composed of a TI MSP430 microcontroller, the PULP programmable parallel accelerator and a set of ULP sensors, including a camera. The embedded PULP accelerator enables state-of-the-art context classification based on Convolutional Neural Networks (CNNs) to be applied within a sub-10 mW system power envelope. Our methodology reaches high accuracy in context classification over 5 classes (up to 84%, with 3 classes out of 5 reaching more than 90% accuracy) while consuming 2.2 mJ per classification, or an ultra-low energy consumption of less than 91 µJ per classification with an accuracy of 64%, which is 3.2× better than chance. Our results suggest that the proposed heterogeneous platform can provide up to 500× speedup with respect to the MSP430 within a similar power envelope, which would enable complex computer vision algorithms to be executed in highly power-constrained scenarios
Energy-Efficient, Precise UWB-Based 3-D Localization of Sensor Nodes With a Nano-UAV
Smart interaction between autonomous centimeter-scale unmanned aerial vehicles (i.e., nano-UAVs) and Internet of Things (IoT) sensor nodes is an upcoming high-impact scenario. This work tackles precise 3-D localization of indoor edge nodes with an autonomous nano-UAV without prior knowledge of their position. We employ ultrawideband (UWB) and wake-up radio (WUR) technologies: we perform UWB-based ranging and data exchange between the nano-UAV and the nodes, while the WUR minimizes the sensors' power consumption. UWB-based precise localization requires addressing multiple sources of error, such as UWB-ranging noise and UWB antennas' uneven radiation pattern. The limited computational resources aboard a nano-UAV further complicate this scenario, requiring real-time execution of the localization algorithm within a microcontroller unit (MCU). We propose a novel UWB-based localization system for nano-UAVs, composed of: 1) a lightweight localization algorithm; 2) an optimal flight strategy; and 3) a ranging-error-correction model. Our 3-D flight policy requires only five UWB measurements to feed the localization algorithm, which bounds the localization error within 28 cm and runs in 1.2 ms on a Cortex-M4 MCU. Localization accuracy is improved by an additional 25% thanks to a novel error-correction model. Leveraging the WUR, the entire localization/data-exchange cycle costs only 24 mJ at the sensor node, which is 50 times more energy efficient than the state of the art with comparable localization accuracy
Extending the Lifetime of Nano-Blimps via Dynamic Motor Control
Nano-sized unmanned aerial vehicles (UAVs), e.g. quadcopters, have received significant attention in recent years. Although their capabilities have grown, they continue to have very limited flight times, tens of minutes at most. The main constraints are the battery’s energy density and the engine power required for flight. In this work, we present a nano-sized blimp platform, consisting of a helium balloon and a rotorcraft. Thanks to the lift provided by helium, the blimp requires relatively little energy to remain at a stable altitude. This lift, however, decreases with time as the balloon inevitably deflates, requiring additional control mechanisms to keep the desired altitude. We study how duty-cycling high-power actuators can further reduce the average energy requirements for hovering. With the addition of a solar panel, it is even feasible to sustain tens or hundreds of flight hours in modest lighting conditions. Furthermore, we study how a balloon’s deflation rate affects the blimp’s energy budget and lifetime. A functioning 68-gram prototype was thoroughly characterized and its lifetime was measured under different harvesting conditions and different power management strategies. Both our system model and the experimental results indicate that our proposed platform requires less than 200 mW to hover indefinitely with an ideal balloon. With a non-ideal balloon, the maximum lifetime of ∼400 h is bounded by the rotor’s maximum thrust. This represents, to the best of our knowledge, the first nano-size UAV for long-term hovering with low power requirements
On the accuracy of near-optimal GPU-based path planning for UAVs
Path planning is one of the key functional blocks for any autonomous aerial vehicle (UAV). The goal of a path planner module is to constantly update the route of the vehicle based on information sensed in real-time. Given the high computational requirements of this task, heterogeneous many-cores are appealing candidates for its execution. Approximate path computation has proven a promising approach to reduce total execution time, at the cost of a slight loss in accuracy. In this work we study performance and accuracy of state-of-the-art, near-optimal parallel path planning in combination with program transformations aimed at ensuring efficient use of embedded GPU resources. We propose a profile-based algorithmic variant which boosts GPU execution by up to ≈7×, while maintaining the accuracy loss below 5%
Ultra Low Power Deep-Learning-powered Autonomous Nano Drones
Flying in dynamic, urban, highly-populated environments represents an open problem in robotics. State-of-the-art (SoA) autonomous Unmanned Aerial Vehicles (UAVs) employ advanced computer vision techniques based on computationally expensive algorithms, such as Simultaneous Localization and Mapping (SLAM) or Convolutional Neural Networks (CNNs), to navigate in such environments. In the Internet-of-Things (IoT) era, nano-size UAVs capable of autonomous navigation would be extremely desirable as self-aware mobile IoT nodes. However, autonomous flight is considered unaffordable in the context of nano-scale UAVs, where the ultra-constrained power envelopes of tiny rotorcraft limit the on-board computational capabilities to low-power microcontrollers. In this work, we present the first vertically integrated system for fully autonomous deep neural network-based navigation on nano-size UAVs. Our system is based on GAP8, a novel parallel ultra-low-power computing platform, and deployed on a 27 g commercial, open-source CrazyFlie 2.0 nano-quadrotor. We discuss a methodology and software mapping tools that enable the SoA CNN presented in [1] to be fully executed on-board within a strict 12 fps real-time constraint with no compromise in terms of flight results, while all processing is done with only 94 mW on average, which is 1% of the power envelope of the deployed nano-aircraft.
Target following on nano-scale Unmanned Aerial Vehicles
Unmanned Aerial Vehicles (UAVs) with high-level autonomous navigation capabilities are a hot topic both in industry and academia due to their numerous applications. However, autonomous navigation algorithms are computationally demanding, and it is very challenging to run them on board nano-scale UAVs (i.e., a few centimeters in diameter) because of the limited capabilities of their MCU-based controllers. This work focuses on the object tracking capability (i.e., target following capability) on such nano-UAVs. We present a lightweight hardware-software solution, bringing autonomous navigation to a commercial platform using only on-board computational resources. Furthermore, we evaluate a parallel ultra-low-power (PULP) platform that enables the execution of even more sophisticated algorithms. Experimental results demonstrate the benefits of our solution, achieving accurate target following using an ARM Cortex M4 microcontroller consuming ≈130 mW. Our evaluation on a PULP architecture shows the proposed solution running at up to 60 frames per second in a power envelope of ≈30 mW, leaving more than 70% of the computational resources free for further on-board processing of more complex algorithms
A Relative Infrastructure-less Localization Algorithm for Decentralized and Autonomous Swarm Formation
Decentralized and autonomous control of Unmanned Aerial Vehicle (UAV) swarms is a key enabler for cooperative systems and infrastructure-less formation flights. However, UAVs often lack reliable heading angle measurements, especially in indoor scenarios, space, and GNSS-denied environments, posing an additional observability challenge on range-based relative localization. We tackle this problem by proposing a novel solution enhancing the classical tag-and-anchor trilateration. The proposed solution relies on Ultra-wideband range measurements and addresses the relative pose estimation between pairs of UAVs under relative motion. Furthermore, it does not require any explicit motion pattern or initialization procedure and leverages an approximate maximum-likelihood algorithm to recursively solve the relative localization problem with constant computational complexity. The method has been implemented and demonstrated through field experiments, where a swarm of nano-UAVs positioned themselves with respect to a leader in a nearly-static formation with an average error of 38.5 cm and a convergence time of 25 s. The achieved formation accuracy is similar to the one achieved by the state-of-the-art EKF-based leader-follower methods
