Search CORE

1,721,024 research outputs found

AgrUNet: A Multi-GPU UNet Based Model for Crops Classification

Author: Calore Enrico
Schifano Sebastiano Fabio
Miola Andrea
Publication date: 01/01/2024

Agriculture acts as a catalyst for comprehensive economic growth, boosting income levels, reducing poverty, and combating hunger. For these reasons, it is important to monitor agricultural practices and the use of land parcels carefully and automatically, in support of a sustainable use of natural resources. The deployment of high-resolution satellite missions, such as Landsat and Copernicus Sentinel, combined with AI Deep Learning (DL) methodologies, has revolutionized Earth Observation science, enabling large-area studies of yield prediction, soil classification, and crop mapping, as well as the analysis and processing of Big Data with innovative approaches. This approach requires high-performance computing systems, since DL algorithms are known to be computationally very demanding, and recent multi-GPU HPC systems can boost the processing power of classical CPU-only computing systems by one or two orders of magnitude. In this study, we develop AgrUNet, a scalable, fast, and reliable UNet-based DL model that performs crop classification on multispectral, multitemporal satellite data, implemented and optimized to run on single- and multi-GPU HPC systems. Our model achieves a Dice score of approximately 0.90 and a peak throughput of 59 and 605 /s for the training and inference steps, respectively, improving on the best results reported in the literature by a factor of approximately 7, with near-ideal speedup on both 4× V100 and 8× A100 GPU systems.
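The Dice score measures the overlap between predicted and reference masks, 2|A∩B| / (|A| + |B|). The following is a minimal Python/NumPy sketch for integer label maps, assuming per-class scores averaged over classes; the abstract does not say how AgrUNet aggregates classes, and the function names are hypothetical.

```python
import numpy as np


def dice_per_class(pred: np.ndarray, target: np.ndarray, num_classes: int, eps: float = 1e-7) -> np.ndarray:
    """Dice = 2|A∩B| / (|A| + |B|) for each class label in two integer label maps."""
    scores = np.empty(num_classes)
    for c in range(num_classes):
        p = pred == c
        t = target == c
        intersection = np.logical_and(p, t).sum()
        # eps keeps the score defined (and equal to 1) when a class is absent from both maps
        scores[c] = (2.0 * intersection + eps) / (p.sum() + t.sum() + eps)
    return scores


def mean_dice(pred: np.ndarray, target: np.ndarray, num_classes: int) -> float:
    """Unweighted (macro) average of the per-class Dice scores."""
    return float(dice_per_class(pred, target, num_classes).mean())


pred = np.array([[0, 1], [1, 1]])
target = np.array([[0, 1], [0, 1]])
print(dice_per_class(pred, target, 2))
print(mean_dice(pred, target, 2))
```

In this 2×2 example, class 0 scores 2/3 and class 1 scores 0.8. A frequency-weighted average would give different results when classes are imbalanced, which is common in crop maps.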

Archivio istituzionale della ricerca - Università di Ferrara

A Portable OpenCL Lattice Boltzmann Code for Multi- and Many-core Processor Architectures

Author: Tripiccione Raffaele
Calore Enrico
Schifano Sebastiano Fabio
Publication date: 01/01/2014

The architecture of high-performance computing systems is becoming more and more heterogeneous, as accelerators play an increasingly important role alongside traditional CPUs. Programming heterogeneous systems efficiently is a complex task that often requires the use of specific programming environments. Programming frameworks that support code portability across different high-performance architectures have recently appeared, but one must carefully assess the relative costs of portability versus computing efficiency and find a reasonable trade-off. In this paper we address precisely this issue, using as a test bench a Lattice Boltzmann code implemented in OpenCL. We analyze its performance on several state-of-the-art processors: NVIDIA GPUs and Intel Xeon Phi many-core accelerators, as well as more traditional Ivy Bridge and Opteron multi-core commodity CPUs. We also compare with results obtained with codes specifically optimized for each of these systems. Our work shows that a properly structured OpenCL code runs on many different systems, reaching performance levels close to those obtained by architecture-tuned CUDA or C codes.

Elsevier - Publisher Connector

Crossref

Archivio istituzionale della ricerca - Università di Ferrara

Energy-Efficiency Evaluation of FPGAs for Floating-Point Intensive Workloads

Author: Calore Enrico
Schifano Sebastiano Fabio
Publication date: 01/01/2020

In this work we describe a method to measure the computing performance and energy efficiency to be expected from an FPGA device. This work is motivated by the possible use of FPGAs as accelerators for floating-point intensive HPC workloads. In the past, FPGA devices were not considered an efficient option for floating-point intensive computations, but more recently, with the advent of dedicated DSP units and the increased amount of resources in each chip, interest in these devices has grown. Another obstacle to a wide adoption of FPGAs in the HPC field has been the low-level hardware knowledge commonly required to program them using Hardware Description Languages (HDLs). This issue has also been recently mitigated by the introduction of higher-level programming frameworks adopting so-called High Level Synthesis (HLS) approaches, which reduce development time and narrow the gap between the skills required to program FPGAs and those commonly possessed by HPC software developers. In this work we apply the proposed method to estimate the maximum floating-point performance and energy efficiency of the FPGA embedded in a Xilinx Zynq UltraScale+ MPSoC hosted on a Trenz board.
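A back-of-envelope upper bound on FPGA floating-point throughput multiplies the number of floating-point units that fit in the DSP budget by the operations each completes per cycle and by the clock frequency; dividing by power gives FLOP/s per watt. The sketch below assumes fused multiply-add (FMA) units delivering 2 FLOP per cycle and uses hypothetical parameter names and values. It is not the measurement method of the paper, and real designs usually fall short of this bound because of routing, achievable clock, and memory bandwidth.

```python
from dataclasses import dataclass


@dataclass
class FpgaDesign:
    dsp_slices: int      # DSP blocks available on the device (hypothetical figure)
    dsps_per_fma: int    # DSP blocks consumed by one floating-point FMA unit (assumption)
    clock_hz: float      # clock frequency achieved by the synthesized kernel
    power_w: float       # power drawn while the kernel runs


def peak_flops(d: FpgaDesign) -> float:
    """Upper bound: every FMA unit completes one fused multiply-add (2 FLOP) per cycle."""
    fma_units = d.dsp_slices // d.dsps_per_fma
    return fma_units * 2 * d.clock_hz


def flops_per_watt(d: FpgaDesign) -> float:
    """Energy efficiency of the bound, in FLOP/s per watt."""
    return peak_flops(d) / d.power_w


design = FpgaDesign(dsp_slices=1000, dsps_per_fma=4, clock_hz=200e6, power_w=10.0)
print(peak_flops(design) / 1e9, "GFLOP/s")
print(flops_per_watt(design) / 1e9, "GFLOP/s per W")
```

With these hypothetical figures (1000 DSPs, 4 DSPs per FMA, 200 MHz, 10 W), the bound is 100 GFLOP/s, or 10 GFLOP/s per watt.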

Archivio istituzionale della ricerca - Università di Ferrara

Performance and Power Analysis of HPC Workloads on Heterogenous Multi-Node Clusters

Author: Calore Enrico
Mantovani Filippo
Publication date: 01/01/2018

Performance analysis tools allow application developers to identify and characterize the inefficiencies that cause performance degradation in their codes, enabling application optimizations. Due to the increasing interest of the High Performance Computing (HPC) community in energy-efficiency issues, it is of paramount importance to be able to correlate performance and power figures within the same profiling and analysis tools. For this reason, we present a performance and energy-efficiency study aimed at demonstrating how a single tool can be used to collect most of the relevant metrics. In particular, we show how the same analysis techniques are applicable to different architectures, analyzing the same HPC application on a high-end and a low-power cluster. The former cluster embeds Intel Haswell CPUs and NVIDIA K80 GPUs, while the latter is made up of NVIDIA Jetson TX1 boards, each hosting an Arm Cortex-A57 CPU and an NVIDIA Tegra X1 Maxwell GPU.

The research leading to these results has received funding from the European Community’s Seventh Framework Programme [FP7/2007-2013] and Horizon 2020 under the Mont-Blanc projects [17], grant agreements n. 288777, 610402 and 671697. E.C. was partially funded by “Contributo 5 per mille assegnato all’Università degli Studi di Ferrara - dichiarazione dei redditi dell’anno 2014”. We thank the University of Ferrara and INFN Ferrara for access to the COKA Cluster. We warmly thank the BSC tools group for supporting the smooth integration and testing of our setup within Extrae and Paraver.

Peer Reviewed. Postprint (published version)

Multidisciplinary Digital Publishing Institute

Crossref

UPCommons. Portal del coneixement obert de la UPC

Archivio istituzionale della ricerca - Università di Ferrara

UPCommons (Universitat Politècnica de Catalunya)

Porting a Lattice Boltzmann Simulation to FPGAs Using OmpSs

Author: Schifano Sebastiano Fabio
Calore Enrico
Publication date: 01/01/2020

Reconfigurable computing, exploiting Field Programmable Gate Arrays (FPGAs), has become of great interest for both academic and industrial research thanks to the possibility of greatly accelerating a variety of applications. Interest has been further boosted by recent developments in FPGA programming frameworks, which allow applications to be designed at a higher level of abstraction, for example using directive-based approaches. In this work we describe our first experiences in porting to FPGAs an HPC application used to simulate the Rayleigh-Taylor instability of fluids with different densities and temperatures using Lattice Boltzmann Methods. This activity is carried out in the context of the FET HPC H2020 EuroEXA project, which is developing an energy-efficient, exascale-level HPC system based on Arm processors and FPGAs. In this work we use the OmpSs directive-based programming model, one of the models available within the EuroEXA project. OmpSs is developed by the Barcelona Supercomputing Center (BSC) and can target FPGA devices as accelerators, as well as commodity CPUs and GPUs, enabling code portability across different architectures. In particular, we describe the initial porting of this application, evaluating the programming effort required and assessing the preliminary performance on a Trenz development board hosting a Xilinx Zynq UltraScale+ MPSoC, which embeds 16 nm FinFET+ programmable logic and a multi-core Arm CPU.

Archivio istituzionale della ricerca - Università di Ferrara

Optimization of lattice Boltzmann simulations on heterogeneous computers

Author: Gabbana Alessandro
Tripiccione Raffaele
Calore Enrico
Schifano Sebastiano Fabio
Publication date: 01/01/2019

High-performance computing systems are increasingly based on accelerators. Computing applications targeting these systems often follow a host-driven approach, in which hosts offload almost all compute-intensive sections of the code onto accelerators; this approach only marginally exploits the computational resources available on the host CPUs, limiting overall performance. The obvious step forward is to run compute-intensive kernels concurrently and in a balanced way on both hosts and accelerators. In this paper, we consider exactly this problem for a class of applications based on lattice Boltzmann methods, widely used in computational fluid dynamics. Our goal is to develop a single program, portable and able to run efficiently on several different combinations of hosts and accelerators. To reach this goal, we define common data layouts that enable the code to efficiently exploit the different parallel and vector options of the various accelerators, while matching the possibly different requirements of the compute-bound and memory-bound kernels of the application. We also define models and metrics that predict the best partitioning of workloads between host and accelerator, and the optimal achievable overall performance. We test the performance of our codes and their scaling properties using, as testbeds, HPC clusters incorporating different accelerators: Intel Xeon Phi many-core processors, NVIDIA GPUs, and AMD GPUs.
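One simple way to balance a lattice between host and accelerator is to split it in proportion to each device's measured throughput, so that both finish at the same time. This Python sketch assumes throughputs expressed in lattice sites per second and ignores host-accelerator communication; the paper's actual models and metrics may account for such overheads, and the function names are hypothetical.

```python
def balanced_split(total_sites: int, host_rate: float, acc_rate: float) -> tuple[int, int]:
    """Split sites so host and accelerator finish together (throughput-proportional model).

    host_rate and acc_rate are sustained throughputs in sites/s, each measured
    with that device running alone.
    """
    host_fraction = host_rate / (host_rate + acc_rate)
    host_sites = round(total_sites * host_fraction)
    return host_sites, total_sites - host_sites


def predicted_time(total_sites: int, host_rate: float, acc_rate: float) -> float:
    """Predicted time for one update of all sites with both devices working concurrently."""
    return total_sites / (host_rate + acc_rate)


host, acc = balanced_split(1_000_000, host_rate=1.0e8, acc_rate=4.0e8)
print(host, acc)  # 200000 800000
print(predicted_time(1_000_000, 1.0e8, 4.0e8))
```

With a host four times slower than the accelerator, 20% of the sites go to the host. Under this model the combined run is faster than the accelerator alone by the factor (host_rate + acc_rate) / acc_rate.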

Archivio istituzionale della ricerca - Università di Ferrara

Energy-Performance Tradeoffs for HPC Applications on Low Power Processors

Author: Schifano Sebastiano Fabio
Tripiccione Raffaele
Calore Enrico
Publication date: 01/01/2015

Energy efficiency is becoming increasingly important in the HPC field; high-end processors are quickly evolving towards more advanced power-saving and power-monitoring technologies. On the other hand, low-power processors designed for the mobile market attract interest in the HPC area for their increasing computing capabilities, competitive pricing, and low power consumption. In this work we study the energy and computing performance of a Tegra K1 mobile processor using an HPC Lattice Boltzmann application as a benchmark. We run this application on the ARM Cortex-A15 CPU and on the GK20A GPU, both available in this processor. Our analysis uses time-accurate measurements obtained with a simple custom-developed current monitor. We discuss several energy and performance metrics, interesting per se and also in view of a prospective use of these processors in an HPC context.
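Energy figures from a sampling current monitor can be obtained by summing V·I·Δt over the samples; dividing by runtime gives average power, and the energy-delay product (EDP) is one common metric combining energy and runtime. This sketch assumes a constant supply voltage and a fixed sampling period, and the function names are hypothetical; the abstract does not describe the monitor's interface or list which metrics the paper uses.

```python
def energy_joules(current_samples_a: list[float], supply_v: float, dt_s: float) -> float:
    """Energy as the sum of V * I * dt over equally spaced current samples (rectangle rule)."""
    return sum(supply_v * i * dt_s for i in current_samples_a)


def energy_delay_product(energy_j: float, time_s: float) -> float:
    """EDP in J*s; lower is better, penalizing both high energy and long runtime."""
    return energy_j * time_s


samples = [1.0, 2.0, 1.0]  # hypothetical current readings in amperes
dt = 0.1                   # hypothetical sampling period in seconds
e = energy_joules(samples, supply_v=5.0, dt_s=dt)
t = len(samples) * dt
print(e, e / t, energy_delay_product(e, t))  # energy (J), average power (W), EDP (J*s)
```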

Crossref

Archivio istituzionale della ricerca - Università di Ferrara

Advanced Performance Analysis of HPC Workloads on Cavium ThunderX

Author: Calore Enrico
Ruiz Daniel
Mantovani Filippo
Publication date: 01/01/2018

Interest in Arm-based platforms as HPC solutions has increased significantly over the last 5 years. In this paper we show that, in contrast to the early days of pioneering tests, several application performance analysis techniques can now also be applied to Arm-based SoCs. To show the possibilities offered by the available tools, we provide, as an example, the analysis of a Lattice Boltzmann HPC production code, highly optimized for several architectures and now also ported to Armv8. We tested it on a system based on production silicon, the Cavium CN8890 SoC. In particular, as performance analysis tools we adopt Extrae and Paraver, making use of the PAPI support initially developed by us for the ThunderX platform and now also available upstream. The contribution of this paper is twofold: first, we demonstrate that performance analysis tools available on standard HPC platforms, independently of the CPU provider, are nowadays also available for Arm SoCs; second, we actually optimize an HPC application for this platform, showing similarities with other architectures.

The research leading to these results has received funding from the European Community’s Seventh Framework Programme [FP7/2007-2013] and Horizon 2020 under the Mont-Blanc projects [15], grant agreements n. 288777, 610402 and 671697. E.C. was partially funded by “Contributo 5 per mille assegnato all’Università degli Studi di Ferrara - dichiarazione dei redditi dell’anno 2014”. Cavium Inc. has kindly supported this research by providing access to documentation and platforms.

Postprint (author's final draft)

Crossref

UPCommons. Portal del coneixement obert de la UPC

Archivio istituzionale della ricerca - Università di Ferrara

UPCommons (Universitat Politècnica de Catalunya)

Multi-Node Advanced Performance and Power Analysis with Paraver

Author: Calore Enrico
Mantovani Filippo
Publication date: 01/01/2018

Performance analysis tools allow application developers to identify and characterize the inefficiencies that cause performance degradation in their codes. Due to the increasing interest of the High Performance Computing (HPC) community in energy-efficiency issues, it is of paramount importance to be able to correlate performance and power figures within the same profiling and analysis tools. For this reason, we present a preliminary performance and energy-efficiency study aimed at demonstrating how a single tool can be used to collect most of the relevant metrics. Moreover, we show how the same analysis techniques are applicable to different architectures, analyzing the same HPC application running on two clusters, based respectively on Intel Haswell and Arm Cortex-A57 CPUs.

The research leading to these results has received funding from the European Community’s Seventh Framework Programme [FP7/2007-2013] and Horizon 2020 under the Mont-Blanc projects, grant agreements n. 288777, 610402 and 671697. E.C. was partially funded by “Contributo 5 per mille assegnato all’Università degli Studi di Ferrara - dichiarazione dei redditi dell’anno 2014”.

Peer Reviewed. Postprint (author's final draft)

UPCommons. Portal del coneixement obert de la UPC

Archivio istituzionale della ricerca - Università di Ferrara

UPCommons (Universitat Politècnica de Catalunya)

Energy-Efficiency Evaluation of Intel KNL for HPC Workloads

Author: Gabbana Alessandro
Tripiccione Raffaele
Calore Enrico
Schifano Sebastiano Fabio
Publication date: 01/01/2018

In this work we focus on the energy performance of the Knights Landing Xeon Phi, the latest many-core processor introduced by Intel for the HPC market. We consider the 64-core Xeon Phi 7230 and analyze its computing and energy efficiency using both the on-chip MCDRAM and the off-chip DDR4 memory as main storage for the application data domain. As a benchmark application we use a Lattice Boltzmann code heavily optimized for this architecture and implemented using different memory data layouts to store the data domain. We then assess the energy consumption using different data layouts, memory configurations (DDR4 or MCDRAM), and numbers of threads per core.
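The memory data layout determines the stride between consecutive lattice sites for a given population. In an array-of-structures (AoS) layout all populations of one site are adjacent, while in a structure-of-arrays (SoA) layout the same population of consecutive sites is contiguous, which favors vector loads. The index functions below are a hypothetical Python sketch of these two textbook layouts; the abstract does not detail which layouts the paper compares.

```python
def aos_index(site: int, pop: int, n_pop: int) -> int:
    """Array of structures: the n_pop populations of one site are stored together."""
    return site * n_pop + pop


def soa_index(site: int, pop: int, n_sites: int) -> int:
    """Structure of arrays: population `pop` of all sites is stored contiguously."""
    return pop * n_sites + site


N_SITES, N_POP = 16, 9  # hypothetical lattice size and populations per site

# Distance in memory between the same population of two neighboring sites
aos_stride = aos_index(1, 0, N_POP) - aos_index(0, 0, N_POP)
soa_stride = soa_index(1, 0, N_SITES) - soa_index(0, 0, N_SITES)
print(aos_stride, soa_stride)  # 9 1
```

Both mappings are permutations of the same flat array, so the choice affects only access patterns (and thus bandwidth and vectorization), not storage size.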

Archivio istituzionale della ricerca - Università di Ferrara