Synchronization-free algorithms for exascale and beyond : A study of Asynchronous and Batched Iterative Methods
Computing at scale has enabled scientific discovery in various fields, such as bioinformatics, energy, healthcare, and transportation. The algorithmic and computing landscapes must adapt as our mathematical models grow, aiming to provide a more accurate and realistic view of our physical world.
In the exascale era, extracting maximum performance from a system requires efficient algorithms that can take advantage of the massive parallelism that these machines provide. The heterogeneous nature of these machines necessitates efficient implementations at single and multi-GPU levels, with GPUs providing most of the parallelism. Ensuring minimal synchronization bottlenecks is paramount for efficient computation across this whole hierarchy. Synchronization for information exchange is required for many state-of-the-art algorithms such as linear solvers, which form the workhorse of many scientific applications. Minimizing or removing these synchronizations can accelerate applications.
In this work, we look at two techniques that minimize synchronizations between parallel computing units. The first technique, batching, maximizes the available parallelism on a Graphics Processing Unit (GPU) by utilizing the perfect parallelism available for the solution of independent but related linear systems. Mapping these independent linear system solutions at the appropriate compute hierarchy level enables almost perfect scaling for single and multi-GPU systems. With a careful design of the data structures and the linear solver kernels, we provide a high-performance implementation of the batched solvers that significantly outperforms state-of-the-art implementations. We showcase the benefits of these batched solvers for problems from different origins, including integrations into two real-world applications.
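The batching idea described above can be illustrated with a minimal sketch. It assumes plain Jacobi iteration on small dense systems and is not the thesis implementation or the Ginkgo API; the function name `batched_jacobi` and the test matrices are hypothetical. The point it shows is that every system carries its own convergence flag, so no system has to wait for another.

```python
def batched_jacobi(mats, rhs, tol=1e-10, max_iters=500):
    """Solve every system A_k x_k = b_k in the batch with Jacobi iteration.

    Each system tracks its own convergence, so a converged system stops
    doing work while the others continue (illustrative sketch only).
    """
    batch = len(mats)
    n = len(rhs[0])
    xs = [[0.0] * n for _ in range(batch)]
    iters = [0] * batch
    active = set(range(batch))
    for _ in range(max_iters):
        if not active:
            break
        # On a GPU, each index k would typically map to its own thread block.
        for k in list(active):
            a, b, x = mats[k], rhs[k], xs[k]
            new_x = [
                (b[i] - sum(a[i][j] * x[j] for j in range(n) if j != i)) / a[i][i]
                for i in range(n)
            ]
            xs[k] = new_x
            iters[k] += 1
            residual = max(
                abs(b[i] - sum(a[i][j] * new_x[j] for j in range(n)))
                for i in range(n)
            )
            if residual < tol:
                active.discard(k)  # this system is done; the rest keep going
    return xs, iters


# Two small diagonally dominant systems that share one sparsity pattern.
mats = [
    [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]],
    [[10.0, 1.0, 0.0], [1.0, 10.0, 1.0], [0.0, 1.0, 10.0]],
]
rhs = [[5.0, 6.0, 5.0], [11.0, 12.0, 11.0]]
solutions, iteration_counts = batched_jacobi(mats, rhs)
```

Both systems have the all-ones vector as their solution. The more strongly diagonally dominant system converges in fewer iterations, which is why a per-system stopping criterion avoids wasted work in a batch.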
The second technique removes existing synchronizations by following a data asynchronous approach. In this case, the parallel computing elements process the latest available local data and incorporate any required data from their neighbors asynchronously. Using a probabilistic model, we study and analyze these asynchronous iterative methods. We also showcase the benefits of using these asynchronous methods with an efficient GPU implementation that outperforms the synchronous variant. To enable scaling to multiple GPUs, we implement, evaluate, and analyze the asynchronous Schwarz methods and show that they can beat their synchronous counterparts for realistic test cases.
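As a rough illustration of the data-asynchronous approach, the following sketch simulates an asynchronous Jacobi relaxation in which each component reads its neighbors from a randomly delayed earlier iterate. The uniform random delay is an assumption made for this example and is not the probabilistic model analyzed in the thesis; `async_jacobi` and its parameters are hypothetical names.

```python
import random


def async_jacobi(a, b, sweeps=200, max_delay=3, seed=0):
    """Simulate asynchronous Jacobi: every component update may use stale data.

    Assumption for illustration: the staleness of the data read by each
    component is drawn uniformly from 0..max_delay past iterates.
    """
    rng = random.Random(seed)
    n = len(b)
    history = [[0.0] * n]  # sliding window of the most recent iterates
    for _ in range(sweeps):
        new_x = list(history[-1])
        for i in range(n):
            # Pick how stale the neighbor data seen by component i is.
            delay = rng.randint(0, min(max_delay, len(history) - 1))
            stale = history[-1 - delay]
            off_diag = sum(a[i][j] * stale[j] for j in range(n) if j != i)
            new_x[i] = (b[i] - off_diag) / a[i][i]
        history.append(new_x)
        if len(history) > max_delay + 1:
            history.pop(0)  # keep only the iterates that can still be read
    return history[-1]


# Strictly diagonally dominant system with solution (1, 1, 1).
a = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
b = [5.0, 6.0, 5.0]
x_async = async_jacobi(a, b)
```

For a strictly diagonally dominant matrix such as this one, the iteration still converges when the delays are bounded, even though no update waits for fresh neighbor values.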
Synchronization-free techniques help accelerate scientific simulations, enabling them to maximize the available compute resources. This allows scientists to efficiently scale up their simulations, gain a deeper understanding of the physical phenomena, and perform ground-breaking science to further scientific inquiry.
An environment for sustainable research software in Germany and beyond: current state, open challenges, and call for action
Research software has become a central asset in academic research. It optimizes existing and enables new research methods, implements and embeds research knowledge, and constitutes an essential research product in itself. Research software must be sustainable in order to understand, replicate, reproduce, and build upon existing research or conduct new research effectively. In other words, software must be available, discoverable, usable, and adaptable to new needs, both now and in the future. Research software therefore requires an environment that supports sustainability. Hence, a change is needed in the way research software development and maintenance are currently motivated, incentivized, funded, structurally and infrastructurally supported, and legally treated. Failing to do so will threaten the quality and validity of research. In this paper, we identify challenges for research software sustainability in Germany and beyond, in terms of motivation, selection, research software engineering personnel, funding, infrastructure, and legal aspects. Besides researchers, we specifically address political and academic decision-makers to increase awareness of the importance and needs of sustainable research software practices. In particular, we recommend strategies and measures to create an environment for sustainable research software, with the ultimate goal to ensure that software-driven research is valid, reproducible and sustainable, and that software is recognized as a first class citizen in research. This paper is the outcome of two workshops run in Germany in 2019, at deRSE19 - the first International Conference of Research Software Engineers in Germany - and a dedicated DFG-supported follow-up workshop in Berlin.
Deutsche Forschungsgemeinschaft (http://dx.doi.org/10.13039/501100001659); Karlsruher Institut für Technologie (http://dx.doi.org/10.13039/10000913)
Batched iterative solvers in Ginkgo: doctoral thesis software companion
This software is a companion to the author's doctoral thesis. It contains algorithms, benchmarks and examples for batched matrix formats, iterative solvers and preconditioners.
Please send any relevant questions to [email protected]
A Block-Asynchronous Relaxation Method for Graphics Processing Units
GPU-Accelerated Asynchronous Error Correction for Mixed Precision Iterative Refinement
Asynchronous and Multiprecision Linear Solvers - Scalable and Fault-Tolerant Numerics for Energy Efficient High Performance Computing
Asynchronous methods minimize idle times by removing synchronization barriers, and therefore allow the efficient usage of computer systems. The implied high tolerance with respect to communication latencies improves the fault tolerance. As asynchronous methods also enable the usage of the power and energy saving mechanisms provided by the hardware, they are suitable candidates for the highly parallel and heterogeneous hardware platforms that are expected for the near future.
Portable Mixed Precision Algebraic Multigrid on High Performance GPUs
Multigrid methods are algorithms for solving partial differential equations (PDE) by generating a hierarchy of successively coarser discretizations and recursively using the solution on a coarser grid to update the solution on a finer grid. The methods are attractive as they avoid the need for an expensive solver with quadratic cost on the fine discretization of the problem. Algebraic Multigrid (AMG) generalizes this concept to problems that do not originate as the discretization of a PDE but build up the hierarchy on the system matrix of the linear system. Because of its robustness and efficiency, AMG has become a central component of many scientific computing applications in academia and industry. With modern supercomputers increasingly incorporating GPU accelerators and low precision support, its popularity raises demand for redesigning to leverage the fine-grain parallelism of GPUs and employ mixed precision strategies to reduce the runtime and memory footprint.
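The recursive coarse-grid correction described above can be sketched for the simplest case. The example below is a geometric (not algebraic) multigrid V-cycle for the 1D Poisson problem -u'' = f with zero boundary values, using weighted Jacobi smoothing, full-weighting restriction, and linear interpolation. These component choices and all function names are assumptions made for illustration; the sketch does not reflect the AMG implementation in GINKGO.

```python
def apply_poisson(u, h):
    """Apply the 1D Laplacian stencil (-1, 2, -1) / h^2 with zero boundaries."""
    n = len(u)
    out = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        out.append((2.0 * u[i] - left - right) / (h * h))
    return out


def smooth(u, f, h, sweeps, omega=2.0 / 3.0):
    """Weighted Jacobi sweeps; damps the oscillatory part of the error."""
    for _ in range(sweeps):
        au = apply_poisson(u, h)
        u = [u[i] + omega * 0.5 * h * h * (f[i] - au[i]) for i in range(len(u))]
    return u


def v_cycle(u, f, h):
    """One V-cycle; the grid size must be 2^k - 1 interior points."""
    n = len(u)
    if n == 1:
        return [f[0] * h * h / 2.0]  # exact solve on the coarsest grid
    u = smooth(u, f, h, 2)
    au = apply_poisson(u, h)
    r = [f[i] - au[i] for i in range(n)]
    m = (n - 1) // 2
    # Full-weighting restriction of the residual to the coarse grid.
    rc = [0.25 * r[2 * j] + 0.5 * r[2 * j + 1] + 0.25 * r[2 * j + 2] for j in range(m)]
    ec = v_cycle([0.0] * m, rc, 2.0 * h)
    # Linear interpolation of the coarse correction back to the fine grid.
    padded = [0.0] + ec + [0.0]
    e = [0.0] * n
    for j in range(m):
        e[2 * j + 1] = ec[j]
    for j in range(m + 1):
        e[2 * j] = 0.5 * (padded[j] + padded[j + 1])
    u = [u[i] + e[i] for i in range(n)]
    return smooth(u, f, h, 2)


n = 63
h = 1.0 / (n + 1)
f = [1.0] * n  # exact solution is u(x) = x (1 - x) / 2
u = [0.0] * n
for _ in range(10):
    u = v_cycle(u, f, h)
final_residual = max(abs(f[i] - v) for i, v in enumerate(apply_poisson(u, h)))
```

Each cycle costs work proportional to the number of unknowns, since every coarser level has about half as many points as the one above it.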
In this dissertation, we design and implement the first open-source high-performance AMG implementation that allows users to choose the precision format used in the distinct grid levels individually while providing platform portability across GPUs from AMD, Intel, and NVIDIA. We note that the development of this algorithm is heavily intertwined with the development of the GINKGO open-source software library. For this reason, the dissertation also includes significant detail on how the mixed precision AMG algorithm influenced and extended the design and capabilities of the GINKGO library. We explain how we extended the scope of GINKGO from supporting NVIDIA GPUs to supporting GPUs from AMD and Intel. We discuss how the sparse matrix vector product (SpMV) is the backbone of many sparse applications and demonstrate that optimizing this kernel improves AMG performance immediately. We show that the developed high-performance AMG embraces flexibility in terms of AMG options and portability in terms of supporting different hardware platforms while remaining competitive with vendor libraries like NVIDIA's AmgX implementation. Finally, we introduce the idea of using lower precision formats for subsequent matrices to enhance the performance and memory footprint. We use experiments on real-world applications to showcase the numerical challenges that can arise and discuss problem-specific algorithmic strategies to overcome these. In performance experiments, we demonstrate that using low precision or a hierarchy of lower precision formats can reduce the overall execution time when using AMG in production for real-world problems.
