
    Overlapping Communication with Computation in MPI Applications

    Full text link
    In High Performance Computing (HPC), minimizing communication overhead is one of the most important goals in order to get high performance. This is more important than ever on exascale platforms, where there will be a much higher degree of parallelism compared to petascale platforms, resulting in increased communication overhead with considerable impact on application execution time and energy expenses. A good strategy for containing this overhead is to hide communication costs by overlapping them with computation. Despite the increasing interest in achieving computation/communication overlap, details about the reasons that prevent it from succeeding are not easy to find, leading to confusion and poor application optimization. The Message Passing Interface (MPI) library, a de facto standard in the HPC world, has always provided non-blocking communication routines able, in theory, to achieve communication/computation overlap. Unfortunately, several factors related to MPI independent progress and the offload capability of the underlying network make this overlap hard to achieve. With the introduction of one-sided communication routines, providing high-quality MPI implementations that can progress communication independently is becoming as important as providing low-latency and high-bandwidth communication. In this paper, we gather the most significant contributions about computation/communication overlap and provide a technical explanation of how such overlap can be achieved on modern supercomputers
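
    The benefit of hiding communication behind computation can be illustrated with a simple cost model. The sketch below is not taken from the paper: the function name `step_time`, the `overlap_fraction` parameter, and the timings are hypothetical, and the model ignores the progress and offload effects that the abstract identifies as the real obstacles.

```python
def step_time(compute_s, comm_s, overlap_fraction=0.0):
    """Estimate the duration of one step of a parallel application.

    overlap_fraction = 0.0 models blocking communication (costs add up);
    overlap_fraction = 1.0 models ideal overlap, where the step takes
    max(compute_s, comm_s).
    """
    if not 0.0 <= overlap_fraction <= 1.0:
        raise ValueError("overlap_fraction must be in [0, 1]")
    # Communication can only be hidden while there is computation to hide it behind.
    hidden = min(overlap_fraction * comm_s, compute_s)
    return compute_s + comm_s - hidden


# Hypothetical step: 8 ms of computation and 3 ms of communication.
for fraction in (0.0, 0.5, 1.0):
    print(fraction, step_time(8.0, 3.0, fraction))
```

    Under this model the gain from overlap is bounded by the smaller of the two costs, so a communication-bound step stays limited by its communication time even with perfect overlap.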

    BootCMatch: A software package for bootstrap AMG based on graph weighted matching

    No full text
    This article has two main objectives: one is to describe some extensions of an adaptive Algebraic Multigrid (AMG) method of the form previously proposed by the first and third authors, and a second one is to present a new software framework, named BootCMatch, which implements all the components needed to build and apply the described adaptive AMG both as a stand-alone solver and as a preconditioner in a Krylov method. The adaptive AMG presented is meant to handle general symmetric and positive definite (SPD) sparse linear systems, without assuming any a priori information of the problem and its origin; the goal of adaptivity is to achieve a method with a prescribed convergence rate. The presented method exploits a general coarsening process based on aggregation of unknowns, obtained by a maximum weight matching in the adjacency graph of the system matrix. More specifically, a maximum product matching is employed to define an effective smoother subspace (complementary to the coarse space), a process referred to as compatible relaxation, at every level of the recursive two-level hierarchical AMG process. Results on a large variety of test cases and comparisons with related work demonstrate the reliability and efficiency of the method and of the software
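
    To make the idea of matching-based aggregation concrete, here is a minimal sketch that pairs unknowns along heavy edges of a weighted graph. It is an assumption-laden illustration, not BootCMatch's algorithm: it uses a simple greedy heavy-edge matching (which reaches at least half of the optimal matching weight) instead of the maximum product matching described in the abstract, and the function name and edge weights are hypothetical.

```python
def greedy_matching_aggregates(n, weighted_edges):
    """Group unknowns 0..n-1 into aggregates of size one or two.

    weighted_edges is an iterable of (i, j, weight) tuples with i != j.
    Edges are visited from heaviest to lightest, and an edge is accepted
    only if both of its endpoints are still unmatched.
    """
    matched = [False] * n
    aggregates = []
    # Ties are broken by vertex indices so the result is deterministic.
    for i, j, _w in sorted(weighted_edges, key=lambda e: (-e[2], e[0], e[1])):
        if not matched[i] and not matched[j]:
            matched[i] = matched[j] = True
            aggregates.append([i, j])
    # Unknowns left unmatched become singleton aggregates.
    aggregates.extend([i] for i in range(n) if not matched[i])
    return aggregates


# Hypothetical path graph 0-1-2-3 whose middle edge is the strongest.
print(greedy_matching_aggregates(4, [(0, 1, 1.0), (1, 2, 3.0), (2, 3, 1.0)]))
```

    Each aggregate becomes one coarse unknown, so a single matching pass reduces the problem size by a factor of at most two.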

    Social ski driver conditional autoregressive-based deep learning classifier for flight delay prediction

    Full text link
    The importance of robust flight delay prediction has recently increased in the air transportation industry. This industry seeks alternative methods and technologies for more robust flight delay prediction because of its significance for all stakeholders. The most affected are airlines, which suffer from monetary and passenger loyalty losses. Several studies have attempted to analyse and solve flight delay prediction problems using machine learning methods. This research proposes a novel alternative method, namely social ski driver conditional autoregressive-based (SSDCA-based) deep learning. Our proposed method combines the Social Ski Driver algorithm with Conditional Autoregressive Value at Risk by Regression Quantiles. We consider the most relevant instances from the training dataset, which are the delayed flights. We applied the Yeo-Johnson data transformation to stabilise the data variance. We then perform the training and testing of our data using deep recurrent neural network (DRNN) and SSDCA-based algorithms. The SSDCA-based optimisation algorithm helped us choose the right network architecture with better accuracy and less error than the existing literature. The results, efficiency and computational time of our proposed SSDCA-based method were compared against existing benchmark methods. The SSDCA-based DRNN provides a more accurate flight delay prediction with 0.9361 and 0.9252 accuracy rates on dataset-1 and dataset-2, respectively. To show the reliability of our method, we compared it with other meta-heuristic approaches. The result is that the SSDCA-based DRNN outperformed all existing benchmark methods tested in our experiment
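
    The Yeo-Johnson transformation mentioned in the abstract has a closed form that is easy to sketch. The code below implements the standard definition for a single value; the function name, the sample delays, and the choice of `lam` are hypothetical, and the abstract does not say how the authors selected the parameter (in practice it is usually estimated from the data).

```python
import math


def yeo_johnson(y, lam):
    """Apply the Yeo-Johnson power transformation to one value.

    Unlike Box-Cox, the transformation is defined for negative inputs,
    which suits delays measured relative to schedule (early arrivals < 0).
    """
    if y >= 0:
        if abs(lam) < 1e-12:
            return math.log1p(y)          # limit of the power form as lam -> 0
        return ((y + 1.0) ** lam - 1.0) / lam
    if abs(lam - 2.0) < 1e-12:
        return -math.log1p(-y)            # limit of the power form as lam -> 2
    return -((1.0 - y) ** (2.0 - lam) - 1.0) / (2.0 - lam)


# Hypothetical delays in minutes; lam = 0 compresses large positive values.
delays = [-5.0, 0.0, 15.0, 240.0]
print([round(yeo_johnson(d, 0.0), 3) for d in delays])
```

    With `lam = 1` the transformation is the identity, which gives a quick sanity check for any implementation.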

    Sparse approximate inverse preconditioners on high performance GPU platforms

    Full text link
    Simulation with models based on partial differential equations often requires the solution of (sequences of) large and sparse algebraic linear systems. In multidimensional domains, preconditioned Krylov iterative solvers are often appropriate for these tasks. Therefore, the search for efficient preconditioners for Krylov subspace methods is a crucial theme. Recent developments, especially in computing hardware, have renewed the interest in approximate inverse preconditioners in factorized form, because their application during the solution process can be more efficient. We present here some experiences focused on the approximate inverse preconditioners proposed by Benzi and Tůma in 1996 and the sparsification and inversion proposed by van Duin in 1999. Computational costs, reorderings and implementation issues are considered both on conventional and innovative computing architectures like Graphics Processing Units (GPUs)
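
    A factorized approximate inverse is cheap to apply because it needs only matrix-vector products and a diagonal scaling, with no triangular solves. The sketch below illustrates this for a preconditioner of the form Z D^{-1} Z^T, assuming dense row-major lists for readability (a real implementation would use sparse storage). The function name and the 2x2 example are hypothetical; in the example the factors are exact, since no entries were dropped.

```python
def apply_factorized_inverse(Z, d, r):
    """Return Z * diag(1/d) * Z^T * r.

    Z is a dense square matrix given as a list of rows, d holds the
    diagonal entries of D, and r is the vector to precondition.
    """
    n = len(r)
    # First product: t = Z^T r.
    t = [sum(Z[i][j] * r[i] for i in range(n)) for j in range(n)]
    # Diagonal scaling: t = D^{-1} t.
    t = [t[j] / d[j] for j in range(n)]
    # Second product: y = Z t.
    return [sum(Z[i][j] * t[j] for j in range(n)) for i in range(n)]


# Hypothetical example for A = [[4, 1], [1, 3]]: the columns of Z are
# A-orthogonal, so Z D^{-1} Z^T equals the exact inverse of A here.
Z = [[1.0, -0.25], [0.0, 1.0]]
d = [4.0, 2.75]
r = [6.0, 7.0]  # r = A * [1, 2]
print(apply_factorized_inverse(Z, d, r))
```

    Both products are independent row or column reductions, which is the kind of work that maps well to GPU threads.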

    Automatic coarsening in Algebraic Multigrid utilizing quality measures for matching-based aggregations

    No full text
    In this paper, we discuss the convergence of an Algebraic MultiGrid (AMG) method for general symmetric positive-definite matrices. The method relies on an aggregation algorithm, named coarsening based on compatible weighted matching, which exploits the interplay between the principle of compatible relaxation and the maximum product matching in undirected weighted graphs. The results are based on a general convergence analysis theory applied to the class of AMG methods employing unsmoothed aggregation and identifying a quality measure for the coarsening; similar quality measures were originally introduced and applied to other methods as tools to obtain good quality aggregates leading to optimal convergence for M-matrices. The analysis, as well as the coarsening procedure, is purely algebraic and, in our case, allows an a posteriori evaluation of the quality of the aggregation procedure which we apply to analyze the impact of approximate algorithms for matching computation and the definition of graph edge weights. We also explore the connection between the choice of the aggregates and the compatible relaxation convergence, confirming the consistency between theories for designing coarsening procedures in purely algebraic multigrid methods and the effectiveness of the coarsening based on compatible weighted matching. We discuss various completely automatic algorithmic approaches to obtain aggregates for which good convergence properties are achieved on various test cases

    Heterogeneous CAF-based load balancing on Intel Xeon Phi

    No full text
    In order to reach challenging performance goals, computer architectures will change significantly in the near future. Heterogeneous chips, equipped with different types of cores and memory, will compel application developers to deal with irregular communication patterns, high parallelism, and unexpected behaviors. Load balancing among the heterogeneous compute units will be a critical task in order to exploit all the computational power provided by such new architectures. In this highly dynamic scenario, Partitioned Global Address Space (PGAS) languages, like Coarray Fortran (CAF), appear to be a promising alternative to standard MPI programming using two-sided communications, in particular because of their one-sided semantics. In this work, we show how Coarray Fortran can be used for implementing dynamic load balancing algorithms on an exascale compute node and how these algorithms can produce performance benefits for an Asian option pricing problem, running in symmetric mode on Intel Xeon Phi (KNC)
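
    Why dynamic load balancing matters on heterogeneous units can be shown with a small simulation. This is a Python sketch of the general idea, not the paper's Coarray Fortran implementation: the function names, the self-scheduling policy (each unit takes the next chunk as soon as it is free), and the relative speeds are all hypothetical.

```python
import heapq


def static_makespan(n_tasks, speeds):
    """Completion time when tasks are split evenly, ignoring unit speed."""
    share = n_tasks / len(speeds)
    return max(share / s for s in speeds)


def dynamic_makespan(n_tasks, speeds, chunk=1):
    """Completion time when each unit takes a new chunk whenever it is free."""
    # Heap entries are (time at which the unit becomes free, unit index).
    free_at = [(0.0, w) for w in range(len(speeds))]
    heapq.heapify(free_at)
    remaining = n_tasks
    finish = 0.0
    while remaining > 0:
        t, w = heapq.heappop(free_at)
        take = min(chunk, remaining)
        remaining -= take
        done = t + take / speeds[w]
        finish = max(finish, done)
        heapq.heappush(free_at, (done, w))
    return finish


# Hypothetical node: one slow unit and one unit that is four times faster.
speeds = [1.0, 4.0]
print(static_makespan(100, speeds), dynamic_makespan(100, speeds))
```

    In this example the even split is limited by the slow unit, while self-scheduling lets the fast unit absorb most of the work. One-sided access to a shared task counter is what makes such a scheme natural to express in a PGAS language.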