1,720,980 research outputs found

    Subcube matrix decomposition: A unifying view for LU factorization on multicomputers

    No full text
    This paper represents the first attempt towards a decomposition-independent implementation of parallel algorithms for matrix computations. Main framework is the Subcube Matrix Decomposition, given in this work, that allows to view in a unifying way the most diffuse matrix decompositions on multicomputers. Its decomposition-independent properties extend, moreover, to the design of parallel algorithms, to their optimization, and to the performance analysis. We have verified all these characteristics by implementing an LU factorization meta-algorithm that unifies the known parallel programs and allows a decomposition-independent performance analysis both analytical and experimental

    Dynamic data decomposition in a message-passing environment

    No full text
    The performance of a data parallel program is critically dependent on the data decomposition that the programmer chooses at implementation time. This choice must take into account a combination of different factors such as the kind of the problem, the machine architecture and the data domain size. When these elements are known before execution, the programmer can adopt traditional message-passing languages and optimise performance by means of programs which are dependent on the chosen data decomposition. On the other hand, when the factors that determine the best decomposition are known at run-time only, adequate efficiency can be achieved by a code that dynamically adapts its computation/communication pattern to various decompositions. To assist the programmer in the implementation of a decomposition-independent code, we propose a new programming environment, namely PLUS. It provides the programmer with message-passing primitives that avoid the specification of a data decomposition during the implementation phase, and a run-time support that permits dynamic changes among regular decompositions without affecting the implemented program

    The row/column pivoting strategy on multicomputers

    No full text
    On multicomputers the partial pivoting phase of the LU factorization has a peculiar load unbalancing due to the presence of idle processors in most matrix decompositions. Moreover, intrinsic synchronization barriers do not allow a complete masking of this overhead by means of pipelining techniques. We propose to reduce load unbalancing by “assigning extra work to idle processors”; this leads to a new pivoting strategy, named row/column pivoting, that is mainly attractive to 2D decompositions. Row/column pivoting furnishes an LU factorization algorithm that guarantees better numerical stability at the same cost of partial pivoting in case of square decomposition. A further improvement is achieved by adding pipelining schemes to the naive form. In the design of the algorithms and in their evaluation we have adopted a new environment that allows a decomposition-independent parallel programming

    Load balancing and parallel implementation of iterative algorithms for row-continuous Markov chains

    No full text
    Presents the first parallel algorithms for solving row-continuous or generalized birth-death (GBD) Markov chains on distributed memory MIMD multiprocessors. These systems are characterized by very large transition probability matrices, decomposable in heterogeneous tridiagonal blocks. The parallelization of three aggregation/disaggregation iterative methods is carried out by a unique framework that keeps into account the special matrix structure. Great effort has been also devoted to define a general algorithm for approximating the optimum workload. Various computational experiments show that Vantilborgh's (1985) method is the fastest of the three algorithms on any data set dimensio

    Unifying and optimizing parallel linear algebra algorithms

    No full text
    Two issues in linear algebra algorithms for multicomputers are addressed. First, how tounify parallel implementations of the same algorithm in a decomposition-independent way. Second, how to optimize naive parallel programs maintaining the decompositionindependence. Several matrix decompositions are viewed as instances of a more generalallocation function called subcube matrix decomposition. By this meta-decomposition, aprogramming environment characterized by general primitives that allow one to designmeta-algorithms independently of a particular decomposition. The authors apply such aframework to the parallel solution of dense matrices. This demonstrates that most of theexisting algorithms can be derived by suitably setting the primitives used in themeta-algorithm. A further application of this programming style concerns the optimization of parallel algorithms. The idea to overlap communication and computation has been extended from 1-D decompositions to 2-D decompositions. Thus, a first attempt towards a decomposition-independent definition of such optimization strategies is provided

    Dynamic data reconfiguration for SPMD programs in faulty multicomputers

    No full text
    The single-program multiple-data (SPMD) paradigm is becoming the most diffuse way to program commercial multicomputers. In this paper we demonstrate that for a wide class of SPMD algorithms it is possible to achieve an efficient fault tolerance avoiding hardware redundancy. We propose a software approach that aims to reconfigure data, thus achieving a good slowdown in computation owing to the fine granularity of the workload to redistribute. In particular, we present and compare three data reconfiguration strategies applied to a problem model that includes a wide class of SPMD iterative algorithms characterized by nonlocal communications among the nodes. The result is that in most of the cases it is better to introduce some communication overhead than to leave idle a few healthy processor
    corecore