MIMS EPrints
Not a member yet
    2151 research outputs found

    Squeezing a Matrix into Half Precision, with an Application to Solving Linear Systems

    Full text link
    Motivated by the demand in machine learning, modern computer hardware is increas- ingly supporting reduced precision floating-point arithmetic, which provides advantages in speed, energy, and memory usage over single and double precision. Given the availability of such hardware, mixed precision algorithms that work in single or double precision but carry out part of a compu- tation in half precision are now of great interest for general scientific computing tasks. Because of the limited range of half precision arithmetic, in which positive numbers lie between 6 × 10−8 and 7 × 104, a straightforward rounding of single or double precision data into half precision can lead to overflow, underflow, or subnormal numbers being generated, all of which are undesirable. We develop an algorithm for converting a matrix from single or double precision to half precision. It first applies two-sided diagonal scaling in order to equilibrate the matrix (that is, to ensure that every row and column has ∞-norm 1), then multiplies by a scalar to bring the largest element within a factor θ ≤ 1 of the overflow level, and finally rounds to half precision. The second step ensures that full use is made of the limited range of half precision arithmetic, and θ must be chosen to allow sufficient headroom for subsequent computations. We apply the new algorithm to GMRES-based iterative re- finement (GMRES-IR), which solves a linear system Ax = b with single or double precision data by LU factorizing A in half precision and carrying out iterative refinement with the correction equations solved by GMRES preconditioned with the low precision LU factors. Previous implementations of this algorithm have used a crude conversion to half precision that our experiments show can cause slow convergence of GMRES-IR for badly scaled matrices or failure to converge at all. The new conversion algorithm computes ∞-norms of rows and columns of the matrix and its cost is negligible in the context of LU factorization. We show that it leads to faster convergence of GMRES-IR for badly scaled matrices and thereby allows a much wider class of problems to be solved

    An optimal iterative solver for symmetric indefinite linear systems with PDE origins: Balanced black-box stopping tests

    Full text link
    This work discusses the design of efficient algorithms for solving symmetric indefinite linear systems arising from FEM approximation of PDEs. The distinctive feature of the preconditioned MINRES solver that is used here is the incorporation of error control in the ‘natural norm’ in combination with an effective a posteriori estimator for the PDE approximation error. This leads to a robust and optimal blackbox stopping criterion: the iteration is terminated as soon as the algebraic error is insignificant compared to the approximation error

    Squeezing a Matrix into Half Precision, with an Application to Solving Linear Systems

    Full text link
    Motivated by the demand in machine learning, modern computer hardware is increas- ingly supporting reduced precision floating-point arithmetic, which provides advantages in speed, energy, and memory usage over single and double precision. Given the availability of such hardware, mixed precision algorithms that work in single or double precision but carry out part of a compu- tation in half precision are now of great interest for general scientific computing tasks. Because of the limited range of half precision arithmetic, in which positive numbers lie between 6 × 10−8 and 7 × 104, a straightforward rounding of single or double precision data into half precision can lead to overflow, underflow, or subnormal numbers being generated, all of which are undesirable. We develop an algorithm for converting a matrix from single or double precision to half precision. It first applies two-sided diagonal scaling in order to equilibrate the matrix (that is, to ensure that every row and column has ∞-norm 1), then multiplies by a scalar to bring the largest element within a factor θ ≤ 1 of the overflow level, and finally rounds to half precision. The second step ensures that full use is made of the limited range of half precision arithmetic, and θ must be chosen to allow sufficient headroom for subsequent computations. We apply the new algorithm to GMRES-based iterative re- finement (GMRES-IR), which solves a linear system Ax = b with single or double precision data by LU factorizing A in half precision and carrying out iterative refinement with the correction equations solved by GMRES preconditioned with the low precision LU factors. Previous implementations of this algorithm have used a crude conversion to half precision that our experiments show can cause slow convergence of GMRES-IR for badly scaled matrices or failure to converge at all. The new conversion algorithm computes ∞-norms of rows and columns of the matrix and its cost is negligible in the context of LU factorization. We show that it leads to faster convergence of GMRES-IR for badly scaled matrices and thereby allows a much wider class of problems to be solved

    A New Preconditioner that Exploits Low-Rank Approximations to Factorization Error

    No full text
    We consider ill-conditioned linear systems Ax=Ax = b that are to be solved iteratively, and assume that a low accuracy LU factorization AL^U^A \approx \widehat{L}\widehat{U} is available for use in a preconditioner. We have observed that for ill-conditioned matrices AA arising in practice, A1A^{-1} tends to be numerically low rank, that is, it has a small number of large singular values. Importantly, the error matrix E=U^1L^1AIE = \widehat{U}^{-1}\widehat{L}^{-1}A - I tends to have the same property. To understand this phenomenon we give bounds for the distance from EE to a low-rank matrix in terms of the corresponding distance for A1A^{-1}. We then design a novel preconditioner that exploits the low-rank property of the error to accelerate the convergence of iterative methods. We apply this new preconditioner in three different contexts fitting our general framework: low floating-point precision (e.g., half precision) LU factorization, incomplete LU factorization, and block low-rank LU factorization. In numerical experiments with GMRES-based iterative refinement we show that our preconditioner can achieve a significant reduction in the number of iterations required to solve a variety of real-life problems

    Multiprecision Algorithms for Computing the Matrix Logarithm

    No full text
    Two algorithms are developed for computing the matrix logarithm in floating point arithmetic of any specified precision. The backward error-based approach used in the state of the art inverse scaling and squaring algorithms does not conveniently extend to a multiprecision environment, so instead we choose algorithmic parameters based on a forward error bound. We derive a new forward error bound for Pad\'{e} approximants that for highly nonnormal matrices can be much smaller than the classical bound of Kenney and Laub. One of our algorithms exploits a Schur decomposition while the other is transformation-free and uses only the computational kernels of matrix multiplication and the solution of multiple right-hand side linear systems. For double precision computations the algorithms are competitive with the state of the art algorithm of Al-Mohy, Higham, and Relton implemented in \texttt{logm} in MATLAB\@. They are intended for computing environments providing multiprecision floating point arithmetic, such as Julia, MATLAB via the Symbolic Math Toolbox or the Multiprecision Computing Toolbox, or Python with the mpmath or SymPy packages. We show experimentally that the algorithms behave in a forward stable manner over a wide range of precisions, unlike existing alternatives

    Filtering Frequencies in a Shift-and-invert Lanczos Algorithm for the Dynamic Analysis of Structures

    Full text link
    The shift-and-invert Lanczos algorithm is a commonly used solution procedure to compute the eigenpairs of large, sparse eigenvalue problems that arise when approximating the elastic dynamic response of large structures under the influence of seismic forces. Not all eigenvectors are equally important to the response when the structure is exposed to a mass-dependent external force of the form g(t)Mbg(t) Mb, where MM is the mass matrix of the system and bb the rigid body vector. Structural engineers select eigenvectors xjx_j, j=1,,j=1,\ldots,\ell, such that their cumulative mass participation, measured as j=1(xjTMb)2/(bTMb)\sum_{j=1}^\ell (x_j^T Mb)^2/(b^TMb), is above a target threshold ξ\xi. We show that when the starting vector for the unshifted Lanczos algorithm is the spatial distribution vector bb, the Lanczos procedure can be used to provide an estimate of the cumulative mass participation. This allows us to identify intervals containing eigenvalues whose eigenvectors have a large contribution to the cumulative mass participation and filter out intervals containing eigenvalues whose eigenvectors have a negligible contribution. We use this information to devise a sequence of shifts σ1,,σp\sigma_1, \ldots,\sigma_p for the shift-and-invert Lanczos algorithm as well as a stopping criterion for the iteration with shift σi\sigma_i so that the cumulative mass participation of the computed eigenvectors reaches the required level ξ\xi. Numerical experiments on real engineering problems show that our approach computes up to 80%80\% fewer eigenvectors and requires fewer shifts, on average, than the more general shifting strategy proposed by Ericsson and Ruhe (Math. Comp., 35 (1980))

    Extracellular matrix fragmentation in young, healthy cartilaginous tissues

    Full text link
    Although the composition and structure of cartilaginous tissues is complex, collagen II fibrils and aggrecan are the most abundant assemblies in both articular cartilage (AC) and the nucleus pulposus (NP) of the intervertebral disc (IVD). Whilst structural heterogeneity of intact aggrecan (containing three globular domains) is well characterised, the extent of aggrecan fragmentation in healthy tissues is poorly defined. Using young, yet skeletally mature (18-30 months), bovine AC and NP tissues, it was shown that, whilst the ultrastructure of intact aggrecan was tissue-dependent, most molecules (AC: 95 %; NP: 99.5 %) were fragmented (lacking one or more globular domains). Fragments were significantly smaller and more structurally heterogeneous in the NP compared with the AC (molecular area; AC: 8543 nm^2; NP: 4625 nm^2; p < 0.0001). In contrast, fibrillar collagen appeared structurally intact and tissue-invariant. Molecular fragmentation is considered indicative of a pathology; however, these young, skeletally mature tissues were histologically and mechanically (reduced modulus: AC: ≈ 500 kPa; NP: ≈ 80 kPa) comparable to healthy tissues and devoid of notable gelatinase activity (compared with rat dermis). As aggrecan fragmentation was prevalent in neonatal bovine AC (99.5 % fragmented, molecular area: 5137 nm^2) as compared with mature AC (95.0 % fragmented, molecular area: 8667 nm^2), it was hypothesised that targeted proteolysis might be an adaptive process that modified aggrecan packing (as simulated computationally) and, hence, tissue charge density, mechanical properties and porosity. These observations provided a baseline against which pathological and/or age-related fragmentation of aggrecan could be assessed and suggested that new strategies might be required to engineer constructs that mimic the mechanical properties of native cartilaginous tissues

    A block rational Krylov method for three-dimensional time-domain marine controlled-source electromagnetic modeling

    No full text
    We introduce a novel block rational Krylov method to accelerate three-dimensional time-domain marine controlled-source electromagnetic modeling with multiple sources. This method approximates the time-varying electric solutions explicitly in terms of matrix exponential functions. A main attraction is that no time stepping is required, while most of the computational costs are concentrated in constructing a rational Krylov basis. We optimize the shift parameters defining the rational Krylov space with a positive exponential weight function, thereby producing smaller approximation errors at later times and reducing iteration numbers. Furthermore, for multi-source modeling problems, we adopt block Krylov techniques to incorporate all source vectors in a single approximation space. The method is tested on two examples: a layered seafloor model and a 3D hydrocarbon reservoir model with seafloor bathymetry. The modeling results are found to agree very well with those from 1D semi-analytic solutions and finite-element time-domain solutions using a backward Euler scheme, respectively. Benchmarks of the block rational Krylov method demonstrate that it can be as up to 10 times faster than backward Euler. The block method also benefits from better memory efficiency, resulting in considerable speedup compared to non-block methods

    A New Preconditioner that Exploits Low-Rank Approximations to Factorization Error

    No full text
    We consider ill-conditioned linear systems Ax=Ax = b that are to be solved iteratively, and assume that a low accuracy LU factorization AL^U^A \approx \widehat{L}\widehat{U} is available for use in a preconditioner. We have observed that for ill-conditioned matrices AA arising in practice, A1A^{-1} tends to be numerically low rank, that is, it has a small number of large singular values. Importantly, the error matrix E=U^1L^1AIE = \widehat{U}^{-1}\widehat{L}^{-1}A - I tends to have the same property. To understand this phenomenon we give bounds for the distance from EE to a low-rank matrix in terms of the corresponding distance for A1A^{-1}. We then design a novel preconditioner that exploits the low-rank property of the error to accelerate the convergence of iterative methods. We apply this new preconditioner in three different contexts fitting our general framework: low floating-point precision (e.g., half precision) LU factorization, incomplete LU factorization, and block low-rank LU factorization. In numerical experiments with GMRES-based iterative refinement we show that our preconditioner can achieve a significant reduction in the number of iterations required to solve a variety of real-life problems

    1,445

    full texts

    2,151

    metadata records
    Updated in last 30 days.
    MIMS EPrints
    Access Repository Dashboard
    Do you manage Open Research Online? Become a CORE Member to access insider analytics, issue reports and manage access to outputs from your repository in the CORE Repository Dashboard! 👇