MIMS EPrints
Not a member yet
2151 research outputs found
Sort by
Squeezing a Matrix into Half Precision, with an Application to Solving Linear Systems
Motivated by the demand in machine learning, modern computer hardware is increas- ingly supporting reduced precision floating-point arithmetic, which provides advantages in speed, energy, and memory usage over single and double precision. Given the availability of such hardware, mixed precision algorithms that work in single or double precision but carry out part of a compu- tation in half precision are now of great interest for general scientific computing tasks. Because of the limited range of half precision arithmetic, in which positive numbers lie between 6 × 10−8 and 7 × 104, a straightforward rounding of single or double precision data into half precision can lead to overflow, underflow, or subnormal numbers being generated, all of which are undesirable. We develop an algorithm for converting a matrix from single or double precision to half precision. It first applies two-sided diagonal scaling in order to equilibrate the matrix (that is, to ensure that every row and column has ∞-norm 1), then multiplies by a scalar to bring the largest element within a factor θ ≤ 1 of the overflow level, and finally rounds to half precision. The second step ensures that full use is made of the limited range of half precision arithmetic, and θ must be chosen to allow sufficient headroom for subsequent computations. We apply the new algorithm to GMRES-based iterative re- finement (GMRES-IR), which solves a linear system Ax = b with single or double precision data by LU factorizing A in half precision and carrying out iterative refinement with the correction equations solved by GMRES preconditioned with the low precision LU factors. Previous implementations of this algorithm have used a crude conversion to half precision that our experiments show can cause slow convergence of GMRES-IR for badly scaled matrices or failure to converge at all. The new conversion algorithm computes ∞-norms of rows and columns of the matrix and its cost is negligible in the context of LU factorization. We show that it leads to faster convergence of GMRES-IR for badly scaled matrices and thereby allows a much wider class of problems to be solved
An optimal iterative solver for symmetric indefinite linear systems with PDE origins: Balanced black-box stopping tests
This work discusses the design of efficient algorithms for solving symmetric indefinite linear systems arising from FEM approximation of PDEs. The distinctive feature of the preconditioned MINRES solver that is used here is the incorporation of error control in the ‘natural norm’ in combination with an effective a posteriori estimator for the PDE approximation error. This leads to a robust and optimal blackbox stopping criterion: the iteration is terminated as soon as the algebraic error is insignificant compared to the approximation error
Squeezing a Matrix into Half Precision, with an Application to Solving Linear Systems
Motivated by the demand in machine learning, modern computer hardware is increas- ingly supporting reduced precision floating-point arithmetic, which provides advantages in speed, energy, and memory usage over single and double precision. Given the availability of such hardware, mixed precision algorithms that work in single or double precision but carry out part of a compu- tation in half precision are now of great interest for general scientific computing tasks. Because of the limited range of half precision arithmetic, in which positive numbers lie between 6 × 10−8 and 7 × 104, a straightforward rounding of single or double precision data into half precision can lead to overflow, underflow, or subnormal numbers being generated, all of which are undesirable. We develop an algorithm for converting a matrix from single or double precision to half precision. It first applies two-sided diagonal scaling in order to equilibrate the matrix (that is, to ensure that every row and column has ∞-norm 1), then multiplies by a scalar to bring the largest element within a factor θ ≤ 1 of the overflow level, and finally rounds to half precision. The second step ensures that full use is made of the limited range of half precision arithmetic, and θ must be chosen to allow sufficient headroom for subsequent computations. We apply the new algorithm to GMRES-based iterative re- finement (GMRES-IR), which solves a linear system Ax = b with single or double precision data by LU factorizing A in half precision and carrying out iterative refinement with the correction equations solved by GMRES preconditioned with the low precision LU factors. Previous implementations of this algorithm have used a crude conversion to half precision that our experiments show can cause slow convergence of GMRES-IR for badly scaled matrices or failure to converge at all. The new conversion algorithm computes ∞-norms of rows and columns of the matrix and its cost is negligible in the context of LU factorization. We show that it leads to faster convergence of GMRES-IR for badly scaled matrices and thereby allows a much wider class of problems to be solved
A New Preconditioner that Exploits Low-Rank Approximations to Factorization Error
We consider ill-conditioned linear systems b that are to be solved
iteratively, and assume that a low accuracy LU factorization
is available for use in a
preconditioner. We have observed that for ill-conditioned matrices
arising in practice, tends to be numerically low rank, that is, it
has a small number of large singular values. Importantly, the error matrix
tends to have the same
property. To understand this phenomenon we give bounds for the distance
from to a low-rank matrix in terms of the corresponding distance for
. We then design a novel preconditioner that exploits the low-rank
property of the error to accelerate the convergence of iterative
methods. We apply this new preconditioner in three different contexts
fitting our general framework: low floating-point precision (e.g., half
precision) LU factorization, incomplete LU factorization, and block
low-rank LU factorization.
In numerical experiments with GMRES-based iterative refinement we show
that our preconditioner can achieve a significant reduction in the number
of iterations required to solve a variety of real-life problems
Multiprecision Algorithms for Computing the Matrix Logarithm
Two algorithms are developed for computing the matrix logarithm in floating
point arithmetic of any specified precision. The backward error-based
approach used in the state of the art inverse scaling and squaring
algorithms does not conveniently extend to a multiprecision environment, so
instead we choose algorithmic parameters based on a forward error bound. We
derive a new forward error bound for Pad\'{e} approximants that for
highly nonnormal matrices can be much smaller than the classical bound of
Kenney and Laub. One of our algorithms exploits a Schur decomposition
while the other is transformation-free and uses only the computational
kernels of matrix multiplication and the solution of multiple right-hand
side linear systems. For double precision computations the algorithms are
competitive with the state of the art algorithm of Al-Mohy, Higham, and
Relton implemented in \texttt{logm} in MATLAB\@. They are intended for
computing environments providing multiprecision floating point arithmetic,
such as Julia, MATLAB via the Symbolic Math Toolbox or the Multiprecision
Computing Toolbox, or Python with the mpmath or SymPy packages. We show
experimentally that
the algorithms behave in a forward stable manner over a
wide range of precisions, unlike existing alternatives
Filtering Frequencies in a Shift-and-invert Lanczos Algorithm for the Dynamic Analysis of Structures
The shift-and-invert Lanczos algorithm is a commonly used solution procedure to compute the eigenpairs of large, sparse eigenvalue problems that arise when approximating the elastic dynamic response of large structures under the influence of seismic forces. Not all eigenvectors are equally important to the response when the structure is exposed to a mass-dependent external force of the form
, where is the mass matrix of the system and the rigid body vector. Structural engineers select eigenvectors , , such that
their cumulative mass participation, measured as , is above a target threshold . We show that when the starting vector for the unshifted Lanczos algorithm is the spatial distribution vector , the Lanczos procedure can be used to provide
an estimate of the cumulative mass participation. This allows us to identify intervals containing eigenvalues whose eigenvectors have a large contribution to the cumulative mass participation and filter out intervals containing eigenvalues whose eigenvectors have a negligible contribution. We use this information to devise a sequence of shifts for the shift-and-invert Lanczos algorithm as well as a stopping criterion for the iteration with shift so that the cumulative mass participation of the computed eigenvectors reaches the
required level . Numerical experiments on real engineering problems show that our approach computes
up to fewer eigenvectors and requires fewer shifts, on average, than the more general shifting strategy proposed by Ericsson and Ruhe (Math. Comp., 35 (1980))
Extracellular matrix fragmentation in young, healthy cartilaginous tissues
Although the composition and structure of cartilaginous tissues is complex, collagen II fibrils and aggrecan are the most abundant assemblies in both articular cartilage (AC) and the nucleus pulposus (NP) of the intervertebral disc (IVD). Whilst structural heterogeneity of intact aggrecan (containing three globular domains) is well characterised, the extent of aggrecan fragmentation in healthy tissues is poorly defined. Using young, yet skeletally mature (18-30 months), bovine AC and NP tissues, it was shown that, whilst
the ultrastructure of intact aggrecan was tissue-dependent, most molecules (AC: 95 %; NP: 99.5 %) were fragmented (lacking one or more globular domains). Fragments were significantly smaller and more structurally heterogeneous in the NP compared with the AC (molecular area; AC: 8543 nm^2; NP: 4625 nm^2; p < 0.0001). In contrast, fibrillar collagen appeared structurally intact and tissue-invariant. Molecular
fragmentation is considered indicative of a pathology; however, these young, skeletally mature tissues were histologically and mechanically (reduced modulus: AC: ≈
500 kPa; NP: ≈ 80 kPa) comparable to healthy tissues and devoid of notable gelatinase activity (compared with rat dermis). As aggrecan fragmentation was prevalent in neonatal bovine AC (99.5 % fragmented, molecular area: 5137 nm^2) as compared with mature AC (95.0 % fragmented, molecular area: 8667 nm^2), it was hypothesised that targeted proteolysis might be an adaptive process that modified aggrecan packing (as simulated computationally) and, hence, tissue charge density, mechanical properties and porosity. These observations provided a baseline against which pathological
and/or age-related fragmentation of aggrecan could be assessed and suggested that new strategies might be required to engineer constructs that mimic the mechanical properties of native cartilaginous tissues
A block rational Krylov method for three-dimensional time-domain marine controlled-source electromagnetic modeling
We introduce a novel block rational Krylov method to accelerate three-dimensional time-domain marine controlled-source electromagnetic modeling with multiple sources. This method approximates the time-varying electric solutions explicitly in terms of matrix exponential functions. A main attraction is that no time stepping is required, while most of the computational costs are concentrated in constructing a rational Krylov basis. We optimize the shift parameters defining the rational Krylov space with a positive exponential weight function, thereby producing smaller approximation errors at later times and reducing iteration numbers. Furthermore, for multi-source modeling problems, we adopt block Krylov techniques to incorporate all source vectors in a single approximation space. The method is tested on two examples: a layered seafloor model and a 3D hydrocarbon reservoir model with seafloor bathymetry. The modeling results are found to agree very well with those from 1D semi-analytic solutions and finite-element time-domain solutions using a backward Euler scheme, respectively. Benchmarks of the block rational Krylov method demonstrate that it can be as up to 10 times faster than backward Euler. The block method also benefits from better memory efficiency, resulting in considerable speedup compared to non-block methods
A New Preconditioner that Exploits Low-Rank Approximations to Factorization Error
We consider ill-conditioned linear systems b that are to be solved
iteratively, and assume that a low accuracy LU factorization
is available for use in a
preconditioner. We have observed that for ill-conditioned matrices
arising in practice, tends to be numerically low rank, that is, it
has a small number of large singular values. Importantly, the error matrix
tends to have the same
property. To understand this phenomenon we give bounds for the distance
from to a low-rank matrix in terms of the corresponding distance for
. We then design a novel preconditioner that exploits the low-rank
property of the error to accelerate the convergence of iterative
methods. We apply this new preconditioner in three different contexts
fitting our general framework: low floating-point precision (e.g., half
precision) LU factorization, incomplete LU factorization, and block
low-rank LU factorization.
In numerical experiments with GMRES-based iterative refinement we show
that our preconditioner can achieve a significant reduction in the number
of iterations required to solve a variety of real-life problems