MIMS EPrints

Not a member yet

2151 research outputs found

Sort by

A comparison of limited-memory Krylov methods for Stieltjes functions of Hermitian matrices

Author: Güttel Stefan
Schweitzer Marcel
Publication venue
Publication date: 12/06/2020
Field of study

Given a limited amount of memory and a target accuracy, we propose and compare several polynomial Krylov methods for the approximation of f(A)b, the action of a Stieltjes matrix function of a large Hermitian matrix on a vector. Using new error bounds and estimates, as well as existing results, we derive predictions of the practical performance of the methods, and rank them accordingly. As by-products, we derive new results on inexact Krylov iterations for matrix functions in order to allow for a fair comparison of rational Krylov methods with polynomial inner solves

The dual inverse scaling and squaring algorithm for the matrix logarithm

Author: Fasi Massimiliano
Iannazzo Bruno
Publication venue
Publication date: 27/05/2020
Field of study

The inverse scaling and squaring algorithm computes the logarithm of a square matrix A by evaluating a rational approximant to the logarithm at the matrix B:=A^{2^{-s}} for a suitable choice of s. We introduce a dual approach and approximate the logarithm of B by solving the rational equation r(X)=B, where r is a diagonal Padé approximant to the matrix exponential at 0. This equation is solved by a substitution technique in the style of those developed in (Fasi & Iannazzo, Elect. Trans. Num. Anal., 53 (2020), pp. 500--521). The new method is tailored to the special structure of the diagonal Padé approximants to the exponential, and in terms of computational cost is more efficient than the state-of-the-art inverse scaling and squaring algorithm

Random Matrices Generating Large Growth in LU Factorization with Pivoting

Author: Higham Desmond J.
Higham Nicholas J.
Pranesh Srikara
Publication venue
Publication date: 14/05/2020
Field of study

We identify a class of random, dense

n\times n

matrices for which LU factorization with any form of pivoting produces a growth factor typically of size at least

n/(4 \log n)

for large

n

. The condition number of the matrices can be arbitrarily chosen and large growth also happens for the transpose. No previous matrices with all these properties were known. The matrices can be generated by the MATLAB function \verb"gallery('randsvd',..)", and they are formed as the product of two random orthogonal matrices from the Haar distribution with a diagonal matrix having only one diagonal entry different from

1

, which lies between

0

and

1

(the ``one small singular value'' case). Our explanation for the large growth uses the fact that the maximum absolute value of any element of a Haar distributed orthogonal matrix tends to be relatively small for large

n

. We verify the behavior numerically, finding that for partial pivoting the actual growth is significantly larger than the lower bound, and much larger than the growth observed for random matrices with elements from the uniform

[0,1

] or standard normal distributions. We show more generally that a rank-

1

perturbation to an orthogonal matrix producing large growth for any form of pivoting also generates large growth under reasonable assumptions. Finally, we demonstrate that GMRES-based iterative refinement can provide stable solutions to

Ax = b

when large growth occurs in low precision LU factors, even when standard iterative refinement cannot

Numerical Behavior of NVIDIA Tensor Cores

Author: Fasi Massimiliano
Higham Nicholas J.
Mikaitis Mantas
Pranesh Srikara
Publication venue
Publication date: 23/04/2020
Field of study

We explore the floating-point arithmetic implemented in NVIDIA tensor cores, which are hardware accelerators for mixed-precision matrix multiplication available on the Volta, Turing, and Ampere microarchitectures. Using Volta V100 and Turing T4 graphics cards, we determine what precision is used for the intermediate results, whether subnormal numbers are supported, what rounding mode is used, in which order the operations underlying the matrix multiplication are performed, and whether partial sums are normalized. These aspects are not documented by NVIDIA, and we gain insight by running carefully designed numerical experiments on these hardware units. Knowing the answers to these questions is important if one wishes to: 1) accurately simulate NVIDIA tensor cores on conventional hardware; 2) understand the differences between results produced by code that utilizes tensor cores and code that uses only IEEE 754-compliant arithmetic operations; and 3) build hardware that computes a matrix-matrix product matching the results of the NVIDIA tensor cores. As part of this work we provide a testsuite that can be easily adapted to test the latest tensor cores available in the NVIDIA Ampere A100, once those graphics cards become easily accessible. Moreover, we identify a non-monotonicity issue that arises in floating-point multi-operand addition if the intermediate results are not normalized

Robust rational approximations of nonlinear eigenvalue problems

Author: Güttel Stefan
Negri Porzio Gian Maria
Tisseur Francoise
Publication venue
Publication date: 13/11/2020
Field of study

We develop algorithms that construct robust (i.e., reliable for a given tolerance and scaling independent) rational approximations of matrix-valued functions on a given subset of the complex plane. We consider matrix-valued functions provided in both split form (i.e., as a sum of scalar functions times constant coefficient matrices) and in black box form. We use an error analysis to construct stopping criteria, one for each form. The criterion for split forms adds weights chosen relative to the importance of each scalar function, which we use in our weighted AAA algorithm, a variant of the set-valued AAA algorithm that guarantees to return a rational approximant with a user chosen accuracy. We propose two-phase approaches for black box matrix-valued functions that construct a surrogate AAA approximation in phase one and refine it in phase two, leading to the surrogate AAA algorithm with exact search and the surrogate AAA algorithm with cyclic Leja--Bagby refinement. The stopping criterion for black box matrix-valued functions is updated at each step of phase two to include information from the previous step. When convergence occurs, our two-phase approaches return rational approximants with a user chosen accuracy. We select problems from the NLEVP collection that represent a variety of matrix-valued functions of different sizes and properties and use them to benchmark our algorithms

Sharper Probabilistic Backward Error Analysis for Basic Linear Algebra Kernels with Random Data

Author: Higham Nicholas J.
Mary Theo
Publication venue
Publication date: 21/01/2020
Field of study

CPFloat: A C library for simulating low-precision arithmetic

Author: Fasi Massimiliano
Mikaitis Mantas
Publication venue
Publication date: 20/10/2020
Field of study

One can simulate low-precision floating-point arithmetic via software by executing each arithmetic operation in hardware and then rounding the result to the desired number of significant bits. For IEEE-compliant formats, rounding requires only standard mathematical library functions, but handling subnormals, underflow, and overflow demands special attention, and numerical errors can cause mathematically correct formulae to behave incorrectly in finite arithmetic. Moreover, the ensuing implementations are not necessarily efficient, as the library functions these techniques build upon are typically designed to handle a broad range of cases and may not be optimized for the specific needs of rounding algorithms. CPFloat is a C library for simulating low-precision arithmetics. It offers efficient routines for rounding, performing mathematical computations, and querying properties of the simulated low-precision format. The software exploits the bit-level floating-point representation of the format in which the numbers are stored, and replaces costly library calls with low-level bit manipulations and integer arithmetic. In numerical experiments, the new techniques bring a considerable speedup (typically one order of magnitude or more) over existing alternatives in C, C++, and MATLAB. To our knowledge, CPFloat is currently the most efficient and complete library for experimenting with custom low-precision floating-point arithmetic available in any language

Sharper Probabilistic Backward Error Analysis for Basic Linear Algebra Kernels with Random Data

Author: Higham Nicholas J.
Mary Theo
Publication venue
Publication date: 21/01/2020
Field of study

CPFloat: A C library for simulating low-precision arithmetic

Author: Fasi Massimiliano
Mikaitis Mantas
Publication venue
Publication date: 20/10/2020
Field of study

Three-Precision GMRES-Based Iterative Refinement for Least Squares Problems

Author: Carson Erin
Higham Nicholas J.
Pranesh Srikara
Publication venue
Publication date: 03/02/2020
Field of study

1,445

full texts

2,151

metadata records

Updated in last 30 days.

MIMS EPrints

Access Repository Dashboard

Do you manage Open Research Online? Become a CORE Member to access insider analytics, issue reports and manage access to outputs from your repository in the CORE Repository Dashboard! 👇