MIMS EPrints
Not a member yet
2151 research outputs found
Sort by
A comparison of limited-memory Krylov methods for Stieltjes functions of Hermitian matrices
Given a limited amount of memory and a target accuracy, we propose and compare several polynomial Krylov methods for the approximation of f(A)b, the action of a Stieltjes matrix function of a large Hermitian matrix on a vector. Using new error bounds and estimates, as well as existing results, we derive predictions of the practical performance of the methods, and rank them accordingly. As by-products, we derive new results on inexact Krylov iterations for matrix functions in order to allow for a fair comparison of rational Krylov methods with polynomial inner solves
The dual inverse scaling and squaring algorithm for the matrix logarithm
The inverse scaling and squaring algorithm computes the logarithm of a square matrix A by evaluating a rational approximant to the logarithm at the matrix B:=A^{2^{-s}} for a suitable choice of s. We introduce a dual approach and approximate the logarithm of B by solving the rational equation r(X)=B, where r is a diagonal Padé approximant to the matrix exponential at 0. This equation is solved by a substitution technique in the style of those developed in (Fasi & Iannazzo, Elect. Trans. Num. Anal., 53 (2020), pp. 500--521). The new method is tailored to the special structure of the diagonal Padé approximants to the exponential, and in terms of computational cost is more efficient than the state-of-the-art inverse scaling and squaring algorithm
Random Matrices Generating Large Growth in LU Factorization with Pivoting
We identify a class of random, dense matrices for which
LU factorization with any form of pivoting produces a growth factor
typically of size at least for large .
The condition number of the matrices can be arbitrarily chosen
and large growth also happens for the transpose.
No previous matrices with all these properties were known.
The matrices can be generated by the MATLAB function
\verb"gallery('randsvd',..)",
and they are formed as the product of two random orthogonal
matrices from the Haar distribution with a diagonal matrix
having only one diagonal entry different from ,
which lies between and
(the ``one small singular value'' case).
Our explanation for the large growth uses the fact that
the maximum absolute value of any element of
a Haar distributed orthogonal matrix tends to be relatively small for
large .
We verify the behavior numerically, finding that for partial pivoting the
actual growth is significantly larger than the lower bound,
and much larger than the growth observed
for random matrices with elements from the
uniform ] or standard normal distributions.
We show more generally that a rank- perturbation to an orthogonal matrix
producing large growth for any form of pivoting also generates large growth
under reasonable assumptions.
Finally, we demonstrate that GMRES-based iterative refinement
can provide stable solutions to when large growth occurs in low
precision LU factors, even when standard iterative refinement cannot
Numerical Behavior of NVIDIA Tensor Cores
We explore the floating-point arithmetic implemented in NVIDIA tensor cores, which are hardware accelerators for mixed-precision matrix multiplication available on the Volta, Turing, and Ampere microarchitectures. Using Volta V100 and Turing T4 graphics cards, we determine what precision is used for the intermediate results, whether subnormal numbers are supported, what rounding mode is used, in which order the operations underlying the matrix multiplication are performed, and whether partial sums are normalized. These aspects are not documented by NVIDIA, and we gain insight by running carefully designed numerical experiments on these hardware units. Knowing the answers to these questions is important if one wishes to: 1) accurately simulate NVIDIA tensor cores on conventional hardware; 2) understand the differences between results produced by code that utilizes tensor cores and code that uses only IEEE 754-compliant arithmetic operations; and 3) build hardware that computes a matrix-matrix product matching the results of the NVIDIA tensor cores. As part of this work we provide a testsuite that can be easily adapted to test the latest tensor cores available in the NVIDIA Ampere A100, once those graphics cards become easily accessible. Moreover, we identify a non-monotonicity issue that arises in floating-point multi-operand addition if the intermediate results are not normalized
Robust rational approximations of nonlinear eigenvalue problems
We develop algorithms that construct robust (i.e., reliable for a given tolerance and scaling independent) rational approximations of matrix-valued functions on a given subset of the complex plane.
We consider matrix-valued functions provided in both split form (i.e., as a sum of scalar functions times constant coefficient matrices) and in black box form.
We use an error analysis to construct stopping criteria, one for each form. The criterion for split forms adds weights chosen relative to the importance of each scalar function, which we use in our weighted AAA algorithm, a variant of the set-valued AAA algorithm that guarantees to return a rational approximant with a user chosen accuracy.
We propose two-phase approaches for black box matrix-valued functions that construct a surrogate AAA approximation in phase one and refine it in phase two, leading to the surrogate AAA algorithm with exact search and the surrogate AAA algorithm with cyclic Leja--Bagby refinement.
The stopping criterion for black box matrix-valued functions is updated at each step of phase two to include information from the previous step.
When convergence occurs, our two-phase approaches return
rational approximants with a user chosen accuracy.
We select problems from the NLEVP collection that represent a variety of matrix-valued functions of different sizes and properties and use them to benchmark our algorithms
Sharper Probabilistic Backward Error Analysis for Basic Linear Algebra Kernels with Random Data
CPFloat: A C library for simulating low-precision arithmetic
One can simulate low-precision floating-point arithmetic via software by executing each arithmetic operation in hardware and then rounding the result to the desired number of significant bits. For IEEE-compliant formats, rounding requires only standard mathematical library functions, but handling subnormals, underflow, and overflow demands special attention, and numerical errors can cause mathematically correct formulae to behave incorrectly in finite arithmetic. Moreover, the ensuing implementations are not necessarily efficient, as the library functions these techniques build upon are typically designed to handle a broad range of cases and may not be optimized for the specific needs of rounding algorithms. CPFloat is a C library for simulating low-precision arithmetics. It offers efficient routines for rounding, performing mathematical computations, and querying properties of the simulated low-precision format. The software exploits the bit-level floating-point representation of the format in which the numbers are stored, and replaces costly library calls with low-level bit manipulations and integer arithmetic. In numerical experiments, the new techniques bring a considerable speedup (typically one order of magnitude or more) over existing alternatives in C, C++, and MATLAB. To our knowledge, CPFloat is currently the most efficient and complete library for experimenting with custom low-precision floating-point arithmetic available in any language
Sharper Probabilistic Backward Error Analysis for Basic Linear Algebra Kernels with Random Data
CPFloat: A C library for simulating low-precision arithmetic
One can simulate low-precision floating-point arithmetic via software by executing each arithmetic operation in hardware and then rounding the result to the desired number of significant bits. For IEEE-compliant formats, rounding requires only standard mathematical library functions, but handling subnormals, underflow, and overflow demands special attention, and numerical errors can cause mathematically correct formulae to behave incorrectly in finite arithmetic. Moreover, the ensuing implementations are not necessarily efficient, as the library functions these techniques build upon are typically designed to handle a broad range of cases and may not be optimized for the specific needs of rounding algorithms. CPFloat is a C library for simulating low-precision arithmetics. It offers efficient routines for rounding, performing mathematical computations, and querying properties of the simulated low-precision format. The software exploits the bit-level floating-point representation of the format in which the numbers are stored, and replaces costly library calls with low-level bit manipulations and integer arithmetic. In numerical experiments, the new techniques bring a considerable speedup (typically one order of magnitude or more) over existing alternatives in C, C++, and MATLAB. To our knowledge, CPFloat is currently the most efficient and complete library for experimenting with custom low-precision floating-point arithmetic available in any language