MIMS EPrints
Not a member yet
2151 research outputs found
Sort by
Computing the square root of a low-rank perturbation of the scaled identity matrix
We consider the problem of computing the square root of a perturbation of the scaled identity matrix, A = α Iₙ + UVᴴ, where U and V are n × k matrices with k ≤ n. This problem arises in various applications, including computer vision and optimization methods for machine learning. We derive a new formula for the pth root of A that involves a weighted sum of powers of the pth root of the k × k matrix α Iₖ + VᴴU. This formula is particularly attractive for the square root, since the sum has just one term when p = 2. We also derive a new class of Newton iterations for computing the square root that exploit the low-rank structure. We test these new methods on random matrices and on positive definite matrices arising in applications. Numerical experiments show that the new approaches can yield a much smaller residual than existing alternatives and can be significantly faster when the perturbation UVᴴ has low rank
Matrix Multiplication in Multiword Arithmetic: Error Analysis and Application to GPU Tensor Cores
In multiword arithmetic, a matrix is represented as the unevaluated sum of two or more lower-precision matrices, and a matrix product is formed by multiplying the constituents in low precision. We investigate the use of multiword arithmetic for improving the performance-accuracy tradeoff of matrix multiplication with mixed precision block fused multiply-add (FMA) hardware, focusing especially on the tensor cores available on NVIDIA GPUs. Building on a general block FMA framework, we develop a comprehensive error analysis of multiword matrix multiplication. After confirming the theoretical error bounds experimentally by simulating low precision in software, we use the cuBLAS and CUTLASS libraries to implement a number of matrix multiplication algorithms using double-fp16 (double-binary16) arithmetic. When running the algorithms on NVIDIA V100 and A100 GPUs, we find that double-fp16 is not as accurate as fp32 (binary32) arithmetic despite satisfying the same worst-case error bound. Using probabilistic error analysis, we explain why this issue is likely to be caused by the rounding mode used by the NVIDIA tensor cores, and propose a parameterized blocked summation algorithm that alleviates the problem and significantly improves the performance-accuracy tradeoff
Probabilistic Rounding Error Analysis of Householder QR Factorization
When an matrix is premultiplied by a product of Householder matrices the worst-case normwise rounding error bound is proportional to , where is the unit roundoff. We prove that this bound can be replaced by one proportional to that holds with high probability if the rounding errors are mean independent and of mean zero. The proof makes use of a matrix concentration inequality. In particular, this result applies to Householder QR factorization. The same square rooting of the error constant applies to two-sided transformations by Householder matrices and hence to standard QR-type algorithms for computing eigenvalues and singular values. It also applies to Givens QR factorization. These results complement recent probabilistic rounding error analysis results for inner-product based algorithms and show that the square rooting effect is widespread in numerical linear algebra. Our numerical experiments, which make use of a new backward error formula for QR factorization, show that the probabilistic bounds give a much better indicator of the actual backward errors and their rate of growth than the worst-case bounds
Conditions for Digit Stability in Iterative Methods Using the Redundant Number Representation
Iterative methods play an important role in science and engineering applications, with uses ranging from linear system solvers in finite element methods to optimization solvers in model predictive control. Recently, a new computational strategy for iterative methods called ARCHITECT was proposed by Li et al. in [1] that uses the redundant number representation to create "stable digits" in the Most-significant Digits (MSDs) of an iterate, allowing the future iterations to assume the stable MSDs have not changed their value, eliminating the need to recompute them. In this work, we present a theoretical analysis of how these "stable digits" arise in iterative methods by showing that a Fejér monotone sequence in the redundant number representation can develop stable MSDs in the elements of the sequence as the sequence grows in length. This property of Fejér monotone sequences allows us to expand the class of iterative methods known to have MSD stability when using the redundant number representation to include any fixed-point iteration of a contractive Lipschitz continuous function. We then show that this allows for the theoretical guarantee of digit stability not just in the Jacobi method that was previously analyzed by Li et al. in [2], but also in other commonly used methods such as Newton's method
Probabilistic Rounding Error Analysis of Householder QR Factorization
The standard worst-case normwise backward error bound for Householder QR factorization of an matrix is proportional to , where is the unit roundoff. We prove that the bound can be replaced by one proportional to that holds with high probability if the rounding errors are mean independent and of mean zero and if the normwise backward errors in applying a sequence of Householder matrices to a vector satisfy bounds proportional to with probability . The proof makes use of a matrix concentration inequality. The same square rooting of the error constant applies to two-sided transformations by Householder matrices and hence to standard QR-type algorithms for computing eigenvalues and singular values. It also applies to Givens QR factorization. These results complement recent probabilistic rounding error analysis results for inner-product based algorithms and show that the square rooting effect is widespread in numerical linear algebra. Our numerical experiments, which make use of a new backward error formula for QR factorization, show that the probabilistic bounds give a much better indicator of the actual backward errors and their rate of growth than the worst-case bounds
Matrix Multiplication in Multiword Arithmetic: Error Analysis and Application to GPU Tensor Cores
In multiword arithmetic, a matrix is represented as the unevaluated sum of two or more lower-precision matrices, and a matrix product is formed by multiplying the constituents in low precision. We investigate the use of multiword arithmetic for improving the performance-accuracy tradeoff of matrix multiplication with mixed precision block fused multiply-add (FMA) hardware, focusing especially on the tensor cores available on NVIDIA GPUs. Building on a general block FMA framework, we develop a comprehensive error analysis of multiword matrix multiplication. After confirming the theoretical error bounds experimentally by simulating low precision in software, we use the cuBLAS and CUTLASS libraries to implement a number of matrix multiplication algorithms using double-fp16 (double-binary16) arithmetic. When running the algorithms on NVIDIA V100 and A100 GPUs, we find that double-fp16 is not as accurate as fp32 (binary32) arithmetic despite satisfying the same worst-case error bound. Using probabilistic error analysis, we explain why this issue is likely to be caused by the rounding mode used by the NVIDIA tensor cores, and propose a parameterized blocked summation algorithm that alleviates the problem and significantly improves the performance-accuracy tradeoff
ML-SGFEM User Guide
This is a User Guide for the MATLAB toolbox ML-SGFEM. The software can be used to investigate computational issues associated with multilevel stochastic Galerkin finite element approximation for elliptic PDEs with parameter-dependent coefficients. The distinctive feature of the software is the hierarchical a posteriori error estimation strategy it uses to drive the adaptive enrichment of the approximation space at each step. This document contains installation instructions, a brief mathematical description of the methodology, a sample session and a description of the directory structure
Computing the square root of a low-rank perturbation of the scaled identity matrix
We consider the problem of computing the square root of a perturbation of the scaled identity matrix, A = α Iₙ + UVᴴ, where U and V are n × k matrices with k ≤ n. This problem arises in various applications, including computer vision and optimization methods for machine learning. We derive a new formula for the pth root of A that involves a weighted sum of powers of the pth root of the k × k matrix α Iₖ + VᴴU. This formula is particularly attractive for the square root, since the sum has just one term when p = 2. We also derive a new class of Newton iterations for computing the square root that exploit the low-rank structure. We test these new methods on random matrices and on positive definite matrices arising in applications. Numerical experiments show that the new approaches can yield much smaller residual than existing alternatives and can be significantly faster when the perturbation UVᴴ has low rank
Arbitrary Precision Algorithms for Computing the Matrix Cosine and its Fréchet Derivative
Existing algorithms for computing the matrix cosine are tightly coupled to a specific precision of floating-point arithmetic for optimal efficiency so they do not conveniently extend to an arbitrary precision environment. We develop an algorithm for computing the matrix cosine that takes the unit roundoff of the working precision as input, and so works in an arbitrary precision. The algorithm employs a Taylor approximation with scaling and recovering and it can be used with a Schur decomposition or in a decomposition-free manner. We also derive a framework for computing the \fd, construct an efficient evaluation scheme for computing the cosine and its Fr\'echet derivative simultaneously in arbitrary precision, and show how this scheme can be extended to compute the matrix sine, cosine, and their \fd s all together. Numerical experiments show that the new algorithms behave in a forward stable way over a wide range of precisions. The transformation-free version of the algorithm for computing the cosine is competitive in accuracy with the state-of-the-art algorithms in double precision and surpasses existing alternatives in both speed and accuracy in working precisions higher than double