MIMS EPrints
Not a member yet
2151 research outputs found
Sort by
Optimal switching between two spectrally negative Lévy processes to minimise ruin probability
We consider an optimal underwriting problem where given two insurance portfolios that generate cash flows according to two spectrally negative Lévy processes of bounded variation X and Y, one has to underwrite adaptively a convex combination of the two such that the probability of ruin occurring in the combined portfolio is minimised. This optimal underwriting problem boils down to an optimal switching problem where one has to decide, based on the available capital at a given time, whether to go for mode or for mode at that time. The 1-switch-level strategy with parameter b in [0,oo] is the strategy where one switches from one mode to the other only at times when the capital goes above or below the level b. We find a set of sufficient conditions on the two Lévy measures such that an optimal strategy is formed by a 1-switch-level strategy, which covers in particular the case where the hazard rates of the two Lévy measures are decreasing and ordered. An interesting tool in the analysis is a new monotonicity property regarding quasi-convexity for renewal equations
Robust rational approximations of nonlinear eigenvalue problems
We develop algorithms that construct robust (i.e., reliable for a given tolerance, and scaling independent) rational approximants of matrix-valued functions on a given subset of the complex plane. We consider matrix-valued functions provided in both split form (i.e., as a sum of scalar functions times constant coefficient matrices) and in black box form. We develop a new error analysis and use it for the construction of stopping criteria, one for each form.
Our criterion for split forms adds weights chosen relative to the importance of each scalar function, leading to the weighted AAA algorithm, a variant of the set-valued AAA algorithm that can guarantee to return a rational
approximant with a user-chosen accuracy. We propose two-phase approaches for black box matrix-valued functions that
construct a surrogate AAA approximation in phase one and
refine it in phase two, leading to the surrogate AAA algorithm with exact search and the surrogate AAA algorithm with cyclic Leja--Bagby refinement. The stopping criterion for black box matrix-valued functions is updated at each
step of phase two to include information from the previous step. When convergence occurs, our two-phase approaches return rational approximants with a user-chosen accuracy.
We select problems from the NLEVP collection that represent a variety of matrix-valued functions of different sizes and properties and use them to benchmark our algorithms
Algorithms for stochastically rounded elementary arithmetic operations in IEEE 754 floating-point arithmetic
We present algorithms for performing the four elementary arithmetic operations (+, -, ×, and ÷) in floating point arithmetic with stochastic rounding, and discuss a few examples where using stochastic rounding may be beneficial. The algorithms require that the hardware be compliant with the IEEE 754 floating-point standard and that a floating-point pseudorandom number generator be available, either in software or in hardware. The goal of these techniques is to emulate operations with stochastic rounding when the underlying hardware does not support this rounding mode, as is the case for most existing CPUs and GPUs. When stochastically rounding double precision operations, the algorithms we propose are on average over 5 times faster than an implementation that uses extended precision. We test our algorithms on various problems where stochastic rounding is expected to bring advantages: harmonic sum, summation of small random numbers, and ordinary differential equation solvers
Matrices with Tunable Infinity-Norm Condition Number and No Need for Pivoting in LU Factorization
We propose a two-parameter family of nonsymmetric dense n ⨉ n matrices A(α, β) for which LU factorization without pivoting is numerically stable, and we show how to choose α and β to achieve any value of the ∞-norm condition number. The matrix A(α, β) can be formed from a simple formula in O(n²) flops. The matrix is suitable for use in the HPL-AI Mixed-Precision Benchmark, which requires an extreme scale test matrix (dimension n > 10⁷) that has a controlled condition number and can be safely used in LU factorization without pivoting. It is also of interest as a general-purpose test matrix
CPFloat: A C library for emulating low-precision arithmetic
Low-precision floating-point arithmetic can be simulated via software by executing each arithmetic operation in hardware and rounding the result to the desired number of significant bits. For IEEE-compliant formats, rounding requires only standard mathematical library functions, but handling subnormals, underflow, and overflow demands special attention, and numerical errors can cause mathematically correct formulae to behave incorrectly in finite arithmetic. Moreover, the ensuing algorithms are not necessarily efficient, as the library functions these techniques build upon are typically designed to handle a broad range of cases and may not be optimized for the specific needs of rounding algorithms. CPFloat is a C library that offers efficient routines for rounding arrays of binary32 and binary64 numbers to lower precision. The software exploits the bit level representation of the underlying formats and performs only low-level bit manipulation and integer arithmetic, without relying on costly library calls. In numerical experiments the new techniques bring a considerable speedup (typically one order of magnitude or more) over existing alternatives in C, C++, and MATLAB. To the best of our knowledge, CPFloat is currently the most efficient and complete library for experimenting with custom low-precision floating-point arithmetic available in any language
A Multiprecision Derivative-Free Schur--Parlett Algorithm for Computing Matrix Functions
The Schur--Parlett algorithm, implemented in MATLAB as \texttt{funm}, evaluates an analytic function at an matrix argument by using the Schur decomposition and a block recurrence of Parlett. The algorithm requires the ability to compute and its derivatives, and it requires that has a Taylor series expansion with a suitably large radius of convergence. \new{We develop a version of the Schur--Parlett algorithm that requires only function values and not derivatives. The algorithm requires access to arithmetic of a matrix-dependent precision at least double the working precision, which is used to evaluate on the diagonal blocks of order greater than (if there are any) of the reordered and blocked Schur form.} The key idea is to compute by diagonalization the function of a small random diagonal perturbation of each diagonal block, where the perturbation ensures that diagonalization will succeed. Our algorithm is inspired by Davies's randomized approximate diagonalization method, but we explain why that is not a reliable numerical method for computing matrix functions. This multiprecision Schur--Parlett algorithm is applicable to arbitrary analytic functions and, like the original Schur--Parlett algorithm, it generally behaves in a numerically stable fashion. \new{The algorithm is especially useful when the derivatives of are not readily available or accurately computable.} We apply our algorithm to the matrix Mittag--Leffler function and show that it yields results of accuracy similar to, and in some cases much greater than, the state of the art algorithm for this function
A Hamiltonian Decomposition for Fast Interior-Point Solvers in Model Predictive Control
Optimal decision-making tools are essential in industry to achieve high performance. One of these tools is Model Predictive Control (MPC), which is an advanced control technique that generates an action that affects the controlled variables, while satisfying the process’ operational constraints. At the core of the MPC algorithm lies an optimization problem that is solved by a numerical method at every sample time. New demand for more self-contained modular processes has seen MPC embedded in small-scale platforms. This has prompted a need for custom-made numerical methods that help to efficiently run the computationally demanding optimization algorithms. In this paper, we propose two approaches that factorize the Newton system of the interior-point method (IPM) based on the two-point boundary-value (TPBV) problem structure, rarely explored in MPC. Exploiting the Hamiltonian form of the augmented system, we derive an incomplete LU factorization. A direct method is available to compute the solution of the system using a forward substitution of a series of matrices. An iterative method is also available. We propose a preconditioned Krylov method that converges within a small number of iterations only depending on the number of states
Numerical Behavior of NVIDIA Tensor Cores
We explore the floating-point arithmetic implemented in the NVIDIA tensor cores, which are hardware accelerators for mixed-precision matrix multiplication available on the Volta, Turing, and Ampere microarchitectures. Using Volta V100, Turing T4 and Ampere A100 graphics cards, we determine what precision is used for the intermediate results, whether subnormal numbers are supported, what rounding mode is used, in which order the operations underlying the matrix multiplication are performed, and whether partial sums are normalized. These aspects are not documented by NVIDIA, and we gain insight by running carefully designed numerical experiments on these hardware units. Knowing the answers to these questions is important if one wishes to: 1) accurately simulate NVIDIA tensor cores on conventional hardware; 2) understand the differences between results produced by code that utilizes tensor cores and code that uses only IEEE 754-compliant arithmetic operations; and 3) build custom hardware whose behavior matches that of NVIDIA tensor cores. As part of this work we provide a test suite that can be easily adapted to test newer versions of the NVIDIA tensor cores as well as similar accelerators from other vendors, as they become available. Moreover, we identify a non-monotonicity issue affecting floating point multi-operand adders if the intermediate results are not normalized after each step
Numerical Behavior of the NVIDIA Tensor Cores
We explore the floating-point arithmetic used by the NVIDIA Volta tensor cores, which are hardware accelerators for mixed-precision matrix multiplication. We investigate what precision is used for intermediate results, whether subnormal numbers are supported, what rounding mode is used, in which order the operations in the dot products arising in the matrix multiplication are performed, and whether partial sums are normalized. These aspects are not documented by NVIDIA, and we gain insight by running carefully designed numerical experiments on these hardware accelerators. Knowing the answers to these questions is important if one wishes to: 1) build hardware that computes a matrix-matrix product matching the results of NVIDIA tensor cores; 2) achieve bit-reproducible results when designing on conventional hardware with IEEE 754 floating-point arithmetic code meant to run on NVIDIA tensor cores; and 3) understand the differences between results produced by code that utilizes tensor cores and code that uses only IEEE 754-compliant arithmetic operations. As an additional result, we point out a non-monotonicity issue that arises in floating-point multi-operand addition without the normalization of the intermediate results
The dual inverse scaling and squaring algorithm for the matrix logarithm
The inverse scaling and squaring algorithm computes the logarithm of a square matrix A by evaluating a rational approximant to the logarithm at the matrix B := A^(1/2^s) for a suitable choice of s. We introduce a dual approach and approximate the logarithm of B by solving the rational equation r(X) = B, where r is a diagonal Padé approximant to the matrix exponential at 0. This equation is solved by a substitution algorithm in the style of [M. Fasi and B. Iannazzo, MIMS EPrint 2019.8, 2019] which is tailored to the special structure of the approximants to the exponential. In terms of floating-point operations, the resulting method is cheaper than the state-of-the-art inverse scaling and squaring algorithm