Minimizing distance between distribution functions: discrete counterparts to continuous random variables with applications in non-life insurance and stochastic reliability
In this work, we propose a novel family of procedures for deriving a discrete counterpart to a continuous probability distribution. They are based on a class of distances between cumulative distribution functions, including the Cramér, the Cramér-von Mises, and the Anderson-Darling distances as particular cases. The discrete counterpart is defined and derived as the random variable which minimizes its distance to the assigned continuous probability distribution among all the discrete random variables supported on the set of integers (or positive integers). Applications are provided with reference to the exponential and the normal distributions, among others; the discrete counterparts are derived, and their main properties are discussed, also in comparison with the one obtained through an existing discretization technique based on the preservation of the cumulative distribution function at integer values. Parameter estimation for these discrete analogs is discussed, along with an analysis of two real datasets, where they are compared in terms of goodness-of-fit with some popular discrete distributions. Furthermore, in order to highlight the effectiveness and the benefits derived from the proposed discretization procedures, we illustrate two practical applications in actuarial science and in reliability engineering. In the former case, the problem of determining the distribution of the total claims amount for a non-life insurance portfolio is considered, where the claim sizes can be modelled as iid random variables, and the number of claims is random as well. Actuaries use a recursive calculation method based on Panjer's formula, which requires an appropriate discretization of the individual claim distribution, and therefore the proposed procedures can be used.
Since we consider two simple cases where the distribution of the total claims amount can be derived analytically, the efficacy of the discretization procedures in the final approximation can be easily assessed and turns out to be satisfactory, especially when compared to the existing discretization. The latter case considers the determination of the reliability parameter for a complex stress-strength model. Here, the approximation by discretization is compared to Monte Carlo simulation and shown to be competitive: with a comparable if not smaller computational effort, discretization leads to results similar to those of simulation. Such discretizations can also naturally be applied to more complex problems such as scenario generation in stochastic programming. R code for this article is provided as supplementary material
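A minimal Python sketch of the actuarial application may help fix ideas. It is not the article's supplementary R code. It assumes exponential claim sizes, a Poisson number of claims, and the plain Cramér distance (the integral of the squared difference between the two cdfs), for which the best step cdf on [k, k+1) is the average of the continuous cdf over that interval. The function names and parameter values are hypothetical.

```python
import math


def cramer_discrete_exponential(rate, n_max):
    """pmf on 0..n_max of a discretized Exp(rate) claim size.

    Assumption: the step cdf level on [k, k+1) is the average of
    F(x) = 1 - exp(-rate * x) over that interval, which minimizes the
    integral of (F - G)^2 among step functions jumping at the integers.
    """
    def level(k):
        return 1.0 - (math.exp(-rate * k) - math.exp(-rate * (k + 1))) / rate

    pmf = [level(0)]
    for k in range(1, n_max + 1):
        pmf.append(level(k) - level(k - 1))
    return pmf


def panjer_poisson(lam, claim_pmf, s_max):
    """pmf on 0..s_max of S = X_1 + ... + X_N, with N ~ Poisson(lam)
    and iid integer-valued claims X_i with pmf claim_pmf (Panjer recursion)."""
    g = [math.exp(-lam * (1.0 - claim_pmf[0]))]
    for s in range(1, s_max + 1):
        upper = min(s, len(claim_pmf) - 1)
        acc = sum(j * claim_pmf[j] * g[s - j] for j in range(1, upper + 1))
        g.append(lam * acc / s)
    return g


# Illustrative values only: mean claim size 2, on average 3 claims
claim_pmf = cramer_discrete_exponential(rate=0.5, n_max=200)
total_pmf = panjer_poisson(lam=3.0, claim_pmf=claim_pmf, s_max=400)
```

Under this assumption the discretized claim size keeps the mean 1/rate of the exponential, so the mean of the total claims amount is also preserved.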
Discrete half-logistic distributions with applications in reliability and risk analysis
In the statistical literature, several discrete distributions have been developed so far for modeling non-negative integer-valued phenomena, yet there is still room for new counting models that adequately capture the diversity of real data sets. Here, we first discuss a count distribution derived as a discrete analogue of the continuous half-logistic distribution, which is obtained by preserving the expression of its survival function at each non-negative integer support point. The proposed discrete distribution has a mode at zero and allows for over-dispersion; these two features make it suitable for modeling purposes in many fields (e.g., insurance and ecology), when these conditions are satisfied by the data. In order to widen its spectrum of applications, a discrete analogue is also presented of the type I generalized half-logistic distribution (obtained by adding a shape parameter to the simple one-parameter half-logistic), which allows us to model count data whose mode is not necessarily zero. For these new count distributions, the main statistical properties are outlined, and parameter estimation along with related issues is discussed. Their feasibility is proved on two real data sets taken from the literature, which have already been fitted by other well-established count distributions. Finally, a possible application is illustrated in the insurance field, related to the exact/approximate determination of the distribution of the total claims amount through the well-known Panjer’s recursive formula, within the framework of collective risk models
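The survival-function construction can be sketched in a few lines of Python. The sketch assumes the half-logistic survival function S(x) = 2 / (1 + exp(x / scale)) with a single scale parameter (the parameterization used in the paper may differ) and sets P(X = k) = S(k) - S(k + 1). The function names are hypothetical.

```python
import math


def half_logistic_sf(x, scale=1.0):
    """Survival function of the continuous half-logistic, x >= 0 (assumed parameterization)."""
    return 2.0 / (1.0 + math.exp(x / scale))


def discrete_half_logistic_pmf(k, scale=1.0):
    """P(X = k) = S(k) - S(k + 1), so that P(X >= k) = S(k) at every integer k >= 0."""
    return half_logistic_sf(k, scale) - half_logistic_sf(k + 1, scale)


# Illustrative use: probabilities of the first few counts for scale = 2
first_probs = [discrete_half_logistic_pmf(k, scale=2.0) for k in range(5)]
```

Because the half-logistic density is decreasing on the non-negative half-line, these probabilities decrease in k, which is consistent with the mode at zero mentioned in the abstract.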
A Discrete Version of the Half-Logistic Distribution Based on the Mimicking of the Probability Density Function
We introduce a count distribution obtained as a discrete analogue of the continuous half-logistic distribution. It is derived by assigning to each non-negative integer value a probability proportional to the corresponding value of the density function of the parent model. The main features of this new distribution, in particular those related to its shape, moments, and reliability properties, are described. Parameter estimation, which can be carried out using different methods, including maximum likelihood, is discussed, and a numerical comparison of their performance, based on Monte Carlo simulations, is presented. The applicability of the proposed distribution is proved on two real datasets, which have already been fitted by other well-established count distributions. In order to increase the flexibility of this counting model, a generalization is finally suggested, which is obtained by adding a shape parameter to the continuous one-parameter half-logistic and then applying the same discretization technique, based on the mimicking of the density function
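The density-mimicking construction can be illustrated with a short Python sketch. It assumes the half-logistic density f(x) = 2 exp(-x / scale) / (scale (1 + exp(-x / scale))^2) and normalizes numerically by truncating the sum once the terms fall below a tolerance; the function names and the tolerance are hypothetical choices.

```python
import math


def half_logistic_pdf(x, scale=1.0):
    """Density of the continuous half-logistic, x >= 0 (assumed parameterization)."""
    z = math.exp(-x / scale)
    return 2.0 * z / (scale * (1.0 + z) ** 2)


def density_mimicking_pmf(scale=1.0, tol=1e-15):
    """pmf with P(X = k) proportional to f(k), k = 0, 1, 2, ...

    The normalizing sum is truncated at the first k > 0 where f(k) < tol.
    """
    weights = []
    k = 0
    while True:
        w = half_logistic_pdf(k, scale)
        if k > 0 and w < tol:
            break
        weights.append(w)
        k += 1
    total = sum(weights)
    return [w / total for w in weights]
```

By construction the ratio of any two probabilities equals the ratio of the corresponding density values, so the shape of the density is reproduced on the integers.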
An Alternative Discrete Analogue of the Half-Logistic Distribution Based on Minimization of a Distance between Cumulative Distribution Functions
A discrete version of the continuous half-logistic distribution is introduced, which is based on the minimization of the Cramér distance between the corresponding continuous and step-wise cumulative distribution functions. The expression of the probability mass function is derived in an analytic form, and some properties of the distribution - mainly related to moments and reliability concepts - are discussed. As for sample estimation, three different techniques are suggested, whose theoretical and empirical features are examined also through a Monte Carlo simulation study, comprising several parameter and sample size combinations. A comparison is also made between the proposed distribution and a discrete version already proposed in the literature, based on a different rationale, and a main difference is highlighted. A count regression model is suggested where the response variable follows the discrete half-logistic distribution and artificial and real data are used to illustrate its use. Finally, the performance of the proposed distribution over other classical models is discussed based on a real data set
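As a rough illustration of this construction, the Python sketch below assumes the Cramér distance is the integral of the squared difference between the two cdfs and that the half-logistic cdf is F(x) = (1 - exp(-x / scale)) / (1 + exp(-x / scale)). Under these assumptions the optimal step level on [k, k+1) is the average of F over that interval, which follows from the antiderivative H(x) = x + 2 scale log(1 + exp(-x / scale)). This is a derivation under stated assumptions and not necessarily the expression reported in the paper; the function name is hypothetical.

```python
import math


def cramer_half_logistic_pmf(n_max, scale=1.0):
    """pmf on 0..n_max of the integer-supported step cdf closest to the
    half-logistic cdf in the (assumed) Cramér sense."""
    def level(k):
        # Average of F over [k, k+1) equals H(k + 1) - H(k), with
        # H(x) = x + 2 * scale * log(1 + exp(-x / scale))
        return 1.0 + 2.0 * scale * (
            math.log1p(math.exp(-(k + 1) / scale))
            - math.log1p(math.exp(-k / scale))
        )

    pmf = [level(0)]
    for k in range(1, n_max + 1):
        pmf.append(level(k) - level(k - 1))
    return pmf
```

Summing 1 minus the step levels over k gives the integral of the continuous survival function, so under these assumptions the discrete version keeps the mean 2 scale log 2 of the continuous half-logistic.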
Heterogeneous Data Fusion for Accurate Road User Tracking: A Distributed Multi-Sensor Collaborative Approach
This work presents the design and validation of a distributed multi-sensor object tracking algorithm designed to integrate heterogeneous sensory data from multiple static acquisition stations. The primary challenge addressed is the accurate tracking of targets in complex urban environments, where occlusions and the dynamic nature of traffic frequently hinder detection and tracking efforts. This challenge is particularly relevant in multimodal exchange areas, where vehicular traffic merges with heavy pedestrian and bicycle flow. We also address the scenario of delayed detection, which can easily occur when data from multiple stations are combined or when intensive data processing is performed. Our algorithm ensures high coverage and accuracy by maintaining dual Extended Kalman Filter states for each object, thus allowing for the assimilation of delayed detections and preserving optimal filter estimates at all times. The results of the proposed pipeline, tested using a digital twin of the Milano Bovisa Campus, demonstrate its efficacy, achieving high tracking precision across various scenarios and sensor combinations. Moreover, the results highlight the advantages of a distributed multi-sensor acquisition system compared to a single central station
Approximation of continuous random variables for the evaluation of the reliability parameter of complex stress–strength models
In many management science or economic applications, it is common to represent the key uncertain inputs as continuous random variables. However, when analytic techniques fail to provide a closed-form solution to a problem or when one needs to reduce the computational load, it is often necessary to resort to some problem-specific approximation technique or approximate each given continuous probability distribution by a discrete distribution. Many discretization methods have been proposed so far; in this work, we review the most popular techniques, highlighting their strengths and weaknesses, and empirically investigate their performance through a comparative study applied to a well-known engineering problem, formulated as a stress–strength model, with the aim of weighing up their feasibility and accuracy in recovering the value of the reliability parameter, also with reference to the number of discrete points. The results overall reward a recently introduced method as the best performer, which derives the discrete approximation as the numerical solution of a constrained non-linear optimization, preserving the first two moments of the original distribution. This method provides more accurate results than an ad-hoc first-order approximation technique. However, it is also the most computationally demanding, and the computation time can even exceed that required by Monte Carlo approximation if the number of discrete points exceeds a certain threshold
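To make the idea concrete, here is a minimal Python sketch for the simplest stress–strength setting, R = P(stress < strength) with two independent normal variables. It uses a plain equal-probability quantile rule as the discretization, which is an assumption chosen for brevity and not one of the specific methods ranked in the study; all names and parameter values are hypothetical.

```python
from statistics import NormalDist


def quantile_discretization(dist, k):
    """k equally weighted points placed at the quantiles (2i - 1) / (2k), i = 1..k."""
    points = [dist.inv_cdf((2 * i - 1) / (2 * k)) for i in range(1, k + 1)]
    probs = [1.0 / k] * k
    return points, probs


def reliability(stress, strength):
    """R = P(stress < strength) for two independent discrete distributions,
    each given as a (points, probabilities) pair."""
    xs, ps = stress
    ys, qs = strength
    return sum(p * q for x, p in zip(xs, ps) for y, q in zip(ys, qs) if x < y)


# Illustrative values only
stress_dist = NormalDist(mu=0.0, sigma=1.0)
strength_dist = NormalDist(mu=1.0, sigma=1.0)
r_discrete = reliability(quantile_discretization(stress_dist, 100),
                         quantile_discretization(strength_dist, 100))
# Closed form for independent normals, used here only as a check
r_exact = NormalDist().cdf((1.0 - 0.0) / (1.0 ** 2 + 1.0 ** 2) ** 0.5)
```

In this toy case a closed form exists, so the discretization error can be read off directly; in complex models without a closed form, the same double sum replaces the multiple integral.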
A Discrete Analogue of the Half-Logistic Distribution
In lifetime modeling, the observed measurements are usually discrete in nature, because the values are measured to only a finite number of decimal places and cannot really constitute all points in a continuum. For example, the survival time of a cancer patient can be measured as the number of months he/she survives. Then, even if the lifetime (of a patient, a device, etc.) is intrinsically continuous, it is reasonable to consider its observations as coming from a discretized distribution generated from an underlying continuous model. In this work, a discrete random distribution, supported on the non-negative integers, is obtained from the continuous half-logistic distribution by using a well-established discretization technique, which preserves the functional form of the survival function. Its main statistical properties are explored, with a special focus on the shape of the probability mass function and the determination of the first two moments; we discuss and compare, both theoretically and empirically, two different methods for estimating its unique parameter. This discrete random distribution can be used for modeling data exhibiting excess of zeros and over-dispersion, which are features often met in the insurance and ecology fields: an example of application is illustrated. An extension of this discrete distribution is finally suggested, by considering the generalized half-logistic distribution, which introduces a second shape parameter allowing for greater flexibility
Goodman and Kruskal’s gamma coefficient for ordinalized bivariate distributions
We consider a bivariate normal distribution with linear correlation ρ whose random components are discretized according to two assigned sets of thresholds. On the resulting bivariate ordinal random variable, one can compute Goodman and Kruskal’s gamma coefficient, γ, which is a common measure of ordinal association. Given the known analytical monotonic relationship between Pearson’s ρ and Kendall’s rank correlation τ for the bivariate normal distribution, and since in the continuous case, Kendall’s τ coincides with Goodman and Kruskal’s γ, the change of this association measure before and after discretization is worth studying. We consider several experimental settings obtained by varying the two sets of thresholds, or, equivalently, the marginal distributions of the final ordinal variables. This study, confirming previous findings, shows how the gamma coefficient is always larger in absolute value than Kendall’s rank correlation; this discrepancy lessens when the number of categories increases or, given the same number of categories, when using equally probable categories. Based on these results, a proposal is suggested to build a bivariate ordinal variable with assigned margins and Goodman and Kruskal’s γ by ordinalizing a bivariate normal distribution. Illustrative examples employing artificial and real data are provided
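A small Python sketch can illustrate the comparison between γ after discretization and Kendall's τ before it. It covers only the simplest setting, assumed here for brevity: both components of a standard bivariate normal are cut at zero, so the 2x2 cell probabilities follow from the orthant probability 1/4 + arcsin(ρ) / (2π). The function names are hypothetical.

```python
import math


def goodman_kruskal_gamma(table):
    """Gamma = (C - D) / (C + D) from a joint probability (or count) table
    whose rows and columns are in increasing category order."""
    n_rows, n_cols = len(table), len(table[0])
    concordant = discordant = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            for i2 in range(i + 1, n_rows):
                for j2 in range(n_cols):
                    if j2 > j:
                        concordant += table[i][j] * table[i2][j2]
                    elif j2 < j:
                        discordant += table[i][j] * table[i2][j2]
    return (concordant - discordant) / (concordant + discordant)


def median_split_table(rho):
    """2x2 table of a standard bivariate normal with both components cut at 0."""
    same = 0.25 + math.asin(rho) / (2.0 * math.pi)  # both below 0, or both above 0
    diff = 0.5 - same
    return [[same, diff], [diff, same]]


rho = 0.5
gamma = goodman_kruskal_gamma(median_split_table(rho))
kendall_tau = 2.0 / math.pi * math.asin(rho)  # tau of the continuous bivariate normal
```

For ρ = 0.5 this gives γ = 0.6 against τ = 1/3, in line with the finding that γ exceeds τ in absolute value after discretization.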
Discrete approximations of continuous probability distributions obtained by minimizing Cramer-von Mises-type distances
We consider the problem of approximating a continuous random variable, characterized by a cumulative distribution function (cdf) F(x), by means of k points, x(1) < x(2) < ... < x(k), with probabilities p(i), i = 1, ..., k. For a given k, a criterion for determining the x(i) and p(i) of the approximating k-point discrete distribution can be the minimization of some distance to the original distribution. Here we consider the weighted Cramér-von Mises distance between the original cdf F(x) and the step-wise cdf F̂(x) of the approximating discrete distribution, characterized by a nonnegative weighting function w(x). This problem has already been solved analytically when w(x) corresponds to the probability density function of the continuous random variable, w(x) = F'(x), and numerically, through an iterative procedure based on a homotopy continuation approach, when w(x) is a piece-wise constant function. In this paper, we propose and implement a solution to the problem for different choices of the weighting function w(x), highlighting how the results are affected by w(x) itself and by the number of approximating points k, in addition to F(x); although an analytic solution is not usually available, the problem can be solved numerically through an iterative method, which alternately updates the two sub-sets of k unknowns, the x(i)'s (or a transformation thereof) and the p(i)'s, until convergence. The main apparent advantage of these discrete approximations is their universality, since they can be applied to most continuous distributions, whether or not they possess the first moments. In order to shed some light on the proposed approaches, applications to several well-known continuous distributions (among them, the normal and the exponential) and to a practical problem where discretization is a useful tool are also illustrated
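The alternating scheme can be sketched in Python for one simple case. The sketch assumes a constant weight w(x) = 1 and a standard normal target, and it is a plain coordinate-descent illustration, not the authors' implementation. Given the points, the best step level between two consecutive points is the average of F over that interval; given the levels, each point must satisfy F(x(i)) = (level below + level above) / 2. The function names and the iteration count are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def cdf_integral(x):
    """Antiderivative of the standard normal cdf: x * Phi(x) + phi(x)."""
    return x * STD_NORMAL.cdf(x) + STD_NORMAL.pdf(x)


def cramer_k_points(k, n_iter=500):
    """k-point approximation of N(0, 1) by alternating updates (w(x) = 1 assumed)."""
    # Symmetric starting points at equally spaced quantiles
    xs = [STD_NORMAL.inv_cdf((2 * i - 1) / (2 * k)) for i in range(1, k + 1)]
    # levels[i] is the step cdf value just after the i-th point; levels[0] = 0, levels[k] = 1
    levels = [0.0] + [i / k for i in range(1, k)] + [1.0]
    for _ in range(n_iter):
        # Update the levels given the points: average of F between consecutive points
        for i in range(1, k):
            a, b = xs[i - 1], xs[i]
            levels[i] = (cdf_integral(b) - cdf_integral(a)) / (b - a)
        # Update the points given the levels: F(x_i) is halfway between adjacent levels
        xs = [STD_NORMAL.inv_cdf(0.5 * (levels[i - 1] + levels[i]))
              for i in range(1, k + 1)]
    probs = [levels[i] - levels[i - 1] for i in range(1, k + 1)]
    return xs, probs
```

Each of the two updates cannot increase the distance, so the objective decreases monotonically; for other weighting functions the two update rules would change accordingly.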
- …
