
    Functional clustering of NPLs recovery curves

    The recovery performance of a portfolio of Non-Performing Loans can be measured in terms of recovery rate and liquidation time jointly through a “recovery curve” representative of recovery rates over time. When portfolio heterogeneity is very high, it is informative to estimate more than just one curve by dividing the portfolio into several homogeneous subsets, i.e. clusters, and calculating a recovery curve for each of them. The aim of this work is to estimate the optimal portfolio partition and the smoothed recovery curves of each cluster by means of non-parametric statistical learning techniques

    Principal Stratification in Sample Selection Problems with Non Normal Error Terms

    The aim of the paper is to relax distributional assumptions on the error terms, often imposed in parametric sample selection models to estimate causal effects, when plausible exclusion restrictions are not available. Within the principal stratification framework, we approximate the true distribution of the error terms with a mixture of Gaussians. We propose an EM-type algorithm for ML estimation. In a simulation study we show that our estimator has lower MSE than the ML and two-step Heckman estimators for every non-normal error distribution considered. Finally, we provide an application to the Job Corps training program
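
    As a rough illustration of the EM idea mentioned in this abstract, the sketch below fits a univariate mixture of Gaussians by EM. It is not the authors' estimator: the principal stratification and sample selection structure are omitted, and the function name `em_gaussian_mixture`, the quantile initialisation, and the stopping rule are assumptions made for this example.

```python
import numpy as np

def em_gaussian_mixture(x, k=2, n_iter=200, tol=1e-8):
    """Fit a k-component univariate Gaussian mixture by EM (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # Assumed initialisation: means at evenly spaced quantiles, common variance.
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)
    sigma2 = np.full(k, x.var())
    pi = np.full(k, 1.0 / k)
    prev = -np.inf
    for _ in range(n_iter):
        # E-step: log of (weight * density) for each point and component.
        log_p = (np.log(pi) - 0.5 * np.log(2 * np.pi * sigma2)
                 - 0.5 * (x[:, None] - mu) ** 2 / sigma2)
        # Log-sum-exp for numerical stability.
        m = log_p.max(axis=1, keepdims=True)
        log_norm = m + np.log(np.exp(log_p - m).sum(axis=1, keepdims=True))
        resp = np.exp(log_p - log_norm)  # posterior membership probabilities
        loglik = log_norm.sum()
        # M-step: weighted updates of weights, means and variances.
        nk = resp.sum(axis=0)
        pi = nk / n
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma2 = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        sigma2 = np.maximum(sigma2, 1e-10)  # guard against collapsing components
        if loglik - prev < tol:
            break
        prev = loglik
    return pi, mu, sigma2
```

    On clearly bimodal data, the fitted means should land near the two modes; with heavier overlap, the result depends more strongly on the initialisation.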

    Clustering Ordinal Data Via Parsimonious Models

    This review presents some parsimonious models to cluster two-way and three-way ordinal data. They are formulated as a reparameterization of a finite mixture of Gaussians that is partially observed through a discretization of its variates. Model parameters are estimated using a composite likelihood approach in order to reduce the numerical complexity. The parsimony is obtained by reducing the dimensionality of the variable space within and/or between the components

    Mixture of factor analyzers for mixed-type data via a composite likelihood approach

    A parsimonious modelling approach for clustering mixed-type (ordinal and continuous) data is presented. It is assumed that ordinal and continuous data follow a finite mixture of Gaussians that is only partially observed. We define a general class of parsimonious models for mixed-type data by imposing a factor decomposition on component-specific covariance matrices. Parameter estimation is carried out using an EM-type algorithm based on composite likelihood

    Candecomp/Parafac with ridge regularization

    The Candecomp/Parafac (CP) model decomposes a three-way array through components. In the practical use of CP, degeneracy may arise, i.e. CP parameter matrices with diverging, highly collinear and uninterpretable components. A frequently applied remedy to degeneracy is to fit a CP model with orthogonality constraints on one of the component matrices. However, this does not guarantee that the components so extracted closely resemble the true ones, because the occurrence of degeneracy does not imply the orthogonality of the true components. For this reason, a new CP method involving a particular ridge regularization term (hence called CP-Ridge) is introduced. It solves the degeneracy problem by admitting an overall maximum level of collinearity among the components. A simulation experiment is performed to illustrate the properties of CP-Ridge and to compare its performance with that of some competitors available in the literature. (C) 2013 Elsevier B.V. All rights reserved

    Generalized Reduced K–Means

    In the context of sports analytics, the evaluation of players’ performance has traditionally been a complex endeavor, given the multidimensional nature of the data involved. This paper introduces a novel approach for multivariate analyses of complex data sets, with a focus on professional basketball data. The proposed model simultaneously performs unsupervised classification of units into K clusters and their optimal low-dimensional reconstruction. This is done considering variables’ dimensionality representation into Q components for each group of clusters that can be identified by the same latent dimensions. Consequently, we refer to the new model as Generalized Reduced K-Means (GRKM), which includes Reduced K-Means (RKM) as a special case when a unique lower rank reconstruction of the variables is needed. Before the application to real data, the effectiveness of the proposal is shown by means of an extended simulation study. By applying this method to a comprehensive set of National Basketball Association (NBA) statistics, we demonstrate its efficacy in distinguishing player profiles across offensive and defensive spectrums, simultaneously grouping them into coherent clusters
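
    For orientation only, the sketch below implements the RKM special case named in this abstract (a single rank-Q reconstruction shared by all clusters), using the usual alternating least squares scheme. It is not the GRKM algorithm of the paper; the function name `reduced_kmeans`, the random start, and the handling of empty clusters are assumptions made for this example.

```python
import numpy as np

def reduced_kmeans(X, k, q, n_iter=100, seed=0):
    """Reduced K-Means by alternating least squares (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)  # column-centre the data
    n = X.shape[0]
    labels = rng.integers(k, size=n)  # assumed random starting partition
    for _ in range(n_iter):
        # Cluster means in the full space; an empty cluster is re-seeded
        # with a randomly chosen unit (an assumption of this sketch).
        means = np.vstack([
            X[labels == c].mean(axis=0) if np.any(labels == c) else X[rng.integers(n)]
            for c in range(k)
        ])
        sizes = np.bincount(labels, minlength=k)
        # Loadings A: top-q eigenvectors of the between-cluster scatter matrix.
        between = (means * sizes[:, None]).T @ means
        _, eigvec = np.linalg.eigh(between)  # eigenvalues in ascending order
        A = eigvec[:, ::-1][:, :q]
        # Reassign each unit to the nearest centroid in the reduced space.
        scores, centroids = X @ A, means @ A
        dist = ((scores[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, A
```

    Like K-means, this scheme can stop at a local optimum, so in practice several random starts are usually compared and the one with the lowest loss retained.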

    Estimating recovery rate and time to liquidate for NPLs

    The objective of the present paper is to propose a new method to measure the recovery performance of a portfolio of non-performing loans (NPLs) in terms of recovery rate and time to liquidate. The fundamental idea is to draw a curve representing the recovery rates over time, here assumed discretized, for example, in years. In this way, the user can simultaneously obtain information about the recovery rate and the time to liquidate of the portfolio. In particular, it is discussed how to estimate such a curve in the presence of right-censored data, i.e. when the NPLs composing the portfolio have been observed in different time periods. Uncertainty about the estimates is depicted through confidence bands obtained by using the non-parametric bootstrap. The effectiveness of the proposals is shown by applying the method to a real financial data set about some portfolios of Italian unsecured NPLs managed by a specialized operator
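
    One simple way to build such a curve under right censoring is sketched below. It is an assumed estimator chosen for illustration, not necessarily the one proposed in the paper: in each period, cash recovered is divided by the exposure of only those loans that have been observed through that period, and the period rates are then cumulated. The function name `recovery_curve` and its argument layout are hypothetical.

```python
import numpy as np

def recovery_curve(exposure, recoveries, observed_periods):
    """Cumulative recovery-rate curve under right censoring (illustrative sketch).

    exposure:         (n,) amount owed per loan at the start
    recoveries:       (n, T) cash recovered per loan in each period
    observed_periods: (n,) number of periods each loan has been observed
    """
    exposure = np.asarray(exposure, dtype=float)
    recoveries = np.asarray(recoveries, dtype=float)
    observed_periods = np.asarray(observed_periods)
    T = recoveries.shape[1]
    # at_risk[i, t] is True when loan i has been observed through period t.
    at_risk = np.arange(T)[None, :] < observed_periods[:, None]
    recovered = (recoveries * at_risk).sum(axis=0)
    exposed = (exposure[:, None] * at_risk).sum(axis=0)
    # Period rate is set to 0 where no loan is observed any more.
    incremental = np.divide(recovered, exposed, out=np.zeros(T), where=exposed > 0)
    return np.cumsum(incremental)

# Three loans observed for 3, 2 and 1 periods respectively.
curve = recovery_curve(
    exposure=[100, 200, 100],
    recoveries=[[10, 5, 5], [20, 10, 0], [10, 0, 0]],
    observed_periods=[3, 2, 1],
)
```

    Confidence bands of the kind the abstract mentions could be obtained by resampling loans with replacement and recomputing the curve on each resample.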

    Three-mode component analysis with crisp or fuzzy partition of units

    A new methodology is proposed for the simultaneous reduction of units, variables, and occasions of a three-mode data set. Units are partitioned into a reduced number of classes, while, simultaneously, components for variables and occasions accounting for the largest common information for the classification are identified. The model is a constrained three-mode factor analysis and it can be seen as a generalization of the REDKM model proposed by De Soete and Carroll for two-mode data. The least squares fitting problem is mathematically formalized as a constrained problem in continuous and discrete variables. An iterative alternating least squares algorithm is proposed to give an efficient solution to this minimization problem in the crisp and fuzzy classification context. The performance of the proposed methodology is investigated by a simulation study comparing our model with other competing methodologies. Different procedures for starting the proposed algorithm have also been tested. A discussion of some interesting differences in the results follows. Finally, an application to real data illustrates the ability of the proposed model to provide substantive insights into the data complexities

    Two-mode multi-partitioning

    New methodologies for two-mode (objects and variables) multi-partitioning of two-way data are presented. In particular, by reanalyzing the double k-means, which identifies a unique partition for each mode of the data, a relevant extension is discussed that allows several partitions of one mode to be specified, conditional on the partition of the other one. The performance of such generalized double k-means has been tested by both a simulation study and an application to gene microarray data. (C) 2007 Elsevier B.V. All rights reserved

    Mixture models for simultaneous classification and reduction of three-way data

    Finite mixtures of Gaussians are often used to classify two- (units and variables) or three- (units, variables and occasions) way data. However, two issues arise: model complexity and capturing the true cluster structure. Indeed, a large number of variables and/or occasions implies a large number of model parameters, while the existence of noise variables (and/or occasions) could mask the true cluster structure. The approach adopted in the present paper is to reduce the number of model parameters by identifying a sub-space containing the information needed to classify the observations. This should also help in identifying noise variables and/or occasions. The maximum likelihood model estimation is carried out through an EM-like algorithm. The effectiveness of the proposal is assessed through a simulation study and an application to real data