The role of Explainable Artificial Intelligence in risk assessment: a study on the economic and epidemiologic impact
The growing use of black-box Artificial Intelligence algorithms in many real-world applications is raising the importance of understanding how the models make their decisions. The research field that aims to "open" the black box and make the predictions more interpretable is referred to as eXplainable Artificial Intelligence (XAI). Another important field of research, strictly related to XAI, is the compression of information, also referred to as dimensionality reduction. Having a synthetic set of a few variables that captures the behaviour and the relationships of many more variables can be an effective tool for XAI as well. Thus, the contribution of the present thesis is the development of new approaches in the field of explainability, working on the two complementary pillars of dimensionality reduction and variable importance. The convergence of the two pillars serves the aim of helping decision makers interpret the results.
This thesis is composed of seven chapters: an introduction and a conclusion plus five self-contained sections reporting the corresponding papers. Chapter 1 proposes a PCA-based method to create a synthetic index that measures the condition of a country’s financial system, providing policy makers and financial institutions with a monitoring and policy tool that is easy to implement and update. In chapter 2, a Dynamic Factor Model is used to produce a synthetic index that captures the time evolution of cross-country dependencies of financial variables. The index is shown to increase the accuracy in predicting the ease of access to financial funding. In chapter 3, a set of variables covering health, environmental safety infrastructures, demographic, economic and institutional effectiveness is used to test two methodologies for building an Epidemiological Susceptibility Risk index. The predictive power of both indexes is tested on a forecasting task involving macroeconomic variables. In chapter 4, the credit riskiness of small and medium enterprises (henceforth SMEs) is assessed by testing the performance gain of a machine learning historical random forest model over an ordered probit model. The relevance of each variable in predicting SME credit risk is assessed by using Shapley values. In chapter 5, a dataset of Italian unlisted firms provides evidence of the importance of using market information when assessing the credit risk of SMEs. A non-linear dimensionality reduction technique is applied to assign market volatility from listed peers and to evaluate Merton's probability of default (PD). Results show an increase in accuracy in predicting the default of unlisted firms when using the evaluated PD. Moreover, the way PD affects the defaults is explored by assessing its contribution to the predicted outcome by means of Shapley values.
Can unlisted firms benefit from market information? A data-driven approach
We employ a sample of 10,136 Italian micro-, small-, and mid-sized enterprises (MSMEs) that borrow from 113 cooperative banks to examine whether market pricing of public firms adds information to accounting measures in predicting default of private firms. Specifically, we first match the asset prices of listed firms following a data-driven clustering by means of a neural network autoencoder, so as to evaluate the firm-wise probability of default (PD) of MSMEs. Then, we adopt three statistical techniques, namely linear models, multivariate adaptive regression splines, and random forest, to assess the performance of the models and to explain the relevance of each predictor. Our results provide novel evidence that market information represents a crucial indicator in predicting corporate default of unlisted firms. Indeed, we show a significant improvement in model performance, on both class-specific (F1-score for the defaulted class) and overall metrics (AUC), when using market information in credit risk assessment in addition to accounting information. Moreover, by taking advantage of global and local variable importance techniques, we show that the increase in performance is effectively attributable to market information, highlighting its relevant effect in predicting corporate default.
Bitetto, A.; Filomeni, S.; Modina, M. (2022). Can unlisted firms benefit from market information? A data-driven approach. In 4th International Conference on Advanced Research Methods and Analytics (CARMA 2022). Editorial Universitat Politècnica de València. 65-72. https://doi.org/10.4995/CARMA2022.2022.15045
Can we trust machine learning to predict the credit risk of small businesses?
With the emergence of Fintech lending, small firms can benefit from new channels of financing. In this setting, the assessment of creditworthiness and the decision to extend credit are often based on standardized and advanced machine-learning techniques that employ limited information. This paper investigates the ability of machine learning to correctly predict credit risk ratings for small firms. By employing a unique proprietary dataset on invoice lending activities, this paper shows that machine learning techniques outperform traditional techniques, such as probit, when the set of information available to lenders is limited. This paper contributes to the understanding of the reliability of advanced credit scoring techniques in the lending process to small businesses, making it an especially interesting case for the Fintech environment.
Machine learning and credit risk: Empirical evidence from small- and mid-sized businesses
In this paper, we compare two approaches to estimating the credit risk of small- and mid-sized businesses (SMBs), namely a classic parametric approach, fitting an ordered probit model, and a non-parametric approach, calibrating a machine learning historical random forest (HRF) model. The models are applied to a unique and proprietary dataset comprising granular firm-level quarterly data collected from a European investment bank and an international insurance company on a sample of 464 Italian SMBs over the period 2015–2017. Results show that the HRF approach outperforms the traditional ordered probit model, highlighting how advanced estimation methodologies that use machine learning techniques can be successfully implemented to predict SMB credit risk, i.e. when facing high asymmetries of information. Moreover, by using Shapley values, we are able to assess the relevance of each variable in predicting SMB credit risk.
A nonlinear principal component analysis to study archeometric data
Statistical techniques, when applied to data obtained by chemical investigations on ancient artworks, are usually expected to recognize groups of objects to classify the archeological finds, to attribute the provenance of items compared with earlier investigated ones, or to determine whether an archeological attribution is possible or not. The statistical technique most frequently used in archeometry is principal component analysis (PCA), because of its simplicity in theory and implementation. However, the application of PCA to archeometric data has shown severe limitations because of its linear nature. Indeed, PCA is inadequate to classify data whose behavior describes a curve or a curved subspace of the original data space. As a consequence, some information is lost when the multi-dimensional data space is compressed into a lower-dimensional subspace spanned by the principal components. The aim of this work is then to test a novel statistical technique for archeometry. We propose a nonlinear PCA method to extract maximum chemical information by plotting data on the smallest number of principal components and to answer archeological questions. The higher accuracy and effectiveness of the nonlinear PCA approach with respect to standard PCA for the analysis of archeometric data are shown through the study of Apulian red-figured pottery (fifth–fourth century BC) coming from some of the most relevant archeological sites of ancient Apulia (Monte Sannace (Gioia del Colle), Egnatia (Fasano), Canosa, Altamura, Conversano, and Arpi (Foggia)). Copyright © 2016 John Wiley & Sons, Ltd
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
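The two counting schemes compared in this abstract can be illustrated with a minimal Python sketch. The function name `cocitation_counts`, the `max_authors` parameter, and the toy data are hypothetical and not taken from the study. As a simplifying assumption, co-authors of the same cited work are also treated as co-cited with each other, which the study may handle differently.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count how often two authors are cited together by the same paper.

    citing_papers: list of reference lists; each reference is a list of
    author names in byline order (hypothetical input format).
    max_authors: how many leading authors of each cited work are counted
    (1 = first-author counting, 5 = first five authors).
    """
    counts = Counter()
    for references in citing_papers:
        cited_authors = set()
        for authors in references:
            cited_authors.update(authors[:max_authors])
        # Each unordered pair is counted once per citing paper.
        for pair in combinations(sorted(cited_authors), 2):
            counts[pair] += 1
    return counts


# Toy data: two citing papers, each citing two works.
papers = [
    [["Smith", "Lee"], ["Garcia", "Chen"]],
    [["Smith", "Patel"], ["Chen", "Garcia"]],
]

first_author = cocitation_counts(papers, max_authors=1)
first_five = cocitation_counts(papers, max_authors=5)
```

In this toy data, Chen and Garcia are never co-cited under first-author counting, because each is a second author in one of the two papers, while counting the first five authors links them in both.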
Variations on the Author
“Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourse; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.
Appropriate Similarity Measures for Author Cocitation Analysis
We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.
Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
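As a rough illustration of the two named measures, the Python sketch below computes the cosine and the Pearson correlation between two cocitation profiles. The profile values are made up. The abstract does not state why it finds the Pearson correlation unsatisfactory, so the zero-padding comparison shows only one well-known difference between the measures and is not a reconstruction of the authors' argument.

```python
import math


def cosine(x, y):
    """Cosine of the angle between two equal-length profiles."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson(x, y):
    """Pearson correlation: the cosine of the mean-centred profiles."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])


# Made-up cocitation profiles of two authors over three other authors.
x = [4, 0, 2]
y = [0, 3, 2]

# The same profiles after adding ten authors cocited with neither.
x_padded = x + [0] * 10
y_padded = y + [0] * 10

cosine_before, cosine_after = cosine(x, y), cosine(x_padded, y_padded)
pearson_before, pearson_after = pearson(x, y), pearson(x_padded, y_padded)
```

Adding authors who are cocited with neither profile leaves the cosine unchanged, while the Pearson correlation moves from negative to positive, because the extra zeros shift both profile means.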
