Deep learning for text-based credit scoring for micro, small and medium enterprises
Personal credit risk models are built upon a wealth of structured sociodemographic and behavioural data. This data tends to be high in volume and low in cost and, as a result, personal lending is a highly automated process. This, however, is not true for micro, small and medium business credit processing, which is cumbersome and expensive for lenders. Often, a lack of sufficient structured data and the bespoke nature of a credit request require expert judgement on the creditworthiness of an organisation. This judgement is made in the first instance by a financial analyst who generates a written report, which is then usually passed on to a further assessor who makes the final decision based on the written report combined with other sources of available data. The purpose of this research is to eliminate the requirement for a second stage of assessment - where both the traditional variables and text-based evaluation by the credit agent are considered - by developing Deep Learning models that can capture the rich and dynamic information available from the written financial analyst reports, outputting a probability score that can then be used alongside other structured sources of information. The results suggest that the implementation of a semi-automated process would allow for both a more accurate and more cost-effective approach to assessing credit risk for micro, small and medium enterprises.
Deep residential representations: Using unsupervised learning to unlock elevation data for geo-demographic prediction
LiDAR (short for "Light Detection And Ranging" or "Laser Imaging, Detection,
And Ranging") technology can be used to provide detailed three-dimensional
elevation maps of urban and rural landscapes. To date, airborne LiDAR imaging
has been predominantly confined to the environmental and archaeological
domains. However, the geographically granular and open-source nature of this
data also lends itself to an array of societal, organizational and business
applications where geo-demographic type data is utilised. Arguably, the
complexity involved in processing this multi-dimensional data has thus far
restricted its broader adoption. In this paper, we propose a series of
convenient task-agnostic tile elevation embeddings to address this challenge,
using recent advances from unsupervised Deep Learning. We test the potential of
our embeddings by predicting seven English indices of deprivation (2019) for
small geographies in the Greater London area. These indices cover a range of
socio-economic outcomes and serve as a proxy for a wide variety of downstream
tasks to which the embeddings can be applied. We consider the suitability of
this data not just on its own but also as an auxiliary source of data in
combination with demographic features, thus providing a realistic use case for
the embeddings. Having trialled various model/embedding configurations, we find
that our best performing embeddings lead to Root-Mean-Squared-Error (RMSE)
improvements of up to 21% over using standard demographic features alone. We
also demonstrate how our embedding pipeline, using Deep Learning combined with
K-means clustering, produces coherent tile segments which allow the latent
embedding features to be interpreted. Comment: 29 pages, 13 figures. V2 - Publishe
Advanced turbidity prediction for operational water supply planning
Turbidity is an optical quality of water caused by suspended solids that give the appearance of ‘cloudiness’. While turbidity itself does not directly present a hazard to human health, it can be an indication of poor water quality and can mask the presence of parasites such as Cryptosporidium. It is, therefore, a recommendation of the World Health Organisation (WHO) that turbidity should not exceed a level of 1 Nephelometric Turbidity Unit (NTU) before chlorination. For a drinking water supplier, turbidity peaks can be highly disruptive, requiring the temporary shutdown of a water treatment works. Such events must be carefully managed to ensure continued supply; to recover the supply deficit, water stores must be depleted or alternative works utilised. Machine learning techniques have been shown to be effective for the modelling of complex environmental systems and are often used to help shape environmental policy. We contribute to the literature by adopting such techniques for operational purposes, developing a decision support tool that predicts >1 NTU turbidity events up to seven days in advance, allowing water supply managers to make proactive interventions. We apply a Generalised Linear Model (GLM) and a Random Forest (RF) model for the prediction of >1 NTU events. AUROC scores of over 0.80 at five of six sites suggest that machine learning techniques are suitable for predicting turbidity peaking events. Furthermore, we find that the RF model can provide a modest performance boost due to its stronger capacity to capture nonlinear interactions in the data.
Novel applications of advanced predictive analytics and artificial intelligence to improve SME competitiveness and access to funding
Small and Medium-sized Enterprises (SMEs) are a collective group of organisations that make a significant societal and economic impact globally. However, these organisations face numerous challenges compared to their larger counterparts. Over the three papers that form this thesis, methodologies are developed from the Artificial Intelligence (AI) and Machine Learning (ML) domains to enhance SME competitiveness and access to funding. The first paper addresses the access to funding challenge driven by information asymmetries and prohibitive costs faced by SME credit lenders. Specifically, a deep language model is applied to loan officer free-text assessments to predict default risk. The study shows that the text alone effectively predicts default and, when combined with traditional credit scoring data, is suitable for partly automating the SME lending process while offering insights to address information asymmetries. The second paper then moves on to the competitiveness challenge. Many SMEs lack the resources and technical expertise to leverage advanced AI models and unstructured data. In this paper, unsupervised computer vision and deep learning methodologies are developed to create user-friendly feature representations for remote sensing data. Specifically, the derived LiDAR imagery representations are tested in a predictive context using socio-economic outcomes for small geographies in Greater London. The results demonstrate that these accessible representations outperform baselines using standard features alone, thus making them suitable for organisations like SMEs. Finally, the third paper considers how SMEs can benefit from enhanced automation and decision-making by adopting AI systems. Focusing on developing a decision support tool to improve debt recovery using customer behavioural data, this study combines deep sequence learning and uplift modelling to identify customers more likely to respond to targeted interventions. When applied to a dataset supplied by a small utility company, the results show significant performance improvements compared to baseline models. As a result, such approaches can assist SMEs in reducing their debt book value and streamlining recovery resource allocation.
The value of text for small business default prediction: a deep learning approach
Compared to consumer lending, Micro, Small and Medium Enterprise (mSME) credit risk modelling is particularly challenging, as, often, the same sources of information are not available. Therefore, it is standard policy for a loan officer to provide a textual loan assessment to mitigate limited data availability. In turn, this statement is analysed by a credit expert alongside any available standard credit data. In our paper, we exploit recent advances from the field of Deep Learning and Natural Language Processing (NLP), including the BERT (Bidirectional Encoder Representations from Transformers) model, to extract information from 60000 textual assessments provided by a lender. We consider the performance in terms of the AUC (Area Under the receiver operating characteristic Curve) and Brier Score metrics and find that the text alone is surprisingly effective for predicting default. However, when combined with traditional data, it yields no additional predictive capability, with performance dependent on the text’s length. Our proposed deep learning model does, however, appear to be robust to the quality of the text and therefore suitable for partly automating the mSME lending process. We also demonstrate how the content of loan assessments influences performance, leading us to a series of recommendations on a new strategy for collecting future mSME loan assessments
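The two metrics named in the abstract above, AUC and the Brier Score, can be illustrated with a minimal sketch. The function names and the toy labels and probabilities below are hypothetical and are not taken from the paper; the code only shows how each metric is defined for a binary default outcome.

```python
from itertools import product


def brier_score(y_true, p_pred):
    """Mean squared gap between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def auc(y_true, p_pred):
    """Share of (defaulter, non-defaulter) pairs ranked correctly; ties count half."""
    pos = [p for y, p in zip(y_true, p_pred) if y == 1]
    neg = [p for y, p in zip(y_true, p_pred) if y == 0]
    wins = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a, b in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = default, 0 = repaid.
y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]

print(auc(y, p))          # ranking quality: higher is better
print(brier_score(y, p))  # calibration and sharpness: lower is better
```

AUC depends only on how the scores rank defaulters against non-defaulters, whereas the Brier Score also penalises poorly calibrated probabilities, which is one reason the two are often reported together.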
Subjective machines: probabilistic risk assessment based on deep learning of soft information
For several years machine learning methods have been proposed for risk classification. While machine learning methods have also been used for failure diagnosis and condition monitoring, to the best of our knowledge, these methods have not been used for probabilistic risk assessment. Probabilistic risk assessment is a subjective process. The problem of how well machine learning methods can emulate expert judgments is challenging. Expert judgments are based on mental shortcuts, heuristics, which are susceptible to biases. This paper presents a process for developing natural language-based probabilistic risk assessment models, applying deep learning algorithms to emulate experts’ quantified risk estimates. This allows the risk analyst to obtain an a priori risk assessment when there is limited information in the form of text and numeric data. Universal sentence embedding (USE) with gradient boosting regression (GBR) trees trained over limited structured data presented the most promising results. When we apply these models’ outputs to generate survival distributions for autonomous systems’ likelihood of loss with distance, we observe that for open water and ice shelf operating environments, the differences between the survival distributions generated by the machine learning algorithm and those generated by the experts are not statistically significant.
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation
counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings
are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that
only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into
account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed
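The difference between the two counting schemes described above can be sketched as follows. This is a simplified illustration, not the study's procedure: the function `cocitation_counts`, the `max_authors` parameter, and the toy reference lists are all hypothetical, each citing paper contributes at most one count per author pair, and co-authors of the same cited work are treated as co-cited.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count author pairs cited together, using the first `max_authors` of each cited work."""
    counts = Counter()
    for reference_list in citing_papers:
        cited = set()
        for cited_authors in reference_list:
            cited.update(cited_authors[:max_authors])
        # Each unordered pair is counted once per citing paper.
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return counts


# Hypothetical data: each citing paper is a list of cited works,
# and each cited work is an ordered list of its authors.
papers = [
    [["Smith", "Lee"], ["Garcia"]],
    [["Smith"], ["Chen", "Lee"]],
]

first_author = cocitation_counts(papers, max_authors=1)
first_five = cocitation_counts(papers, max_authors=5)

print(first_author)
print(first_five)
```

In this toy data, Lee never appears as a first author, so first-author counting records no co-citations involving Lee, while counting the first five authors links Lee to every other author.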
Variations on the Author
“Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourses; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.
Appropriate Similarity Measures for Author Cocitation Analysis
We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance. Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
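The abstract above does not state its specific objections to the Pearson correlation. One well-known property that separates the two measures is that the cosine is unchanged when further authors with whom neither author is cocited are added to the profiles, whereas the Pearson correlation is not. The sketch below illustrates this with hypothetical cocitation profiles; the numbers are invented for illustration only.

```python
import math


def cosine(x, y):
    """Cosine of the angle between two cocitation profiles."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson(x, y):
    """Pearson correlation between two cocitation profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical profiles: cocitation counts of two authors with four other authors.
a = [5, 0, 3, 1]
b = [1, 4, 0, 6]

# The same profiles after adding 20 authors cocited with neither of them.
a_padded = a + [0] * 20
b_padded = b + [0] * 20

print(cosine(a, b), cosine(a_padded, b_padded))
print(pearson(a, b), pearson(a_padded, b_padded))
```

With these numbers the cosine is the same before and after padding, while the Pearson correlation moves from negative to positive, even though the added authors carry no information about how the two profiles relate.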
