
    Deep learning for text-based credit scoring for micro, small and medium enterprises

    Personal credit risk models are built upon a wealth of structured sociodemographic and behavioural data. This data tends to be high in volume and low in cost and, as a result, personal lending is a highly automated process. This, however, is not true for micro, small and medium business credit processing, which is cumbersome and expensive for lenders. Often, a lack of sufficient structured data and the bespoke nature of a credit request require expert judgement on the creditworthiness of an organisation. In the first instance, this judgement is made by a financial analyst who writes a report, which is then usually passed on to a further assessor who makes the final decision based on the written report combined with other sources of available data. The purpose of this research is to eliminate the requirement for this second stage of assessment, in which both the traditional variables and the text-based evaluation by the credit agent are considered, by developing Deep Learning models that capture the rich and dynamic information available in the written financial analyst reports and output a probability score that can then be used alongside other structured sources of information. The results suggest that implementing a semi-automated process would allow for a more accurate and cost-effective approach to assessing credit risk for micro, small and medium enterprises.
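
    As a rough illustration of using a text-derived probability "alongside other structured sources of information", the sketch below stacks a hypothetical text-model default probability with one hypothetical structured risk feature in a plain logistic regression fitted by batch gradient descent. The feature layout, toy data and learning settings are assumptions for illustration only; the abstract does not say how the scores are combined.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, x):
    """Combined default probability; w[0] is the intercept."""
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))


def fit_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit logistic-regression weights by batch gradient descent on log-loss."""
    w = [0.0] * (len(rows[0]) + 1)
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(rows, labels):
            err = predict(w, x) - y  # derivative of log-loss w.r.t. the linear score
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w


# Hypothetical toy data: [text-model default probability, structured risk feature]
rows = [[0.05, 0.1], [0.10, 0.3], [0.20, 0.2], [0.70, 0.8], [0.80, 0.6], [0.90, 0.9]]
labels = [0, 0, 0, 1, 1, 1]  # 1 = defaulted
weights = fit_logistic(rows, labels)
print(round(predict(weights, [0.85, 0.8]), 3))
```

    A logistic stacker keeps the combined output on a probability scale; in practice a tested library implementation and a held-out validation set would be preferable to this hand-rolled loop.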

    Deep residential representations: Using unsupervised learning to unlock elevation data for geo-demographic prediction

    LiDAR (short for "Light Detection And Ranging" or "Laser Imaging, Detection, And Ranging") technology can be used to provide detailed three-dimensional elevation maps of urban and rural landscapes. To date, airborne LiDAR imaging has been predominantly confined to the environmental and archaeological domains. However, the geographically granular and open-source nature of this data also lends itself to an array of societal, organisational and business applications where geo-demographic type data is utilised. Arguably, the complexity involved in processing this multi-dimensional data has thus far restricted its broader adoption. In this paper, we propose a series of convenient task-agnostic tile elevation embeddings to address this challenge, using recent advances from unsupervised Deep Learning. We test the potential of our embeddings by predicting seven English indices of deprivation (2019) for small geographies in the Greater London area. These indices cover a range of socio-economic outcomes and serve as a proxy for a wide variety of downstream tasks to which the embeddings can be applied. We consider the suitability of this data not just on its own but also as an auxiliary source of data in combination with demographic features, thus providing a realistic use case for the embeddings. Having trialled various model/embedding configurations, we find that our best-performing embeddings lead to Root-Mean-Squared-Error (RMSE) improvements of up to 21% over using standard demographic features alone. We also demonstrate how our embedding pipeline, using Deep Learning combined with K-means clustering, produces coherent tile segments which allow the latent embedding features to be interpreted. Comment: 29 pages, 13 figures. V2 - Publishe
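
    The abstract pairs learned tile embeddings with K-means clustering to obtain interpretable tile segments. A minimal sketch of that clustering step, assuming each tile has already been mapped to a fixed-length embedding vector (the toy 2-D vectors below are hypothetical, not LiDAR-derived), is plain Lloyd's algorithm: assign each vector to its nearest centroid, recompute centroids as cluster means, and stop when assignments no longer change.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (cluster label per point, centroids)."""
    centroids = [list(p) for p in points[:k]]  # deterministic init: first k points
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every embedding.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # converged: assignments unchanged
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's centroid where it was
                dim = len(members[0])
                centroids[c] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
    return labels, centroids


# Hypothetical 2-D "tile embeddings" forming two obvious groups
tiles = [[0.0, 0.1], [5.0, 5.1], [0.2, 0.0], [5.2, 4.9], [0.1, 0.2], [4.9, 5.0]]
segment_ids, centres = kmeans(tiles, k=2)
print(segment_ids)  # → [0, 1, 0, 1, 0, 1]
```

    Initialising from the first k points keeps the example deterministic; real pipelines typically use k-means++ seeding and several restarts, for example via scikit-learn's `KMeans`.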

    Advanced turbidity prediction for operational water supply planning

    Turbidity is an optical quality of water caused by suspended solids that give the appearance of ‘cloudiness’. While turbidity itself does not directly present a hazard to human health, it can indicate poor water quality and mask the presence of parasites such as Cryptosporidium. The World Health Organisation (WHO) therefore recommends that turbidity should not exceed 1 Nephelometric Turbidity Unit (NTU) before chlorination. For a drinking water supplier, turbidity peaks can be highly disruptive, requiring the temporary shutdown of a water treatment works. Such events must be carefully managed to ensure continued supply: to recover the supply deficit, water stores must be depleted or alternative works utilised. Machine learning techniques have been shown to be effective for modelling complex environmental systems and are often used to help shape environmental policy. We contribute to the literature by adopting such techniques for operational purposes, developing a decision support tool that predicts >1 NTU turbidity events up to seven days in advance, allowing water supply managers to make proactive interventions. We apply a Generalised Linear Model (GLM) and a Random Forest (RF) model to predict >1 NTU events. AUROC scores of over 0.80 at five of six sites suggest that machine learning techniques are suitable for predicting turbidity peaking events. Furthermore, we find that the RF model can provide a modest performance boost owing to its stronger capacity to capture nonlinear interactions in the data.
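
    Predicting >1 NTU events ahead of time requires turning a turbidity time series into binary targets before any GLM or random forest can be trained. One way to do this is sketched below, assuming daily maximum readings and a label meaning "any exceedance within the next `horizon` days"; the paper may define its targets differently (for example, one model per lead time), and the readings are hypothetical.

```python
def exceedance_labels(daily_max_ntu, threshold=1.0, horizon=7):
    """Label day t as 1 if any of days t+1..t+horizon exceeds `threshold` NTU, else 0.

    Days near the end of the series, whose look-ahead window is incomplete,
    are labelled None so they can be dropped before training.
    """
    labels = []
    for t in range(len(daily_max_ntu)):
        window = daily_max_ntu[t + 1 : t + 1 + horizon]
        if len(window) < horizon:
            labels.append(None)
        else:
            labels.append(int(any(v > threshold for v in window)))
    return labels


# Hypothetical daily maximum turbidity readings (NTU); short horizon for readability
readings = [0.3, 0.4, 1.2, 0.5, 0.6, 0.4, 0.3]
print(exceedance_labels(readings, horizon=3))  # → [1, 1, 0, 0, None, None, None]
```

    Using only readings strictly after day t for the label keeps the target from leaking into same-day features.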

    Novel applications of advanced predictive analytics and artificial intelligence to improve SME competitiveness and access to funding

    Small and Medium-sized Enterprises (SMEs) are a collective group of organisations that make a significant societal and economic impact globally. However, these organisations face numerous challenges compared to their larger counterparts. Over the three papers that form this thesis, methodologies from the Artificial Intelligence (AI) and Machine Learning (ML) domains are developed to enhance SME competitiveness and access to funding. The first paper addresses the access-to-funding challenge driven by information asymmetries and the prohibitive costs faced by SME credit lenders. Specifically, a deep language model is applied to loan officers' free-text assessments to predict default risk. The study shows that the text alone effectively predicts default and, when combined with traditional credit scoring data, is suitable for partly automating the SME lending process while offering insights to address information asymmetries. The second paper then moves on to the competitiveness challenge. Many SMEs lack the resources and technical expertise to leverage advanced AI models and unstructured data. In this paper, unsupervised computer vision and deep learning methodologies are developed to create user-friendly feature representations for remote sensing data. Specifically, the derived LiDAR imagery representations are tested in a predictive context using socio-economic outcomes for small geographies in Greater London. The results demonstrate that these accessible representations outperform baselines using standard features alone, making them suitable for organisations such as SMEs. Finally, the third paper considers how SMEs can benefit from enhanced automation and decision-making by adopting AI systems. Focusing on developing a decision support tool to improve debt recovery using customer behavioural data, this study combines deep sequence learning and uplift modelling to identify customers more likely to respond to targeted interventions. When applied to a dataset supplied by a small utility company, the results show significant performance improvements compared to baseline models. As a result, such approaches can help SMEs reduce their debt book value and streamline the allocation of recovery resources.
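
    Uplift modelling targets the difference an intervention makes to a customer's behaviour rather than the behaviour itself. The thesis combines this with deep sequence learning; the sketch below shows only the simplest form of the idea, assuming hypothetical customer segments and a randomised treated/control split: a segment's uplift is its treated response rate minus its control response rate, and segments with the largest uplift are prioritised for intervention.

```python
from collections import defaultdict


def segment_uplift(records):
    """Estimate uplift per segment from (segment, treated, responded) records.

    Uplift = response rate among treated customers minus response rate among
    control customers; segments lacking either arm are skipped.
    """
    # counts[segment][arm] = [responders, customers]
    counts = defaultdict(lambda: {"treated": [0, 0], "control": [0, 0]})
    for segment, treated, responded in records:
        arm = counts[segment]["treated" if treated else "control"]
        arm[0] += int(responded)
        arm[1] += 1
    uplift = {}
    for segment, arms in counts.items():
        (r_t, n_t), (r_c, n_c) = arms["treated"], arms["control"]
        if n_t and n_c:
            uplift[segment] = r_t / n_t - r_c / n_c
    return uplift


# Hypothetical records: (segment, received intervention?, paid after contact?)
records = (
    [("A", True, True)] * 3 + [("A", True, False)]
    + [("A", False, True)] + [("A", False, False)] * 3
    + [("B", True, True), ("B", True, False), ("B", False, True), ("B", False, False)]
)
scores = segment_uplift(records)
ranked = sorted(scores, key=scores.get, reverse=True)  # highest uplift first
print(scores, ranked)
```

    Segment B responds equally with or without contact, so its uplift is zero even though half its customers pay; ranking by raw response rate would not reveal this.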

    The value of text for small business default prediction: a deep learning approach

    Compared to consumer lending, Micro, Small and Medium Enterprise (mSME) credit risk modelling is particularly challenging, as the same sources of information are often not available. Therefore, it is standard policy for a loan officer to provide a textual loan assessment to mitigate limited data availability. In turn, this statement is analysed by a credit expert alongside any available standard credit data. In our paper, we exploit recent advances from the field of Deep Learning and Natural Language Processing (NLP), including the BERT (Bidirectional Encoder Representations from Transformers) model, to extract information from 60,000 textual assessments provided by a lender. We consider performance in terms of the AUC (Area Under the receiver operating characteristic Curve) and Brier Score metrics and find that the text alone is surprisingly effective for predicting default. However, when combined with traditional data, it yields no additional predictive capability, with performance dependent on the text's length. Our proposed deep learning model does, however, appear to be robust to the quality of the text and is therefore suitable for partly automating the mSME lending process. We also demonstrate how the content of loan assessments influences performance, leading us to a series of recommendations on a new strategy for collecting future mSME loan assessments.
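
    The two metrics named in the abstract can be computed directly. A minimal sketch, assuming binary default labels (1 = default) and model-predicted default probabilities (the toy loans are hypothetical): AUC is computed through its Mann-Whitney interpretation, as the probability that a randomly chosen defaulter receives a higher score than a randomly chosen non-defaulter (ties count one half), and the Brier score as the mean squared difference between predicted probability and outcome, where lower is better.

```python
def auc(labels, scores):
    """AUC as P(score of a random defaulter > score of a random non-defaulter); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(labels, probs):
    """Mean squared error between predicted default probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


# Hypothetical loans: 1 = default, with model-predicted default probabilities
y_true = [0, 0, 1, 1]
p_default = [0.1, 0.4, 0.35, 0.8]
print(auc(y_true, p_default))  # → 0.75
print(round(brier_score(y_true, p_default), 6))  # → 0.158125
```

    AUC only measures ranking, whereas the Brier score also penalises poorly calibrated probabilities, which is why reporting both is informative. The pairwise loop is quadratic in the number of loans; for tens of thousands of assessments, scikit-learn's `roc_auc_score` and `brier_score_loss` are more practical.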

    Subjective machines: probabilistic risk assessment based on deep learning of soft information

    For several years, machine learning methods have been proposed for risk classification. While machine learning methods have also been used for failure diagnosis and condition monitoring, to the best of our knowledge they have not been used for probabilistic risk assessment. Probabilistic risk assessment is a subjective process, and how well machine learning methods can emulate expert judgments is a challenging problem. Expert judgments are based on mental shortcuts (heuristics), which are susceptible to biases. This paper presents a process for developing natural language-based probabilistic risk assessment models, applying deep learning algorithms to emulate experts' quantified risk estimates. This allows the risk analyst to obtain an a priori risk assessment when only limited information is available in the form of text and numeric data. Universal sentence embedding (USE) combined with gradient boosting regression (GBR) trees trained on limited structured data produced the most promising results. When we use these models' outputs to generate survival distributions for autonomous systems' likelihood of loss with distance, we observe that, for open water and ice shelf operating environments, the differences between the survival distributions generated by the machine learning algorithm and those generated by the experts are not statistically significant.
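
    The abstract does not state how model outputs are turned into survival distributions over distance. One simple possibility, shown here purely as an assumption, is a constant hazard per kilometre: if a model estimates a probability p of losing the vehicle over a reference distance d0, then the hazard is lambda = -ln(1 - p) / d0 and the survival probability is S(d) = exp(-lambda * d). The maximum gap computed below is only a descriptive comparison between two such curves, not a statistical test, and all numbers are hypothetical.

```python
import math


def survival_curve(p_loss_ref, ref_distance_km, distances_km):
    """Survival probabilities S(d) under an assumed constant hazard per km.

    p_loss_ref is the estimated probability of loss over ref_distance_km.
    """
    if not 0.0 <= p_loss_ref < 1.0:
        raise ValueError("p_loss_ref must lie in [0, 1)")
    hazard = -math.log(1.0 - p_loss_ref) / ref_distance_km  # loss rate per km
    return [math.exp(-hazard * d) for d in distances_km]


def max_gap(curve_a, curve_b):
    """Largest absolute difference between two curves on the same distance grid."""
    return max(abs(a - b) for a, b in zip(curve_a, curve_b))


# Hypothetical estimates: model says 10% loss risk per 100 km, expert says 12%
grid = [0, 50, 100, 200]
model_curve = survival_curve(0.10, 100.0, grid)
expert_curve = survival_curve(0.12, 100.0, grid)
print([round(s, 3) for s in model_curve], round(max_gap(model_curve, expert_curve), 3))
```

    A constant hazard implies that risk accumulates geometrically with distance (90% survival at 100 km becomes 81% at 200 km); richer shapes such as a Weibull hazard would need more than one estimate per mission.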

    Going Beyond Counting First Authors in Author Co-citation Analysis

    The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based. We compare ACA results based on two types of co-citation counting: the traditional type, which counts only the first of a cited work's authors, and a non-traditional type, which takes into account the first five authors of a cited work. Results indicate that the picture produced by this non-traditional co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than the one produced by traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
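
    To make the two counting schemes concrete, here is a minimal sketch, assuming each citing paper is represented as a list of its references and each reference as its author list in byline order (the names are hypothetical). Two authors are counted as co-cited once per citing paper whose reference list includes work by both; `max_authors=1` gives first-author counting and `max_authors=5` the first-five-authors variant. Conventions such as counting co-authors of the same cited work as co-cited (as this sketch does) are assumptions, not details taken from the study.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count author co-citations.

    citing_papers: one entry per citing paper, each a list of cited works,
    each cited work a list of author names in byline order. Only the first
    `max_authors` authors of each cited work are counted. A pair of authors
    is counted at most once per citing paper.
    """
    counts = Counter()
    for references in citing_papers:
        cited_authors = set()
        for authors in references:
            cited_authors.update(authors[:max_authors])
        for pair in combinations(sorted(cited_authors), 2):  # unordered pairs
            counts[pair] += 1
    return counts


# Hypothetical citing papers and their reference lists
papers = [
    [["Smith", "Lee"], ["Khan"]],
    [["Lee", "Smith"], ["Khan", "Wu"]],
]
first_author = cocitation_counts(papers, max_authors=1)
first_five = cocitation_counts(papers, max_authors=5)
print(first_author)
print(first_five)
```

    Under first-author counting, Lee and Smith are never co-cited, because each only ever appears as a first author in different papers; counting the first five authors links them twice, illustrating how the wider scheme can pull co-authors into tighter groups.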

    Variations on the Author

    “Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourses; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines how these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.

    Appropriate Similarity Measures for Author Cocitation Analysis

    We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.
    Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
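
    As a small illustration of the two best-known measures, assuming each author's cocitation profile is a vector of co-citation counts with other authors (the vectors are hypothetical), Pearson correlation is simply the cosine of the mean-centred vectors. A property often raised in this debate, visible in the last line, is that adding an author co-cited with neither (an extra zero in both profiles) leaves the cosine unchanged but shifts the Pearson correlation. The two further measures proposed in the paper are not reproduced here.

```python
import math


def cosine(x, y):
    """Cosine of the angle between two cocitation profiles."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


def pearson(x, y):
    """Pearson correlation, computed as the cosine of the mean-centred profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return cosine([a - mx for a in x], [b - my for b in y])


# Hypothetical cocitation counts of two authors with four other authors
x = [1, 0, 0, 5]
y = [0, 1, 0, 5]
print(round(cosine(x, y), 4), round(pearson(x, y), 4))
# Add one more author co-cited with neither: cosine is unchanged, Pearson shifts
print(round(cosine(x + [0], y + [0]), 4), round(pearson(x + [0], y + [0]), 4))
```

    Because Pearson centres each profile on its mean, its value depends on how many zero entries the author set happens to include, which is one reason cosine-type measures are often preferred for sparse count data.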