Generalized spherical principal component analysis

Author: Leyder Sarah
Verdonck Tim
Raymaekers Jakob
Publication date: 01/01/2024

Outliers contaminating data sets are a challenge to statistical estimators. Even a small fraction of outlying observations can heavily influence most classical statistical methods. In this paper we propose generalized spherical principal component analysis, a new robust version of principal component analysis that is based on the generalized spatial sign covariance matrix. Theoretical properties of the proposed method, including influence functions, breakdown values and asymptotic efficiencies, are derived. These theoretical results are complemented with an extensive simulation study and two real-data examples. We illustrate that generalized spherical principal component analysis can combine great robustness with solid efficiency properties, in addition to a low computational cost.
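The abstract above does not give the generalized estimator itself, so the following is only a minimal sketch of the classical special case it builds on: spherical PCA using plain spatial signs. The function name `spherical_pca` and the choice of the coordinatewise median as robust center are assumptions made for illustration, not the authors' method.

```python
import numpy as np


def spherical_pca(X, n_components=2):
    """Classical spherical PCA sketch (plain spatial signs, not the generalized version)."""
    X = np.asarray(X, dtype=float)
    # Assumption: coordinatewise median as a simple robust center.
    center = np.median(X, axis=0)
    Z = X - center
    norms = np.linalg.norm(Z, axis=1)
    keep = norms > 0  # points equal to the center have no direction
    # Spatial signs: project every centered point onto the unit sphere.
    S = Z[keep] / norms[keep, None]
    # Spatial sign covariance matrix; its trace is 1 by construction.
    sscm = S.T @ S / S.shape[0]
    eigvals, eigvecs = np.linalg.eigh(sscm)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Columns of the first return value are the estimated principal directions.
    return eigvecs[:, order], eigvals[order]
```

Because every observation is scaled to unit length, a far-away outlier contributes no more to the matrix than any other point, which is the source of the robustness the abstract refers to.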

Maastricht University Research Portal

imec Publications (Interuniversity Microelectronics Centre)

Computational Efficient Approximations of the Concordance Probability in a Big Data Setting

Author: Verdonck Tim
Baesens Bart
Ponnet Jolien
Van Oirbeek Robin
Publication date: 01/01/2024

Performance measurement is an essential task once a statistical model is created. The area under the receiver operating characteristic curve (AUC) is the most popular measure for evaluating the quality of a binary classifier. In this case, the AUC is equal to the concordance probability, a frequently used measure to evaluate the discriminatory power of the model. Contrary to the AUC, the concordance probability can also be extended to the situation with a continuous response variable. Due to the staggering size of data sets nowadays, determining this discriminatory measure requires a tremendous amount of costly computations and is hence immensely time consuming, certainly in the case of a continuous response variable. Therefore, we propose two estimation methods that calculate the concordance probability in a fast and accurate way and that can be applied to both the discrete and continuous setting. Extensive simulation studies show the excellent performance and fast computing times of both estimators. Finally, experiments on two real-life data sets confirm the conclusions of the artificial simulations.

Sponsorship: This work was supported by the Allianz Research Chair Prescriptive business analytics in insurance at KU Leuven and the International Funds KU Leuven under Grant C16/15/068.

Status: Published
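To make the equivalence between the AUC and the concordance probability in the abstract above concrete, here is a minimal sketch of the exact pairwise definition for a binary response. It is not one of the two fast estimators the paper proposes; the function name `concordance_probability` and the convention of counting tied scores as one half are assumptions.

```python
import numpy as np


def concordance_probability(y_true, scores):
    """Exact pairwise concordance for a binary response (equals the AUC)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    # Compare every positive with every negative: O(n_pos * n_neg) work.
    diff = pos[:, None] - neg[None, :]
    concordant = np.sum(diff > 0)
    ties = np.sum(diff == 0)
    # Assumption: a tied pair counts as half concordant.
    return (concordant + 0.5 * ties) / diff.size


print(concordance_probability([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The number of pairs grows with the product of the two class sizes, which illustrates why the exact computation becomes expensive on large data sets.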

Lirias

imec Publications (Interuniversity Microelectronics Centre)

Practicable optimization for portfolios that contain nonfungible tokens

Author: Verdonck Tim
Serneels Sven
Menvouta Emmanuel Jordy
Publication date: 01/01/2023

imec Publications (Interuniversity Microelectronics Centre)

Robust and sparse logistic regression

Author: Cornilly Dries
Van Aelst Stefan
Verdonck Tim
Tubex Lise
Publication date: 27/11/2024

No Statement Available

imec Publications (Interuniversity Microelectronics Centre)

Portfolio optimization using cellwise robust association measures and clustering methods with application to highly volatile markets

Author: Verdonck Tim
Serneels Sven
Menvouta Emmanuel Jordy
Publication date: 19/04/2023

This paper introduces the minCluster portfolio, which is a portfolio optimization method combining the optimization of downside risk measures, hierarchical clustering and cellwise robustness. Using cellwise robust association measures, the minCluster portfolio is able to retrieve the underlying hierarchical structure in the data. Furthermore, it provides downside protection by using tail risk measures for portfolio optimization. We show through simulation studies and a real data example that the minCluster portfolio produces better out-of-sample results than mean-variance or other hierarchical clustering based approaches. Cellwise outlier robustness makes the minCluster method particularly suitable for stable optimization of portfolios in highly volatile markets, such as portfolios containing cryptocurrencies.

Crossref

Directory of Open Access Journals

imec Publications (Interuniversity Microelectronics Centre)

Interpretable cost-sensitive regression through one-step boosting

Author: Raymaekers Jakob
Verdonck Tim
Decorte Thomas
Publication date: 31/12/2023

In most practical prediction problems, such as regression and classification, the different types of prediction errors are not equally costly in the decision-making process. Although there exist numerous real-world cost-sensitive regression problems, ranging from loan charge-off forecasting to house price predictions, the literature on cost-sensitive learning mainly focuses on classification and only a few solutions are proposed for regression problems. These regression problems are typically characterized by an asymmetric cost structure, where over- and underpredictions of a similar magnitude face vastly different costs. In this paper, we present a one-step boosting method (OSB) for cost-sensitive regression. The proposed methodology leverages a secondary learner to incorporate cost-sensitivity into an already trained cost-insensitive regression model. The secondary learner is defined as a linear function of certain variables deemed interesting for cost-sensitivity. These variables do not necessarily need to be the same as in the already trained model. An efficient optimization algorithm is achieved through iteratively reweighted least squares using the asymmetric cost function. The obtained results become interpretable through bootstrapping, enabling decision makers to distinguish important variables for cost-sensitivity as well as facilitating statistical inference. Applying different cost functions and various initial cost-insensitive learning methods on several public datasets consistently yields a significant reduction in the average misprediction cost, illustrating the excellent performance of our approach.
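The idea of fitting a linear secondary learner by iteratively reweighted least squares can be sketched as follows. This is an illustration under stated assumptions, not the OSB algorithm itself: the abstract does not specify the cost function, so an asymmetric quadratic cost is assumed here, and the names `fit_secondary_learner`, `cost_under` and `cost_over` are hypothetical.

```python
import numpy as np


def fit_secondary_learner(Z, residuals, cost_under=4.0, cost_over=1.0,
                          n_iter=50, tol=1e-10):
    """IRLS fit of a linear correction to the residuals of an already trained model.

    Assumed cost: cost_under * e**2 when the corrected model still underpredicts
    (e > 0) and cost_over * e**2 when it overpredicts (e <= 0).
    """
    # Add an intercept column to the variables chosen for cost-sensitivity.
    Z = np.column_stack([np.ones(len(Z)), np.asarray(Z, dtype=float)])
    r = np.asarray(residuals, dtype=float)
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        err = r - Z @ beta  # remaining error after the correction
        w = np.where(err > 0, cost_under, cost_over)
        ZW = Z * w[:, None]
        # Weighted least squares step: solve (Z' W Z) beta = Z' W r.
        new_beta = np.linalg.solve(Z.T @ ZW, ZW.T @ r)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta  # beta[0] is the intercept shift
```

The cost-sensitive prediction would then be the base model's prediction plus `Z @ beta`. With equal costs the procedure reduces to ordinary least squares on the residuals.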

Maastricht University Research Portal

Crossref

imec Publications (Interuniversity Microelectronics Centre)

Fraud Analytics: A Decade of Research -- Organizing Challenges and Solutions in the Field

Author: Verdonck Tim
Bockel-Rickermann Christopher
Verbeke Wouter
Publication date: 07/12/2022

The literature on fraud analytics and fraud detection has seen a substantial increase in output in the past decade. This has led to a wide range of research topics and overall little organization of the many aspects of fraud analytical research. The focus of academics ranges from identifying fraudulent credit card payments to spotting illegitimate insurance claims. In addition, there is a wide range of methods and research objectives. This paper aims to provide an overview of fraud analytics in research and to more narrowly organize the discipline and its many subfields. We analyze a sample of almost 300 records on fraud analytics published between 2011 and 2020. In a systematic way, we identify the most prominent domains of application, challenges faced, performance metrics, and methods used. In addition, we build a framework for fraud analytical methods and propose a keywording strategy for future research. One of the key challenges in fraud analytics is access to public datasets. To further aid the community, we provide eight requirements for suitable data sets in research motivated by our research. We structure our sample of the literature in an online database. The database is available online for fellow researchers to investigate and potentially build upon.

arXiv.org e-Print Archive

Crossref

imec Publications (Interuniversity Microelectronics Centre)

Robust instance-dependent cost-sensitive classification

Author: Vanderschueren Toon
Verdonck Tim
Verbeke Wouter
De Vos Simon
Publication date: 07/01/2024

Status: Published online

Lirias

imec Publications (Interuniversity Microelectronics Centre)

Data engineering for fraud detection

Author: Höppner Sebastiaan
Verdonck Tim
Baesens Bart
Publication date: 01/01/2021

Financial institutions increasingly rely upon data-driven methods for developing fraud detection systems, which are able to automatically detect and block fraudulent transactions. From a machine learning perspective, the task of detecting suspicious transactions is a binary classification problem and therefore many techniques can be applied. Interpretability is however of utmost importance for the management to have confidence in the model and for designing fraud prevention strategies. Moreover, models that enable the fraud experts to understand the underlying reasons why a case is flagged as suspicious will greatly facilitate their job of investigating the suspicious transactions. Therefore, we propose several data engineering techniques to improve the performance of an analytical model while retaining the interpretability property. Our data engineering process is decomposed into several feature and instance engineering steps. We illustrate the improvement in performance of these data engineering steps for popular analytical models on a real payment transactions data set.

Lirias

Southampton (e-Prints Soton)

Institutional Repository Universiteit Antwerpen

direpack: A Python 3 package for state-of-the-art statistical dimensionality reduction methods

Author: Verdonck Tim
Serneels Sven
Menvouta Emmanuel Jordy
Publication date: 22/11/2022

The direpack package brings a set of modern statistical dimensionality reduction techniques into the Python universe as a single, consistent package. Several of the methods included are only available as open source through direpack, whereas the package also offers competitive Python implementations of methods previously only available in other programming languages. In its present version, the package is structured in three subpackages for different approaches to dimensionality reduction: projection pursuit, sufficient dimension reduction and robust M estimators. As a corollary, the package also provides access to regularized regression estimators based on these reduced dimension spaces, as well as a set of classical and robust preprocessing utilities, including very recent developments such as generalized spatial signs. Finally, direpack has been written to be consistent with the scikit-learn API, such that the estimators can seamlessly be included into (statistical and/or machine) learning pipelines in that framework.

Lirias

Directory of Open Access Journals

Institutional Repository Universiteit Antwerpen

imec Publications (Interuniversity Microelectronics Centre)