Differential Privacy Theory

Author: Oneto L.
Publication date: 01/01/2020

The problem of learning from data while preserving the privacy of individual observations has a long history and spans multiple disciplines [1–3]. One way to preserve privacy is to corrupt the learning procedure with noise without destroying the information that we want to extract. Differential Privacy (DP) is one of the most powerful tools in this context [3, 4].

Archivio istituzionale della ricerca - Università di Genova
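
The idea of adding noise without destroying the information of interest, mentioned in the Differential Privacy abstract above, can be illustrated with the Laplace mechanism applied to a mean. This is only a minimal sketch, not taken from the listed work: the function names, the clipping range, and the assumption that the dataset size is public and that neighbouring datasets differ in one record are all choices made for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(values, lower, upper, epsilon, rng):
    """Release the mean of values clipped to [lower, upper] with Laplace noise.

    Assumption: the number of records is public and neighbouring datasets
    differ by replacing one record.
    """
    clipped = [min(max(v, lower), upper) for v in values]
    n = len(clipped)
    # Replacing one record moves the clipped mean by at most (upper - lower) / n.
    sensitivity = (upper - lower) / n
    # Noise scale sensitivity / epsilon is the standard Laplace calibration.
    return sum(clipped) / n + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    data = [0.2, 0.4, 0.6, 0.9, 0.1]
    # Smaller epsilon means stronger privacy and a noisier answer.
    for eps in (0.1, 1.0, 10.0):
        print(eps, private_mean(data, 0.0, 1.0, eps, rng))
```

The clipping step is what makes the sensitivity bounded; without a known range for the data, the noise scale could not be calibrated this way.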

Learning fair models and representations

Author: Oneto L.
Publication date: 01/01/2020

Machine learning based systems and products are reaching society at large in many aspects of everyday life, including financial lending, online advertising, pretrial and immigration detention, child maltreatment screening, health care, social services, and education. This phenomenon has been accompanied by an increase in concern about the ethical issues that may arise from the adoption of these technologies. In response to this concern, a new area of machine learning has recently emerged that studies how to address disparate treatment caused by algorithmic errors and bias in the data. The central question is how to ensure that the learned model does not treat subgroups in the population unfairly. While the design of solutions to this issue requires an interdisciplinary effort, fundamental progress can only be achieved through a radical change in the machine learning paradigm. In this work, we describe the state of the art on algorithmic fairness using statistical learning theory, machine learning, and deep learning approaches that are able to learn fair models and data representations.

Archivio istituzionale della ricerca - Università di Genova

Preface

Author: Oneto L.
Publication date: 01/01/2020

Archivio istituzionale della ricerca - Università di Genova

PAC-Bayes Theory

Author: Oneto L.
Publication date: 01/01/2020

It is well known that combining the output of several rules results in much better performance than using any one of them alone. In fact, many state-of-the-art algorithms search for a weighted combination of simpler rules [1]: Bagging [2, 3], Boosting [4, 5], and Bayesian approaches [6], or even Kernel methods [7] and Neural Networks [8].

Archivio istituzionale della ricerca - Università di Genova

Resampling Methods

Author: Luca Oneto
Publication date: 18/07/2019

Resampling methods [1–4], also called Out-of-Sample methods, are favoured by practitioners because they work well in many situations and allow the application of simple statistical techniques for estimating the quantities of interest.

Crossref

Archivio istituzionale della ricerca - Università di Genova
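
As an illustration of the out-of-sample idea in the Resampling Methods abstract above, the following is a minimal k-fold cross-validation sketch. It is not drawn from the listed chapter: the helper names (`k_fold_indices`, `cross_val_error`, `fit_mean`, `squared_loss`) and the constant-mean predictor are hypothetical, chosen only to keep the example self-contained.

```python
import random


def k_fold_indices(n, k, rng):
    """Shuffle indices 0..n-1 and split them into k disjoint folds (needs n >= k)."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_error(xs, ys, fit, loss, k, rng):
    """Average held-out loss over k folds; each fold is held out exactly once."""
    fold_errors = []
    for held_out in k_fold_indices(len(xs), k, rng):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        # Fit only on the training part, then score on the unseen fold.
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        fold_loss = sum(loss(model(xs[i]), ys[i]) for i in held_out)
        fold_errors.append(fold_loss / len(held_out))
    return sum(fold_errors) / k


def fit_mean(xs, ys):
    """Toy learner: always predict the mean of the training targets."""
    mean = sum(ys) / len(ys)
    return lambda x: mean


def squared_loss(prediction, target):
    return (prediction - target) ** 2


if __name__ == "__main__":
    rng = random.Random(0)
    xs = list(range(20))
    ys = [0.5 * x + rng.gauss(0.0, 1.0) for x in xs]
    print(cross_val_error(xs, ys, fit_mean, squared_loss, 5, rng))
```

Any learner with the same `fit(xs, ys) -> model` shape can be swapped in for `fit_mean`, which is what makes the technique easy to apply in practice.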

Conclusions and Further Readings

Author: Luca Oneto
Publication date: 18/07/2019

In this book, we tried to provide an intelligible overview of the problems of Model Selection and Error Estimation by focusing on the ideas behind the different Statistical Learning Theory based approaches and by simplifying most of the technical aspects, with the purpose of making them more accessible and usable in practice.

Crossref

Archivio istituzionale della ricerca - Università di Genova

Compression Bound

Author: Luca Oneto
Publication date: 18/07/2019

The Compression bound is probably the simplest, yet theoretically grounded, approach to Model Selection (MS) and Error Estimation (EE). The Compression bound [1–3] relies on a simple idea: if an algorithm is able to compress the data provided to learn a rule, then the algorithm will generalize.

Crossref

Archivio istituzionale della ricerca - Università di Genova

Algorithmic Stability Theory

Author: Luca Oneto
Publication date: 18/07/2019

The notion of Stability [1–3] allows us to answer a fundamental question in learning theory: which properties should a learning algorithm A fulfill in order to achieve good generalization performance? Stability answers this question in a very intuitive way: if A selects similar models even when the training data are (slightly) modified, then we can be confident that the learning algorithm is stable.

Crossref

Archivio istituzionale della ricerca - Università di Genova

Complexity-Based Methods

Author: Luca Oneto
Publication date: 18/07/2019

The idea behind the complexity-based methods is that if an algorithm chooses from a small set of rules, it will probably generalize. Basically, if we have a small set of rules and one of them has small empirical error, the risk of overfitting the data is small, since the probability that this event has happened by chance is small. Conversely, if we have a large set of rules and one of them has small empirical error, the risk that this event has happened by chance is high.

Crossref

Archivio istituzionale della ricerca - Università di Genova
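
The intuition in the Complexity-Based Methods abstract above, that a small empirical error is more trustworthy when it was found in a small set of rules, can be made concrete with the textbook bound for a finite set of rules, obtained from Hoeffding's inequality and a union bound. The sketch below is an assumption-laden illustration rather than the statement used in the listed chapter: it assumes a loss bounded in [0, 1], i.i.d. samples, and a set of rules fixed before seeing the data, and the function name `finite_class_bound` is hypothetical.

```python
import math


def finite_class_bound(empirical_error, num_rules, n, delta):
    """Upper bound on the true error that holds for every rule simultaneously.

    With probability at least 1 - delta over n i.i.d. samples and a loss in [0, 1]:
        true_error <= empirical_error + sqrt(ln(num_rules / delta) / (2 n))
    """
    return empirical_error + math.sqrt(math.log(num_rules / delta) / (2 * n))


if __name__ == "__main__":
    # Same empirical error and sample size; only the size of the rule set changes.
    for num_rules in (10, 10**3, 10**6):
        print(num_rules, finite_class_bound(0.05, num_rules, 1000, 0.05))
```

The penalty grows only logarithmically with the number of rules, but it does grow, which matches the abstract's point that a good fit found in a large set is more likely to have occurred by chance.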

Computational intelligence identifies alkaline phosphatase (ALP), alpha-fetoprotein (AFP), and hemoglobin levels as most predictive survival factors for hepatocellular carcinoma

Authors: Chicco D., Oneto L.
Publication date: 01/01/2021

Liver cancer kills approximately 800 thousand people annually worldwide, and its most common subtype is hepatocellular carcinoma (HCC), which usually affects people with cirrhosis. Predicting the survival of patients with HCC remains an important challenge, especially because the technologies needed for this purpose are not available in all hospitals. In this context, machine learning applied to medical records can be a fast, low-cost tool to predict survival and detect the most predictive features from health records. In this study, we analyzed medical data of 165 patients with HCC: we employed computational intelligence to predict their survival and to detect the most relevant clinical factors able to discriminate surviving from deceased patients. Afterwards, we compared our data mining results with those obtained through statistical tests and scientific literature findings. Our analysis revealed that blood levels of alkaline phosphatase (ALP), alpha-fetoprotein (AFP), and hemoglobin are the most effective prognostic factors in this dataset. We found literature supporting the association of these three factors with hepatoma, even though only AFP has been used in a prognostic index. Our results suggest that ALP and hemoglobin can be candidates for future HCC prognostic indexes, and that physicians could focus on ALP, AFP, and hemoglobin when studying HCC records.

Archivio istituzionale della ricerca - Università di Genova