Differential Privacy Theory
The problem of learning from data while preserving the privacy of individual observations has a long history and spans multiple disciplines [1–3]. One way to preserve privacy is to corrupt the learning procedure with noise without destroying the information that we want to extract. Differential Privacy (DP) is one of the most powerful tools in this context [3, 4].
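As an illustration of corrupting a computation with noise while keeping the useful signal, here is a minimal sketch of the Laplace mechanism applied to a mean. It is not taken from the chapter: the function names, the clipping range, and the assumption that the sample size is public (neighbouring datasets differ by replacing one record) are all choices made for this example.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a zero-mean Laplace distribution with the given scale."""
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(values, lower, upper, epsilon, rng):
    """Release the mean of `values` with the Laplace mechanism (hypothetical helper)."""
    # Clip so that every record lies in [lower, upper]; this bounds the sensitivity.
    clipped = [min(max(v, lower), upper) for v in values]
    n = len(clipped)
    # Assumption: n is public, so replacing one record moves the mean by at most this much.
    sensitivity = (upper - lower) / n
    true_mean = sum(clipped) / n
    return true_mean + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)
print(private_mean([0.0, 1.0, 1.0, 0.0], lower=0.0, upper=1.0, epsilon=1.0, rng=rng))
```

A smaller `epsilon` gives stronger privacy but a noisier answer; averaged over many releases the noise cancels, which is why the information of interest survives.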
Learning fair models and representations
Machine learning based systems and products are reaching society at large in many aspects of everyday life, including financial lending, online advertising, pretrial and immigration detention, child maltreatment screening, health care, social services, and education. This phenomenon has been accompanied by an increase in concern about the ethical issues that may arise from the adoption of these technologies. In response to this concern, a new area of machine learning has recently emerged that studies how to address disparate treatment caused by algorithmic errors and bias in the data. The central question is how to ensure that the learned model does not treat subgroups in the population unfairly. While the design of solutions to this issue requires an interdisciplinary effort, fundamental progress can only be achieved through a radical change in the machine learning paradigm. In this work, we will describe the state of the art on algorithmic fairness using statistical learning theory, machine learning, and deep learning approaches that are able to learn fair models and data representations.
PAC-Bayes Theory
It is well known that combining the output of several rules results in much better performance than using any one of them alone. In fact, many state-of-the-art algorithms search for a weighted combination of simpler rules [1]: Bagging [2, 3], Boosting [4, 5], Bayesian approaches [6], and even Kernel methods [7] and Neural Networks [8].
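To make the idea of a weighted combination of simpler rules concrete, here is a small sketch of a weighted majority vote over threshold rules. The rules, weights, and function name are invented for illustration and do not come from the chapter.

```python
def weighted_vote(rules, weights, x):
    """Combine the +1/-1 outputs of several rules into one weighted-majority decision."""
    score = sum(w * rule(x) for rule, w in zip(rules, weights))
    return 1 if score >= 0 else -1


# Three hypothetical threshold rules on a scalar input, each returning +1 or -1.
rules = [
    lambda x: 1 if x > 0 else -1,
    lambda x: 1 if x > 1 else -1,
    lambda x: 1 if x > 2 else -1,
]
weights = [0.5, 0.25, 0.25]  # non-negative weights summing to one

print(weighted_vote(rules, weights, 1.5))
```

The weights play the role of a distribution over rules, so the combined prediction is the sign of the weighted average of the individual votes.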
Resampling Methods
Resampling methods [1–4], also called Out-of-Sample methods, are favoured by practitioners because they work well in many situations and allow the application of simple statistical techniques for estimating the quantities of interest.
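One common resampling scheme is k-fold cross-validation, sketched below under simple assumptions: the learner (`fit`) and the loss are passed in as plain functions, and the example learner is just a sample mean. None of these names come from the chapter.

```python
import random


def k_fold_error(data, k, fit, loss, rng):
    """Estimate the out-of-sample error of `fit` by k-fold cross-validation."""
    if not 2 <= k <= len(data):
        raise ValueError("k must be between 2 and len(data)")
    indices = list(range(len(data)))
    rng.shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i.
    folds = [indices[i::k] for i in range(k)]
    fold_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [data[i] for i in indices if i not in held]
        model = fit(train)
        # Average loss on the points the model never saw during training.
        fold_errors.append(sum(loss(model, data[i]) for i in held_out) / len(held_out))
    return sum(fold_errors) / k


def mean_fit(train):
    """Toy learner: predict the training mean."""
    return sum(train) / len(train)


def squared_loss(model, y):
    return (model - y) ** 2


print(k_fold_error([0.0, 1.0, 2.0, 3.0, 4.0, 5.0], 3, mean_fit, squared_loss, random.Random(0)))
```

Each point is used for testing exactly once and never by the model trained on it, which is what makes the simple average an out-of-sample estimate.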
Conclusions and Further Readings
In this book we tried to provide an intelligible overview of the problems of Model Selection and Error Estimation by focusing on the ideas behind the different approaches based on Statistical Learning Theory and by simplifying most of the technical aspects, with the purpose of making them more accessible and usable in practice.
Compression Bound
The Compression bound is probably the simplest yet theoretically grounded approach to Model Selection (MS) and Error Estimation (EE). The Compression bound [1–3] relies on a simple idea: if an algorithm is able to compress the data provided to learn a rule, then the algorithm will generalize.
Algorithmic Stability Theory
The notion of Stability [1–3] allows us to answer a fundamental question in learning theory: what properties should a learning algorithm A fulfill in order to achieve good generalization performance? Stability answers this question in a very intuitive way: if A selects similar models even when the training data are (slightly) modified, then we can be confident that the learning algorithm is stable.
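The intuition can be probed empirically by removing one training point at a time and measuring how far the selected model moves. The sketch below assumes a toy setting in which the "model" is a single number; the helper name and the sample are made up for this example and are not the chapter's formal definition of stability.

```python
import statistics


def loo_instability(data, fit):
    """Largest change in the fitted scalar model when one training point is removed."""
    full_model = fit(data)
    return max(
        abs(full_model - fit(data[:i] + data[i + 1:]))
        for i in range(len(data))
    )


sample = [0.0, 0.0, 0.0, 10.0]  # one outlier among otherwise identical points
mean_shift = loo_instability(sample, statistics.mean)
median_shift = loo_instability(sample, statistics.median)
print(mean_shift, median_shift)
```

On this sample the median does not move when any single point is dropped, while the mean moves substantially when the outlier is dropped, so the median behaves as the more stable rule here.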
Complexity-Based Methods
The idea behind the complexity-based methods is that if an algorithm chooses from a small set of rules, it will probably generalize. Basically, if we have a small set of rules and one of them has small empirical error, the risk of overfitting the data is small, since the probability that this event has happened by chance is small. Conversely, if we have a large set of rules and one of them has small empirical error, the risk that this event has happened by chance is high.
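For a finite set of rules and a loss bounded in [0, 1], this intuition is captured by the standard Hoeffding-plus-union-bound argument: with probability at least 1 - delta, every rule's true error is at most its empirical error plus sqrt((ln|F| + ln(1/delta)) / (2n)). The sketch below simply evaluates that expression; the function name and the example numbers are assumptions for illustration.

```python
import math


def finite_class_bound(empirical_error, num_rules, n_samples, delta):
    """Upper bound on the true error, holding for all rules in a finite class at once."""
    # Hoeffding's inequality per rule, then a union bound over the num_rules rules.
    slack = math.sqrt((math.log(num_rules) + math.log(1.0 / delta)) / (2.0 * n_samples))
    return empirical_error + slack


small_class = finite_class_bound(0.05, num_rules=10, n_samples=1000, delta=0.05)
large_class = finite_class_bound(0.05, num_rules=10**6, n_samples=1000, delta=0.05)
print(small_class, large_class)
```

With the same empirical error and sample size, the bound for the larger class is looser, which is the quantitative version of "a small empirical error found in a large set of rules is more likely to be due to chance".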
Computational intelligence identifies alkaline phosphatase (ALP), alpha-fetoprotein (AFP), and hemoglobin levels as most predictive survival factors for hepatocellular carcinoma
Liver cancer kills approximately 800 thousand people annually worldwide, and its most common subtype is hepatocellular carcinoma (HCC), which usually affects people with cirrhosis. Predicting the survival of patients with HCC remains an important challenge, especially because the technologies needed for this purpose are not available in all hospitals. In this context, machine learning applied to medical records can be a fast, low-cost tool to predict survival and detect the most predictive features from health records. In this study, we analyzed medical data of 165 patients with HCC: we employed computational intelligence to predict their survival and to detect the most relevant clinical factors able to discriminate surviving from deceased patients. Afterwards, we compared our data mining results with those obtained through statistical tests and scientific literature findings. Our analysis revealed that blood levels of alkaline phosphatase (ALP), alpha-fetoprotein (AFP), and hemoglobin are the most effective prognostic factors in this dataset. We found literature supporting the association of these three factors with hepatoma, even though only AFP has been used in a prognostic index. Our results suggest that ALP and hemoglobin can be candidates for future HCC prognostic indexes, and that physicians could focus on ALP, AFP, and hemoglobin when studying HCC records.