Model selection and error estimation without the agonizing pain
How can we select the best-performing data-driven model? How can we rigorously estimate its generalization error? Statistical learning theory (SLT) answers these questions by deriving nonasymptotic bounds on the generalization error of a model or, in other words, by providing upper bounds on the true error of the learned model based only on quantities computed on the available data. However, for a long time, SLT has been considered only as an abstract theoretical framework, useful for inspiring new learning approaches, but with limited applicability to practical problems. The purpose of this review is to give an intelligible overview of the problems of model selection (MS) and error estimation (EE), by focusing on the ideas behind the different SLT-based approaches and simplifying most of the technical aspects with the purpose of making them more accessible and usable in practice. We cover the field from the seminal works of the 1980s to the most recent results, then discuss open problems and finally outline future directions of this field of research. This article is categorized under: Technologies > Statistical Fundamentals; Algorithmic Development > Statistics
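As an illustration of the kind of data-dependent bound this abstract refers to, here is a minimal sketch of hold-out error estimation and model selection using Hoeffding's inequality. It is not taken from the review itself: the function names, the default confidence level, and the use of a union bound over the candidate models are assumptions made for the example. It also assumes a loss bounded in [0, 1] and a hold-out set of n i.i.d. samples that was not used for training.

```python
import math


def hoeffding_upper_bound(holdout_error: float, n: int, delta: float = 0.05) -> float:
    """Upper bound on the true error of one fixed model.

    Holds with probability at least 1 - delta, assuming a loss in [0, 1]
    and n i.i.d. hold-out samples that were not used to train the model.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    slack = math.sqrt(math.log(1.0 / delta) / (2.0 * n))
    return min(1.0, holdout_error + slack)


def select_model(holdout_errors: dict, n: int, delta: float = 0.05):
    """Pick the candidate with the smallest bound (hypothetical helper).

    A union bound splits delta across the K candidates, so the returned
    bound is valid for the selected model even though selection used
    the same hold-out set.
    """
    k = len(holdout_errors)
    bounds = {
        name: hoeffding_upper_bound(err, n, delta / k)
        for name, err in holdout_errors.items()
    }
    best = min(bounds, key=bounds.get)
    return best, bounds[best]


if __name__ == "__main__":
    best, bound = select_model({"model_a": 0.20, "model_b": 0.10}, n=1000)
    print(best, round(bound, 4))
```

The slack term shrinks as 1/sqrt(n) and grows only logarithmically with the number of candidates, which is why this simple scheme is usable when a hold-out set can be spared.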
Byte The Bullet: Learning on Real-World Computing Architectures
Fast, effective, and reliable models: these are the desiderata of every theorist and practitioner. Machine Learning (ML) algorithms, proposed in the last decades, proved to be effective and reliable in solving complex real-world problems, but they are usually designed without taking into account the underlying computing architecture. On the contrary, the effort of contemplating the exploited computing device is often motivated by application-specific and real-world requirements, such as the need to accelerate the learning process with dedicated/distributed hardware, or to foster energy-sparing requirements of applications based on mobile standalone devices. The ESANN 2014 Byte The Bullet: Learning on Real-World Computing Architectures special session has pooled a compilation of the most recent proposals in this area, by encouraging submissions related to the development and the application of fast, effective, reliable techniques, which consider possibilities, potentialities and constraints of real-world computing architectures as basic cornerstones and motivations
Learning hardware friendly classifiers through algorithmic risk minimization
Conventional Machine Learning (ML) algorithms do not take computational constraints into account when learning models: when the target is an embedded device, the restrictions concern, for example, the limited depth of the arithmetic unit, memory availability, or battery capacity. We propose a new learning framework, Algorithmic Risk Minimization (ARM), which relies on the notion of stability of a learning algorithm and includes computational constraints in the learning process. ARM makes it possible to train resource-sparing models and to efficiently implement the next generation of ML methods for smart embedded systems. Advantages are shown on a case study conducted in the framework of Human Activity Recognition on Smartphones, in which we show that effective and computationally non-intensive models can be trained from data and implemented on the destination devices
Condition Based Maintenance in Railway Transportation Systems Based on Big Data Streaming Analysis
Streaming Data Analysis (SDA) of Big Data Streams (BDS) for Condition Based Maintenance (CBM) in the context of Rail Transportation Systems (RTS) is a state-of-the-art field of research. SDA of BDS is the problem of analyzing, modeling and extracting information from huge amounts of data that continuously come from several sources in real time through computationally aware solutions. Among others, CBM for Rail Transportation is one of the most challenging SDA problems, consisting of the implementation of a predictive maintenance system for evaluating the future status of the monitored assets in order to reduce risks related to failures and to avoid service disruptions. The challenge is to collect and analyze all the data streams that come from the numerous on-board sensors monitoring the assets. This paper deals with the problem of CBM applied to the condition monitoring and predictive maintenance of train axle bearings based on sensor data collection, with the purpose of maximizing their Remaining Useful Life (RUL). In particular, we propose a novel algorithm for CBM based on SDA that takes advantage of Online Support Vector Regression (OL-SVR) for predicting the RUL. The novelty of our proposal is the heuristic approach for optimizing the trade-off between the accuracy of the OL-SVR models and the computational time and resources needed to build them. Results from tests on a real-world dataset show the actual benefits brought by the proposed methodology
Tuning the distribution dependent prior in the PAC-Bayes framework based on empirical data
In this paper we further develop the idea that the PAC-Bayes prior can be defined based on the data-generating distribution. In particular, following Catoni [1], we refine some recent generalisation bounds on the risk of the Gibbs Classifier, when the prior is defined in terms of the data generating distribution, and the posterior is defined in terms of the observed one. Moreover we show that the prior and the posterior distributions can be tuned based on the observed samples without worsening the convergence rate of the bounds and with a marginal impact on their constants
A local Vapnik-Chervonenkis complexity
In this work we define a new localized version of a Vapnik-Chervonenkis (VC) complexity, namely the Local VC-Entropy, and, building on this new complexity, we derive a new generalization bound for binary classifiers. The Local VC-Entropy-based bound improves on Vapnik's original results because it is able to discard those functions that, most likely, will not be selected during the learning phase. The result is achieved by applying the localization principle to the original global complexity measure, in the same spirit as the Local Rademacher Complexity. By exploiting and improving a recently developed geometrical framework, we show that it is also possible to relate the Local VC-Entropy to the Local Rademacher Complexity by finding an admissible range for one given the other. In addition, the Local VC-Entropy allows one to reduce the computational requirements that arise when dealing with the Local Rademacher Complexity in binary classification problems
Learning Hardware-Friendly Classifiers through Algorithmic Stability
Most state-of-the-art Machine-Learning (ML) algorithms do not consider the computational constraints of implementing the learned model on embedded devices. These constraints are, for example, the limited depth of the arithmetic unit, the memory availability, or the battery capacity. We propose a new learning framework, Algorithmic-Risk-Minimization (ARM), which relies on Algorithmic-Stability and includes these constraints inside the learning process itself. ARM makes it possible to train advanced resource-sparing ML models and to efficiently deploy them on smart embedded systems. Finally, we show the advantages of our proposal on a smartphone-based Human Activity Recognition application by comparing it to a conventional ML approach
Random Forests model selection
Random Forests (RF) of tree classifiers are a popular ensemble method for classification. RF have been shown to be effective in many different real-world classification problems and are nowadays considered one of the best learning algorithms in this context. In this paper we discuss the effect of the hyperparameters of the RF on the accuracy of the final model, with particular reference to different theoretically grounded weighting strategies for the trees in the forest. In this way we counter the common misconception that RF is a hyperparameter-free learning algorithm. Results on a series of benchmark datasets show that performing an accurate Model Selection procedure can greatly improve the accuracy of the final RF classifier
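To make the idea of weighting the trees in a forest concrete, the following is a minimal sketch of a weighted majority vote. The abstract does not say which weighting strategies the paper studies, so the exponential weighting used here, the parameter `gamma`, and the function names are all assumptions chosen for illustration. Each tree is assumed to come with an estimate of its error (for example an out-of-bag estimate), and `gamma` acts as a hyperparameter to be tuned during model selection.

```python
import math
from collections import defaultdict


def tree_weights(tree_errors, gamma=1.0):
    """Normalized weights proportional to exp(-gamma * error).

    Illustrative assumption, not the paper's scheme: gamma = 0 gives the
    usual uniform vote, larger gamma favors the more accurate trees.
    """
    raw = [math.exp(-gamma * err) for err in tree_errors]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight among the trees."""
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)


if __name__ == "__main__":
    errors = [0.1, 0.4, 0.4]     # hypothetical per-tree error estimates
    votes = ["a", "b", "b"]      # per-tree predictions for one sample
    print(weighted_vote(votes, tree_weights(errors, gamma=0.0)))
    print(weighted_vote(votes, tree_weights(errors, gamma=10.0)))
```

With a uniform vote the two weaker trees outvote the accurate one, while a large `gamma` lets the accurate tree dominate; choosing `gamma` is therefore part of the model selection problem the abstract describes.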
Tikhonov, Ivanov and Morozov regularization for support vector machine learning
Learning according to the structural risk minimization (SRM) principle can be naturally expressed as an Ivanov regularization problem. Vapnik himself pointed out this connection when deriving an actual learning algorithm from this principle, such as the well-known support vector machine, but quickly suggested resorting to a Tikhonov regularization scheme instead. This was, at that time, the best choice because the corresponding optimization problem is easier to solve and, in any case, under certain hypotheses, the solutions obtained by the two approaches coincide. On the other hand, recent advances in learning theory clearly show that the Ivanov regularization scheme allows more effective control of the learning hypothesis space and, therefore, of the generalization ability of the selected hypothesis. We prove in this paper the equivalence between the Ivanov and Tikhonov approaches and, for the sake of completeness, their connection to Morozov regularization, which has been shown to be useful when an effective estimate of the noise in the data is available. We also show that this equivalence is valid under milder conditions on the loss function than those of Vapnik's original proposal. These results allow us to derive several methods for performing SRM learning according to an Ivanov or Morozov regularization scheme, but using Tikhonov-based solvers, which have been thoroughly studied in the last decades and for which very efficient implementations have been proposed
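For readers unfamiliar with the three schemes named in this abstract, their standard textbook forms can be written as follows. The notation is an assumption made for this sketch and is not taken from the paper: $\hat{L}(f)$ denotes the empirical risk of a hypothesis $f$, $\|f\|$ a norm measuring its complexity, and $\lambda, \rho, \sigma \ge 0$ the respective hyperparameters.

```latex
% Tikhonov: penalized (unconstrained) formulation
\min_{f} \; \hat{L}(f) + \lambda \, \|f\|^{2}

% Ivanov: empirical risk minimized over a hypothesis space of bounded complexity
\min_{f} \; \hat{L}(f) \quad \text{s.t.} \quad \|f\|^{2} \le \rho

% Morozov: complexity minimized subject to a bound on the empirical risk
\min_{f} \; \|f\|^{2} \quad \text{s.t.} \quad \hat{L}(f) \le \sigma
```

In the Ivanov form the hyperparameter $\rho$ directly sets the size of the hypothesis space, which matches the abstract's remark about controlling that space, while in the Morozov form $\sigma$ plays the role of the tolerated noise level.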
