    Complete search for feature selection in decision trees

    The search space for the feature selection problem in decision tree learning is the lattice of subsets of the available features. We design an exact enumeration procedure for the feature subsets that lead to all and only the distinct decision trees built by a greedy top-down decision tree induction algorithm. In the worst case, the procedure stores a number of trees linear in the number of features. By further pruning the search space, we design a complete procedure for finding δ-acceptable feature subsets, i.e., subsets whose estimated error exceeds the best estimated error over any feature subset by at most δ. Feature subsets with the best estimated error are called best feature subsets. Our results apply to any error estimator function, but experiments are mainly conducted under the wrapper model, in which the misclassification error over a search set is used as the estimator. The approach is also adapted to design a computationally optimized version of the sequential backward elimination heuristic, extending its applicability to high-dimensional datasets. The procedures of this paper are implemented in a multi-core, data-parallel C++ system. We experimentally investigate the properties and limitations of the procedures on a collection of 20 benchmark datasets, showing that oversearching increases both overfitting and instability.
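
    The two search strategies named in this abstract can be illustrated on a toy problem. The sketch below is a brute-force baseline and not the authors' procedure. It enumerates every non-empty subset of the feature lattice (2^n - 1 subsets, which is the cost the paper's tree-based pruning aims to avoid) and keeps the δ-acceptable ones. It then runs a plain sequential backward elimination. The error estimator `err` is a hypothetical callback standing in for, e.g., the wrapper-model misclassification error on a search set. `toy_err` is invented data for illustration only.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List, Tuple

Subset = FrozenSet[int]
ErrorEstimator = Callable[[Subset], float]  # hypothetical: subset -> estimated error


def delta_acceptable(n_features: int, err: ErrorEstimator,
                     delta: float) -> Tuple[float, List[Subset]]:
    """Brute force over the subset lattice: return the best estimated error
    and every non-empty subset whose error is within `delta` of it."""
    scored = []
    for k in range(1, n_features + 1):
        for combo in combinations(range(n_features), k):
            subset = frozenset(combo)
            scored.append((err(subset), subset))
    best = min(e for e, _ in scored)
    return best, [s for e, s in scored if e <= best + delta]


def backward_elimination(n_features: int, err: ErrorEstimator) -> Tuple[Subset, float]:
    """Greedy sequential backward elimination: start from all features, drop
    the feature whose removal gives the lowest error, and return the best
    subset seen along the path (no completeness guarantee)."""
    current = frozenset(range(n_features))
    best_set, best_err = current, err(current)
    while len(current) > 1:
        candidates = [current - {f} for f in sorted(current)]
        errors = [err(c) for c in candidates]
        i = min(range(len(candidates)), key=errors.__getitem__)
        current = candidates[i]
        if errors[i] < best_err:
            best_set, best_err = current, errors[i]
    return best_set, best_err


# Toy estimator (invented): features 0 and 2 are relevant; omitting one costs
# 0.2, and each irrelevant feature kept costs 0.05.
RELEVANT = frozenset({0, 2})


def toy_err(s: Subset) -> float:
    return 0.2 * len(RELEVANT - s) + 0.05 * len(s - RELEVANT)


best, acceptable = delta_acceptable(4, toy_err, delta=0.05)
# best == 0.0; acceptable holds {0, 2}, {0, 1, 2} and {0, 2, 3}
subset, error = backward_elimination(4, toy_err)
# subset == {0, 2}, error == 0.0
```

    Backward elimination evaluates on the order of n^2/2 subsets instead of 2^n. That is why it stays usable on larger feature sets, although unlike the exhaustive search it can miss the best subset.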

    A Model-Agnostic Heuristics for Selective Classification

    Selective classification (also known as classification with a reject option) conservatively extends a classifier with a selection function that determines whether or not a prediction should be accepted (i.e., trusted, used, deployed). This is a highly relevant issue in socially sensitive tasks, such as credit scoring. State-of-the-art approaches rely on Deep Neural Networks (DNNs) that jointly train the classifier and the selection function. These approaches are model-specific and computationally expensive. We propose a model-agnostic approach: it works with any base probabilistic binary classification algorithm, and it scales to large tabular datasets whenever the base classifier does. The proposed algorithm, called SCROSS, exploits a cross-fitting strategy and theoretical results for quantile estimation to build the selection function. Experiments on real-world data show that SCROSS improves over existing methods.
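
    Here is a minimal sketch of a model-agnostic selective classifier that combines cross-fitting with an empirical quantile. It is a generic confidence-thresholding scheme and not SCROSS itself. The confidence score max(p, 1 - p), the `fit` interface and the toy base learner `toy_fit` are all assumptions made for illustration. Each training point is scored by a model that never saw it. The acceptance threshold is then the empirical quantile of those out-of-fold confidences that leaves roughly the target share of points accepted.

```python
import math
import random
from typing import Callable, List, Sequence

Row = Sequence[float]
Predictor = Callable[[Row], float]                    # maps a row to P(y = 1 | x)
Fitter = Callable[[List[Row], List[int]], Predictor]  # hypothetical base-learner interface


def cross_fit_scores(X: List[Row], y: List[int], fit: Fitter,
                     k: int = 5, seed: int = 0) -> List[float]:
    """Out-of-fold probabilities: each point is scored by a model trained on the other folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = [0.0] * len(y)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        for i in held_out:
            scores[i] = predict(X[i])
    return scores


def confidence(p: float) -> float:
    return max(p, 1.0 - p)


def fit_threshold(scores: List[float], target_coverage: float) -> float:
    """Empirical quantile of the confidences: accept about `target_coverage` of points."""
    conf = sorted(confidence(p) for p in scores)
    n_accept = max(1, math.ceil(target_coverage * len(conf)))
    return conf[len(conf) - n_accept]


def accept(p: float, theta: float) -> bool:
    """Selection function: keep the prediction only if it is confident enough."""
    return confidence(p) >= theta


def toy_fit(X: List[Row], y: List[int]) -> Predictor:
    """Hypothetical base learner: a logistic curve centred between the class means of feature 0."""
    n_pos = sum(y)
    m1 = sum(r[0] for r, t in zip(X, y) if t == 1) / max(1, n_pos)
    m0 = sum(r[0] for r, t in zip(X, y) if t == 0) / max(1, len(y) - n_pos)
    mid = (m0 + m1) / 2
    return lambda row: 1.0 / (1.0 + math.exp(-(row[0] - mid)))
```

    In use, one would typically fit the base classifier on all the data and accept a new prediction p only when `accept(p, theta)` holds. Cross-fitting keeps the threshold from being estimated on the same points the model was trained on.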

    On the Stability of Interpretable Models

    Interpretable classification models are built to provide a comprehensible description of the decision logic to an external oversight agent. When considered in isolation, decision trees, sets of classification rules, and linear models are widely recognized as human-interpretable. However, such models are generated as part of a larger analytical process. Bias in data collection and preparation, or in model construction, may severely affect the accountability of the design process. We conduct an experimental study of the stability of interpretable models with respect to feature selection, instance selection, and model selection. Our conclusions should raise the scientific community's awareness of the need for a stability impact assessment of interpretable models.
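
    One common way to quantify the kind of stability studied here is to repeat a selection step on perturbed data (for instance, bootstrap resamples) and measure how much its outputs agree. The sketch below assumes those outputs are sets, such as selected features or selected instances, and scores agreement with the mean pairwise Jaccard similarity. This measure is an illustrative assumption, since the abstract does not say which stability index the study uses. The feature names are invented.

```python
from itertools import combinations
from typing import AbstractSet, Hashable, Iterable


def jaccard(a: AbstractSet[Hashable], b: AbstractSet[Hashable]) -> float:
    """len(a & b) / len(a | b), with two empty sets treated as identical."""
    union = a | b
    return 1.0 if not union else len(a & b) / len(union)


def stability(selections: Iterable[Iterable[Hashable]]) -> float:
    """Mean pairwise Jaccard similarity: 1.0 means every run selected the same
    items; values near 0 mean the runs barely overlap."""
    sets = [frozenset(s) for s in selections]
    if len(sets) < 2:
        raise ValueError("need at least two selections to compare")
    pairs = list(combinations(sets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Feature subsets selected on three hypothetical bootstrap resamples.
runs = [{"age", "income"}, {"income", "debt"}, {"age", "income"}]
score = stability(runs)
```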

    AUC-based Selective Classification

    Selective classification (or classification with a reject option) pairs a classifier with a selection function that determines whether or not a prediction should be accepted. This framework trades off coverage (the probability of accepting a prediction) against predictive performance, typically measured by distributive loss functions. In many application scenarios, such as credit scoring, performance is instead measured by ranking metrics, such as the Area Under the ROC Curve (AUC). We propose a model-agnostic approach to associate a selection function with a given probabilistic binary classifier. The approach specifically targets AUC optimization. We provide both theoretical justifications and a novel algorithm, called AUCROSS, to achieve this goal. Experiments show that our method succeeds in trading off coverage for AUC, improving over existing selective classification methods that target accuracy.
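
    Measuring the coverage/AUC trade-off needs only two things: a rank-based AUC and a way to evaluate it on the accepted subset. In the minimal sketch below, `auc` is the standard Mann-Whitney formulation, in which ties count one half. The rejection rule `central_band_rejection` rejects the middle share of points when sorted by score. It is a hypothetical stand-in chosen for illustration. It is not the AUCROSS selection function and is not claimed to improve AUC in general.

```python
from typing import List, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mann-Whitney AUC: share of (positive, negative) pairs ranked correctly, ties = 0.5."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both classes")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def central_band_rejection(scores: Sequence[float], coverage: float) -> List[bool]:
    """Hypothetical rule: reject the middle (1 - coverage) share of score-sorted points."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    n_reject = round((1.0 - coverage) * n)
    start = (n - n_reject) // 2
    rejected = set(order[start:start + n_reject])
    return [i not in rejected for i in range(n)]


def selective_auc(scores: Sequence[float], labels: Sequence[int],
                  accepted: Sequence[bool]) -> Tuple[float, float]:
    """Return (AUC on accepted points, empirical coverage)."""
    kept = [i for i, a in enumerate(accepted) if a]
    coverage = len(kept) / len(scores)
    return auc([scores[i] for i in kept], [labels[i] for i in kept]), coverage
```

    Sweeping `coverage` over a grid and collecting the resulting (AUC, coverage) pairs is a simple way to inspect how any candidate selection function trades coverage for AUC.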

    A Graduate Program in Business Informatics: Experiences at the University of Pisa

    At the University of Pisa, a graduate program was started in 2002 to train professionals with interdisciplinary skills in both informatics and business, meeting the increasing demand from companies that compete using analytical methods. The graduate program focused on Business Intelligence techniques to support decision making. This paper presents the structure of the graduate program, the results achieved in its first six years, and how it was redesigned to satisfy the requirements of the new ministerial law regarding the curricula of Italian universities.

    Re-defining Parkinson's disease

    Analyzing non-motor symptoms in Parkinson's disease (PD) leads to a critical redefinition and update of the disorder itself. The present Editorial brings together epidemiological and clinical studies on PD patients with experimental findings to provide a novel definition of PD based on clinical, neuroanatomical, and neurobiological findings. In fact, the plethora of symptoms described in PD patients is due to specific anatomical alterations that cluster into specific disease phenotypes. These PD phenotypes differ in disease onset and progression, disease severity, and the specific cluster of non-motor disturbances. Despite this variety of PD phenotypes, it is now well established that in almost all PD subgroups (except autosomal recessive disorders, exemplified by Parkin disease) a core anatomical definition exists, involving a variety of brainstem monoamine nuclei. This variety of PD pathologies can thus be defined as monoamine brainstem disorder (MBD).

    Preliminary results from monitoring diurnal Lepidoptera in a Site of Community Importance for different applied purposes

    During the 20th century, and especially over the last 50 years, a decline has been reported in the number of diurnal Lepidoptera (butterfly) species present in Europe, together with changes in butterfly communities. This decrease has been attributed to several factors: habitat fragmentation and loss, and the intensification of agriculture and its associated practices. The introduction of new agricultural products, such as genetically modified plants or products based on entomopathogenic viruses, has further emphasized the need to monitor non-target Lepidoptera (mainly diurnal species), both as a baseline dataset and for post-monitoring plans. To this end, 4 areas representative of four different habitat types were selected: olive grove, mixed woodland, Ampelodesmos mauritanicus grassland (ampelodesmeto), and Mediterranean maquis. For each habitat, 4 transects were chosen for monitoring diurnal Lepidoptera. The sample areas are located in the state forest of San Martino delle Scale (Palermo), within the Site of Community Importance "Raffo Rosso, Monte Cuccio e Vallone Sagana". Each transect is 50 m long, and walking it takes 3 to 5 minutes, depending on the area. The transect width was 2.5 m on each side (5 m in total). Censuses were carried out every 15 days from May to September. The faunal data on abundance and flight phenology will be related to the phenology of the most widespread crops, or of those that may potentially be adopted, and to treatment periods (distances, product drift, etc.). The aim is to provide possible indications of the potential risk to the different Lepidoptera species.