    Vers une automatisation efficace et explicable des processus d’apprentissage automatique : Application à l’Industrie 4.0

    Machine learning (ML) has penetrated all aspects of modern life, bringing more convenience and satisfaction for variables of interest. However, building such solutions is a time-consuming and challenging process that requires highly technical expertise. The spread of ML draws many more people, not necessarily experts, into performing analytics tasks, while the selection and parametrization of ML models still require tedious episodes of trial and error. Additionally, domain experts often lack the expertise to apply advanced analytics. Consequently, they need frequent consultations with data scientists. These collaborations often incur costs in the form of undesired delays and can create risks such as human-resource bottlenecks. As tasks become more complex, more support solutions are needed to make ML usable by non-experts. To that end, Automated ML (AutoML) is a data-mining formalism that aims to reduce human effort and improve the development cycle through automation. The field of AutoML aims to make these decisions in a data-driven, objective, and automated way. AutoML thereby makes ML techniques accessible to domain scientists who are interested in applying advanced analytics but lack the required expertise. This can be seen as a democratization of ML. AutoML is usually treated as an algorithm selection and parametrization problem. In this regard, existing approaches include Bayesian optimization, evolutionary algorithms, and reinforcement learning. These approaches have focused on assisting the user by automating part or all of the data analysis process, but without considering the impact of that automation on the analysis. The goal has generally centered on performance factors, leaving aside other important and even crucial aspects such as computational complexity, confidence, and transparency.
In contrast, this thesis aims to develop alternative methods that assist in building appropriate modeling techniques while providing the rationale for the selected models. In particular, we consider this important demand for intelligent assistance as a meta-analysis process, and we make progress towards addressing two challenges in AutoML research. First, to overcome the computational complexity problem, we studied a formulation of AutoML as a recommendation problem and proposed a new conceptualization of a Meta-Learning (MtL)-based expert system capable of recommending optimal ML pipelines for a given task. Second, we investigated the automatic explainability of the AutoML process to address the problem of acceptance of, and trust in, such black-box support systems. To this end, we designed and implemented a framework architecture that leverages ideas from MtL to learn the relationship between a new set of dataset meta-data and mining algorithms. This eventually enables recommending ML pipelines according to their potential impact on the analysis. To guide the development of our work, we chose Industry 4.0 as the main field of application because of all the constraints it offers. Finally, in this doctoral thesis, we focus on user assistance in the algorithm selection and tuning step. We devise an architecture and build a tool, AMLBID, that supports users with the aim of improving the analysis and decreasing the time spent on algorithm selection and parametrization.
It is a tool that, for the first time, does not aim only at providing data analysis support; it is also oriented towards strengthening trust in such powerful support systems by automatically providing a set of explanation levels for inspecting the provided results.
The industry of the future introduces new concepts, processes, and practices that lead to profound changes in the management of the associated information systems. A crucial issue is the use of the large volume of data, notably the data produced by the various data acquisition devices (Cyber Physical Systems, etc.), to extract knowledge for controlling business processes through an information system that is scalable, responsive, and adapted to the specifics of Industry 4.0. Artificial intelligence, and machine learning in particular, provides the algorithms, methods, and tools for extracting knowledge and models from data representing a company's activity and its environment, and for bringing more automation to the underlying processes. However, many companies lack the human resources to deploy machine learning solutions effectively. This is notably because building such solutions is a long and difficult process that requires highly technical, cross-sector expertise, which is a scarce resource. We therefore focus on this need for assistance with data analysis, which is beginning to receive attention from scientific communities and has given rise to the field known as automated machine learning. Automated machine learning has become a fast-growing field that aims to make the application of machine learning methods as free of human intervention as possible.
In this respect, existing approaches often turn out to be similar and immature. These approaches concentrate on assisting the user by automating part or all of the data analysis process, but without regard for the impact of that automation on the analysis. The goal has generally centered on performance factors, leaving aside other important, even crucial, aspects such as computational complexity, trust, and transparency. This observation led us to direct our research towards the field of Meta-Learning (MtL) and to develop alternative methods that help build appropriate modeling techniques while providing the rationale for the selected ML models. In particular, we consider this important demand for intelligent assistance as a meta-analysis process, and we make progress towards solving two challenges in AutoML research. First, to mitigate the computational complexity problem, we studied a formulation of AutoML as a recommendation problem, then proposed a new conceptualization of an MtL-based expert system capable of recommending optimal ML pipelines for a given task. Second, we addressed the explainability of the AutoML decision-support process, to take into account the problem of acceptance of, and trust in, these systems, which are generally seen as black boxes
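
The idea of treating AutoML as a recommendation problem can be illustrated with a minimal sketch. It is not the AMLBID implementation: it swaps in a plain nearest-neighbour lookup over dataset meta-features, a common meta-learning baseline, and the knowledge base, meta-feature names, and pipeline labels are all hypothetical.

```python
import math

# Hypothetical knowledge base: meta-features of previously analysed datasets
# and the pipeline that performed best on each. All values are illustrative.
KNOWLEDGE_BASE = [
    {"meta": {"n_rows": 1000, "n_features": 20, "class_balance": 0.5},
     "pipeline": "random_forest"},
    {"meta": {"n_rows": 100000, "n_features": 300, "class_balance": 0.1},
     "pipeline": "gradient_boosting"},
    {"meta": {"n_rows": 200, "n_features": 5, "class_balance": 0.4},
     "pipeline": "logistic_regression"},
]


def to_vector(meta):
    """Map meta-features to a numeric vector (sizes on a log10 scale)."""
    return [
        math.log10(meta["n_rows"]),
        math.log10(meta["n_features"]),
        meta["class_balance"],
    ]


def recommend(meta, k=1):
    """Return the pipelines of the k past datasets closest to `meta`."""
    query = to_vector(meta)
    ranked = sorted(
        KNOWLEDGE_BASE,
        key=lambda entry: math.dist(query, to_vector(entry["meta"])),
    )
    return [entry["pipeline"] for entry in ranked[:k]]


new_dataset = {"n_rows": 1200, "n_features": 18, "class_balance": 0.48}
top_two = recommend(new_dataset, k=2)
```

Because the recommendation is a lookup rather than a search, no candidate pipeline has to be trained at recommendation time, which is one way such a formulation can reduce computational cost.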

    Leveraging the Automated Machine Learning for Arabic Opinion Mining: A Preliminary Study on AutoML Tools and Comparison to Human Performance

    Despite the broad range of Machine Learning (ML) algorithms, there are no clear guidelines on how to identify the optimal algorithm and corresponding hyperparameter configurations for a given Opinion Mining (OM) problem. In ML, this is known as the Algorithm Selection Problem (ASP). Although Automatic Algorithm Selection, or AutoML, has proven successful in many areas of ASP, it has hardly been explored in OM. This paper explores the benefits of using AutoML in this field. To this end, it examines to what extent AutoML can compete with ad hoc methods (manually selecting and tuning ML pipelines) on Arabic opinion mining modeled from a supervised learning perspective. We compare four state-of-the-art AutoML tools against human performance on 10 popular datasets. Experimental results show that AutoML technology can be considered a powerful approach to support ML algorithm selection in opinion mining

    Going Beyond Counting First Authors in Author Co-citation Analysis

    The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed
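
The two counting schemes compared in this abstract can be sketched as follows. This is an illustrative reading, not the study's code: the toy citing papers and author names are hypothetical, and the sketch assumes that a pair of authors is counted at most once per citing paper.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: each citing paper is a list of cited works, and each
# cited work is given as its ordered author list.
CITING_PAPERS = [
    [["Smith", "Lee"], ["Garcia", "Chen", "Smith"]],
    [["Smith", "Lee"], ["Chen"]],
]


def cocitation_counts(citing_papers, max_authors=1):
    """Count author pairs that are cited together by the same paper.

    max_authors=1 is traditional first-author counting; max_authors=5
    considers the first five authors of every cited work.
    """
    counts = Counter()
    for cited_works in citing_papers:
        authors = set()
        for work in cited_works:
            authors.update(work[:max_authors])
        # Assumption: each pair counts once per citing paper.
        for pair in combinations(sorted(authors), 2):
            counts[pair] += 1
    return counts


first_author = cocitation_counts(CITING_PAPERS, max_authors=1)
first_five = cocitation_counts(CITING_PAPERS, max_authors=5)
```

In this toy data, Lee never appears as a first author, so Lee is invisible to first-author counting but is paired with Smith and Chen once the first five authors are considered.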

    Variations on the Author

    “Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourses; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship

    Appropriate Similarity Measures for Author Cocitation Analysis

    We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.
    Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
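
A small sketch can make the Pearson-versus-cosine contrast concrete. The abstract does not say why the Pearson correlation is unsatisfactory, so the property shown here is an assumption about one commonly noted concern: padding two profiles with authors that neither is cocited with changes the Pearson correlation but leaves the cosine untouched. The profiles are hypothetical, and the two other measures mentioned in the abstract are not named there, so they are not shown.

```python
import math


def cosine(x, y):
    """Cosine similarity between two cocitation profiles."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson(x, y):
    """Pearson correlation between two cocitation profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)


# Hypothetical cocitation counts of two authors with three other authors.
profile_a = [4, 0, 2]
profile_b = [0, 3, 2]

# The same profiles after adding five authors cocited with neither of them.
padded_a = profile_a + [0] * 5
padded_b = profile_b + [0] * 5

cosine_before, cosine_after = cosine(profile_a, profile_b), cosine(padded_a, padded_b)
pearson_before, pearson_after = pearson(profile_a, profile_b), pearson(padded_a, padded_b)
```

With these numbers the Pearson correlation moves from negative to positive once the zeros are added, while the cosine is identical in both cases.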

    Unlocking the Black Box: Towards Interactive Explainable Automated Machine Learning

    Automated machine learning (AutoML) has transformed the process of selecting optimal machine learning (ML) models by autonomously searching for the most appropriate ones and fine-tuning associated hyperparameters. This eliminates the burdensome task of trial-and-error selection and parametrization of ML algorithms. Nonetheless, the lack of transparency and explainability poses a significant challenge when using AutoML, as it hampers user trust in the system’s recommendations. Consequently, users often allocate more resources to the search process, resulting in reduced efficiency of the AutoML systems. To address this challenge, we propose an interactive and explainable AutoML framework that enables users to understand the reasoning behind the recommendations and diagnose any limitations of the suggested models using various explainable AI methods. Additionally, our framework provides the possibility of automated performance refinement. To operationalize the framework, we introduce AMLExplainer, an XAI system for interactive and interpretable AutoML that visualizes and performs all stages of the proposed pipeline(s) within the widely used Bootstrap Dash environment

    Automated machine learning hyperparameters tuning through meta-guided Bayesian optimization

    The selection of one or more optimized Machine Learning (ML) algorithms and the configuration of significant hyperparameters are among the crucial but challenging tasks in advanced data analytics using ML methodologies. They are nonetheless essential for applying ML-based solutions to real-world problems. In this regard, Bayesian Optimization (BO) is a popular method for optimizing black-box functions. Yet it falls short on large-scale problems because it fails to leverage knowledge from historical applications. The major challenge here is the waste of function evaluations on bad design choices of ML hyperparameters. To address this issue, we propose to integrate Bayesian Optimization with Meta-Guidance. Meta-Guided Bayesian Optimization provides a means to use knowledge from previous optimization cycles on similar tasks. This knowledge serves as a prerequisite for deciding which parts of the input space to evaluate next. Specifically, we guide BO with a functional ANOVA of configurations suggested by a meta-learning process. In this paper, we demonstrate, using a large collection of hyperparameter optimization benchmarks, that the proposed Meta-Guided Optimization approach is about 3 times faster than vanilla BO. It thereby achieves new state-of-the-art performance, as shown by experiments on 9 classification datasets