
    Enhancing Fairness in Classification Tasks with Multiple Variables: A Data- and Model-Agnostic Approach

    Nowadays, ensuring that search and recommendation systems are fair and do not discriminate against any population group has become of paramount importance. These systems typically rely on machine learning algorithms that solve classification tasks. Although fairness has been widely addressed in binary classification, fairness in multi-class classification still lacks well-established solutions and needs further investigation. For these reasons, in this paper we present the Debiaser for Multiple Variables, a novel approach able to enhance fairness in both binary and multi-class classification problems. The proposed method is compared, under several conditions, with well-established baselines. We evaluate our method on a heterogeneous dataset and show that it outperforms the established algorithms in the multi-class setting, while maintaining good performance in binary classification. Finally, we discuss some limitations and future improvements.
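The abstract does not describe how the Debiaser for Multiple Variables works internally. As a hedged illustration of how fairness can be quantified in a multi-class setting, the sketch below computes a multi-class statistical-parity gap: for every predicted class, the largest difference between groups in the rate at which that class is predicted. The metric choice and the function name `multiclass_parity_gap` are assumptions made for this example, not definitions from the paper; a group key may be a tuple that combines several sensitive variables.

```python
from collections import Counter, defaultdict


def multiclass_parity_gap(predictions, groups):
    """Worst-case gap, over predicted classes, in per-group prediction rates.

    Assumption: statistical (demographic) parity extended to many classes.
    0.0 means every group receives every class at exactly the same rate.
    `groups` may hold tuples that combine several sensitive variables.
    """
    if len(predictions) != len(groups):
        raise ValueError("predictions and groups must have the same length")
    if not predictions:
        return 0.0
    per_group = defaultdict(Counter)  # group -> counts of predicted classes
    for label, group in zip(predictions, groups):
        per_group[group][label] += 1
    worst = 0.0
    for label in set(predictions):
        # Rate at which each group receives this class (missing class -> 0).
        rates = [counts[label] / sum(counts.values()) for counts in per_group.values()]
        worst = max(worst, max(rates) - min(rates))
    return worst


# Toy data: two sensitive variables (sex, age band) combined into one group key.
preds = ["A", "B", "C", "A", "A", "B"]
groups = [("f", "young")] * 3 + [("m", "old")] * 3
print(round(multiclass_parity_gap(preds, groups), 3))  # prints 0.333
```

Here the first group receives each class one third of the time, while the second receives class A two thirds of the time and class C never, so the worst gap is 1/3.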

    Third workshop on social media for personalization and search (SoMePeAS 2019)

    Social media platforms have become powerful tools to collect users' preferences and get to know them better. Indeed, to build profiles of what users like or dislike, a system need not rely only on explicitly given preferences (e.g., ratings) or on implicitly collected data (e.g., from browsing sessions). In between lie opinions and preferences expressed through likes, textual comments, and posted content. Being able to exploit social media to mine user behavior and extract additional information improves the accuracy of personalization and search technologies and leads to better-targeted services for users. In this workshop, we aim to collect novel ideas in this field and to provide common ground for researchers working in this area.

    Challenges and Solutions to the Student Dropout Prediction Problem in Online Courses

    Online courses and e-degrees, although present since the mid-1990s, have received enormous attention only in the last decade. Moreover, the novel coronavirus disease (COVID-19) outbreak forced many nations (e.g., Italy and the US) to massively shift their education systems to an online environment. Academics are now also looking at the crisis as an opportunity for universities to adopt digital technologies for teaching more broadly. However, they will need to understand how to evaluate and teach effectively in this new scenario. This situation, together with the usefulness of and ubiquitous access to online-course platforms, has led to a vast number of enrolments. Nevertheless, a high enrolment rate usually translates into a significant student dropout (or withdrawal) rate: 40-80% of online students drop out. Student dropout prediction (SDP) consists of modelling and forecasting student behaviour when interacting with e-learning platforms. Dropout is a significant phenomenon with repercussions for online institutions and for the students and professors involved. Early approaches relied on manual analyses to devise retention strategies. Recent research has adopted automated approaches that exploit student activities (hereafter, e-tivities) on e-learning platforms to identify at-risk students. These approaches use machine learning and deep learning techniques to predict students' dropout status. Therefore, coping in real time with shifting trends in how students interact with course platforms has become of paramount importance. In this tutorial, we provide a comprehensive overview of the SDP problem in the literature. We mathematically formalise the different definitions proposed, and we introduce simple and complex predictive methods organised along the following dimensions: student dropout definition, input modelling, underlying machine and deep learning techniques, evaluation measures, datasets, and privacy concerns.
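The tutorial refers to formal dropout definitions and input modelling without spelling them out here. The sketch below illustrates one hypothetical pair of choices: a student is labelled a dropout if they record no e-tivity in the last `inactivity_window` weeks of the course, and the model input is the number of e-tivities per week over the weeks observed so far. Both definitions and the function names are assumptions for illustration; the literature uses several alternative definitions.

```python
def label_dropout(active_weeks, course_weeks, inactivity_window):
    """Hypothetical label: True if no e-tivity falls in the final
    `inactivity_window` weeks of a course lasting `course_weeks` weeks.
    Week indices are 1-based."""
    if not 1 <= inactivity_window <= course_weeks:
        raise ValueError("inactivity_window must lie in [1, course_weeks]")
    first_checked = course_weeks - inactivity_window + 1
    return not any(first_checked <= week <= course_weeks for week in active_weeks)


def weekly_counts(event_weeks, observed_weeks):
    """Hypothetical input: e-tivity counts for weeks 1..observed_weeks.
    Events after the observation horizon are ignored, which mimics
    predicting dropout early in the course."""
    counts = [0] * observed_weeks
    for week in event_weeks:
        if 1 <= week <= observed_weeks:
            counts[week - 1] += 1
    return counts


# A student active only in the first three weeks of a 10-week course.
events = [1, 1, 2, 3, 3, 3]
print(weekly_counts(events, observed_weeks=4))  # prints [2, 1, 3, 0]
print(label_dropout(events, course_weeks=10, inactivity_window=3))  # prints True
```

Separating the label from the features makes it easy to swap in a different dropout definition while keeping the same input modelling.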

    Twixonomy visualization interface: How to wander around user preferences

    User interfaces have become essential tools for users to interact with recommender systems. In the two most common settings, the user interface either helps collect users' preferences or explains the generated recommendations. In this paper, we present the Twixonomy Visualization Interface, a tool that allows users both to explore their preferences and to discover new ones. Preferences are represented by a DAG of Wikipedia categories connected to the initial (primitive) preferences expressed, implicitly or explicitly, by the user. Our tool can be seen as a complement to a recommender system: by exploring the DAG, users can both analyse the connections between their preferred items and other semantically related items or categories, and understand the motivations behind new, serendipitous recommendations.
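The abstract does not specify how the interface traverses the Wikipedia category DAG. A minimal sketch, assuming the DAG is stored as a mapping from each node to its parent categories: a breadth-first search collects every category above a preferred item, and the intersection of two items' ancestor sets gives the categories that connect them, the kind of link a user could inspect to see why two items are related. The toy DAG is hand-made for illustration and is not taken from Wikipedia; the function names are hypothetical.

```python
from collections import deque


def ancestors(dag, start):
    """All categories reachable from `start` by following child -> parent edges (BFS)."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for parent in dag.get(node, ()):
            if parent not in seen:  # a DAG node can be reached by several paths
                seen.add(parent)
                queue.append(parent)
    return seen


def shared_categories(dag, item_a, item_b):
    """Categories that semantically connect two preferred items."""
    return ancestors(dag, item_a) & ancestors(dag, item_b)


# Hypothetical toy DAG: each key maps to its parent categories.
toy_dag = {
    "Item A": ["Jazz trumpeters"],
    "Item B": ["Jazz trumpeters", "Cool jazz musicians"],
    "Jazz trumpeters": ["Jazz musicians"],
    "Cool jazz musicians": ["Jazz musicians"],
    "Jazz musicians": ["Musicians"],
}
print(sorted(shared_categories(toy_dag, "Item A", "Item B")))
# prints ['Jazz musicians', 'Jazz trumpeters', 'Musicians']
```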

    Graph-based selective outlier ensembles

    An ensemble technique is characterized by the mechanism that generates the components and by the mechanism that combines them. A common way to reach consensus is to let each component participate equally in the aggregation process. A problem with this approach is that poor components are likely to degrade the quality of the consensus result. To address this issue, the literature has explored selective classifier and cluster ensembles, where only a subset of the components contributes to the computation of the consensus. Among ensemble methods, outlier ensembles are the least studied, and only recently has the selection problem for outlier ensembles been discussed. In this work, we define a new graph-based class of ranking selection methods. A method in this class is characterized by two main steps: (1) mapping the rankings onto a graph structure; and (2) mining the resulting graph to identify a subset of rankings. We then define a specific instance of this class, which maps the problem of selecting ensemble components onto a graph-mining problem. We conducted an extensive evaluation on a variety of heterogeneous datasets and methods. Our empirical results show that our approach outperforms state-of-the-art selective outlier ensemble techniques.
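The two steps of the graph-based class can be illustrated with one hypothetical instance; the abstract does not say which graph or mining criterion the authors use. In this sketch, (1) each component's outlier ranking becomes a node, with edges weighted by the Spearman correlation between rankings (negative correlations clipped to zero), and (2) the subset is mined with greedy densest-subgraph peeling in the style of Charikar's 2-approximation: repeatedly drop the node with the smallest weighted degree and keep the subset with the highest total edge weight per node. The selected rankings are then combined by averaging ranks. All function names are illustrative, and tied scores are not handled.

```python
import math


def ranks(scores):
    """Rank positions (0 = least outlying). Assumes no tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0] * len(scores)
    for position, index in enumerate(order):
        result[index] = position
    return result


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var = sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var)


def select_components(score_lists):
    """Step 1: rankings -> weighted graph. Step 2: greedy densest-subgraph peeling."""
    k = len(score_lists)
    weight = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            w = max(0.0, spearman(score_lists[i], score_lists[j]))
            weight[i][j] = weight[j][i] = w
    alive = set(range(k))
    best, best_density = set(alive), -1.0
    while alive:
        total = sum(weight[i][j] for i in alive for j in alive if i < j)
        density = total / len(alive)
        if density > best_density:
            best, best_density = set(alive), density
        # Peel the node with the smallest weighted degree (self-weight is 0).
        weakest = min(sorted(alive), key=lambda i: sum(weight[i][j] for j in alive))
        alive.remove(weakest)
    return best


def consensus(score_lists, selected):
    """Average rank over the selected components (higher = more outlying)."""
    rank_lists = [ranks(score_lists[c]) for c in sorted(selected)]
    n = len(score_lists[0])
    return [sum(r[p] for r in rank_lists) / len(rank_lists) for p in range(n)]


components = [
    [0.9, 0.1, 0.2, 0.8, 0.3],
    [0.8, 0.2, 0.1, 0.9, 0.3],
    [0.95, 0.15, 0.25, 0.7, 0.35],
    [0.1, 0.9, 0.8, 0.2, 0.7],  # a poor component that disagrees with the rest
]
chosen = select_components(components)
print(sorted(chosen))  # prints [0, 1, 2]
print(consensus(components, chosen))
```

The disagreeing fourth component ends up with zero weighted degree and is peeled first, so it does not contaminate the consensus.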

    The forget-set identification problem

    Machine Unlearning (MU) is the problem of removing the influence of a user's unwanted evidence from a trained machine-learning model. MU is typically formulated so that the input unwanted evidence corresponds to a subset of the data used to train the model upstream, commonly referred to as the "forget set". However, this requirement is often difficult to satisfy in real-world scenarios, as users may be unaware of the peculiarities of the training set or may simply not have access to it. In a more realistic setting, users provide their unwanted evidence in a form that is more abstract than, or otherwise different from, a precise subset of the training data. In such cases, executing MU methods requires an essential and challenging preliminary step which, to the best of our knowledge, has not been addressed so far: identifying the forget set from the user's unwanted evidence. In this paper, we fill this gap in the MU literature and introduce the Forget-Set Identification (ForSId) problem: given a trained machine-learning model, an "unwanted set" of samples (evidence to unlearn), and a "wanted set" of samples (evidence to retain), identify the forget set as a subset of the training set such that the similarity between the predictions of the original model and those of the model retrained on the training data that remain after removing the forget set is (i) low on the unwanted set, indicating that the unwanted samples have been effectively unlearned, and (ii) high on the wanted set, ensuring that the model keeps its original performance on the data to be retained. We define ForSId as an optimization problem, prove its NP-hardness, and devise an algorithm based on a theoretical connection to Red-Blue Set Cover. ForSId is a novel problem, complementary to MU: it serves as a foundational step to be performed before executing MU methods, extending the applicability of MU to all settings where the user's unlearning evidence does not correspond to (or is too hard to express directly as) a forget set. We conduct extensive experiments on the exact unlearning task (the most reliable one) across several real-world datasets and settings, against nontrivial baselines. The results demonstrate the high performance of the proposed algorithm and its clear superiority over the baselines.
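The abstract states only that the algorithm rests on a theoretical connection to Red-Blue Set Cover, where one picks a family of sets that covers every blue element while covering as few red elements as possible. A hedged sketch of how ForSId might map onto it, under assumptions that are ours: each training sample is associated with the wanted and unwanted samples whose predictions would change if it were removed (given here directly as hypothetical "influence sets"; in practice these would have to be estimated), unwanted samples play the role of blue elements, and wanted samples the role of red ones. The greedy rule (most newly covered blue elements per newly covered red element) is a simple heuristic with no approximation guarantee, not the paper's algorithm.

```python
def greedy_red_blue_cover(sets, blue, red):
    """Greedy heuristic for Red-Blue Set Cover.

    sets: dict mapping a set name (here, a training sample) to the elements it covers.
    Returns chosen names, in selection order, covering every coverable blue
    element while trying to touch few red elements.
    """
    red = set(red)
    uncovered_blue = set(blue)
    covered_red = set()
    chosen = []
    while uncovered_blue:
        best, best_score = None, 0.0
        for name, elems in sets.items():
            if name in chosen:
                continue
            new_blue = len(elems & uncovered_blue)
            if new_blue == 0:
                continue
            new_red = len((elems & red) - covered_red)
            score = new_blue / (1 + new_red)  # reward blue gain, penalise red cost
            if score > best_score:
                best, best_score = name, score
        if best is None:
            break  # the remaining blue elements cannot be covered by any set
        chosen.append(best)
        uncovered_blue -= sets[best]
        covered_red |= sets[best] & red
    return chosen


# Hypothetical influence sets: removing training sample t_i affects these samples.
influence = {
    "t1": {"u1", "w1"},
    "t2": {"u1", "u2"},
    "t3": {"u2", "w1", "w2"},
    "t4": {"w2"},
}
unwanted = {"u1", "u2"}  # blue: predictions should change
wanted = {"w1", "w2"}    # red: predictions should be preserved
print(greedy_red_blue_cover(influence, unwanted, wanted))  # prints ['t2']
```

Sample `t2` alone affects both unwanted samples and no wanted ones, so it is selected as the forget set under these toy assumptions.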

    Predicting disease genes for complex diseases using random watcher-walker

    In this paper, we propose an extended version of random walks, named Random Watcher-Walker (RW2), to predict disease-gene relations on the Human Interactome network. RW2 learns rich feature representations of disease genes (or gene products) by jointly considering the functional and connectivity patterns surrounding proteins. Our method compares favourably with the best-known system for disease gene prediction and with other state-of-the-art graph-based methods. We perform a sensitivity analysis and apply perturbations to verify robustness. In contrast to previous studies, our results show that connectivity alone is not sufficient to classify disease-related genes.
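RW2 extends plain random walks, and the abstract does not give its transition rules or how functional patterns are combined with connectivity. For reference, the sketch below shows only the baseline it builds on: truncated uniform random walks over a protein-interaction graph stored as an adjacency list, the kind of walk corpus that graph-embedding methods such as DeepWalk feed into a representation learner. The toy graph, protein identifiers, and function name are hypothetical, and nothing here reproduces RW2's "watcher" component.

```python
import random


def uniform_random_walks(adjacency, walks_per_node, walk_length, seed=0):
    """Truncated uniform random walks, `walks_per_node` starting at each node.

    adjacency: dict node -> list of neighbouring nodes (undirected graph,
    so every edge appears in both lists). A walk stops early at a node
    with no neighbours.
    """
    rng = random.Random(seed)  # seeded for reproducible walks
    walks = []
    for _ in range(walks_per_node):
        for start in adjacency:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adjacency[walk[-1]]
                if not neighbours:
                    break
                walk.append(rng.choice(neighbours))  # each neighbour equally likely
            walks.append(walk)
    return walks


# Hypothetical toy interactome with edges P1-P2, P1-P3, P3-P4.
toy_graph = {
    "P1": ["P2", "P3"],
    "P2": ["P1"],
    "P3": ["P1", "P4"],
    "P4": ["P3"],
}
for walk in uniform_random_walks(toy_graph, walks_per_node=1, walk_length=4):
    print(" -> ".join(walk))
```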