
    Meta Transfer Learning for Early Success Prediction in MOOCs

    Despite the increasing popularity of massive open online courses (MOOCs), many suffer from high dropout and low success rates. Early prediction of student success for targeted intervention is therefore essential to ensure no student is left behind in a course. There exists a large body of research in success prediction for MOOCs, focusing mainly on training models from scratch for individual courses. This setting is impractical in early success prediction as the performance of a student is only known at the end of the course. In this paper, we aim to create early success prediction models that can be transferred between MOOCs from different domains and topics. To do so, we present three novel strategies for transfer: 1) pre-training a model on a large set of diverse courses, 2) leveraging the pre-trained model by including meta information about courses, and 3) fine-tuning the model on previous course iterations. Our experiments on 26 MOOCs with over 145,000 combined enrollments and millions of interactions show that models combining interaction data and course information have comparable or better performance than models which have access to previous iterations of the course. With these models, we aim to effectively enable educators to warm-start their predictions for new and ongoing courses.

    How Close are Predictive Models to Teachers in Detecting Learners at Risk?

    Detecting learners in need of support is a complex process for both teachers and machines. Most prior work has devised visualization tools that allow teachers to do so by analyzing educational indicators. Other recent efforts have been devoted to models that predict whether learners might be at risk. However, the question of how teacher-like the model behaves in this detection task remains unanswered. In this paper, we investigate the (dis)agreement between teachers and model decisions, using a real-world flipped course as a case study. From the model perspective, we considered a well-known neural network, trained on educational indicators extracted from online pre-class logs. To gather teachers’ understanding, we employed a crowdsourcing approach including over 360 human intelligence tasks from 60 university teachers. We asked each recruited teacher to analyze visualizations pertaining to four relevant educational indicators of a given learner, and reason about their probability of failing the course (and so requiring support). Learners presented to teachers were selected to address different aspects of model confidence and (in)accuracy. Our results show that teacher and model predictions diverged for students who passed the course, while predictions were similar for students who failed the course. Moreover, confidence and correctness were more aligned in teachers than the model, reducing the unknown risks originally present in models. The source code is available at https://github.com/epfl-ml4ed/unknown-unknowns

    L2D 2021: First International Workshop on Enabling Data-Driven Decisions from Learning on the Web

    By offering courses and resources, learning platforms on the Web have been attracting lots of participants, and the interactions with these systems have generated a vast amount of learning-related data. Their collection, processing and analysis have promoted a significant growth of learning analytics and have opened up new opportunities for supporting and assessing educational experiences. To provide all the stakeholders involved in the educational process with timely guidance, being able to understand students' behavior and enable models which provide data-driven decisions pertaining to the learning domain is a primary property of online platforms, aiming at maximizing learning outcomes. In this workshop, we focus on collecting new contributions in this emerging area and on providing a common ground for researchers and practitioners (Website: https://mirkomarras.github.io/l2d-wsdm2021).

    Evaluating the Explainers: Black-Box Explainable Machine Learning for Student Success Prediction in MOOCs

    Neural networks are ubiquitous in applied machine learning for education. Their pervasive success in predictive performance comes alongside a severe weakness, the lack of explainability of their decisions, especially relevant in human-centric fields. We implement five state-of-the-art methodologies for explaining black-box machine learning models (LIME, PermutationSHAP, KernelSHAP, DiCE, CEM) and examine the strengths of each approach on the downstream task of student performance prediction for five massive open online courses. Our experiments demonstrate that the families of explainers do not agree with each other on feature importance for the same Bidirectional LSTM models with the same representative set of students. We use Principal Component Analysis, Jensen-Shannon distance, and Spearman’s rank-order correlation to quantitatively cross-examine explanations across methods and courses. Furthermore, we validate explainer performance across curriculum-based prerequisite relationships. Our results come to the concerning conclusion that the choice of explainer is an important decision and is in fact paramount to the interpretation of the predictive results, even more so than the course the model is trained on. Source code and models are released at http://github.com/epfl-ml4ed/evaluating-explainers
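
The abstract above names Jensen-Shannon distance and Spearman's rank-order correlation as two of the measures used to compare explanations. The sketch below is not the authors' code; it only illustrates, under stated assumptions, how two feature-importance vectors for the same student could be compared with those two measures. The function names and the example vectors are hypothetical, and importance scores are assumed to be non-negative (for example, absolute values of explainer outputs).

```python
import math


def js_distance(p, q):
    """Jensen-Shannon distance (base 2) between two non-negative score vectors.

    Both vectors are normalised to sum to 1 first (an assumption of this sketch).
    The result lies in [0, 1]: 0 for identical distributions, 1 for disjoint ones.
    """
    p_sum, q_sum = sum(p), sum(q)
    p = [x / p_sum for x in p]
    q = [x / q_sum for x in q]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Kullback-Leibler divergence, skipping zero-probability terms.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    divergence = (kl(p, m) + kl(q, m)) / 2
    return math.sqrt(max(divergence, 0.0))


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical importance scores from two explainers for three features.
explainer_a = [0.5, 0.3, 0.2]
explainer_b = [0.2, 0.3, 0.5]
print(js_distance(explainer_a, explainer_b))
print(spearman(explainer_a, explainer_b))
```

A distance near 0 together with a correlation near 1 would suggest that two explainers agree on a student; the abstract reports that, in practice, explainer families do not agree.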

    Generalisable Methods for Early Prediction in Interactive Simulations for Education

    Interactive simulations allow students to discover the underlying principles of a scientific phenomenon through their own exploration. Unfortunately, students often struggle to learn effectively in these environments. Classifying students’ interaction data in the simulations based on their expected performance has the potential to enable adaptive guidance and consequently improve students’ learning. Previous research in this field has mainly focused on a posteriori analyses or investigations limited to one specific predictive model and simulation. In this paper, we investigate the quality and generalisability of models for an early prediction of conceptual understanding based on clickstream data of students across interactive simulations. We first measure the students’ conceptual understanding through their in-task performance. Then, we suggest a novel type of feature that, starting from clickstream data, encodes both the state of the simulation and the action performed by the student. We finally propose to feed these features into GRU-based models, with and without attention, for prediction. Experiments on two different simulations and with two different populations show that our proposed models outperform shallow learning baselines and better generalise to different learning environments and populations. The inclusion of attention into the model increases interpretability in terms of effective inquiry. The source code is available on GitHub.

    Do Not Trust a Model because It is Confident: Uncovering and Characterizing Unknown Unknowns to Student Success Predictors in Online-Based Learning

    Student success models might be prone to develop weak spots, i.e., examples hard to accurately classify due to insufficient representation during model creation. This weakness is one of the main factors undermining users' trust, since model predictions could, for instance, lead an instructor not to intervene for a student in need. In this paper, we highlight the need to detect and characterize unknown unknowns in student success prediction in order to better understand when models may fail. Unknown unknowns include the students for whom the model is highly confident in its predictions, but is actually wrong. Therefore, we cannot solely rely on the model's confidence when evaluating prediction quality. We first introduce a framework for the identification and characterization of unknown unknowns. We then assess its informativeness on log data collected from flipped courses and online courses using quantitative analyses and interviews with instructors. Our results show that unknown unknowns are a critical issue in this domain and that our framework can be applied to support their detection. The source code is available at https://github.com/epfl-ml4ed/unknown-unknowns
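
The definition given above (predictions that are highly confident but wrong) can be illustrated with a few lines of code. This is only a minimal sketch of that definition, not the framework from the paper: the function name, the 0.9 confidence threshold, the 0.5 decision boundary, and the example data are all assumptions made for illustration.

```python
def find_unknown_unknowns(probabilities, labels, confidence_threshold=0.9):
    """Return indices of confidently wrong binary predictions.

    probabilities: predicted probability of the positive class for each student.
    labels: true outcomes (0 or 1).
    confidence_threshold: hypothetical cut-off for "highly confident".
    """
    flagged = []
    for index, (prob, label) in enumerate(zip(probabilities, labels)):
        predicted = 1 if prob >= 0.5 else 0
        # Confidence is the probability assigned to the predicted class.
        confidence = max(prob, 1 - prob)
        if confidence >= confidence_threshold and predicted != label:
            flagged.append(index)
    return flagged


# Hypothetical predictions for four students.
probabilities = [0.95, 0.05, 0.60, 0.97]
labels = [1, 1, 0, 0]
print(find_unknown_unknowns(probabilities, labels))  # prints [1, 3]
```

Student 2 (index 2) is also misclassified, but with low confidence, so it would not count as an unknown unknown under this definition.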

    Trusting the Explainers: Teacher Validation of Explainable Artificial Intelligence for Course Design

    Deep learning models for learning analytics have become increasingly popular over the last few years; however, these approaches are still not widely adopted in real-world settings, likely due to a lack of trust and transparency. In this paper, we tackle this issue by implementing explainable AI methods for black-box neural networks. This work focuses on the context of online and blended learning and the use case of student success prediction models. We use a pairwise study design, enabling us to investigate controlled differences between pairs of courses. Our analyses cover five course pairs that differ in one educationally relevant aspect and two popular instance-based explainable AI methods (LIME and SHAP). We quantitatively compare the distances between the explanations across courses and methods. We then validate the explanations of LIME and SHAP with 26 semi-structured interviews of university-level educators regarding which features they believe contribute most to student success, which explanations they trust most, and how they could transform these insights into actionable course design decisions. Our results show that quantitatively, explainers significantly disagree with each other about what is important, and qualitatively, experts themselves do not agree on which explanations are most trustworthy. All code, extended results, and the interview protocol are provided at https://github.com/epfl-ml4ed/trusting-explainers

    Identifying and Comparing Multi-dimensional Student Profiles Across Flipped Classrooms

    Flipped classroom (FC) courses, where students complete pre-class activities before attending interactive face-to-face sessions, are becoming increasingly popular. However, many students lack the skills, resources, or motivation to effectively engage in pre-class activities. Profiling students based on their pre-class behavior is therefore fundamental for teaching staff to make better-informed decisions on the course design and provide personalized feedback. Existing student profiling techniques have mainly focused on one specific aspect of learning behavior and have limited their analysis to one FC course. In this paper, we propose a multi-step clustering approach to model student profiles based on pre-class behavior in FC in a multi-dimensional manner, focusing on student effort, consistency, regularity, proactivity, control, and assessment. We first cluster students separately for each behavioral dimension. Then, we perform another level of clustering to obtain multi-dimensional profiles. Experiments on three different FC courses show that our approach can identify educationally-relevant profiles regardless of the course topic and structure. Moreover, we observe significant academic performance differences between the profiles.

    Going Beyond Counting First Authors in Author Co-citation Analysis

    The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting: the traditional type, which counts only the first of a cited work's authors, and a non-traditional type, which takes into account the first five authors of a cited work. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
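
The difference between the two counting schemes can be made concrete with a small sketch. It is not the procedure from the study: the function name, the data layout, and the author labels are hypothetical, and it assumes that a pair of authors is counted once per citing paper, including pairs who are coauthors of the same cited work.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count author co-citations across citing papers.

    citing_papers: list of reference lists; each reference is an ordered
        list of author names for one cited work.
    max_authors: how many leading authors of each cited work to count
        (1 = traditional first-author counting, 5 = first five authors).
    """
    counts = Counter()
    for references in citing_papers:
        cited_authors = set()
        for authors in references:
            cited_authors.update(authors[:max_authors])
        # Each unordered pair is counted once per citing paper.
        for pair in combinations(sorted(cited_authors), 2):
            counts[pair] += 1
    return counts


# Two hypothetical citing papers, each citing two works.
citing_papers = [
    [["A", "B"], ["C"]],
    [["C", "D"], ["A"]],
]
first_author = cocitation_counts(citing_papers, max_authors=1)
first_five = cocitation_counts(citing_papers, max_authors=5)
print(first_author)
print(first_five)
```

With first-author counting only the pair (A, C) appears; counting up to five authors also brings in B and D, which is how the non-traditional scheme changes the data that the later analyses and mappings are built on.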