Recommended from our members

AI-Generated Summaries for Course Selection

Author: Gagnon Kathryn Jane
Publication date: 2025

Many university students use course evaluation guides to select courses. However, these guides do not present course feedback in a way conducive to the course selection process; many offer lists of written comments without providing tools for students to easily analyze these data. A simple improvement to such guides would be the inclusion of AI-generated summaries of the comments. This paper implements a summarization tool for the Harvard Course Evaluation Guide which efficiently summarizes feedback comments through few-shot prompting of ChatGPT with a focus on capturing the overall quality, instructor quality, and workload of each course. Using summaries generated in this way, ChatGPT is better able to rate the qualities of a course than when using random sampling, summaries generated through zero-shot prompting, or the verbatim first five feedback comments. A user study investigating the difference between course selection using feedback comments and using summaries of the comments generated by the summarization tool did not find statistically significant differences. However, summaries may improve understanding of a course’s workload, and qualitative feedback suggested AI-generated summaries offer distinct advantages, especially in terms of cost. Therefore, AI-generated summaries cannot replace feedback comments, but tangibly improve the course selection process.

Computer Science

Harvard University - DASH

Socrates Sim: A Dialog Simulation Framework to Support Task Completion Dialog Research

Author: Dalal Dhairya
Publication date: 2019

In this thesis, we propose an end-to-end dialog simulation framework, called Socrates Sim, to support task completion dialog research. The goal of the framework is to provide a set of tools that will simulate conversations between a user simulator and a dialog agent in order to evaluate the performance of the dialog agent and generate annotated data. Specifically, the Socrates Sim framework allows the researcher to define custom dialog domains, build user simulators, and run multiple simulations with a provided dialog agent. To demonstrate the flexibility of the framework to generalize to new domains, we will implement end-to-end simulations for the restaurant recommendation and movie booking use cases. The framework is implemented in Python 3.6 and made available on GitHub (https://github.com/dhairyadalal/socrates).

Software Engineering

Harvard University - DASH

Analyzing Easy Data Augmentation Techniques for Text Classification

Author: Wong Carolyn
Publication date: 2021

In natural language processing, text classification is the task of assigning a category to a given text example. Text classification has a variety of applications ranging from automated processing of customer reviews to spam detection. Current state-of-the-art approaches for text classification tasks use neural language models. These models are resource-intensive, requiring large amounts of labeled training data. However, training data may not always be available in large quantities, especially for low-resource languages, and labeled data is often laborious to obtain. Consequently, it is desirable to understand the factors contributing to text classification models' performance. I address several questions about which factors contribute to the high performance achieved by the current state-of-the-art neural models. To do so, I analyze traditional and neural methods for a diverse range of text classification tasks. I study various properties such as model assumptions and word vector representations to determine the effect of each of these features on text classification performance. On the best-performing models identified by this analysis, I evaluate existing data augmentation techniques for text classification proposed by Wei and Zou (2019), which are methods that perform simple text editing operations to generate new training examples. However, such existing data augmentation techniques require external datasets or knowledge about the semantic properties of words. To this end, I propose and assess a novel length-based method that does not require external linguistic knowledge. This method replaces words with other words of similar length, as word length closely reflects the average information content and conceptual complexity of words in English (Piantadosi, Tily, and Gibson, 2011; Lewis and Frank, 2016). I demonstrate that this length-based technique adds consistent gains for several of the evaluated text classification tasks.

Harvard University - DASH
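
The length-based replacement idea in the abstract above (swap a word for another word of similar length) can be sketched as follows. This is a minimal illustration only, not the thesis's implementation: the function names, the `replace_prob` and `tolerance` parameters, and the use of a plain vocabulary list as the replacement pool are all assumptions made for this example.

```python
import random
from collections import defaultdict


def build_length_index(vocabulary):
    """Group vocabulary words by character length (hypothetical helper)."""
    index = defaultdict(list)
    for word in vocabulary:
        index[len(word)].append(word)
    return index


def length_based_augment(tokens, length_index, replace_prob=0.1, tolerance=0, rng=None):
    """Return a copy of tokens in which some words are swapped for other
    vocabulary words whose length is within `tolerance` characters.

    replace_prob and tolerance are illustrative knobs, not values taken
    from the source abstract.
    """
    rng = rng or random.Random()
    augmented = []
    for token in tokens:
        if rng.random() < replace_prob:
            # Collect every vocabulary word of similar length, excluding the token itself.
            candidates = [
                word
                for length in range(len(token) - tolerance, len(token) + tolerance + 1)
                for word in length_index.get(length, [])
                if word != token
            ]
            if candidates:
                token = rng.choice(candidates)
        augmented.append(token)
    return augmented
```

With `tolerance=0` every replacement has exactly the same length as the word it replaces; a token with no same-length alternative in the vocabulary is left unchanged.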

Examining the Authenticity of Plato’s Epistle VII through Deep Learning

Author: Perry Jordan Bliss
Publication date: 2021

Plato’s Epistle VII, a text in which the famous Athenian philosopher describes his political involvement in the affairs of 4th-century B.C.E. Syracuse, has long been considered dubious by classical philologists. In particular, scholars have scrutinized two sections of the letter, in the first of which Plato gives political advice contrary to claims made in his other works, and in the second of which Plato digresses from his political narrative to discuss a philosophical doctrine known as the Theory of Forms. Specifically, some scholars have raised the possibility of textual interpolation, whereby inauthentic passages might have been added to an otherwise authentic text. This paper sets out to apply computational methodology from deep learning to provide further insight on such a long-standing problem in Platonic scholarship. As such, I developed a bidirectional long short-term memory (LSTM) recurrent neural network (RNN) with trainable word embeddings to classify units of roughly 100 words of Ancient Greek text as belonging to Plato or one of six other Ancient Greek prose authors. Given Ancient Greek’s rich morphology, special care was taken to formulate an optimal pre-processing approach: of four methods (plaintext, lemmatization, byte-pair encoding [BPE], and a lemmatization-BPE ensemble), the ensemble exhibited the highest test accuracy (89.28%), improving significantly upon a Naïve Bayes baseline model (70.93%). Applied to Epistle VII, this model reveals that the letter seems mostly authentic, except for two markedly more spurious sections, one of which corresponds nearly perfectly with the boundaries of the section consisting of political advice to the Sicilians. Such a result provides further support to the pre-existing claim that this section is an interpolation by a non-Platonic author within an otherwise Platonic text.

Harvard University - DASH

Causal Mediation Analysis Reveals Syntactic Agreement Mechanisms in Neural Language Models

Author: Finlayson Matthew
Publication date: 2021

Targeted syntactic evaluations have demonstrated the ability of language models to perform subject-verb agreement given difficult contexts. Although this is well established, the mechanisms by which neural language models achieve syntactic agreement are still not well understood. As a remedy, this thesis applies causal mediation analysis to pre-trained neural language models to locate model components and discover mechanisms responsible for predicting correctly inflected verbs. In particular, we investigate the magnitude of models’ grammatical inflection preferences, as well as compare which neurons process subject-verb agreement across sentences with different syntactic structures. In our results, we uncover both similarities and differences across architectures and model sizes, and get a glimpse at the within-model mechanisms that produce number agreement. Notably, we learn that larger models do not necessarily learn stronger preferences, we observe two distinct mechanisms for producing subject-verb agreement depending on the syntactic structure of the input sentence, and we find that language models rely on similar sets of neurons when given sentences with similar syntactic structure.

Harvard University - DASH

Rich Linguistic Structure from Large-Scale Web Data

Author: Elif Yamangil
Publication date: 01/01/2013

The past two decades have shown an unexpected effectiveness of Web-scale data in natural language processing. Even the simplest models, when paired with unprecedented amounts of unstructured and unlabeled Web data, have been shown to outperform sophisticated ones. It has been argued that the effectiveness of Web-scale data has undermined the necessity of sophisticated modeling or laborious data set curation. In this thesis, we argue for and illustrate an alternative view, that Web-scale data not only serves to improve the performance of simple models, but also can allow the use of qualitatively more sophisticated models that would not be deployable otherwise, leading to even further performance gains.

Engineering and Applied Sciences

CiteSeerX

Harvard University - DASH

Author Instructions

Author: Instructions Author
Publication date: 04/11/2013

Crossref

Cartographic Perspectives (E-Journal - North American Cartographic Information Society, NACIS)

Going Beyond Counting First Authors in Author Co-citation Analysis

Author: Zhao Dangzhi
Publication date: 01/01/2005

The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based. We compare ACA results based on two different types of co-citation counting: the traditional type, which counts only the first author of each cited work, and a non-traditional type, which takes into account the first 5 authors of a cited work. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.

E-LIS
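
The two counting schemes contrasted in the abstract above can be illustrated with a short sketch. It assumes a simplified data model that is not taken from the study: each citing paper is a list of cited works, each cited work is an ordered list of author names, and two authors are co-cited once per citing paper whose reference list credits both of them. The function name and the `max_authors` parameter are hypothetical, and counting co-authors of the same cited work as co-cited is a modelling choice of this example.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count author co-citations across citing papers.

    citing_papers: list of reference lists; each reference is an ordered
    list of author names. max_authors=1 gives first-author counting;
    max_authors=5 credits the first five authors of each cited work.
    """
    counts = Counter()
    for references in citing_papers:
        # Authors credited by this citing paper under the chosen scheme.
        cited_authors = set()
        for author_list in references:
            cited_authors.update(author_list[:max_authors])
        # Each unordered pair of credited authors is co-cited once per citing paper.
        for pair in combinations(sorted(cited_authors), 2):
            counts[pair] += 1
    return counts


papers = [
    [["Smith", "Lee"], ["Garcia"]],
    [["Smith"], ["Lee", "Garcia"]],
]
first_author = cocitation_counts(papers, max_authors=1)
first_five = cocitation_counts(papers, max_authors=5)
```

In this toy input, first-author counting never pairs Lee with Garcia, while five-author counting does, which shows how the choice of scheme changes the raw data that later mapping steps rely on.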

Interactive AI to Support Human-Human Communication

Author: Huber Bernd
Publication date: 2020

Such important bases of our society as healthcare, education, and productivity typically rely on effective communication between humans. Human-human communication in such settings is often challenging, as it requires advanced communication skills that are not available to everyone. This dissertation argues that systems that leverage models or data about communication can be used to ultimately improve communication. Through two main kinds of studies, this dissertation characterizes challenges when modeling communication from data, as well as when applying these approaches, and it formalizes the problem in such settings. The dissertation introduces systems to model spoken and written communication. It further defines recommendation systems that identify patterns in communication and provide suggestions to people on how to improve their communication. The dissertation also presents designs, implementations and evaluations of systems based on the communication models in the domains of productivity, social media conversations, healthcare, and video broadcasting. The results of experiments evaluating these mechanisms show that, compared to current practice, communication models generate new insights, and our AI-human interfaces lead to improved outcomes. The main implication of this dissertation is that the design of AI algorithms and user interfaces impacts how people communicate with each other. Importantly, technology makes teaching communication skills more accessible, democratizing skills that were only available to experts.

Engineering and Applied Sciences - Computer Science

Harvard University - DASH

Variations on the Author

Author: Sayad Cecilia
Publication date: 01/01/2016

“Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourses; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.

Crossref

Kent Academic Repository