1,721,166 research outputs found

Hierarchical Dirichlet scaling process

Author: Dongwoo Kim
Alice Oh
Publication date: 01/01/2017

We present the hierarchical Dirichlet scaling process (HDSP), a Bayesian nonparametric mixed membership model. The HDSP generalizes the hierarchical Dirichlet process to model the correlation structure between metadata in the corpus and mixture components. We construct the HDSP based on the normalized gamma representation of the Dirichlet process, and this construction allows incorporating a scaling function that controls the membership probabilities of the mixture components. We develop two scaling methods to demonstrate that different modeling assumptions can be expressed in the HDSP. We also derive the corresponding approximate posterior inference algorithms using variational Bayes. Through experiments on datasets of newswire, medical journal articles, conference proceedings, and product reviews, we show that the HDSP achieves better predictive performance than labeled LDA, partially labeled LDA, and the author-topic model, and better negative review classification performance than the supervised topic model and SVM.
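The normalized gamma idea mentioned in the abstract can be illustrated with a small finite-dimensional sketch. This is not the authors' model or inference code: it only shows the standard fact that normalizing independent gamma draws yields a Dirichlet draw, and how a per-component multiplier changes the resulting membership probabilities. The function name `normalized_gamma_weights` and the `scales` values are hypothetical, chosen for illustration.

```python
import random


def normalized_gamma_weights(alphas, scales, rng):
    """Draw mixture weights by normalizing independent gamma variables.

    With every scale equal to 1.0 the result is a draw from
    Dirichlet(alphas). Unequal scales (an assumption of this sketch,
    standing in for a scaling function) tilt the weights toward the
    components with larger scale.
    """
    # One gamma draw per component: shape alpha, scale 1.0.
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    # Apply the per-component multiplier before normalizing.
    scaled = [s * g for s, g in zip(scales, draws)]
    total = sum(scaled)
    return [x / total for x in scaled]


alphas = [1.0, 1.0, 1.0]
# Same seed in both calls, so both use identical gamma draws.
base = normalized_gamma_weights(alphas, [1.0, 1.0, 1.0], random.Random(0))
tilted = normalized_gamma_weights(alphas, [5.0, 1.0, 1.0], random.Random(0))
print(base)
print(tilted)
```

Because both calls share the same gamma draws, the first component's weight in `tilted` is strictly larger than in `base`, while each weight vector still sums to one.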

KAIST Institutional Repository

Crossref

Springer - Publisher Connector

Pohang University of Science and Technology (POSTECH)

A come-from-behind win or a blown-save loss: Perspectives in baseball

Author: Oh Alice Haeyun
Publication date: 2005

KAIST Institutional Repository

Generalizing Weisfeiler-Lehman Kernels to Subgraphs

Author: Kim Dongkwan
Oh Alice Haeyun
Publication date: 24/04/2025

KAIST Institutional Repository

Topical interest & degree of involvement of bilingual editors in Wikipedia

Author: Kim Sooyoung
Oh Alice Haeyun
Publication date: 2016

Language reveals a lot of information about its speakers. Speakers of one language usually share common cultural habits or regional characteristics, and their similarities become more obvious in contexts where multiple languages are in use. We focus on studying bilingual users of Wikipedia, one of the largest multilingual user-generated content platforms. In Wikipedia, we can observe the patterns in the English edition, where users of multiple languages come together to express their thoughts and interests in the common language of English. To understand the specific topics edited by bilingual users, we analyze them in terms of revision counts, topics, and country names. We find that bilingual users are generally interested in more local topics, and their language is closely related to their topics. We also observe that topical diversity decreases with the proportion of English edits, and that editing concentrates more on topics related to countries and cultures.

KAIST Institutional Repository

Time-Aware Representation Learning for Time-Sensitive Question Answering

Author: Son Jungbin
Oh Alice Haeyun
Publication date: 2023

KAIST Institutional Repository

Pythonpad

Author: Alice Oh
Jeongmin Byun
Jungkook Park
Publication date: 05/03/2021

We propose Pythonpad, an open-source JavaScript library that supports web-based Python programming exercises. Unlike other standalone web-based programming tools, Pythonpad can be easily integrated into other websites. Although it runs learners' Python code in client-side web browsers, Pythonpad supports a file system, building and importing external modules, and many essential built-in Python libraries to teach basic programming concepts in CS1 classes.

KAIST Institutional Repository

Crossref

CS1QA: A Dataset for Assisting Code-based Question Answering in an Introductory Programming Course

Author: Seonwoo Yeon
Oh Alice Haeyun
Lee Changyoon
Publication date: 13/07/2022

We introduce CS1QA, a dataset for code-based question answering in the programming education domain. CS1QA consists of 9,237 question-answer pairs gathered from chat logs in an introductory programming class using Python, and 17,698 unannotated chat data with code. Each question is accompanied by the student’s code and the portion of the code relevant to answering the question. We carefully design the annotation process to construct CS1QA, and analyze the collected dataset in detail. The tasks for CS1QA are to predict the question type, to identify the relevant code snippet given the question and the code, and to retrieve an answer from the annotated corpus. Results for the experiments on several baseline models are reported and thoroughly analyzed. The tasks for CS1QA challenge models to understand both the code and natural language. This unique dataset can be used as a benchmark for source code comprehension and question answering in the educational setting.

KAIST Institutional Repository

DREsS: Dataset for Rubric-based Essay Scoring on EFL Writing

Author: 안소연
한지은
Oh Alice Haeyun
Yoo Haneul
Publication date: 27/07/2025

KAIST Institutional Repository

Generating Baseball Summaries from Multiple Perspectives by Reordering Content

Author: Oh Alice Haeyun
Shrobe Howard
Publication date: 2008

KAIST Institutional Repository

Conversation model fine-tuning for classifying client utterances in counseling dialogues

Author: Kim Donghyun
Oh Alice Haeyun
Park Sungjoon
Publication date: 2019

The recent surge of text-based online counseling applications enables us to collect and analyze interactions between counselors and clients. A dataset of those interactions can be used to learn to automatically classify the client utterances into categories that help counselors in diagnosing client status and predicting counseling outcome. With proper anonymization, we collect counselor-client dialogues, define meaningful categories of client utterances with professional counselors, and develop a novel neural network model for classifying the client utterances. The central idea of our model, ConvMFiT, is a pre-trained conversation model which consists of a general language model built from an out-of-domain corpus and two role-specific language models built from unlabeled in-domain dialogues. The classification results show that ConvMFiT outperforms state-of-the-art comparison models. Further, the attention weights in the learned model confirm that the model finds expected linguistic patterns for each category.

KAIST Institutional Repository