Assessing the impact of automatic dependency annotation on the measurement of phraseological complexity in L2 Dutch
The extraction of phraseological units operationalized in phraseological complexity measures (Paquot, 2019) relies on automatic dependency annotations, yet the suitability of annotation tools for learner language is often overlooked. In the present article, two Dutch dependency parsers, Alpino (van Noord, 2006) and Frog (van den Bosch et al., 2007), are evaluated for their performance in automatically annotating three types of dependency relations (verb + direct object, adjectival modifier, and adverbial modifier relations) across three proficiency levels of L2 Dutch. These observations then serve as the basis for an investigation into the impact of automatic dependency annotation on phraseological sophistication measures. Results indicate that both learner proficiency and the type of dependency relation function as moderating factors in parser performance. Phraseological complexity measures computed on the basis of both automatic and manual dependency annotations demonstrate moderate to high correlations, reflecting a moderate to low impact of automatic annotation on subsequent analyses.
Exploring Lexicogrammatical Features as Predictors of Writing Quality in Second Language Spanish: A Multivariate Analysis
This dissertation investigates the relationship between the use of lexicogrammatical features and L2 Spanish writing quality using a multivariate modeling approach. While a substantial body of research has demonstrated that lexical diversity and sophistication are strong predictors of L2 writing quality in English (Bestgen & Granger, 2014; Kim et al., 2018; Kyle & Crossley, 2016, inter alia), comparatively little is known about whether and how these features can predict L2 Spanish writing quality as measured by human judgments. The current study addresses this gap by analyzing a corpus of 400 graded essays written by L2 Spanish learners and examining the predictive power of various indices of lexical richness, including lexical diversity, word frequency, bigram strength of association, and bigram dependencies, on human judgments of writing quality. To do so, the study compares multiple operationalizations of morphological normalization (e.g., raw forms, lemmas, and lemmas tagged with partial or full verbal information) to evaluate their impact on the calculation of lexical indices. The results show that operationalizations retaining more verb morphology tend to improve the predictive power of certain indices, especially for lexical diversity and bigram dependencies. Final models combining multiple features account for approximately 30–34% of the variance in human ratings of writing quality, indicating the value of a multivariate approach. Findings from this dissertation contribute to our understanding of productive lexical proficiency in L2 Spanish and highlight the importance of tailoring computational indices and morphological strategies to the typological characteristics of the target language. Implications are discussed for second language acquisition theory, writing assessment practices, and pedagogy.
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting: the traditional type, which counts only the first of a cited work's authors, and a non-traditional type, which takes into account the first 5 authors of a cited work. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
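The two counting schemes compared in the abstract above can be illustrated with a minimal Python sketch. The function name `cocitation_counts`, the `max_authors` parameter, and the toy data are all hypothetical, and the sketch assumes that two authors are co-cited once per citing paper whenever both appear among the credited authors of that paper's references (including co-authors of the same cited work). The study itself may define the counts differently.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count author co-citations over a collection of citing papers.

    citing_papers: list of reference lists; each reference is an ordered
    list of author names. Only the first `max_authors` authors of each
    cited work are credited (1 = traditional first-author counting).
    """
    counts = Counter()
    for references in citing_papers:
        # Collect the credited authors; a set counts each author once per paper.
        cited_authors = set()
        for authors in references:
            cited_authors.update(authors[:max_authors])
        # Each unordered pair of distinct credited authors is co-cited once.
        for pair in combinations(sorted(cited_authors), 2):
            counts[pair] += 1
    return counts


# Hypothetical toy data: two citing papers, each citing two works.
papers = [
    [["A", "B"], ["C", "D"]],
    [["A", "E"], ["C"]],
]
first_only = cocitation_counts(papers, max_authors=1)
first_five = cocitation_counts(papers, max_authors=5)
```

On this toy input, first-author counting yields a single co-cited pair, while crediting up to five authors per work yields several more pairs from the same reference lists.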
Measuring lexicogrammatical complexity and sophistication in second language English production: Development and validation of argument structure construction-based indices
Grounded in a usage-based constructionist framework, this study conceptualizes language as a network of entrenched form-meaning pairings (i.e., constructions) that shape individual grammars through repeated exposure. While traditional measures of syntactic complexity capture some facets of L2 proficiency, they seldom treat constructions as the focus of analysis. Indices derived from argument structure constructions (ASCs), which map syntactic arguments onto semantic roles and thus encode fundamental human experiences, provide a complementary perspective. Nevertheless, even though studies consistently show that learners’ ASC usage becomes more complex and sophisticated with increasing proficiency, scalable and systematic methods for extracting and analyzing ASC-based indices remain limited.
To address this gap, the present study introduces ASC analyzer, an open-source NLP tool that builds on a RoBERTa-based ASC tagger trained on a gold-standard treebank of L1 and L2 English. The analyzer automatically labels ASCs and computes a suite of ASC-based indices (i.e., diversity, proportion, frequency, and verb-construction strength of association) for large-scale corpus analyses.
Empirical validation in an L2 speaking-assessment task shows that these ASC-based indices yield nuanced insights into how learners at different oral-proficiency levels deploy constructions and verbs while completing the same task and possess solid predictive power for speaking scores. When combined with additional lexicogrammatical measures, they further boost the model’s explanatory power. A parallel study of L2 writing corroborates these findings: adding ASC-based indices not only outperforms traditional syntactic-complexity metrics in isolation but also enhances models that already include syntactic and lexicogrammatical predictors. The results demonstrate that ASC-based analysis offers a valuable contribution to multivariate frameworks that seek to capture the complex interplay between grammatical form and lexical choice in L2 production.
Automatic Analysis of Epistemic Stance-Taking in Academic English Writing: A Systemic Functional Approach
Existing linguistic textual measures that investigate features of academic writing often focus on lexis, syntax, and cohesion, despite writing skills being considered more complex and multifaceted (e.g., Sparks et al., 2014). For this reason, writing assessment researchers seek ways to measure and assess various textual features beyond the traditional ones, including discourse moves and steps (Cotos, 2014), source use (Burstein et al., 2018; Kyle, 2020), and essay argument structures (Fiacco et al., 2022). The present dissertation attempts to extend this research by proposing an automated analysis of rhetorical discourse features of epistemic stance-taking strategies.
Drawing on a theoretical framework of the engagement system from Appraisal Analysis (Martin & White, 2005), which originates from the Sydney School of the systemic functional discourse analysis tradition, the dissertation develops and evaluates a series of end-to-end machine learning models to conduct automated engagement resource analysis. The experiment in Study 1 indicated that the developed system can perform as well as (or even outperform) trained annotators’ intercoder agreement. Study 2 uses the natural language processing (NLP) systems to conduct the first large-scale analysis of engagement resources in university written assignments across genres and disciplines. The findings suggested that the registers of university writings are far more complex and nuanced than simple characterization by genres or disciplines.
Study 3 investigates whether the developed measures of rhetorical features of engagement can provide additional information above and beyond the traditional linguistic measures at the levels of lexis, syntax, and cohesion, for modeling professional ratings of essay qualities in a standardized second language proficiency assessment. The results indicate that the features of engagement (particularly the diversity of rhetorical strategies) can complement the existing measures in predicting essay quality.
The three studies together indicate that the proposed machine-learning approach is beneficial for scaling up the analysis of rhetorical discourse features in academic writing for research and educational purposes. The dissertation concludes with a call for increased collaboration among discourse analysts, second language researchers, assessment researchers, and computational linguists to define essential textual features for writing assessments across contexts and automate the analysis of such constructs (Lu, 2021; Burstein et al., 2016).
Variations on the Author
“Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourse; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.
Appropriate Similarity Measures for Author Cocitation Analysis
We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.
Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
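The two familiar measures named in the abstract above, the Pearson correlation and the cosine, can be compared on toy cocitation profiles with a short Python sketch. The profiles and function names are hypothetical. The property shown here (appending authors with zero cocitations to both profiles changes the Pearson correlation but not the cosine) is a general mathematical fact, and it is only assumed, not confirmed by the abstract, to be related to the paper's argument.

```python
import math


def cosine(x, y):
    """Cosine of the angle between two equal-length profiles."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson(x, y):
    """Pearson correlation, computed as the cosine of mean-centered profiles."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    centered_x = [a - mean_x for a in x]
    centered_y = [b - mean_y for b in y]
    return cosine(centered_x, centered_y)


# Hypothetical cocitation profiles of two authors with four other authors.
x = [3, 0, 2, 1]
y = [1, 2, 0, 3]

# The same profiles after adding four authors cocited with neither.
x_padded = x + [0, 0, 0, 0]
y_padded = y + [0, 0, 0, 0]

print(cosine(x, y), cosine(x_padded, y_padded))
print(pearson(x, y), pearson(x_padded, y_padded))
```

On these toy profiles the cosine is identical before and after padding, whereas the Pearson correlation moves from -0.6 to a small positive value, so the choice of which authors to include can alter the Pearson-based similarity.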
Dispelling the Myths Behind First-author Citation Counts
We conducted a full-scale evaluative citation analysis study of scholars in the XML research field to explore just how different from each other author rankings resulting from different citation counting methods actually are, and to demonstrate the capability of emerging data and tools on the Web in supporting more realistic citation counting methods. Our results contest some common arguments for the continued use of first-author citation counts in the evaluation of scholars, such as high correlations between author rankings by first-author citation counts and other citation counting methods, and high costs of using more realistic citation counting methods that are not well-supported by the ISI databases. It is argued that increasingly available digital full text research papers make it possible for citation analysis studies to go beyond what the ISI databases have directly supported and to employ more sophisticated methods.
