
    Identifying specific textual units of documents taken from large corpora. Comparing methods.

    JADT'2006 online proceedings (8th International Conference on the Statistical Analysis of Textual Data)

    Chronological analysis of textual data and curve clustering: preliminary results based on wavelets

    In textual analysis, many corpora include texts that have a chronological order. The temporal evolution of (key)words is relevant for highlighting the distinctive features of a chronological corpus. In a typical bag-of-words approach, data are organized in word-type × time-point contingency tables. Such discrete data can be thought of as continuous objects represented by functional relationships. The aims of this study are to identify a specific sequential pattern for each word as a functional object, and to determine prototype patterns representing clusters of words that portray a similar evolution. We propose the application of a flexible wavelet-based model for curve clustering to a corpus of end-of-year addresses delivered by the ten Presidents of the Italian Republic in the period 1949-2011.
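    A minimal sketch of the general idea, in Python with NumPy: each row of a word-type × time-point table becomes a curve, a Haar discrete wavelet transform turns each curve into coefficients, and k-means groups words with similar evolutions. This is a simplified stand-in, not the flexible wavelet-based model used in the study. The function names, the relative-frequency and unit-norm scaling, the Haar basis, the farthest-point k-means initialization, the power-of-two number of time points, and the toy counts are all assumptions made for illustration.

```python
import numpy as np


def haar_dwt(x):
    """Orthonormal Haar DWT of a signal whose length is a power of two.

    Returns [coarsest approximation, coarsest details, ..., finest details].
    """
    a = np.asarray(x, dtype=float)
    n = a.size
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    details = []
    while a.size > 1:
        even, odd = a[0::2], a[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # local differences
        a = (even + odd) / np.sqrt(2.0)              # local averages
    return np.concatenate([a] + details[::-1])


def kmeans(X, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d2.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def cluster_word_curves(counts, k):
    """Cluster the rows of a word-type x time-point count table by shape."""
    counts = np.asarray(counts, dtype=float)
    # Assumption: a word's curve is its share of each time point's tokens.
    rel = counts / counts.sum(axis=0, keepdims=True)
    # Assumption: unit-norm rows, so clusters reflect shape rather than volume.
    shape = rel / np.linalg.norm(rel, axis=1, keepdims=True)
    coeffs = np.array([haar_dwt(row) for row in shape])
    return kmeans(coeffs, k)


if __name__ == "__main__":
    # Hypothetical toy table: 4 word types x 8 time points.
    table = [[1, 2, 3, 4, 5, 6, 7, 8],
             [2, 4, 6, 8, 10, 12, 14, 16],
             [8, 7, 6, 5, 4, 3, 2, 1],
             [16, 14, 12, 10, 8, 6, 4, 2]]
    print(cluster_word_curves(table, k=2))  # rising words vs falling words
```

    Because the Haar transform is orthonormal, Euclidean k-means on all coefficients gives the same result as k-means on the curves themselves. Wavelet-based curve clustering gains its flexibility from selecting, shrinking, or modelling coefficients at different scales, which this sketch deliberately omits.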

    Can Correspondence Analysis Challenge Transformers in Authorship Attribution Tasks?

    With reference to a large corpus of 76 contemporary Italian popular mystery novels by 16 different authors, this study assesses the performance of large language models in an authorship attribution test. The results obtained with transformer and correspondence analysis vector representations are compared and contrasted in machine learning classification tasks. Although previous work has shown transformers to perform better than other alternatives, in this case correspondence analysis wins the challenge. The results support the hypothesis that specialized large corpora require tailor-made representations.
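    The correspondence analysis side of such a comparison can be sketched as follows: fit CA on a document × word count matrix through the SVD of its standardized residuals, place held-out documents as supplementary rows, and assign each one to the nearest author centroid. The nearest-centroid classifier, the function names, the number of dimensions, and the toy counts are assumptions for illustration; the abstract does not name the classifiers used, and this is not the study's pipeline.

```python
import numpy as np


def ca_fit(counts, n_dims):
    """Correspondence analysis of a document x word count matrix.

    Returns document principal coordinates and word standard coordinates.
    Assumption: every document and every word has a nonzero total.
    """
    N = np.asarray(counts, dtype=float)
    P = N / N.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)  # row and column masses
    # Standardized residuals D_r^{-1/2} (P - r c^T) D_c^{-1/2}.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
    U, sigma, Vt = U[:, :n_dims], sigma[:n_dims], Vt[:n_dims]
    row_coords = U * sigma / np.sqrt(r)[:, None]  # D_r^{-1/2} U Sigma
    col_std = Vt.T / np.sqrt(c)[:, None]          # D_c^{-1/2} V
    return row_coords, col_std


def ca_project(counts, col_std):
    """Place (possibly unseen) documents as supplementary rows."""
    N = np.asarray(counts, dtype=float)
    profiles = N / N.sum(axis=1, keepdims=True)
    return profiles @ col_std


def nearest_centroid(train_X, train_y, test_X):
    """Assign each test vector to the author with the closest mean vector."""
    authors = sorted(set(train_y))
    y = np.asarray(train_y)
    centroids = np.array([train_X[y == a].mean(axis=0) for a in authors])
    d2 = ((test_X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return [authors[i] for i in d2.argmin(axis=1)]


if __name__ == "__main__":
    # Hypothetical counts of 4 words in 4 training and 2 held-out documents.
    train = [[10, 8, 1, 0], [9, 10, 0, 1], [1, 0, 9, 11], [0, 2, 10, 8]]
    held_out = [[8, 9, 1, 1], [1, 1, 12, 9]]
    doc_coords, word_coords = ca_fit(train, n_dims=2)
    print(nearest_centroid(doc_coords, ["A", "A", "B", "B"],
                           ca_project(held_out, word_coords)))
```

    Projecting the training documents with `ca_project` reproduces their fitted coordinates (the CA transition formula), so held-out texts land in the same space as the training texts without refitting the analysis.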

    Portraying the life cycle of ideas in social psychology through functional (textual) data analysis: a toolkit for digital history

    This paper presents a method for the digital history of a discipline (social psychology in this application) through the analysis of scientific publications. The titles of a comprehensive set of papers published in the Journal of Personality and Social Psychology (1965–2021) were collected, yielding a total of 10,222 items. The corpus thus constructed underwent several stages of preprocessing before its final conversion into a terms × time-points matrix, where terms are stemmed words and multi-words. After frequencies were normalized via a chi-square-like transformation, clusters of words portraying similar temporal patterns were identified by functional (textual) data analysis and distance-based curve clustering. Among the best candidates in terms of the number of clusters, the solutions with six, nine and thirteen clusters (from lower to higher resolution) were chosen, and their nesting relationship was demonstrated. At different levels of granularity, these solutions reveal increasing, decreasing, and stable keyword trends, highlighting methods, theories, and application domains that have become more popular in recent years, have lost popularity, or have remained in common use. Moreover, this method makes it possible to highlight historical issues (such as crises in the discipline or debates over the use of terms). The results highlight the core topics of social psychology in the past and today, underlining the crucial contribution of this method to the digital history of a discipline.
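    A sketch of the normalization and clustering steps, under stated assumptions: the "chi-square-like transformation" is taken here to be standardized Pearson residuals, and distance-based curve clustering is replaced by plain average-linkage agglomerative clustering on Euclidean distances. Cutting one hierarchy at different numbers of clusters yields nested partitions by construction. The function names, the residual formula, the linkage, and the toy counts are assumptions, not the paper's exact procedure.

```python
import numpy as np


def chi_square_residuals(counts):
    """Standardized Pearson residuals (n_ij - e_ij) / sqrt(e_ij).

    Assumption: used here as the 'chi-square-like' normalization; every
    row and column total must be nonzero.
    """
    N = np.asarray(counts, dtype=float)
    expected = np.outer(N.sum(axis=1), N.sum(axis=0)) / N.sum()
    return (N - expected) / np.sqrt(expected)


def average_linkage(X):
    """Naive agglomerative clustering with average linkage.

    Returns the partition after every merge: history[0] has one cluster per
    row of X, history[-1] has a single cluster.
    """
    X = np.asarray(X, dtype=float)
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    clusters = [frozenset([i]) for i in range(len(X))]
    history = [list(clusters)]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs.
                d = dist[np.ix_(sorted(clusters[a]), sorted(clusters[b]))].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters[a] | clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
        history.append(list(clusters))
    return history


def cut(history, k):
    """Partition with exactly k clusters (1 <= k <= number of terms)."""
    return history[len(history) - k]


def is_nested(fine, coarse):
    """True if every cluster of `fine` lies inside some cluster of `coarse`."""
    return all(any(f <= c for c in coarse) for f in fine)


if __name__ == "__main__":
    # Hypothetical toy counts: 6 terms x 5 time points
    # (two rising, two falling, two stable terms).
    counts = [[1, 2, 4, 6, 8], [2, 3, 5, 8, 10],
              [8, 6, 4, 2, 1], [9, 7, 5, 3, 2],
              [5, 5, 5, 5, 5], [4, 4, 4, 4, 4]]
    history = average_linkage(chi_square_residuals(counts))
    three, two = cut(history, 3), cut(history, 2)
    print([sorted(c) for c in three], is_nested(three, two))
```

    The merge loop is cubic in the number of terms, which is fine for a toy table; for a real vocabulary, a library routine such as SciPy's hierarchical clustering would be the practical choice.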