
    Functional model-based curve clustering for discovering temporal patterns in chronological corpora

    In many applications of textual analysis, corpora are characterized by a temporal structure, i.e. they include texts which have a chronological order (e.g. political discourses, institutional documents, articles published in newspapers, messages posted to a blog, etc.). The temporal trend of key topics or words is crucial to disclose distinctive features of texts and of the corpus as a whole. In the frame of bag-of-words approaches, the temporal course of a word is represented as a sequence of frequencies across time, i.e. it corresponds to a specific row of a word-type x time-point contingency table. Such data can be thought of as a discrete observation of a curve, that is, as a functional observation. In chronological corpora, data are typically sparse over time, so many cells in the contingency table have small counts or zeros. These zeros are due to the large number of word-types (vocabulary entries) with a relatively low number of associated word-tokens (an intrinsic feature of textual data commonly known as the large p, small n problem), as well as to the size of time-point subcorpora. In terms of the number of documents and of their size in word-tokens, the richness of information and the regularity of the corresponding signal can be highly variable across time. Time series represented by frequencies of words pose some specific issues: high-dimensional data, individual (word) variability, and irregular, peak-like curves. Identifying the temporal patterns of words as functional curves, and clustering these into consistent groups of words portraying a similar pattern of evolution, are the main objectives of this study. In this work we focus on methods for model-based curve clustering in the presence of the specific issues mentioned above. Curve clustering has long been studied using splines; however, they are not appropriate when dealing with high-dimensional data and cannot be used to model irregular functions such as spot and peak-like curves. By contrast, wavelet representation can accommodate a wider range of functional shapes and proves more flexible than splines. In our work we adapt a recent class of wavelet-based functional clustering mixed models to the setting of chronological corpora. For inference we consider both a frequentist framework (resorting to the EM algorithm for maximum likelihood estimation provided by the recently developed R package curvclust) and a Bayesian version. A further interesting issue consists in disentangling lower-scale patterns from higher-level ones in order to detect the importance of a possible "regime" factor (e.g. the President's term of office in a corpus of end-of-year presidential addresses, see) relative to the temporal evolution of a chronological corpus. We show that investigating the wavelet coefficient domain is useful for inspecting different scales of the process. A number of graphical tools are proposed to deal with such multiscale situations. Procedures are tested using different text genres: political and institutional discourses (written texts for oral delivery), press (written newspaper articles), and literary works (ancient and modern narrative texts).
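The data representation described in this abstract can be illustrated with a minimal Python sketch. It builds a word-type x time-point contingency table from a toy corpus and applies an orthonormal Haar wavelet transform to one row. The corpus, the function names `word_by_time_table` and `haar_transform`, and the choice of the Haar basis are all assumptions made for illustration; the abstract itself relies on the R package curvclust and does not specify a wavelet family, so this is not the authors' procedure.

```python
import math
from collections import Counter


def word_by_time_table(docs_by_time, vocabulary):
    """Return {word: [count at each time-point]}, time-points in sorted order.

    Each row is the "temporal course" of one word-type.
    """
    time_points = sorted(docs_by_time)
    counts = {
        t: Counter(" ".join(docs_by_time[t]).lower().split())
        for t in time_points
    }
    return {w: [counts[t][w] for t in time_points] for w in vocabulary}


def haar_transform(signal):
    """Orthonormal Haar transform of a signal whose length is a power of two.

    Output layout: [coarsest approximation, coarsest detail, ..., finest details].
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    approx = [float(v) for v in signal]
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        detail = [(a - b) / math.sqrt(2) for a, b in pairs]
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
        details = detail + details  # coarser-scale details go in front
    return approx + details


# Hypothetical toy corpus: one short "document" per year.
docs_by_time = {
    1949: ["peace peace work"],
    1950: ["peace work"],
    1951: ["work"],
    1952: ["work work"],
}
table = word_by_time_table(docs_by_time, ["peace", "work"])
coefficients = haar_transform(table["peace"])
```

Because the transform is orthonormal, the sum of squared coefficients equals the sum of squared frequencies, so coefficients at different positions can be read as contributions from different time scales. Sparse rows with many zeros, as the abstract describes, yield many zero detail coefficients.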

    Going Beyond Counting First Authors in Author Co-citation Analysis

    The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based. We compare ACA results based on two types of co-citation counting: the traditional type, which counts only the first author of a cited work, and a non-traditional type, which takes into account the first 5 authors of a cited work. Results indicate that the picture produced through the non-traditional counting contains more coherent author groups and is therefore considerably clearer. However, when the same number of top-ranked authors is selected and analyzed, this picture represents fewer specialties in the research field than the one produced through traditional first-author counting. Reasons for these effects are discussed.
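The difference between the two counting schemes can be sketched in a few lines of Python. The function name `cocitation_counts`, the `max_authors` parameter, and the toy reference lists are hypothetical. The sketch also assumes that two authors are co-cited once per citing paper whenever both appear among the counted authors of that paper's references (including co-authors of the same cited work); the abstract does not spell out these details, so the study's exact definition may differ.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count author co-citations over a collection of citing papers.

    citing_papers: list of reference lists; each reference is a list of
    author names in byline order.
    max_authors: how many leading authors of each cited work are counted
    (1 = traditional first-author counting, 5 = first-five-authors counting).
    """
    counts = Counter()
    for reference_list in citing_papers:
        cited_authors = set()
        for authors in reference_list:
            cited_authors.update(authors[:max_authors])
        # Each unordered author pair is counted once per citing paper.
        for pair in combinations(sorted(cited_authors), 2):
            counts[pair] += 1
    return counts


# Hypothetical data: two citing papers, each citing two works.
citing_papers = [
    [["Smith", "Lee"], ["Jones"]],
    [["Smith", "Lee"], ["Jones", "Kim"]],
]
first_author = cocitation_counts(citing_papers, max_authors=1)
first_five = cocitation_counts(citing_papers, max_authors=5)
```

With first-author counting only the pair (Jones, Smith) appears; counting up to five authors brings Lee and Kim into the co-citation data as well.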

    Il lavoro raccontato dai laureati: analisi lessico-testuale delle professioni

    This contribution presents an exploratory analysis of the verbal statements provided by a sample of graduates of the Università di Padova about the job they held six months after graduation. The study, carried out with methods of lexical-textual statistical analysis, aims to compare the descriptions provided by the graduates with those in the pre-coded lists proposed by the Faculties, and to identify what is specific to the professional activities carried out. The findings, although preliminary because they refer to the first work experiences after graduation and are limited to three Faculties (Agriculture, Economics, and Statistics), provide interesting and useful indications for surveying the content of professional activity.

    Variations on the Author

    “Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourse; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.

    Appropriate Similarity Measures for Author Cocitation Analysis

    We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.
    Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
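To make the comparison concrete, here is a small Python sketch of the Pearson correlation and the cosine applied to two hypothetical cocitation profiles. It illustrates one easily verified property: appending entries in which both profiles are zero leaves the cosine unchanged but changes the Pearson correlation. Whether this is the argument the paper makes is not stated in the abstract; the profiles and function names are assumptions for illustration, and the paper's other two measures are not shown because the abstract does not name them.

```python
import math


def cosine(x, y):
    """Cosine of the angle between two equal-length profiles."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson(x, y):
    """Pearson correlation, computed as the cosine of the mean-centered profiles."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])


# Hypothetical cocitation profiles of two authors with two other authors.
x, y = [3, 1], [1, 3]
# The same profiles after adding two authors cocited with neither of them.
x_padded, y_padded = x + [0, 0], y + [0, 0]

cos_before, cos_after = cosine(x, y), cosine(x_padded, y_padded)
r_before, r_after = pearson(x, y), pearson(x_padded, y_padded)
```

In this toy case the cosine is 0.6 both before and after padding, while the Pearson correlation moves from -1 to 1/3, so the Pearson value depends on which unrelated authors happen to be included in the analysis.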

    Considerazioni per (non) concludere

    In this concluding chapter the author, who is the scientific lead of the research, discusses the results of the contributions to the collected volume.

    Shaping the history of words

    In textual analysis, many corpora include texts in chronological order, and in many cases this temporal connotation is crucial to understanding their inner structure. In a typical bag-of-words approach, data are organized in contingency tables, with rows reporting the frequency of each word over time-points (shown in columns). These discrete data (temporal patterns of frequencies) may be viewed as continuous objects represented by functional relationships. This study aimed at identifying a specific sequential pattern for each word as a functional object and at grouping these word patterns into clusters. A model-based clustering procedure is proposed, with specific reference to a corpus of end-of-year messages delivered by the ten Presidents of the Italian Republic covering the period from 1949 to 2011.

    Dispelling the Myths Behind First-author Citation Counts

    We conducted a full-scale evaluative citation analysis study of scholars in the XML research field to explore just how much author rankings produced by different citation counting methods actually differ, and to demonstrate the capability of emerging data and tools on the Web to support more realistic citation counting methods. Our results contest some common arguments for the continued use of first-author citation counts in the evaluation of scholars, such as high correlations between author rankings by first-author citation counts and by other citation counting methods, and the high cost of using more realistic citation counting methods that are not well supported by the ISI databases. It is argued that increasingly available digital full-text research papers make it possible for citation analysis studies to go beyond what the ISI databases have directly supported and to employ more sophisticated methods.

    Author Index

    Not provided