    Discourse cues: Further evidence for the core contributor distinction

    Moser and Moore (1995, to appear) carried out a corpus study of discourse cues in tutorial dialogue. Their annotation uses Relational Discourse Analysis (RDA), which distinguishes core elements (nucleus-like) from contributors (satellite-like). In their discussion of these results, Moser and Moore propose that clauses in contributor-core order are harder to understand than clauses in core-contributor order, but they do not attempt to explain why the "hard" order is ever used. Here, we recruit evidence from work by Stevenson and her collaborators that substantiates this empirical claim. We then suggest that by distinguishing information structure (given-new) from intentional structure (core-contributor), we can explain why hard orders are surprisingly frequent. We note, however, that this cannot be the whole story, and show how the hierarchical RDA structure helps account for differences between discourse cues such as since, so, this means, and therefore.

    Differentiating Document Type and Author Personality for Linguistic Features

    There are many ways to profile a collection of documents. This paper presents highlights from a body of work that has examined individual differences in the language of personal weblogs. First, we present a measure of linguistic contextuality that can be used to profile and rank genres. Applying it to weblogs, we show that they are similar to school essays, yet significantly less contextual than e-mail. We then look at individual variation in language due to the personality of the author. We show that with just a few linguistic features, it is possible to explain significant proportions of the variance in personality traits.

    Whose Thumb is It Anyway?: Classifying Author Personality from Weblog Text

    We report initial results on the relatively novel task of automatically classifying author personality. Using a corpus of personal weblogs, or 'blogs', we investigate the accuracy that can be achieved when classifying authors on four important personality traits. We explore both binary and multi-class classification, using different sets of n-gram features. Results are promising for all four traits examined.

    The Language of Weblogs: A study of genre and individual differences

    Institute for Communicating and Collaborative Systems
    This thesis describes a linguistic investigation of individual differences in online personal diaries, or 'blogs.' There is substantial evidence of gender differences in language (Lakoff, 1975), and to a lesser extent of the linguistic projection of personality (Pennebaker & King, 1999). Recent work has investigated these latter differences in computer-mediated communication (CMC), specifically e-mail (Gill, 2004). This thesis employs a number of analytic techniques, both top-down (dictionary-based) and bottom-up (data-driven), to explore personality and gender differences in the language of blogs. A corpus was constructed by asking authors to submit a month of text and complete a sociobiographic questionnaire. The corpus consists of over 400,000 words and five-factor personality data (Buchanan, 2001) for 71 subjects. The thesis begins by framing blogs in the context of other genres, both CMC and traditional, to show both the distinctiveness and the representativeness of the genre. Top-down content analysis techniques are then employed to investigate the relationship between personality and linguistic features. A number of features correlate with each trait, but upon regression, very little variance is explained. Bottom-up techniques are more successful. The corpus was stratified into high, low and neutral personality groups to identify distinctive collocations for each. Returning to the raw personality scores, it becomes clear that even a small amount of n-gram context helps account for much more variance in personality. A measure of contextuality (Heylighen & Dewaele, 2002) shows that authors high in Agreeableness pay more attention to differences between their extra-linguistic context and that of their audience. Attention then turns to gender, where similar methods are applied to investigate gender differences in language. Many previous findings are confirmed in the blog corpus.
In addition, women are found to write more in their blogs than men. More generally, using the British National Corpus, it is shown that women are more contextual, except in the least contextual of genres (academic writing), where there is no difference. The study concludes by confirming that both gender and personality are projected by language in blogs; furthermore, approaches that take the context of language features into account can detect more variation than those that do not.
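The contextuality measure cited above (Heylighen & Dewaele, 2002) is commonly computed as the F-score: each word-class frequency is a percentage of all words, and F = (noun + adjective + preposition + article - pronoun - verb - adverb - interjection + 100) / 2, with higher values meaning less contextual text. The sketch below computes it from a list of coarse part-of-speech tags. The tag names, the function name, and the assumption that tagging has already been done are ours for illustration; the thesis may apply the measure differently.

```python
from collections import Counter

# Word classes whose frequency raises the score (less contextual, more formal)
# and those whose frequency lowers it (more contextual, deictic), following the
# Heylighen & Dewaele (2002) F-score.
FORMAL_CLASSES = ("noun", "adjective", "preposition", "article")
DEICTIC_CLASSES = ("pronoun", "verb", "adverb", "interjection")


def f_score(pos_tags):
    """Return the F-score of a text given its coarse part-of-speech tags.

    Each class frequency is a percentage of all tagged words; tags outside
    both groups (e.g. "conjunction") count toward the total only.
    Scores range from 0 to 100; higher means less contextual.
    """
    if not pos_tags:
        raise ValueError("need at least one tagged word")
    counts = Counter(pos_tags)
    total = len(pos_tags)

    def pct(tag):
        return 100.0 * counts[tag] / total

    formal = sum(pct(tag) for tag in FORMAL_CLASSES)
    deictic = sum(pct(tag) for tag in DEICTIC_CLASSES)
    return (formal - deictic + 100.0) / 2.0


# A toy four-word text split evenly between formal and deictic classes.
print(f_score(["pronoun", "verb", "article", "noun"]))  # 50.0
```

Because the deictic classes are subtracted, a text made only of pronouns and verbs scores 0, and one made only of nouns scores 100.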

    An Event-Driven Distribution Model for Automatic Insertion of Illustrations in Narrative Discourse: A Study Based on the Shahnama Narrative

    Institute for Communicating and Collaborative Systems
    Book designers and manuscript artists have inserted illustrations into narrative works for centuries. This practice is a form of intelligent behaviour that requires specialised knowledge of the text and of the external parameters affecting the selection and placement criteria. This thesis offers a model for automating the insertion of illustrations into a narrative discourse. The model presented here is a significant improvement over the crudest method of dividing the text into equal parts and inserting one illustration into each part. This study starts from the position that narratives are expressions of mental representations of a sequence of events in various modes of discourse. Here, this mental representation is referred to as ‘the story’. When coupled with a mode of discourse, the story becomes a narrative. Thus, a story can be expressed as an oral, written, pictorial, or film narrative. If these narratives all express the same sequence of events, they are telling the same story. In an illustrated narrative, while the written discourse expresses the event sequence in the form of sentences, illustrations depict it using pictorial elements. The insertion of illustrations into a written narrative is analogous to collating two texts into one, based on their event content. In this process, sentential representations of events are collated against the pictorial expressions of the same events. Thus, for the purposes of automation, this study claims that an investigation into the locations of events can reveal potential locations for illustration insertions. The list of potential illustration locations can be improved further by eliminating events that are not depictable. The model can also refine the insertion policy by incorporating event constraints as parameters for event priorities.
If a set of event types is given preference in the illustration policy, the model can prioritise the list accordingly. The model also allows the same degree of customisation for preferred characters, locations, or times in the story. The prioritisation can be applied to the entire narrative or to smaller chunks of the narrative text, such as chapters or sections. The model is developed by studying the verb roots of sentences, which denote the event types, in the discourse of Mohl’s critical edition of the Shāhnāma, the Persian epic composed by Abu al Qāsim Firdausī in 400/1010. A collection of 109 illustrated manuscripts of the Shāhnāma was considered in this study. These manuscripts come from various traditions of Persian painting and cover a long period, from the early 14th century to the late 19th century. A population of nearly 6,000 Shāhnāma illustrations was annotated, with each illustration linked to a sentence in the narrative. A bottom-up study of the verb distribution in the written discourse against the distribution of illustration locations indicates that the illustration distribution follows the same trend as the distribution of depictable events in the discourse. Particular event tokens displayed a high rate of illustration, marking them as all-time favourite events. In summary, this study claims that investigating the distribution of events in a narrative discourse provides a model for inserting illustrations into a narrative work.
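The selection policy described above (discard non-depictable events, then prioritise preferred event types) can be sketched roughly as follows, next to the equal-parts baseline it improves on. This is a minimal illustration under our own assumptions, not the thesis's model: events are assumed to be already extracted from sentence verb roots, the `Event` fields, the depictability flag, and both function names are hypothetical, and "preference" is reduced to a simple rank ordering by event type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    sentence: int      # index of the sentence whose verb root denotes the event
    event_type: str    # hypothetical label, e.g. the verb root "fight"
    depictable: bool   # hypothetical flag: can the event be shown pictorially?


def equal_parts_baseline(num_sentences, num_illustrations):
    """Crude baseline: cut the text into equal parts and place one
    illustration at the start of each part."""
    if num_illustrations <= 0:
        return []
    step = num_sentences / num_illustrations
    return [int(i * step) for i in range(num_illustrations)]


def event_driven_locations(events, num_illustrations, preferred_types=frozenset()):
    """Pick illustration sites from depictable events.

    Non-depictable events are discarded; events of a preferred type are
    ranked ahead of the rest, and ties keep narrative order.
    Returns the chosen sentence indices in reading order.
    """
    candidates = [e for e in events if e.depictable]
    ranked = sorted(
        candidates,
        key=lambda e: (e.event_type not in preferred_types, e.sentence),
    )
    return sorted(e.sentence for e in ranked[:num_illustrations])


story = [
    Event(0, "journey", True),
    Event(3, "speech", False),
    Event(5, "fight", True),
    Event(8, "feast", True),
]
print(equal_parts_baseline(10, 2))                           # [0, 5]
print(event_driven_locations(story, 2))                      # [0, 5]
print(event_driven_locations(story, 2, {"fight", "feast"}))  # [5, 8]
```

The same ranking could be run separately on each chapter or section by passing only that chunk's events.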

    Going Beyond Counting First Authors in Author Co-citation Analysis

    The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting: the traditional type, which counts only the first of a cited work's authors, and a non-traditional type, which takes into account the first five authors of a cited work. Results indicate that the picture produced by this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field under study than the one produced by traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
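The two counting schemes compared above can be sketched as follows. The sketch assumes that each citing paper adds at most one co-citation to a given author pair, and that co-authors within the counted positions of a single cited work are also co-cited with each other; the abstract does not specify these details, so the study's actual rules may differ. The function name and data layout are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count how often each pair of authors is co-cited.

    citing_papers: one reference list per citing paper; each reference is a
    list of author names in byline order.
    max_authors=1 gives traditional first-author counting;
    max_authors=5 counts the first five authors of every cited work.
    """
    counts = Counter()
    for references in citing_papers:
        # Collect every counted author cited by this paper (a set, so each
        # paper adds at most one to any author pair).
        cited_authors = set()
        for authors in references:
            cited_authors.update(authors[:max_authors])
        # Sort so each pair has one canonical (alphabetical) key.
        for pair in combinations(sorted(cited_authors), 2):
            counts[pair] += 1
    return counts


papers = [
    [["Smith", "Jones"], ["Lee"]],
    [["Jones", "Smith"], ["Lee", "Kim"]],
]
first_author = cocitation_counts(papers)
first_five = cocitation_counts(papers, max_authors=5)
print(first_author[("Jones", "Smith")], first_five[("Jones", "Smith")])  # 0 2
```

In the toy data, Smith is a first author only in the first citing paper and Jones only in the second, so first-author counting never pairs them, while counting the first five authors co-cites them in both papers.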