
    Towards a more general model of interlinear text

    Interlinear glossed text (IGT) is a complex object whose structure depends on factors such as its origin, its intended use and the languages involved. Developing tools and workflows for integrated linguistic analysis environments calls for particular attention to aspects that, in many common cases, can be disregarded as insignificant; collaboration on ELAN–FLEx integration was therefore a particular motivation for this paper. IGT is often conceived of as a tree: the root node corresponds to the whole text, which is subdivided into smaller units (sentences, words, morphemes). Each unit has a number of associated annotations, generally one per information type, such as a sentence translation, a part-of-speech label or a morpheme gloss.
However, an IGT can easily amount to a large set of trees. Unresolved ambiguities of all kinds are one reason for this. Each pair of alternative analyses (e.g. two concurrent parses of a word) implies two distinct trees, identical except for the node in question and all its descendants. The more ambiguities arise, the more underlying trees must be posited. Still, all trees in such a tree family stem from a single analyzed object (the transcript or the original orthographic representation). Since storing an entire tree for each combination of relevant alternatives is highly inefficient, a more compact storage model is needed.
Turning to the media dimension, an accurate transcript of spontaneous discourse is usually unsuitable for grammatical analysis without some preprocessing (normalization) that deals with speech errors, incomprehensible fragments etc. and produces a grammatically correct and coherent text for subsequent grammatical analysis, whereas the “raw” transcript feeds phonological and possibly discourse analysis. We thus get two distinct texts, interconnected but giving rise to independent (families of) analysis trees; only one of them is linked directly to the media timeline.
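The compact storage model called for above can be illustrated with a minimal sketch. It assumes a hypothetical model (not the paper's XML sketch) in which each word node keeps its alternative parses side by side, so that full trees are enumerated only on demand instead of being stored. The names `Word`, `Sentence`, `tree_count` and `expand`, as well as the toy English glosses, are invented for illustration.

```python
from dataclasses import dataclass, field
from itertools import product
from math import prod

# Hypothetical model: a parse is a list of (morpheme, gloss) pairs.
@dataclass
class Word:
    form: str
    parses: list = field(default_factory=list)  # alternative parses kept side by side

@dataclass
class Sentence:
    words: list

def tree_count(sentence):
    """Number of full analysis trees implied by the stored alternatives."""
    return prod(len(w.parses) for w in sentence.words)

def expand(sentence):
    """Lazily yield each full tree as a list of (form, chosen parse) pairs."""
    forms = [w.form for w in sentence.words]
    for choice in product(*(w.parses for w in sentence.words)):
        yield list(zip(forms, choice))

# Toy sentence with two ambiguous words (glosses are illustrative only).
s = Sentence([
    Word("she", [[("she", "3SG.F")]]),
    Word("saw", [[("see", "see.PST")], [("saw", "saw.N")]]),
    Word("leaves", [[("leaf", "leaf"), ("-s", "PL")],
                    [("leave", "leave"), ("-s", "3SG")]]),
])

print(tree_count(s))                        # 4
print(sum(len(w.parses) for w in s.words))  # 5 stored word analyses
for tree in expand(s):
    print(tree)
```

Here five stored word analyses stand in for four full three-word trees. Because the number of trees is the product of the alternatives per word, the saving grows quickly as ambiguities accumulate.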
In some scenarios, more than one media-based timeline emerges, and these timelines need to be interlinked (cf. the BOLD framework: sound annotations to sound events; retelling experiments, e.g. pear stories; sign languages translated from/into spoken languages). The reference axis may not even be a timeline proper (e.g. a text, or a path through a complex graphic image). Further complicating factors include multi-speaker and multilingual settings, collaboration and versioning. The overall structure (an XML sketch will be presented) might grow too complex for any specialized analysis component to handle. It may thus be efficient to use an intermediate repository, e.g. a unified underlying RDF representation [Nakhimovsky et al. 2012], into which all changes made in specific tools are merged.
References
Bow, Cathy, Baden Hughes and Steven Bird. 2003. Towards a General Model of Interlinear Text.
Nakhimovsky, Alexander, Jeff Good and Tom Myers. 2012. Interoperability of Language Documentation Tools and Materials for Local Communities. In: Digital Humanities 2012.

    INEL Kamas Corpus

    Corpus Citation
Gusev, Valentin; Klooster, Tiina. 2018. “INEL Kamas Corpus.” Version 0.1. Publication date 2018-12-31. https://hdl.handle.net/11022/0000-0007-CAE6-2. Archived in Hamburger Zentrum für Sprachkorpora. In: Wagner-Nagy, Beáta; Arkhipov, Alexandre; Ferger, Anne; Jettka, Daniel; Lehmberg, Timm (eds.). 2018. The INEL corpora of indigenous Northern Eurasian languages.
Corpus Description
The INEL Kamas corpus has been created within the long-term INEL project ("Grammatical Descriptions, Corpora and Language Technology for Indigenous Northern Eurasian Languages"), 2016–2033. The corpus enables typologically aware, corpus-based grammatical research on the Kamas language and expands the documentation of the lesser-described indigenous languages of Northern Eurasia. The INEL Kamas corpus consists of two parts: folklore texts collected by Kai Donner in 1912–1914, and transcribed audio recordings of the last speaker of Kamas, Klavdiya Plotnikova, made between 1964 and 1970. Each text in the corpus is provided with morphological glossing, translation into English, Russian and German, as well as annotation of Russian borrowings. Some texts also have annotations for syntactic functions, semantic roles and information status.
Funding
The corpus has been produced in the context of the joint research funding of the German Federal Government and Federal States in the Academies’ Programme, with funding from the Federal Ministry of Education and Research and the Free and Hanseatic City of Hamburg. The Academies’ Programme is coordinated by the Union of the German Academies of Sciences and Humanities.
Contributions/Acknowledgements
Recordings of Kamas speech made by Ago Künnap in Abalakovo and by Tiit-Rein Viitso in Tartu, as well as the digitized fragment of the surviving copy of Kai Donner’s phonograph recording, provided by the Archive of Estonian Dialects and Kindred Languages of the University of Tartu, Estonia (AEDKL, or TÜEMSA).
Recordings of Klavdiya Plotnikova made by Jaakko Yli-Paavola in Tallinn in 1970, provided by the KOTUS Archive, Helsinki.
Scanned pages from [Joki 1944] containing texts collected by Kai Donner, published online courtesy of the Finno-Ugrian Society.

    INEL Dolgan Corpus

    Corpus Citation
Däbritz, Chris Lasse; Kudryakova, Nina; Stapert, Eugénie. 2019. "INEL Dolgan Corpus." Version 1.0. Publication date 2019-08-31. https://hdl.handle.net/11022/0000-0007-CAE7-1. Archived in Hamburger Zentrum für Sprachkorpora. In: Wagner-Nagy, Beáta; Arkhipov, Alexandre; Ferger, Anne; Jettka, Daniel; Lehmberg, Timm (eds.). The INEL corpora of indigenous Northern Eurasian languages.
Corpus Description
The INEL Dolgan corpus has been created within the long-term INEL project ("Grammatical Descriptions, Corpora and Language Technology for Indigenous Northern Eurasian Languages"), 2016–2033. The corpus enables typologically aware, corpus-based grammatical research on the Dolgan language and expands the documentation of the lesser-described indigenous languages of Northern Eurasia. The INEL Dolgan corpus is composed of texts from different sources:
1. Published folklore texts from an edited volume ("Fol'klor Dolgan", P.E. Efremov 2000).
2. Transcripts of recordings obtained from the Taymyr House of Folk Art (TDNT) in Dudinka (1970s–2000s).
3. Transcripts from the collection of Dr. Eugénie Stapert, recorded on several fieldwork trips in 2007–2010.
4. Transcripts of recordings made on a fieldwork trip in 2017.
The first group, as well as parts of the third group, had already been transcribed and translated; the rest of the recordings were transcribed and translated within the INEL project. Each text in the corpus is provided with morphological glossing, translation into English, Russian and German, as well as annotation of Russian borrowings. Some texts also have annotations for syntactic functions, semantic roles and information structure/information status.
Funding
The corpus has been produced in the context of the joint research funding of the German Federal Government and Federal States in the Academies’ Programme, with funding from the Federal Ministry of Education and Research and the Free and Hanseatic City of Hamburg. The Academies’ Programme is coordinated by the Union of the German Academies of Sciences and Humanities.

    INEL Selkup Corpus

    Corpus Citation
Brykina, Maria; Orlova, Svetlana; Wagner-Nagy, Beáta. 2018. "INEL Selkup Corpus." Version 0.1. Publication date 2018-12-31. Archived in Hamburger Zentrum für Sprachkorpora. https://hdl.handle.net/11022/0000-0007-CAE5-3. In: Wagner-Nagy, Beáta; Arkhipov, Alexandre; Ferger, Anne; Jettka, Daniel; Lehmberg, Timm (eds.). 2018. The INEL corpora of indigenous Northern Eurasian languages.
Corpus Description
The INEL Selkup corpus has been created within the long-term INEL project ("Grammatical Descriptions, Corpora and Language Technology for Indigenous Northern Eurasian Languages"), 2016–2033. The corpus enables typologically aware, corpus-based grammatical research on the Selkup language and expands the documentation of the lesser-described indigenous languages of Northern Eurasia. The INEL Selkup corpus is composed of texts from the archive of Angelina Ivanovna Kuzmina (1924–2002), who, between 1962 and 1977, gathered a large amount of material on Selkup in almost all regions where the Selkup people lived. Most texts in the corpus originate from the handwritten part of the archive; the others come from sound recordings made by A.I. Kuzmina, which were transcribed and translated within the INEL project. Each text in the corpus is provided with morphological glossing, translation into English, Russian and German, as well as annotation of Russian borrowings. Some texts also have annotations for syntactic functions, semantic roles and information status.
Funding
The corpus has been produced in the context of the joint research funding of the German Federal Government and Federal States in the Academies’ Programme, with funding from the Federal Ministry of Education and Research and the Free and Hanseatic City of Hamburg. The Academies’ Programme is coordinated by the Union of the German Academies of Sciences and Humanities.
Contributions/Acknowledgements
Sound materials of Angelina Kuzmina were transcribed and translated by native speakers of Selkup:
Svetlana Nikitichna Sankevich (Kunina): oral transcription and Russian translation of texts in Northern dialects
Evgeniya Sergeevna Smorgunova (Irikova): oral and written transcription and Russian translation of audio texts in Northern dialects
Valentina Vladimirovna Tamel’kina: oral transcription and Russian translation of audio texts in Northern dialects
The web-based search interface uses the Tsakonian Corpus platform developed by Dr. Timofey Arkhangelskiy, Humboldt Research Fellow at IFUU, Hamburg University.

    Going Beyond Counting First Authors in Author Co-citation Analysis

    The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based. We compare ACA results based on two types of co-citation counting: the traditional type, which counts only the first author of each cited work, and a non-traditional type, which takes into account the first five authors of each cited work. Results indicate that the picture produced by this non-traditional counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field under study than the one produced by traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
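A minimal sketch of the two counting schemes being compared, assuming each citing paper is given as a list of references and each reference as its cited work's author list in byline order. The function name and toy data are hypothetical, and the convention that each author pair is counted at most once per citing paper (coauthors of the same cited work included) is an assumption; the study's exact rules may differ.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(citing_papers, max_authors):
    """Count author co-citations over a set of citing papers.

    citing_papers: one reference list per citing paper; each reference is
    the cited work's author list in byline order.
    max_authors: how many leading authors of each cited work to count
    (1 gives traditional first-author counting, 5 the first-five variant).
    Assumption: each unordered pair of distinct authors is counted at most
    once per citing paper, coauthors of one cited work included.
    """
    counts = Counter()
    for references in citing_papers:
        authors = set()
        for ref in references:
            authors.update(ref[:max_authors])
        # Sorting makes each unordered pair a canonical (a, b) key.
        for pair in combinations(sorted(authors), 2):
            counts[pair] += 1
    return counts

# Hypothetical toy data: two citing papers, each citing two works.
papers = [
    [["Smith", "Lee"], ["Chen"]],
    [["Lee", "Smith"], ["Chen", "Park"]],
]

first_author = cocitation_counts(papers, max_authors=1)
first_five = cocitation_counts(papers, max_authors=5)
print(first_author)  # the Lee-Smith pair never appears
print(first_five)
```

With `max_authors=1`, Lee is co-cited with Chen in only one paper and never with Smith, although both names appear in both reference lists; with `max_authors=5`, each of these pairs reaches a count of 2.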

    INEL Kamas Corpus

    Corpus Citation
Gusev, Valentin; Klooster, Tiina; Wagner-Nagy, Beáta. 2019. "INEL Kamas Corpus." Version 1.0. Publication date 2019-12-15. http://hdl.handle.net/11022/0000-0007-DA6E-9. Archived in Hamburger Zentrum für Sprachkorpora. In: Wagner-Nagy, Beáta; Arkhipov, Alexandre; Ferger, Anne; Jettka, Daniel; Lehmberg, Timm (eds.). The INEL corpora of indigenous Northern Eurasian languages.
Corpus Description
The INEL Kamas corpus has been created within the long-term INEL project ("Grammatical Descriptions, Corpora and Language Technology for Indigenous Northern Eurasian Languages"), 2016–2033. The corpus enables typologically aware, corpus-based grammatical research on the Kamas language and expands the documentation of the lesser-described indigenous languages of Northern Eurasia. The INEL Kamas corpus consists of two parts: folklore texts collected by Kai Donner in 1912–1914, and transcribed audio recordings of the last speaker of Kamas, Klavdiya Plotnikova, made between 1964 and 1970. Each text in the corpus is provided with morphological glossing, translation into English, Russian and German, as well as annotation of syntactic functions, semantic roles, Russian borrowings and code-switching. Some texts also have annotations for information status.
New in release 1.0
All of Klavdiya Plotnikova’s transcripts are now published, including all the tapes from the KOTUS archive, as well as the two recordings of Aleksandra Semyonova (21 more texts in total). All texts are now annotated for syntactic functions and semantic roles. Numerous corrections have been made to glosses and other annotations.
Funding
The corpus has been produced in the context of the joint research funding of the German Federal Government and Federal States in the Academies’ Programme, with funding from the Federal Ministry of Education and Research and the Free and Hanseatic City of Hamburg. The Academies’ Programme is coordinated by the Union of the German Academies of Sciences and Humanities.
Contributions/Acknowledgements
Recordings of Kamas speech made by Ago Künnap in Abalakovo and by Tiit-Rein Viitso in Tartu, provided by the Archive of Estonian Dialects and Kindred Languages of the University of Tartu, Estonia (AEDKL, or TÜEMSA).
Recordings of Klavdiya Plotnikova made by Jaakko Yli-Paavola in Tallinn in 1970, provided by the Institute for the Languages of Finland archive, Helsinki (KOTUS).
Scanned pages from Kai Donners Kamassisches Wörterbuch (Joki 1944) containing texts collected by Kai Donner, published online courtesy of the Finno-Ugrian Society.
The web-based search interface uses the Tsakonian Corpus platform developed by Dr. Timofey Arkhangelskiy.
Partner Organizations
The INEL project benefited greatly from cooperation with our partner institutions:
Institute of the World Culture, M.V. Lomonosov Moscow State University, Moscow
Department of Languages of the Peoples of Siberia, Tomsk State Pedagogical University, Tomsk
Institute of Philology, Siberian Branch of the Russian Academy of Sciences, Novosibirsk
Taymyr House of Folk Art, Dudinka
Arctic State Institute of Culture and Arts, Yakutsk

    Variations on the Author

    “Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, released posthumously in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often appears as an interviewer who, rather than expressing opinions, propels discourse; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but is nonetheless markedly present on screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, by contrast, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.

    Appropriate Similarity Measures for Author Cocitation Analysis

    We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that using the Pearson correlation to measure the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss which similarity measures may be used as alternatives to the Pearson correlation, considering three in particular: the well-known cosine, and two measures that have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings are of high practical relevance.
Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
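A minimal sketch contrasting the Pearson correlation and the cosine on two cocitation profiles. The profile vectors are invented for illustration, and the two new measures the paper proposes are not reproduced here. The sketch shows one property often raised in this debate: appending authors with whom neither author is cocited (zeros in both profiles) leaves the cosine unchanged but shifts the Pearson correlation.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def pearson(a, b):
    """Pearson correlation of two equal-length count vectors."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical cocitation profiles of authors X and Y against four others.
profile_x = [5, 3, 0, 1]
profile_y = [4, 2, 1, 0]

# Append two authors cocited with neither X nor Y.
extended_x = profile_x + [0, 0]
extended_y = profile_y + [0, 0]

print(cosine(profile_x, profile_y), cosine(extended_x, extended_y))
print(pearson(profile_x, profile_y), pearson(extended_x, extended_y))
```

Here the cosine stays at about 0.96, while the Pearson correlation moves from about 0.90 to about 0.93 merely because the set of authors under study grew.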

    Dispelling the Myths Behind First-author Citation Counts

    We conducted a full-scale evaluative citation analysis study of scholars in the XML research field to explore how much the author rankings produced by different citation counting methods actually differ from one another, and to demonstrate the capability of emerging data and tools on the Web to support more realistic citation counting methods. Our results contest some common arguments for the continued use of first-author citation counts in the evaluation of scholars, such as high correlations between author rankings by first-author citation counts and by other citation counting methods, and the high cost of using more realistic citation counting methods that are not well supported by the ISI databases. We argue that the increasing availability of digital full-text research papers makes it possible for citation analysis studies to go beyond what the ISI databases have directly supported and to employ more sophisticated methods.
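A minimal sketch of how the choice of counting method can change an author ranking. The abstract does not specify which alternative methods the study compared; the "all" (every author credited in full) and "fractional" (each citation split equally among authors) rules below are two common schemes assumed here for illustration, and the function names and data are hypothetical.

```python
from collections import Counter

def citation_counts(cited_works, scheme):
    """Aggregate citations to cited works into per-author counts.

    cited_works: list of (author_list, times_cited) tuples.
    scheme: "first" credits only the first author; "all" credits every
    author in full; "fractional" splits each work's citations equally.
    """
    counts = Counter()
    for authors, cites in cited_works:
        if scheme == "first":
            counts[authors[0]] += cites
        elif scheme == "all":
            for author in authors:
                counts[author] += cites
        elif scheme == "fractional":
            for author in authors:
                counts[author] += cites / len(authors)
        else:
            raise ValueError(f"unknown scheme: {scheme}")
    return counts

def ranking(counts):
    """Authors by descending count, ties broken alphabetically."""
    return [a for a, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]

# Hypothetical cited works: (byline, number of citations received).
works = [
    (["Adams", "Baker"], 10),
    (["Baker"], 4),
    (["Clark", "Baker", "Adams"], 9),
]

for scheme in ("first", "all", "fractional"):
    print(scheme, ranking(citation_counts(works, scheme)))
```

In this toy data, switching from first-author to all-author counting reshuffles the whole ranking: Baker, who leads only one byline, moves from last place to first.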