1,720,959 research outputs found

    UXWN: analysis and improvement of a logical form resource for NLP

    No full text
    Logical Form is an exceptionally important linguistic representation for highly demanding semantically related tasks like Question Answering. In this work I present different types of LF and in particular I investigate those resources that provide a LF of the WordNet Glosses. I then take a closer look to one of them, eXtended WordNet, by analysing its weaknesses and strengths. After classifying the most common errors of this resource, I semi automatically correct them and the result is a new resource: United eXtended WordNet

    Deftor at SemEval-2016 Task 14: Taxonomy enrichment using definition vectors

    No full text
    In this paper we describe the participation of the Joint Research Centre, EC, and Ca' Foscari University in task 14 - Semantic Taxonomy Enrichment at SemEval 2016. The algorithm which we propose transforms each candidate definition into a term vector, where each dimension represents a term and its value is calculated by TF.IDF. We attach the candidate term as a hyponym to the WordNet synset with the most similar definition. The results we obtained are en- couraging, considering the simplicity of our approach. The obtained F measure is below the average, but above one of the baselines

    A Logical Form Parser for Correction and Consistency Checking of LF resources

    Full text link
    In this paper we present ongoing work for the correction of Extended WordNet (XWN), the most extended freely downloadable resource of Logical Forms (LFs) – by the Human Language Technology Research Institute (HLTRI) of University of Texas at Dallas (UTD). In a previous paper we reported on type and number of errors detected in the 140,000 entries of the resource, which amounted to some 30%. This didn’t include problems related to inconsistencies from disconnected variables which were not computable at the time. We now created an LF parser that parses each entry after appropriate transformations. The parser has been created to count the number of disconnected variables, be they object variables or predicate event variables: the result is 56% of LFs containing some disconnected variable. We devised two procedures for correction: one lexical and the other structural which eventually allowed a dramatic reduction: the final count is now 24%. Additional work has been carried out to improve the general consistency by manual intervention on "inconsistent" outputs signaled by the parser and has reduce the number of errors to a reasonable percentage for such a resource, that is less that 15%

    SenTube: A Corpus for Sentiment Analysis on YouTube Social Media

    No full text
    In this paper we present SenTube -- a dataset of user-generated comments on YouTube videos annotated for information content and sentiment polarity. It contains annotations that allow to develop classifiers for several important NLP tasks: (i) sentiment analysis, (ii) text categorization (relatedness of a comment to video and/or product), (iii) spam detection, and (iv) prediction of comment informativeness. The SenTube corpus favors the development of research on indexing and searching YouTube videos exploiting information derived from comments. The corpus will cover several languages: at the moment, we focus on English and Italian, with Spanish and Dutch parts scheduled for the later stages of the project. For all the languages, we collect videos for the same set of products, thus offering possibilities for multi- and cross-lingual experiments. The paper provides annotation guidelines, corpus statistics and annotator agreement details

    Identifying Citation Contexts: a Review of Strategies and Goals.

    Full text link
    The Citation Contexts of a cited entity can be seen as little tesserae that, fit together, can be exploited to follow the opinion of the scientific community towards that entity as well as to summarize its most important contents. This mosaic is an excellent resource of information also for identifying topic specific synonyms, indexing terms and citers’ motivations, i.e. the reasons why authors cite other works. Is a paper cited for comparison, as a source of data or just for additional info? What is the polarity of a citation? Different reasons for citing reveal also different weights of the citations and different impacts of the cited authors that go beyond the mere citation count metrics. Identifying the appropriate Citation Context is the first step toward a multitude of possible analysis and researches. So far, Citation Context have been defined in several ways in literature, related to different purposes, domains and applications. In this paper we present different dimensions of Citation Context investigated by researchers through the years in order to provide an introductory review of the topic to anyone approaching this subject.Possiamo pensare ai Contesti Citazionali come tante tessere che, unite, possono essere sfruttate per seguire l’opinione della comunità scientifica riguardo ad un determinato lavoro o per riassumerne i contenuti più importanti. Questo mosaico di informazioni può essere utilizzato per identificare sinonimi specifici e Index Terms nonchè per individuare i motivi degli autori dietro le citazioni. Identificare il Contesto Citazionale ottimale è il primo passo per numerose analisi e ricerche. Il Contesto Citazionale è stato definito in diversi modi in letteratura, in relazione a differenti scopi, domini e applicazioni. In questo paper presentiamo le principali dimensioni testuali di Contesto Citazionale investigate dai ricercatori nel corso degli anni

    Treebanks of Logical Forms: they are Useful Only if Consistent

    Full text link
    Logical Forms are an exceptionally important linguistic representation for highly demanding semantically related tasks like Question/ Answering and Text Understanding, but their automatic production at runtime is higly error-prone. The use of a tool like XWNet and other similar resources would be beneficial for all the NLP community, but not only. The problem is: Logical Forms are useful as long as they are consistent, otherwise they would be useless if not harmful. Like any other resource that aims at providing a meaning representation, LFs require a big effort in manual checking order to reduce the number of errors to the minimum acceptable – less than 1% - from any digital resource. As will be shown in detail in the paper, the available resources – XWNet, WN30-lfs, ILF - suffer from lack of a careful manual checking phase, and the number of errors is too high to make the resource usable as is. We classified mistakes by their syntactic or semantic type in order to facilitate a revision of the resource that we intend to do using regular expressions. We also commented extensively on semantic issues and on the best way to represent them in Logical Forms

    Going Beyond Counting First Authors in Author Co-citation Analysis

    Full text link
    The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed

    Investigating Facets to Characterise Citations for Scholars

    No full text
    Citations within academic literature keep gaining more importance both for the work of scholars and for improving digital libraries related tools and services. We present in this article the preliminary results of an investigation on the characterisations of citations whose objective is to propose a framework for globally enriching citations with explicit information about their nature, role and characteristics. This article focuses on the set of properties we are studying to support the automatic analysis of large corpora of citations. This model is grounded on a literature review also detailed here, and has been submitted to a group of several hundreds of scholars of all disciplines in the form of a survey. The results confirm that these properties are perceived as useful

    Variations on the Author

    Full text link
    “Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer that rather than express opinions propels discourses; an interviewer that is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent form the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship
    corecore