
    Detection and Aptness: A study in metaphor detection and aptness assessment through neural networks and distributional semantic spaces

    Metaphor is one of the most prominent, and most studied, figures of speech. While it is of great interest in several branches of linguistics, such as semantics, pragmatics and stylistics, its automatic processing remains an open challenge. First, the semantic complexity of the concept of metaphor itself creates a range of theoretical complications. Second, the lack of large-scale resources for machine learning approaches forces researchers to work under conditions of data scarcity. This compilation thesis presents a set of experiments to (i) automatically detect metaphors and (ii) assess a metaphor's aptness with respect to a given literal equivalent. The first task has already been tackled by a number of studies; we use it to assess the potential and limitations of our approach before dealing with the second task. For metaphor detection we were able to use existing resources, while we created our own dataset to explore metaphor aptness assessment, which constitutes the most innovative part of this research. In all of the studies presented here, we use a combination of word embeddings and neural networks. This combination appears particularly effective since pre-trained word embeddings can provide the networks with the information necessary to deal with metaphors under conditions of data scarcity. To deal with metaphor aptness assessment, we frame the problem as a case of paraphrase identification. Given a sentence containing a metaphor, the task is to find the best literal paraphrase from a set of candidates. We build a dataset designed for this task that allows a gradient scoring of various paraphrases with respect to a reference sentence, so that paraphrases are ordered according to their degree of aptness. We can therefore use it for both binary classification and ordering tasks. This dataset is annotated through crowdsourcing by an average of 20 annotators for each pair.
We then design a deep neural network to be trained on this dataset. We show that its architecture is able to achieve encouraging levels of performance, despite the serious data scarcity under which it is applied. In the final experiment of this compilation, more context is added to a subset of the dataset in order to study the effect of extended context on metaphor aptness rating. We show that extended context changes human perception of metaphor aptness and that this effect is reproduced by our neural classifier. The conclusion of the last study is that extended context compresses aptness scores towards the center of the scale, raising low ratings and decreasing high ratings relative to those given to paraphrase candidates outside of any context.
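
The abstract above says that gradient aptness scores support both binary classification and ordering. A minimal sketch of how one set of crowd ratings could yield both views follows. The function name `aptness_views`, the 1-4 rating scale, the threshold of 2.5 and the toy ratings are all assumptions made for illustration; none of them comes from the thesis.

```python
from statistics import mean

def aptness_views(ratings_by_candidate, threshold=2.5):
    """Derive a ranking and binary labels from per-candidate annotator ratings.

    ratings_by_candidate: dict mapping a candidate paraphrase id to a list of
    numeric ratings (hypothetical 1-4 scale).
    threshold: assumed cut-off separating apt from non-apt paraphrases.
    """
    # Gradient score: the mean rating given by the annotators.
    scores = {cand: mean(r) for cand, r in ratings_by_candidate.items()}
    # Ordering task: candidates sorted from most to least apt.
    ranking = sorted(scores, key=scores.get, reverse=True)
    # Binary task: a candidate counts as apt if its mean rating reaches the threshold.
    labels = {cand: score >= threshold for cand, score in scores.items()}
    return scores, ranking, labels

# Toy ratings (invented for illustration only).
toy_ratings = {"p1": [4, 3, 4], "p2": [1, 2, 1], "p3": [3, 2, 3]}
scores, ranking, labels = aptness_views(toy_ratings)
print(ranking)
print(labels)
```

Keeping the mean score as the primary annotation means the threshold can be changed later without re-annotating.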

    Deep-learning the Ropes: Modeling Idiomaticity with Neural Networks

    In this work we explore the possibility of training a neural network to classify and rank idiomatic expressions under constraints of data scarcity. We discuss our results, comparing them both to other unsupervised models designed to perform idiom detection and to similar supervised classifiers trained to detect metaphoric bigrams.

    Going Beyond Counting First Authors in Author Co-citation Analysis

    The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based. We compare ACA results based on two different types of co-citation counting: the traditional type, which counts only the first author of a cited work, and a non-traditional type, which takes into account the first five authors of a cited work. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
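
The two counting schemes compared in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's procedure: the function name `cocitation_counts`, the input layout and the toy data are hypothetical. Two authors are treated as co-cited once per citing paper whenever both appear among the counted authors of that paper's references, which also pairs co-authors of the same cited work.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(citing_papers, max_authors=1):
    """Count author co-citations across a set of citing papers.

    citing_papers: list of reference lists; each reference is a list of
    author names in byline order.
    max_authors: number of leading authors of each cited work to count
    (1 = first-author counting, 5 = first-five-authors counting).
    """
    counts = Counter()
    for references in citing_papers:
        # Collect the authors credited through this paper's references.
        cited_authors = set()
        for authors in references:
            cited_authors.update(authors[:max_authors])
        # Every unordered pair of credited authors is co-cited once.
        for pair in combinations(sorted(cited_authors), 2):
            counts[pair] += 1
    return counts

# Toy data (invented): two citing papers, each citing two works.
papers = [
    [["Smith", "Lee"], ["Jones", "Kim"]],
    [["Smith"], ["Kim", "Jones"]],
]
print(cocitation_counts(papers, max_authors=1))
print(cocitation_counts(papers, max_authors=5))
```

In the toy data, Kim is first author of one work and second author of another, so first-author counting never pairs Jones with Kim, while first-five counting does.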

    Contextual Distribution for Textual Alignments


    Variations on the Author

    “Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourse; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.

    L'alignement Automatique des Traductions d'Homère : le Péril de l'Incalculable

    Natural language processing applied to translation has a maximalist aim: the principle is to provide a practical solution to every aporia of language. Any machine translator first treats language as a geometric space (allowing movement within a language-space) that is necessary (every element of language forming part of a system). The machine translator generates a translation on the assumption that each word in the source language and in the target language has a position that is only marginally arbitrary, and that this arbitrariness can itself be controlled by taking into account how often it recurs. The machine translator thus builds models, identifying the structure of the source sequence and reproducing it, transformed, in the target sequence. The position of words in the geometry of a sequence must therefore not be entirely arbitrary, and the translator's success is proportional to the size of the corpus that allows it to create and modify its models. Our program for the automatic alignment of translations starts from the same principle. We have a corpus of 207 French translations of Homer's Odyssey, and we wish to compare them automatically. The corpus consists of translations reissued at least once, from the 16th century to the 20th century. By translation aligner we mean a program that splits each text into series of identifiable logical sequences, associates each sequence of a source text with a target text, and, within these aligned sequences, gives each source word or expression its equivalent in the target sequence.
Unlike the machine translator, since we are not required to produce a translation but merely align existing translations with one another, we let the user set the degree of tolerance for sequences that will remain unaligned and are therefore considered impossible to associate with the source text. Finally, to align all the translations with one another (by assigning identifiers to each of their sequences according to the source text), we must also approach the Greek source text in a maximalist way: we exclude no fragment that might have been translated in any of the translations. If an added fragment is apocryphal, this is visible in the interface, since only a few translators will have translated it. Conversely, if a translation does not translate one of the passages of the source text, this is also visible (the target text is flagged as lacunary). The merit of this approach is that the untranslatable becomes a matter of the reader's choice: depending on whether or not the user allows the program to leave sequences unaligned, the translation may be lacunary or maximalist. In either case, the choice concerns less the untranslatable than the incalculable. Our sequence alignment algorithm involves no semantic recognition or identification, or only to a very small degree. We determine the association between source text and target text not by the meaning of the words but by their geometric dissemination, taking the text as a complex global structure. This approach therefore shifts the frequently encountered problem of the untranslatable: what we face is not so much the untranslatable from one language to another, as an aporia of meaning at every level, but the incalculable from one linguistic system, as a geometrically determined element, to another.
Here we compare the approach of the human translator with that of the machine translator when faced with the Homeric text, and then show that the aporias encountered are empirically of a single kind: the incompatibility of one linguistic schema with another, whether semantic or statistical. Finally, we show that the traditional approach to translation in natural language processing cannot, in the long term, dispense with a philological and semantic approach to language.

    Appropriate Similarity Measures for Author Cocitation Analysis

    We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.
    Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
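
To make the contrast between the Pearson correlation and the cosine concrete, here is a small sketch with invented cocitation profiles. It illustrates one well-known behavioural difference between the two measures, which is not necessarily the argument made in the paper: appending entries where both authors have zero cocitations leaves the cosine unchanged but can shift the Pearson correlation substantially. The two further measures the paper proposes are not named in the abstract and are not shown.

```python
import math

def cosine(x, y):
    """Cosine of the angle between two cocitation profiles."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def pearson(x, y):
    """Pearson correlation, computed as the cosine of the mean-centred profiles."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])

# Hypothetical cocitation counts of two authors with three other authors.
profile_a = [5, 1, 3]
profile_b = [1, 5, 3]

# The same profiles after adding five authors cocited with neither of them.
padded_a = profile_a + [0] * 5
padded_b = profile_b + [0] * 5

print(cosine(profile_a, profile_b), cosine(padded_a, padded_b))
print(pearson(profile_a, profile_b), pearson(padded_a, padded_b))
```

With these toy numbers the cosine is the same for both pairs, while the Pearson correlation moves from strongly negative to positive once the zero entries are added.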

    Textual Alignment and Semantic Analysis of the Homeric Poems and selected Italian Translations between the XVIII and the XXI century

    The aim of this work is both to build a program that automatically aligns the original Homeric poems with their Italian translations (literary and free translations included) produced from the XVIII to the XXI century, and to show what kinds of analysis these alignments could allow. Through time, translations have changed, trying at the same time to respect the text and to adapt to the aesthetic paradigms of their epoch and of the translator himself. After a brief history of Italian translations of Homer, where I give a chronological account of the principal Italian translations of the Homeric poems between the XIV and the XXI centuries, I develop the two main parts of my work. In Part I, I explain the working principles of the textual aligner. After a summary of the state of the art in textual alignment in section 1.1 and an explanation of the reasons that drove me to choose proper names as anchor words (section 1.2), I proceed to give a detailed explanation of the program's mechanics in the following sections. In section 2.1 I give an overview of the algorithm in its main steps; in section 2.2 I explain in detail how the text is segmented and how the anchor words are extracted and paired; in section 2.3 I summarize the principles of the Needleman-Wunsch algorithm; in section 2.4 I explain the mechanisms of the post-processing phase, where the alignment results are refined and enhanced. Some examples of the behavior of the aligner on different kinds of translations are given in section 2.5. Section 2.6 gives a very brief account of the performance of the aligner for translations in different European languages. Part II is devoted to the analysis of Italian translations of Homer. Sections 3.1 and 3.2 supply the state of the art and an explanation of the fundamental principles of distributional semantics.
To analyze Italian translations, I chose a set of Ancient Greek terms and a set of their Italian translations, and I studied the similarity of those terms in both the Ancient Greek and the Italian texts. Section 4.1 presents the selected terms and explains how the Ancient Greek words were chosen. To find their most widespread Italian counterparts I used both manual inspection and a method of automatic extraction, to which section 4.2 is dedicated. Chapter 5 shows the results of this analysis: section 5.1 discusses some quantitative aspects of Italian translations, such as the average period length or the semantic distance, and section 5.2 considers in detail the distributional similarities between the selected words in Ancient Greek and Italian texts. Finally, sections 5.3 and 5.4 examine some polysemy issues related to translation, such as the ways various multivocal words present in Homer were translated over time.
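
The abstract names the Needleman-Wunsch algorithm and proper names as anchor words. A minimal sketch of textbook Needleman-Wunsch global alignment applied to two sequences of anchor names follows. The scoring values (match +1, mismatch -1, gap -1) and the toy name sequences are assumptions, and the thesis's own segmentation, anchor pairing and post-processing steps are not reproduced here.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two sequences; return (score, list of aligned pairs).

    A pair (x, None) or (None, y) marks an element aligned to a gap.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    pairs.reverse()
    return score[n][m], pairs

# Toy anchor sequences (invented): the translation omits one proper name.
source_anchors = ["Odysseus", "Athena", "Telemachus", "Zeus"]
target_anchors = ["Odysseus", "Telemachus", "Zeus"]
print(needleman_wunsch(source_anchors, target_anchors))
```

In practice the Greek and Italian forms of a name would first need to be mapped to a common identifier before equality comparison; the sketch assumes that mapping has already been done.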