1,721,178 research outputs found

    PtiClic et PtiClic-Kids : jeux avec les mots permettant une acquisition lexicale par le joueur et par la machine

    No full text
    This paper presents two lexical games, PtiClic and PtiClic-Kids, which are based on two lexical acquisition methods: Latent Semantic Analysis (LSA) and JeuxDeMots (JDM). We first detail these two methods; although both produce relations between terms, they differ in the value, types, and directionality of those relations, and in how the relations are obtained. We then present the benefits of combining them to overcome their respective drawbacks. Next, we show how the games support lexical learning by users and lexical relation acquisition by the computer. Finally, we describe the overall architecture of the system and the information obtained, along with their benefit for NLP research. The games make it possible to gather data on the age level of terms and may help build such a lexicon, which would be very useful for applications like text generation.
    This article presents two lexical games that allow the user to acquire or consolidate knowledge about words, and the machine to build a general-purpose ontology. PtiClic and PtiClic-Kids are based on two lexical acquisition methods, namely Latent Semantic Analysis (LSA) and JeuxDeMots (JDM). We first present these two methods. We then set out the value of combining them, through these two games, to fill the gaps of each. Finally, we describe the games in detail: their target audience, their differences, and so on. We explain how they enable a twofold acquisition: of vocabulary by users and of lexical knowledge by the machine. The work is therefore of interest to both educational technology (TICE) and natural language processing (TALN). The data collected can lead to a lexicon tied to the age of acquisition of words and of relations between words, from which many applications, such as text correction or generation, can benefit.
    Zampa Virginie, Lafourcade Mathieu. PtiClic et PtiClic-Kids : jeux avec les mots permettant une acquisition lexicale par le joueur et par la machine. In: Sciences et Technologies de l'Information et de la Communication pour l'Éducation et la Formation, volume 18, 2011. TICE. pp. 135-156

    Going Beyond Counting First Authors in Author Co-citation Analysis

    Full text link
    The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed
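The two counting schemes compared in this abstract can be sketched as follows. This is a minimal illustration, not the study's actual procedure: the function name `cocitation_counts`, the `max_authors` parameter, the toy data, and the choice to count each author pair once per citing paper are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(citing_papers, max_authors=1):
    """Count author co-citations over a set of citing papers.

    citing_papers: list of reference lists; each reference is an ordered
    list of author names (hypothetical input format).
    max_authors=1 reproduces first-author counting; max_authors=5 credits
    the first five authors of each cited work.
    """
    counts = Counter()
    for references in citing_papers:
        # Collect every author credited by this citing paper.
        credited = set()
        for authors in references:
            credited.update(authors[:max_authors])
        # Each unordered pair of credited authors is co-cited once per paper.
        for pair in combinations(sorted(credited), 2):
            counts[pair] += 1
    return counts

# Toy data: two citing papers, each with two cited works.
papers = [
    [["Smith", "Lee"], ["Garcia"]],
    [["Smith", "Lee"], ["Chen", "Garcia"]],
]
first_author = cocitation_counts(papers, max_authors=1)
first_five = cocitation_counts(papers, max_authors=5)
```

In this toy data, Garcia appears as a non-first author in the second paper, so the pair (Garcia, Smith) is counted once under first-author counting and twice when the first five authors are credited.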

    Variations on the Author

    Full text link
    “Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourse; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.

    Appropriate Similarity Measures for Author Cocitation Analysis

    Full text link
    We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.
    Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
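One well-known difference between the two measures named here is their behaviour when zeros are appended to both profiles, for example when authors cited with neither of the two are added to the data set. The sketch below illustrates that property only; the abstract does not state the authors' own argument, and the profiles `a` and `b` are invented numbers.

```python
import math

def cosine(x, y):
    """Cosine similarity between two equal-length count vectors."""
    dot = sum(p * q for p, q in zip(x, y))
    norm_x = math.sqrt(sum(p * p for p in x))
    norm_y = math.sqrt(sum(q * q for q in y))
    return dot / (norm_x * norm_y)

def pearson(x, y):
    """Pearson correlation between two equal-length count vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y))
    sd_x = math.sqrt(sum((p - mean_x) ** 2 for p in x))
    sd_y = math.sqrt(sum((q - mean_y) ** 2 for q in y))
    return cov / (sd_x * sd_y)

# Hypothetical cocitation profiles of two authors.
a = [3, 0, 2, 1]
b = [0, 4, 1, 2]

# The same profiles after adding six authors cocited with neither.
a_padded = a + [0] * 6
b_padded = b + [0] * 6

print(cosine(a, b), cosine(a_padded, b_padded))
print(pearson(a, b), pearson(a_padded, b_padded))
```

The appended zeros leave the cosine unchanged because they add nothing to the dot product or the norms, whereas the Pearson correlation shifts because the means of both profiles change.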

    Endogenous consolidation of lexico-semantic networks

    No full text
    Developing lexico-semantic resources is a major issue in Natural Language Processing (NLP). By making explicit knowledge that only humans possess, these resources aim to give NLP applications a fine-grained and complete understanding of text. Popular new approaches to building such resources through crowdsourcing are emerging in NLP and have proved effective and relevant. However, the resulting resources are not free of erroneous information, nor of gaps caused by the absence of relevant semantic relations that are essential to their quality. In this PhD thesis, we use the French lexico-semantic network of the JeuxDeMots project as a case study and propose an endogenous consolidation system for this type of network. The system relies mainly on enriching the network by inferring and annotating new relations from existing ones, and on extracting inference rules able to (re)generate a large part of the network. Finally, we designed a domain-specific language for manipulating the consolidation system and the lexico-semantic network, and implemented a first prototype.
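Inferring new relations from existing ones can be illustrated with a single deduction rule over relation triples. This is a sketch under stated assumptions, not the thesis's system: the rule, the relation names `is_a` and `has_part`, the triple format, and the function `infer_by_deduction` are all hypothetical.

```python
def infer_by_deduction(relations):
    """Propose candidate (source, type, target) triples.

    Illustrative rule (an assumption): if A is_a B and B --t--> C holds,
    then A --t--> C is proposed as a candidate to be validated.
    """
    existing = set(relations)
    candidates = set()
    for source, rel_type, parent in existing:
        if rel_type != "is_a":
            continue
        # Inherit every outgoing relation of the parent term.
        for other_source, other_type, target in existing:
            if other_source == parent and target != source:
                candidate = (source, other_type, target)
                if candidate not in existing:
                    candidates.add(candidate)
    return candidates

# Toy network with invented terms.
network = [
    ("cat", "is_a", "feline"),
    ("feline", "is_a", "animal"),
    ("feline", "has_part", "claw"),
]
proposed = infer_by_deduction(network)
```

The function returns candidates rather than adding them to the network directly, since inferred relations can be wrong and would need validation or annotation before being kept.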

    Dispelling the Myths Behind First-author Citation Counts

    Full text link
    We conducted a full-scale evaluative citation analysis study of scholars in the XML research field to explore just how different from each other author rankings resulting from different citation counting methods actually are, and to demonstrate the capability of emerging data and tools on the Web in supporting more realistic citation counting methods. Our results contest some common arguments for the continued use of first-author citation counts in the evaluation of scholars, such as high correlations between author rankings by first-author citation counts and other citation counting methods, and high costs of using more realistic citation counting methods that are not well-supported by the ISI databases. It is argued that increasingly available digital full text research papers make it possible for citation analysis studies to go beyond what the ISI databases have directly supported and to employ more sophisticated methods

    Author Index

    No full text
    Not provided

    Inference in lexico-semantic network built by crowdsourcing

    No full text
    Thanks to the democratization of new communication technologies, a growing quantity of textual resources is available, making Natural Language Processing (NLP) a discipline of crucial importance both scientifically and industrially. These readily available data offer unprecedented opportunities, and the applications are numerous, from opinion analysis to information retrieval and semantic text analysis.
    However, textual data cannot easily be exploited in its raw state. To carry out such tasks, it seems essential to have resources describing semantic knowledge, particularly in the form of lexico-semantic networks such as that of the JeuxDeMots project. Building and maintaining such resources remain difficult, because of their large size and because of problems of polysemy and semantic identification. Moreover, using them can be tricky, because a significant part of the necessary information is not directly accessible in the resource and must be inferred from the data of the lexico-semantic network.
    Our work seeks to demonstrate that lexico-semantic networks are, by their connectionist nature, much more than a collection of raw facts, and that more complex structures such as interpretation paths contain more information and support multiple inference operations. In particular, we show how to use a knowledge base to provide explanations for high-level facts. At a minimum, these explanations make it possible to validate and memorize new information. In doing so, we can assess the coverage and relevance of the data in the base and consolidate it. Similarly, path search is useful for classification and disambiguation problems, since the paths serve as justifications for the computed results.
    In named entity recognition, paths make it possible both to type entities and to disambiguate them (is an occurrence of the term Paris a reference to the city, and which one, or to a starlet?) by highlighting the density of connections between the ambiguous entities, their context, and their possible type.
    Finally, we propose to turn the large size of the JeuxDeMots network to our advantage, enriching the base with new facts drawn from a large number of comparable examples and through a process of abduction over the types of semantic relations that can connect two given terms. Each inference is accompanied by explanations that can be validated or invalidated, thus providing a learning process.
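The idea of a path serving as the justification for an inferred fact can be sketched with a breadth-first search over relation triples. This is an illustration only: the function `find_path`, the `max_depth` limit, the relation names, and the toy edges are assumptions, and the thesis's actual path-search method is not described in the abstract.

```python
from collections import deque

def find_path(edges, start, goal, max_depth=3):
    """Return the shortest directed chain of triples from start to goal.

    edges: iterable of (source, relation, target) triples.
    Returns a list of triples, or None when no path of at most
    max_depth relations exists.
    """
    graph = {}
    for source, relation, target in edges:
        graph.setdefault(source, []).append((relation, target))

    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        if len(path) >= max_depth:
            continue
        for relation, target in graph.get(node, []):
            if target not in visited:
                visited.add(target)
                queue.append((target, path + [(node, relation, target)]))
    return None

# Toy network with invented relations.
edges = [
    ("Paris", "is_a", "capital"),
    ("capital", "is_a", "city"),
    ("Paris", "associated", "Eiffel Tower"),
]
explanation = find_path(edges, "Paris", "city")
```

Here the returned chain of triples plays the role of an explanation: it lists the existing relations that connect the term to the proposed type, and a person can accept or reject it.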