1,721,104 research outputs found

    Les entités nommées pour le traitement automatique des langues

    The digitized, connected world produces large quantities of data. Automatically analyzing natural language is a major challenge for applications such as web search, news tracking, text mining, monitoring, and opinion analysis. Research in information extraction has shown the importance of certain units, such as the names of people, places, and organizations, dates, and amounts. The processing of these elements, known as "named entities", has led to the development of algorithms and resources used by computer systems. Both theoretical and practical, this book offers tools for defining these entities, identifying them, linking them to knowledge bases, and evaluating systems.

    Going Beyond Counting First Authors in Author Co-citation Analysis

    The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based. We compare ACA results based on two types of co-citation counting: the traditional type, which counts only the first author of a cited work, and a non-traditional type, which takes into account the first 5 authors of a cited work. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, when the same number of top-ranked authors is selected and analyzed, this picture represents fewer specialties in the research field being studied than the one produced through traditional first-author co-citation counting. Reasons for these effects are discussed.
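
    The two counting schemes compared in this abstract can be illustrated with a minimal sketch. It is not the study's implementation: the function name `cocitation_counts`, the `max_authors` parameter, and the toy data are hypothetical. It also assumes that two authors are co-cited once per citing paper whose reference list credits both, including co-authors of the same cited work, which is only one of several possible conventions.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count author co-citations over a set of citing papers.

    citing_papers: list of reference lists; each reference is an ordered
        list of author names (first author first).
    max_authors: number of leading authors credited per cited work
        (1 gives first-author counting, 5 gives first-five counting).
    """
    counts = Counter()
    for references in citing_papers:
        # Collect every author credited by this citing paper.
        credited = set()
        for authors in references:
            credited.update(authors[:max_authors])
        # Assumption: each unordered pair is counted once per citing paper.
        for pair in combinations(sorted(credited), 2):
            counts[pair] += 1
    return counts


# Hypothetical toy data: two citing papers, each citing two works.
papers = [
    [["A", "B"], ["C"]],
    [["A", "D"], ["C", "B"]],
]

first_only = cocitation_counts(papers, max_authors=1)
first_five = cocitation_counts(papers, max_authors=5)
print(first_only)
print(first_five)
```

    In this toy example, first-author counting only ever sees the pair A and C, while first-five counting also links B and D to the other authors.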

    Variations on the Author

    “Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who propels discourses rather than expressing opinions, an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.

    Appropriate Similarity Measures for Author Cocitation Analysis

    We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.
    Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
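
    To make the contrast between the Pearson correlation and the cosine concrete, here is a minimal sketch on two hypothetical cocitation profiles. The vectors are invented for illustration and the example is not taken from the paper. It checks one easily verified property: appending authors with whom neither profile is cocited (extra zeros) leaves the cosine unchanged but shifts the Pearson correlation. Whether this is the argument the authors make is an assumption.

```python
import math


def cosine(x, y):
    """Cosine of the angle between two equal-length profiles."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm  # undefined (division by zero) for an all-zero profile


def pearson(x, y):
    """Pearson correlation, computed as the cosine of mean-centred profiles."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])


# Hypothetical cocitation counts of two authors with three other authors.
x = [4, 2, 1]
y = [2, 4, 1]

# Same profiles after adding three authors with zero cocitations.
x_pad = x + [0, 0, 0]
y_pad = y + [0, 0, 0]

print("cosine :", cosine(x, y), cosine(x_pad, y_pad))
print("pearson:", pearson(x, y), pearson(x_pad, y_pad))
```

    The cosine depends only on the nonzero overlap of the two profiles, whereas the Pearson correlation changes because the added zeros shift both means.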

    Text Classification for Subjective Phenomena on Disaggregated Data and Rater Behaviour

    No full text
    Phenomena such as emotional experience and offensive language perception are highly subjective in nature. Yet the dominant approach to building automatic emotion and hate speech detection systems is based on the opinion of the majority. Recently, however, a personalised or human-centred approach has been proposed by computational social scientists. In the current paper, we propose a novel method for modelling individual perspective in emotion detection and abusive language recognition, following existing work in this area (Miłkowski et al., 2021). We show that the personalised approach that implements our Personalisation Metric (PM) outperforms traditional majority-based methods for subjective phenomena such as emotion and abusive language detection. The proposed method could be used to develop more accurate classification models suited to the opinions of individuals, as well as in recommendation systems.

    Dispelling the Myths Behind First-author Citation Counts

    We conducted a full-scale evaluative citation analysis study of scholars in the XML research field to explore how much author rankings resulting from different citation counting methods actually differ from each other, and to demonstrate the capability of emerging data and tools on the Web to support more realistic citation counting methods. Our results contest some common arguments for the continued use of first-author citation counts in the evaluation of scholars, such as high correlations between author rankings by first-author citation counts and by other citation counting methods, and the high cost of using more realistic citation counting methods that are not well supported by the ISI databases. It is argued that increasingly available digital full-text research papers make it possible for citation analysis studies to go beyond what the ISI databases have directly supported and to employ more sophisticated methods.

    Author Index

    No full text
    Not provided

    Quality of Research Articles and Neural Language Models : Applications to the Biomedical Domain

    No full text
    The quality of research articles in the biomedical domain is important: for example, it helps ensure that physicians make correct clinical decisions. However, the growing number of articles published each year makes it difficult for experts to assess this quality, so natural language processing (NLP) methods may prove useful in assisting them. This quality may also matter when training the NLP models used for biomedical tasks. Indeed, these models are often fine-tuned on large corpora of in-domain research articles to obtain better performance on domain-specific tasks. It is therefore important to verify which quality criteria can have an impact when adapting these models. In this thesis, we first address the automatic detection of quality problems in articles using neural models, and then data selection for training these models. For the detection of quality criteria, we focus on research articles reporting clinical trials. We try to identify problems that have not been explored before, or to improve the methods employed. These problems are the consistency between an article and the associated registry, and the completeness of the article. For article consistency, we fine-tune bidirectional encoders (general-domain and adapted to the medical domain) on task-specific corpora and build a system using these models. We then develop a graphical web interface to help domain experts access and visualize our methods. Next, to detect completeness, we use large autoregressive language models (testing general-domain and biomedical models), reformulating the evaluation of quality criteria as a question-answering task and taking advantage of in-context learning methods. Finally, we select data from a corpus of biomedical research articles to pre-train a bidirectional encoder language model for adaptation to the biomedical domain, using a confidence criterion: journal impact.

    koamabayili/VECTRON-author-checklist: VECTRON author checklist

    No full text
    We have done our best to complete the author checklist relating to the use of animals in the hut study. Note that the objective of the hut study was to evaluate the IRS treatment applications for residual efficacy against Anopheles mosquitoes, including the local An. coluzzii mosquito population. Cows were used only to attract mosquitoes into the huts, and no tests were carried out directly on the cows. The author checklist is intended for studies in which experiments are carried out on animals, which is why we had such difficulty completing it for the hut study: many of the questions do not relate to how the cows were used.