Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
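The two counting schemes compared in this abstract can be illustrated with a minimal sketch. The function name `cocitation_counts`, the toy reference lists, and the choice to count co-authors of the same cited work as co-cited are all assumptions made for illustration; the abstract does not specify these details.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists, max_authors=1):
    """Count author co-citations over a set of citing papers.

    reference_lists: one entry per citing paper; each entry is a list of
        cited works, and each cited work is an ordered list of author names.
    max_authors: how many leading authors of each cited work are considered
        (1 = traditional first-author counting, 5 = first five authors).
    """
    counts = Counter()
    for cited_works in reference_lists:
        authors = set()
        for work_authors in cited_works:
            authors.update(work_authors[:max_authors])
        # Assumption: each author pair is counted once per citing paper,
        # including pairs of co-authors on the same cited work.
        for pair in combinations(sorted(authors), 2):
            counts[pair] += 1
    return counts


# Hypothetical toy data: two citing papers, each citing two works.
reference_lists = [
    [["White", "McCain"], ["Small"]],
    [["White", "Griffith"], ["Small", "Garfield"]],
]

first_only = cocitation_counts(reference_lists, max_authors=1)
first_five = cocitation_counts(reference_lists, max_authors=5)
```

With first-author counting only the White/Small pair is observed, whereas counting the first five authors also surfaces pairs involving McCain, Griffith, and Garfield.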
ENABLING EFFECTIVE ARABIC INFORMATION RETRIEVAL ON THE WEB AND SOCIAL MEDIA
Arabic is one of the most dominant languages on the Web and social media. The huge and ever-growing Arabic user generated content, further motivated by the ongoing political unrest in the region, created an immense need for Information Retrieval (IR) systems to support users in consuming and analyzing Arabic content at such scale. In the past decade, tasks like ad hoc retrieval, event detection, document summarization, and fake news detection became of great importance to Arab users. However, research on developing IR systems for these tasks over Arabic content is severely lacking, as compared to higher-resource languages like English. This dissertation makes an argument that the main reason behind the slow progress in the development of Arabic IR systems is the lack of language resources. In particular, there is a severe shortage of standardized, large-scale, and representative test collections and annotated datasets, needed for system training and evaluation. The main goal of this dissertation is to motivate research on Arabic IR by providing necessary evaluation resources, baseline systems, and alternative approaches to training and evaluation of IR systems. To that end, two IR tasks were identified as important and underdeveloped for Arabic content, namely, ad hoc retrieval and misinformation detection. Each task was investigated over two domains: the Web and social media (Twitter in particular). For the ad hoc retrieval task, an approach for constructing test collections without the need for a shared-task evaluation campaign is proposed. As a result, two large-scale and manually annotated test collections were constructed starting from recent snapshots of each of the Arabic Web and Arabic Twittersphere. Moreover, state-of-the-art retrieval models that were previously tested over English content were benchmarked over the new test collections, providing baseline performance for future systems.
The constructed test collections were shown to include high quality annotations, motivating creation of similar test collections for other problems and domains, with relatively low cost. As for the misinformation detection problem, I focus on two components that are usually part of the claim verification pipeline followed to address this problem. In particular, this work tackles two problems: (1) claim check-worthiness identification, and (2) evidence retrieval for verification. Claim check-worthiness detection is the problem of identifying claims that should be prioritized for verification. Once a claim is identified to be verified, evidence retrieval involves searching for documents that contain information supporting or denying the claim. This thesis describes the process of creating the first Arabic annotated datasets for the two tasks. Furthermore, for claim check-worthiness detection, studied within the social media domain, I extensively study whether we can avoid creating a dedicated Arabic training dataset to train an effective system for the task. To achieve that, I consider cross-lingual transfer learning, where a supervised model trained on non-Arabic data is applied to an Arabic test set. The study demonstrated that cross-lingual transfer learning from some languages to Arabic is comparable to monolingual models exclusively trained on Arabic. For evidence retrieval, I study the suitability of relying on topical relevance as the main approach to evaluate the task in the Web domain. Moreover, I run an extended study on the effectiveness of Web search systems in retrieving documents containing evidence as opposed to documents topically relevant to a claim. My study shows that pages (retrieved by a commercial search engine) that are topically relevant to a claim are not always useful for verifying it. Given the aforementioned finding, I investigate and identify characteristics or features specific to evidential pages.
Furthermore, preliminary experiments show that a supervised evidential-page retrieval model that employs these features achieves a 5.3% increase in recall of evidential pages over the search engine.
Variations on the Author
“Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourses; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.
QU at TREC-2014: Online Clustering with Temporal and Topical Expansion for Tweet Timeline Generation
In this work, we present our participation in the microblog track in TREC-2014, building upon our first participation last year. We present our approaches for the two tasks of this year: temporally-anchored ad-hoc search and tweet timeline generation. For the ad-hoc search task, we used topical expansion in addition to temporal models to perform retrieval. Our results show that our run based on the typical pseudo relevance feedback query expansion outperformed all of our other runs with a relatively high mean average precision (MAP). As for the timeline generation task, we approached this problem using online incremental clustering of tweets retrieved for a given query. Our approach allows the dynamic creation of "semantic" clusters while providing a framework for detecting redundant tweets and selecting representative ones to be added to the final timeline. The results demonstrate that using incremental clustering of tweets retrieved through a temporal retrieval model produced the best effectiveness among the submitted runs. This work was made possible by NPRP grant# NPRP 6-1377-1-257 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.
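The idea of online incremental clustering for timeline generation can be sketched roughly as follows. This is not the system described in the abstract: the Jaccard similarity on whitespace tokens, the `threshold` value, the use of the first tweet as the cluster representative, and the names `jaccard` and `build_timeline` are all assumptions chosen to keep the example short.

```python
def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def build_timeline(tweets, threshold=0.5):
    """Process tweets one at a time, in retrieval order.

    A tweet joins the most similar existing cluster if the similarity
    reaches the threshold (it is then treated as redundant); otherwise
    it starts a new cluster. The timeline holds one representative
    (here: the first tweet) per cluster.
    """
    clusters = []  # each cluster: {"tokens": set, "members": list}
    for text in tweets:
        tokens = set(text.lower().split())
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = jaccard(tokens, cluster["tokens"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(text)
        else:
            clusters.append({"tokens": tokens, "members": [text]})
    return [cluster["members"][0] for cluster in clusters]


# Hypothetical retrieved tweets for one query.
tweets = [
    "quake hits city center",
    "quake hits city center today",
    "rescue teams arrive",
]
timeline = build_timeline(tweets)
```

In this toy run the second tweet is absorbed into the first cluster as redundant, so the timeline contains only the first and third tweets.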
Appropriate Similarity Measures for Author Cocitation Analysis
We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance. Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
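For readers unfamiliar with the two named measures, the following minimal sketch computes the cosine and the Pearson correlation for a pair of cocitation profiles. The profile values are made up, and the comparison with appended zeros is an illustration added here, not necessarily the argument made in the paper; the two other measures mentioned in the abstract are not named there and are not shown.

```python
import math


def cosine(x, y):
    """Cosine of the angle between two equally long vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson(x, y):
    """Pearson correlation, computed as the cosine of the mean-centered vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])


# Hypothetical cocitation profiles of two authors with three other authors.
x = [3, 1, 0]
y = [1, 3, 0]

# The same profiles after adding two authors with whom neither is cocited.
x_padded = x + [0, 0]
y_padded = y + [0, 0]

cos_before, cos_after = cosine(x, y), cosine(x_padded, y_padded)
r_before, r_after = pearson(x, y), pearson(x_padded, y_padded)
```

In this toy case the cosine is 0.6 both before and after padding, while the Pearson correlation rises from about 0.14 to about 0.41, which shows that the two measures can react differently to shared zeros.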
Dispelling the Myths Behind First-author Citation Counts
We conducted a full-scale evaluative citation analysis study of scholars in the XML research field to explore just how different from each other author rankings resulting from different citation counting methods actually are, and to demonstrate the capability of emerging data and tools on the Web in supporting more realistic citation counting methods. Our results contest some common arguments for the continued use of first-author citation counts in the evaluation of scholars, such as high correlations between author rankings by first-author citation counts and other citation counting methods, and high costs of using more realistic citation counting methods that are not well-supported by the ISI databases. It is argued that increasingly available digital full text research papers make it possible for citation analysis studies to go beyond what the ISI databases have directly supported and to employ more sophisticated methods.
