ENABLING EFFECTIVE ARABIC INFORMATION RETRIEVAL ON THE WEB AND SOCIAL MEDIA
Arabic is one of the most dominant languages on the Web and social media. The huge and ever-growing volume of Arabic user-generated content, further motivated by the ongoing political unrest in the region, has created an immense need for Information Retrieval (IR) systems to support users in consuming and analyzing Arabic content at such scale. In the past decade, tasks like ad hoc retrieval, event detection, document summarization, and fake news detection became of great importance to Arab users. However, research on developing IR systems for these tasks over Arabic content is severely lacking compared to higher-resource languages like English. This dissertation argues that the main reason behind the slow progress in the development of Arabic IR systems is the lack of language resources. In particular, there is a severe shortage of the standardized, large-scale, and representative test collections and annotated datasets needed for system training and evaluation. The main goal of this dissertation is to motivate research on Arabic IR by providing necessary evaluation resources, baseline systems, and alternative approaches to training and evaluation of IR systems. To that end, two IR tasks were identified as important and underdeveloped for Arabic content, namely, ad hoc retrieval and misinformation detection. Each task was investigated over two domains: the Web and social media (Twitter in particular). For the ad hoc retrieval task, an approach for constructing test collections without the need for a shared-task evaluation campaign is proposed. As a result, two large-scale and manually annotated test collections were constructed starting from recent snapshots of the Arabic Web and the Arabic Twittersphere. Moreover, state-of-the-art retrieval models that were previously tested over English content were benchmarked over the new test collections, providing baseline performance for future systems.
The constructed test collections were shown to include high-quality annotations, motivating the creation of similar test collections for other problems and domains at relatively low cost. As for the misinformation detection problem, I focus on two components that are usually part of the claim verification pipeline followed to address this problem. In particular, this work tackles two problems: (1) claim check-worthiness identification, and (2) evidence retrieval for verification. Claim check-worthiness detection is the problem of identifying claims that should be prioritized for verification. Once a claim is selected for verification, evidence retrieval involves searching for documents that contain information supporting or denying the claim. This thesis describes the process of creating the first Arabic annotated datasets for the two tasks. Furthermore, for claim check-worthiness detection, studied within the social media domain, I extensively study whether we can avoid creating a dedicated Arabic training dataset to train an effective system for the task. To achieve that, I consider cross-lingual transfer learning, where a supervised model trained on non-Arabic data is applied to an Arabic test set. The study demonstrated that cross-lingual transfer learning from some languages to Arabic is comparable to monolingual models exclusively trained on Arabic. For evidence retrieval, I study the suitability of relying on topical relevance as the main approach to evaluate the task in the Web domain. Moreover, I run an extended study on the effectiveness of Web search systems in retrieving documents containing evidence, as opposed to documents that are topically relevant to a claim. My study shows that pages (retrieved by a commercial search engine) that are topically relevant to a claim are not always useful for verifying it. Given the aforementioned finding, I investigate and identify characteristics or features specific to evidential pages.
Furthermore, preliminary experiments show that a supervised evidential-page retrieval model that employs these features achieves a 5.3% increase in recall of evidential pages over the search engine
ARABIC QUESTION ANSWERING ON THE HOLY QUR'AN
In this dissertation, we address the need for an intelligent machine reading at scale (MRS) Question Answering (QA) system on the Holy Qur'an, given the permanent interest of inquisitors and knowledge seekers in this sacred and fertile knowledge resource. We adopt a pipelined Retriever-Reader architecture for our system to constitute (to the best of our knowledge) the first extractive MRS QA system on the Holy Qur'an. We also construct QRCD as the first extractive Qur'anic Reading Comprehension Dataset, composed of 1,337 question-passage-answer triplets for 1,093 question-passage pairs that comprise single-answer and multi-answer questions in modern standard Arabic (MSA). We then develop a sparse bag-of-words passage retriever over an index of Qur'anic passages expanded with Qur'an-related MSA resources to help in bridging the gap between questions posed in MSA and their answers in Qur'anic Classical Arabic (CA). Next, we introduce CLassical AraBERT (CL-AraBERT for short), a new AraBERT-based pre-trained model that is further pre-trained on a Classical Arabic dataset of about 1.05B words (after being initially pre-trained on MSA datasets), to make it a better fit for NLP tasks on CA text such as the Holy Qur'an. We leverage cross-lingual transfer learning from MSA to CA, and fine-tune CL-AraBERT as a reader using a couple of MSA-based MRC datasets followed by fine-tuning it on our QRCD dataset, to bridge the above MSA-to-CA gap and circumvent the lack of MRC datasets in CA. Finally, we integrate the retriever and reader components of the end-to-end QA system such that the top k retrieved answer-bearing passages to a given question are fed to the fine-tuned CL-AraBERT reader for answer extraction. We first evaluate the retriever and the reader components independently, before evaluating the end-to-end QA system using Partial Average Precision (pAP).
We introduce pAP as an adapted version of the traditional rank-based Average Precision measure, which integrates partial matching in the evaluation over multi-answer and single-answer questions. Our experiments show that a passage retriever over a BM25 index of Qur'anic passages expanded with two MSA resources significantly outperformed a baseline retriever over an index of Qur'anic passages only. Moreover, we empirically show that the fine-tuned CL-AraBERT reader model significantly outperformed the similarly fine-tuned AraBERT model, which is the baseline. In general, the CL-AraBERT reader performed better on single-answer questions than on multi-answer questions, and it outperformed the baseline over both types of questions. Furthermore, despite the integral contribution of fine-tuning with the MSA datasets in enhancing the performance of the readers, relying exclusively on those datasets (without MRC datasets in CA, e.g., QRCD) may not be sufficient for our reader models. This finding demonstrates the relatively high impact of the QRCD dataset (despite its modest size). As for the QA system, it consistently performed better on single-answer questions than on multi-answer questions. However, our experiments provide enough evidence to suggest that a native BERT-based model architecture fine-tuned on the MRC task may not be intrinsically optimal for multi-answer questions
ON RELEVANCE FILTERING FOR REAL-TIME TWEET SUMMARIZATION
Real-time tweet summarization (RTS) systems require mechanisms for capturing relevant tweets, identifying novel tweets, and capturing timely tweets. In this thesis, we tackle the RTS problem with a main focus on relevance filtering. We experimented with different traditional retrieval models.
Additionally, we propose two extensions to alleviate the sparsity and topic drift challenges that affect relevance filtering. For sparsity, we propose leveraging word embeddings in Vector Space Model (VSM) term weighting to enable the system to use semantic similarity alongside lexical matching. To mitigate the effect of topic drift, we exploit explicit relevance feedback to enhance the profile representation so that it copes with the topic's development in the stream over time.
We conducted extensive experiments over three standard English TREC test collections that were built specifically for RTS. Although the extensions do not generally exhibit better performance, they are comparable to the baselines used.
Moreover, we extended an event detection Arabic tweets test collection, called EveTAR, to support tasks that require novelty in the system's output. We collected novelty judgments using in-house annotators and used the collection to test our RTS system. We report preliminary results on EveTAR using different models of the RTS system. This work was made possible by NPRP grants # NPRP 7-1313-1-245 and # NPRP 7-1330-2-483 from the Qatar National Research Fund (a member of Qatar Foundation)
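The idea of combining lexical matching with embedding-based semantic similarity in VSM term weighting can be illustrated with a minimal sketch. The scoring formula below (an exact match counts as 1.0, otherwise the best cosine similarity above a threshold), the function names, the threshold value, and the toy embeddings are all assumptions made for illustration; the abstract does not specify the actual weighting scheme used in the thesis.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def soft_match(term, tweet_terms, embeddings, threshold=0.7):
    """Return 1.0 on an exact lexical match; otherwise the best embedding
    similarity to any tweet term, if it reaches the (assumed) threshold."""
    if term in tweet_terms:
        return 1.0
    if term not in embeddings:
        return 0.0
    best = 0.0
    for other in tweet_terms:
        if other in embeddings:
            best = max(best, cosine(embeddings[term], embeddings[other]))
    return best if best >= threshold else 0.0


def relevance_score(profile_weights, tweet_terms, embeddings, threshold=0.7):
    """Weighted fraction of the interest profile matched by the tweet."""
    total = sum(profile_weights.values())
    if total == 0:
        return 0.0
    tweet_set = set(tweet_terms)
    matched = sum(
        weight * soft_match(term, tweet_set, embeddings, threshold)
        for term, weight in profile_weights.items()
    )
    return matched / total


# Hypothetical 3-dimensional embeddings, for illustration only.
TOY_EMBEDDINGS = {
    "earthquake": [0.9, 0.1, 0.0],
    "quake": [0.85, 0.15, 0.05],
    "football": [0.0, 0.2, 0.95],
}

profile = {"earthquake": 2.0, "damage": 1.0}
on_topic = relevance_score(profile, ["quake", "damage"], TOY_EMBEDDINGS)
off_topic = relevance_score(profile, ["football"], TOY_EMBEDDINGS)
```

In this toy setup, a tweet that says "quake" still scores highly against a profile containing "earthquake", which pure lexical matching would miss; this is the sparsity problem the extension targets.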
SparkIR: a Scalable Distributed Information Retrieval Engine over Spark
Search engines have to deal with a huge amount of data (e.g., billions of
documents in the case of the Web) and find scalable and efficient ways to produce
effective search results. In this thesis, we propose to use the Spark framework, an
in-memory distributed big data processing framework, and leverage its powerful
capabilities for handling large amounts of data to build an efficient and scalable
experimental search engine over textual documents. The proposed system, SparkIR,
can serve as a research framework for conducting information retrieval (IR)
experiments. SparkIR supports two indexing schemes, document-based partitioning
and term-based partitioning, to adopt document-at-a-time (DAAT) and term-at-a-time
(TAAT) query evaluation methods. Moreover, it offers static and dynamic pruning to
improve the retrieval efficiency. For static pruning, it employs champion lists and
tiering, while for dynamic pruning, it uses MaxScore top-k retrieval. We evaluated the
performance of SparkIR using the ClueWeb12-B13 collection, which contains about 50M
English Web pages. Experiments over different subsets of the collection, compared
against the Elasticsearch baseline, show that SparkIR exhibits reasonable efficiency
and scalability performance overall for both indexing and retrieval. Implemented as
an open-source library over Spark, users of SparkIR can also benefit from other Spark
libraries (e.g., MLlib and GraphX), which, therefore, eliminates the need of usin
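Two of the techniques named in the abstract above, term-at-a-time (TAAT) query evaluation and champion lists for static pruning, can be sketched on a single machine as follows. This is a minimal in-memory illustration, not SparkIR's implementation: the function names are hypothetical, scoring uses raw term frequency as a simplifying assumption, and no Spark API is involved.

```python
from collections import defaultdict


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, tokens in docs.items():
        for token in tokens:
            index[token][doc_id] = index[token].get(doc_id, 0) + 1
    return index


def champion_lists(index, r):
    """Static pruning: keep only the r highest-tf postings for each term."""
    pruned = {}
    for term, postings in index.items():
        top = sorted(postings.items(), key=lambda p: (-p[1], p[0]))[:r]
        pruned[term] = dict(top)
    return pruned


def taat_search(index, query_terms, k):
    """Term-at-a-time evaluation: traverse one posting list per query term,
    adding partial scores to per-document accumulators."""
    accumulators = defaultdict(float)
    for term in query_terms:
        for doc_id, tf in index.get(term, {}).items():
            accumulators[doc_id] += tf  # raw tf as a stand-in for a real weight
    ranked = sorted(accumulators.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Toy collection, for illustration only.
DOCS = {
    "d1": ["spark", "search", "engine", "spark"],
    "d2": ["search", "engine", "index"],
    "d3": ["spark", "spark", "spark", "cluster"],
    "d4": ["spark"],
}

full_index = build_index(DOCS)
pruned_index = champion_lists(full_index, r=2)
full_results = taat_search(full_index, ["spark", "search"], k=2)
pruned_results = taat_search(pruned_index, ["spark", "search"], k=3)
```

With r=2, the posting for d4 is dropped from the "spark" list, so the pruned index does less work per query at the risk of missing low-weight documents. Document-at-a-time evaluation would instead advance through all query terms' posting lists in parallel and finish scoring one document before moving to the next.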
RETRIEVAL OF AUTHORITIES AND THEIR EVIDENCE FOR RUMOR VERIFICATION IN ARABIC SOCIAL MEDIA
Social media platforms have become a medium for rapidly spreading rumors along with emerging events. Those rumors may have a lasting effect on users' opinions even after they are debunked, and may continue to influence them if not replaced with convincing evidence. Journalists, or even normal users, who attempt to verify a rumor over social media try to find a trusted source of evidence that can help them confirm or deny that specific rumor. A strong source of evidence for verifying a rumor is an authority who has the "real knowledge or power" to verify it if asked to. This dissertation contributes towards addressing the problem of rumor verification in social media. We propose augmenting the traditional rumor verification pipeline, which considers the propagation networks and the Web as sources of evidence, by incorporating authorities as another source of evidence. Specifically, in this dissertation, we introduce the problem of rumor verification using evidence from authorities, which we believe can help fact-checkers and automated rumor verification systems to find the right authorities and evidence from their Twitter timelines, hence helping in the verification process. First, we propose authority finding in Twitter. We then suggest incorporating those retrieved authorities by detecting their stance towards rumors in Twitter, and retrieving evidence from their timeline tweets. Finally, we propose rumor verification using evidence retrieved from those authorities.
To address the problem, we construct and release three datasets targeting the Arabic language, namely: (1) the first Authority FINding in Twitter (AuFIN) dataset, which comprises 150 rumors (expressed in tweets) associated with a total of 1,044 authority accounts and a user collection of 395,231 Twitter accounts (members of 1,192,284 unique Twitter lists); (2) the first Authority STance towards Rumors (AuSTR) dataset, which comprises 811 (rumor tweet, authority tweet) pairs relevant to 292 unique rumors; and (3) the first Authority-Rumor-Evidence Dataset (AuRED), which comprises 160 rumors expressed in tweets and 692 Twitter timelines of authorities comprising about 34k annotated tweets in total. We propose a hybrid retrieval authority finding model that combines lexical and semantic signals in addition to user profiles and network features. Furthermore, we investigate the usefulness of existing Arabic datasets for stance towards claims for detecting the stance of authorities. Finally, we study the effectiveness of existing fact-checking models for evidence retrieval from authorities and rumor verification using the retrieved evidence. Our experimental results suggest that Twitter lists and network features such as follower and followee counts, adopted previously for topic expert finding models, play a crucial role in authority finding; however, they are insufficient. This motivates the need to explore other features to differentiate experts from authorities. Moreover, our proposed hybrid model incorporating lexical, semantic, and user network features achieved a modest performance, 0.41 as precision at depth 1, which indicates that finding authorities is a challenging task and that there is still room for continued enhancement. Our results also highlighted that adopting existing Arabic stance datasets for claim verification is somewhat useful but clearly insufficient for detecting the stance of authorities.
Moreover, we found that AuSTR alone, despite its limited size, can be sufficient for detecting the stance of authorities, achieving a performance of 0.84 macro-F1 and 0.78 F1 on debunking tweets. Our investigation of the effectiveness of existing fact-checking (claim verification using evidence from Wikipedia pages) models on our problem highlighted that although evidence retrieval for fact-checking models perform relatively well on evidence retrieval from authorities, establishing strong baselines achieving 0.70 as recall at depth 5, there is still considerable room for improvement. However, existing claim verification for fact-checking models perform poorly on rumor verification using evidence from authorities, 0.42 as macro-F1, no matter how good the retrieval performance is. Moreover, existing fact-checking datasets showed potential for transfer learning to our problem; however, further investigation using different setups and datasets is required. Furthermore, drawing upon our experiments, we discuss failure factors and make recommendations for future research directions in addressing this problem. Additionally, our approach establishes a strong baseline for future studies targeting automatic rumor verification in social media, and our constructed datasets can facilitate further research on the problem. Finally, our proposed system can be integrated into verification systems, and can also be exploited by fact-checkers or journalists to find trusted sources of evidence
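The measures reported in the abstract above (precision at depth 1, recall at depth 5, and macro-F1) follow their standard definitions, which the short sketch below computes. The function names, the example ranking, and the stance label names are hypothetical and chosen only for illustration; they are not taken from the datasets.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k


def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for item in ranked[:k] if item in relevant) / len(relevant)


def macro_f1(gold, predicted):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(gold) | set(predicted))
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, predicted) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, predicted) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, predicted) if g == label and p != label)
        denominator = 2 * tp + fp + fn
        scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(scores) / len(scores)


# Hypothetical ranking of accounts and set of true authorities.
ranked_accounts = ["a", "b", "c", "d", "e"]
true_authorities = {"b", "e", "z"}

# Hypothetical stance labels.
gold_stance = ["agree", "disagree", "other", "agree"]
pred_stance = ["agree", "other", "other", "agree"]
```

Because macro-F1 averages per-class scores without weighting by class frequency, a model that never predicts a rare class (here "disagree") is penalized heavily, which is why the measure suits imbalanced stance data.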
Real-time Tweet Summarization Mobile Application
With the emergence of massive volumes of content on social media platforms,
users are overwhelmed with information, although searching for a topic
yields filtered information of interest. Yet, if a user is subscribed
to multiple topics, one of them might overshadow the topic of interest. This project was
created to address this issue by offering a mobile application in which users define
their topics of interest.
The application, named Real-time Twitter Summarization (RTS), offers a novel approach
in which the user not only chooses the topic but also decides on the frequency and relevance
of the pushed tweets related to that topic. The application also provides real-time
summarization and sends a notification once a novel topic-related tweet is posted.
These functionalities were not provided by similar applications.
This project has been developed not only to be fully functional but also to be usable
in the simplest form. The scalability of the tweet summarization engine was tested to
check whether the application layer caused any delay.
It is important to mention that this work is an extension of the previous work of
Suwaileh, Reem and Hasanain, Maram [7], submitted as a participation in the “Real-Time
Summarization Track” [3]
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation
counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings
are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that
only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into
account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed
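The two counting schemes compared in the abstract above can be contrasted with a small sketch. It assumes that each citing paper is represented as a list of cited works and each cited work as an ordered author list; an author pair is counted once per citing paper whose reference list contains both authors. The function name and toy data are hypothetical, and the sketch also counts coauthors of the same cited work as co-cited, which is a simplifying assumption rather than a rule stated in the study.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count author co-citations.

    citing_papers: list of reference lists; each reference is an ordered
    list of author names. max_authors=1 gives traditional first-author
    counting; max_authors=5 considers the first five authors of each work.
    """
    counts = Counter()
    for references in citing_papers:
        cited_authors = set()
        for authors in references:
            cited_authors.update(authors[:max_authors])
        # Each unordered author pair is counted once per citing paper.
        for pair in combinations(sorted(cited_authors), 2):
            counts[pair] += 1
    return counts


# Toy data: two citing papers, each citing two works.
CITING_PAPERS = [
    [["Smith", "Lee"], ["Jones"]],
    [["Smith"], ["Jones", "Lee"]],
]

first_author_counts = cocitation_counts(CITING_PAPERS, max_authors=1)
first_five_counts = cocitation_counts(CITING_PAPERS, max_authors=5)
```

In the toy data, first-author counting never sees "Lee" at all, while first-five counting links "Lee" to both other authors. This shows how including more authors per cited work changes the co-citation matrix on which the later statistical analyses and mappings are based.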
LOCATION MENTION PREDICTION FROM DISASTER TWEETS
While utilizing Twitter data for crisis management is of interest to different response authorities, a critical challenge that hinders the utilization of such data is the scarcity of automated tools that extract and resolve geolocation information. This dissertation focuses on the Location Mention Prediction (LMP) problem, which consists of the Location Mention Recognition (LMR) and Location Mention Disambiguation (LMD) tasks. Our work contributes to studying two main factors that influence the robustness of LMP systems: (i) the dataset used to train the model, and (ii) the learning model. As for the training dataset, we study the best training and evaluation strategies to exploit existing datasets and tools at the onset of disaster events. We emphasize that the size of training data matters and recommend considering the data domain, the disaster domain, and geographical proximity when training LMR models. We further construct the public IDRISI datasets, the largest English datasets to date and the first Arabic datasets for the LMP tasks. Rigorous analysis and experiments show that the IDRISI datasets are diverse and generalize across domains and geographical areas, compared to existing datasets. As for the learning models, the LMP tasks are understudied in the disaster management domain. To address this, we reformulate the LMR and LMD modeling and evaluation to better suit the requirements of the response authorities. Moreover, we introduce competitive and state-of-the-art LMR and LMD models that are compared against a representative set of baselines for both Arabic and English languages
Building a Test Collection for Significant-Event Detection in Arabic Tweets
With the increasing popularity of microblogging services like Twitter, researchers discovered
a rich medium for tackling real-life problems like event detection. However, event
detection in Twitter is often obstructed by the lack of public evaluation mechanisms
such as test collections (set of tweets, labels, and queries to measure the effectiveness of
an information retrieval system). The problem is more evident when non-English languages,
e.g., Arabic, are concerned. With the recent surge of significant events in the
Arab world, news agencies and decision makers rely on Twitter's microblogging service to
obtain recent information on events. In this thesis, we address the problem of building a
test collection of Arabic tweets (named EveTAR) for the task of event detection.
To build EveTAR, we first adopted an adequate definition of an event, which is a
significant occurrence that takes place at a certain time. An occurrence is significant if
there are news articles about it. We collected Arabic tweets using Twitter's streaming
API. Then, we identified a set of events from the Arabic data collection using Wikipedia's
current events portal. Corresponding tweets were extracted by querying the Arabic data
collection with a set of manually-constructed queries. To obtain relevance judgments for
those tweets, we leveraged CrowdFlower's crowdsourcing platform.
Over a period of 4 weeks, we crawled over 590M tweets, from which we identified 66
events that cover 8 different categories and gathered more than 134k relevance judgments.
Each event contains an average of 779 relevant tweets. Over all events, we got an average
Kappa of 0.6, which is a substantially acceptable value. EveTAR was used to evaluate
three state-of-the-art event detection algorithms. The best performing algorithms
achieved 0.60 in F1 measure and 0.80 in both precision and recall. We plan to make
our test collection available for research, including events description, manually-crafted
queries to extract potentially-relevant tweets, and all judgments per tweet. EveTAR is
the first Arabic test collection built from scratch for the task of event detection.
Additionally, we show in our experiments that it supports other tasks like ad-hoc search
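The abstract above reports an average Kappa of 0.6 without stating which variant was computed. As an illustration of how chance-corrected agreement works, the sketch below implements Cohen's kappa for two annotators; the choice of Cohen's kappa, the function name, and the toy judgments are assumptions for illustration and are not taken from the thesis.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items on which both annotators agree.
    observed = sum(1 for a, b in zip(labels_a, labels_b) if a == b) / n
    # Expected agreement by chance, from each annotator's label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Toy relevance judgments (1 = relevant, 0 = not relevant) for ten tweets.
annotator_a = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
annotator_b = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]

kappa = cohen_kappa(annotator_a, annotator_b)
```

In this toy example the annotators agree on 8 of 10 tweets, and chance agreement is 0.5, so kappa comes out at about 0.6. For crowdsourced judgments with more than two annotators per tweet, a multi-rater measure such as Fleiss' kappa would be the usual alternative.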
- …
