Recurrent One-Hop Predictions for Reasoning over Knowledge Graphs
Large-scale knowledge graphs (KGs) such as Freebase are generally incomplete. Reasoning over multi-hop (mh) KG paths is thus an important capability that is needed for question answering and other NLP tasks that require knowledge about the world. mh-KG reasoning includes diverse scenarios, e.g., given a head entity and a relation path, predict the tail entity; or given two entities connected by some relation paths, predict the unknown relation between them. We present ROPs, recurrent one-hop predictors, that predict entities at each step of mh-KG paths by using recurrent neural networks and vector representations of entities and relations, with two benefits: (i) modeling mh-paths of arbitrary lengths while updating the entity and relation representations by the training signal at each step; (ii) handling different types of mh-KG reasoning in a unified framework. Our models achieve state-of-the-art results on two important multi-hop KG reasoning tasks: Knowledge Base Completion and Path Query Answering
Author Profiling for Abuse Detection
The rapid growth of social media in recent years has fed into some highly undesirable phenomena such as proliferation of hateful and offensive language on the Internet. Previous research suggests that such abusive content tends to come from users who share a set of common stereotypes and form communities around them. The current state-of-the-art approaches to abuse detection are oblivious to user and community information and rely entirely on textual (i.e., lexical and semantic) cues. In this paper, we propose a novel approach to this problem that incorporates community-based profiling features of Twitter users. Experimenting with a dataset of 16k tweets, we show that our methods significantly outperform the current state of the art in abuse detection. Further, we conduct a qualitative analysis of model characteristics. We release our code, pre-trained models and all the resources used in the public domain
Efficient deep processing of Japanese
We present a broad coverage Japanese grammar written in the HPSG formalism with MRS semantics. The grammar is created for use in real world applications, such that robustness and performance issues play an important role. It is connected to a POS tagging and word segmentation tool. This grammar is being developed in a multilingual context, requiring MRS structures that are easily comparable across languages
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed
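The difference between the two counting schemes can be illustrated with a minimal sketch. It assumes each citing paper is given as a list of cited works and each cited work as an ordered list of author names; the function name `cocitation_counts`, the `max_authors` parameter, and the sample data are hypothetical and are not taken from the study. Two authors are counted as co-cited once per citing paper in which both are credited.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count author co-citations (illustrative sketch, not the study's code).

    citing_papers: list of reference lists; each reference is an ordered
                   list of author names for one cited work.
    max_authors:   how many leading authors of each cited work are credited
                   (1 = traditional first-author counting, 5 = first five).
    """
    counts = Counter()
    for references in citing_papers:
        # Collect every author credited by this citing paper.
        credited = set()
        for authors in references:
            credited.update(authors[:max_authors])
        # Each unordered author pair is co-cited once per citing paper.
        for pair in combinations(sorted(credited), 2):
            counts[pair] += 1
    return counts


# Hypothetical data: two citing papers, each citing two works.
papers = [
    [["Adams", "Baker"], ["Chen", "Diaz"]],
    [["Chen", "Baker"], ["Diaz"]],
]
first_only = cocitation_counts(papers, max_authors=1)
first_five = cocitation_counts(papers, max_authors=5)
```

With first-author counting, only the lead authors of the cited works form pairs; crediting up to five authors also produces pairs that involve Baker, who is never a first author in this sample.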
It's Only Morpho-Logical: Modeling Agreement in Cross-Linguistic Dependency Parsing
Thesis (Master's)--University of Washington, 2012. I propose a linguistically motivated set of features to model morphological agreement and add them to MSTParser, a graph-based dependency parser (McDonald et al., 2006). Compared to the parser's built-in morphological features, the new feature set is much smaller and more accurate. Results across 21 treebanks containing varying amounts of morphological annotation demonstrate increases in accuracy of up to 5.3% absolute. Experiments are performed to investigate exactly how the features enhance performance. While some of the improvement results from the feature set capturing information unrelated to morphology, there is still significant improvement, up to 4.6% absolute, due to the agreement model. This thesis includes background on morphological agreement and dependency parsing, details on MSTParser and modifications made to it, information about the treebanks collected and the steps taken to normalize them, and descriptions of experiments and results
Multi-predicate Constructions in Nuuchahnulth
Thesis (Ph.D.)--University of Washington, 2019. This dissertation documents and models two types of multi-predicate constructions in Nuuchahnulth: serial verb constructions, and a construction involving the suffix -(q)ḥ, which is called the predicate linker. I define a serial verb construction (SVC) as any clause with two verbs present and no overt coordinating element. I document the circumstances under which this occurs, give its grammatical constraints, and classify SVCs in Nuuchahnulth into four syntactically distinct categories. I also examine the linker suffix and provide a grammatical description for it. Unlike SVCs, the linker coordinates two elements which serve as predicates in the syntax, a category which includes not just verbs, but common nouns and adjectives as well. I use the properties of the linker and SVCs to shed light on words that are category-ambiguous. Finally, this is all implemented inside of a DELPH-IN style computational grammar within the head-driven phrase structure grammar (HPSG) framework. My analyses are then tested against a set of speaker-vetted sentences illustrating the phenomena
An analysis of translation divergence patterns using PanLex translation pairs
Thesis (Master's)--University of Washington, 2012. This analysis was performed to understand the patterns of translation divergences occurring in high and low frequency verbs, and to test the hypothesis that high frequency verbs are more prone to translation divergences than low frequency ones. Four types of divergences were considered: Thematic, Conflational, Categorial, and Structural (Dorr, 1990), with samples from three language pairs: Italian to French, Italian to English and English to Thai. The analysis is also an evaluation of the possibility of using the online multilingual dictionary PanLex (Baldwin et al., 2010) to automatically derive transfer rules, as part of a larger effort to create a machine translation system based on customizable language-specific grammars for both source and target languages, using semantic representations in the format of Minimal Recursion Semantics, or MRS (Copestake et al., 2005), as the input and output of the transfer stage. Based on the samples analyzed, this evaluation suggests that manual creation of transfer rules and tweaking of automatic rules would be most needed for high frequency verbs, while low frequency verbs seem likely to have a lower translation divergence error rate
Action Nominals in the Grammar Matrix
Thesis (Master's)--University of Washington, 2024. This thesis describes the addition of a library for action nominal constructions (ANCs) to the LinGO Grammar Matrix customization system. Action nominals are nominalized verbs which refer to an action or process and are often used cross-linguistically to mark clausal complements and adverbial clauses. They occupy an intermediate state between nouns and verbs, having the external distribution of a noun phrase, but often still retaining certain verbal properties. In this thesis, I build on the existing analysis for nominalized clauses in the Grammar Matrix, but shift away from an approach where the dual nominal and verbal characteristics of action nominals are explained based on what level in the tree nominalization occurs to one that relies primarily on lexical rules. This change is motivated by a desire to expand the typological range of nominalization patterns the Matrix can handle while also more closely reflecting the hybrid syntactic nature of action nominals. I present an HPSG analysis of action nominals and the implementation of that analysis within the Grammar Matrix. I develop the library using a combination of pseudo and illustrative languages (English [eng], Hixkaryana [hix], Russian [rus], Korean [kor]) and then test on small testsuites from five held-out languages (Wayana [way], Maltese [mlt], Dutch [nld], Lango [laj], Finnish [fin]). The system achieved on average 95.2% coverage and 7.0% over-generation on the held-out data
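The abstract does not define the coverage and over-generation figures it reports. A minimal sketch of how such figures are commonly computed for a grammar testsuite follows, assuming coverage is the share of grammatical items that receive at least one parse and over-generation is the share of ungrammatical items that receive at least one parse. These definitions, the function name `coverage_and_overgeneration`, and the sample testsuite are assumptions for illustration only, not details from the thesis.

```python
def coverage_and_overgeneration(items):
    """Compute testsuite coverage and over-generation (illustrative sketch).

    items: iterable of (is_grammatical, num_parses) pairs, one per test item.
    Returns (coverage, overgeneration) as fractions between 0 and 1.
    """
    grammatical = [n for ok, n in items if ok]
    ungrammatical = [n for ok, n in items if not ok]
    # Coverage: grammatical items that received at least one parse.
    coverage = (
        sum(n > 0 for n in grammatical) / len(grammatical) if grammatical else 0.0
    )
    # Over-generation: ungrammatical items that were nevertheless parsed.
    overgeneration = (
        sum(n > 0 for n in ungrammatical) / len(ungrammatical)
        if ungrammatical
        else 0.0
    )
    return coverage, overgeneration


# Hypothetical testsuite: four grammatical and two ungrammatical items.
testsuite = [(True, 1), (True, 0), (True, 2), (True, 1), (False, 0), (False, 1)]
coverage, overgeneration = coverage_and_overgeneration(testsuite)
```

Under these assumptions, an average across held-out languages would be the mean of the per-language values.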
