Cross-Domain Topic Classification for Political Texts
We introduce and assess the use of supervised learning in cross-domain topic classification. In this approach, an algorithm learns to classify topics in a labeled source corpus and then extrapolates topics in an unlabeled target corpus from another domain. The ability to use existing training data makes this method significantly more efficient than within-domain supervised learning. It also has three advantages over unsupervised topic models: the method can be more specifically targeted to a research question and the resulting topics are easier to validate and interpret. We demonstrate the method using the case of labeled party platforms (source corpus) and unlabeled parliamentary speeches (target corpus). In addition to the standard within-domain error metrics, we further validate the cross-domain performance by labeling a subset of target-corpus documents. We find that the classifier accurately assigns topics in the parliamentary speeches, although accuracy varies substantially by topic. We also propose tools for diagnosing cross-domain classification. To illustrate the usefulness of the method, we present two case studies on how electoral rules and the gender of parliamentarians influence the choice of speech topics.
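The workflow described in this abstract (fit on a labeled source corpus, predict on an unlabeled target corpus, then check against a hand-labeled target subset) can be sketched as follows. This is only an illustration: the abstract does not name the classifier, so the multinomial Naive Bayes model, the toy sentences, and the topic labels below are all assumptions.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Deliberately simple whitespace tokenizer (an assumption for this sketch).
    return text.lower().split()


def train_naive_bayes(labeled_docs):
    """Fit a multinomial Naive Bayes model on (text, topic) pairs."""
    topic_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, topic in labeled_docs:
        topic_counts[topic] += 1
        for tok in tokenize(text):
            word_counts[topic][tok] += 1
            vocab.add(tok)
    return topic_counts, word_counts, vocab


def predict_topic(model, text):
    """Return the most probable topic, ignoring words unseen in the source corpus."""
    topic_counts, word_counts, vocab = model
    n_docs = sum(topic_counts.values())
    best_topic, best_score = None, -math.inf
    for topic in sorted(topic_counts):
        total = sum(word_counts[topic].values())
        score = math.log(topic_counts[topic] / n_docs)
        for tok in tokenize(text):
            if tok in vocab:
                # Laplace (add-one) smoothing over the source vocabulary.
                score += math.log((word_counts[topic][tok] + 1) / (total + len(vocab)))
        if score > best_score:
            best_topic, best_score = topic, score
    return best_topic


# Hypothetical labeled source corpus (stand-in for party platforms).
source_corpus = [
    ("we will lower taxes and cut public spending", "economy"),
    ("our budget plan reduces taxes for workers", "economy"),
    ("we will protect forests and reduce emissions", "environment"),
    ("clean energy and climate protection are priorities", "environment"),
]
# Hypothetical target corpus (stand-in for parliamentary speeches).
target_corpus = [
    "the minister must explain why taxes keep rising",
    "this house should act on climate and emissions",
]

model = train_naive_bayes(source_corpus)
predicted = [predict_topic(model, speech) for speech in target_corpus]

# Cross-domain validation: compare against a small hand-labeled target subset.
hand_labels = ["economy", "environment"]
accuracy = sum(p == g for p, g in zip(predicted, hand_labels)) / len(hand_labels)
print(predicted, accuracy)
```

Computing accuracy separately for each topic on the hand-labeled subset would be the natural way to surface the topic-level variation the abstract mentions.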
Divided government, delegation, and civil service reform
This paper sheds new light on the drivers of civil service reform in U.S. states. We first demonstrate theoretically that divided government is a key trigger of civil service reform, providing nuanced predictions for specific configurations of divided government. We then show empirical evidence for these predictions using data from the second half of the 20th century: states tended to introduce these reforms under divided government, and in particular when legislative chambers (rather than legislature and governor) were divided.
RELATIO: Text Semantics Capture Political and Economic Narratives
ISSN: 1047-1987; ISSN: 1476-4989
More laws, more growth? Evidence from U.S. states
This paper analyzes the conditions under which more legislation contributes to economic growth. In the context of U.S. states, we apply natural language processing tools to measure legislative flows for the years 1965-2012. We implement a novel shift-share design for text data, where the instrument for legislation is leave-one-out legal-topic flows interacted with pre-treatment legal-topic shares. We find that at the margin, higher legislative output causes more economic growth. Consistent with more complete laws reducing ex post hold-up, we find that the effect is driven by the use of contingent clauses, is largest in sectors with high relationship-specific investments, and is increasing with local economic uncertainty.
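A shift-share instrument of the kind described here can be sketched as the sum, over legal topics, of a state's pre-treatment topic share multiplied by the topic's legislative flow in all other states. The function name, the data layout, and the numbers below are hypothetical; the paper's actual construction (normalization, time indexing, topic definitions) is not given in the abstract.

```python
def shift_share_instrument(flows, pre_shares, state):
    """Leave-one-out shift-share instrument for one state in one year.

    flows[s][k]      : topic-k legislative flow in state s (the "shift" data)
    pre_shares[s][k] : state s pre-treatment share of topic k (the "share")
    """
    instrument = 0.0
    for topic, share in pre_shares[state].items():
        # Leave-one-out flow: sum the topic's flow over every other state.
        loo_flow = sum(f[topic] for s, f in flows.items() if s != state)
        instrument += share * loo_flow
    return instrument


# Hypothetical flows for three states and two legal topics in a single year.
flows = {
    "A": {"tax": 10, "health": 4},
    "B": {"tax": 6, "health": 8},
    "C": {"tax": 2, "health": 12},
}
# Hypothetical pre-treatment topic shares (each state's shares sum to 1).
pre_shares = {
    "A": {"tax": 0.75, "health": 0.25},
    "B": {"tax": 0.5, "health": 0.5},
    "C": {"tax": 0.25, "health": 0.75},
}

instruments = {s: shift_share_instrument(flows, pre_shares, s) for s in flows}
print(instruments)
```

Excluding the state's own flow keeps its own legislative activity out of the instrument, which is the purpose of the leave-one-out step.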
Measuring discretion and delegation in legislative texts: methods and application to US states
Bureaucratic discretion and executive delegation are central topics in political economy and political science. The previous empirical literature has measured discretion and delegation by manually coding large bodies of legislation. Drawing from computational linguistics, we provide an automated procedure for measuring discretion and delegation in legal texts to facilitate large-scale empirical analysis. The method uses information in syntactic parse trees to identify legally relevant provisions, as well as agents and delegated actions. We undertake two applications. First, we produce a measure of bureaucratic discretion by looking at the level of legislative detail for US states and find that this measure increases after reforms giving agencies more independence. This effect is consistent with an agency cost model, where a more independent bureaucracy requires more specific instructions (less discretion) to avoid bureaucratic drift. Second, we construct measures of delegation to governors in state legislation. Consistent with previous estimates using non-text metrics, we find that executive delegation increases under unified government.
Elections and divisiveness: theory and evidence
This article provides a theoretical and empirical analysis of how politicians allocate their time across issues. When voters are uncertain about an incumbent’s preferences, there is a pervasive incentive to “posture” by spending too much time on divisive issues (which are more informative about a politician’s preferences) at the expense of time spent on common-values issues (which provide greater benefit to voters). Higher transparency over the politicians’ choices can exacerbate the distortions. These theoretical results motivate an empirical study of how Members of the US Congress allocate time across issues in their floor speeches. We find that US senators spend more time on divisive issues when they are up for election, consistent with electorally induced posturing. In addition, we find that US House members spend more time on divisive issues in response to higher news transparency.
Evaluating Document Representations for Content-based Legal Literature Recommendations
Recommender systems assist legal professionals in finding relevant literature for supporting their case. Despite its importance for the profession, legal applications do not reflect the latest advances in recommender systems and representation learning research. Simultaneously, legal recommender systems are typically evaluated in small-scale user studies without any publicly available benchmark datasets. Thus, these studies have limited reproducibility. To address the gap between research and practice, we explore a set of state-of-the-art document representation methods for the task of retrieving semantically related US case law. We evaluate text-based (e.g., fastText, Transformers), citation-based (e.g., DeepWalk, Poincaré), and hybrid methods. We compare in total 27 methods using two silver standards with annotations for 2,964 documents. The silver standards are newly created from Open Case Book and Wikisource and can be reused under an open license facilitating reproducibility. Our experiments show that document representations from averaged fastText word vectors (trained on legal corpora) yield the best results, closely followed by Poincaré citation embeddings. Combining fastText and Poincaré in a hybrid manner further improves the overall result. Besides the overall performance, we analyze the methods depending on document length, citation count, and the coverage of their recommendations. We make our source code, models, and datasets publicly available at https://github.com/malteos/legal-document-similarity/
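The best-performing text-based representation named in this abstract, averaged word vectors, can be illustrated with a minimal sketch: each document becomes the mean of its word vectors, and recommendations are the nearest documents by cosine similarity. The two-dimensional vectors, document IDs, and function names below are made up for illustration. They are not fastText output, and this is not the code from the authors' repository.

```python
import math


def average_vector(tokens, word_vectors):
    """Represent a document as the mean of its known word vectors."""
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(query_id, docs, word_vectors, k=1):
    """Return the k documents most similar to the query document."""
    doc_vecs = {d: average_vector(text.split(), word_vectors) for d, text in docs.items()}
    query_vec = doc_vecs[query_id]
    scored = [
        (cosine(query_vec, vec), d)
        for d, vec in doc_vecs.items()
        if d != query_id and vec is not None
    ]
    scored.sort(reverse=True)
    return [d for _, d in scored[:k]]


# Toy 2-dimensional word vectors (hypothetical values).
word_vectors = {
    "contract": [1.0, 0.0], "breach": [0.9, 0.1], "damages": [0.8, 0.2],
    "search": [0.0, 1.0], "warrant": [0.1, 0.9], "seizure": [0.2, 0.8],
}
# Hypothetical case-law snippets keyed by made-up IDs.
docs = {
    "case_a": "contract breach",
    "case_b": "breach damages",
    "case_c": "search warrant seizure",
}
print(recommend("case_a", docs, word_vectors))
```

A hybrid variant in the spirit of the abstract could combine such a text vector with a citation embedding for the same document before computing similarity. How the authors combine the two is not specified in the abstract.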
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
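The difference between the two counting schemes can be made concrete with a small sketch. It counts, for each citing paper, every pair of authors that appear together among the cited works, keeping either only the first author of each cited work or its first five authors. The author names and reference lists are invented, and treating co-authors of the same cited work as co-cited is a simplifying assumption of this sketch, not necessarily the study's definition.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists, max_authors):
    """Count author pairs co-cited by the same citing paper.

    reference_lists : one list per citing paper; each cited work is given
                      as an ordered list of its author names
    max_authors     : how many leading authors of each cited work to count
                      (1 = traditional first-author counting)
    """
    counts = Counter()
    for refs in reference_lists:
        authors = set()
        for work_authors in refs:
            authors.update(work_authors[:max_authors])
        # Each unordered author pair is counted once per citing paper.
        for pair in combinations(sorted(authors), 2):
            counts[pair] += 1
    return counts


# Two hypothetical citing papers, each citing two works.
reference_lists = [
    [["Smith", "Lee"], ["Chen", "Park"]],
    [["Smith"], ["Chen", "Lee"]],
]

first_author = cocitation_counts(reference_lists, max_authors=1)
first_five = cocitation_counts(reference_lists, max_authors=5)
print(first_author)
print(first_five)
```

In this toy data, first-author counting sees only the Chen and Smith pair, while counting up to five authors also links Lee and Park to the others, showing how the inclusive scheme produces denser author groups.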
