    Cross-Domain Topic Classification for Political Texts

    We introduce and assess the use of supervised learning in cross-domain topic classification. In this approach, an algorithm learns to classify topics in a labeled source corpus and then extrapolates topics in an unlabeled target corpus from another domain. The ability to use existing training data makes this method significantly more efficient than within-domain supervised learning. It also has three advantages over unsupervised topic models: the method can be more specifically targeted to a research question and the resulting topics are easier to validate and interpret. We demonstrate the method using the case of labeled party platforms (source corpus) and unlabeled parliamentary speeches (target corpus). In addition to the standard within-domain error metrics, we further validate the cross-domain performance by labeling a subset of target-corpus documents. We find that the classifier accurately assigns topics in the parliamentary speeches, although accuracy varies substantially by topic. We also propose tools for diagnosing cross-domain classification. To illustrate the usefulness of the method, we present two case studies on how electoral rules and the gender of parliamentarians influence the choice of speech topics.
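
The workflow in this abstract (train on a labeled source corpus, predict on an unlabeled target corpus, validate on a small hand-labeled target subset) can be sketched as follows. This is only an illustration: the abstract does not name the classifier, so a small multinomial Naive Bayes model stands in for it, and the toy documents, labels, and function names are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberately simple assumption)."""
    return text.lower().split()


def train_naive_bayes(docs, labels):
    """Fit a multinomial Naive Bayes model on labeled source-domain documents."""
    word_counts = defaultdict(Counter)
    label_counts = Counter(labels)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {"word_counts": word_counts, "label_counts": label_counts, "vocab": vocab}


def predict(model, doc):
    """Assign the most probable source-domain topic to a target-domain document."""
    total_docs = sum(model["label_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_docs in model["label_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)
        for tok in tokenize(doc):
            if tok in model["vocab"]:  # words unseen in the source corpus are ignored
                score += math.log((counts[tok] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, docs, labels):
    """Share of documents whose predicted topic matches the hand label."""
    hits = sum(predict(model, d) == y for d, y in zip(docs, labels))
    return hits / len(docs)


# Hypothetical source corpus: labeled party-platform sentences.
source_docs = [
    "we will cut taxes and reduce public debt",
    "lower taxes support economic growth",
    "invest in schools and teachers",
    "more funding for universities and schools",
]
source_labels = ["economy", "economy", "education", "education"]

# Hypothetical target corpus: speeches, with a hand-labeled validation subset.
target_docs = [
    "mr speaker the government must reduce taxes",
    "our teachers deserve better schools",
]
target_labels = ["economy", "education"]

model = train_naive_bayes(source_docs, source_labels)
cross_domain_accuracy = accuracy(model, target_docs, target_labels)
```

Reporting accuracy separately per topic on the hand-labeled target subset would be one way to surface the topic-level variation the abstract mentions.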

    Divided government, delegation, and civil service reform

    This paper sheds new light on the drivers of civil service reform in U.S. states. We first demonstrate theoretically that divided government is a key trigger of civil service reform, providing nuanced predictions for specific configurations of divided government. We then show empirical evidence for these predictions using data from the second half of the 20th century: states tended to introduce these reforms under divided government, and in particular when legislative chambers (rather than legislature and governor) were divided.

    More laws, more growth? Evidence from U.S. states

    This paper analyzes the conditions under which more legislation contributes to economic growth. In the context of U.S. states, we apply natural language processing tools to measure legislative flows for the years 1965-2012. We implement a novel shift-share design for text data, where the instrument for legislation is leave-one-out legal-topic flows interacted with pre-treatment legal-topic shares. We find that at the margin, higher legislative output causes more economic growth. Consistent with more complete laws reducing ex post hold-up, we find that the effect is driven by the use of contingent clauses, is largest in sectors with high relationship-specific investments, and is increasing with local economic uncertainty.
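
For orientation, a shift-share instrument of the kind described here generally takes the following form. The notation is an assumption made for illustration and is not taken from the paper: $w_{sk}^{0}$ is the pre-treatment share of legal topic $k$ in state $s$, and $L_{s'kt}$ is the legislative flow on topic $k$ in state $s'$ in year $t$.

```latex
Z_{st} \;=\; \sum_{k} w_{sk}^{0} \, F_{-s,k,t},
\qquad
F_{-s,k,t} \;=\; \sum_{s' \neq s} L_{s'kt}
```

The leave-one-out sum $F_{-s,k,t}$ excludes state $s$ itself, so the instrument for state $s$ varies only through topic-level flows in other states, weighted by the state's own initial topic mix.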

    Measuring discretion and delegation in legislative texts : methods and application to US states

    Bureaucratic discretion and executive delegation are central topics in political economy and political science. The previous empirical literature has measured discretion and delegation by manually coding large bodies of legislation. Drawing from computational linguistics, we provide an automated procedure for measuring discretion and delegation in legal texts to facilitate large-scale empirical analysis. The method uses information in syntactic parse trees to identify legally relevant provisions, as well as agents and delegated actions. We undertake two applications. First, we produce a measure of bureaucratic discretion by looking at the level of legislative detail for US states and find that this measure increases after reforms giving agencies more independence. This effect is consistent with an agency cost model, where a more independent bureaucracy requires more specific instructions (less discretion) to avoid bureaucratic drift. Second, we construct measures of delegation to governors in state legislation. Consistent with previous estimates using non-text metrics, we find that executive delegation increases under unified government.

    Elections and divisiveness : theory and evidence

    This article provides a theoretical and empirical analysis of how politicians allocate their time across issues. When voters are uncertain about an incumbent’s preferences, there is a pervasive incentive to “posture” by spending too much time on divisive issues (which are more informative about a politician’s preferences) at the expense of time spent on common-values issues (which provide greater benefit to voters). Higher transparency over the politicians’ choices can exacerbate the distortions. These theoretical results motivate an empirical study of how Members of the US Congress allocate time across issues in their floor speeches. We find that US senators spend more time on divisive issues when they are up for election, consistent with electorally induced posturing. In addition, we find that US House members spend more time on divisive issues in response to higher news transparency.

    Evaluating Document Representations for Content-based Legal Literature Recommendations

    Recommender systems assist legal professionals in finding relevant literature for supporting their case. Despite its importance for the profession, legal applications do not reflect the latest advances in recommender systems and representation learning research. Simultaneously, legal recommender systems are typically evaluated in small-scale user studies without any publicly available benchmark datasets. Thus, these studies have limited reproducibility. To address the gap between research and practice, we explore a set of state-of-the-art document representation methods for the task of retrieving semantically related US case law. We evaluate text-based (e.g., fastText, Transformers), citation-based (e.g., DeepWalk, Poincaré), and hybrid methods. We compare in total 27 methods using two silver standards with annotations for 2,964 documents. The silver standards are newly created from Open Case Book and Wikisource and can be reused under an open license facilitating reproducibility. Our experiments show that document representations from averaged fastText word vectors (trained on legal corpora) yield the best results, closely followed by Poincaré citation embeddings. Combining fastText and Poincaré in a hybrid manner further improves the overall result. Besides the overall performance, we analyze the methods depending on document length, citation count, and the coverage of their recommendations. We make our source code, models, and datasets publicly available at https://github.com/malteos/legal-document-similarity/
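
The best-performing text representation named in this abstract, averaged word vectors, can be illustrated with a minimal sketch. The two-dimensional vectors and the three toy documents below are invented for the example. Real fastText vectors have hundreds of dimensions and would be loaded from a trained model, which is not shown here.

```python
import math

# Hypothetical 2-dimensional word vectors, standing in for trained embeddings.
word_vectors = {
    "contract": [1.0, 0.0],
    "breach": [0.8, 0.2],
    "damages": [0.6, 0.4],
    "patent": [0.0, 1.0],
    "invention": [0.2, 0.8],
}


def average_vector(tokens, vectors):
    """Represent a document as the mean of its known word vectors."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None  # no token of the document is in the vocabulary
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


doc_a = average_vector("contract breach".split(), word_vectors)
doc_b = average_vector("breach damages".split(), word_vectors)
doc_c = average_vector("patent invention".split(), word_vectors)

# Recommendation candidates for doc_a would be ranked by similarity.
sim_ab = cosine(doc_a, doc_b)
sim_ac = cosine(doc_a, doc_c)
```

In this toy setup the two contract-law documents end up closer to each other than to the patent document, which is the ranking behavior a content-based recommender relies on.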

    Going Beyond Counting First Authors in Author Co-citation Analysis

    The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type, which counts only the first author of each cited work, and a non-traditional type, which takes into account the first 5 authors of a cited work. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
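
The two counting schemes compared in this abstract can be sketched as below. The data layout (each citing paper as a list of cited works, each cited work as an ordered author list), the function name, and the author names are assumptions made for illustration. The sketch counts each author pair at most once per citing paper and treats co-authors of the same cited work as co-cited, which may differ from the study's exact counting rules.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count author co-citations across citing papers.

    max_authors=1 corresponds to first-author counting; max_authors=5
    takes the first five authors of each cited work into account.
    """
    counts = Counter()
    for cited_works in citing_papers:
        authors = set()
        for work_authors in cited_works:
            authors.update(work_authors[:max_authors])
        # Each unordered pair is counted once per citing paper.
        for pair in combinations(sorted(authors), 2):
            counts[pair] += 1
    return counts


# Hypothetical reference lists of two citing papers.
citing_papers = [
    [["White", "McCain"], ["Small"]],
    [["White", "McCain"], ["Small", "Griffith"]],
]

first_author = cocitation_counts(citing_papers, max_authors=1)
first_five = cocitation_counts(citing_papers, max_authors=5)
```

With first-author counting only the pair (Small, White) appears. Counting up to five authors also brings in McCain and Griffith, so the resulting co-citation matrix covers more authors per cited work.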