1,721,622 research outputs found

    Victoria Stodden: Scholarly Communication in the Era of Big Data and Big Computation

    No full text
    Victoria Stodden gave the keynote address for Open Access Week 2015. "Scholarly communication in the era of big data and big computation" was sponsored by the University Libraries, Computational Modeling and Data Analytics, the Department of Computer Science, the Department of Statistics, the Laboratory for Interdisciplinary Statistical Analysis (LISA), and the Virginia Bioinformatics Institute. Victoria Stodden is an associate professor in the Graduate School of Library and Information Science at the University of Illinois at Urbana-Champaign. She completed both her PhD in statistics and her law degree at Stanford University. Her research centers on the multifaceted problem of enabling reproducibility in computational science. This includes studying adequacy and robustness in replicated results, designing and implementing validation systems, developing standards of openness for data and code sharing, and resolving legal and policy barriers to disseminating reproducible research.Virginia Tech. University LibrariesVirginia Tech. Division of Computational Modeling and Data AnalyticsVirginia Tech. Department of Computer ScienceVirginia Tech. Department of StatisticsVirginia Tech. Laboratory for Interdisciplinary Statistical Analysis (LISA)Virginia Bioinformatics Institut

    DEplain-APA

    No full text
    DEplain: A corpus for German Text Simplification This repository contains the corpus called DEplain-APA for German text simplification (document and sentence simplification). The corpus contains Austrian nexts text provided by the APA - Austria Presse Agentur eG. All of the sentence-wise aligned pairs (complex-simple) are manually aligned. The following table summarizes the most important meta data of the corpus. meta data value language DE-AT (Austrian German) domain news source language level B1 target language level A2 # document pairs (total, train/dev/test) 483 (387/48/48) # sentence pairs (total, train/dev/test) 13,122 (10,660/1,231/1,231) # complex sentences 25,607 # simple sentences 26,471 For more information, please have a look at our paper. If you use this corpus, please also cite our paper and name APA - Austria Presse Agentur eG as data provider: Regina Stodden, Omar Momen, and Laura Kallmeyer. 2023. DEplain: A German Parallel Corpus with Intralingual Translations into Plain Language for Sentence and Document Simplification. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 16441–16463, Toronto, Canada. Association for Computational Linguistics

    MASSIVE DATA, THE DIGITIZATION OF SCIENCE, AND REPRODUCIBILITY OF RESULTS

    No full text
    As the scientific enterprise becomes increasingly computational and data-driven, the nature of the information communicated must change. Without inclusion of the code and data with published computational results, we are engendering a credibility crisis in science. Controversies such as ClimateGate, the microarray-based drug sensitivity clinical trials under investigation at Duke University, and retractions from prominent journals due to unverified code suggest the need for greater transparency in our computational science. In this talk I argue that the scientific method be restored to (1) a focus on error control as central to scientific communication and (2) complete communication of the underlying methodology producing the results, ie. reproducibility. I outline barriers to these goals based on recent survey work (Stodden 2010), and suggest solutions such as the “Reproducible Research Standard” (Stodden 2009), giving open licensing options designed to create an intellectual property framework for scientists consonant with longstanding scientific norms
    corecore