Victoria Stodden: Scholarly Communication in the Era of Big Data and Big Computation
Victoria Stodden gave the keynote address for Open Access Week 2015. "Scholarly communication in the era of big data and big computation" was sponsored by the University Libraries, Computational Modeling and Data Analytics, the Department of Computer Science, the Department of Statistics, the Laboratory for Interdisciplinary Statistical Analysis (LISA), and the Virginia Bioinformatics Institute. Victoria Stodden is an associate professor in the Graduate School of Library and Information Science at the University of Illinois at Urbana-Champaign. She completed both her PhD in statistics and her law degree at Stanford University. Her research centers on the multifaceted problem of enabling reproducibility in computational science. This includes studying adequacy and robustness in replicated results, designing and implementing validation systems, developing standards of openness for data and code sharing, and resolving legal and policy barriers to disseminating reproducible research.
Virginia Tech. University Libraries; Virginia Tech. Division of Computational Modeling and Data Analytics; Virginia Tech. Department of Computer Science; Virginia Tech. Department of Statistics; Virginia Tech. Laboratory for Interdisciplinary Statistical Analysis (LISA); Virginia Bioinformatics Institute
DEplain-APA
DEplain: A corpus for German Text Simplification
This repository contains the corpus DEplain-APA for German text simplification (document and sentence simplification). The corpus contains Austrian news texts provided by the APA - Austria Presse Agentur eG. All sentence-wise aligned pairs (complex-simple) were aligned manually. The following table summarizes the most important metadata of the corpus.
| metadata | value |
|---|---|
| language | DE-AT (Austrian German) |
| domain | news |
| source language level | B1 |
| target language level | A2 |
| # document pairs (total, train/dev/test) | 483 (387/48/48) |
| # sentence pairs (total, train/dev/test) | 13,122 (10,660/1,231/1,231) |
| # complex sentences | 25,607 |
| # simple sentences | 26,471 |
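As a quick consistency check on the table, the sketch below confirms that the train/dev/test counts add up to the reported totals and prints each split's share in percent. It is only an illustration: the counts are copied from the table, while the `SPLITS` dictionary and the `split_shares` helper are hypothetical names introduced here, not part of the DEplain-APA distribution.

```python
# Split sizes copied from the DEplain-APA metadata table: (total, per-split counts).
# SPLITS and split_shares are illustrative names, not part of the corpus release.
SPLITS = {
    "document pairs": (483, {"train": 387, "dev": 48, "test": 48}),
    "sentence pairs": (13_122, {"train": 10_660, "dev": 1_231, "test": 1_231}),
}


def split_shares(total: int, parts: dict) -> dict:
    """Check that the split counts sum to the total; return each split's share in percent."""
    observed = sum(parts.values())
    if observed != total:
        raise ValueError(f"splits sum to {observed}, expected {total}")
    # Round to one decimal place for readability.
    return {name: round(100 * count / total, 1) for name, count in parts.items()}


if __name__ == "__main__":
    for unit, (total, parts) in SPLITS.items():
        print(unit, split_shares(total, parts))
```

Both rows pass the check: the document pairs split roughly 80.1/9.9/9.9 percent and the sentence pairs roughly 81.2/9.4/9.4 percent, with dev and test equal in size in each case.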
For more information, please have a look at our paper. If you use this corpus, please also cite our paper and name the APA - Austria Presse Agentur eG as the data provider:
Regina Stodden, Omar Momen, and Laura Kallmeyer. 2023. DEplain: A German Parallel Corpus with Intralingual Translations into Plain Language for Sentence and Document Simplification. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 16441–16463, Toronto, Canada. Association for Computational Linguistics
Recommended from our members
Trust Your Science? Open Your Data and Code
This is Victoria Stodden's view on the reproducibility of computational science. It covers the reproducibility, replicability, and repeatability of code written in other scientific fields. Stodden also discusses the rising prominence of computational science in the digital age and what it means for the future of science and data collection.
MASSIVE DATA, THE DIGITIZATION OF SCIENCE, AND REPRODUCIBILITY OF RESULTS
As the scientific enterprise becomes increasingly computational and data-driven, the nature of the information communicated must change. Without inclusion of the code and data with published computational results, we are engendering a credibility crisis in science. Controversies such as ClimateGate, the microarray-based drug sensitivity clinical trials under investigation at Duke University, and retractions from prominent journals due to unverified code suggest the need for greater transparency in our computational science. In this talk I argue that the scientific method be restored to (1) a focus on error control as central to scientific communication and (2) complete communication of the underlying methodology producing the results, i.e., reproducibility. I outline barriers to these goals based on recent survey work (Stodden 2010), and suggest solutions such as the "Reproducible Research Standard" (Stodden 2009), giving open licensing options designed to create an intellectual property framework for scientists consonant with longstanding scientific norms.
How Technology Is (Rapidly) Expanding the Scope of the Law in Statistics
PowerPoint presentation on how technology is expanding the scope of law in statistics. Stodden discusses policy for the ever-growing enterprise of computational science, updating the scientific method, different approaches to code sharing and licensing (such as Creative Commons), and how an updated scientific method would influence reproducibility.
Reproducibility in Computational Science: Framing the Concept
PowerPoint presentation on "Reproducibility in Computational Science" by Victoria Stodden, covering definitions of reproducibility, implementations of the scientific method in different fields, and how these apply to policy makers, journal editors, and grant-awarding agencies such as the NSF.
Data Management and Sharing Policies in the NSF and the NIH
A PowerPoint presentation on the data management and sharing policies of the NSF and the NIH. Victoria Stodden explains the impact of computational methods as a central part of the scientific enterprise, how the scientific method should be updated, the role policy plays in the NSF guidelines, and how data should be shared and protected under congressional policy.
The Credibility Crisis and Computational Science: Accountability and Public Health
PowerPoint presentation on "The Credibility Crisis and Computational Science: Accountability and Public Health", in which Victoria Stodden discusses policy for the ever-growing enterprise of computational science, updating the scientific method, different approaches to code sharing and licensing (such as Creative Commons), and how an updated scientific method would influence reproducibility.
Innovation and Growth through Open Access to Scientific Research: Three Ideas for High-Impact Rule Changes
A paper on data policies in which Victoria Stodden explores the framing principles that should apply to reproducing computational research and results, and how those principles should guide science policy in the digital age.
Scientific Practice Today and the Scientific Method: Responding to the Credibility Crisis
PowerPoint presentation on scientific practice today and the scientific method in computational science. Stodden discusses policy for the ever-growing enterprise of computational science, updating the scientific method, different approaches to code sharing and licensing (such as Creative Commons), and how an updated scientific method would influence reproducibility.
