CEDA Repository
1250 research outputs found
BADC User Statistics Report 2013
The British Atmospheric Data Centre (BADC) came into existence in 1994 in response to the Natural Environment Research Council’s (NERC) desire for a dedicated UK data centre for atmospheric research. Originally, as the Geophysical Data Facility (GDF) operated by the Science and Engineering Research Council (SERC), it served fewer than 200 registered users; the BADC’s registered user community has since grown to over 22,500 users. During the intervening period the BADC archive has grown to over 1 PB of accessible online data, and in 2005 the BADC was amalgamated with the NERC Earth Observation Data Centre (NEODC) to form the Centre for Environmental Data Archival (CEDA).
This report presents details of the current active user base with a historical review where suitable information was available to the author. The primary sources of information for this review were the user database maintained by CEDA, HTTP and FTP download logs and BADC website access logs.
It is hoped that this historical review will provide insights into the BADC user community, enabling CEDA to continue to improve user services targeted primarily at its core user community while also supporting an ever-diversifying wider user community.
Publishing data?! How do you do that then?
Data are a vital part of the scientific process, but all too often they languish at the bottom of cupboards or on servers, without any documentation, or any credit given to the researcher who created and/or manages them. Now, more than ever, data are of interest not only to researchers, but also to funders and members of the general public, and there is greater pressure on institutions and researchers to make their data available for further reuse. Journals and data repositories are providing new platforms for researchers to make their data available, and also to ensure that the data authors and managers get the credit they deserve for creating, maintaining and sharing their data. This talk will discuss advances in data citation and publication and is suitable for anyone who works with data.
Descriptions of MOHC perturbed physics ensembles (PPE) experiments investigating changes in African temperature and precipitation
Data from two perturbed physics ensembles developed by the Met Office Hadley Centre as part of the QUMP (Quantifying Uncertainty in Model Predictions) project. These data were used to investigate changes in African temperature and precipitation associated with global warming for a special issue on the future of African rainforests.
Connecting data repositories and publishers for data publication
This presentation discusses the requirements for connecting data repositories and journal publishers for data publication, in the context of work done by the BADC and the PREPARDE project (http://proj.badc.rl.ac.uk/preparde).
Guidelines for Data Publication: Outputs from the PREPARDE project (public) SPM1.38, Fri 12 April, 12:15–13:15, R3
At this meeting, the PREPARDE project (http://www2.le.ac.uk/projects/preparde) will present draft guidelines for data publication processes and will solicit feedback on them from members of the community. These processes include peer review of data, data repository accreditation and cross-linking, and workflows for data publication.
Science Support: The Building Blocks of Active Data Curation
While the scientific method is built on reproducibility and transparency, and results are published in peer reviewed literature, we have come to the digital age of very large datasets (now of the order of petabytes and soon exabytes) which cannot be published in the traditional way. To preserve reproducibility and transparency, active curation is necessary to keep and protect the information in the long term, and “science support” activities provide the building blocks for active data curation.
With the explosive growth of data in all fields in recent years, there is a pressing need for data centres to provide adequate services to ensure long-term preservation and digital curation of project data outputs, however complex those may be. Science support provides advice and support to science projects on data and information management, from file formats through to general data management awareness. Another purpose of science support is to raise awareness in the science community of data and metadata standards and best practice, engendering a culture where data outputs are seen as valued assets. At the heart of science support is the Data Management Plan (DMP), which sets out a coherent approach to the data issues of the data-generating project. It provides an agreed record of the data management needs and issues within the project. The DMP is agreed upon with project investigators to ensure that a high-quality, documented data archive is created. It includes conditions of use and deposit to clearly express the ownership, responsibilities and rights associated with the data. Project-specific needs are also identified for data processing, visualization tools and data sharing services.
As part of the National Centre for Atmospheric Science (NCAS) and National Centre for Earth Observation (NCEO), the Centre for Environmental Data Archival (CEDA) fulfils this science support role by helping atmospheric and Earth observation data-generating projects to manage their data and accompanying information successfully for reuse and repurposing. Specific examples at CEDA include science support provided to FAAM (Facility for Airborne Atmospheric Measurements) aircraft campaigns and large-scale modelling projects such as UPSCALE, the largest ever PRACE (Partnership for Advanced Computing in Europe) computational project, dependent on CEDA to provide the high-performance storage, transfer capability and data analysis environment on the “super-data-cluster” JASMIN.
The impact of science support on scientific research is conspicuous: better documented datasets with a growing collection of metadata associated with the archived data, and easier data sharing through the use of standards in formats, metadata and data citation. Together these establish a high quality of data management, ensuring long-term preservation and enabling reuse by peer scientists, which ultimately leads to faster-paced progress in science.
Data Citation and Publication by NERC’s Environmental Data Centres
Data are the foundation upon which scientific progress rests. Historically speaking, data were a scarce resource, but one which was (relatively) easy to publish in hard copy, as tables or graphs in journal papers. With modern scientific methods, and the increased ease in collecting and analysing vast quantities of data, there arises a corresponding difficulty in publishing this data in a form that can be considered part of the scientific record. It is easy enough to “publish” the data to a Web site, but as anyone who has followed a broken link knows, there is no guarantee that the data will still be in place, or will not have changed, since it was first put online. A crucial part of science is the notion of reproducibility: if a dataset is used to draw important conclusions, and then the dataset changes, those conclusions can no longer be re-validated by someone else.
E-infrastructure for climate and atmospheric science research
Recent government investment in e-Infrastructure will transform aspects of environmental science by supporting both fundamental science and innovative uses of environmental data by the commercial sector. The STFC Centre for Environmental Data Archival (CEDA) is heavily involved in two major projects: JASMIN - a NERC funded facility which will support both data archival and scientific data analysis, and CEMS - the Facility for Climate and Environmental Monitoring from Space - aimed at fostering knowledge exchange and commercial exploitation of environmental data. JASMIN and CEMS will share some hardware.
In this presentation, we concentrate on JASMIN, which will consist of multi-petabyte, fast, reliable storage and co-located data analysis compute at the STFC Rutherford Appleton Laboratory, with satellite installations at Reading, Leeds and Bristol Universities. JASMIN is a response to the growing use of direct numerical simulation in the environmental sciences, which is resulting in much higher demand for high performance computing. This growth in HPC is accompanied by a transition in its nature, with data intensive HPC becoming an ever increasing part of the mix. (For example, at the time of writing CEDA is evaluating the requirements in terms of storage and co-located analysis compute for three grants, each of which is expected to produce in excess of 0.5 PB of data over the next three years - this on top of known data acquisition already measured in PB. Clearly every grant round could bring similar requirements.)
Such data intensive HPC is being carried out on many different supercomputers, so it is no longer satisfactory to assume that putting storage alongside the HPC will solve the analysis problem (since such a solution, alone, could result in an NxN data transfer problem for data comparison between results on N supercomputers). Inevitably one needs to reduce the data transfer problem down to as close to Nx1 as possible - hence JASMIN - a facility configured for data storage AND analysis. For analysis, JASMIN will deploy a "private cloud" to allow the community to develop their own analysis environment using their favourite operating system configuration. JASMIN will also be used, along with large tape facilities provided by STFC, to provide persistent storage for the archival and curation functions which CEDA also provides. These storage and computing advances will be supported by high-bandwidth network connectivity between key collaborating institutions (particularly supercomputing sites), both within the UK and in Europe, and new light paths have been established alongside the JASMIN activity.
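The NxN versus Nx1 argument above can be illustrated with a small counting sketch. It is not taken from the presentation: the function names are hypothetical, and it assumes each of the N supercomputers holds one result set that must be copied in full wherever a comparison is run. Under that assumption, comparing everything at every site needs N(N-1) copies (on the order of NxN), while shipping each result set once to a single shared analysis facility needs only N.

```python
def pairwise_transfers(n_sites: int) -> int:
    """Copies needed if every site fetches every other site's results.

    Hypothetical model: each of the n_sites sites pulls one full copy
    from each of the other (n_sites - 1) sites.
    """
    return n_sites * (n_sites - 1)


def central_transfers(n_sites: int) -> int:
    """Copies needed if every site ships its results once to one shared facility."""
    return n_sites


if __name__ == "__main__":
    # Print both counts side by side for a few illustrative values of N.
    for n in (2, 5, 10):
        print(f"N={n}: pairwise={pairwise_transfers(n)}, central={central_transfers(n)}")
```

The gap widens quickly with N, which is the motivation the abstract gives for a facility that combines storage and analysis in one place.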
JASMIN: Joint Analysis System Meeting e-Infrastructure Need
Improving the foundations of the scientific record: data citation and publication by the NERC data centres
Metadata for data discovery: The NERC Data Catalogue Service
A presentation explaining metadata for data discovery, covering:
– NERC, Science and Data Centres
– NERC Discovery Metadata
– The Data Catalogue Service
– NERC Data Services
– A case study showing generation of metadata and doing something useful with it