1,721,006 research outputs found

    Americas Datasets

    No full text
    Peer-reviewed raster-based population distribution datasets having a resolution of 3 arc seconds (approximately 100 m at the equator) and created using a Random Forest-based dasymetric mapping approach (Stevens et al., 2015; see Other References in the Metadata) to disaggregate official population count data for 28 countries located in Latin America and the Caribbean. FILENAME CONVENTION: ISO_ppp/pph_v2b_YEAR_UNadj.tif = Country (identified by its unique ISO code) population per pixel (ppp)/per hectare (pph) dataset referring to a specific year (YEAR) adjusted to match United Nations national estimates (UNadj) and produced using the version 2b (v2b) of the WorldPop-RF code available at: http://dx.doi.org/10.6084/m9.figshare.1491490 (Stevens et al., 2015; see Related Material in the Metadata).
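
    The filename convention above can be unpacked mechanically. The following is a minimal sketch, assuming a three-letter ISO code and a four-digit year (the listing does not state either); the function name, the tuple type, and the example filename are hypothetical and not part of the WorldPop tooling.

```python
import re
from typing import NamedTuple

# Pattern for the stated convention ISO_ppp/pph_v2b_YEAR_UNadj.tif.
# Assumptions (not stated in the listing): 3-letter ISO code, 4-digit year.
_PATTERN = re.compile(
    r"^(?P<iso>[A-Za-z]{3})_(?P<unit>ppp|pph)_(?P<version>v2b)"
    r"_(?P<year>\d{4})_UNadj\.tif$"
)


class DatasetName(NamedTuple):
    iso: str       # country ISO code
    unit: str      # "ppp" = people per pixel, "pph" = people per hectare
    version: str   # version of the WorldPop-RF code, here "v2b"
    year: int      # year the dataset refers to
    un_adjusted: bool  # True because the pattern requires the UNadj suffix


def parse_dataset_name(filename: str) -> DatasetName:
    """Split a filename that follows the convention into its parts."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not a v2b UN-adjusted dataset name: {filename!r}")
    return DatasetName(
        iso=match["iso"].upper(),
        unit=match["unit"],
        version=match["version"],
        year=int(match["year"]),
        un_adjusted=True,
    )


# Hypothetical example filename, built only from the convention above.
print(parse_dataset_name("PER_ppp_v2b_2015_UNadj.tif"))
```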

    High-resolution gridded population datasets for Latin America and the Caribbean in 2010, 2015, and 2020

    No full text
    The Latin America and the Caribbean region is one of the most urbanized regions in the world, with a total population of around 630 million that is expected to increase by 25% by 2050. In this context, detailed and contemporary datasets accurately describing the distribution of residential population in the region are required for measuring the impacts of population growth, monitoring changes, supporting environmental and health applications, and planning interventions. To support these needs, an open access archive of high-resolution gridded population datasets was created through disaggregation of the most recent official population count data available for 28 countries located in the region. These datasets are described here along with the approach and methods used to create and validate them. For each country, population distribution datasets, having a resolution of 3 arc seconds (approximately 100 m at the equator), were produced for the population count year, as well as for 2010, 2015, and 2020. All these products are available both through the WorldPop Project website and the WorldPop Dataverse Repository.
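
    The "approximately 100 m at the equator" figure follows from simple arithmetic. The sketch below assumes a spherical Earth with the WGS84 equatorial radius, which is a simplification and not something the abstract specifies; the function name is illustrative. It shows that 3 arc seconds is closer to 93 m at the equator and that the east-west width shrinks with the cosine of latitude.

```python
import math

# WGS84 semi-major axis in metres; using it as a sphere radius is an
# approximation chosen for this sketch.
EARTH_EQUATORIAL_RADIUS_M = 6_378_137.0


def cell_width_m(arc_seconds: float, latitude_deg: float = 0.0) -> float:
    """East-west width in metres of a cell spanning `arc_seconds` of longitude."""
    angle_rad = math.radians(arc_seconds / 3600.0)
    return (
        EARTH_EQUATORIAL_RADIUS_M
        * angle_rad
        * math.cos(math.radians(latitude_deg))
    )


print(round(cell_width_m(3), 1))        # width at the equator
print(round(cell_width_m(3, 60.0), 1))  # half as wide at 60 degrees latitude
```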

    Assessing the spatial sensitivity of a random forest model: Application in gridded population modeling

    No full text
    Gridded human population data provide a spatial denominator to identify populations at risk, quantify burdens, and inform our understanding of human-environment systems. When modeling gridded population, the information used for training the model may differ in spatial resolution from what is produced by the model prediction. This case arises when approaching population modeling from a top-down, dasymetric approach in which one redistributes coarse administrative unit level population data (i.e., source unit) to a finer scale (i.e., target unit). However, issues associated with the differing variance across scales, spatial autocorrelation, and bias in sampling techniques are often overlooked. In this study, we examine the effects of intentionally biasing our sampling from the source to target scale within the context of a weighted, dasymetric mapping approach. The weighted component is based on a Random Forest estimator, which is a non-parametric ensemble-based prediction model. We investigate issues of autocorrelation and heterogeneity in the training data using 18 different types of samples to show the variations in training, census-level (i.e., source) and output, grid-level (i.e., target) predictions. We compare results to simple random sampling and geographically stratified random sampling. Results indicate that the Random Forest model is sensitive to the spatial autocorrelation inherent in the training data, which leads to an increase in the variance of the residuals. Sample training datasets that are at a spatial scale representative of the true population produced the best fitting models. However, the true representative dataset varied in autocorrelation for both scales. More attention is needed with ensemble-based learning and spatially-heterogeneous data as underlying issues of spatial autocorrelation influence results for both the census-level and grid-level estimations.

    GridSample: an R package to generate household survey primary sampling units (PSUs) from gridded population data

    No full text
    Background: Household survey data are collected by governments, international organizations, and companies to prioritize policies and allocate billions of dollars. Surveys are typically selected from recent census data; however, census data are often outdated or inaccurate. This paper describes how gridded population data might instead be used as a sample frame, and introduces the R GridSample algorithm for selecting primary sampling units (PSUs) for complex household surveys with gridded population data. With a gridded population dataset and geographic boundary of the study area, GridSample allows a two-step process to sample “seed” cells with probability proportionate to estimated population size, then “grows” PSUs until a minimum population is achieved in each PSU. The algorithm permits stratification and oversampling of urban or rural areas. The approximately uniform size and shape of grid cells allows for spatial oversampling, not possible in typical surveys, possibly improving small area estimates with survey results. Results: We replicated the 2010 Rwanda Demographic and Health Survey (DHS) in GridSample by sampling the WorldPop 2010 UN-adjusted 100 m × 100 m gridded population dataset, stratifying by Rwanda’s 30 districts, and oversampling in urban areas. The 2010 Rwanda DHS had 79 urban PSUs, 413 rural PSUs, with an average PSU population of 610 people. An equivalent sample in GridSample had 75 urban PSUs, 405 rural PSUs, and a median PSU population of 612 people. The number of PSUs differed because DHS added urban PSUs from specific districts while GridSample reallocated rural-to-urban PSUs across all districts. Conclusions: Gridded population sampling is a promising alternative to typical census-based sampling when census data are moderately outdated or inaccurate. Four approaches to implementation have been tried: (1) using gridded PSU boundaries produced by GridSample, (2) manually segmenting gridded PSUs using satellite imagery, (3) non-probability sampling (e.g. random-walk, “spin-the-pen”), and (4) random sampling of households. Gridded population sampling is in its infancy, and further research is needed to assess its accuracy and feasibility. The GridSample R algorithm can be used to forward this research agenda.
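
    The two-step process described in this abstract (draw "seed" cells with probability proportionate to population, then "grow" each PSU until it reaches a minimum population) can be illustrated with a short sketch. This is not the GridSample package, which is written in R. The grid representation, the function name, and the growth rule (add the most populous unassigned 4-neighbour) are assumptions made for illustration only, and stratification and oversampling are omitted.

```python
import random


def sample_psus(grid, n_psus, min_pop, rng_seed=0):
    """Select PSUs from `grid`, a dict mapping (row, col) -> cell population.

    Returns a list of (cells, total_population) tuples.
    """
    rng = random.Random(rng_seed)
    assigned = set()
    psus = []
    for _ in range(n_psus):
        candidates = [c for c in grid if c not in assigned and grid[c] > 0]
        if not candidates:
            break
        # Step 1: seed cell drawn with probability proportionate to population.
        seed_cell = rng.choices(
            candidates, weights=[grid[c] for c in candidates], k=1
        )[0]
        cells = [seed_cell]
        assigned.add(seed_cell)
        total = grid[seed_cell]
        # Step 2: grow the PSU until the minimum population is reached.
        while total < min_pop:
            frontier = {
                (r + dr, c + dc)
                for (r, c) in cells
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
            }
            frontier = [c for c in frontier if c in grid and c not in assigned]
            if not frontier:
                break  # no unassigned neighbours left; PSU stays undersized
            # Assumed growth rule: take the most populous neighbouring cell.
            nxt = max(frontier, key=lambda c: (grid[c], c))
            cells.append(nxt)
            assigned.add(nxt)
            total += grid[nxt]
        psus.append((cells, total))
    return psus


# Toy 4 x 4 grid with 100 people per cell (made-up numbers).
toy_grid = {(r, c): 100 for r in range(4) for c in range(4)}
for cells, total in sample_psus(toy_grid, n_psus=2, min_pop=300):
    print(cells, total)
```

    Because seeds are drawn without reusing assigned cells, PSUs never overlap; a real sample frame would also need rules for cells that cannot reach the minimum population.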

    Improving the accuracy of gridded population estimates in cities and slums to monitor SDG 11: evidence from a simulation study in Namibia

    No full text
    People living in slums and other deprived areas in low- and middle-income country (LMIC) cities are under-represented in censuses, and subsequently in “top-down” census-derived gridded population estimates. Modelled gridded population data are a unique source of disaggregated population information to calculate local development indicators such as the Sustainable Development Goals (SDGs). This study evaluates if, and how, “top-down” WorldPop Global (WPG) Unconstrained and Constrained datasets might be improved in a simulated LMIC urban population by incorporating slum population counts into model training. We found that the WPG-Unconstrained model, with or without slum training data, underestimated population in urban deprived areas while overestimating population in rural areas. The percent of population living in slums (SDG 11.1.1), for example, was estimated to be 20% or less compared to a “true” value of 29.5%. The WPG-Constrained model, which included building footprint auxiliary datasets, far more accurately estimated the population in all grid cells (including rural areas), and the inclusion of slum training data further improved estimates such that SDG 11.1.1 was estimated at 27.1% and 27.0%, respectively. Inclusion of building metrics and slum population training data in “top-down” gridded population models can substantially improve grid cell-level accuracy in both urban and rural areas.

    wpgpRFPMS: WorldPop Random Forests population modelling R scripts, version 0.1.0

    No full text
    wpgpRFPMS is a population modelling R script utilizing Random Forests to inform a dasymetric redistribution of census-based population count data. It follows the Random Forest-based dasymetric mapping approach developed by Stevens et al., "Disaggregating Census Data for Population Mapping Using Random Forests with Remotely-Sensed and Ancillary Data", PLoS ONE 10, e0107042 (2015).
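
    The dasymetric redistribution step can be sketched independently of the model that supplies the weights. In the approach described above the weights come from a Random Forest prediction; in this sketch they are simply given as numbers, and the function name and data layout are assumptions for illustration, not the wpgpRFPMS interface. Each census unit's count is shared among its grid cells in proportion to the cell weights, so the cells of a unit sum back to the unit's count.

```python
def dasymetric_redistribute(unit_totals, cell_units, cell_weights):
    """Redistribute census counts to grid cells in proportion to cell weights.

    unit_totals:  dict mapping unit id -> census population count
    cell_units:   unit id of each grid cell
    cell_weights: non-negative weight of each grid cell (for example a
                  predicted population density)
    """
    # Sum of weights within each census unit.
    weight_sums = {}
    for unit, weight in zip(cell_units, cell_weights):
        weight_sums[unit] = weight_sums.get(unit, 0.0) + weight

    cell_population = []
    for unit, weight in zip(cell_units, cell_weights):
        total_weight = weight_sums[unit]
        if total_weight > 0:
            cell_population.append(unit_totals[unit] * weight / total_weight)
        else:
            # A unit whose cells all have zero weight receives no population
            # here; a real workflow would need an explicit fallback rule.
            cell_population.append(0.0)
    return cell_population


# Made-up example: two census units, five grid cells.
totals = {"A": 100, "B": 50}
units = ["A", "A", "A", "B", "B"]
weights = [1.0, 3.0, 6.0, 2.0, 2.0]
print(dasymetric_redistribute(totals, units, weights))
# -> [10.0, 30.0, 60.0, 25.0, 25.0]
```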

    Quantifying the effects of using detailed spatial demographic data on health metrics: a systematic analysis for the AfriPop, AsiaPop, and AmeriPop projects

    No full text
    Background: The Millennium Development Goals (MDGs) have prompted an expansion in approaches to deriving health metrics to measure progress towards their achievement. Accurate measurements should take into account the high degrees of spatial heterogeneity in health risks across countries, prompting the development of sophisticated cartographic techniques for mapping and modelling risks. Conversion of these risks to relevant population-based metrics requires equally detailed information on the spatial distribution and attributes of the denominator populations. However, spatial information on age and sex composition is lacking, prompting many health metric studies to overlook the substantial demographic variations that exist subnationally and to merely apply national-level adjustments. Methods: Here, we describe the development of high-resolution age and sex structured spatial population datasets for Africa, Asia, and Latin America in 2000–15, built from millions of measurements mapped to more than 200 000 subnational units, and originating from censuses, census microdata, and household surveys. Findings: We analysed the substantial variations seen within countries, by settlement type, and across the continents for key MDG indicator groups, focusing on children under 5 and women of childbearing age, and found that substantial differences in various MDG-related health and development indicators can result from using only national-level statistics compared with accounting for subnational variation. Interpretation: Progress towards meeting the MDGs will be measured through national-level indicators that mask substantial inequalities and heterogeneities across nations. Cartographic approaches are providing opportunities for quantitative assessments of these inequalities and the targeting of interventions, but demographic spatial datasets to support such efforts remain reliant on coarse and outdated input data for accurately locating risk groups. We have shown here that sufficient data exist to map the distribution of key vulnerable groups, and that doing so has substantial impacts on health metrics. Further details and data are available through the project websites: www.afripop.org, www.asiapop.org, and www.ameripop.org. Funding: AJT acknowledges funding support from the RAPIDD program of the Science and Technology Directorate, Department of Homeland Security, and the Fogarty International Center, National Institutes of Health, and is also supported by grants from NIH/NIAID (U19AI089674) and the Bill & Melinda Gates Foundation (49446 and 1032350).

    Improving large area population mapping using geotweet densities

    No full text
    Many different methods are used to disaggregate census data and predict population densities to construct finer scale, gridded population data sets. These methods often involve a range of high resolution geospatial covariate datasets on aspects such as urban areas, infrastructure, land cover and topography; such covariates, however, are not directly indicative of the presence of people. Here we tested the potential of geo-located tweets from the social media application, Twitter, as a covariate in the production of population maps. The density of geo-located tweets in 1x1 km grid cells over a 2-month period across Indonesia, a country with one of the highest Twitter usage rates in the world, was input as a covariate into a previously published random forests-based census disaggregation method. Comparison of internal measures of accuracy and external assessments between models built with and without the geotweets showed that increases in population mapping accuracy could be obtained using the geotweet densities as a covariate layer. The work highlights the potential for such social media-derived data in improving our understanding of population distributions and offers promise for more dynamic mapping with such data being continually produced and freely available.