On the bias in gross labour flow estimates due to nonresponse and misclassification

Author: Zhang Li-Chun
Publication date: 01/01/2005

I evaluate and compare the bias due to nonresponse and misclassification in the sample gross labour flow estimates from the Norwegian Labour Force Survey (LFS). These also help to explain the level and net change estimates from the same survey. The main conclusions are the following: (a) the overall labour market stability, i.e., the proportion of people without change in status, should be boosted after adjusting for both nonresponse and misclassification, (b) neither nonresponse nor misclassification affects the net change estimates, and (c) misclassification has very little effect on the level estimation of the characteristics “employed”, “unemployed” and “not in the labour force”.
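To make the quantities named in this abstract concrete, the sketch below computes the stability proportion and the net changes from a gross flow table. The table is hypothetical (it is not taken from the Norwegian LFS), and the function names are illustrative assumptions, not the paper's notation.

```python
# Rows: status at time 1; columns: status at time 2.
STATES = ("employed", "unemployed", "not in the labour force")

# Hypothetical gross flow counts, for illustration only.
flows = [
    [600, 20, 30],
    [25, 40, 15],
    [35, 20, 215],
]


def stability(table):
    """Proportion of people whose status did not change (diagonal share)."""
    total = sum(sum(row) for row in table)
    unchanged = sum(table[i][i] for i in range(len(table)))
    return unchanged / total


def net_change(table):
    """Time-2 level minus time-1 level for each status."""
    k = len(table)
    level_t1 = [sum(table[i]) for i in range(k)]
    level_t2 = [sum(table[i][j] for i in range(k)) for j in range(k)]
    return [level_t2[j] - level_t1[j] for j in range(k)]


print(stability(flows))
for state, change in zip(STATES, net_change(flows)):
    print(state, change)
```

The net changes depend only on the row and column margins of the table, whereas the stability depends on the diagonal cells, which is why the two can react differently to an adjustment of the flows.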

Southampton (e-Prints Soton)

NORA - Norwegian Open Research Archives

Statistics Norway's Open Research Repository

Graph spatial sampling

Author: Zhang Li-Chun
Publication date: 23/06/2024

We develop a lagged Metropolis-Hastings walk for sampling from simple undirected graphs according to given stationary sampling probabilities. It is explained how the technique can be applied together with designed graphs for sampling of units in space. Compared to the existing spatial sampling methods, which chiefly focus on the sample spatial balance regardless of the associated outcomes of interest, the proposed graph spatial sampling method can considerably improve the efficiency because the graph can be designed to take into account the anticipated spatial distribution of the outcome of interest.
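As background for the abstract above, here is a minimal sketch of an ordinary Metropolis-Hastings random walk on a simple undirected graph with a prescribed stationary distribution. It is the textbook walk, not the lagged variant developed in the paper; the graph, the target probabilities and the function name `mh_walk` are all hypothetical.

```python
import random


def mh_walk(adj, target, start, n_steps, seed=0):
    """Plain Metropolis-Hastings walk on an undirected graph.

    adj:    dict mapping each node to a list of its neighbours
    target: dict of positive stationary probabilities (up to a constant)
    """
    rng = random.Random(seed)
    current = start
    path = [current]
    for _ in range(n_steps):
        # Propose a neighbour uniformly at random.
        proposal = rng.choice(adj[current])
        # Acceptance ratio corrects for unequal degrees of the two nodes.
        ratio = (target[proposal] * len(adj[current])) / (
            target[current] * len(adj[proposal])
        )
        if rng.random() < min(1.0, ratio):
            current = proposal
        path.append(current)
    return path


# Hypothetical four-node graph and target probabilities.
adj = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
target = {"a": 0.1, "b": 0.2, "c": 0.3, "d": 0.4}

path = mh_walk(adj, target, start="a", n_steps=20000)
freq = {node: path.count(node) / len(path) for node in adj}
print(freq)
```

Over a long run, the visit frequencies should approach the target probabilities, since the acceptance step makes the walk reversible with respect to them.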

Southampton (e-Prints Soton)

Finite population small area interval estimation

Author: Zhang Li-Chun
Publication date: 01/01/2007

Small area interval estimation is considered for a finite population, where the small area parameters are treated as fixed constants. Design-based direct estimation yields intervals that are too long to be useful. Model-based approaches are considered. The design-based area-specific coverages are uncontrollable. We propose to use population-specific simultaneous coverage as the basis for evaluating the small area confidence intervals. Wage survey and census household data are used for illustration.

Southampton (e-Prints Soton)

NORA - Norwegian Open Research Archives

Statistics Norway's Open Research Repository

A note on dual system population size estimator

Author: Zhang Li-Chun
Publication date: 2019

Several countries are currently investigating the possibility of replacing the costly population census with a Population Data set derived from administrative sources, in combination with a purposely designed Population Coverage Survey. We formulate the assumptions of the dual system estimator in this context, and contrast them to the situation involving a census and a Census Coverage Survey.
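For orientation, the classical dual system (Lincoln-Petersen) estimator referred to above can be sketched as follows. This is the textbook form and may differ from the formulation used in the paper; the function name and the counts are hypothetical.

```python
def dual_system_estimate(n_list_a, n_list_b, n_matched):
    """Classical dual system (Lincoln-Petersen) population size estimate.

    n_list_a:  units counted in the first source (e.g. a census or an
               administrative population dataset)
    n_list_b:  units counted in the second source (e.g. a coverage survey)
    n_matched: units found in both sources
    """
    if n_matched <= 0:
        raise ValueError("at least one matched unit is required")
    if n_matched > min(n_list_a, n_list_b):
        raise ValueError("matches cannot exceed the size of either source")
    return n_list_a * n_list_b / n_matched


# Hypothetical counts, for illustration only.
estimate = dual_system_estimate(n_list_a=900, n_list_b=500, n_matched=450)
print(estimate)  # 1000.0
```

The estimate is only as good as the assumptions behind it, such as independent capture in the two sources and error-free matching, which is where the two settings compared in the abstract can differ.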

Southampton (e-Prints Soton)

Audit sampling as a quality standard for multisource official statistics

Author: Zhang Li-Chun
Publication date: 01/01/2023

Designed surveys through sampling or census are the standard approach to official statistics, where the targets are descriptive summaries of a given population. Official statistics are also commonly produced by combining relevant administrative registers, such as in the Nordic countries since the 1960s. The scope of non-survey data sources is being extended to include various so-called big-data sources, although so far relatively few multisource statistics of this kind have been credited as official statistics. Trustworthy evaluation of multisource official statistics is a fundamental issue for creating a new quality assurance standard. In this paper, audit sampling inference will be explained, illustrated and promoted to this end.

Southampton (e-Prints Soton)

Estimates for small area compositions subjected to informative missing data

Author: Zhang Li-Chun
Publication date: 01/01/2009

Estimation of small area (or domain) compositions may suffer from informative missing data, if the probability of missing varies across the categories of interest as well as the small areas. We develop a double mixed modeling approach that combines a random effects mixed model for the underlying complete data with a random effects mixed model of the differential missing-data mechanism. The effect of sampling design can be incorporated through a quasi-likelihood sampling model. The associated conditional mean squared error of prediction is approximated in terms of a three-part decomposition, corresponding to a naive prediction variance, a positive correction that accounts for the hypothetical parameter estimation uncertainty based on the latent complete data, and another positive correction for the extra variation due to the missing data. We illustrate our approach with an application to the estimation of municipality household compositions based on the Norwegian register household data, which suffer from informative under-registration of the dwelling identity number.

Southampton (e-Prints Soton)

NORA - Norwegian Open Research Archives

Statistics Norway's Open Research Repository

Graph sampling

Author: Zhang Li-Chun
Publication date: 27/12/2021

Many technological, socio-economic, environmental and biomedical phenomena exhibit an underlying graph structure. A valued graph allows one to additionally incorporate the connections or links among the population units. The links may effectively provide access to the part of the population that is the primary target, which is the case for many unconventional sampling methods, such as indirect, network, line-intercept or adaptive cluster sampling. Or, one may be interested in the structure of the connections, in terms of the corresponding graph properties or parameters, such as when various breadth- or depth-first non-exhaustive search algorithms are applied to obtain compressed views of large, often dynamic graphs. Graph sampling provides a statistical approach to study real graphs from either of these perspectives. It is based on exploring the variation over all possible sample graphs (or subgraphs) which can be taken from the given population graph, by means of the relevant known sampling probabilities. The resulting design-based inference is valid whatever the unknown properties of the given real graphs. One-of-a-kind treatise of multidisciplinary topics relevant to statistics, mathematics and data science. Probabilistic treatment of breadth-first and depth-first non-exhaustive search algorithms in graphs. Presenting cutting-edge theory and methods based on latest research. Pathfinding for future research on sampling from real graphs. Graph Sampling can primarily be used as a resource for researchers working with sampling or graph problems, and as the basis of an advanced course for post-graduate students in statistics, mathematics and data science.

Southampton (e-Prints Soton)

Likelihood imputation

Author: Zhang Li-Chun
Publication date: 1998

The method of likelihood imputation is devised under the framework of latent structure models where the observation is a statistic of the complete data which can only be specified on a latent basis. The imputed data set is chosen to differ least from the observed one in their information contents, a concept with general implications for the analysis of incomplete data. In contrast to the standard conditional-mean single imputation, our procedure depends on an entire likelihood region instead of any single point in it, and yields consistent parameter estimators nevertheless. We explain its implementations and illustrate with data from panel surveys and linear regression with censorship. We also discuss its potential in sensitivity analysis.

Southampton (e-Prints Soton)

Nonparametric Markov chain bootstrap for multiple imputation

Author: Zhang Li-Chun
Publication date: 2004

Multiple imputation is a statistical method for analyzing data with missing values. Nonparametric Markov chain bootstrap methods can be used to generate multiple imputations of both scalar and multivariate outcome variables, under the assumption that the data are missing completely at random, and nonparametric inference can be obtained using multiple implementation bootstrap. The nonparametric approach is useful when parametric settings are inappropriate or difficult. An extension of the Markov chain bootstrap method is discussed under a more complex nonresponse assumption.

Southampton (e-Prints Soton)

Generalised regression estimation given imperfectly matched auxiliary data

Author: Zhang Li-Chun
Publication date: 13/03/2021

Generalised regression estimation allows one to make use of available auxiliary information in survey sampling. We develop three types of generalised regression estimator when the auxiliary data cannot be matched perfectly to the sample units, so that the standard estimator is inapplicable. The inference remains design-based. Consistency of the proposed estimators is either given by construction or else can be tested given the observed sample and links. Mean square errors can be estimated. A simulation study is used to explore the potential of the proposed estimators.
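For reference, the standard generalised regression estimator of a population total, which requires the auxiliary values to be matched to the sample units, is commonly written as below. This is the textbook form under a constant variance structure, not one of the three estimators proposed in the paper. The notation is an assumption: $s$ is the sample, $\pi_i$ the inclusion probability of unit $i$, $x_i$ its auxiliary vector, $y_i$ its outcome and $X$ the known population total of the $x_i$.

```latex
\hat{Y}_{\mathrm{GREG}}
  = \sum_{i \in s} \frac{y_i}{\pi_i}
  + \Bigl( X - \sum_{i \in s} \frac{x_i}{\pi_i} \Bigr)^{\top} \hat{B},
\qquad
\hat{B}
  = \Bigl( \sum_{i \in s} \frac{x_i x_i^{\top}}{\pi_i} \Bigr)^{-1}
    \sum_{i \in s} \frac{x_i y_i}{\pi_i}
```

Computing $\hat{B}$ needs each pair $(x_i, y_i)$ for the sampled units, so imperfect matching of the auxiliary data breaks this form, which is the situation the abstract addresses.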

Southampton (e-Prints Soton)