Upper bound estimators of the population size based on ordinal models for capture-recapture experiments
Capture-recapture studies have attracted a great deal of attention over the past few decades, especially in applied disciplines where a direct estimate of the size of a population of interest is not available; epidemiology, ecology, public health, and biodiversity are just a few examples. Estimating the number of unseen units has been a challenge for theoretical statisticians, and considerable progress has been made in providing lower bound estimators for the population size. Indeed, it is well known that consistent estimators of the population size cannot be obtained in the fully general case. Considering the case where a capture-recapture study is summarized by a frequency of frequencies distribution, we derive a simple upper bound for the population size based on the cumulative distribution function. We introduce two estimators of this bound that make no specific parametric assumption on the distribution of the observed frequency counts. The behavior of the proposed estimators is investigated using several benchmark datasets and a large-scale simulation experiment based on the scheme discussed by Pledger.
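The frequency of frequencies summary mentioned above can be illustrated with a minimal sketch: per-unit capture counts are collapsed into f_x, the number of units seen exactly x times, and the empirical cumulative distribution of the observed (zero-truncated) counts is computed. The function names and the data are hypothetical, and the sketch does not reproduce the authors' upper bound or its two estimators, only the inputs they are built from.

```python
from collections import Counter

def frequency_of_frequencies(capture_counts):
    """Map each positive count x to f_x, the number of units captured exactly x times."""
    counts = Counter(c for c in capture_counts if c > 0)
    return dict(sorted(counts.items()))

def empirical_cdf(freqs):
    """Cumulative proportion of observed units with count <= x (zero-truncated)."""
    n = sum(freqs.values())  # number of observed units
    cdf, running = {}, 0
    for x in sorted(freqs):
        running += freqs[x]
        cdf[x] = running / n
    return cdf

# Hypothetical per-unit capture counts; units never captured are absent by construction.
counts = [1, 1, 1, 2, 1, 3, 2, 1, 1, 2]
f = frequency_of_frequencies(counts)
print(f)                 # {1: 6, 2: 3, 3: 1}
print(empirical_cdf(f))  # {1: 0.6, 2: 0.9, 3: 1.0}
```

The zero class f_0 never appears in this summary, which is exactly why bounds on the population size are needed.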
Modeling delay in diagnosis for NF: under reporting, incidence and prevalence estimates
In this paper, we analyze data from the Italian National Register of Rare Diseases (NRRD), focusing in particular on the geo-temporal distribution of patients affected by neurofibromatosis type 1 (NF1, ICD9CM code 237.71). The aim is twofold: to derive a corrected measure of incidence for the period 2007–2009 using a single source, and to provide NF1 prevalence estimates for the period 2001–2006 by applying capture–recapture methods to two sources. For the first purpose, a reverse hazard estimator for the delay in diagnosis of NF1 is used to estimate the probability that a generic unit belonging to the population of interest has been registered in the reference archive. For the second purpose, two-source capture–recapture methods are used to estimate the number of prevalent NF1 cases in Italy for the period 2001–2006, matching information from the NRRD with the national hospital discharge archives (Scheda di Dimissione Ospedaliera, hereafter SDO).
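As a rough illustration of two-source capture–recapture, the sketch below implements the textbook Lincoln–Petersen estimator and Chapman's bias-corrected variant. It assumes a closed population and independent sources, and it is not necessarily the estimator used in the paper. The counts are hypothetical: n1 cases in one register (for example the NRRD), n2 in the other (for example SDO), and m cases matched in both.

```python
def lincoln_petersen(n1, n2, m):
    """Classical two-source estimator N = n1 * n2 / m (requires m > 0)."""
    if m <= 0:
        raise ValueError("needs at least one unit recorded by both sources")
    return n1 * n2 / m

def chapman(n1, n2, m):
    """Chapman's bias-corrected variant, defined even when m == 0."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1

# Hypothetical counts, not taken from the NRRD or SDO archives.
print(lincoln_petersen(400, 300, 120))  # 1000.0
print(chapman(400, 300, 120))
```

Both estimators rely on the matched count m; positive dependence between sources (for example, registration in one archive making registration in the other more likely) biases them downwards.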
A flexible ratio regression approach for zero-truncated capture–recapture counts
Capture–recapture methods are used to estimate the size of a population of interest that is only partially observed. In such studies, each member of the population carries a count of the number of times it has been identified during the observational period. In real-life applications only positive counts are recorded, so the observed distribution is truncated at zero, and the truncated count distribution must be used to estimate the number of unobserved units. We consider ratios of neighboring count probabilities, estimated by ratios of observed frequencies; these ratios are the same whether the distribution is zero-truncated or untruncated. Rocchetti et al. (2011) have shown that, for densities in the Katz family, these ratios can be modeled by a regression approach, and Rocchetti et al. (2014) have specialized the approach to the beta-binomial distribution. Once the regression model has been estimated, the unobserved frequency of zero counts can be derived directly. The guiding principle is that it is often easier to find an appropriate regression model than a proper model for the count distribution. However, a full analysis of the connection between the regression model and the associated count distribution has been missing. In this manuscript, we fill this gap and show that, under general conditions, the regression model approach leads to a valid count distribution; we also consider a wider class of regression models based on fractional polynomials. The proposed approach is illustrated through several empirical applications and a simulation study.
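A minimal sketch of the ratio-regression idea follows. The observed ratios r_x = (x + 1) f_{x+1} / f_x are regressed on x, the fitted line is extrapolated to x = 0, and f_0 = f_1 / r_0 follows because r_0 = f_1 / f_0. The log-linear specification and unweighted least squares are simplifying assumptions made here for illustration. The cited papers use weighted regression, and the manuscript considers fractional polynomials. The function name is hypothetical.

```python
import math

def ratio_regression_f0(freqs):
    """
    freqs: {x: f_x} for x >= 1 (zero-truncated frequency of frequencies).
    Fits log r_x = b0 + b1 * x by UNWEIGHTED least squares (a simplification),
    with r_x = (x + 1) * f_{x+1} / f_x, and returns f_0 = f_1 / exp(b0).
    """
    if freqs.get(1, 0) <= 0:
        raise ValueError("f_1 must be positive")
    xs, ys = [], []
    for x in sorted(freqs):
        nxt = freqs.get(x + 1, 0)
        if freqs[x] > 0 and nxt > 0:
            xs.append(x)
            ys.append(math.log((x + 1) * nxt / freqs[x]))
    if len(xs) < 2:
        raise ValueError("need at least two observable ratios")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    b0 = my - b1 * mx
    r0 = math.exp(b0)  # predicted ratio at x = 0, i.e. f_1 / f_0
    return freqs[1] / r0

# Sanity check on exact Poisson frequencies (N = 1000, lambda = 1), where every
# ratio equals lambda, so f_0 should recover N * exp(-1).
N, lam = 1000, 1.0
freqs = {x: N * math.exp(-lam) * lam ** x / math.factorial(x) for x in range(1, 5)}
print(round(ratio_regression_f0(freqs), 2))  # 367.88
```

The estimated population size is then the number of observed units plus the estimated f_0.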
Estimating the undetected infections in the Covid-19 outbreak by harnessing capture-recapture methods
Objectives: A major open question, affecting the decisions of policy makers, is the estimation of the true number of Covid-19 infections. Most infections go undetected because of the large number of asymptomatic cases. We provide an efficient, easy-to-compute and robust lower bound estimator for the number of undetected cases. Methods: A modified version of the Chao estimator is proposed, based on the cumulative time-series distributions of cases and deaths. Heterogeneity is addressed by assuming a geometric distribution underlying the data-generating process. An (approximate) analytical variance of the estimator is derived to compute reliable 95% confidence intervals. Results: A motivating application to the Austrian situation is provided and compared with an independent, representative study on the prevalence of Covid-19 infection. Our estimates agree well with the results of the independent prevalence study, but the capture–recapture estimate carries less uncertainty because it is based on a larger sample. Results for other European countries are mentioned in the discussion. The ratio of the total estimated cases to the observed cases is around 2.3 for all the analyzed countries. Conclusions: The proposed method addresses a fundamental open question: “How many undetected cases are going around?”. Capture–recapture (CR) methods provide a straightforward way to shed light on undetected cases while incorporating the heterogeneity that may arise in the probability of being detected.
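For orientation, the sketch below shows the classical Chao (1987) lower bound, N = n + f_1^2 / (2 f_2), together with Chao's approximate variance and a symmetric normal-approximation 95% interval. This is the unmodified estimator computed from frequency counts f_1 and f_2. It is not the modified version built on cumulative time series of cases and deaths described above. The symmetric interval is a simplification, and the input values are hypothetical.

```python
import math

def chao_lower_bound(f1, f2, n):
    """Classical Chao (1987) lower bound: N = n + f1^2 / (2 * f2)."""
    if f2 <= 0:
        raise ValueError("f2 must be positive")
    return n + f1 ** 2 / (2 * f2)

def chao_variance(f1, f2):
    """Chao's (1987) approximate variance: f2 * (r^4/4 + r^3 + r^2/2), r = f1/f2."""
    r = f1 / f2
    return f2 * (0.25 * r ** 4 + r ** 3 + 0.5 * r ** 2)

def chao_ci(f1, f2, n, z=1.96):
    """Symmetric normal-approximation interval (a simplification)."""
    est = chao_lower_bound(f1, f2, n)
    half = z * math.sqrt(chao_variance(f1, f2))
    return est - half, est + half

# Hypothetical counts: n observed units, f1 seen once, f2 seen twice.
print(chao_lower_bound(f1=200, f2=100, n=500))  # 700.0
print(chao_ci(f1=200, f2=100, n=500))
```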
Estimating the size of undetected cases of the Covid-19 outbreak in Europe: an upper bound estimator
Background: While the number of detected COVID-19 infections is widely available, an understanding of the extent of undetected cases is urgently needed to tackle the pandemic effectively. The aim of this work is to estimate the true number of COVID-19 (detected and undetected) infections in several European countries. The question being asked is: How many cases have actually occurred? Methods: We propose an upper bound estimator under cumulative data distributions, in an open population, based on a day-wise estimator that allows for heterogeneity. The estimator is data-driven and can be easily computed from the distributions of daily cases and deaths. Uncertainty surrounding the estimates is obtained using bootstrap methods. Results: We focus on the ratio of the total estimated cases to the observed cases on April 17th. Differences arise at the country level, with ratios ranging from 3.93 for Norway to 7.94 for France. The estimates are precise, as the bootstrap-based intervals are rather narrow. Conclusions: Many parametric or semi-parametric models have been developed to estimate the population size from aggregated counts, leading to an approximation of the missed population and/or to an estimate of the threshold below which the number of missed people cannot fall (i.e. a lower bound). Here, we make a methodological contribution by introducing an upper bound estimator, and we provide reliable estimates of the dark number, i.e. how many undetected cases are going around, for several European countries where the epidemic spreads differently.
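The bootstrap step can be sketched generically. Observed units are resampled with replacement, the frequency counts are rebuilt, the estimator is recomputed, and percentile limits are read off. Because the day-wise upper bound estimator is not specified here, the sketch plugs in the classical Chao lower bound as a clearly labeled stand-in. The unit-level resampling scheme, the function names and the data are all assumptions for illustration, and the paper's bootstrap may differ.

```python
import random
from collections import Counter

def chao_stand_in(freqs):
    """STAND-IN estimator: classical Chao lower bound n + f1^2 / (2 f2).
    Not the day-wise upper bound estimator described above."""
    f1, f2 = freqs.get(1, 0), freqs.get(2, 0)
    if f2 == 0:
        raise ValueError("f2 is zero; estimator undefined")
    return sum(freqs.values()) + f1 ** 2 / (2 * f2)

def bootstrap_interval(freqs, estimator, reps=2000, alpha=0.05, seed=1):
    """Percentile bootstrap: resample observed units with replacement,
    rebuild the frequency counts, and recompute the estimator."""
    rng = random.Random(seed)
    # Expand {x: f_x} into one entry per observed unit (its count x).
    units = [x for x, fx in freqs.items() for _ in range(fx)]
    stats = []
    for _ in range(reps):
        resample = Counter(rng.choices(units, k=len(units)))
        try:
            stats.append(estimator(resample))
        except ValueError:
            continue  # skip resamples where the estimator is undefined
    if not stats:
        raise RuntimeError("estimator undefined in every resample")
    stats.sort()
    lo = stats[int(alpha / 2 * len(stats))]
    hi = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lo, hi

freqs = {1: 200, 2: 100, 3: 40, 4: 10}  # hypothetical frequency counts
print(bootstrap_interval(freqs, chao_stand_in))
```

Passing the estimator as a function keeps the resampling logic independent of the particular bound being computed.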
Population size estimation based upon ratios of recapture probabilities
Estimating the size of an elusive target population is of prominent interest in many areas of the life and social sciences. Our aim is to provide an efficient and workable method to estimate the unknown population size, given the frequency distribution of counts of repeated identifications of units of the population of interest. This counting variable is necessarily zero-truncated, since units that have never been identified are not in the sample. We consider several applications: clinical medicine, where interest is in estimating the number of patients with adenomatous polyps that have been overlooked by the diagnostic procedure; drug user studies, where interest is in estimating the number of hidden drug users who have not been identified; veterinary surveillance of scrapie in the UK, where interest is in estimating the hidden amount of scrapie; and entomology and microbial ecology, where interest is in estimating the number of unobserved species of organisms. In all these examples, simple models such as the homogeneous Poisson are not appropriate, since they do not account for present and latent heterogeneity. The Poisson–Gamma (negative binomial) model provides a flexible alternative and often leads to well-fitting models. It has a long history and was recently used in the development of the Chao–Bunge estimator. Here we use a different property of the Poisson–Gamma model: ratios of neighboring Poisson–Gamma probabilities are linearly related to the counts of repeated identifications. Moreover, these ratios have the useful property of being identical for truncated and untruncated distributions. In this paper we propose a weighted logarithmic regression model to estimate the zero frequency count, assuming a Poisson–Gamma distribution for the counts. A detailed explanation of the chosen weights and a goodness-of-fit index are presented, along with extensions to other distributions.
To evaluate the proposed estimator, we applied it to the benchmark examples mentioned above and compared the results with those obtained through the Chao–Bunge and other estimators. The major benefit of the proposed estimator is that it is defined under mild conditions, whereas the Chao–Bunge estimator fails to be well defined in several of the examples presented; where the Chao–Bunge estimator is defined, a simulation study shows that its behavior is comparable to that of the proposed estimator in terms of bias and mean squared error (MSE). Furthermore, the proposed estimator is relatively insensitive to the inclusion or exclusion of large outlying frequencies, whereas sensitivity to outliers is characteristic of most other methods. The implications and limitations of such methods are discussed.
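The two properties of the Poisson–Gamma model used above can be checked numerically. First, the ratio (x + 1) p(x + 1) / p(x) is linear in x. Under the parametrization chosen here (shape k, probability q, pmf proportional to Γ(x + k) / x! · q^x) it equals q (k + x). Second, the same ratio is unchanged when the pmf is truncated at zero. The parametrization and parameter values are assumptions for illustration, and the sketch does not implement the weighted logarithmic regression itself.

```python
import math

def nb_pmf(x, k, q):
    """Poisson–Gamma (negative binomial) pmf:
    Gamma(x + k) / (Gamma(k) x!) * (1 - q)^k * q^x, with k > 0, 0 < q < 1."""
    return math.exp(math.lgamma(x + k) - math.lgamma(k) - math.lgamma(x + 1)
                    + k * math.log(1 - q) + x * math.log(q))

def ratio(p, x):
    """Neighboring-probability ratio (x + 1) * p(x + 1) / p(x)."""
    return (x + 1) * p(x + 1) / p(x)

k, q = 1.5, 0.4  # hypothetical parameter values
full = lambda x: nb_pmf(x, k, q)
trunc = lambda x: nb_pmf(x, k, q) / (1 - nb_pmf(0, k, q))  # zero-truncated, x >= 1

for x in range(1, 6):
    # Same value for the full and truncated pmf, and equal to q * (k + x).
    print(x, round(ratio(full, x), 6), round(ratio(trunc, x), 6), round(q * (k + x), 6))
```

Because the ratio at x = 0 is f_1 / f_0 = k q, a regression fitted to the ratios for x >= 1 can be extrapolated to x = 0 to recover the zero frequency.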
Estimating the undetected infections in the Monkeypox outbreak
While the number of detected Monkeypox infections is widely available, an understanding of the extent of undetected cases is urgently needed to tackle its spread effectively. The aim of this study is to estimate the true number of Monkeypox (detected and undetected) infections in the most affected countries. The question being asked is: How many cases have actually occurred? We propose a lower bound estimator for the true number of Monkeypox cases. The estimator is data-driven and can be easily computed from the cumulative distributions of weekly cases. We focused on the ratio of the total estimated cases to the observed cases on July 31, 2022: the proportion of undetected cases was substantial in all countries, and in some countries the estimated true number of infections could be more than three times the observed one. We provide a practical contribution to the understanding of the current Monkeypox wave and reliable estimates of how many undetected cases are going around in several countries where the epidemic spreads differently.
A regression estimator for mixed binomial capture-recapture data
Mixed binomial models are frequently used to estimate the unknown size of a partially observed population when capture–recapture data are available through a known, finite number of identification (sampling) sources. In this context, two major inherent problems may arise: lack of identifiability of the mixing distribution (Link, 2003) and boundary problems in maximum likelihood (ML) estimation for mixed binomial models such as the beta-binomial or finite mixtures of binomials (see e.g. Dorazio and Royle, 2003 and Dorazio and Royle, 2005). To address these problems, we introduce a novel regression estimator based on observed ratios of successive capture frequencies. Both simulations and real data examples show that the proposed estimator frequently underestimates the true population size, but with a smaller bias and lower variability than other well-known estimators.
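A minimal sketch of a ratio-based regression estimator for T known sources follows. Under a plain binomial with capture probability p, the adjusted ratio r_x = (x + 1) f_{x+1} / ((T - x) f_x) equals p / (1 - p) for every x. Under a mixed binomial it varies with x. Regressing log r_x on x and extrapolating to x = 0 then gives f_0 = f_1 / (T r_0). The log-linear form and unweighted least squares are assumptions chosen for illustration and are not necessarily the specification proposed in the paper. The function name is hypothetical.

```python
import math

def binomial_ratio_f0(freqs, T):
    """
    freqs: {x: f_x} for x = 1..T; T: known number of identification sources.
    Regresses log r_x = b0 + b1 * x (unweighted, an assumption) with
    r_x = (x + 1) f_{x+1} / ((T - x) f_x), then returns f_0 = f_1 / (T * exp(b0)).
    """
    xs, ys = [], []
    for x in range(1, T):
        fx, fnext = freqs.get(x, 0), freqs.get(x + 1, 0)
        if fx > 0 and fnext > 0:
            xs.append(x)
            ys.append(math.log((x + 1) * fnext / ((T - x) * fx)))
    if len(xs) < 2:
        raise ValueError("need at least two observable ratios")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b0 = my - b1 * mx
    return freqs[1] / (T * math.exp(b0))  # exp(b0) is the predicted r_0

# Check on exact binomial frequencies (N = 1000, T = 5, p = 0.3): f_0 = N * 0.7**5.
N, T, p = 1000, 5, 0.3
freqs = {x: N * math.comb(T, x) * p ** x * (1 - p) ** (T - x) for x in range(1, T + 1)}
print(round(binomial_ratio_f0(freqs, T), 2))  # 168.07
```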
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based. We compare ACA results based on two different types of co-citation counting: the traditional type, which counts only the first of a cited work's authors, and a non-traditional type, which takes into account the first 5 authors of a cited work. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than the one produced through traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
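The difference between the two counting schemes can be sketched as follows. For each citing paper, collect the authors among the first `max_authors` of every cited work, then add one co-citation to each unordered pair of distinct authors. With `max_authors=1` this gives first-author counting, and with `max_authors=5` the five-author variant. Counting each pair at most once per citing paper is an assumption made here, and the author names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(citing_papers, max_authors=1):
    """
    citing_papers: list of reference lists; each reference is a list of author
    names in byline order. For each unordered author pair, counts the citing
    papers in which both authors appear among the first `max_authors` authors
    of some cited work (each pair counted at most once per citing paper).
    """
    counts = Counter()
    for refs in citing_papers:
        authors = set()
        for ref in refs:
            authors.update(ref[:max_authors])
        for pair in combinations(sorted(authors), 2):
            counts[pair] += 1
    return counts

# Two hypothetical citing papers with their reference lists.
papers = [
    [["Smith", "Lee"], ["Chen", "Smith"]],
    [["Lee", "Chen"], ["Smith"]],
]
print(cocitation_counts(papers, max_authors=1))
print(cocitation_counts(papers, max_authors=5))
```

Including co-authors adds pairs (here Chen and Lee) that first-author counting never sees, which is how the two schemes can produce different author maps.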
