TwoSampleTest.HD: An R Package for the Two-Sample Problem with High-Dimensional Data

Author: Cousido-Rocha Marta
de Uña-Álvarez Jacobo
Publication date: 18/12/2023

The two-sample problem refers to the comparison of two probability distributions via two independent samples. With high-dimensional data, such comparison is performed along a large number p of possibly correlated variables or outcomes. In genomics, for instance, the variables may represent gene expression levels for p locations, recorded for two (usually small) groups of individuals. In this paper we introduce TwoSampleTest.HD, a new R package to test for the equal distribution of the p outcomes. Specifically, TwoSampleTest.HD implements the tests recently proposed by Cousido-Rocha, Uña-Álvarez, and Hart (2019) for the low sample size, large dimensional setting. These tests take the possible dependence among the p variables into account, and work for sample sizes as small as two. The tests are based on the distance between the empirical characteristic functions of the two samples, averaged along the p locations. Different options to estimate the variance of the test statistic under dependence are allowed. The package TwoSampleTest.HD also provides the user with individual permutation p-values, so feature discovery is possible when the null hypothesis of equal distribution is rejected. We illustrate the usage of the package through the analysis of simulated and real data, where results provided by alternative approaches are considered for comparison purposes. In particular, benefits of the implemented tests relative to ordinary multiple comparison procedures are highlighted. Practical recommendations are given.
The authors acknowledge financial support from the Grant PID2020-118101GB-I00, Ministerio de Ciencia e Innovación.
Peer reviewed
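The idea of comparing empirical characteristic functions variable by variable can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation in TwoSampleTest.HD: the Gaussian weight on the frequency axis, the function names (`ecf_distance`, `permutation_pvalue`, `global_statistic`) and the default parameters are all hypothetical choices, and the variance estimation under dependence that the abstract mentions is not attempted here.

```python
import math
import random


def ecf_distance(x, y, sigma=1.0):
    """Weighted squared distance between the empirical characteristic
    functions of samples x and y (one variable).

    Assumption of this sketch: the weight on the frequency t is a
    N(0, sigma^2) density, so the integral of cos(t * d) against the
    weight equals exp(-(sigma * d)^2 / 2) and no numerical integration
    is needed.
    """
    def k(d):
        return math.exp(-0.5 * (sigma * d) ** 2)

    n, m = len(x), len(y)
    xx = sum(k(a - b) for a in x for b in x) / (n * n)
    yy = sum(k(a - b) for a in y for b in y) / (m * m)
    xy = sum(k(a - b) for a in x for b in y) / (n * m)
    return xx + yy - 2.0 * xy


def permutation_pvalue(x, y, n_perm=200, rng=None):
    """Permutation p-value for a single variable, obtained by
    reshuffling the group labels of the pooled sample."""
    rng = rng or random.Random(0)
    observed = ecf_distance(x, y)
    pooled = list(x) + list(y)
    n = len(x)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if ecf_distance(pooled[:n], pooled[n:]) >= observed:
            exceed += 1
    # The +1 terms count the observed labelling as one permutation.
    return (exceed + 1) / (n_perm + 1)


def global_statistic(X, Y):
    """Average of the per-variable distances over the p variables.
    X and Y are lists of p samples (one list of observations per variable)."""
    return sum(ecf_distance(x, y) for x, y in zip(X, Y)) / len(X)
```

In this sketch the global statistic is left unstandardised. A real test would still need a variance estimate that accounts for dependence among the p variables, which is the part the package offers several options for.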

Digital.CSIC

Novas contribucións á análise estatística de datos de alta dimensón baixo dependencia

Author: Cousido Rocha Marta
Publication date: 2018

Multiple comparison procedures (Dudoit and van der Laan, 2008) are needed when one performs several tests simultaneously, since they avoid the problem of an inflated type I error rate. Traditional methods for multiple comparisons aim to control the familywise error rate (FWER) or the false discovery rate (FDR) at a pre-specified level. However, such procedures may exhibit low power when the effects are weak or moderate. Carvajal-Rodríguez et al. (2009) introduced a new criterion for multiple comparisons, called SGoF, with the advantage of reporting a reasonable power which increases as the number of tests grows. The SGoF method starts by focusing on the p-values below a given threshold, and makes a decision which guarantees that the number of false positives is smaller than the number of false negatives with large probability 1-alpha, where alpha is fixed in advance (de Uña-Álvarez, 2012). Since no bound is imposed on the FWER or the FDR, the SGoF criterion results in a powerful statistical procedure. Like many other multiple testing procedures, the SGoF method assumes that the p-values are uniformly distributed when all the null hypotheses are true. However, this assumption fails in the case of discrete distributions (Gilbert, 2005), leading to a remarkable loss of power. An adaptation of the SGoF method to the discrete case was proposed in Castro-Conde et al. (2015). One goal will be to adapt the SGoF method to the case of dependent tests. Another objective in this line of research is to introduce adjusted p-values for the several existing versions of the SGoF method. Finally, we include under the umbrella of multiple comparison procedures a goal which has to do with the comparison of a large number of densities. Zhan and Hart (2012) introduced a test statistic for this low-sample, large-dimension problem, in the independent case. However, in practice dependence in the samples is expected, and therefore a new analysis of such a test is needed.
This is what is pursued in this objective. The statistic of Zhan and Hart (2012) is a U-statistic, and results on the asymptotic normality of U-statistics under mixing conditions exist in the literature (see Dehling and Wendler, 2010, and references therein). To obtain a correct estimate of the variance of the U-statistic of Zhan and Hart (2012) in practice, we consider different methods adapted to dependence, for example the dependent multiplier bootstrap (Bücher and Kojadinovic, 2016b).

References:
- Bücher, A., Kojadinovic, I. (2016b). Dependent multiplier bootstrap for non-degenerated U-statistics under mixing conditions with applications. Journal of Statistical Planning and Inference, 170, 83-105.
- Carvajal-Rodríguez, A., de Uña-Álvarez, J., Rolan-Álvarez, E. (2009). A new multitest correction (SGoF) that increases its statistical power when increasing the number of tests. BMC Bioinformatics, 10, 209.
- Castro-Conde, I., Doehler, S., de Uña-Álvarez, J. (2015). An extended SGoF multiple testing method for discrete data. Statistical Methods in Medical Research, in press. DOI: 10.1177/0962280215597580.
- Dehling, H., Wendler, M. (2010). Central limit theorem and the bootstrap for U-statistics of strongly mixing data. Journal of Multivariate Analysis, 101, 126-137.
- Dudoit, S., van der Laan, M. (2008). Multiple testing procedures with applications to Genomics. Springer.
- Gilbert, P.G. (2005). A modified false discovery rate multiple-comparisons procedure for discrete data, applied to human immunodeficiency virus genetics. Journal of the Royal Statistical Society - Series C, 54, 143-158.
- Zhan, D., Hart, J. (2012). Testing equality of a large number of densities. Biometrika, 99, 1-17.

Ministerio de Economía y Competitividad de España | Ref. BES-2015-074958; European Social Fund; Ministerio de Economía y Competitividad de España | Ref. MTM2014-55966-P; Ministerio de Economía, Industria y Competitividad de España | Ref. MTM2017-89422-P; Xunta de Galicia. Consellería de Cultura, Educación e Ordenación Universitaria | Ref. ED431C 2016/040; Universidade de Vigo
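The SGoF idea described in this abstract (look at the p-values below a threshold and ask whether there are more of them than chance would explain) can be sketched as follows. The sketch assumes the binomial formulation in which the count of p-values below the threshold `gamma` is compared with a Binomial(n, gamma) distribution, and the excess over the critical value is reported as the number of effects. The function names and default values are hypothetical, and the exact decision rule in the cited papers may differ in detail.

```python
from math import comb


def binomial_upper_tail(n, prob, b):
    """P(Bin(n, prob) >= b), computed by direct summation."""
    return sum(
        comb(n, k) * prob ** k * (1 - prob) ** (n - k)
        for k in range(b, n + 1)
    )


def sgof_rejections(pvalues, gamma=0.05, alpha=0.05):
    """Number of smallest p-values declared significant by a
    binomial SGoF-type rule (a sketch, see the assumptions above)."""
    n = len(pvalues)
    # Observed number of p-values at or below the threshold gamma.
    observed = sum(p <= gamma for p in pvalues)
    # Smallest count b whose upper-tail probability under the complete
    # null is at most alpha; b = n + 1 always qualifies (empty tail).
    critical = next(
        b for b in range(n + 2)
        if binomial_upper_tail(n, gamma, b) <= alpha
    )
    # Only the excess over what chance explains is reported.
    return max(observed - critical + 1, 0)
```

Because the rule reports an excess count rather than bounding the FWER or the FDR, the number of rejections can grow with the number of tests, which matches the behaviour the abstract describes. The binomial reference distribution is exactly the place where the uniformity and independence assumptions on the p-values enter.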

Investigo

A novel statistical approach to deal with spatial bias in maturity ogive estimation

Author: Silva Cristina
Cerviño Santiago
Mendes Hugo
Silva Andreia V.
Pennino Maria Grazia
Cousido-Rocha Marta
Izquierdo Francisco
Martínez-Minaya Joaquín
Sainza-Sousa María del Carmen
Publication date: 25/03/2024

https://cdnsciencepub.com/about/policies/publishing-policy
The proportion of mature fish at length is one of the most important population attributes when evaluating reproductive potential for fish stock assessment purposes. Bias in maturity ogive parameters can lead to fishery management decisions based on misspecified biological reference points. These parameters can vary spatially and temporally, and this variability should be understood and included in the assessment models. However, integrating this variability becomes challenging when specific spatial-dependent ogives cannot be used in the stock assessment model. Hence, this study proposes a novel use of a multivariate response Bayesian regression model, employing an integrated nested Laplace approximation to estimate a single global maturity ogive using data from various spatial areas. This model explicitly accounts for differences in the sampling process and combines information from different areas to estimate shared maturity ogive parameters using joint-likelihood procedures. The model is applied to the European hake stock in ICES (International Council for the Exploration of the Sea) Divisions 27.8.c and 27.9.a, serving as a practical guide. In this model, we have considered different predictors to handle the relationship between the probability of being mature and the length and year covariates. Our results suggest that the logistic formulation correctly captures the relationship between the probability of being mature and length. For year variability, including a year factor covariate or a year random effect in the predictor model produces similar values of goodness-of-fit measures.
Copyright © 2024 Canadian Science Publishing
This research was supported by the European Regional Development Fund (ERDF) and Ministerio de Ciencia, Innovación y Universidades - Agencia Estatal de Investigación (grant No. RTI2018-099868-B-I00, IMPRESS), European Union-Next Generation EU. Componente 3. Inversión 7. CONVENIO ENTRE EL MINISTERIO DE AGRICULTURA, PESCA, Y ALIMENTACIÓN Y LA AGENCIA ESTATAL CONSEJO SUPERIOR DE INVESTIGACIONES CIENTÍFICAS M.P. -A TRAVÉS DEL INSTITUTO ESPAÑOL DE OCEANOGRAFÍA- PARA IMPULSAR LA INVESTIGACIÓN PESQUERA COMO BASE PARA LA GESTIÓN PESQUERA SOSTENIBLE. Eje4, FishClim: Conocimiento científico para la adaptación al cambio climático del sector pesquero español (grant No. MAP2021-04) and Eje6, Math4Fish: Nuevas herramientas para el modelado matemático en el asesoramiento científico de pesquerías españolas (grant No. MAP2021-06), GAIN [Agencia Gallega de Innovación] - Xunta de Galicia (grant No. IN607A 2022/04, GRC-MERVEX). Finally, MGP thanks the Generalitat Valenciana (grant No. CIAICO/2022/165).
Peer reviewed
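For readers unfamiliar with the term, a logistic maturity ogive of the kind this abstract refers to can be written as a two-parameter curve in length. The sketch below shows only that curve and the length at which half of the fish are mature. It does not reproduce the Bayesian multivariate-response model or the INLA fitting used in the study, and the function and parameter names are hypothetical.

```python
import math


def maturity_probability(length, intercept, slope):
    """Logistic ogive: probability of being mature at a given length."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * length)))


def length_at_50_maturity(intercept, slope):
    """Length at which the logistic ogive equals 0.5, i.e. where the
    linear predictor intercept + slope * length is zero."""
    return -intercept / slope
```

A year factor or a year random effect, as mentioned in the abstract, would enter as an extra additive term in the linear predictor inside the exponential.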

Digital.CSIC

Author Instructions

Author: Instructions Author
Publication date: 04/11/2013

Crossref

Cartographic Perspectives (E-Journal - North American Cartographic Information Society, NACIS)

Going Beyond Counting First Authors in Author Co-citation Analysis

Author: Zhao Dangzhi
Publication date: 01/01/2005

The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed
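The two counting rules compared in this abstract can be illustrated with a small sketch. It assumes that each citing paper is given as a list of cited works, each cited work as its author list in byline order, and that two authors are co-cited once per citing paper in which both appear under the chosen rule. The function name and data layout are hypothetical, and treating co-authors of the same cited work as co-cited is a simplifying assumption of the sketch, not a claim about the study's procedure.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists, max_authors=1):
    """Count author co-citations.

    reference_lists: one entry per citing paper; each entry is a list
    of cited works, and each cited work is a list of author names in
    byline order.
    max_authors: 1 gives first-author counting; 5 gives the rule that
    takes the first five authors of each cited work into account.
    """
    counts = Counter()
    for references in reference_lists:
        cited_authors = set()
        for authors in references:
            cited_authors.update(authors[:max_authors])
        # Each unordered pair is counted once per citing paper.
        for pair in combinations(sorted(cited_authors), 2):
            counts[pair] += 1
    return counts
```

Calling the function twice on the same data, once with `max_authors=1` and once with `max_authors=5`, yields the two co-citation matrices on which the subsequent analyses and mappings would be based.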

E-LIS

Variations on the Author

Author: Sayad Cecilia
Publication date: 01/01/2016

“Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourses; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.

Crossref

Kent Academic Repository

Appropriate Similarity Measures for Author Cocitation Analysis

Author: Waltman L.R.
Eck N.J.P. van

We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors' cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.
Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
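The two familiar measures named in this abstract can be computed on cocitation profiles as in the sketch below. The profiles are invented for illustration, the function names are hypothetical, and the property shown in the usage note is one well-known difference between the measures, not necessarily the argument made in the paper. The two other measures the abstract alludes to are not named there and are not attempted.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two cocitation profiles."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson(u, v):
    """Pearson correlation, i.e. the cosine of the mean-centred profiles."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return cosine([a - mean_u for a in u], [b - mean_v for b in v])


# Hypothetical profiles: cocitation counts of two authors with four others.
profile_a = [3, 1, 0, 2]
profile_b = [1, 3, 2, 0]

# The same profiles after adding four authors cocited with neither of them.
padded_a = profile_a + [0, 0, 0, 0]
padded_b = profile_b + [0, 0, 0, 0]
```

Comparing `cosine(profile_a, profile_b)` with `cosine(padded_a, padded_b)` gives the same value, whereas `pearson` changes when the zero entries are appended, because the added zeros shift the means used for centring.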

Research Papers in Economics

Dispelling the Myths Behind First-author Citation Counts

Author: Zhao Dangzhi
Publication date: 01/01/2006

We conducted a full-scale evaluative citation analysis study of scholars in the XML research field to explore just how different from each other author rankings resulting from different citation counting methods actually are, and to demonstrate the capability of emerging data and tools on the Web in supporting more realistic citation counting methods. Our results contest some common arguments for the continued use of first-author citation counts in the evaluation of scholars, such as high correlations between author rankings by first-author citation counts and other citation counting methods, and high costs of using more realistic citation counting methods that are not well-supported by the ISI databases. It is argued that increasingly available digital full text research papers make it possible for citation analysis studies to go beyond what the ISI databases have directly supported and to employ more sophisticated methods

E-LIS

Author Index

Author: Author Index

Not provided

koamabayili/VECTRON-author-checklist: VECTRON author checklist

Author: koamabayili
Publication date: 19/04/2022

We have done our best to complete the author checklist relating to the use of animals in the hut study. Note that the objective for the hut study was to evaluate the IRS treatment applications for residual efficacy against Anopheles mosquitoes, including the local An. coluzzii mosquito population. Cows were only used to attract mosquitoes into the huts and no tests were carried out directly on the cows. The author checklist is intended for use with studies where experiments are carried out on animals, which is why we have had such difficulty in completing this for the hut study, as many of the questions do not relate to how the cows were used

ZENODO