
    Meta-analysis of diagnostic cell-free circulating microRNAs for breast cancer detection

    BACKGROUND: Breast cancer (BC) is the most frequently diagnosed cancer among women. Numerous studies have explored cell-free circulating microRNAs as diagnostic biomarkers of BC. As the microRNA panels reported thus far are inconsistent and rarely overlap, we aim to evaluate the overall diagnostic performance as well as the sources of heterogeneity between studies. METHODS: Based on a search of three online search engines performed up to March 21st, 2022, 56 eligible publications that investigated diagnostic circulating microRNAs using Real-Time Quantitative Reverse Transcription PCR (qRT-PCR) were obtained. The risk of bias of the primary studies was evaluated with the revised tool for the quality assessment of diagnostic accuracy studies (QUADAS-2). A bivariate generalized linear mixed-effects model was applied to obtain pooled sensitivity and specificity. A novel methodology was used in which sample and study-model characteristics were analysed to determine whether studies tend to prefer sensitivity or specificity. RESULTS: Pooled sensitivity and specificity of 0.85 [0.81–0.88] and 0.83 [0.79–0.87] were obtained, respectively. Subgroup analysis showed significantly better performance of multiple-microRNA panels (sensitivity: 0.90 [0.86–0.93]; specificity: 0.86 [0.80–0.90]) than of single-microRNA panels (sensitivity: 0.82 [0.77–0.86]; specificity: 0.83 [0.78–0.87]), and comparable pooled diagnostic performance between studies using serum (sensitivity: 0.87 [0.81–0.91]; specificity: 0.83 [0.78–0.87]) and plasma (sensitivity: 0.83 [0.77–0.87]; specificity: 0.85 [0.78–0.91]) as specimen type. In addition, bivariate and univariate analyses indicated that miRNA(s) normalized to endogenous controls tend to show higher diagnostic performance than those normalized to exogenous controls. Moreover, a slight tendency of studies to prefer specificity over sensitivity was observed.
CONCLUSIONS: This study reaffirmed the ability of circulating microRNAs to diagnose BC. Nonetheless, some subgroup analyses showed between-study heterogeneity. Finally, the lack of standardization and of reproducible results remains the biggest obstacle to the diagnostic application of circulating cell-free microRNAs. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12885-022-09698-8
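The pooled estimates above come from a bivariate generalized linear mixed-effects model, which models sensitivity and specificity jointly. As a much simpler illustration of random-effects pooling, the sketch below pools one proportion at a time (e.g., per-study sensitivity TP/(TP+FN)) on the logit scale with the DerSimonian-Laird estimator. This is a swapped-in univariate technique, not the study's bivariate model; the 0.5 continuity correction, the 1.96 normal quantile and the example counts are assumptions of this sketch.

```python
import math
from typing import List, Tuple


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def pool_proportion_dl(events: List[int], totals: List[int]) -> Tuple[float, float, float]:
    """Univariate DerSimonian-Laird random-effects pooling on the logit scale.

    Returns (pooled proportion, lower 95% limit, upper 95% limit).
    Assumption: 0.5 is added to event and non-event counts in every study.
    """
    ys, vs = [], []
    for e, n in zip(events, totals):
        a, b = e + 0.5, (n - e) + 0.5        # corrected events / non-events
        ys.append(math.log(a / b))           # logit of the corrected proportion
        vs.append(1.0 / a + 1.0 / b)         # approximate within-study variance
    k = len(ys)
    w = [1.0 / v for v in vs]                # fixed-effect (inverse-variance) weights
    sw = sum(w)
    y_fe = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    q = sum(wi * (yi - y_fe) ** 2 for wi, yi in zip(w, ys))  # Cochran's Q
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 else 0.0     # between-study variance
    w_re = [1.0 / (v + tau2) for v in vs]    # random-effects weights
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return inv_logit(y_re), inv_logit(y_re - 1.96 * se), inv_logit(y_re + 1.96 * se)


# Hypothetical per-study counts: true positives and totals of diseased subjects.
tp = [45, 80, 30, 62]
diseased = [50, 100, 40, 70]
print(pool_proportion_dl(tp, diseased))
```

Pooling specificity works the same way with true negatives and totals of non-diseased subjects; a bivariate model additionally captures the correlation between the two, which this sketch ignores.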

    Inference for multivariate and high-dimensional data in heterogeneous designs

    In this cumulative thesis, we develop statistical tests for various hypotheses about multivariate and high-dimensional data. Quadratic forms are a suitable way to obtain scalar test statistics for multivariate problems. The most common are Wald-type (WTS) and ANOVA-type (ATS) statistics, as well as centered and standardized versions of them. [Pauly et al., 2015] and [Chen and Qin, 2010] also used such quadratic forms to analyze hypotheses about the expectation vector of high-dimensional observations. They made different assumptions, but allowed only one and two groups, respectively. We extend the approach of [Pauly et al., 2015] to multiple groups, which leads to a multitude of possible asymptotic frameworks, even allowing the number of groups to grow. In the considered split-plot design with normally distributed data, we investigate the asymptotic distribution of the standardized centered quadratic form under different conditions. In most cases, we show that each limit distribution is obtained only under its specific conditions. For the frequently assumed case of equal covariance matrices, we also widen the considered asymptotic frameworks, since the sample sizes of the individual groups need not grow. Moreover, we add further cases in which the limit distribution can be calculated; these hold under homoscedastic covariance matrices but also in the general case. This expansion of the asymptotic frameworks is one example of how the assumption of homoscedastic covariance matrices allows broader conclusions. Assuming equal covariance matrices also simplifies calculations and gives access to a larger statistical toolbox. For the more general problem of testing hypotheses about covariance matrices, existing procedures have strict assumptions (e.g. in [Muirhead, 1982], [Anderson, 1984] and [Gupta and Xu, 2006]), test only special hypotheses (e.g. in [Box, 1953]), or are known to have low power (e.g. in [Zhang and Boos, 1993]). We introduce an intuitive approach with fewer restrictions, a multitude of possible null hypotheses, and a convincing small-sample approximation. Nearly every quadratic form known from the mean-based analysis can be used, and two bootstrap approaches are applied to improve their performance. Furthermore, the approach can be extended to many other situations, such as testing hypotheses about correlation matrices or checking whether the covariance matrix has a particular structure. In extensive simulation studies, we investigated the type-I error rate of all developed tests and their power to detect deviations from the null hypothesis, for sample sizes ranging from small to large.
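The Wald-type and ANOVA-type statistics mentioned above are quadratic forms in the estimated expectation vector. A minimal one-sample sketch with NumPy, assuming the hypothesis H0: Tμ = 0 with a symmetric projection matrix T (here the centering matrix, which tests equality of all d component means) and using the Moore-Penrose inverse in the WTS. It omits the standardized and centered versions, the multi-group split-plot setting, and the bootstrap refinements developed in the thesis; the function names are hypothetical.

```python
from typing import Tuple

import numpy as np


def centering_matrix(d: int) -> np.ndarray:
    """P_d = I_d - J_d / d; H0: P_d mu = 0 means all d component means are equal."""
    return np.eye(d) - np.ones((d, d)) / d


def wts_ats(x: np.ndarray, t: np.ndarray) -> Tuple[float, float]:
    """Wald-type and ANOVA-type statistics for H0: T mu = 0 (one sample).

    x: (n, d) data matrix; t: (d, d) symmetric projection hypothesis matrix.
    """
    n = x.shape[0]
    m = x.mean(axis=0)                   # estimated expectation vector
    s = np.cov(x, rowvar=False)          # unbiased sample covariance matrix
    tm = t @ m
    # WTS: quadratic form with the Moore-Penrose inverse of T S T
    wts = float(n * tm @ np.linalg.pinv(t @ s @ t) @ tm)
    # ATS: uses a trace scaling instead, so T S T never has to be inverted
    # (it is singular whenever d exceeds n)
    ats = float(n * m @ t @ m / np.trace(t @ s))
    return wts, ats


rng = np.random.default_rng(0)
x = rng.standard_normal((50, 3)) + np.array([0.0, 0.0, 3.0])  # third mean shifted
print(wts_ats(x, centering_matrix(3)))
```

Avoiding the inverse is one reason ATS-type forms are attractive in high-dimensional settings, where the WTS becomes unreliable.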

    Spatial and spatio-temporal regression modelling with conditional autoregressive random effects for epidemiological and spatially referenced data

    Regression models are suitable for analysing the association between health outcomes and environmental exposures. However, in urban health studies, where spatial and temporal changes are important, spatial and spatio-temporal variations are usually neglected. This thesis develops and applies regression methods that incorporate latent random-effects terms with Conditional Autoregressive (CAR) structures into classical regression models, to account for spatial effects in cross-sectional analyses and spatio-temporal effects in longitudinal analyses. The thesis is divided into two main parts. First, methods are considered for data in which all variables are given at the areal level. The longitudinal Heinz Nixdorf Recall Study is used throughout the thesis for application, and the association between the risk of depression and greenness at the district level is analysed. A spatial Poisson model with a latent CAR-structured random effect is applied at selected time points. A more sophisticated spatio-temporal extension of the Poisson model then reveals a negative association between greenness and depression. The findings also suggest strong temporal autocorrelation and weak spatial effects. Even when spatial effects are weak enough to suggest neglecting them, as in this thesis, spatial and spatio-temporal random effects should be taken into account to provide reliable inference in urban health studies. Second, to avoid ecological and atomistic fallacies caused by data aggregation and disaggregation, all data should be used at the finest spatial level at which they are given. Multilevel Conditional Autoregressive (CAR) models make it possible to use all variables simultaneously at their original spatial resolution and to explain the spatial effect in epidemiological studies. This is especially important where subjects are nested within geographical units. This second part of the thesis has two goals. Essentially, it further develops multilevel models for longitudinal data by adding existing CAR-structured random effects that change over time. These new models are named MLM tCARs. Simulation studies comparing the MLM tCARs with the classical multilevel growth model show that the MLM tCARs perform better at retrieving the true regression coefficients and achieve better fits. The models are then applied comparatively to analyse the association between greenness and depressive symptoms at the individual level in the longitudinal Heinz Nixdorf Recall Study. The results again show a negative association between greenness and depression and a decreasing linear individual time trend for all models. Once more, we observe very weak spatial variation and moderate temporal autocorrelation. In addition, the thesis provides comprehensive decision trees for analysing epidemiological data in which variables have a spatial background.
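CAR random effects of the kind described above are usually specified through a sparse precision matrix built from a neighbourhood (adjacency) matrix W. A minimal sketch of one common variant, the proper CAR prior with precision Q = τ(D - αW), where D holds the neighbour counts; the names tau and alpha and the toy adjacency are assumptions, and the thesis's time-varying MLM tCAR terms are not reproduced here. The helper checks the defining property that each area's conditional mean is α times the average of its neighbours' values.

```python
import numpy as np


def proper_car_precision(w: np.ndarray, tau: float, alpha: float) -> np.ndarray:
    """Precision matrix Q = tau * (D - alpha * W) of a proper CAR model.

    w: symmetric 0/1 adjacency matrix of the areal units (no isolated units).
    For tau > 0 and |alpha| < 1, D - alpha * W is strictly diagonally dominant,
    so Q is positive definite.
    """
    d = np.diag(w.sum(axis=1))           # number of neighbours of each unit
    return tau * (d - alpha * w)


def conditional_mean(q: np.ndarray, phi: np.ndarray, i: int) -> float:
    """E[phi_i | phi_-i] = -(1 / Q_ii) * sum_{j != i} Q_ij phi_j for a zero-mean GMRF."""
    off_diag = q[i].copy()
    off_diag[i] = 0.0
    return float(-(off_diag @ phi) / q[i, i])


# Toy example: four districts along a line (0-1-2-3).
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
q = proper_car_precision(w, tau=2.0, alpha=0.9)
phi = np.array([1.0, 2.0, 3.0, 4.0])
# District 1 has neighbours 0 and 2, so this equals 0.9 * (1 + 3) / 2 up to rounding.
print(conditional_mean(q, phi, 1))
```

In a regression model, such a vector φ enters the linear predictor as a latent random effect, so areas borrow strength from their neighbours.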

    Item response models for count data

    Item response theory (IRT) is a statistical framework within which responses to psychological tests can be modelled. A psychological test consists of a set of items (e.g., tasks to solve or statements to rate) to which a person taking the test responds. IRT assumes that responses are influenced by respondents' latent traits (e.g., personality traits or cognitive abilities) as well as by items' characteristics (e.g., difficulty). IRT models exist for a variety of response types; the focus of this thesis lies on count responses. These can, for example, be generated by cognitive tests measuring idea fluency (counts: number of ideas), as process data during test taking (counts: number of clicks), or by reading proficiency assessments (counts: number of errors). Long comparatively understudied, the field of count item response theory (CIRT) has seen a steady increase in interest in recent years. As a result, a number of new CIRT models have been proposed that address limitations of previous CIRT models, broadening the empirical applicability of CIRT. An important concern in modelling counts is their dispersion: the most common distribution for counts, the Poisson distribution, has a mean equal to its variance (so-called equidispersion). By relying on the Poisson distribution, prominent CIRT models assume such equidispersion for responses (conditional on the latent trait(s)). Research has found this assumption to be empirically violated for some tests. A recently introduced unidimensional CIRT model that uses the Conway-Maxwell-Poisson (CMP) distribution instead also accommodates over- and underdispersed conditional responses. Nonetheless, the model retains some of the restrictive assumptions of previous models. Thus, even with new model proposals, CIRT still offers less modelling flexibility than IRT for other response types (such as binary responses). The present cumulative thesis aims to address three such gaps in the CIRT landscape. In the first article, I propose a unidimensional CIRT model with a conditional CMP response distribution, which extends a previously proposed model by including an additional item parameter (i.e., a discrimination parameter). As such a model could not previously be estimated with existing methods, I derive a maximum likelihood estimation procedure for it using the Expectation-Maximization (EM) algorithm. In the second article, we propose two extensions of this model that allow the inclusion of item-specific and person-specific covariates, respectively. These make it possible to investigate explanations for differences between items and between participants. Again, we provide corresponding estimation methods. In the third article, we generalize the unidimensional CIRT model proposed in the first article to a multidimensional count item response model framework, with a focus on exploratory models. We provide a corresponding estimation procedure, for which we additionally develop a lasso-penalized variant. The articles in this thesis are accompanied by an R package that implements the proposed models and estimation methods.
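The dispersion argument above can be checked numerically. The CMP distribution has probabilities proportional to λ^y / (y!)^ν, so ν = 1 recovers the Poisson distribution, ν > 1 gives underdispersion and ν < 1 overdispersion. A minimal sketch that computes truncated CMP probabilities in log space; the truncation point max_y is an assumption, and the sketch covers only the response distribution, not the item response models or the EM estimation developed in the thesis.

```python
import math
from typing import List, Sequence, Tuple


def cmp_pmf(lam: float, nu: float, max_y: int = 200) -> List[float]:
    """Truncated Conway-Maxwell-Poisson probabilities P(Y = y), y = 0..max_y.

    P(Y = y) is proportional to lam**y / (y!)**nu; nu = 1 gives the Poisson case.
    """
    log_terms = [y * math.log(lam) - nu * math.lgamma(y + 1) for y in range(max_y + 1)]
    top = max(log_terms)                          # subtract the max for numerical stability
    weights = [math.exp(t - top) for t in log_terms]
    z = sum(weights)                              # truncated normalizing constant
    return [w / z for w in weights]


def mean_var(p: Sequence[float]) -> Tuple[float, float]:
    """Mean and variance of a distribution on 0, 1, 2, ... given as a probability list."""
    mean = sum(y * py for y, py in enumerate(p))
    var = sum((y - mean) ** 2 * py for y, py in enumerate(p))
    return mean, var


for nu in (0.5, 1.0, 2.0):  # over-, equi- and underdispersion
    m, v = mean_var(cmp_pmf(3.0, nu))
    print(f"nu={nu}: mean={m:.3f}, variance={v:.3f}")
```

Unlike the Poisson model, the extra parameter ν lets the conditional variance differ from the conditional mean.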

    Tree ensemble methods for ordinal prediction

    Research questions and applications in the social and life sciences often involve ordinal response data. Student performance is assessed through ordinal grades, patients may express the perceived severity of their symptoms in ordinal levels, and questionnaire respondents may voice their political views by rating given statements. The prediction of ordinal responses is therefore relevant to many fields and can help, e.g., to identify which students may benefit from educational support systems. Traditionally, ordinal responses have been modeled with parametric models such as the proportional odds model. Given the increasing quantities of data in these fields and the continued proliferation of machine learning (ML) methods, recent years have seen the establishment of a new methodological stream of ML-based ordinal prediction methods. These methods promise high predictive performance in settings where traditional parametric models may struggle (e.g., highly non-linear effects, high-dimensional data). However, many of these ML methods were not originally tailored to ordinal responses. Therefore, several extensions and adaptations of ML methods (particularly tree-based methods) have been proposed to take ordinality into account. A particularly promising approach based on Random Forest (RF) is Ordinal Forest (OF; Hornung, 2019), which assigns numeric scores to the ordinal response categories and uses these scores to train a regression RF. To determine suitable scores, OF performs a prior optimization step in which the scores are optimized with respect to their predictive performance.
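The score idea behind Ordinal Forest can be sketched independently of the forest itself: ordinal categories are mapped to numeric scores, a regression model is trained on those scores, and numeric predictions are mapped back to categories. A minimal sketch, assuming that predictions are assigned to the category whose interval (bounded by midpoints between consecutive scores) contains them; the construction of scores and category borders in the original OF, including its score optimization, may differ, and the regression forest is not reproduced here. The function names and example scores are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence


def encode(categories: Sequence[int], scores: Sequence[float]) -> List[float]:
    """Replace each ordinal category index 0..J-1 by its numeric score."""
    return [scores[c] for c in categories]


def decode(predictions: Sequence[float], scores: Sequence[float]) -> List[int]:
    """Map numeric predictions back to category indices.

    Assumption: borders are the midpoints between consecutive (increasing) scores.
    """
    borders = [(lo + hi) / 2.0 for lo, hi in zip(scores[:-1], scores[1:])]
    return [bisect_right(borders, p) for p in predictions]


# Hypothetical scores for a 4-level grade (0 = lowest, 3 = highest);
# scores need not be equally spaced.
scores = [1.0, 2.0, 4.0, 5.0]
y = [0, 3, 2, 1]
print(decode(encode(y, scores), scores))  # [0, 3, 2, 1]
```

In a full pipeline, `encode(y, scores)` would be the target of a regression RF, and its numeric predictions would be passed to `decode`.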

    Going Beyond Counting First Authors in Author Co-citation Analysis

    The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based. We compare ACA results based on two types of co-citation counting: the traditional type, which counts only the first author of a cited work, and a non-traditional type, which takes into account the first five authors of a cited work. Results indicate that the picture produced by this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field studied than the one produced by traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
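The two counting schemes compared above can be made concrete in a few lines. A minimal sketch, assuming that each citing paper contributes at most one co-citation per author pair, that authors of the same cited work are also counted as co-cited once more than one author is included, and that author names are already disambiguated; the study's exact counting rules may differ in these details. Here `max_authors=1` corresponds to first-author counting and `max_authors=5` to counting the first five authors; the function name and data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import List, Sequence

# One cited work = its ordered author list; one citing paper = its list of cited works.
CitedWork = Sequence[str]


def cocitation_counts(citing_papers: List[List[CitedWork]], max_authors: int = 1) -> Counter:
    """Count author co-citations using the first `max_authors` authors of each cited work.

    Assumptions: every citing paper adds at most 1 to each author pair, and authors
    of the same cited work count as co-cited once more than one author is included.
    """
    counts: Counter = Counter()
    for refs in citing_papers:
        cited_authors = set()
        for authors in refs:
            cited_authors.update(authors[:max_authors])
        for pair in combinations(sorted(cited_authors), 2):  # unordered pairs, sorted keys
            counts[pair] += 1
    return counts


papers = [
    [["A", "B"], ["C"]],        # citing paper 1 cites a work by A and B and one by C
    [["A"], ["C", "D"]],        # citing paper 2 cites a work by A and one by C and D
]
print(cocitation_counts(papers, max_authors=1))  # first-author counting
print(cocitation_counts(papers, max_authors=5))  # first-five-authors counting
```

Even in this toy case, first-author counting sees only the pair (A, C), while five-author counting also links B and D, which illustrates why the two schemes can produce different maps.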

    Sequence data mining in cognitive science

    This thesis summarizes my research work from February 2020 to August 2024, including all of the papers I published during that time. As a cumulative thesis, it provides a concise overview of the contributed articles, omitting exhaustive results and referring instead to the original publications for full details. The main text integrates these publications into a coherent narrative, starting with basic concepts and providing background on the respective research areas. For an in-depth discussion of specific findings, readers should consult the relevant articles directly. This thesis covers the field of sequence data mining (SDM) in cognitive science. Cognitive science increasingly examines sequence data to understand cognitive tasks involving ordered steps or elements, such as language processing, decision-making, and memory formation. SDM techniques are used to uncover patterns and build models from sequential data. However, modern data mining techniques such as deep learning, which have been broadly applied in other domains, have not been fully integrated into traditional cognitive science tasks. Moreover, cognitive science deals with complex sequence data, such as scanpaths and trajectories, which pose challenges that neither traditional pattern discovery methods nor modern techniques have successfully overcome. This thesis aims to extend SDM methods in cognitive science by applying advanced techniques and creating new methods specifically tailored to these complex, domain-specific sequences. For instance, one of my published papers proposes a machine learning-based pipeline for automatic scoring in divergent thinking tasks, using algorithms such as Random Forest, XGBoost, and Support Vector Regression. Two further papers introduce novel approaches to analyzing scanpaths and handwriting trajectories. Experimental validation in each paper shows that the newly developed methods outperform existing approaches. Overall, my research advances SDM by integrating modern data mining techniques to address the challenges posed by complex sequential data in cognitive science.
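As a point of reference for the scanpath work mentioned above, a traditional sequence-comparison baseline encodes each scanpath as a string of area-of-interest (AOI) labels and compares scanpaths by edit distance. A minimal sketch of that classic baseline, not one of the thesis's new methods; the one-character-per-fixation encoding and the AOI labels are assumptions.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))            # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                            # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute (or match)
        prev = curr
    return prev[-1]


# Two hypothetical scanpaths over AOIs A-D (one fixation per character).
print(edit_distance("ABCAD", "ABDAD"))  # 1
```

Such string-based comparisons ignore fixation durations and spatial distances between AOIs, which is one reason more specialized methods for complex scanpath data are of interest.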

    Variations on the Author

    “Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, released posthumously in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often appears as an interviewer who, rather than expressing opinions, propels others’ discourses; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but is nonetheless markedly present on screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, by contrast, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.