    Torus Principal Component Analysis with an Application to RNA Structures

    There are several cutting-edge applications needing PCA methods for data on tori, and we propose a novel torus-PCA method with important properties that can be generally applied. There are two existing general methods: tangent space PCA and geodesic PCA. However, unlike tangent space PCA, our torus-PCA honors the cyclic topology of the data space whereas, unlike geodesic PCA, our torus-PCA produces a variety of non-winding, non-dense descriptors. This is achieved by deforming tori into spheres and then using a variant of the recently developed principal nested spheres analysis. This PCA analysis involves a step of small sphere fitting, and we provide an improved test to avoid overfitting. However, deforming tori into spheres creates singularities. We introduce a data-adaptive pre-clustering technique to keep the singularities away from the data. For the frequently encountered case that the residual variance around the PCA main component is small, we use a post-mode hunting technique for more fine-grained clustering. Thus, in general, there are three successive interrelated key steps of torus-PCA in practice: pre-clustering, deformation, and post-mode hunting. We illustrate our method with two recently studied RNA structure (tori) data sets: one is a small RNA data set which is established as the benchmark for PCA, and we validate our method on these data. The other is a large RNA data set (containing the small RNA data set), for which we show that our method provides interpretable principal components as well as further insight into its structure.
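
To make the phrase "honors the cyclic topology" concrete, the following minimal sketch compares a naive Euclidean distance between angle vectors with the geodesic distance on a flat torus. It is not the torus-PCA method from the abstract; the function names `wrap_angle` and `torus_distance`, the flat product metric, and the example angles (in radians) are all assumptions made for illustration.

```python
import numpy as np

def wrap_angle(theta):
    """Map angles in radians to the interval [-pi, pi)."""
    return (np.asarray(theta, dtype=float) + np.pi) % (2 * np.pi) - np.pi

def torus_distance(a, b):
    """Geodesic distance on a flat torus (assumed product metric).

    Each coordinate difference is wrapped first, so two angles that sit
    on opposite sides of the 0 / 2*pi seam are treated as close.
    """
    diff = wrap_angle(np.asarray(a, dtype=float) - np.asarray(b, dtype=float))
    return np.sqrt(np.sum(diff ** 2, axis=-1))

# Hypothetical pair of points that are neighbors across the seam in both angles.
p = np.array([0.1, 6.2])
q = np.array([6.2, 0.1])

naive = np.linalg.norm(p - q)   # ignores periodicity
cyclic = torus_distance(p, q)   # respects periodicity

print(f"naive Euclidean distance: {naive:.3f}")
print(f"torus geodesic distance:  {cyclic:.3f}")
```

A method that treats angles as plain real numbers sees these two points as far apart, while on the torus they are close. That gap is the kind of distortion a topology-aware PCA is meant to avoid.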

    spaces

    The task to write on data analysis on nonstandard spaces is quite substantial, with a huge body of literature to cover, from parametrics to nonparametrics, from shape spaces to Wasserstein spaces. In this survey we convey simple (e.g., Fréchet means) and more complicated ideas (e.g., empirical process theory), common to many approaches, with a focus on their interaction with one another. Indeed, this field is fast growing and it is imperative to develop a mathematical viewpoint, drawing power and diversity from a higher level of abstraction, for example, by introducing generalized Fréchet means. While many problems have found ingenious solutions (e.g., Procrustes analysis for principal component analysis [PCA] extensions on shape spaces and diffusion on the frame bundle to mimic anisotropic Gaussians), more problems emerge, often more difficult (e.g., topology and geometry influencing limiting rates and defining generic intrinsic PCA extensions). Throughout this survey, we point out some open problems that will, as it seems, keep mathematicians, statisticians, computer and data scientists busy for a while.
    This article is categorized under: Statistical and Graphical Methods of Data Analysis > Analysis of High Dimensional Data
    Deutsche Forschungsgemeinschaft (http://dx.doi.org/10.13039/501100001659); Volkswagen Foundation (http://dx.doi.org/10.13039/501100001663); Felix‐Bernstein‐Institute for Mathematical Statistics in the Biosciences at the University of Göttingen

    Stability of the cut locus and a Central Limit Theorem for Fréchet means of Riemannian manifolds

    We obtain a central limit theorem for closed Riemannian manifolds, clarifying along the way the geometric meaning of some of the hypotheses in Bhattacharya and Lin’s Omnibus central limit theorem for Fréchet means. We obtain our CLT assuming a certain stability hypothesis for the cut locus, which always holds when the manifold is compact but may not be satisfied in the non-compact case.

    Diffusion means in geometric spaces

    We introduce a location statistic for distributions on non-linear geometric spaces, the diffusion mean, serving as an extension and an alternative to the Fréchet mean. The diffusion mean arises as the generalization of Gaussian maximum likelihood analysis to non-linear spaces by maximizing the likelihood of a Brownian motion. The diffusion mean depends on a time parameter t, which admits the interpretation of the allowed variance of the diffusion. The diffusion t-mean of a distribution X is the most likely origin of a Brownian motion at time t, given the end-point distribution X. We give a detailed description of the asymptotic behavior of the diffusion estimator and provide sufficient conditions for the diffusion estimator to be strongly consistent. Particularly, we present a smeary central limit theorem for diffusion means and we show that joint estimation of the mean and diffusion variance rules out smeariness in all directions simultaneously in general situations. Furthermore, we investigate properties of the diffusion mean for distributions on the sphere S^m. Experimentally, we consider simulated data and data from magnetic pole reversals, all indicating similar or improved convergence rate compared to the Fréchet mean. Here, we additionally estimate t and consider its effects on smeariness and uniqueness of the diffusion mean for distributions on the sphere.
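
The definition of the diffusion t-mean as the most likely origin of a Brownian motion can be sketched on the simplest closed manifold, the circle. This is an illustration only, not the authors' estimator. It assumes that the Brownian transition density on the circle after time t is a wrapped normal with variance t, truncates the wrapping sum, and maximizes the log-likelihood by grid search. The function name `diffusion_mean_circle`, the parameters `grid_size` and `n_wraps`, and the simulated data are hypothetical.

```python
import numpy as np

def diffusion_mean_circle(samples, t, grid_size=720, n_wraps=5):
    """Grid-search sketch of a diffusion t-mean on the circle.

    Assumption: the heat kernel at time t is a wrapped normal with
    variance t, approximated by 2 * n_wraps + 1 wrapped Gaussian terms.
    Returns the grid angle in [-pi, pi) with the highest log-likelihood.
    """
    samples = np.asarray(samples, dtype=float)
    grid = np.linspace(-np.pi, np.pi, grid_size, endpoint=False)
    shifts = 2 * np.pi * np.arange(-n_wraps, n_wraps + 1)

    # d has shape (n_samples, grid_size, n_terms): sample minus candidate
    # origin, shifted by every multiple of 2*pi kept in the truncation.
    d = samples[:, None, None] - grid[None, :, None] + shifts[None, None, :]
    density = np.exp(-d ** 2 / (2 * t)).sum(axis=-1) / np.sqrt(2 * np.pi * t)

    # Sum log-densities over samples, one total per candidate origin.
    loglik = np.log(density).sum(axis=0)
    return grid[np.argmax(loglik)]

# Simulated angles concentrated near 3.0 rad, close to the +/- pi seam.
rng = np.random.default_rng(0)
samples = (rng.normal(loc=3.0, scale=0.4, size=200) + np.pi) % (2 * np.pi) - np.pi

estimate = diffusion_mean_circle(samples, t=0.5)
print(f"diffusion mean estimate (t = 0.5): {estimate:.3f} rad")
```

Here t is held fixed, whereas the abstract also discusses estimating t jointly with the mean. A very small t can make the truncated density underflow, so this sketch is only suitable for moderate values of t.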