
    Minimax estimation in linear models with unknown finite alphabet design

    We provide minimax theory for joint estimation of F and ω in linear models Y = Fω + Z, where the parameter matrix ω and the design matrix F are unknown but the latter takes values in a known finite set. This allows one to separate F and ω, which is not possible in general. In the noiseless case, i.e., Z = 0, we obtain stable recovery of F and ω from the linear model. Based on this, we show for a Gaussian error matrix Z that the least squares estimator (LSE) attains minimax rates for the prediction error of Fω. Notably, these rates are exponential in the dimension of one component of Y. The finite alphabet also allows estimation of F and ω themselves, and the LSE is shown to achieve the minimax rate. As computing the LSE is not feasible, an efficient algorithm is proposed. Simulations suggest that it approximates the LSE well.
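
    To see why computing the LSE is infeasible, the following sketch evaluates it by brute force on a toy problem: for every candidate F with entries in the alphabet A, the optimal ω is an ordinary least squares fit, and the candidate with the smallest residual sum of squares is kept. Since there are |A|^(n·m) candidates, this is only workable for tiny n and m. The function name `exhaustive_lse`, the binary alphabet, and the toy matrices are illustrative assumptions, not taken from the paper.

```python
import itertools

import numpy as np


def exhaustive_lse(Y, alphabet, m):
    """Brute-force least squares over all F in alphabet^(n x m) (toy sizes only)."""
    n = Y.shape[0]
    best_rss, best_F, best_omega = np.inf, None, None
    # |alphabet|^(n*m) candidate design matrices: exponential in n*m.
    for entries in itertools.product(alphabet, repeat=n * m):
        F = np.array(entries, dtype=float).reshape(n, m)
        # For fixed F, the optimal omega is an ordinary least squares fit.
        omega, *_ = np.linalg.lstsq(F, Y, rcond=None)
        rss = float(np.sum((Y - F @ omega) ** 2))
        if rss < best_rss:
            best_rss, best_F, best_omega = rss, F, omega
    return best_rss, best_F, best_omega


# Hypothetical noiseless toy example (Z = 0), binary alphabet: 2^(4*2) = 256 candidates.
F_true = np.array([[1, 0], [0, 1], [1, 1], [0, 0]], dtype=float)
omega_true = np.array([[1.0, -2.0, 0.5], [0.3, 1.0, 2.0]])
Y = F_true @ omega_true
rss, F_hat, omega_hat = exhaustive_lse(Y, alphabet=(0, 1), m=2)
print(rss, np.allclose(F_hat @ omega_hat, Y))
```

    In this noiseless toy case the returned F may differ from `F_true` by a permutation of its columns, which is why the check compares the fitted values Fω rather than F itself.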

    Multiscale quantile segmentation

    We introduce a new methodology for analyzing serial data by quantile regression, assuming that the underlying quantile function consists of constant segments. The procedure does not rely on any distributional assumption besides serial independence. It is based on a multiscale statistic that allows one to control the (finite-sample) probability of selecting the correct number of segments S at a given error level, which serves as a tuning parameter. For a proper choice of this parameter, the estimated number of segments tends exponentially fast to the true S as the sample size increases. We further show that the locations and sizes of the segments are estimated at the minimax optimal rate (compared to a Gaussian setting) up to a log factor. Thereby, our approach leads to (asymptotically) uniform confidence bands for the entire quantile regression function in a fully nonparametric setup. The procedure is implemented efficiently using dynamic programming techniques with double heap structures, and software is provided. Simulations and data examples from genetic sequencing and ion channel recordings confirm the robustness of the proposed procedure, which at the same time reliably detects changes in quantiles from arbitrary distributions with precise statistical guarantees.
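
    As a simplified illustration of fitting a piecewise constant quantile function by dynamic programming, the sketch below minimizes the quantile (check) loss plus a fixed penalty per segment over all segmentations (optimal partitioning). This is a swapped-in penalized criterion, not the multiscale statistic or the double-heap implementation described above; the function names and the penalty value are assumptions, and the naive, roughly cubic-time recursion is only meant for short series.

```python
import numpy as np


def segment_cost(x, beta):
    """Check loss of one constant segment, minimized at an empirical beta-quantile."""
    theta = np.quantile(x, beta, method="inverted_cdf")
    r = x - theta
    return float(np.sum(np.maximum(beta * r, (beta - 1.0) * r))), float(theta)


def quantile_segmentation(y, beta=0.5, penalty=1.0):
    """Optimal partitioning: best[j] = min_i best[i] + cost(y[i:j]) + penalty."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    best = np.full(n + 1, np.inf)
    best[0] = 0.0
    last = np.zeros(n + 1, dtype=int)  # start index of the last segment ending at j
    for j in range(1, n + 1):
        for i in range(j):
            cost, _ = segment_cost(y[i:j], beta)
            if best[i] + cost + penalty < best[j]:
                best[j] = best[i] + cost + penalty
                last[j] = i
    # Backtrack the optimal segment boundaries as (start, end, quantile) triples.
    segments, j = [], n
    while j > 0:
        i = int(last[j])
        segments.append((i, j, segment_cost(y[i:j], beta)[1]))
        j = i
    return segments[::-1]


print(quantile_segmentation([0.0] * 10 + [5.0] * 10, beta=0.5, penalty=1.0))
```

    Here the penalty acts as the tuning parameter controlling the number of segments; this sketch does not provide the finite-sample control of the number of segments described in the abstract.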

    Statistical Methods for Minimax Estimation in Linear Models with Unknown Design Over Finite Alphabets

    We provide a minimax optimal estimation procedure for F and ω in matrix-valued linear models Y = Fω + Z, where the parameter matrix ω and the design matrix F are unknown but the latter takes values in a known finite set. The proposed finite alphabet linear model is justified in a variety of applications, ranging from signal processing to cancer genetics. We show that this allows one to separate F and ω uniquely under weak identifiability conditions, a task which is not possible in general. To this end, we quantify, in the noiseless case Z = 0, the range of perturbations of Y under which F and ω can still be recovered stably. Based on this, we derive an iterative Lloyd-type estimation procedure that attains minimax estimation rates for ω and F for a Gaussian error matrix Z. In contrast to the least squares solution, this procedure can be computed efficiently and scales linearly with the total number of observations. We confirm our theoretical results in a simulation study and illustrate them with a genetic sequencing data example.
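
    One way to picture a Lloyd-type procedure for this model is as alternating minimization, analogous to k-means: the |A|^m possible rows of Fω act as cluster centres, so for fixed ω each row of Y is assigned to its nearest centre (giving F), and for fixed F the matrix ω is refitted by least squares. The sketch below is a generic version under these assumptions (a given starting value `omega0`, the hypothetical function name `lloyd_finite_alphabet`, toy data); it is not claimed to be the authors' exact algorithm or initialization. Each sweep costs on the order of n·|A|^m·d operations for Y of size n×d, i.e. linear in the number of rows for fixed m.

```python
import itertools

import numpy as np


def lloyd_finite_alphabet(Y, alphabet, omega0, max_iter=50):
    """Alternate between assigning rows of F (nearest centre) and refitting omega."""
    m = omega0.shape[0]
    # All |alphabet|^m values a single row of the design matrix can take.
    candidates = np.array(list(itertools.product(alphabet, repeat=m)), dtype=float)
    omega, F = omega0.astype(float), None
    for _ in range(max_iter):
        centres = candidates @ omega  # every possible value of a row of F @ omega
        dist2 = ((Y[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        F_new = candidates[dist2.argmin(axis=1)]  # assignment step
        omega = np.linalg.lstsq(F_new, Y, rcond=None)[0]  # least squares update step
        if F is not None and np.array_equal(F_new, F):
            break  # assignments stopped changing
        F = F_new
    return F, omega


# Hypothetical noiseless toy data (Z = 0) with a binary alphabet.
F_true = np.array([[1, 0], [0, 1], [1, 1], [0, 0], [1, 0], [0, 1]], dtype=float)
omega_true = np.array([[1.0, 0.0, 2.0], [0.5, 1.0, 0.0]])
Y = F_true @ omega_true
F_hat, omega_hat = lloyd_finite_alphabet(Y, (0, 1), omega_true + 0.1)
print(np.array_equal(F_hat, F_true), np.allclose(omega_hat, omega_true))
```

    Starting from a slightly perturbed `omega0`, this toy run assigns every row correctly in the first sweep and stops once the assignments no longer change; from a poor start, alternating schemes like this can end in a local minimum, so the initialization matters in practice.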

    Identifiability for Blind Source Separation of Multiple Finite Alphabet Linear Mixtures

    Under weak assumptions, we give a complete combinatorial characterization of identifiability for linear mixtures of finite alphabet sources, with unknown mixing weights and unknown source signals but known alphabet. This is based on a detailed treatment of the case of a single linear mixture. Notably, our identifiability analysis also applies to the case of an unknown number of sources. We provide necessary and sufficient conditions for identifiability and give a simple sufficient criterion, together with an explicit construction to determine the weights and the source signals for deterministic data by taking advantage of the hierarchical structure within the possible mixture values. We show that the probability of identifiability is related to the distribution of a hitting time and converges exponentially fast to one when the underlying sources come from a discrete Markov process. Finally, we explore our theoretical results in a simulation study. This paper extends and clarifies the scope of scenarios for which blind source separation becomes meaningful.
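
    For intuition on how the structure of the possible mixture values can pin down weights and sources, the sketch below treats the simplest case of a single mixture y_t = Σ_i ω_i f_{i,t} with binary alphabet {0, 1}. It uses a classical subset-sum peeling argument and assumes positive weights, pairwise distinct subset sums, and that all 2^m possible mixture values are observed; under these assumptions the smallest observed value not yet generated by the weights found so far must be the next weight. This is a swapped-in illustration, not the paper's general construction, and `recover_binary_mixture` is a hypothetical name.

```python
from itertools import product


def recover_binary_mixture(y, digits=9):
    """Peel weights off the set of observed mixture values (binary alphabet assumed)."""
    values = sorted({round(v, digits) for v in y})
    if values[0] != 0.0:
        raise ValueError("assumes the all-zero source combination is observed")
    sums = {0.0: frozenset()}  # subset sum -> indices of the active sources
    weights = []
    while len(sums) < len(values):
        # Smallest value not explained by the current weights is the next weight.
        w = min(v for v in values if v not in sums)
        k = len(weights)
        weights.append(w)
        for s, active in list(sums.items()):
            sums[round(s + w, digits)] = active | {k}
    if set(sums) != set(values):
        raise ValueError("observed values are not the full set of distinct subset sums")
    m = len(weights)
    # Each observation is decoded by looking up which subset of sources produced it.
    sources = [[1 if i in sums[round(v, digits)] else 0 for i in range(m)] for v in y]
    return weights, sources


# Hypothetical example: three sources with weights 0.1, 0.25, 0.6, all combinations observed.
true_w = (0.1, 0.25, 0.6)
combos = list(product((0, 1), repeat=3))
y = [sum(w * f for w, f in zip(true_w, c)) for c in combos]
weights, sources = recover_binary_mixture(y)
print(weights, sources == [list(c) for c in combos])
```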

    Testing for dependence on tree structures

    Tree structures, showing hierarchical relationships and the latent structures between samples, are ubiquitous in the genomic and biomedical sciences. A common question in many studies is whether there is an association between a response variable measured on each sample and the latent group structure represented by a given tree. Currently, this is addressed on an ad hoc basis, usually requiring the user to decide on an appropriate number of clusters to prune out of the tree to be tested against the response variable. Here, we present a method with statistical guarantees that tests for association between the response variable and a fixed tree structure across all levels of the tree hierarchy with high power, while accounting for the overall false positive error rate. This enhances the robustness and reproducibility of such findings.
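
    To make testing across all levels of a tree with overall error control concrete, the sketch below takes one grouping of the samples per tree level (for example, the cluster labels obtained by cutting the tree at successive heights), computes a permutation p-value for the association between the response and each grouping using the share of variance explained by the groups, and applies a Bonferroni correction across levels to control the family-wise error rate. This is a deliberately simple stand-in, not the method of the paper; the level representation, the statistic, and the function names are assumptions.

```python
import numpy as np


def r_squared(y, labels):
    """Share of the variance of y explained by the group means (one-way ANOVA R^2)."""
    between = sum(
        np.sum(labels == g) * (y[labels == g].mean() - y.mean()) ** 2
        for g in np.unique(labels)
    )
    return between / np.sum((y - y.mean()) ** 2)


def tree_level_tests(y, levels, n_perm=999, seed=0):
    """Permutation p-value per tree level, Bonferroni-adjusted across levels."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    observed = np.array([r_squared(y, lab) for lab in levels])
    exceed = np.zeros(len(levels))
    for _ in range(n_perm):
        y_perm = rng.permutation(y)  # breaks any link between the response and the tree
        exceed += np.array([r_squared(y_perm, lab) for lab in levels]) >= observed
    p_raw = (exceed + 1) / (n_perm + 1)
    p_adjusted = np.minimum(1.0, len(levels) * p_raw)
    return p_raw, p_adjusted


# Hypothetical toy data: the top split separates low from high responses.
y = np.r_[np.arange(10) * 0.1, 5 + np.arange(10) * 0.1]
levels = [np.repeat([0, 1], 10), np.repeat([0, 1, 2, 3], 5)]  # 2 and 4 clusters
p_raw, p_adjusted = tree_level_tests(y, levels, n_perm=199)
print(p_raw, p_adjusted)
```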