The use of a factorial design to evaluate the physical stability of tablets after storage under tropical conditions.
Simplivariate Models: Ideas and First Examples
One of the new expanding areas in functional genomics is metabolomics: measuring the metabolome of an organism. Data being generated in metabolomics studies are very diverse in nature depending on the design underlying the experiment. Traditionally, variation in measurements is conceptually broken down in systematic variation and noise where the latter contains, e.g. technical variation. There is increasing evidence that this distinction does not hold (or is too simple) for metabolomics data. A more useful distinction is in terms of informative and non-informative variation where informative relates to the problem being studied. In most common methods for analyzing metabolomics (or any other high-dimensional x-omics) data this distinction is ignored thereby severely hampering the results of the analysis. This leads to poorly interpretable models and may even obscure the relevant biological information. We developed a framework from first data analysis principles by explicitly formulating the problem of analyzing metabolomics data in terms of informative and non-informative parts. This framework allows for flexible interactions with the biologists involved in formulating prior knowledge of underlying structures. The basic idea is that the informative parts of the complex metabolomics data are approximated by simple components with a biological meaning, e.g. in terms of metabolic pathways or their regulation. Hence, we termed the framework ‘simplivariate models’ which constitutes a new way of looking at metabolomics data. The framework is given in its full generality and exemplified with two methods, IDR analysis and plaid modeling, that fit into the framework. Using this strategy of ‘divide and conquer’, we show that meaningful simplivariate models can be obtained using a real-life microbial metabolomics data set. For instance, one of the simple components contained all the measured intermediates of the Krebs cycle of E. coli. 
Moreover, these simplivariate models were able to uncover regulatory mechanisms present in the phenylalanine biosynthesis route of E. col
Choosing proper normalization is essential for discovery of sparse glycan biomarkers
Rapid progress in high-throughput glycomics analysis enables researchers to conduct large sample studies. Typically, the between-subject differences in total abundance of raw glycomics data are very large, and it is necessary to reduce the differences, making measurements comparable across samples. Essentially there are two ways to approach this issue: row-wise and column-wise normalization. In glycomics, the differences per subject are usually forced to be exactly zero, by scaling each sample so that the sum of all glycan intensities equals 100%. This total area (row-wise) normalization (TA) results in so-called compositional data, rendering many standard multivariate statistical methods inappropriate or inapplicable. Ignoring the compositional nature of the data, moreover, may lead to spurious results. Alternatively, a log-transformation of the raw data can be performed prior to column-wise normalization and the application of standard statistical tools. Until now, there has been no clear consensus on the appropriate normalization method for glycomics data. Nor is a systematic investigation of the impact of TA on downstream analysis available to justify the choice of TA. Our motivation lies in efficient variable selection to identify glycan biomarkers with regard to accurate prediction as well as interpretability of the model chosen. Via extensive simulations we investigate how different normalization methods affect the performance of variable selection, and compare their performance. We also address the effect of various types of measurement error in glycans: additive, multiplicative and two-component error. We show that when sample-wise differences are not large, row-wise normalization (like TA) can have deleterious effects on variable selection and prediction
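To make the two normalization routes in this abstract concrete, here is a minimal Python sketch. It is not the authors' code: the function names and the small intensity matrix are made up for illustration, and "column-wise normalization" is assumed here to mean simple mean-centering of each glycan after the log transform.

```python
import math

def total_area_normalize(rows):
    """Row-wise (TA) normalization: scale each sample so its intensities sum to 100."""
    return [[100.0 * v / sum(row) for v in row] for row in rows]

def log_column_center(rows):
    """Log-transform raw intensities, then subtract each column (glycan) mean.

    Mean-centering is an assumed stand-in for column-wise normalization.
    """
    logged = [[math.log(v) for v in row] for row in rows]
    n = len(logged)
    means = [sum(col) / n for col in zip(*logged)]
    return [[v - m for v, m in zip(row, means)] for row in logged]

# Hypothetical raw intensities: 3 samples (rows) x 3 glycans (columns).
# Sample 2 is sample 1 with twice the total abundance.
raw = [
    [10.0, 30.0, 60.0],
    [20.0, 60.0, 120.0],
    [5.0, 20.0, 25.0],
]

ta = total_area_normalize(raw)
lc = log_column_center(raw)

row_sums = [sum(r) for r in ta]                    # each row now sums to 100
col_means = [sum(c) / len(lc) for c in zip(*lc)]   # each column now has mean 0
```

After TA, samples 1 and 2 become identical because their difference in total abundance is removed entirely, and every row carries a fixed-sum constraint, which is what makes the data compositional. The log-and-center route keeps the between-sample difference in the data.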
In-line monitoring of controlled radical copolymerisation reactions with near infrared spectroscopy
Comments on three-way analyses used for batch process data
Recently, several papers have appeared concerning the use of three-way models for batch process data. In these papers a number of points are raised. This paper discusses some of these points and illustrates some pitfalls. More specifically, some theoretical aspects of using different three-way models for batch process data and some practical consequences are discussed. The topics of cross-validation and data preprocessing are also discussed. These issues will be discussed using small simulated examples and theoretical arguments. Copyright (C) 2000 John Wiley & Sons, Lt
SCREAM: A novel method for multi-way regression problems with shifts and shape changes in one mode
Some fields where calibration of multi-way data is required, such as hyphenated chromatography, can suffer from high inaccuracy when traditional N-PLS is used, due to the presence of shifts or peak shape changes in one of the modes. To overcome this problem, a new regression method for multi-way data called SCREAM (Shifted Covariates REgression Analysis for Multi-way data), which is based on a combination of PARAFAC2 and principal covariates regression (PCovR), is proposed. In particular, the algorithm combines a PARAFAC2 decomposition of the X array and a PCovR-like way of computing the regression coefficients, analogously to what has been described by Smilde and Kiers (A.K. Smilde and H.A.L. Kiers, 1999) in the case of other multi-way PCovR models. The method is tested on real and simulated datasets, providing good results and performing as well as or better than other available regression approaches for multi-way data. (C) 2013 Elsevier B.V. All rights reserved
Monitoring and diagnosing batch processes with multiway regression models
Multivariate statistical procedures for monitoring the behavior of batch processes are presented. A new type of regression, called multiway covariates regression, is used to find the relationship between the process variables and the quality variables of the final product. The three-way structure of the batch process data is modeled by means of a Tucker3 or a PARAFAC model. The only information needed is a historical data set of past successful batches. Subsequent new batches can be monitored using multivariate statistical process control charts. In this way the progress of the new batch can be tracked and possible faults can be easily detected. Further detailed information from the process can be obtained by interrogating the underlying model. Diagnostic tools, such as contribution plots of each of the variables to the observed deviation, are also developed. Finally, on-line predictions of the final quality variables can be monitored, providing an additional tool to see whether a particular batch will produce an out-of-spec product. These ideas are illustrated using simulated and real data of a batch polymerization reactio
Centering and scaling in component analysis
In this paper the purpose and use of centering and scaling are discussed in depth. The main focus is on two-way bilinear data analysis, but the results can easily be generalized to multiway data analysis. In fact, one of the aims of this paper is to show that if two-way centering and scaling are understood, then multiway centering and scaling is quite straightforward. In the literature it is often stated that preprocessing of multiway arrays is difficult, but here it is shown that most of the difficulties do not pertain to three- and higher-way modeling in particular. It is shown that centering is most conveniently seen as a projection step, where the data are projected onto certain well-defined spaces within a given mode. This view of centering helps to explain why, for example, centering data with missing elements is likely to be suboptimal if there are many missing elements. Building a model for data consists of two parts: postulating a structural model and using a method to estimate the parameters. Centering has to do with the first part: when centering, a model including offsets is postulated. Scaling has to do with the second part: when scaling, another way of fitting the model is employed. It is shown that centering is simply a convenient technique to estimate model parameters for models with certain offsets, but this does not work for all types of offsets. It is also shown that scaling is a way to fit models with a weighted least squares loss function and that sometimes this change in objective function cannot be performed by a simple scaling step. Further practical aspects of and alternatives to centering and scaling are discussed, and examples are used throughout to show that the conclusions in the paper are not only of theoretical interest but can have an impact on practical data analysis. Copyright (C) 2003 John Wiley & Sons, Lt
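The description of centering as a projection step can be illustrated with a small Python sketch. This is only an illustration under stated assumptions, not code from the paper: it takes the common case of centering across the first mode (subtracting column means), and the helper names and the data matrix are hypothetical.

```python
def centering_projector(n):
    """Return P = I - (1/n) * 1 1^T, which projects onto the space orthogonal to the ones vector."""
    return [[(1.0 if i == j else 0.0) - 1.0 / n for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def subtract_column_means(x):
    """Conventional column centering: subtract the mean of each column."""
    n = len(x)
    means = [sum(col) / n for col in zip(*x)]
    return [[v - m for v, m in zip(row, means)] for row in x]

# Hypothetical 3 x 2 data matrix (made-up numbers).
x = [[1.0, 10.0], [2.0, 20.0], [6.0, 30.0]]

p = centering_projector(len(x))
projected = matmul(p, x)             # centering written as a projection, P X
centered = subtract_column_means(x)  # centering written as mean subtraction
p_squared = matmul(p, p)             # a projector satisfies P P = P
```

In this sketch `projected` and `centered` agree, and `p_squared` equals `p`. Applying the projector needs every element of a column, which is one way to see why centering becomes problematic when many elements are missing.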
