
    Determination of the number of components during mixture analysis using the Durbin-Watson criterion in the Orthogonal Projection Approach and in the SIMPLe-to-use Interactive Self-modelling Mixture Analysis approach

    The Orthogonal Projection Approach (OPA) and the SIMPLe-to-use Interactive Self-modelling Mixture Analysis approach (SIMPLISMA) are widely employed during process monitoring to obtain concentration profiles and/or pure spectra of a mixture. In the first step of these methods, it is extremely important to select the right number of components present in the mixture. This selection is not always obvious, and in this paper, the Durbin-Watson criterion was applied to dissimilarity values in OPA and to purity values in SIMPLISMA as a tool for deciding the number of components. It is shown that this yields more objective results than visual interpretation.
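
    The Durbin-Watson (DW) statistic divides the sum of squared successive differences of a sequence by its sum of squares: values near 0 indicate a smooth, structured sequence, while values near 2 indicate uncorrelated noise. The sketch below assumes the statistic is computed on the ordered sequence of dissimilarity (OPA) or purity (SIMPLISMA) values for each candidate component. The stopping rule `count_structured` and its threshold of 1.0 are hypothetical illustrations, not the paper's decision rule.

```python
import numpy as np


def durbin_watson(values):
    """DW statistic of a 1-D sequence (e.g. dissimilarity or purity values ordered
    by variable index): sum of squared successive differences / sum of squares."""
    v = np.asarray(values, dtype=float)
    return float(np.sum(np.diff(v) ** 2) / np.sum(v ** 2))


def count_structured(dw_values, threshold=1.0):
    """Hypothetical rule: count leading components whose DW value stays below
    `threshold` (structured signal), stopping at the first noise-like one."""
    n = 0
    for dw in dw_values:
        if dw >= threshold:
            break
        n += 1
    return n


x = np.linspace(0.0, 2.0 * np.pi, 500)
smooth = np.sin(x)                                    # structured, spectrum-like sequence
noise = np.random.default_rng(0).normal(size=500)     # unstructured sequence
print(durbin_watson(smooth), durbin_watson(noise))    # near 0 vs. near 2
```

    A low DW value therefore suggests that a candidate component still carries chemical structure, whereas a value approaching 2 suggests it mainly models noise.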

    Parallel pre-processing through orthogonalization (PORTO) and its application to near-infrared spectroscopy

    Data generated from spectroscopy may be distorted by artefacts due to a range of physical, chemical and environmental factors that are not of interest for the characterization of the samples under study. For example, data acquired by near-infrared (NIR) spectroscopy in the diffuse reflectance mode can be affected by light scattering. If not reduced or removed by spectral pre-processing, this artefact can complicate multivariate data analysis. However, different pre-processing approaches correct these effects in different ways. For example, differentiation can reveal underlying bands, while spectral normalization techniques such as standard normal variate (SNV) can correct for multiplicative and additive effects. Combining multiple pre-processing techniques can lead to better results, but it is not feasible for a user to explore all possible combinations. In the present work, a new pre-processing fusion approach, based on the framework of separating common and distinct components in multi-block multivariate data analysis, is demonstrated. The approach uses parallel and orthogonalized partial least squares (PO-PLS) regression for the parallel fusion of multiple pre-processing techniques applied to the same data. The results obtained on 4 benchmark NIR spectroscopic data sets related to the assessment of fruit quality are compared to those of the recently developed sequential pre-processing through orthogonalization (SPORT) approach: in all cases, the PO-PLS approach leads to slightly better performance. Furthermore, a clear understanding of the common and distinct information present in the data sets after each pre-treatment was obtained. Parallel pre-processing through orthogonalization (PORTO) can be seen as parallel boosting of multiple pre-processing techniques to improve model performance.
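
    The claim that SNV corrects multiplicative and additive effects follows from its definition: each spectrum (row) is centred on its own mean and divided by its own standard deviation, so any transformation a·x + b with a > 0 is undone. A minimal sketch on a synthetic two-band spectrum (the band positions and the 1.8x / +0.3 scatter effect are invented for illustration):

```python
import numpy as np


def snv(X):
    """Standard normal variate: centre each spectrum (row) on its own mean and
    scale it by its own standard deviation."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, ddof=1, keepdims=True)
    return (X - mean) / std


wl = np.linspace(0.0, 1.0, 200)
spectrum = np.exp(-((wl - 0.4) / 0.05) ** 2) + 0.5 * np.exp(-((wl - 0.7) / 0.08) ** 2)
# Simulated scatter: same spectrum with a multiplicative and an additive effect
scattered = 1.8 * spectrum + 0.3

print(np.allclose(snv(spectrum), snv(scattered)))  # scatter effect removed
```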

    New data preprocessing trends based on ensemble of multiple preprocessing techniques

    Data generated by analytical instruments, such as spectrometers, may contain unwanted variation due to measurement mode, sample state and other external physical, chemical and environmental factors. Preprocessing is required so that the property of interest can be predicted correctly. Different correction methods may remove specific types of artefacts while still leaving some effects behind. Using multiple preprocessing techniques in a complementary way can remove the artefacts that a single technique would leave behind. This article summarizes recent developments in data preprocessing strategies and specifically reviews the emerging ensemble approaches to preprocessing fusion in chemometrics. A demonstration case is also presented. In summary, ensemble preprocessing allows the selection of several techniques and their combinations that, in a complementary way, lead to improved models. Ensemble approaches are not limited to spectral data; they can be used whenever preprocessing is needed and identifying a single best option is difficult.
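
    Ensemble approaches start by applying several preprocessing chains to the same data, producing one block per chain for a multi-block model to fuse. The sketch below shows only that first step; the chain names, the helper `preprocessing_ensemble`, and the use of `np.gradient` as a simple finite-difference derivative (in place of, e.g., Savitzky-Golay derivatives) are hypothetical choices, and the fusion model itself is not shown.

```python
import numpy as np


def snv(X):
    """Row-wise standard normal variate."""
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, ddof=1, keepdims=True)
    return (X - mean) / std


def first_derivative(X):
    """Simple finite-difference derivative along the wavelength axis."""
    return np.gradient(X, axis=1)


def preprocessing_ensemble(X):
    """Hypothetical helper: one block per preprocessing chain, same samples."""
    return {
        "raw": X,
        "snv": snv(X),
        "d1": first_derivative(X),
        "snv+d1": first_derivative(snv(X)),
    }


rng = np.random.default_rng(1)
X = rng.normal(size=(30, 100)).cumsum(axis=1)  # 30 synthetic, smooth-ish "spectra"
blocks = preprocessing_ensemble(X)
for name, block in blocks.items():
    print(name, block.shape)
```

    Because every block keeps the same rows (samples), the blocks can be handed directly to a multi-block method such as SPORT, PORTO or ROSA.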

    FRUITNIR-GUI: A graphical user interface for correcting external influences in multi-batch near infrared experiments related to fruit quality prediction

    Near infrared (NIR) spectroscopy is widely used for non-destructive prediction of fruit traits. Common traits such as dry matter (DM) and soluble solids content (SSC) can be predicted with reliable accuracy. However, the main problem with NIR spectroscopy is that a model developed on one batch may not perform well when tested on other batches. The reasons are the physical, chemical and environmental differences between experiments performed in different batches. To deal with these issues, approaches such as variable selection, dynamic orthogonal projection (DOP) and transfer component analysis (TCA) can be used. However, although these techniques are known, it is rarely possible for a new user or non-specialist to implement them in practical situations. To overcome this limitation, a graphical user interface-based toolbox (FRUITNIR-GUI) for basic chemometric data processing (regression and variable selection) is developed and presented for the first time. The GUI allows model adaptation and maintenance in the context of multi-batch NIR spectroscopic experiments related to fruit. Furthermore, a case study demonstrating its effectiveness in correcting for seasonality when predicting DM in apples is presented. The toolbox provides a push-button approach to building chemometric models of varying complexity for the characterization of fruit quality. Moreover, approaches such as variable selection and batch correction with DOP and TCA can improve model performance on new batches. FRUITNIR-GUI can be freely downloaded at https://github.com/puneetmishra2/FRUITNIR and run using the password “welovenirs” (without quotation marks).

    Pre-processing ensembles with response oriented sequential alternation calibration (PROSAC): A step towards ending the pre-processing search and optimization quest for near-infrared spectral modelling

    Ensemble pre-processing is emerging as a potential tool to avoid the tedious pre-processing selection and optimization task in near-infrared (NIR) spectral modelling. Furthermore, differently pre-processed data may carry complementary information; hence, ensemble pre-processing may be the best-suited modelling option to extract all the useful information from differently pre-processed data. Recently, multi-block techniques such as sequential (SPORT) and parallel (PORTO) orthogonalized partial least squares regression were proposed to extract complementary information present in differently pre-processed data. Although such multi-block techniques allowed efficient modelling of differently pre-processed data blocks, depending on the approach, challenges related to choosing block order, parameter tuning, block scaling and optimization time requirements must still be dealt with. To cope with such issues, the present study proposes the use of a recently developed, faster, block order-independent and scale-independent multi-block data modelling technique called response-oriented sequential alternation (ROSA) to process the multi-block data generated by pre-processing the same NIR data in different ways. This new method is called PROSAC, i.e., pre-processing ensembles with ROSA calibration. The potential of the approach is demonstrated on five real NIR spectral datasets. As baselines for comparison, partial least squares regression on individually pre-processed data sets and two multi-block pre-processing fusion approaches, SPORT and PORTO, were used. Ensemble pre-processing with ROSA either outperformed the baseline methods or achieved comparable performance without the need to worry about the pre-processing order, the scaling of data after pre-processing or optimization time requirements. PROSAC can be considered a general tool for ensemble pre-processing in NIR data modelling.
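
    The block order and scale independence attributed to ROSA come from letting blocks compete component by component: each block proposes a score vector, and the one that leaves the smallest response residual wins. The sketch below is a simplified single-response illustration of that competition, not a reference implementation of ROSA or PROSAC; the function name `rosa_sketch` and the synthetic data are hypothetical.

```python
import numpy as np


def rosa_sketch(blocks, y, n_comp):
    """Simplified single-response sketch of ROSA-style block competition.

    For each component, every block proposes a PLS-like score from the current
    response residual; the block whose (orthogonalised, unit-length) score leaves
    the smallest residual wins. Rescaling a block does not change its normalised
    score, so the choice does not depend on block scaling or block order."""
    Xs = [B - B.mean(axis=0) for B in blocks]
    r = y - y.mean()
    T = np.zeros((len(y), 0))              # orthonormal scores chosen so far
    chosen = []
    for _ in range(n_comp):
        best = None
        for b, X in enumerate(Xs):
            w = X.T @ r                    # covariance-based weights for this block
            t = X @ (w / np.linalg.norm(w))
            t = t - T @ (T.T @ t)          # orthogonalise against earlier scores
            t = t / np.linalg.norm(t)
            res = r - t * (t @ r)          # response residual if this block wins
            if best is None or res @ res < best[0]:
                best = (res @ res, b, t, res)
        _, b, t, r = best
        chosen.append(b)
        T = np.column_stack([T, t])
    return chosen, r


rng = np.random.default_rng(2)
X_info = rng.normal(size=(200, 10))
X_noise = rng.normal(size=(200, 10))
y = X_info[:, 0] + 0.1 * rng.normal(size=200)
# The noise block is listed first and inflated 1000x; the informative block still wins.
chosen, resid = rosa_sketch([X_noise * 1000.0, X_info], y, n_comp=2)
print(chosen)
```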

    Response oriented covariates selection (ROCS) for fast block order- and scale-independent variable selection in multi-block scenarios

    Multi-block datasets are common in the chemometrics domain, and several data fusion approaches have recently been proposed to treat them. Apart from exploratory and predictive modelling, a key task in this context is feature selection, which involves finding key complementary variables across multiple data blocks that jointly provide a good explanation of the response variables, revealing the key variables of the system. In that direction, a new method called response-oriented covariate selection (ROCS) is proposed here. ROCS is a direct extension of the covariance selection (CovSel) approach to multi-block scenarios, where the choice is based on a competition between variables in different blocks, as is done in the response-oriented sequential alternation (ROSA) method. The distinguishing features of ROCS are its simplicity, fast execution speed, insensitivity to block order and scale invariance. ROCS is evaluated on several multi-block modelling cases and compared with other variable selection methods.
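
    Since ROCS builds on CovSel, it helps to see the single-block CovSel loop it extends: pick the variable with the largest squared covariance with the response, then project that variable out of both X and y before the next pick. The sketch below covers only single-block, single-response CovSel; the between-block competition that ROCS adds is not shown, and the data are synthetic.

```python
import numpy as np


def covsel(X, y, n_select):
    """CovSel-style sketch for a single response: repeatedly pick the variable with
    the largest squared covariance with y, then project it out of X and y."""
    X = X - X.mean(axis=0)
    y = y - y.mean()
    selected = []
    for _ in range(n_select):
        j = int(np.argmax((X.T @ y) ** 2))
        selected.append(j)
        x = X[:, j].copy()
        # Orthogonalise every column of X, and y, against the selected variable
        X = X - np.outer(x, x @ X) / (x @ x)
        y = y - x * (x @ y) / (x @ x)
    return selected


rng = np.random.default_rng(3)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 2] + 2.0 * X[:, 7] + 0.1 * rng.normal(size=200)
print(covsel(X, y, 2))  # the two variables that generate y
```

    The deflation step is what makes the method select complementary variables: once a variable is chosen, variables that merely duplicate its information lose their covariance with the residual response.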

    An evaluation of the PoLiSh smoothed regression and the Monte Carlo Cross-Validation for the determination of the complexity of a PLS model

    A crucial point of the PLS algorithm is the selection of the right number of factors or components (i.e., the determination of the optimal complexity of the system to avoid overfitting). Leave-one-out cross-validation is usually used to determine the optimal complexity of a PLS model, but in practice it is often found to retain too many components. In this study, Monte Carlo Cross-Validation (MCCV) and PoLiSh smoothed regression are used and compared with the better-known adjusted Wold's R criterion.
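
    Unlike leave-one-out, MCCV repeatedly splits the data at random into a training and a test set, fits the model on each training set, and averages the test error for every candidate number of components. The sketch below pairs a standard PLS1 NIPALS fit with MCCV; the 70/30 split, 100 repetitions and the synthetic three-factor data are hypothetical settings, and neither PoLiSh smoothing nor the adjusted Wold's R criterion is shown.

```python
import numpy as np


def pls1_nipals(X, y, n_comp):
    """PLS1 (single response) via NIPALS; returns centring means and W, P, q."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = E.T @ f
        w = w / np.linalg.norm(w)
        t = E @ w
        tt = t @ t
        p, qa = E.T @ t / tt, (f @ t) / tt
        E, f = E - np.outer(t, p), f - qa * t   # deflate X and y
        W.append(w); P.append(p); q.append(qa)
    return x_mean, y_mean, np.array(W).T, np.array(P).T, np.array(q)


def mccv_mse(X, y, max_comp, n_rep=100, train_frac=0.7, seed=0):
    """Monte Carlo CV: mean squared prediction error for 1..max_comp components."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_train = int(round(train_frac * n))
    sse, count = np.zeros(max_comp), 0
    for _ in range(n_rep):
        idx = rng.permutation(n)
        tr, te = idx[:n_train], idx[n_train:]
        xm, ym, W, P, q = pls1_nipals(X[tr], y[tr], max_comp)
        for k in range(1, max_comp + 1):
            # Regression vector for the first k components: B = W (P^T W)^-1 q
            B = W[:, :k] @ np.linalg.solve(P[:, :k].T @ W[:, :k], q[:k])
            pred = ym + (X[te] - xm) @ B
            sse[k - 1] += np.sum((y[te] - pred) ** 2)
        count += len(te)
    return sse / count


rng = np.random.default_rng(4)
T = rng.normal(size=(100, 3))                  # three latent factors
X = T @ rng.normal(size=(3, 20)) + 0.01 * rng.normal(size=(100, 20))
y = T.sum(axis=1) + 0.05 * rng.normal(size=100)
mse = mccv_mse(X, y, max_comp=6)
print(int(np.argmin(mse)) + 1)  # components with the lowest MCCV error
```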

    Going Beyond Counting First Authors in Author Co-citation Analysis

    The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based. We compare ACA results based on two types of co-citation counting: the traditional type, which counts only the first author of a cited work, and a non-traditional type, which takes into account the first 5 authors of a cited work. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
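
    The difference between the two counting schemes can be made concrete with a small counter in which only the first `max_authors` authors of each cited work are considered (1 reproduces first-author counting, 5 the extended scheme). The exact rules are assumptions of this sketch, not the study's definitions: each citing paper adds at most 1 to each author pair, and co-authors of the same cited work are also paired.

```python
from collections import Counter
from itertools import combinations


def author_cocitations(citing_papers, max_authors=1):
    """Count author co-citations (assumed rules, see lead-in).

    `citing_papers` is a list of reference lists; each reference is the ordered
    author list of one cited work."""
    counts = Counter()
    for references in citing_papers:
        authors = set()
        for author_list in references:
            authors.update(author_list[:max_authors])  # first k authors only
        for pair in combinations(sorted(authors), 2):
            counts[pair] += 1                          # at most 1 per citing paper
    return counts


papers = [
    [["A", "B"], ["C"]],   # citing paper 1 cites a work by A and B, and one by C
    [["A"], ["C", "D"]],   # citing paper 2
]
print(author_cocitations(papers, max_authors=1))
print(author_cocitations(papers, max_authors=5))
```

    With first-author counting only the pair (A, C) appears; including further authors brings B and D into the co-citation network, which is the kind of change the study evaluates at scale.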

    Variations on the Author

    “Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourses; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.