Three-way compositional analysis of energy intensity in manufacturing
Both the scientific and political communities agree that significant reductions in CO2 emissions are necessary to limit the magnitude and extent of climate change, and energy efficiency is one of the central issues analyzed by economists and policy makers in this debate. Different measures of energy efficiency in manufacturing can be defined, but broadly it is the ratio of production output to energy input, usually disaggregated by industry. We create a global data set of energy intensity in manufacturing and analyze its structure by country, time and industry by applying parallel factor analysis (CP). Since we are interested in the structure of energy intensity, the absolute values are no longer relevant for the analysis. The data set is therefore compositional, which requires a specific adaptation of the methodology and suitable software.
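The compositional point can be made concrete with a small Python/NumPy sketch (not the authors' code). It assumes the industries form the parts of each composition and uses a hypothetical country × year × industry array of random energy intensities. The hypothetical helper `closure` rescales each country-year industry profile so that it sums to one. The check shows that multiplying one country's values by a constant, such as a change of energy unit, leaves its composition unchanged. This is why only relative information enters the analysis.

```python
import numpy as np

def closure(x, axis=-1, kappa=1.0):
    """Rescale strictly positive parts along `axis` so that they sum to `kappa`."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("compositional parts must be strictly positive")
    return kappa * x / x.sum(axis=axis, keepdims=True)

# Hypothetical energy intensities: 2 countries x 3 years x 4 industries
rng = np.random.default_rng(0)
intensity = rng.uniform(0.5, 5.0, size=(2, 3, 4))

# Industry composition for every country-year combination
shares = closure(intensity, axis=2)

# Rescaling one country's values (e.g. another energy unit) does not change its shares
rescaled = intensity.copy()
rescaled[0] *= 1000.0
assert np.allclose(closure(rescaled, axis=2), shares)

print(shares.sum(axis=2))  # every country-year profile sums to 1
```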
Robust methods for analysis of 3-way compositional data in R
The standard multivariate analysis addresses data sets represented as two-dimensional
matrices. In recent years, an increasing number of application areas, such as
chemometrics, computer vision, econometrics and social network analysis, involve the analysis
of data sets that are represented as multidimensional arrays, and multiway data analysis
has become popular as an exploratory analysis tool. The most popular multiway models are
CANDECOMP/PARAFAC and TUCKER3. The standard algorithms for computing these models
are based on alternating least squares (ALS) and thus are vulnerable to the presence of
outlying data points. Even a single outlying data point can strongly influence the resulting
model and the conclusions based on it. Therefore robust methods are preferred. Compositional
data present additional difficulties for the analysis: they consist of vectors of
positive values summing to a unit or, in general, to some fixed constant for all vectors.
They appear as proportions, percentages, concentrations, and absolute and relative frequencies.
We present a robust version of Tucker3 which is extended to handle compositional data.
This method, together with a robust version of PARAFAC that also has an option for handling
compositional data, is implemented in an R package for the analysis of multiway data sets.
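The ALS sensitivity described above can be illustrated with a minimal, non-robust CANDECOMP/PARAFAC fit in Python/NumPy. This is a sketch, not the code of the R package mentioned here. The array size, the rank and the outlier magnitude are arbitrary assumptions, and the helper names `cp_als`, `cp_reconstruct` and `rel_error` are hypothetical. Each ALS step solves an exact least-squares problem for one factor matrix while the other two are held fixed, so the loss never increases. The same squared loss, however, lets a single corrupted cell pull on all three factor matrices.

```python
import numpy as np

def cp_reconstruct(A, B, C):
    """Rebuild x[i,j,k] = sum_f A[i,f] * B[j,f] * C[k,f]."""
    return np.einsum("if,jf,kf->ijk", A, B, C)

def cp_als(X, rank, n_iter=500, seed=0):
    """Plain (non-robust) CANDECOMP/PARAFAC fitted by alternating least squares."""
    rng = np.random.default_rng(seed)
    A, B, C = (rng.standard_normal((n, rank)) for n in X.shape)
    losses = []
    for _ in range(n_iter):
        # Normal equations for one factor matrix: the Gram matrix of the other two
        # factors is the Hadamard product of their individual Gram matrices.
        A = np.einsum("ijk,jf,kf->if", X, B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = np.einsum("ijk,if,kf->jf", X, A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = np.einsum("ijk,if,jf->kf", X, A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
        losses.append(float(np.sum((X - cp_reconstruct(A, B, C)) ** 2)))
    return A, B, C, losses

def rel_error(fit, target):
    return np.linalg.norm(fit - target) / np.linalg.norm(target)

# Hypothetical exact rank-2 array of size 6 x 5 x 4
rng = np.random.default_rng(42)
truth = cp_reconstruct(*(rng.standard_normal((n, 2)) for n in (6, 5, 4)))

A, B, C, losses = cp_als(truth, rank=2)
err_clean = rel_error(cp_reconstruct(A, B, C), truth)

# Corrupt a single cell and refit the same model
contaminated = truth.copy()
contaminated[0, 0, 0] += 50.0
A2, B2, C2, _ = cp_als(contaminated, rank=2)
err_outlier = rel_error(cp_reconstruct(A2, B2, C2), truth)

print(f"error vs. truth, clean fit: {err_clean:.2e}; with one outlier: {err_outlier:.2e}")
```

Both errors are measured against the uncontaminated array, so the gap between them shows how far one cell moves the whole solution. Robust variants typically downweight or exclude such cells instead of fitting them.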
A Robust Tucker3 Model for Compositional Data
Double counting is inherent to the output concept; therefore it is preferable to use manufacturing value added (MVA) instead to measure manufacturing production. While the issue of double counting in production statistics is successfully addressed by using MVA, commodity exchange in trade data is still measured as output. The relevance of value added has increased in recent years due to the unbundling of the production process, where different stages of the value chain take place in different countries. We want to represent export statistics through the ratio of value added to output, using data from international statistical databases. The data sets considered are organized by country, commodity or activity, and year (activities are classified according to the International Standard Industrial Classification of All Economic Activities, ISIC), and thus they are three-way compositional data.
Different methods exist for the analysis of multiway data; we choose Tucker3 because it provides a compromise between parsimonious and flexible models. Like most N-way methods, Tucker3 is based on alternating least squares (ALS), which makes it vulnerable to the presence of outliers in the data. Even a single outlying data point can strongly influence the resulting model and the conclusions based on it. A robust version of Tucker3 was presented by Pravdova et al. (2001), but it suffers from two main deficiencies. First, the robust initialization of the algorithm is based on the minimum covariance determinant (MCD) estimator, which does not work in high dimensions. Second, the method is not suitable for compositional data. We propose to select the initial subset using robust PCA and to transform the compositional data with the ilr transformation (Egozcue et al., 2003). Furthermore, since to our knowledge there is no readily available software for computing robust Tucker3 models, we provide an implementation of the proposed algorithm in R. The method is compared to its competitors in terms of both its efficiency and the computational effort needed.
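The ilr step proposed above can be sketched in Python/NumPy as follows. This is not the authors' R implementation. It uses pivot coordinates, which are only one of many valid orthonormal ilr bases; the abstract does not name a basis, so this choice is an assumption. It also assumes that the compositional parts lie on the last axis of the array, and the function names `clr` and `pivot_ilr` are hypothetical. The checks illustrate two properties that motivate ilr: invariance to rescaling and isometry with respect to the Aitchison distance.

```python
import numpy as np

def clr(x):
    """Centred log-ratio: log parts minus their mean (the log of the geometric mean)."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean(axis=-1, keepdims=True)

def pivot_ilr(x):
    """Isometric log-ratio coordinates in a pivot (orthonormal) basis.

    Parts are assumed to lie on the last axis; D parts give D - 1 coordinates.
    """
    logx = np.log(np.asarray(x, dtype=float))
    D = logx.shape[-1]
    coords = []
    for i in range(D - 1):
        n = D - i - 1                            # number of parts after the pivot
        rest = logx[..., i + 1:].mean(axis=-1)   # log geometric mean of those parts
        coords.append(np.sqrt(n / (n + 1)) * (logx[..., i] - rest))
    return np.stack(coords, axis=-1)

# Hypothetical three-way array: 3 countries x 4 years x 5 parts (e.g. activities)
rng = np.random.default_rng(1)
X = rng.uniform(0.1, 10.0, size=(3, 4, 5))
Z = pivot_ilr(X)
print(Z.shape)  # (3, 4, 4): one coordinate fewer in the compositional mode

# Isometry: Aitchison distance (Euclidean distance of clr vectors)
# equals the Euclidean distance of the ilr coordinates
x, y = X[0, 0], X[1, 2]
print(np.linalg.norm(clr(x) - clr(y)), np.linalg.norm(pivot_ilr(x) - pivot_ilr(y)))
```

The transformed array, with one coordinate fewer in the compositional mode, is what a Tucker3 model, robust or classical, would then be fitted to.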
Tri-PLS for compositional data
Compositional data (CoDa, [1] and [2]) consist of vectors of positive values summing to a unit or, in general, to some fixed constant. They are found in many disciplines and appear as proportions, percentages, concentrations, and absolute and relative frequencies. Unfortunately, the constant-sum constraint that characterizes compositions is frequently disregarded or improperly incorporated into statistical modeling, which leads to misleading interpretations of the results. Because of these specific features, several difficulties arise when dealing with CoDa. The first word of warning came as early as 1897 from Karl Pearson, who showed the dangers of underestimating spurious correlations. There are several approaches to incorporating CoDa into statistical modeling when it is not realistic to assume a multinomial distribution of the data. Aitchison [1] proposed preprocessing compositional data by means of log-ratio transformations and subsequently analyzing them in a straightforward way with ‘traditional’ methods. Following Aitchison’s approach, the high dimensionality of CoDa in many scientific fields has encouraged the use of bilinear and trilinear decomposition models. Thus, in attempts to find adequate low-dimensional descriptions of compositional variability, CoDa are collected into two- or three-way arrays ([3], [4], [5], [6], [7]). On the other hand, Hinkle and Rayens [8] examined the problems that potentially occur when partial least squares (PLS) is performed on compositional data. The principal goal of this talk is to extend PLS regression to three-way compositional data, following the approach proposed by Bro [9] and Bro et al. [10]. Both Candecomp/Parafac (CP, [11], [12]) and Tucker3 [13] models can be viewed as latent variable models extending principal component analysis (PCA) to three-way data sets. However, the most fundamental properties of PCA do not carry over jointly to either of these models. PCA is an optimal representation of a two-way array with respect to two criteria: the best low-rank approximation in the least-squares sense and the best approximation of the data within a joint low-dimensional subspace. Tucker3, in contrast, is only the best approximation of a three-way array within a joint low-dimensional subspace, and CP is only the best low-rank approximation in the least-squares sense. The proposed extension of PLS to three-way compositional data is illustrated on real data sets, and a software implementation will be available in the R package rrcovHD.
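For reference, the two models contrasted above can be written in their standard forms for an I × J × K array with entries x_ijk. The notation (component counts F, P, Q, R; loadings a, b, c; core g; residuals e) is chosen here for illustration and is not taken from the talk. For compositional data, these models would be fitted to the log-ratio-transformed array.

```latex
% CANDECOMP/PARAFAC (CP) with F components
x_{ijk} = \sum_{f=1}^{F} a_{if}\, b_{jf}\, c_{kf} + e_{ijk}

% Tucker3 with P, Q and R components in the three modes and core array G = (g_{pqr})
x_{ijk} = \sum_{p=1}^{P} \sum_{q=1}^{Q} \sum_{r=1}^{R} a_{ip}\, b_{jq}\, c_{kr}\, g_{pqr} + e_{ijk}

% CP is the special case P = Q = R = F with a superdiagonal core
g_{pqr} = \begin{cases} 1 & \text{if } p = q = r \\ 0 & \text{otherwise} \end{cases}
```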
Robust multiway analysis of compositional data in R
Multiway data analysis addresses complex data structures represented as multiway data sets, where data have more than two modes. The most popular methods for modeling multiway data are CANDECOMP/PARAFAC and TUCKER3. The standard algorithms for computing these models are based on alternating least squares (ALS) and thus are vulnerable to the presence of outlying data points: a single outlier could render the obtained estimates useless. Therefore robust methods are preferred. We present an R package, rrcov3way, implementing a set of functions for the analysis of multiway data sets, including PARAFAC and TUCKER3 as well as their robust alternatives. An additional feature for handling compositional data is included through the ilr transformation. Unified diagnostics, plotting functions, data examples and a manual in the form of a vignette complete the package. In the presentation, basic usage of the package will be illustrated by analyzing real data from the UNIDO INDSTAT database, which contains data on key industrial statistics indicators for the manufacturing sectors. A subset containing I countries, J sectors and K years for indicators such as value added and output will be analyzed.
Fitting the CANDECOMP-PARAFAC model to compositional data: a combined SWATLD-ALS algorithm.
Multidimensional compositional arrays require special analytical tools. Specifically, the variation of the data can be captured by linear combinations of a defined number of parameters capable of describing the complexity of the data. Usually these models are described as generalizations of principal component analysis to higher-order cases. Here the Candecomp/Parafac (CP) model is defined for compositional data contaminated with extreme observations by using a novel integrated SWATLD-ALS algorithm. Since the new procedure does not find a solution in the least-squares sense, it is expected to be less sensitive to outliers than ALS. However, because of the instability of its loss function, it should not be used alone.
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based. We compare ACA results based on two different types of co-citation counting: the traditional type, which counts only the first author of a cited work, and a non-traditional type, which takes into account the first five authors of a cited work. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
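The two counting rules can be contrasted with a small Python sketch. The function `author_cocitations` and the toy reference lists are hypothetical. The sketch counts two authors as co-cited once per citing paper when both appear among the counted authors of that paper's references. It also counts co-authors of the same cited work as co-cited. The abstract does not specify either choice, so both are assumptions. Setting `max_authors=1` gives traditional first-author counting, and `max_authors=5` counts the first five authors of each cited work.

```python
from collections import Counter
from itertools import combinations

def author_cocitations(citing_papers, max_authors=1):
    """Count author pairs co-cited by the same citing paper.

    citing_papers: one entry per citing paper, each a list of cited works,
    where a cited work is given by its ordered author list.
    """
    counts = Counter()
    for references in citing_papers:
        # Authors "cited" by this paper under the chosen counting rule
        cited = set()
        for authors in references:
            cited.update(authors[:max_authors])
        # Each citing paper adds at most one to every author pair
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return counts

# Two hypothetical citing papers and the author lists of the works they cite
papers = [
    [["Smith", "Lee"], ["Chen"]],
    [["Lee", "Smith"], ["Chen", "Park"]],
]

first_author = author_cocitations(papers, max_authors=1)
first_five = author_cocitations(papers, max_authors=5)
print(first_author)
print(first_five)
```

Under first-author counting, Lee and Smith are never co-cited, because each is the first author of only one of the two papers' references. Counting the first five authors links them in both citing papers.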
Variations on the Author
“Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, released posthumously in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourses; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.
