A procedure for the three-mode analysis of compositions
The Tucker3 model is one of the most widely used tools for the factorial analysis of three-way data arrays. When orthogonal factors are extracted, this model can be seen as a three-way PCA (principal component analysis). The Tucker3 model is extremely flexible: it allows a different number of factors in each mode, and its solutions are not unique. This adaptability makes the Tucker3 model highly effective for the decomposition and compression of data in many applications and fields. When this model is applied to vectors of non-negative values subject to a sum constraint, all the problems connected with the statistical analysis of compositions must be taken into consideration; like other standard statistical techniques, the model cannot be applied directly. The aim of this paper is to present the theory behind the correct application of the Tucker3 model to compositional data and to describe the TUCKALS3 algorithm.
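The two ingredients named above (moving compositions into log-ratio coordinates, then fitting Tucker3 by alternating least squares) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the parts are assumed to lie along the last mode, the clr transform is used as the log-ratio map, and the function names (`clr`, `tuckals3`, `reconstruct`) are hypothetical. The update shown is the standard orthonormal Tucker3 ALS scheme (each factor is taken as the leading left singular vectors of the data projected onto the other two factor spaces), started from a higher-order SVD.

```python
import numpy as np

def clr(x, axis=-1):
    """Centred log-ratio transform along `axis` (all parts must be > 0)."""
    logx = np.log(x)
    return logx - logx.mean(axis=axis, keepdims=True)

def unfold(t, mode):
    """Mode-n matricization: rows indexed by `mode`, columns by the other modes."""
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)

def mode_product(t, m, mode):
    """n-mode product of t with m, where m has shape (new_dim, t.shape[mode])."""
    return np.moveaxis(np.tensordot(m, np.moveaxis(t, mode, 0), axes=1), 0, mode)

def leading_vectors(m, r):
    """First r left singular vectors of m (orthonormal columns)."""
    u, _, _ = np.linalg.svd(m, full_matrices=False)
    return u[:, :r]

def tuckals3(x, ranks, n_iter=200, tol=1e-12):
    """ALS for Tucker3 with orthonormal factors: x ~ G x1 A x2 B x3 C."""
    # higher-order SVD start: leading singular vectors of each unfolding
    factors = [leading_vectors(unfold(x, n), r) for n, r in enumerate(ranks)]
    ss_x = np.sum(x ** 2)
    fit = 0.0
    for _ in range(n_iter):
        for n in range(3):
            # project x onto the current subspaces of the two other modes...
            y = x
            for m in range(3):
                if m != n:
                    y = mode_product(y, factors[m].T, m)
            # ...then take the best rank-r subspace for mode n
            factors[n] = leading_vectors(unfold(y, n), ranks[n])
        core = x
        for m in range(3):
            core = mode_product(core, factors[m].T, m)
        new_fit = np.sum(core ** 2) / ss_x  # share of sum of squares explained
        converged = abs(new_fit - fit) < tol
        fit = new_fit
        if converged:
            break
    return factors, core, fit

def reconstruct(factors, core):
    """Rebuild the fitted array from the core and the three factor matrices."""
    out = core
    for m, f in enumerate(factors):
        out = mode_product(out, f, m)
    return out

# hypothetical data: 10 units x 5 occasions x 4 parts
rng = np.random.default_rng(0)
comp = rng.dirichlet(np.ones(4), size=(10, 5))
x = clr(comp, axis=2)
factors, core, fit = tuckals3(x, (2, 2, 2))
print(core.shape, round(fit, 3))
```

Because clr coordinates sum to zero across the parts, the number of components in the part mode can be at most D - 1 (here 3).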
Improving PARAFAC-ALS performance by initialization
The CANDECOMP/PARAFAC (CP) model (Carroll and Chang, 1970; Harshman, 1970) is a trilinear decomposition that provides a low-rank approximation of a three-way array while preserving the multi-mode structure of the data. This is achieved by estimating three sets of parameters, one for each dimension of the array, namely observation units, variables and occasions. Because of its large number of degrees of freedom, however, the CP model can be quite challenging to estimate. The most commonly used algorithm to fit this model to the data is PARAFAC-ALS. Comparative studies (Tomasi and Bro, 2006) have shown that this procedure is, in general, more reliable and accurate than other algorithms proposed in the literature. Nonetheless, it presents some non-trivial issues: it can be slow to converge and may run into degeneracies caused by over-factoring and bad initialization.
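To illustrate how the three sets of parameters are estimated, the following NumPy sketch implements a basic PARAFAC-ALS loop: each loading matrix is updated in turn as the ordinary least-squares solution given the other two, using the Khatri-Rao product of the fixed matrices. Random starting values, the stopping rule and the function names are assumptions made for illustration; production implementations typically add safeguards (for example, column normalisation or line search) that are omitted here.

```python
import numpy as np

def khatri_rao(u, v):
    """Column-wise Kronecker product: column r is kron(u[:, r], v[:, r])."""
    return np.einsum("ir,jr->ijr", u, v).reshape(-1, u.shape[1])

def parafac_als(x, rank, n_iter=500, tol=1e-10, seed=0):
    """Fit x[i, j, k] ~ sum_r A[i, r] B[j, r] C[k, r] by alternating least squares."""
    I, J, K = x.shape
    rng = np.random.default_rng(seed)
    a, b, c = (rng.standard_normal((n, rank)) for n in (I, J, K))
    x1 = x.reshape(I, J * K)                     # rows i, columns (j, k)
    x2 = np.moveaxis(x, 1, 0).reshape(J, I * K)  # rows j, columns (i, k)
    x3 = np.moveaxis(x, 2, 0).reshape(K, I * J)  # rows k, columns (i, j)
    ss_x = np.sum(x ** 2)
    prev = np.inf
    for _ in range(n_iter):
        # each update is a linear least-squares problem given the other two modes
        a = x1 @ np.linalg.pinv(khatri_rao(b, c)).T
        b = x2 @ np.linalg.pinv(khatri_rao(a, c)).T
        c = x3 @ np.linalg.pinv(khatri_rao(a, b)).T
        loss = np.sum((x1 - a @ khatri_rao(b, c).T) ** 2)
        if abs(prev - loss) < tol * ss_x:
            break
        prev = loss
    return a, b, c, loss

# hypothetical noise-free rank-2 array
rng = np.random.default_rng(42)
A, B, C = rng.standard_normal((8, 2)), rng.standard_normal((7, 2)), rng.standard_normal((6, 2))
x = np.einsum("ir,jr,kr->ijk", A, B, C)
a, b, c, loss = parafac_als(x, 2)
print(loss / np.sum(x ** 2))
```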
With respect to these drawbacks, some alternative estimation procedures perform better than ALS, specifically the Alternating Trilinear Decomposition (ATLD) and the Self-weighted Alternating Trilinear Decomposition (SWATLD), proposed by Wu et al. (1998) and Chen et al. (2000) respectively. These algorithms are faster and less likely to be affected by over-factoring and bad initial values. However, they present difficulties connected to their non-least-squares objective functions, and for this reason they are seldom used in practice. In this work it is suggested that ALS performance with respect to these drawbacks can be improved by initializing it with either ATLD or SWATLD steps, obtaining two integrated ALS procedures. The effectiveness of this methodology is demonstrated by comparing the results of standard ALS with those of the proposed integrated ALS variants in an extensive simulation design.
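The integrated scheme can be sketched as a two-stage loop: a few ATLD-style sweeps provide starting values, and ALS then refines them under the least-squares criterion. The ATLD step below is a simplified reading of that algorithm (each loading is taken as the diagonal of the slices projected onto the pseudo-inverses of the other two loadings); the exact update order, normalisation and stopping rules of Wu et al. (1998) may differ, SWATLD is not shown, and all function names and default iteration counts are hypothetical.

```python
import numpy as np

def khatri_rao(u, v):
    """Column-wise Kronecker product: column r is kron(u[:, r], v[:, r])."""
    return np.einsum("ir,jr->ijr", u, v).reshape(-1, u.shape[1])

def normalise(m):
    """Scale each column to unit Euclidean norm."""
    return m / np.linalg.norm(m, axis=0, keepdims=True)

def atld_step(x, a, b, c):
    """One sweep of simplified ATLD-style updates (not least squares)."""
    pb, pc = np.linalg.pinv(b), np.linalg.pinv(c)
    a = normalise(np.einsum("rj,ijk,rk->ir", pb, x, pc))  # a_i = diag(B+ X_i (C+)^T)
    pa = np.linalg.pinv(a)
    b = normalise(np.einsum("ri,ijk,rk->jr", pa, x, pc))
    pb = np.linalg.pinv(b)
    c = np.einsum("ri,ijk,rj->kr", pa, x, pb)  # C absorbs the scale
    return a, b, c

def als_step(x, a, b, c):
    """One sweep of least-squares updates for the three loading matrices."""
    I, J, K = x.shape
    a = x.reshape(I, -1) @ np.linalg.pinv(khatri_rao(b, c)).T
    b = np.moveaxis(x, 1, 0).reshape(J, -1) @ np.linalg.pinv(khatri_rao(a, c)).T
    c = np.moveaxis(x, 2, 0).reshape(K, -1) @ np.linalg.pinv(khatri_rao(a, b)).T
    return a, b, c

def cp_loss(x, a, b, c):
    return np.sum((x - np.einsum("ir,jr,kr->ijk", a, b, c)) ** 2)

def integrated_als(x, rank, n_atld=20, n_als=500, tol=1e-12, seed=0):
    rng = np.random.default_rng(seed)
    a, b, c = (rng.standard_normal((n, rank)) for n in x.shape)
    for _ in range(n_atld):   # stage 1: fast, non-least-squares initialisation
        a, b, c = atld_step(x, a, b, c)
    prev = np.inf
    for _ in range(n_als):    # stage 2: least-squares refinement
        a, b, c = als_step(x, a, b, c)
        cur = cp_loss(x, a, b, c)
        if abs(prev - cur) < tol * np.sum(x ** 2):
            break
        prev = cur
    return a, b, c, cur

rng = np.random.default_rng(7)
A, B, C = rng.standard_normal((8, 2)), rng.standard_normal((7, 2)), rng.standard_normal((6, 2))
x = np.einsum("ir,jr,kr->ijk", A, B, C)
a, b, c, loss = integrated_als(x, 2)
print(loss / np.sum(x ** 2))
```

On noise-free data the ATLD step reproduces the array exactly when it is given the true loadings, which is what makes it usable as a fast initializer; the final ALS stage restores the least-squares property.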
Principal balances for three-way compositions
Orthonormal balances resulting from a sequential binary partition (SBP) are one of the preferred tools for transforming compositional data into real-space coordinates. The interpretability of this approach, however, depends greatly on the relevance of the SBP. SBPs can be chosen with the help of expert knowledge or with data-oriented methods, such as Principal Balances analysis, which yields an SBP whose balances sequentially maximize the explained variance. Principal balances can be calculated exactly or approximated by methods based on PCA for compositional data. In this work a method for approximating principal balances in the more complex case of three-way compositions is proposed, addressing the additional difficulty introduced by third-mode variability. In particular, an algorithm based on the Tucker3 model is used, which keeps the variability of the third dimension separate in the definition of principal balances.
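The building block of any SBP-based approach is the balance between two groups of parts, b = sqrt(rs/(r+s)) ln(g(x+)/g(x-)), where g is the geometric mean and r, s are the group sizes. The sketch below computes it for every cell of a three-way array of compositions; the function name, the array layout (parts on the last axis) and the example partition are assumptions, and the search for the principal SBP itself is not shown.

```python
import numpy as np

def balance(x, plus, minus):
    """Orthonormal balance between two disjoint groups of parts (last axis).

    b = sqrt(r*s/(r+s)) * ln(g(x_plus) / g(x_minus)), with g the geometric mean.
    """
    x = np.asarray(x, dtype=float)
    r, s = len(plus), len(minus)
    logx = np.log(x)
    log_g_plus = logx[..., plus].mean(axis=-1)
    log_g_minus = logx[..., minus].mean(axis=-1)
    return np.sqrt(r * s / (r + s)) * (log_g_plus - log_g_minus)

# hypothetical three-way data: 6 units x 3 occasions x 4 parts
rng = np.random.default_rng(3)
x = rng.dirichlet(np.ones(4), size=(6, 3))

# a complete SBP for D = 4 parts: {0,1} vs {2,3}, then 0 vs 1, then 2 vs 3
sbp = [([0, 1], [2, 3]), ([0], [1]), ([2], [3])]
coords = np.stack([balance(x, p, m) for p, m in sbp], axis=-1)
print(coords.shape)  # (6, 3, 3): one set of D - 1 balances per unit and occasion
```

Since the balances of a complete SBP form an orthonormal basis, the Euclidean norm of each coordinate vector equals the Aitchison norm of the corresponding composition, so no variability is lost in the transformation.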
Algorithms for compositional tensors of third-order
The PARAFAC-ALS procedure for estimating CP parameters on three-dimensional tensors is sensitive to data collinearity. This inefficiency is especially problematic when collinearity is paired with other issues, such as large data dimensions and difficulties in establishing the correct model rank. With compositional data, i.e. positive values whose covariance structure is biased by the sum constraint, multicollinearity is inherent by definition, and it persists even when the data are transformed into log-ratios by means of the clr function. For this reason, alternative estimation procedures such as INT and INT-2 may be considered. These two-step methods use the properties of the SWATLD and ATLD algorithms during initialization to overcome ALS inefficiency while still providing least-squares results. Their comparative performance is tested in an extensive simulation study on collinear data.
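The claim that collinearity survives the clr transformation can be checked numerically: clr coordinates sum to zero in every row, so the columns are exactly linearly dependent and the covariance matrix has rank D - 1. A minimal sketch with simulated compositions (the sample size, number of parts and Dirichlet generator are arbitrary choices for illustration):

```python
import numpy as np

def clr(x):
    """Centred log-ratio transform of row compositions."""
    logx = np.log(x)
    return logx - logx.mean(axis=-1, keepdims=True)

rng = np.random.default_rng(1)
comp = rng.dirichlet(np.ones(5), size=200)  # 200 simulated compositions, D = 5 parts
z = clr(comp)
print(np.allclose(z.sum(axis=1), 0))        # every clr row sums to zero

cov = np.cov(z, rowvar=False)
# an explicit tolerance separates the exact zero eigenvalue from the others
print(np.linalg.matrix_rank(cov, tol=1e-10))  # D - 1: the covariance is singular
print(np.allclose(cov @ np.ones(5), 0))       # the vector of ones spans its null space
```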
Three-way compositional analysis of energy intensity in manufacturing
Both the scientific and political communities agree that significant reductions in CO2 emissions are necessary to limit the magnitude and extent of climate change, and energy efficiency is one of the most important issues analyzed by economists and policy makers within this debate. Different measures of energy efficiency in manufacturing can be defined, but broadly it is the ratio of production output to energy input, usually disaggregated by industry. We create a global data set of energy intensity in manufacturing and analyze its structure by country, time and industry by applying parallel factor analysis (CP). Since we are interested in the structure of energy intensity, absolute values are no longer relevant to the analysis: the data set is compositional, which requires a specific adaptation of the methodology and suitable software.
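Treating the industry profile of each country-year as a composition can be sketched as closure followed by clr along the industry mode, which yields a three-way array of log-ratio coordinates ready for a CP fit. The array shape, the gamma-distributed values and the function names are hypothetical; the check at the end shows why absolute levels drop out of the analysis.

```python
import numpy as np

def closure(x, axis=-1):
    """Rescale each vector along `axis` to sum to 1 (keeps only relative information)."""
    return x / x.sum(axis=axis, keepdims=True)

def clr(x, axis=-1):
    """Centred log-ratio transform along `axis`."""
    logx = np.log(x)
    return logx - logx.mean(axis=axis, keepdims=True)

# hypothetical array: 3 countries x 4 years x 5 industries of positive energy intensities
rng = np.random.default_rng(2)
intensity = rng.gamma(shape=2.0, scale=1.0, size=(3, 4, 5))

comp = closure(intensity, axis=2)  # industry profile of each country-year
z = clr(comp, axis=2)              # log-ratio coordinates for the three-way model

# scale invariance: a uniform change in one country's intensity level leaves z unchanged
rescaled = intensity.copy()
rescaled[0] *= 10.0
print(np.allclose(clr(closure(rescaled, axis=2), axis=2), z))  # True
```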
A compositional methodology with external information for free time allocation preferences
The study of free-time activity preferences provides important information
on the characteristics and inclinations of specific demographics. Correct modeling
of these data can offer useful insight into the definition of service demand and thus
help define effective social strategies.
Two important aspects need to be considered when analysing individual preferences
on free time. The first difficulty, typical of optimal resource allocation, concerns the
constrained nature of the data: there is a sum limit given by the total amount of
free time available and, as a consequence, the assigned values are not free to vary independently.
Statistically, this translates into a biased covariance structure. From this perspective the problem can be seen as compositional, which means that by definition these data carry only relative information and should be treated with ad hoc tools.
A second challenge consists in discerning the role that external factors play in determining preferences without, however, assuming that all the information
can be explained in this manner. In other words, specific characteristics
of the respondents (such as gender or education) may influence part of the information, and should be considered, but cannot explain the preference
structure in its entirety. This duality can be addressed with a methodology that combines regression and multivariate analysis, proposed in the literature
as Principal Component Analysis with external information.
The purpose of this work is thus to present an application that combines the compositional and external information approaches to study free time allocation.
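One common way to combine the two approaches, sketched here as an assumption rather than as the exact method of this work, is to clr-transform the time shares, regress them on the external variables, and then run a separate PCA on the part explained by the covariates and on the residual part. The covariates (gender, years of education), the simulated shares and all function names below are hypothetical.

```python
import numpy as np

def clr(x):
    """Centred log-ratio transform of row compositions."""
    logx = np.log(x)
    return logx - logx.mean(axis=1, keepdims=True)

def pca_svd(m, n_comp):
    """PCA of an already centred matrix: scores, loadings, explained shares."""
    u, s, vt = np.linalg.svd(m, full_matrices=False)
    share = s ** 2 / np.sum(s ** 2)
    return u[:, :n_comp] * s[:n_comp], vt[:n_comp].T, share[:n_comp]

def pca_external_info(comp, covariates, n_comp=2):
    z = clr(comp)
    z = z - z.mean(axis=0)                               # column-centred clr scores
    x = np.column_stack([np.ones(len(z)), covariates])   # design matrix with intercept
    coef, *_ = np.linalg.lstsq(x, z, rcond=None)         # multivariate least squares
    fitted = x @ coef                                    # part explained by covariates
    resid = z - fitted                                   # part they do not explain
    return {
        "share_external": np.sum(fitted ** 2) / np.sum(z ** 2),
        "external": pca_svd(fitted, n_comp),
        "residual": pca_svd(resid, n_comp),
    }

# hypothetical respondents: free-time shares over 4 activities
rng = np.random.default_rng(4)
n = 60
gender = rng.integers(0, 2, n)
education = rng.integers(8, 19, n)  # years of schooling
covariates = np.column_stack([gender, education])
alpha = np.column_stack([2 + gender, np.full(n, 2.0), 3 - gender, 2 + 0.1 * education])
comp = np.array([rng.dirichlet(a) for a in alpha])

res = pca_external_info(comp, covariates)
print(round(res["share_external"], 3))
```

The `share_external` value splits the total clr variability into a part attributable to the covariates and a residual part, each summarised by its own principal components, which mirrors the duality described above.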
Three-way compositional data: a multi-stage trilinear decomposition algorithm
The CANDECOMP/PARAFAC model is an extension of bilinear PCA
designed to model three-way data while preserving their multidimensional
configuration. The Alternating Least Squares (ALS) procedure is the preferred
estimation algorithm for this model because it guarantees stable results. It
can, however, be slow to converge and sensitive to collinearity and over-factoring.
Dealing with these issues is even more pressing when data are compositional and
thus collinear by definition. In this talk the proposed solution is based on a multi-stage
approach: parameters are first optimized with procedures that cope better with
collinearity and over-factoring, namely ATLD and SWATLD, and the results are then
refined with ALS.
How to improve the Quality Assurance System of the Universities: a study based on compositional analysis
The National Agency for the Evaluation of Universities and Research (ANVUR) has for some decades defined the criteria for systematically evaluating student satisfaction. These data present various difficulties in terms of both collection and analysis. The aim of this work is to propose a compositional Candecomp/Parafac analysis, which is able to capture the multidimensional aspects of the phenomenon while taking into account its ordinal nature and the temporal characteristics of data collection.
Statistical tools for student evaluation of academic educational quality
Measuring academic educational quality presents three major difficulties, typical
of all customer satisfaction and service quality studies: the use of subjective scales, the
ordinal nature of the data, and the multifold structure of satisfaction. To address these
problems, principal component analysis (PCA) of compositional data is proposed in this
work. The core idea behind this methodology is to analyze by PCA the relative information
within the data rather than focusing on absolute scores. This approach is discussed in
comparison with a widely used Item Response Theory method (the Partial Credit Model) in
order to assess its merits, e.g. its ability to always identify a coherent preference structure. Both
procedures were carried out on a real dataset collected with the 2013/14 ANVUR
questionnaire by the Università di Napoli L’Orientale.
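The "relative information" idea can be sketched as PCA on the clr transform of each student's vector of item scores: a student who uses the scale more harshly but keeps the same proportions between items ends up with exactly the same principal component scores. The ratings below are invented for illustration, and the assumption that every score is strictly positive (e.g. a 1-10 scale) is needed for the logarithm.

```python
import numpy as np

def clr(x):
    """Centred log-ratio transform of row vectors of positive scores."""
    logx = np.log(x)
    return logx - logx.mean(axis=1, keepdims=True)

def compositional_pca(scores, n_comp=2):
    """PCA of the relative information: clr, column-centring, then SVD."""
    z = clr(np.asarray(scores, dtype=float))
    z = z - z.mean(axis=0)
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    return u[:, :n_comp] * s[:n_comp], vt[:n_comp].T, explained[:n_comp]

# hypothetical ratings (1-10) of 4 items by 6 students
ratings = np.array([
    [8, 6, 4, 8],
    [4, 3, 2, 4],   # same proportions as the first student, lower absolute level
    [9, 9, 7, 6],
    [5, 7, 6, 5],
    [10, 6, 5, 9],
    [3, 4, 4, 2],
], dtype=float)

pc_scores, loadings, explained = compositional_pca(ratings)
print(np.allclose(pc_scores[0], pc_scores[1]))  # True: only relative information counts
```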
Evaluation of Research Quality (VQR): a case study based on DINDSCAL for compositions
The eValuation of Research Quality (VQR) is one of the most important
assessment exercises carried out by the National Agency for the Evaluation of Universities
and Research Institutes (ANVUR). Its main task is to provide information on
the status of the Italian research system by assessing the performance of universities in
various scientific areas. The staff evaluated include researchers, assistants,
first- and second-band professors, fixed-term professors and researchers, and technology
and research executives. For this purpose, "research products" such as journal articles,
contributions to volumes, and other types of scientific products are considered.
The basic evaluation criteria were defined by groups of experts (GEV) according to
the specific characteristics of each subject area, and are expressed through a synthetic statement on
the products.
In this framework, differences in quality judgements between GEV groups
can be explained in terms of compositional dissimilarity matrices.
In the literature, the INDSCAL (Individual Differences Scaling) model is used to study
individual differences in three-way data by double-centring a set of matrices
of squared dissimilarities between a range of stimuli. Here a direct approach,
called DINDSCAL (Direct INDividual Differences SCALing), is preferred, in
order to analyze directly the simultaneous slices of dissimilarity matrices organized as compositional data.
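For contrast with the direct approach, the classical INDSCAL preprocessing can be sketched as follows: for each slice, compute squared Aitchison distances between compositional stimuli (squared Euclidean distances of their clr coordinates) and double-centre them, B = -1/2 J D² J with J the centring matrix. The slice contents (a few research-product types described by hypothetical compositions, for example shares of products in each quality class) and the function names are assumptions; DINDSCAL itself, which skips this step, is not shown.

```python
import numpy as np

def clr(x):
    """Centred log-ratio transform of row compositions."""
    logx = np.log(x)
    return logx - logx.mean(axis=1, keepdims=True)

def aitchison_sq_dist(comp):
    """Squared Aitchison distances = squared Euclidean distances of clr rows."""
    z = clr(comp)
    g = z @ z.T
    d = np.diag(g)
    return d[:, None] + d[None, :] - 2 * g

def double_centre(d2):
    """B = -1/2 J D2 J, the scalar-product matrix used by classical INDSCAL."""
    n = d2.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * j @ d2 @ j

# hypothetical slices: 3 GEV groups, each judging 5 product types described by 4-part compositions
rng = np.random.default_rng(5)
slices = [rng.dirichlet(np.ones(4), size=5) for _ in range(3)]
b_slices = np.stack([double_centre(aitchison_sq_dist(s)) for s in slices])
print(b_slices.shape)  # (3, 5, 5): one scalar-product matrix per group
```

Because Aitchison distances are Euclidean in clr space, each double-centred slice equals the Gram matrix of the column-centred clr coordinates and is therefore positive semi-definite.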
Thanks to the compositional nature of the data, it is possible to see at a glance which
research product receives the highest assessment relative to the others,
irrespective of the role of the researchers and the type of institution to which they belong.
Additionally, the DINDSCAL algorithm highlights the main divergences among
GEV groups in the classification of research output.