
    A non linear regression model for time series with heteroskedastic conditional variance

    The paper presents a methodology for applying nonlinear models to the analysis of time series with heteroskedastic conditional variance. The methodology is applied to series simulated from AR(2) processes with ARCH(1) and GARCH(1,1) disturbances
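
As an illustration of the kind of simulated data described in this abstract, the following is a minimal sketch of an AR(2) process with ARCH(1) disturbances. The function name, the coefficient values (`phi1`, `phi2`, `omega`, `alpha`), the Gaussian innovations, and the burn-in length are all assumptions chosen for illustration; the paper's actual simulation settings are not given in the listing.

```python
import math
import random


def simulate_ar2_arch1(n, phi1=0.5, phi2=-0.3, omega=0.2, alpha=0.5,
                       seed=0, burn_in=200):
    """Simulate y_t = phi1*y_{t-1} + phi2*y_{t-2} + eps_t, where
    eps_t = sqrt(omega + alpha*eps_{t-1}^2) * z_t and z_t ~ N(0, 1).

    All default parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    y_prev1 = y_prev2 = 0.0   # y_{t-1}, y_{t-2}
    eps_prev = 0.0            # eps_{t-1}
    series = []
    for t in range(n + burn_in):
        # ARCH(1): conditional variance depends on the last squared shock
        sigma2 = omega + alpha * eps_prev ** 2
        eps = math.sqrt(sigma2) * rng.gauss(0.0, 1.0)
        # AR(2) conditional mean plus heteroskedastic disturbance
        y = phi1 * y_prev1 + phi2 * y_prev2 + eps
        y_prev2, y_prev1, eps_prev = y_prev1, y, eps
        if t >= burn_in:      # discard the start-up transient
            series.append(y)
    return series
```

Calling `simulate_ar2_arch1(500, seed=1)` returns a list of 500 values; fixing the seed makes the draw reproducible. A GARCH(1,1) variant would add a term in the previous conditional variance to `sigma2`.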

    A non linear and non parametric approach for ground level air pollutants forecasting

    One of the main concerns in air quality management is forecasting pollutant concentrations, in order to meet public information needs, to predict air quality indexes, and to prevent excessive pollutant concentrations that harm vegetation and human health. In some regions, forecasts of high concentrations lead to public warnings and emergency traffic restrictions aimed at reducing emissions from car fuel. In this work we describe a nonparametric and nonlinear predictive model for ground-level pollutant concentrations. The model produces approximate confidence intervals with heteroskedastic conditional variance, taking as inputs past values of the pollutants and, possibly, covariates such as meteorological factors or pollutant precursors. We present the model and the results of an application to a real data set, taking the view that most of the information needed for prediction is contained in the series itself. However, the theory extends straightforwardly to input variables such as rain and temperature. In the application we consider several series of daily values of ground-level air pollutants and perform short-term forecasting. The model may also be used for long-term forecasting by considering, for example, series of weekly or monthly averages

    Kohonen networks and the influence of training on data structures

    In this paper the Kohonen feature map is applied to the so-called two-spiral problem. Although this network is unsupervised, the results indicate that its ability to classify or visualize the data structure depends on the training parameters. The example therefore shows that the network's self-organization can be limited and that the researcher's choices can strongly affect the network output

    On the criterion of dynamic time warping for computing the dissimilarity between time series

    This study proposes a criterion for computing the dissimilarity between time series, including series of different lengths, that considers, in addition to the individual values, both the shape of the sequence of points and the variability along the time axis due to possible misalignment of the series. A simple distance, for example a metric of the Minkowski class, cannot recognize as similar two series that have identical behavior but are not aligned on the time axis. On the other hand, if the values are affected by noise, plain time warping tends to attribute all differences between points to shifts along the time axis. The proposed algorithm first obtains a smooth estimate of the points of each series, using splines, and then performs the warping on these values. An application to simulated control charts is shown
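
The smooth-then-warp idea in this abstract can be sketched as follows. This is not the authors' algorithm: a centered moving average stands in for the spline smoother (an assumption made to keep the example free of dependencies), and the warping step is the textbook dynamic time warping recursion with absolute difference as the local cost. The function names and the `window` parameter are hypothetical.

```python
def moving_average(x, window=3):
    """Centered moving average; a simple stand-in for spline smoothing."""
    half = window // 2
    out = []
    for i in range(len(x)):
        lo, hi = max(0, i - half), min(len(x), i + half + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out


def dtw_distance(a, b):
    """Classic dynamic time warping cost between two sequences,
    which may have different lengths."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat b[j-1]
                                 cost[i][j - 1],      # repeat a[i-1]
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def smoothed_dtw(a, b, window=3):
    """Smooth both series first, then warp the smoothed values."""
    return dtw_distance(moving_average(a, window), moving_average(b, window))
```

For example, `dtw_distance([0, 1, 2, 1, 0], [0, 0, 1, 2, 1, 0])` is zero because the second series is a time-shifted copy of the first, whereas a pointwise Minkowski distance is not even defined for series of different lengths.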

    Clustering with latent variables

    Besides continuous variables, binary indicators on ICT (Information and Communication Technologies) infrastructures and utilities are usually collected in order to evaluate the quality of a public company and to define policy priorities. In this paper we address the problem of clustering public organizations by assuming that these binary attributes are generated from latent continuous variables and by estimating the scores of the latent variables. In economics, these variables are called utility functions, and the assumption is that the binary attributes (which may be, for example, the presence or absence of a public service or a public utility) are determined by the crossing of a certain threshold in these functions. To compare the proposed clustering approach with latent class mixture modelling as implemented in the Latent Gold package, we simulate data from a setting where the true group membership is known. We then present a cluster analysis of the Emilia-Romagna municipalities, based on a set of back-office and front-office indicators, that demonstrates the usefulness of the proposed method as a key support for policy makers
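
The threshold assumption in this abstract, with data simulated from known groups, might be sketched as below. Everything here is illustrative: the function name, the unit-variance Gaussian latent variables, the per-group means, and the common threshold of zero are assumptions, not the simulation design used in the paper.

```python
import random


def simulate_binary_indicators(n_units, group_means, threshold=0.0, seed=0):
    """Draw latent continuous scores for units in known groups and turn
    them into binary indicators by thresholding.

    group_means: one list of latent means per group (hypothetical values).
    Returns a list of (group, latent_scores, binary_indicators) tuples.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_units):
        group = rng.randrange(len(group_means))   # true group membership
        latent = [rng.gauss(mu, 1.0) for mu in group_means[group]]
        # An attribute is present (1) when its latent score crosses the threshold
        binary = [1 if z > threshold else 0 for z in latent]
        rows.append((group, latent, binary))
    return rows
```

Because the true group label is stored with each unit, a clustering of the binary indicators (or of estimated latent scores) can be compared against it, which is the role the simulated setting plays in the abstract.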

    Searching for structure in air pollutants concentration measurements

    When studying air pollution measurements at different sites in a spatial area, we may search for a typical pattern, common to all curves, describing the underlying air pollution process in a pre-specified period. Another area of interest, to support local authorities in air quality management, is the classification of the different sites into homogeneous clusters and the group ranking that follows. Yet there is variation in both amplitude and dynamics among the air pollutant concentrations measured at the different monitoring stations. Analyzing such measurements, where the basic unit of information is the entire observed process rather than a string of numbers, involves finding the time shifts or the warping functions among curves. The analysis is much more complicated if we consider a multivariate process, that is, vector-valued air pollutant measurements. Following our previous work, in which an improved dynamic time-warping algorithm was developed, especially for the multivariate case, and used both for classifying functional data and for estimating the structural mean of a sample of curves, we analyzed the measurements of some air pollutants in Emilia Romagna (northern Italy). In addition, for the univariate analyses, we applied the self-modeling warping function approach, which is also convenient for these data. Indeed, this method was found to be model-free and flexible enough to capture very complex and highly nonlinear patterns

    Multivariate outliers detection with Kohonen networks: an useful tool for routine exploration of large data sets

    In this article we consider an exploratory graphical approach to multivariate outlier detection based on Kohonen networks (Kohonen, 1982, 1995). These networks, generally known as self-organising maps (SOM), are able to find interesting low-dimensional projections of high-dimensional data. The utility of the SOM-based strategy for controlling data quality and finding multidimensional outliers, especially for Statistical Offices, arises from several reasons: it is an easy-to-interpret tool for routine exploration of large data sets, it can be used in any context without specifying an underlying model, and it has very low computational cost. An example on a real data set shows that SOM can be expected to work reasonably well in visualising multivariate outliers. In particular, the outliers identified are in general agreement with those detected by other well-known statistical procedures such as factor analysis and k-means cluster analysis. The SOM is also shown to be a robust method, since no substantial difference in the qualitative behaviour of the algorithm, due to the choice of either alternative neighbourhood functions or differently sized maps, is empirically observed

    Facing multicollinearity in data mining

    This study addresses the problem of choosing a nonlinear regression model, which arises in data mining when the function linking a dependent variable to several explanatory variables is unknown and must be inferred from the data. It is shown that, in the presence of multicollinearity, model choice cannot be based solely on the squared error or related indices (for example, AIC, BIC/SBC), because some models that use the backfitting algorithm are subject to great instability and arbitrariness in the choice of basis functions. The behavior of the best-known nonlinear methods, based on either subset selection or projection of the variables, in the presence of multicollinearity is illustrated through a numerical example

    On multicollinearity and concurvity in some nonlinear multivariate models

    Recent developments in multivariate smoothing methods provide a rich collection of feasible models for nonparametric multivariate data analysis. Among the most interpretable are those with smoothed additive terms. The construction of methods and algorithms for computing these models has been the main concern of the literature in this area. Fewer results are available, however, on the validation of the computed fit, and many applications of nonparametric methods end up computing and comparing the generalized validation error or related indexes. This article reviews the behavior of some of the best-known multivariate nonparametric methods, based on subset selection and on projection, when exact collinearity or multicollinearity (near collinearity) is present in the input matrix. It shows the possible aliasing effects in the computed fits of some selection methods and explores the properties of the projection spaces reached by projection methods, in order to help data analysts select the best model when the input matrix is ill conditioned. Two simulation studies and a real data set application are presented to further illustrate the effects of collinearity or multicollinearity on the fit
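
To make the distinction between exact and near collinearity concrete, here is a small diagnostic sketch. It is not taken from the article: it uses the variance inflation factor, which for exactly two predictors reduces to 1 / (1 - r^2) with r the Pearson correlation between them. The function names and the numerical tolerance `tol` are assumptions.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2, tol=1e-12):
    """Variance inflation factor for a two-predictor design: 1 / (1 - r^2).

    Returns infinity when the predictors are collinear up to `tol`,
    an illustrative threshold for exact collinearity.
    """
    r = pearson(x1, x2)
    denom = 1.0 - r * r
    return float("inf") if denom <= tol else 1.0 / denom
```

Uncorrelated predictors give a value of 1, near collinearity gives a large finite value, and exact collinearity such as `x2 = 2 * x1` gives infinity, the situation in which the input matrix is ill conditioned.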

    Pre-processing and feature extraction in radial basis functions networks

    Although neural networks originated in engineering, they are currently receiving considerable attention from statisticians. Indeed, neural networks can be viewed as computational models, very similar to statistical models, that can be applied to several types of real data sets. In this respect, as is done in statistics, an important step of feature extraction and data transformation should be added to the learning and prediction phases of a neural network. The aim of this paper is to show, on a real data set, the influence of this phase, called pre-processing, in radial basis function networks. The success of these networks, with and without data pre-processing, is measured by the discrimination rule and by generalisation to unobserved patterns. The performance of radial basis function networks is also compared with the results obtained by discriminant analysis on the same data set