
    Analysis and modeling of complex networks by means of computational intelligence techniques

    In this thesis I explore the organizing principles of protein structure through its network representation, with the aid of machine learning and computational intelligence methodologies. The study is structured in two main parts. In the first part I investigate the structural properties of Protein Contact Networks (PCNs) and compare them with several other biological and synthetic networks. I characterize PCNs by their heat diffusion properties, obtained with heat kernel analysis, and highlight critical differences with respect to the other analyzed networks. In particular, I observe heat subdiffusion on the PCN topology. This peculiar character is also confirmed by studying the correlation properties of random walks performed on the networks, analyzed via Multifractal Detrended Fluctuation Analysis (MFDFA). The second part of the thesis is mainly concerned with generating new networks whose spectral properties are similar to those of PCNs. A generative model is defined as a variant of the model proposed by Bartoli et al. in 2007, yielding closer spectral properties. The samples generated by this model are subsequently improved by means of an evolutionary optimization scheme. As a result, the spectral distribution of the generated samples is nearly identical to the reference distribution calculated from the set of PCNs.
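
    The heat kernel analysis mentioned above can be illustrated with a minimal NumPy sketch, assuming an undirected, unweighted graph given as a dense adjacency matrix. It builds the normalized Laplacian L = I - D^{-1/2} A D^{-1/2} and evaluates H_t = exp(-tL) through the eigendecomposition of L. The function names are hypothetical and this is not the code used in the thesis.

```python
import numpy as np

def normalized_laplacian(adj: np.ndarray) -> np.ndarray:
    """Normalized Laplacian L = I - D^{-1/2} A D^{-1/2} of an undirected graph."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    has_edges = deg > 0
    inv_sqrt = np.zeros_like(deg)
    inv_sqrt[has_edges] = 1.0 / np.sqrt(deg[has_edges])
    # Convention: L_ii = 1 for vertices with at least one edge, 0 for isolated ones
    return np.diag(has_edges.astype(float)) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]

def heat_kernel(adj: np.ndarray, t: float) -> np.ndarray:
    """Heat kernel H_t = exp(-t L), computed from L = U diag(lam) U^T."""
    lam, U = np.linalg.eigh(normalized_laplacian(adj))
    # U * exp(-t * lam) scales column i of U by exp(-t * lam_i)
    return (U * np.exp(-t * lam)) @ U.T
```

    H_0 is the identity, and for a connected graph H_t approaches the projector onto the normalized vector of square-rooted degrees as t grows, because the eigenvalue 0 of L is then simple.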

    Multifractal characterization of protein contact networks

    Multifractal detrended fluctuation analysis (MFDFA) of time series can reveal the presence of long-range correlations and, at the same time, characterize the self-similarity of the series. The rich information derivable from the characteristic exponents and the multifractal spectrum can be further analyzed to gain important insights into the underlying dynamical process. In this paper, we employ multifractal analysis techniques in the study of protein contact networks. To this end, each network is first mapped to three different time series, each generated by a stationary unbiased random walk. To capture the peculiarities of the networks at different levels, we consider three observables at each vertex: the degree, the clustering coefficient, and the closeness centrality. To compare the results with suitable references, we also consider instances of three well-known network models and two typical time series with pure monofractal and multifractal properties. The first result of notable interest is that the time series associated with protein contact networks exhibit long-range correlations (strong persistence), consistent with signals in between typical monofractal and multifractal behavior. Subsequently, a suitable embedding of the multifractal spectra allows us to focus on ensemble properties, which in turn enables further observations regarding the considered networks. In particular, we highlight the different roles that small and large fluctuations of the considered observables play in characterizing the network topology.
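
    The two steps described in this abstract, mapping a network to a time series via a random walk and then applying MFDFA, can be sketched as follows. This is a minimal illustration under stated assumptions: the graph is an adjacency list without isolated vertices, the walk starts from its stationary distribution (probability proportional to degree), detrending uses order-1 polynomials, and segments are taken from both ends of the profile in the usual MFDFA fashion. Function names are hypothetical.

```python
import numpy as np

def random_walk_series(adj_list, observable, steps, seed=0):
    """Record observable[v] along an unbiased random walk started from the
    stationary distribution (probability proportional to degree)."""
    rng = np.random.default_rng(seed)
    deg = np.array([len(nbrs) for nbrs in adj_list], dtype=float)
    v = int(rng.choice(len(adj_list), p=deg / deg.sum()))
    series = np.empty(steps)
    for i in range(steps):
        series[i] = observable[v]
        nbrs = adj_list[v]
        v = nbrs[int(rng.integers(len(nbrs)))]  # uniform choice among neighbours
    return series

def mfdfa(x, scales, qs, order=1):
    """Fluctuation functions F_q(s): rows follow qs, columns follow scales."""
    x = np.asarray(x, dtype=float)
    profile = np.cumsum(x - x.mean())
    n = len(profile)
    F = np.empty((len(qs), len(scales)))
    for j, s in enumerate(scales):
        t = np.arange(s)
        m = n // s
        # Non-overlapping segments taken from the start and from the end of the profile
        segments = [profile[k * s:(k + 1) * s] for k in range(m)]
        segments += [profile[n - (k + 1) * s:n - k * s] for k in range(m)]
        # Variance of each segment around its local polynomial fit
        var = np.array([np.mean((seg - np.polyval(np.polyfit(t, seg, order), t)) ** 2)
                        for seg in segments])
        for i, q in enumerate(qs):
            if q == 0:
                F[i, j] = np.exp(0.5 * np.mean(np.log(var)))
            else:
                F[i, j] = np.mean(var ** (q / 2.0)) ** (1.0 / q)
    return F

def generalized_hurst(x, scales, qs, order=1):
    """h(q): least-squares slope of log F_q(s) versus log s."""
    F = mfdfa(x, scales, qs, order)
    log_s = np.log(np.asarray(scales, dtype=float))
    return np.array([np.polyfit(log_s, np.log(row), 1)[0] for row in F])
```

    As a rule of thumb, h(2) is close to 0.5 for uncorrelated noise and above 0.5 for persistent signals, while a clear dependence of h(q) on q indicates multifractality.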

    Spectral reconstruction of protein contact networks

    In this work, we present a method for generating an adjacency matrix encoding a typical protein contact network. This work is a follow-up to our recent work (Livi et al., 2015), whose aim was to estimate the relative contribution of different topological features to the unique properties of protein structures. We perform a genetic algorithm based optimization to modify the matrices generated with the procedures explained in (Livi et al., 2015). Our objective is to minimize the distance to a target spectral density, computed from the normalized graph Laplacian representation of graphs. This target density is obtained by averaging the kernel-estimated densities of a class of experimental protein maps of different sizes, which is possible because the normalized Laplacian spectrum lies in a bounded domain. By exploiting genetic operators designed for this specific problem and an exponentially weighted objective function, we are able to reconstruct adjacency matrices representing networks of varying size whose spectral density is indistinguishable from the target. The topological features of the optimized networks are then compared to those of real protein contact networks, showing increased similarity with respect to the starting networks. Subsequently, the statistical properties of the spectra of the newly generated matrices are analyzed with tools borrowed from random matrix theory. The nearest-neighbor spacing distribution of the spectra of the generated networks indicates that the (short-range) correlations of the Laplacian eigenvalues are also compatible with those of real proteins.
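
    The averaging of kernel-estimated spectral densities across networks of different sizes works because every normalized Laplacian eigenvalue lies in [0, 2], so all densities can be evaluated on one common grid. The sketch below assumes a Gaussian kernel with a hypothetical bandwidth of 0.05 and a 401-point grid, and uses a plain L1 distance in place of the exponentially weighted objective mentioned in the abstract; the genetic algorithm itself is not shown.

```python
import numpy as np

GRID = np.linspace(0.0, 2.0, 401)  # normalized Laplacian eigenvalues lie in [0, 2]

def normalized_laplacian_spectrum(adj):
    """Eigenvalues of L = I - D^{-1/2} A D^{-1/2}, in ascending order."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    if np.any(deg == 0):
        raise ValueError("isolated vertices are not supported in this sketch")
    inv_sqrt = 1.0 / np.sqrt(deg)
    lap = np.eye(len(adj)) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]
    return np.linalg.eigvalsh(lap)

def spectral_density(adj, bandwidth=0.05, grid=GRID):
    """Gaussian kernel density estimate of the spectrum, evaluated on grid.
    Kernels centred near 0 or 2 leak mass outside [0, 2]; this is not corrected."""
    lam = normalized_laplacian_spectrum(adj)
    z = (grid[:, None] - lam[None, :]) / bandwidth
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (len(lam) * bandwidth * np.sqrt(2 * np.pi))

def target_density(adjs, bandwidth=0.05, grid=GRID):
    """Average density over graphs that may have different numbers of vertices."""
    return np.mean([spectral_density(a, bandwidth, grid) for a in adjs], axis=0)

def l1_distance(p, q, grid=GRID):
    """Riemann-sum approximation of the L1 distance on a uniform grid."""
    return float(np.sum(np.abs(p - q)) * (grid[1] - grid[0]))
```

    In an optimization loop, a candidate matrix would be scored by l1_distance(spectral_density(candidate), target), with lower values being better.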

    On the long-term correlations and multifractal properties of electric arc furnace time series

    In this paper, we study long-term correlations and multifractal properties of time series of three-phase current signals from an industrial electric arc furnace. Implicit sinusoidal trends are detected by considering the scaling of the fluctuation functions. The time series are then filtered via a Fourier-based analysis to remove these strong periodicities. In the filtered time series we detected long-term positive correlations. The presence of positive correlations agrees with the typical V–I characteristic (hysteresis) of the electric arc furnace, thus providing a sound physical justification for the memory effects found in the current time series. The multifractal signature in the filtered time series is strong enough for them to be classified as multifractal.
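
    The abstract does not say how the strong periodicities are selected for removal. As a hypothetical illustration of Fourier-based filtering, the sketch below zeroes a narrow band of real-FFT bins around each of the n_peaks strongest non-DC peaks and transforms back; n_peaks and half_width are assumed parameters, not values from the paper.

```python
import numpy as np

def remove_dominant_periodicities(x, n_peaks=3, half_width=2):
    """Zero a band of +/- half_width rFFT bins around each of the n_peaks
    largest-magnitude non-DC bins, then return the inverse transform."""
    x = np.asarray(x, dtype=float)
    spec = np.fft.rfft(x - x.mean())
    mag = np.abs(spec)
    mag[0] = 0.0  # never select the DC bin
    keep = np.ones(len(spec), dtype=bool)
    for _ in range(n_peaks):
        # Strongest bin among those not already removed
        k = int(np.argmax(np.where(keep, mag, 0.0)))
        keep[max(k - half_width, 1):k + half_width + 1] = False
    spec[~keep] = 0.0
    return np.fft.irfft(spec, n=len(x))
```

    The filtered series would then be passed to a fluctuation analysis such as MFDFA; removing the periodic components first avoids the crossovers that sinusoidal trends induce in the fluctuation functions.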

    A generative model for protein contact networks

    In this paper, we present a generative model for protein contact networks (PCNs). The soundness of the proposed model is investigated by focusing primarily on mesoscopic properties derived from the spectra of the graph Laplacian. To complement the analysis, we also study classical topological descriptors, such as statistics of the shortest paths and the important feature of modularity. Our experiments show that the proposed model yields a considerable improvement over two suitably chosen generative mechanisms, approximating real PCNs more closely in terms of diffusion properties derived from the normalized Laplacian spectra. However, like the other network models, it does not reproduce the shortest-path structure with sufficient accuracy. To compensate for this drawback, we designed a second step involving a targeted edge reconfiguration process. The ensemble of reconfigured networks shows further, statistically significant improvements. As an important byproduct of our study, we demonstrate that modularity, a well-known property of proteins, does not entirely explain the actual network architecture characterizing PCNs. In fact, we conclude that modularity, intended as a quantification of an underlying community structure, should be considered an emergent property of the structural organization of proteins. Interestingly, this property is optimized in PCNs together with path efficiency.
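
    The shortest-path statistics used to compare generated networks with real PCNs can be computed by breadth-first search on an unweighted graph. This dependency-free sketch reports the average shortest-path length over reachable ordered pairs and the global efficiency (mean of 1/d_ij, with 0 for unreachable pairs); using global efficiency as the "path efficiency" measure is an assumption, and neither the generative model nor the edge reconfiguration is reproduced.

```python
from collections import deque

def bfs_distances(adj_list, source):
    """Hop distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj_list[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def shortest_path_stats(adj_list):
    """Average shortest-path length and global efficiency of an unweighted graph."""
    n = len(adj_list)
    lengths, inverse = [], []
    for s in range(n):
        dist = bfs_distances(adj_list, s)
        for t in range(n):
            if t == s:
                continue
            if t in dist:
                lengths.append(dist[t])
                inverse.append(1.0 / dist[t])
            else:
                inverse.append(0.0)  # unreachable pairs contribute zero efficiency
    return {
        "avg_path_length": sum(lengths) / len(lengths) if lengths else float("inf"),
        "global_efficiency": sum(inverse) / (n * (n - 1)),
    }
```

    For the three-vertex path 0-1-2, the ordered-pair distances are 1, 2, 1, 1, 2, 1, giving an average length of 4/3 and a global efficiency of 5/6.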

    Information granules filtering for inexact sequential pattern mining by evolutionary computation

    Nowadays, the wide development of techniques to communicate and store information of all kinds has raised the need for new methods to analyze and interpret large quantities of data. One of the most important problems in sequential data analysis is frequent pattern mining, which consists of finding frequent subsequences (patterns) in a sequence database in order to highlight and extract interesting knowledge from the data at hand. Real-world data are usually affected by several noise sources, which makes the analysis more challenging and calls for approximate pattern matching methods. A common procedure for identifying recurrent patterns in noisy data is based on clustering algorithms relying on an edit distance between subsequences. In inexact mining problems, this plain approach can produce many spurious patterns due to multiple pattern matches on the same sequence excerpt. In this paper we present a method to overcome this drawback by applying an optimization-based filter that identifies the most descriptive patterns among those found by the clustering process, returning more compact and easily interpretable clusters. We evaluate the performance of the mining system on synthetic data with variable amounts of noise, showing that the algorithm performs well in synthesizing the retrieved patterns with acceptable information loss.
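
    The problem of spurious, overlapping matches can be seen with a small edit-distance example. The sketch below implements the standard Levenshtein distance and lists every window of the sequence within a hypothetical tolerance max_dist of a pattern; the clustering stage of the mining system is not shown, and the function names are illustrative.

```python
def levenshtein(a, b):
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]

def approximate_occurrences(sequence, pattern, max_dist):
    """(start, window) for every window of len(pattern) within max_dist of pattern."""
    m = len(pattern)
    return [(i, sequence[i:i + m]) for i in range(len(sequence) - m + 1)
            if levenshtein(sequence[i:i + m], pattern) <= max_dist]
```

    For example, approximate_occurrences("xxABCDxx", "ABCD", 2) reports matches starting at positions 1, 2 and 3, although "ABCD" occurs only once: exactly the kind of redundancy the filtering step is meant to remove.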

    Noise sensitivity of an information granules filtering procedure by genetic optimization for inexact sequential pattern mining

    One of the most essential challenges in Data Mining and Knowledge Discovery is the development of effective tools able to find regularities in data. To highlight and extract interesting knowledge from the data at hand, a key problem is frequent pattern mining, i.e. discovering frequent substructures hidden in the available data. In many interesting application fields, data are represented and stored as sequences of generic objects over time or space. Due to the presence of noise and uncertainty in the data, the search for frequent subsequences must employ approximate matching techniques, such as edit distances. A common procedure for identifying recurrent patterns in noisy data is based on clustering algorithms relying on an edit distance between subsequences. However, this plain approach can produce many spurious patterns due to multiple pattern matches at close positions in the same sequence excerpt. In this paper, we present a method to overcome this drawback by applying an optimization-based filtering step that identifies the most descriptive patterns among those found by the clustering process, yielding more compact and easily interpretable clusters. We evaluate the performance of the mining system on synthetic data in two separate cases, corresponding to two different (simulated) sources of noise. In both cases, our method performs well in retrieving the original patterns with acceptable information loss.
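
    The paper performs the selection of descriptive patterns by genetic optimization. Purely to illustrate what the filter removes, the sketch below swaps in a much simpler greedy heuristic (not the paper's method): among candidate matches given as hypothetical (start, length, distance) tuples, it keeps the lowest-distance matches whose intervals do not overlap any match already kept.

```python
def select_non_overlapping(matches):
    """Greedy filter: visit matches by (distance, start) and keep each one whose
    [start, start + length) interval does not overlap an already kept match."""
    kept = []
    for start, length, dist in sorted(matches, key=lambda m: (m[2], m[0])):
        end = start + length
        if all(end <= s or start >= s + l for s, l, _ in kept):
            kept.append((start, length, dist))
    return sorted(kept)  # report kept matches in positional order
```

    A greedy rule like this cannot trade off coverage against information loss the way an optimization-based filter can, which is one motivation for an evolutionary search over candidate subsets.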

    Data-driven detrending of nonstationary fractal time series with echo state networks

    In this paper, we propose a novel data-driven approach for removing trends (detrending) from nonstationary fractal and multifractal time series. We consider real-valued time series of measurements of an underlying dynamical system that evolves through time. We assume that this dynamical process is predictable to a certain degree by a class of recurrent networks called Echo State Networks (ESNs), which are capable of modeling generic dynamical processes. To isolate the superimposed (multi)fractal component of interest, we define a data-driven filter that leverages the prediction capability of the ESN to identify the trend component of a given input time series. Specifically, the estimated trend is removed from the original time series, and the residual signal is analyzed with the multifractal detrended fluctuation analysis procedure to verify the correctness of the detrending. To demonstrate the effectiveness of the proposed technique, we consider several synthetic time series consisting of different types of trends and fractal noise components with known characteristics. We also process a real-world dataset, the sunspot time series, which is well known for its multifractal features and has recently gained attention in the complex systems field. Results demonstrate the validity and generality of the proposed ESN-based detrending method.
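
    The abstract does not specify how the ESN output is turned into a trend. One plausible reading, assumed here, is to train a leaky-tanh echo state network with a ridge-regression readout for one-step-ahead prediction and treat its prediction as the trend estimate. All hyperparameters below (reservoir size, spectral radius, input scaling, washout, ridge) are hypothetical choices, not values from the paper.

```python
import numpy as np

def esn_states(u, n_res=200, spectral_radius=0.9, input_scale=0.5, leak=1.0, seed=0):
    """Run a random leaky-tanh reservoir over the scalar input sequence u."""
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-input_scale, input_scale, size=n_res)
    W = rng.uniform(-0.5, 0.5, size=(n_res, n_res))
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))  # rescale spectral radius
    x = np.zeros(n_res)
    states = np.empty((len(u), n_res))
    for t, ut in enumerate(u):
        x = (1 - leak) * x + leak * np.tanh(W @ x + W_in * ut)
        states[t] = x
    return states

def esn_detrend(series, washout=100, ridge=1e-2, **esn_kwargs):
    """Return (trend, residual) aligned with series[washout + 1:].
    The trend is the ESN's one-step-ahead prediction (an assumption of this sketch)."""
    s = np.asarray(series, dtype=float)
    mu, sd = s.mean(), s.std()
    z = (s - mu) / sd
    X = esn_states(z[:-1], **esn_kwargs)[washout:]  # state after reading z[t]
    y = z[washout + 1:]                             # target z[t + 1]
    X1 = np.hstack([X, np.ones((len(X), 1))])       # bias column
    w = np.linalg.solve(X1.T @ X1 + ridge * np.eye(X1.shape[1]), X1.T @ y)
    trend = X1 @ w * sd + mu
    return trend, s[washout + 1:] - trend
```

    The residual would then be passed to MFDFA, as the abstract describes, to check whether the estimated generalized Hurst exponents match those of the known fractal component.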

    Analysis of heat kernel highlights the strongly modular and heat-preserving structure of proteins

    In this paper, we study the structure and dynamical properties of protein contact networks in comparison with other biological networks, together with simulated archetypal models acting as probes. We consider both classical topological descriptors, such as modularity and statistics of the shortest paths, and different interpretations in terms of diffusion provided by the discrete heat kernel, which is derived from the normalized graph Laplacian. A principal component analysis shows high discrimination among the network types, using both the topological and the heat kernel based vector characterizations. Furthermore, a canonical correlation analysis demonstrates strong agreement between these two characterizations, thus providing an important justification for the interpretability of the heat kernel. Finally, and most importantly, the focused analysis of the heat kernel yields insights into the fact that proteins must satisfy specific structural design constraints that the other considered networks need not obey. Notably, the heat trace decay of an ensemble of varying-size proteins indicates subdiffusion, a peculiar property of proteins.
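
    The heat trace is Tr(H_t) = Σ_i exp(-t λ_i), where the λ_i are the eigenvalues of the normalized Laplacian, so it can be computed from the spectrum alone. The sketch below assumes a dense adjacency matrix without isolated vertices; the local log-log slope is included only as a rough descriptor of the decay, since the abstract does not state the criterion used to identify subdiffusion. Function names are hypothetical.

```python
import numpy as np

def normalized_laplacian_eigenvalues(adj):
    """Eigenvalues of L = I - D^{-1/2} A D^{-1/2} (no isolated vertices assumed)."""
    adj = np.asarray(adj, dtype=float)
    inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    lap = np.eye(len(adj)) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]
    return np.linalg.eigvalsh(lap)

def heat_trace(adj, times):
    """Tr(exp(-t L)) = sum_i exp(-t * lam_i) for each t in times."""
    lam = normalized_laplacian_eigenvalues(adj)
    times = np.asarray(times, dtype=float)
    return np.exp(-np.outer(times, lam)).sum(axis=1)

def log_log_slope(times, values):
    """Local slope d log(values) / d log(times), by finite differences."""
    return np.gradient(np.log(values), np.log(times))
```

    At t = 0 the trace equals the number of vertices, and for a connected graph it decays towards 1 because the eigenvalue 0 is simple; dividing by the number of vertices makes traces of different-size networks comparable.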