Epigenomic k-mer dictionaries: shedding light on how sequence composition influences in vivo nucleosome positioning
Motivation: Information-theoretic and compositional analysis of biological sequences, in terms of k-mer dictionaries, has a well-established role in genomic and proteomic studies. This is much less the case in epigenomics, although the role of k-mers in chromatin organization and nucleosome positioning is particularly relevant. Fundamental questions concerning the informational content and compositional structure of nucleosome-favouring and nucleosome-disfavouring sequences with respect to their basic building blocks still remain open. Results: We present the first analysis of the role of k-mers in the composition of nucleosome-enriched and nucleosome-depleted genomic regions (NER and NDR for short) that is: (i) exhaustive and within the bounds dictated by the information-theoretic content of the sample sets we use and (ii) informative for comparative epigenomics. We analyze four different organisms and propose a paradigmatic formalization of k-mer dictionaries, providing two different and complementary views of the k-mers involved in NER and NDR. The first extends well-known studies in this area, its comparative nature being its major merit. The second, very novel, brings to light the rich variety of k-mers involved in influencing nucleosome positioning, for which an initial classification in terms of clusters is also provided. Although such a classification offers many insights, the following deserves to be singled out: short poly(dA:dT) tracts are reported in the literature as fundamental for nucleosome depletion; however, a global quantitative look reveals that their role is much less prominent than one would expect based on previous studies
Optimal extraction of motif patterns in 2D
The combinatorial explosion of motif patterns occurring in 1D and 2D arrays leads to the consideration of special classes of motifs growing linearly with the size of the input array. Such motifs, called irredundant motifs, are able to succinctly represent all of the other motifs occurring in the same array within reasonable time and space bounds. In previous work irredundant motifs were extracted from 2D arrays in O(N^2 log^2 n log log n) and O(N^3) time, where N is the size of the 2D input array and n is its largest dimension. In this paper, we present an algorithm to extract irredundant motifs from 2D arrays that is quadratic in the size of the input. The input is defined on a binary alphabet. It is shown that the algorithm is optimal and practically faster than the previous ones. © 2009 Elsevier B.V. All rights reserved
A co-clustering approach for mining large protein-protein interaction networks
Several approaches have been presented in the literature to cluster Protein-Protein Interaction (PPI) networks. They can be grouped into two main categories: those allowing a protein to participate in different clusters and those generating only nonoverlapping clusters. In both cases, a challenging task is to find a suitable compromise between the biological relevance of the results and a comprehensive coverage of the analyzed networks. Indeed, methods returning highly accurate results are often able to cover only small parts of the input PPI network, especially when poorly characterized networks are considered. We present a coclustering-based technique able to generate both overlapping and nonoverlapping clusters. The density of the clusters to search for can also be set by the user. We tested our method on the yeast and human networks, and compared it to five other well-known techniques on the same interaction data sets. The results showed that, for all the examples considered, our approach always reaches a good compromise between accuracy and network coverage. Furthermore, the behavior of our algorithm is not influenced by the structure of the input network, unlike all the other techniques considered in the comparison, which returned very good results on the yeast network but rather poor ones on the human network
Searching for repetitions in biological networks: methods, resources and tools
We present here a compact overview of the data, models and methods proposed for the analysis of biological networks based on the search for significant repetitions. In particular, we concentrate on three problems widely studied in the literature: 'network alignment', 'network querying' and 'network motif extraction'. We provide (i) details of the experimental techniques used to obtain the main types of interaction data, (ii) descriptions of the models and approaches introduced to solve such problems and (iii) pointers to both the available databases and software tools. The intent is to lay out a useful roadmap for identifying suitable strategies to analyse cellular data, possibly based on the joint use of different interaction data types or analysis techniques
An evolutionary restricted neighborhood search clustering approach for PPI networks
Protein-protein interaction networks have been broadly studied in the last few years, in order to understand the behavior of proteins inside the cell. Proteins interacting with each other often share common biological functions or participate in the same biological process. Thus, discovering protein complexes made up of groups of closely related proteins can be useful to predict protein functions. Clustering techniques have been widely employed to detect significant biological complexes. In this paper, we integrate one of the most popular network clustering techniques, namely the Restricted Neighborhood Search Clustering (RNSC), with evolutionary computation. The two cost functions introduced by RNSC, together with a new one that combines them, are used by a Genetic Algorithm as fitness functions to be optimized. Experimental evaluations performed on two different groups of interactions of the budding yeast Saccharomyces cerevisiae show that the clusters obtained by the genetic approach are more numerous than those found by RNSC, though this method predicts more true complexes
Discovering meaningful protein-protein interaction modules by a co-clustering based approach
- …
