
    MODIMO: Workshop on Multi-Omics Data Integration for Modelling Biological Systems

    No full text
    Multi-omics analysis aims at extracting previously uncovered biological knowledge by integrating information across multiple single-omic sources. Past approaches have focused on the simultaneous analysis of a small number of omic data sets. Current challenges face the problem of integrating multiple omic sources into a unified complex model, or of combining already available tools for two-by-two omics analyses and merging their outcomes. By doing so and leveraging integrated system-level knowledge, multi-omic approaches ought to enable the development of better qualitative and quantitative models for descriptive and predictive analyses. To move this area forward, new statistical and algorithmic frameworks are needed, for example for generalizing classical graph theory results to heterogeneous networks and applying them to diverse problems such as drug repurposing or understanding the immune response to infections. Thus, in short, this workshop aims at investigating novel methodologies for providing crucial insights into multi-omics data management, integration, and analysis in order to enable biological discoveries

    A general probabilistic database model

    No full text
    A new model for probabilistic databases, using interval-valued conditional probability assessments, is proposed. The concept of coherence adopted in our approach is based on a suitable generalization of the coherence principle of de Finetti and can be related to similar definitions given for lower and upper probabilities by other authors. A corresponding probabilistic relational algebra is introduced and some new operators are defined. Finally, some simple examples are given

    An entropy heuristic to optimize decision diagrams for index-driven search in biological graph databases

    Full text link
    Graphs are a widely used structure for knowledge representation. Their uses range from biochemical to biomedical applications, and they have recently been applied in multi-omics analyses. A key computational task regarding graphs is the search for specific topologies contained in them. The task is known to be NP-complete, thus indexing techniques are applied to deal with its complexity. In particular, techniques exploiting paths extracted from graphs have shown good performance in terms of time requirements, but they still suffer from the relatively large size of the produced index. We applied decision diagrams (DDs) as the index data structure, showing a good reduction in index size with respect to other approaches. Nevertheless, the size of a DD depends on its variable order. Because the search for an optimal order is an NP-complete task, variable order heuristics on DDs are applied by exploiting domain-specific information. Here, we propose a heuristic based on the information content of the labeled paths. Tests on well-studied biological benchmarks, which are an essential part of multi-omics graphs, show that the resultant size correlates with the information measure related to the paths and that the chosen order effectively reduces the index size.
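As a rough illustration of ordering decision-diagram variables by information content, the sketch below treats each position of a fixed-length labeled path as one variable and ranks positions by the Shannon entropy of the labels seen there. The function names, the path encoding, and the choice to place low-entropy positions first are all assumptions made for this example; the abstract does not specify the actual heuristic.

```python
import math
from collections import Counter


def position_entropies(paths):
    """Shannon entropy (in bits) of the label distribution at each path position.

    `paths` is a non-empty list of equal-length tuples of labels
    (a hypothetical encoding chosen for this sketch).
    """
    total = len(paths)
    entropies = []
    for i in range(len(paths[0])):
        freq = Counter(p[i] for p in paths)
        h = sum(-(c / total) * math.log2(c / total) for c in freq.values())
        entropies.append(h)
    return entropies


def entropy_order(paths):
    """Variable order: positions sorted by increasing entropy (assumed direction)."""
    ents = position_entropies(paths)
    return sorted(range(len(ents)), key=lambda i: (ents[i], i))


if __name__ == "__main__":
    example = [("C", "N", "O"), ("C", "O", "O"), ("C", "N", "N"), ("C", "C", "O")]
    print(position_entropies(example))
    print(entropy_order(example))
```

Whether low-entropy or high-entropy variables should come first is exactly the kind of choice such a heuristic has to settle empirically; flipping the sort key gives the opposite order.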

    PanDelos-frags: A methodology for discovering pangenomic content of incomplete microbial assemblies

    No full text
    Pangenomics was originally defined as the problem of comparing the composition of genes into gene families within a set of bacterial isolates belonging to the same species. The problem requires the calculation of sequence homology among such genes. When combined with metagenomics, namely for human microbiome composition analysis, gene-oriented pangenome detection becomes a promising method to decipher ecosystem functions and population-level evolution. Established computational tools are able to investigate the genetic content of isolates for which a complete genomic sequence is available. However, there is a plethora of incomplete genomes that are available on public resources, which only a few tools may analyze. Incomplete means that the process for reconstructing their genomic sequence is not complete, and only fragments of their sequence are currently available. However, the information contained in these fragments may play an essential role in the analyses. Here, we present PanDelos-frags, a computational tool which exploits and extends previous results in analyzing complete genomes. It provides a new methodology for inferring missing genetic information and thus for managing incomplete genomes. PanDelos-frags outperforms state-of-the-art approaches in reconstructing gene families in synthetic benchmarks and in a real use case of metagenomics. PanDelos-frags is publicly available at https://github.com/InfOmics/PanDelos-frags

    Core algorithms to search in biological structured data

    No full text
    Motivations. Graphs are a data structure used to represent biological data ranging from molecules and proteins to biological networks and metabolic pathways. Working on those data mainly involves applying graph isomorphism algorithms. Those algorithms are computationally hard and their efficiency may depend upon the input graphs. We are building a library, SubGraphLib, of the most popular searching algorithms and benchmarks highlighting drawbacks, advantages, and best performance input cases for each method. A novel approach to find all occurrences of a query subgraph in a target graph is also proposed. This new method applies a search strategy which significantly reduces the search space without using any complex pruning rule. Results show a significant reduction of the running time with respect to other methods together with a scalable memory requirement. Methods. The best known algorithms to solve the subgraph isomorphism problem are the ones proposed by Ullmann [1] and by Cordella et al. [2] (VF2), which make use of backtracking algorithms in conjunction with some filtering rules to prune branches of the search space represented as a tree. The nodes of the tree denote pairs of matched vertices of the query and the target graphs, respectively. During the visit, the isomorphism conditions are applied to verify the partial matches. The algorithm in [1] also modeled the graph isomorphism problem as a constraint satisfaction problem (CSP). A CSP is defined by a set of variables and a set of constraints among them. Each variable is associated with a set of possible values, called its domain. The solution of a given CSP is an assignment of values to all variables such that all constraints are satisfied. More recently, Solnon [3] published a method, LAD, for propagating global neighborhood constraints together with a generalized arc consistency. 
Ullmann [4] proposed a new method, FocusSearch, based on a bitvector representation of domains, to deal with parallel operations. In FocusSearch, domain reduction is not applied until convergence is achieved. The search phase is preceded by two steps based on vertex invariants and local AllDifferent constraints [3,4]. The search strategy is established by a static instantiation sequence based on the number of future branches. Our newly proposed algorithm, called CoreGraph, is based on a new search strategy which builds a static instantiation sequence of the query nodes. CoreGraph does not deal with complex filtering rules or domains. The basic idea for the construction of the search sequence is to maximize the number of branches to preceding nodes in the sequence. The sequence is recursively generated by adding those neighbors maximizing a score function. The score of each candidate node is assigned taking into account its degree and the number of its edges leading to nodes in the sequence and to their neighbors. Note that CoreGraph applies those filtering rules only to the query graph. Concerning the target graphs, the only information CoreGraph uses for pruning is node degree. Finally, since the search strategy does not give priority to denser parts of the target graphs, it is efficient on a large variety of query and target graphs. Results. SubGraphLib contains the original implementations of VF2, LAD, and CoreGraph and a new implementation of FocusSearch in C++ (which is originally distributed in Modula-2). All algorithms have been compared on benchmarks such as synthetic unlabeled graphs, molecules, and biological networks. CoreGraph and FocusSearch in all cases outperform the other algorithms in terms of execution time. In most benchmarks, CoreGraph also outperforms FocusSearch. FocusSearch is particularly efficient on regular graphs having a mesh structure. 
However, since FocusSearch uses initial domains to avoid label comparisons, its memory requirements do not scale with respect to graph size. On the other hand, CoreGraph maintains a low memory profile.
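The idea of a static instantiation sequence combined with degree-only pruning on the target can be sketched as follows. This is a simplified illustration, not the CoreGraph implementation: the ordering rule here (most edges to already-sequenced nodes, ties broken by degree) only approximates the score function described above, and the names `static_order` and `find_matches` are hypothetical. Graphs are assumed to be undirected and unlabeled, stored as adjacency sets.

```python
from typing import Dict, Iterator, List, Set

Graph = Dict[int, Set[int]]  # undirected graph as adjacency sets


def static_order(query: Graph) -> List[int]:
    """Build the query-node sequence once, before the search starts.

    Start from the highest-degree node, then repeatedly append the node with
    the most edges to nodes already in the sequence (ties: higher degree,
    then smaller id).
    """
    order = [max(query, key=lambda v: (len(query[v]), -v))]
    remaining = set(query) - set(order)
    while remaining:
        placed = set(order)
        nxt = max(
            remaining,
            key=lambda v: (len(query[v] & placed), len(query[v]), -v),
        )
        order.append(nxt)
        remaining.remove(nxt)
    return order


def find_matches(query: Graph, target: Graph) -> Iterator[Dict[int, int]]:
    """Yield every injective query-to-target mapping that preserves query edges."""
    order = static_order(query)

    def extend(pos: int, mapping: Dict[int, int], used: Set[int]):
        if pos == len(order):
            yield dict(mapping)
            return
        q = order[pos]
        for t in target:
            # The only target-side pruning: the candidate's degree must suffice.
            if t in used or len(target[t]) < len(query[q]):
                continue
            # Every already-matched query neighbor must map to a neighbor of t.
            if all(mapping[n] in target[t] for n in query[q] if n in mapping):
                mapping[q] = t
                used.add(t)
                yield from extend(pos + 1, mapping, used)
                del mapping[q]
                used.remove(t)

    yield from extend(0, {}, set())


if __name__ == "__main__":
    triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
    host = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    print(sum(1 for _ in find_matches(triangle, host)))
```

Because the sequence is fixed up front, each search step only checks edges back to nodes that are already matched, which is what keeps the per-step work small in this kind of strategy.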

    A scoring methodology for an integrated network of non-coding RNAs and genetic diseases

    No full text
    The deregulation of non-coding RNAs (ncRNAs) has a functional role in cancer and other human disorders [1, 2]. Reconstructing and visualizing networks of ncRNA interactions with diseases and candidate target genes is important for understanding their regulatory mechanisms in complex cellular systems. Within ncRNA-DB [3], we have recently imported and integrated associations among non-coding RNAs, protein-coding genes, and associated diseases from ten online databases. To date, it contains about 300 thousand associations. To improve the usability of such a complex integrated system, it has to be equipped with a methodology that lets users weight the connections linking the ncRNAs to genes and to diseases. We elaborate a scoring methodology based on literature mining, network analysis, and alignment-free sequence algorithms to rank the ncRNA-disease and ncRNA-gene associations reported in ncRNA-DB. The Lit-Score takes into account the frequencies of co-occurrence in PubMed and in ncRNA-DB of the pairs ncRNA-gene, ncRNA-disease, and gene-disease.

    Going Beyond Counting First Authors in Author Co-citation Analysis

    Full text link
    The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed
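The difference between the two counting schemes can be made concrete with a small sketch. It assumes each citing paper is given as a list of cited works, each cited work as an ordered author list, and that an author pair is counted at most once per citing paper; the function name and the data layout are hypothetical, and the study's exact counting rules may differ.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count author co-citations over a collection of citing papers.

    max_authors=1 gives traditional first-author counting; max_authors=5
    considers the first five authors of each cited work.
    """
    counts = Counter()
    for cited_works in citing_papers:
        authors = set()
        for author_list in cited_works:
            authors.update(author_list[:max_authors])
        # Each unordered pair is counted once per citing paper.
        for pair in combinations(sorted(authors), 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    papers = [
        [["A", "B"], ["C"]],
        [["A"], ["C", "D"]],
    ]
    print(cocitation_counts(papers, max_authors=1))
    print(cocitation_counts(papers, max_authors=5))
```

Note that in this sketch coauthors of the same cited work (such as A and B) also count as co-cited under the multi-author scheme, which is one of several design choices such a counting method has to make.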

    Variations on the Author

    Full text link
    “Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourse; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.

    Appropriate Similarity Measures for Author Cocitation Analysis

    Full text link
    We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.
    Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
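One property that separates the two measures can be checked numerically: appending authors with whom neither profile is cocited (joint zeros) changes the Pearson correlation but leaves the cosine untouched. The sketch below illustrates only this general property; it is not necessarily the argument made in the paper, and the profiles are invented for the example.

```python
import math


def cosine(x, y):
    """Cosine similarity between two cocitation profiles."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson(x, y):
    """Pearson correlation between two cocitation profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


if __name__ == "__main__":
    # Invented cocitation counts of two authors with four other authors.
    x, y = [3, 0, 2, 1], [0, 3, 1, 2]
    # The same profiles after adding six authors cocited with neither.
    x_padded, y_padded = x + [0] * 6, y + [0] * 6
    print(cosine(x, y), cosine(x_padded, y_padded))
    print(pearson(x, y), pearson(x_padded, y_padded))
```

With these invented profiles the Pearson correlation even changes sign once the zeros are appended, while the cosine stays the same.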