    Motifs in Ziv-Lempel-Welch Clef

    We present variants of the classical data compression paradigms of Ziv, Lempel, and Welch in which the phrases used in compression are selected among suitably chosen motifs, defined here as strings of intermittently solid and wild characters that recur more or less frequently in the source textstring. This notion emerged primarily in the analysis of biological sequences and molecules. Whereas the number of motifs in a sequence or family may be exponential in the size of the input, a linear-sized basis of irredundant motifs may be defined such that any other motif can be obtained by the union of a suitable subset from the basis. Previous work has shown the advantages of using irredundant motifs in lossy as well as lossless offline compression. In the present paper, we examine adaptations and extensions of the classical incremental ZL and ZLW paradigms. First, hybrid schemata are proposed along these lines, in which motifs may be discovered and selected off-line, while the parse and encoding are still conducted on-line. The performance thus obtained improves, on the one hand, over previous off-line implementations of motif-based compression and, on the other, over the traditionally best implementations of ZLW. On the basis of this, both lossy and lossless motif-based schemata are introduced and tested that follow more closely the ZL and ZLW paradigms.
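
    To make the notion of a motif with solid and wild characters concrete, the following minimal sketch lists the positions at which such a motif occurs in a text. It is an illustration only, not the compression scheme of the paper; the function name `motif_occurrences` and the use of `.` as the wild character are assumptions made here.

```python
def motif_occurrences(text, motif, wildcard="."):
    """Return the start positions at which `motif` occurs in `text`.

    A solid character must match exactly; the wildcard (an assumed
    convention, '.') matches any single character.
    """
    m = len(motif)
    return [
        i
        for i in range(len(text) - m + 1)
        if all(c == wildcard or c == text[i + j] for j, c in enumerate(motif))
    ]


if __name__ == "__main__":
    # The motif "ab." recurs three times, each time with a different last character.
    print(motif_occurrences("abcabdabe", "ab."))
```

    A motif that recurs often, such as `ab.` above, is the kind of phrase a motif-based dictionary could reuse, with the wild positions either dropped (lossy) or encoded separately (lossless).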

    Entropic Profiles, Maximal Motifs and the Discovery of Significant Repetitions in Genomic Sequences

    The degree of predictability of a sequence can be measured by its entropy, and it is closely related to its repetitiveness and compressibility. Entropic profiles are useful tools to study the under- and over-representation of subsequences, and they also provide information about the scale of each conserved DNA region. On the other hand, compact classes of repetitive motifs, such as maximal motifs, have proved useful for the identification of significant repetitions and for the compression of biological sequences. In this paper we show that there is a relationship between entropic profiles and maximal motifs, and in particular we prove that the former are a subset of the latter. As a further contribution, we propose a novel linear-time, linear-space algorithm to compute the Entropic Profile function introduced by Vinga and Almeida in [18], and we present some preliminary results on real data, showing the speedup of our approach with respect to other existing techniques.

    Motif Patterns in 2D

    Motif patterns consisting of sequences of intermixed solid and don't-care characters have been introduced and studied in connection with pattern discovery problems in computational biology and other domains. In order to alleviate the exponential growth of such motifs, notions of maximal saturation and irredundancy have been formulated, whereby more or less compact subsets of the set of all motifs can be extracted that are capable of expressing all others by suitable combinations. In this paper, we introduce the notion of maximal irredundant motifs in a two-dimensional array and develop initial properties and a combinatorial argument that poses a linear bound on the total number of such motifs. The remainder of the paper presents approaches to the discovery of irredundant motifs by both offline and incremental algorithms.

    Abstract 3574: A survey of mutations in biomedical literature using a machine based approach

    Introduction: Being able to characterize mutations for both pathogenicity and drug response is indispensable to the analysis of tumor genomics and the development of therapeutic options. While a great deal of data has been deposited in various structured genomic databases, a large portion of insights are primarily, and often solely, found in biomedical literature. Medline contains about 26 million literature citations, far more than a human can realistically read. Thus, machine-based approaches are needed to comprehensively capture the landscape of reported mutations. Method: An automated pattern matching method is utilized to extract mutations from Medline abstracts as presented in Human Genome Variation Society (HGVS) format and RefSNP (rs) number. A typical HGVS protein mutation is described as [reference amino acid][position][new amino acid], as in p.His1047Arg, His1047Arg, or simply H1047R. This method identifies and consolidates all mentioned protein mutations and their alternate formulations. Result: Over 300,000 unique abstract-mutation pairs were identified, including 90,000 unique mutations. Well-known cancer mutations such as BRAF V600E, JAK2 V617F and EGFR L858R are among those appearing most frequently in oncology literature. At the other end, 51,000 mutations are mentioned in just a single abstract, 16,000 mutations in two abstracts, 7,600 in three abstracts, and so forth. Conclusion: The number of mutations appearing in Medline abstracts represents just a small portion of the 2 million unique coding mutations contained in the COSMIC database. While we expect the actual coverage of mutations by literature to be more comprehensive if this approach is extended to the full text body, the number would likely remain small compared with the total reported COSMIC mutations.
    One of the great challenges in oncology is characterizing variants of unknown significance (VUS), and by first extracting all reported mutations, even those mentioned in only one article, and their specific biological context, we can begin to identify broader patterns in mutations’ pathogenicity and their impact on drug response. Citation Format: Takahiko Koyama, Kahn Rhrissorrakrai, Laxmi Parida. A survey of mutations in biomedical literature using a machine based approach [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2017; 2017 Apr 1-5; Washington, DC. Philadelphia (PA): AACR; Cancer Res 2017;77(13 Suppl):Abstract nr 3574. doi:10.1158/1538-7445.AM2017-3574
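
    The pattern [reference amino acid][position][new amino acid] described in the Method section can be sketched with a regular expression. This is a hedged illustration, not the authors' implementation: the names `PROTEIN_MUTATION`, `RS_NUMBER`, `extract_mutations` and `extract_rs_numbers` are assumptions, only simple substitutions among the 20 standard amino acids are covered, and a pattern this simple will also produce false positives (for example, cell-line names such as T47D).

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
AA3 = "|".join(THREE_TO_ONE)
AA1 = "".join(sorted(THREE_TO_ONE.values()))

# Optional "p." prefix, then either three-letter or one-letter notation.
PROTEIN_MUTATION = re.compile(
    rf"\b(?:p\.)?(?:(?P<ref3>{AA3})(?P<pos3>\d+)(?P<alt3>{AA3})"
    rf"|(?P<ref1>[{AA1}])(?P<pos1>\d+)(?P<alt1>[{AA1}]))\b"
)
RS_NUMBER = re.compile(r"\brs\d+\b")


def extract_mutations(text):
    """Return the mutations found in `text`, consolidated to one-letter form."""
    found = set()
    for m in PROTEIN_MUTATION.finditer(text):
        if m.group("ref3"):
            ref = THREE_TO_ONE[m.group("ref3")]
            pos = m.group("pos3")
            alt = THREE_TO_ONE[m.group("alt3")]
        else:
            ref, pos, alt = m.group("ref1"), m.group("pos1"), m.group("alt1")
        found.add(f"{ref}{pos}{alt}")
    return found


def extract_rs_numbers(text):
    """Return the RefSNP identifiers (rs followed by digits) found in `text`."""
    return set(RS_NUMBER.findall(text))


if __name__ == "__main__":
    sample = "PIK3CA p.His1047Arg (also His1047Arg or H1047R) and BRAF V600E"
    print(sorted(extract_mutations(sample)))
```

    Normalizing every match to the one-letter form is one way to consolidate alternate formulations of the same mutation, so that p.His1047Arg and H1047R count as a single entry.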

    LIPIcs, Volume 113, WABI'18, Complete Volume

    LIPIcs, Volume 113, WABI'18, Complete Volume

    Front Matter, Table of Contents, Preface, Conference Organization

    Front Matter, Table of Contents, Preface, Conference Organization

    Going Beyond Counting First Authors in Author Co-citation Analysis

    The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based. We compare ACA results based on two different types of co-citation counting: the traditional type, which counts only the first of a cited work's authors, and a non-traditional type, which takes into account the first 5 authors of a cited work. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
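
    The difference between the two counting schemes can be sketched as follows. This is a simplified illustration under stated assumptions, not the study's procedure: each citing paper is represented as a list of cited works, each cited work as an ordered list of author names, and an author pair is counted once per citing paper. The function name `cocitation_counts` and the placeholder author names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count author co-citations across citing papers.

    citing_papers: list of reference lists; each reference is an ordered
    list of author names. max_authors=1 gives first-author counting,
    max_authors=5 considers the first five authors of each cited work.
    """
    counts = Counter()
    for references in citing_papers:
        cited_authors = set()
        for authors in references:
            cited_authors.update(authors[:max_authors])
        # Each unordered pair is counted once per citing paper (assumption).
        for pair in combinations(sorted(cited_authors), 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    papers = [
        [["A", "B"], ["C"]],
        [["A"], ["C", "D"]],
    ]
    print(cocitation_counts(papers, max_authors=1))
    print(cocitation_counts(papers, max_authors=5))
```

    With first-author counting, only the pair (A, C) appears; raising `max_authors` to 5 brings the co-authors B and D into the picture.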

    Variations on the Author

    “Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourses; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.