    Mathematical Approaches to Comparative Linguistics

    The inference of the evolutionary history of a set of languages is a complex problem. Although some languages are known to be related through descent from common ancestral languages, for other languages determining whether such a relationship holds is itself a difficult problem. In this paper we report on new methods, developed by linguists Johanna Nichols (Berkeley), Donald Ringe (Penn), and Ann Taylor (Penn), and computer scientist Tandy Warnow (Penn), for answering some of the most difficult questions in this domain. These methods, and the results of the analyses based upon them, were presented in November 1995 at the Symposium on the Frontiers of Science of the National Academy of Sciences.

    1 Evolutionary relationships in linguistics

    Evolutionary relatedness of languages is described by observing that separation of speech communities into separate and noninteracting sub-communities eventually results in a language developing into distinct new languages in a process quite sim..

    plos-datasets

    Datasets for "Weighted Statistical Binning: enabling statistically consistent genome-scale phylogenetic analyses". Md. Shamsuzzoha Bayzid, Siavash Mirarab, Bastien Boussau and Tandy Warnow, PLoS ONE, 2015.

    Fast and Accurate Species Trees from Weighted Internode Distances

    Species tree estimation is a basic step in many biological research projects, but is complicated by the fact that gene trees can differ from the species tree due to processes such as incomplete lineage sorting (ILS), gene duplication and loss (GDL), and horizontal gene transfer (HGT), which can cause different regions within the genome to have different evolutionary histories (i.e., "gene tree heterogeneity"). One approach to estimating species trees in the presence of gene tree heterogeneity resulting from ILS operates by computing trees on each genomic region (i.e., computing "gene trees") and then using these gene trees to define a matrix of average internode distances, where the internode distance in a tree T between two species x and y is the number of nodes in T between the leaves corresponding to x and y. Given such a matrix, a tree can then be computed using methods such as neighbor joining. Methods such as ASTRID and NJst (which use this basic approach) are provably statistically consistent, very fast (low-degree polynomial time), and have had high accuracy under many conditions, which makes them competitive with other popular species tree estimation methods. In this study, inspired by the very recent work on weighted ASTRAL, we present weighted ASTRID, a variant of ASTRID that takes the branch uncertainty on the gene trees into account in the internode distance. Our experimental study evaluating weighted ASTRID shows improvements in accuracy compared to the original (unweighted) ASTRID while remaining fast. Moreover, weighted ASTRID shows competitive accuracy against weighted ASTRAL, the state of the art. Thus, this study provides a new and very fast method for species tree estimation that improves upon ASTRID and has comparable accuracy with the state of the art while remaining much faster. Weighted ASTRID is available at https://github.com/RuneBlaze/internode
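
    The average internode distance matrix described in the abstract above can be sketched in a few lines of Python. This is a minimal illustration of the unweighted definition only, not the ASTRID or weighted ASTRID implementation: the nested-tuple tree encoding (unrooted trees written with a three-way split at the top), the function names, and the choice to average each pair over only those gene trees that contain both species are all assumptions made for the example. The neighbor joining step that would turn the matrix into a tree is omitted.

```python
from collections import deque


def build_graph(tree):
    """Convert a nested-tuple tree into an undirected adjacency dict.

    Leaves are strings; internal nodes get hypothetical ("internal", k) labels.
    """
    adj = {}
    counter = [0]

    def visit(node):
        if isinstance(node, str):
            adj.setdefault(node, [])
            return node
        name = ("internal", counter[0])
        counter[0] += 1
        adj[name] = []
        for child in node:
            child_name = visit(child)
            adj[name].append(child_name)
            adj[child_name].append(name)
        return name

    visit(tree)
    return adj


def internode_distances(tree):
    """Number of nodes strictly between each pair of leaves in one tree."""
    adj = build_graph(tree)
    leaves = [n for n in adj if isinstance(n, str)]
    dist = {}
    for src in leaves:
        # Breadth-first search gives the number of edges from src to every node.
        edges = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in edges:
                    edges[v] = edges[u] + 1
                    queue.append(v)
        for dst in leaves:
            if src < dst:
                # A path with e edges has e - 1 nodes between its endpoints.
                dist[(src, dst)] = edges[dst] - 1
    return dist


def average_internode_matrix(gene_trees):
    """Average the internode distance of each species pair over gene trees."""
    totals, counts = {}, {}
    for tree in gene_trees:
        for pair, d in internode_distances(tree).items():
            totals[pair] = totals.get(pair, 0) + d
            counts[pair] = counts.get(pair, 0) + 1
    return {pair: totals[pair] / counts[pair] for pair in totals}


# Two conflicting quartet gene trees: AB|CD and AC|BD.
gene_trees = [("A", "B", ("C", "D")), ("A", "C", ("B", "D"))]
matrix = average_internode_matrix(gene_trees)
```

    For these two toy trees, the pair (A, D) is separated by two nodes in both trees, while (A, B) is separated by one node in the first tree and two in the second, so the averaged entries differ. A distance-based method such as neighbor joining would then be run on this matrix.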

    On the upper bound of the prediction accuracy of residue contacts in proteins with correlated mutations: the case study of the similarity matrices

    Correlated mutations in proteins are believed to occur in order to preserve the protein functional folding through evolution. Their values can be deduced from sequence and/or structural alignments and are indicative of residue contacts in the protein three-dimensional structure. A correlation among pairs of residues is routinely evaluated with the Pearson correlation coefficient and the MCLACHLAN similarity matrix. In this paper, we describe an optimization procedure that maximizes the correlation between the Pearson coefficient and the protein residue contacts with respect to different similarity matrices, including random matrices. Our results indicate that there is a large number of equivalent matrices that perform similarly to MCLACHLAN. We also find that the upper limit to the accuracy achievable in the prediction of the protein residue contacts is independent of the optimized similarity matrix. This suggests that poor scoring may be due to the choice of the linear correlation function in evaluating correlated mutations.
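
    To make the scoring idea in the abstract above concrete, here is a minimal sketch of one common way to compute a Pearson-based correlated mutation score between two alignment columns. It is not taken from the paper and may differ from its exact procedure: the function names are hypothetical, sequence weighting is omitted, and `toy_similarity` (1 for identical residues, 0 otherwise) is a stand-in assumption for a real substitution matrix such as MCLACHLAN.

```python
from itertools import combinations
from math import sqrt


def toy_similarity(a, b):
    """Stand-in similarity matrix (assumption): 1 if residues match, else 0."""
    return 1.0 if a == b else 0.0


def column_similarities(alignment, col, similarity):
    """Similarity of the residues at `col` for every pair of sequences."""
    return [similarity(s[col], t[col]) for s, t in combinations(alignment, 2)]


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Fully conserved column: correlation is undefined, report 0 by convention.
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlated_mutation_score(alignment, i, j, similarity=toy_similarity):
    """Correlate the pairwise-similarity profiles of columns i and j."""
    return pearson(
        column_similarities(alignment, i, similarity),
        column_similarities(alignment, j, similarity),
    )


# Toy alignment: columns 0 and 1 change together, column 2 varies independently.
alignment = ["AKL", "AKV", "GEL", "GEV"]
score_01 = correlated_mutation_score(alignment, 0, 1)
score_02 = correlated_mutation_score(alignment, 0, 2)
```

    Swapping `toy_similarity` for a different scoring function is the hook that corresponds to trying different similarity matrices; the correlation function itself stays linear, which is the component the abstract suggests may limit accuracy.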

    DACTAL

    Large-scale Phylogenetic Analysis

    Genome-scale Estimation of the Tree of Life

    SATe-Enabled Phylogenetic Placement
