Mathematical Approaches to Comparative Linguistics

Author: Tandy Warnow
Publication date: 01/01/1996

The inference of the evolutionary history of a set of languages is a complex problem. Although some languages are known to be related through descent from common ancestral languages, for other languages determining whether such a relationship holds is itself a difficult problem. In this paper we report on new methods, developed by linguists Johanna Nichols (Berkeley), Donald Ringe (Penn), and Ann Taylor (Penn), and computer scientist Tandy Warnow (Penn), for answering some of the most difficult questions in this domain. These methods and the results of the analyses based upon these methods were presented in November 1995 at the Symposium on the Frontiers of Science of the National Academy of Sciences.

1 Evolutionary relationships in linguistics

Evolutionary relatedness of languages is described by observing that separation of speech communities into separate and noninteracting sub-communities eventually results in a language developing into distinct new languages in a process quite sim..

CiteSeerX

Developing scalable quartet tree encodings

Author: Lee Young-suk
Publication date: 2009

Reconstructing the Tree of Life, the evolutionary history of all species, stands as one of the most significant and intensive problems in computational biology. One approach to this grand project is to use supertree methods that merge a set of smaller trees (or source trees) into one single tree. In practice, most biologists use a particular supertree method called Matrix Representation with Parsimony (MRP) because of its topological accuracy compared with most other methods. Recently, Snir and Rao presented a new supertree method that first encodes the source trees as a set of four-leaf trees and then uses Quartet Maxcut (QMC) on these quartet trees to compute a single overall tree. Under certain realistic model conditions, this supertree method using a particular quartet encoding, Exp+TSQ, was shown to outperform MRP in terms of topological accuracy. However, this supertree method has many limitations. First, it fails to complete in many cases. Second, its subroutine Exp+TSQ is computationally intensive because it examines all possible quartets. These limitations discourage the use of QMC on Exp+TSQ. Thus, we extend the QMC study in the hope of designing a new scalable quartet encoding that would further improve this supertree estimation. Our quartet encodings are based on two ideas: the examination of all possible quartets on large trees is unnecessary, and the taxon sampling density of the source tree should be taken into account in the encoding. We propose an alternative time-efficient and robust encoding, UniformK+TSQ*, that may be substituted for Exp+TSQ without compromising the accuracy of the supertree method.
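The idea of encoding a source tree as four-leaf trees can be illustrated with a minimal sketch. It is not the Exp+TSQ or UniformK+TSQ* encoding from the abstract, whose details are not given here. It assumes a tree is supplied as a list of splits (one side of each internal edge), and the names `induced_quartet`, `all_quartets`, and `sampled_quartets` are hypothetical. The last function uses plain uniform sampling only to show how the exhaustive enumeration can be avoided.

```python
from itertools import combinations
from math import comb
import random


def induced_quartet(splits, four):
    """Return the quartet topology the tree induces on `four`, or None if unresolved.

    `splits` lists one side (a frozenset of leaf names) of each internal edge.
    """
    four = set(four)
    for side in splits:
        inside = four & side
        if len(inside) == 2:
            # Two of the four leaves lie on each side of this edge,
            # so the edge resolves the quartet as inside | outside.
            return frozenset([frozenset(inside), frozenset(four - inside)])
    return None


def all_quartets(leaves, splits):
    """Exhaustive encoding: one entry per 4-leaf subset, C(n, 4) in total."""
    return {q: induced_quartet(splits, q) for q in combinations(sorted(leaves), 4)}


def sampled_quartets(leaves, splits, k, seed=0):
    """Encode only k distinct 4-leaf subsets drawn uniformly at random (assumption)."""
    rng = random.Random(seed)
    leaves = sorted(leaves)
    k = min(k, comb(len(leaves), 4))
    chosen = set()
    while len(chosen) < k:
        chosen.add(tuple(sorted(rng.sample(leaves, 4))))
    return {q: induced_quartet(splits, q) for q in chosen}


# Toy unrooted source tree ((a,b),c,(d,e)) with internal edges ab|cde and de|abc.
leaves = "abcde"
splits = [frozenset("ab"), frozenset("de")]
quartets = all_quartets(leaves, splits)
subset = sampled_quartets(leaves, splits, k=3)
```

The exhaustive version grows as C(n, 4): a 100-leaf tree already has 3,921,225 four-leaf subsets, which is the cost the abstract attributes to examining all possible quartets.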

Texas ScholarWorks

plos-datasets

Author: Md. Shamsuzzoha Bayzid
Publication date: 2015

Datasets for "Weighted Statistical Binning: enabling statistically consistent genome-scale phylogenetic analyses"
Md. Shamsuzzoha Bayzid, Siavash Mirarab, Bastien Boussau and Tandy Warnow, PLoS ONE, 2015.

The Francis Crick Institute

Fast and Accurate Species Trees from Weighted Internode Distances

Author: Baqiao Liu, Tandy Warnow
Publication date: 01/01/2022

Species tree estimation is a basic step in many biological research projects, but is complicated by the fact that gene trees can differ from the species tree due to processes such as incomplete lineage sorting (ILS), gene duplication and loss (GDL), and horizontal gene transfer (HGT), which can cause different regions within the genome to have different evolutionary histories (i.e., "gene tree heterogeneity"). One approach to estimating species trees in the presence of gene tree heterogeneity resulting from ILS operates by computing trees on each genomic region (i.e., computing "gene trees") and then using these gene trees to define a matrix of average internode distances, where the internode distance in a tree T between two species x and y is the number of nodes in T between the leaves corresponding to x and y. Given such a matrix, a tree can then be computed using methods such as neighbor joining. Methods such as ASTRID and NJst (which use this basic approach) are provably statistically consistent, very fast (low-degree polynomial time), and have shown high accuracy under many conditions, which makes them competitive with other popular species tree estimation methods. In this study, inspired by the very recent work on weighted ASTRAL, we present weighted ASTRID, a variant of ASTRID that takes the branch uncertainty on the gene trees into account in the internode distance. Our experimental study evaluating weighted ASTRID shows improvements in accuracy compared to the original (unweighted) ASTRID while remaining fast. Moreover, weighted ASTRID shows competitive accuracy against weighted ASTRAL, the state of the art. Thus, this study provides a new and very fast method for species tree estimation that improves upon ASTRID and has accuracy comparable to the state of the art while remaining much faster. Weighted ASTRID is available at https://github.com/RuneBlaze/internode
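The averaged internode distance matrix described above can be sketched as follows. This is a minimal illustration of the unweighted definition only, not the ASTRID, NJst, or weighted ASTRID implementation. It assumes each gene tree is given as an adjacency dictionary plus a list of its leaves, and the function names are hypothetical. The resulting matrix would then be passed to a distance method such as neighbor joining, which is not shown.

```python
from collections import deque


def internode_distances(adj, leaves):
    """Map each leaf pair (x, y), x < y, to the number of nodes strictly between them."""
    dist = {}
    for src in leaves:
        # Breadth-first search counts the edges from src to every other node.
        edges = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in edges:
                    edges[nbr] = edges[node] + 1
                    queue.append(nbr)
        for dst in leaves:
            if src < dst:
                # A path with e edges passes through e - 1 intermediate nodes.
                dist[(src, dst)] = edges[dst] - 1
    return dist


def average_internode_matrix(gene_trees):
    """Average each pair's internode distance over the gene trees containing both leaves."""
    totals, counts = {}, {}
    for adj, leaves in gene_trees:
        for pair, d in internode_distances(adj, leaves).items():
            totals[pair] = totals.get(pair, 0) + d
            counts[pair] = counts.get(pair, 0) + 1
    return {pair: totals[pair] / counts[pair] for pair in totals}


# Two toy unrooted gene trees on species a, b, c, d with internal nodes u and v.
tree1 = ({"a": ["u"], "b": ["u"], "c": ["v"], "d": ["v"],
          "u": ["a", "b", "v"], "v": ["c", "d", "u"]}, ["a", "b", "c", "d"])  # ab|cd
tree2 = ({"a": ["u"], "c": ["u"], "b": ["v"], "d": ["v"],
          "u": ["a", "c", "v"], "v": ["b", "d", "u"]}, ["a", "b", "c", "d"])  # ac|bd
avg = average_internode_matrix([tree1, tree2])
```

Averaging per pair over only the trees that contain both leaves is one simple way to cope with gene trees that miss some species; it is a design choice of this sketch rather than a statement about the published methods.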

DROPS Dagstuhl Research Online Publication Server

On the upper bound of the prediction accuracy of residue contacts in proteins with correlated mutations: the case study of the similarity matrices

Author: Pietro Di Lena, Piero Fariselli, Luciano Margara, Marco Vassura, Rita Casadio
Publication date: 01/01/2009

Correlated mutations in proteins are believed to occur in order to preserve the functional fold of the protein through evolution. Their values can be deduced from sequence and/or structural alignments and are indicative of residue contacts in the protein three-dimensional structure. The correlation between pairs of residues is routinely evaluated with the Pearson correlation coefficient and the MCLACHLAN similarity matrix. In this paper, we describe an optimization procedure that maximizes the correlation between the Pearson coefficient and the protein residue contacts with respect to different similarity matrices, including random ones. Our results indicate that there is a large number of equivalent matrices that perform similarly to MCLACHLAN. We also find that the upper limit to the accuracy achievable in the prediction of the protein residue contacts is independent of the optimized similarity matrix. This suggests that poor scoring may be due to the choice of the linear correlation function in evaluating correlated mutations.
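One common way to compute such a score can be sketched as follows, under stated assumptions: each alignment column is turned into a profile of residue similarities over all pairs of sequences, and two columns are scored by the Pearson correlation of their profiles. The sequence weighting used in published variants is omitted, the toy `identity_similarity` function stands in for a real matrix such as MCLACHLAN, and all function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        # A constant profile has no defined correlation; report 0 in this sketch.
        return 0.0
    return cov / sqrt(vx * vy)


def column_profile(alignment, col, similarity):
    """Similarity of the residues at `col` for every pair of aligned sequences."""
    return [similarity(s[col], t[col]) for s, t in combinations(alignment, 2)]


def correlated_mutation(alignment, i, j, similarity):
    """Score columns i and j by correlating their pairwise-similarity profiles."""
    return pearson(column_profile(alignment, i, similarity),
                   column_profile(alignment, j, similarity))


def identity_similarity(a, b):
    """Toy similarity matrix (assumption): 1 for identical residues, 0 otherwise."""
    return 1.0 if a == b else 0.0


# Toy alignment: columns 0 and 1 change together, column 2 varies independently.
alignment = ["AKV", "AKL", "GEV", "GEL"]
r01 = correlated_mutation(alignment, 0, 1, identity_similarity)
r02 = correlated_mutation(alignment, 0, 2, identity_similarity)
```

Because the similarity function is passed in as a parameter, different matrices can be swapped in without changing the scoring code, which mirrors the comparison across similarity matrices described in the abstract.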

Crossref

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Archivio istituzionale della ricerca - Università di Padova