Mathematical Approaches to Comparative Linguistics
The inference of the evolutionary history of a set of languages is a complex problem. Although some languages are known to be related through descent from common ancestral languages, for other languages determining whether such a relationship holds is itself a difficult problem. In this paper we report on new methods, developed by linguists Johanna Nichols (Berkeley), Donald Ringe (Penn), and Ann Taylor (Penn), and computer scientist Tandy Warnow (Penn), for answering some of the most difficult questions in this domain. These methods and the results of the analyses based upon these methods were presented in November 1995 at the Symposium on the Frontiers of Science of the National Academy of Sciences. 1 Evolutionary relationships in linguistics Evolutionary relatedness of languages is described by observing that separation of speech communities into separate and noninteracting sub-communities eventually results in a language developing into distinct new languages in a process quite sim..
Developing scalable quartet tree encodings
Reconstructing the Tree of Life, the evolutionary history of all
species, stands as one of the most significant and intensive problems
in computational biology. One approach to this grand project is to
use supertree methods that merge a set of smaller trees (or source
trees) into one single tree. In practice, most biologists use a particular supertree method called Matrix Representation with Parsimony
(MRP) due to its topological accuracy as compared to most other
methods. Recently, Snir and Rao presented a new supertree method
that first encodes the source trees as a set of four-leaf trees and then
uses Quartet Maxcut (QMC) on these quartet trees to compute a single overall tree. On certain realistic model conditions, this supertree
method using a particular quartet encoding, Exp + TSQ, was shown
to outperform MRP in terms of topological accuracy. However, this
supertree method has many limitations. First, it fails to complete
in many cases. Second, its subroutine Exp+TSQ is computationally
intensive because it examines all possible quartets. These limitations
discourage the use of QMC on Exp+TSQ. Thus, we extend the QMC
study in the hope of designing a new scalable quartet encoding that
would further improve this supertree estimation. Our quartet encodings are based on two ideas: the examination of all possible quartets
on large trees is unnecessary, and the taxon sampling density of the
source tree should be taken into account in the encoding. We propose
an alternative time-efficient and robust encoding UniformK +TSQ*
that may be used to substitute for Exp+TSQ without compromising
the accuracy of the supertree method.
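To make the idea of encoding a source tree as a set of four-leaf trees concrete, here is a minimal sketch that lists every quartet induced by a small unrooted tree. It is the exhaustive enumeration that the abstract describes as computationally intensive, not an implementation of Exp+TSQ or UniformK+TSQ*, whose details are not given here. The edge-list input and the function names are assumptions made for illustration.

```python
from collections import deque
from itertools import combinations


def path_lengths(edges, source):
    """Number of edges from `source` to every node of an unrooted tree."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    hops = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    return hops


def induced_quartets(edges, leaves):
    """List the resolved quartets ab|cd that the tree induces on its leaves.

    Uses the four-point condition: the tree displays ab|cd exactly when
    d(a,b) + d(c,d) is strictly smaller than the other two pairings.
    """
    d = {x: path_lengths(edges, x) for x in leaves}
    quartets = []
    for a, b, c, e in combinations(sorted(leaves), 4):
        splits = [((a, b), (c, e)), ((a, c), (b, e)), ((a, e), (b, c))]
        sums = [d[p[0]][p[1]] + d[q[0]][q[1]] for p, q in splits]
        best = min(sums)
        if sums.count(best) == 1:  # skip unresolved (star-like) quartets
            quartets.append(splits[sums.index(best)])
    return quartets


# Hypothetical 5-leaf tree ((A,B),C,(D,E)) with internal nodes u, v, w.
tree_edges = [("A", "u"), ("B", "u"), ("u", "v"), ("C", "v"),
              ("v", "w"), ("D", "w"), ("E", "w")]
quartets = induced_quartets(tree_edges, ["A", "B", "C", "D", "E"])
```

A tree with n leaves has n choose 4 quartets, so this enumeration grows as the fourth power of n, which is why sampling only some quartets is attractive on large trees.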
plos-datasets
Datasets for "Weighted Statistical Binning: enabling statistically consistent genome-scale phylogenetic analyses", Md. Shamsuzzoha Bayzid, Siavash Mirarab, Bastien Boussau and Tandy Warnow, PLoS ONE, 2015.
Fast and Accurate Species Trees from Weighted Internode Distances
Species tree estimation is a basic step in many biological research projects, but is complicated by the fact that gene trees can differ from the species tree due to processes such as incomplete lineage sorting (ILS), gene duplication and loss (GDL), and horizontal gene transfer (HGT), which can cause different regions within the genome to have different evolutionary histories (i.e., "gene tree heterogeneity"). One approach to estimating species trees in the presence of gene tree heterogeneity resulting from ILS operates by computing trees on each genomic region (i.e., computing "gene trees") and then using these gene trees to define a matrix of average internode distances, where the internode distance in a tree T between two species x and y is the number of nodes in T between the leaves corresponding to x and y. Given such a matrix, a tree can then be computed using methods such as neighbor joining. Methods such as ASTRID and NJst (which use this basic approach) are provably statistically consistent, very fast (low degree polynomial time), and have shown high accuracy under many conditions, which makes them competitive with other popular species tree estimation methods. In this study, inspired by the very recent work on weighted ASTRAL, we present weighted ASTRID, a variant of ASTRID that takes the branch uncertainty on the gene trees into account in the internode distance. Our experimental study evaluating weighted ASTRID shows improvements in accuracy compared to the original (unweighted) ASTRID while remaining fast. Moreover, weighted ASTRID shows competitive accuracy against weighted ASTRAL, the state of the art. Thus, this study provides a new and very fast method for species tree estimation that improves upon ASTRID and has comparable accuracy with the state of the art while remaining much faster. Weighted ASTRID is available at https://github.com/RuneBlaze/internode
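The internode distance defined in this abstract can be sketched in a few lines. The example below computes the unweighted average internode distance matrix over two toy gene trees. It is an illustration of the definition only, not the ASTRID or weighted ASTRID code: the edge-list representation, the function names, and the toy trees are all assumptions, and the branch-uncertainty weighting is not modelled.

```python
from collections import deque


def internode_distances(edges, leaves):
    """Map each leaf pair (x, y) to the number of nodes strictly between them."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    dist = {}
    for src in leaves:
        # Breadth-first search counts the edges from src to every node.
        hops = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in hops:
                    hops[nxt] = hops[node] + 1
                    queue.append(nxt)
        for dst in leaves:
            if src < dst:
                # A path with k edges has k - 1 nodes between its endpoints.
                dist[(src, dst)] = hops[dst] - 1
    return dist


def average_internode_matrix(gene_trees):
    """Average each pair's internode distance over the trees containing both."""
    totals, counts = {}, {}
    for edges, leaves in gene_trees:
        for pair, d in internode_distances(edges, sorted(leaves)).items():
            totals[pair] = totals.get(pair, 0) + d
            counts[pair] = counts.get(pair, 0) + 1
    return {pair: totals[pair] / counts[pair] for pair in totals}


# Two hypothetical unrooted gene trees on species A-D: AB|CD and AC|BD.
tree1 = ([("A", "u"), ("B", "u"), ("u", "v"), ("C", "v"), ("D", "v")],
         ["A", "B", "C", "D"])
tree2 = ([("A", "u"), ("C", "u"), ("u", "v"), ("B", "v"), ("D", "v")],
         ["A", "B", "C", "D"])
avg = average_internode_matrix([tree1, tree2])
```

The resulting dictionary plays the role of the distance matrix that a method such as neighbor joining would take as input.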
On the upper bound of the prediction accuracy of residue contacts in proteins with correlated mutations: the case study of the similarity matrices
Correlated mutations in proteins are believed to occur in order to preserve the protein functional folding through evolution. Their values can be deduced from sequence and/or structural alignments and are indicative of residue contacts in the protein three-dimensional structure. A correlation among pairs of residues is routinely evaluated with the Pearson correlation coefficient and the MCLACHLAN similarity matrix. In this paper, we describe an optimization procedure that maximizes the correlation between the Pearson coefficient and the protein residue contacts with respect to different similarity matrices, including random ones. Our results indicate that there is a large number of equivalent matrices that perform similarly to MCLACHLAN. We also find that the upper limit to the accuracy achievable in the prediction of the protein residue contacts is independent of the optimized similarity matrix. This suggests that poor scoring may be due to the choice of the linear correlation function in evaluating correlated mutations.
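As a rough illustration of how a Pearson coefficient can be combined with a residue similarity matrix, the sketch below scores two alignment columns by correlating their pairwise residue similarities across all sequence pairs. This is one common formulation and is assumed here, since the abstract does not spell out the formula. The 0/1 `toy_similarity` function is a stand-in for illustration and is not the MCLACHLAN matrix, and the four-sequence alignment is invented.

```python
from itertools import combinations
from math import sqrt


def toy_similarity(a, b):
    """Stand-in similarity matrix: 1 for identical residues, 0 otherwise."""
    return 1.0 if a == b else 0.0


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def correlated_mutation(alignment, i, j, similarity=toy_similarity):
    """Correlate the residue similarities of columns i and j over sequence pairs."""
    pairs = list(combinations(range(len(alignment)), 2))
    si = [similarity(alignment[k][i], alignment[l][i]) for k, l in pairs]
    sj = [similarity(alignment[k][j], alignment[l][j]) for k, l in pairs]
    return pearson(si, sj)


# Hypothetical alignment: columns 0 and 1 change together, column 2 does not.
alignment = ["AKL", "AKL", "GEL", "GEV"]
r01 = correlated_mutation(alignment, 0, 1)
r02 = correlated_mutation(alignment, 0, 2)
```

Swapping `toy_similarity` for another scoring function is the kind of substitution the abstract studies: the score changes with the matrix, while the linear correlation step stays the same.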
Faculty Opinions recommendation of Renewing Felsenstein's phylogenetic bootstrap in the era of big data.
