Understanding the effects of taxon sampling on phylogenetic estimation is a problem that grows more urgent as our ability to gather and analyze large-scale datasets expands. Recent attention to this question has prompted several studies of how phylogenetic estimators behave when taxa are added to or deleted from datasets, with the puzzling finding that estimator performance sometimes, but not always, improves as taxa are added. Results from Kim (1998) hint at an explanation for this phenomenon, showing that for some kinds of clade structure, more intensive taxon sampling can lead to better phylogenetic estimates.

Accordingly, there has been a call for broader systematic studies of how increasing the number of taxa affects the performance of parsimony and other frequently used phylogenetic estimators, across a range of models, rates of character evolution, and tree structures. A full systematic survey becomes feasible with a novel approach: finding extreme bounds on the performance of phylogenetic estimation as taxa are added. We have proposed to carry out such a study by calculating performance indicators for phylogenies estimated across a variety of model trees and sampling strategies, and by delineating a profile of the clades and datasets for which estimator performance improves with increased sampling.

The goals of the project are to answer the following:

The ultimate reward of this analysis will be the development of practical guidelines for finding optimal taxon sampling strategies for phylogenetic estimation.

A little background reading: