I'm working on two projects: one for my biology thesis, and the other for my CSE thesis. Both involve phylogenetic trees.
Here's a copy of my abstract in progress:
Many problems in computational biology use phylogenetic trees to describe the likely ancestral relationships among DNA sequences. To draw useful statistical inferences from phylogenetic tree comparison metrics, one must have well-characterized distributions of those metrics relative to various null models for random tree sampling. For statistics that measure tree similarity, such as the Maximum Agreement Subtree (MAST), only the asymptotic behavior of the distribution is well characterized; the 'tails', or the regions of the distribution of least frequency, are not. The purpose of this project is to investigate the statistical nature of the tails of the distribution of MAST scores on random trees. Because the number of labeled trees grows combinatorially with the number of taxa, it is infeasible to generate all possible trees for large numbers of taxa. Simulations of various tree-generating processes, or of conditional probabilities, may therefore be used to deliberately bias the sampling.
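To give a sense of how quickly the tree space grows, here is a minimal sketch that counts rooted binary trees on n labeled leaves using the standard double-factorial formula (2n-3)!!. The abstract does not say which class of trees is meant, so the choice of rooted binary trees is an assumption, and the function name is made up for illustration.

```python
def count_rooted_binary_trees(n_taxa: int) -> int:
    """Number of rooted binary trees on n_taxa labeled leaves: (2n - 3)!!.

    Assumption: rooted, strictly bifurcating, leaf-labeled trees.
    """
    if n_taxa < 1:
        raise ValueError("n_taxa must be at least 1")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 3 (empty product for n <= 2).
    for k in range(3, 2 * n_taxa - 2, 2):
        count *= k
    return count


for n in (4, 10, 20):
    print(n, count_rooted_binary_trees(n))
```

Already at 10 taxa there are tens of millions of such trees, which is why exhaustive enumeration gives way to sampling.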
I intend to do a simulation study of two phylogenetic tree reconstruction methods under two different models of evolution. The two tree reconstruction methods are maximum likelihood and maximum parsimony. The two models of evolution are the i.i.d. site model and a model in which the sequence space is limited by the constraints of context-free grammars. The idea behind the second model is to select a random tree and evolve a grammar over that tree, so that at the end of the evolutionary process there is one grammar at each leaf. Each grammar then emits one or more sequences consistent with that grammar. The question is how the tree reconstruction methods perform on the limited sequence space compared to the i.i.d. model.
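As a rough illustration of the first model, the sketch below evolves a root sequence down a small tree with every site changing independently. It assumes the Jukes-Cantor substitution model, which is only one possible i.i.d. site model and is not named in the description above. The tree encoding (a leaf is a name, an internal node is a list of (child, branch_length) pairs), the function names, and the example tree are all hypothetical.

```python
import math
import random

BASES = "ACGT"


def evolve_branch(seq, branch_length, rng):
    """Evolve seq along one branch, each site independently (Jukes-Cantor).

    branch_length is in expected substitutions per site (an assumption).
    """
    # Probability that a site differs from its ancestor under Jukes-Cantor.
    p_diff = 0.75 * (1.0 - math.exp(-4.0 * branch_length / 3.0))
    out = []
    for base in seq:
        if rng.random() < p_diff:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def simulate_leaves(tree, seq, rng):
    """Return {leaf_name: sequence} for the subtree whose root carries seq."""
    if isinstance(tree, str):  # a leaf is just its name
        return {tree: seq}
    leaves = {}
    for child, branch_length in tree:
        child_seq = evolve_branch(seq, branch_length, rng)
        leaves.update(simulate_leaves(child, child_seq, rng))
    return leaves


rng = random.Random(0)
root_seq = "".join(rng.choice(BASES) for _ in range(20))
# Hypothetical three-taxon tree: ((A:0.1, B:0.1):0.2, C:0.3)
tree = [([("A", 0.1), ("B", 0.1)], 0.2), ("C", 0.3)]
alignment = simulate_leaves(tree, root_seq, rng)
for name, seq in sorted(alignment.items()):
    print(name, seq)
```

The resulting leaf sequences are the kind of input a maximum likelihood or maximum parsimony method would be run on. The grammar-constrained model would replace the per-site step with something that keeps each sequence consistent with its leaf grammar, which is not sketched here.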
2300 Pine Street Apt 1 Philadelphia, PA 19103
Mobile: 610-656-3755 AIM: kbullaughey email: email@example.com
Majors: Biology (computational concentration) Computer Science Engineering
Originally from West Chester, PA. I'll be graduating May 2004, so if you need me after that, try getting updated contact info from my parents at 610-793-2370.