Laboratory 2: Tree Estimation (search & summary) and Character Evolution
In this lab, you will be using your new familiarity with Mesquite and R to conduct additional phylogeny estimates and look at character evolution in Caminalcules.
The goals of this lab are to:
- Gain more familiarity with functions in R and Mesquite.
- Understand the underlying assumptions of phylogeny estimation under parsimony.
- Examine morphological character evolution on trees using “character tracing” and ancestral state reconstruction.
First, we will use a few different functions in R to estimate phylogenies based on the Caminalcule data generated in the last lab. Whenever we estimate a phylogeny, we invoke a set of evolutionary assumptions. For example:
- What is the order, if any, of transitions from one character state to the next?
- Is the rate of transition among character states similar across the phylogeny, or are some clades evolving more rapidly than others?
- Is evolution in my particular study group best estimated using the parsimony criterion, or a more complex (and potentially more accurate) model of evolution that incorporates my empirical knowledge of character evolution?
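To make the first question concrete, here is a minimal sketch of how "ordered" and "unordered" assumptions can be written as step (cost) matrices. The four-state character and the object names are hypothetical and are not taken from the Caminalcule matrix.

```r
# Step (cost) matrices for one hypothetical character with four states, coded 0-3.
states <- 0:3
k <- length(states)

# Unordered (Fitch): a change between any two different states costs one step.
unordered <- 1 - diag(k)

# Ordered (Wagner): the cost is the number of states stepped through, |i - j|.
ordered <- abs(outer(states, states, "-"))

# Label rows (from) and columns (to) with the state codes.
dimnames(unordered) <- list(from = as.character(states), to = as.character(states))
dimnames(ordered)   <- list(from = as.character(states), to = as.character(states))

unordered
ordered
```

Under the unordered matrix, a change from state 0 to state 3 costs the same as a change from 0 to 1; under the ordered matrix it costs three times as much.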
Before lab, you estimated a phylogeny in R from your Caminalcule dataset using Fitch parsimony (unordered states). Using the instructions from the pre-lab assignment, open RStudio, set your working directory, and load the following packages: ape, igraph, lattice, Matrix, phangorn, rgl.
Next, repeat the prelab instructions to bring your Caminalcule data set into R.
At this point, you should have a data set called cam1. Let’s begin by running another parsimony analysis. There are a few options for parsimony analyses, which you can find by typing parsimony into the search field under the Help tab (bottom-right pane in RStudio).
The ape, phytools, and phangorn packages have options for parsimony tree estimation (see screenshot 1). These differ principally in their tree search methodologies: some employ exhaustive searches of all possible topologies, while others employ heuristic shortcuts for finding the most parsimonious trees.
From your Search Results (screenshot 1) navigate to the Help pages for phangorn::parsimony and take a look at the description of functions.
Okay, here you can see that some functions are used for data (i.e., your character matrix) while others are used for trees.
Before lab you estimated a Fitch parsimony tree from your Caminalcule character matrix using the random.addition function, which builds a ‘starting tree’. A starting tree is used in more complex analyses to reduce tree search time (ask me for details), but a single starting tree is not guaranteed to be the most parsimonious one, and it cannot represent all of the most-parsimonious solutions.
Today, we will use a method to help us find the best (most-parsimonious) tree without having to conduct an exhaustive search.
Exhaustive (brute-force):
How many solutions can we expect to find in an exhaustive search?
For 3 taxa?
For 5 taxa?
For 30 taxa? About 8.69 x 10^36 unrooted bifurcating trees (about 4.95 x 10^38 rooted).
Wow! See Felsenstein, Inferring Phylogenies, 32.
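If you want to check your answers, the counts can be computed directly. This is a small sketch using the standard formulas for labelled, fully bifurcating trees; the function names are made up for this example.

```r
# Number of distinct bifurcating tree topologies for n labelled taxa.
# Unrooted: (2n - 5)!! = 1 * 3 * 5 * ... * (2n - 5), for n >= 3.
# Rooted:   (2n - 3)!! = 1 * 3 * 5 * ... * (2n - 3), for n >= 2.
count_unrooted <- function(n) {
  stopifnot(n >= 3)
  prod(seq(1, 2 * n - 5, by = 2))
}
count_rooted <- function(n) {
  stopifnot(n >= 2)
  prod(seq(1, 2 * n - 3, by = 2))
}

# Print the counts for the taxon numbers asked about above.
for (n in c(3, 5, 30)) {
  cat(n, "taxa:",
      format(count_unrooted(n), digits = 4), "unrooted,",
      format(count_rooted(n), digits = 4), "rooted\n")
}
```

Adding one taxon multiplies the count by the next odd number, which is why exhaustive searches quickly become impractical.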
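The parsimony ratchet (Nixon, 1999) is an iterative heuristic search. Each iteration perturbs the data by reweighting a random subset of characters, searches briefly on the perturbed data, and then searches again on the original data, which helps prevent the search from getting stuck in local optima.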
Recall that the tree we generated with random.addition was called tree. This time, we will generate a set of most-parsimonious trees using pratchet and name it something more informative, like pars.rat. Examine the arguments in the command below, then copy the command into your R console to execute the analysis.
pars.rat <- pratchet(cam1, start=NULL, method="fitch", maxit=1000, k=10, trace=1, all=TRUE, rearrangements="SPR")
That was fast! What if we increase the number of rounds before the ratchet is stopped when there is no improvement (k)? Use the “up” arrow key on your keyboard to invoke the last command and change k to 50. Did this change the computation time?
Repeat the last step again; invoke the command using your arrow key and set maxit to 10000 and k=10. Note that each time you make an object called pars.rat you are over-writing the previous object you made with the same name.
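Rather than judging computation time by eye, you can time each run. The sketch below uses a small made-up binary matrix (toy.dat), not your Caminalcule data, and assumes phangorn is installed; on a data set this small the difference may be negligible.

```r
library(phangorn)  # also attaches ape

# Hypothetical toy matrix: 8 taxa x 6 binary characters (NOT the Caminalcule data).
toy <- matrix(c(
  "0","0","0","0","0","0",
  "1","0","0","0","0","0",
  "1","1","0","0","0","1",
  "1","1","1","0","0","0",
  "1","1","1","1","0","0",
  "1","1","1","1","1","0",
  "1","1","0","0","1","1",
  "1","0","0","0","0","1"),
  nrow = 8, byrow = TRUE,
  dimnames = list(paste0("t", 1:8), NULL))
toy.dat <- phyDat(toy, type = "USER", levels = c("0", "1"))

set.seed(1)  # the ratchet is stochastic; fix the seed for repeatability

# Time two searches that differ only in k.
time.k10 <- system.time(
  rat.k10 <- pratchet(toy.dat, method = "fitch", maxit = 1000, k = 10,
                      trace = 0, all = TRUE, rearrangements = "SPR"))
time.k50 <- system.time(
  rat.k50 <- pratchet(toy.dat, method = "fitch", maxit = 1000, k = 50,
                      trace = 0, all = TRUE, rearrangements = "SPR"))

time.k10["elapsed"]
time.k50["elapsed"]

# Tree length (number of steps) of each tree found by each search.
parsimony(rat.k10, toy.dat)
parsimony(rat.k50, toy.dat)
```

Using different object names for each run, as here, avoids over-writing earlier results when you want to compare them.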
To see how many most-parsimonious trees were found using this procedure, simply type
pars.rat
How many trees do you have? To view your trees use the plot command.
plot(pars.rat)
You can scroll through the trees using the arrow buttons at the top of your Plot screen. Are they different? Which tree is the right one? How do we interpret and represent a set of trees resulting from a phylogenetic analysis?
Remember that your results are only as good as the data that go into them. Under the parsimony criterion, these trees are all equally parsimonious: each requires the same number of character-state changes, so without more information it is impossible to say which is closest to the TRUE tree. Representing a set of trees in an acceptable amount of space is a tough problem, but there are a few solutions:
- Choose one tree, but report how many equally parsimonious solutions were found.
- List all of the trees in parenthetical format in a table.
- Present a ‘summary’ tree.
Summary trees are constructed using consensus techniques, of which there are several (Felsenstein Ch.30). Today we will calculate two consensus trees, a strict consensus and a majority rule consensus, from our set of trees.
St.pars.cons <- consensus(pars.rat, p = 1, check.labels = TRUE)
plot(St.pars.cons)
What is being shown here?
Now let’s use a majority rule criterion for summarizing the trees. The ‘majority’ threshold p can be set to any fraction from 0.5 to 1; to make the consensus more stringent, simply increase the value of p.
MR.pars.cons <- consensus(pars.rat, p = 0.5, check.labels = TRUE)
plot(MR.pars.cons)
Is this consensus different? Why?
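To see why the two consensus methods can disagree, it may help to try them on a set of trees small enough to check by hand. The three trees below are invented for illustration: two of them group A with B, and one groups A with C.

```r
library(ape)

# Three hypothetical trees: two group A with B, one groups A with C.
toy.trees <- read.tree(text = c(
  "(((A,B),C),(D,E));",
  "(((A,B),C),(D,E));",
  "(((A,C),B),(D,E));"))

strict.cons <- consensus(toy.trees, p = 1)    # groups found in every tree
mr.cons     <- consensus(toy.trees, p = 0.5)  # groups found in more than half (here 2 of 3)

# Number of internal nodes; fewer nodes means more collapsed polytomies.
Nnode(strict.cons)
Nnode(mr.cons)

# Plot side by side to compare resolution.
op <- par(mfrow = c(1, 2))
plot(strict.cons, main = "Strict")
plot(mr.cons, main = "Majority rule")
par(op)
```

The A plus B grouping appears in only two of the three trees, so it is collapsed in the strict consensus but retained in the majority rule consensus.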
Finally, let’s write (export) your majority rule consensus tree to a nexus file. The file should be ‘written’ to your working directory.
write.nexus(MR.pars.cons, file="Caminalcule_MR.nex", translate=TRUE)
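Before switching programs, it can be worth confirming that the export worked. The sketch below writes a made-up tree to a temporary file and reads it back; with your own tree you would use your own object and file name instead.

```r
library(ape)

# Hypothetical stand-in for the consensus tree.
toy.tree <- read.tree(text = "((A,B),(C,D),E);")

# Write to a temporary file so nothing in the working directory is touched.
out.file <- file.path(tempdir(), "toy_tree.nex")
write.nexus(toy.tree, file = out.file, translate = TRUE)

file.exists(out.file)  # was the file created?
getwd()                # where a bare file name would have been written

# Read the file back and confirm the taxon labels survived the round trip.
check.tree <- read.nexus(out.file)
setequal(check.tree$tip.label, toy.tree$tip.label)
```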
This concludes our analyses in R for the day. Close R and launch Mesquite.
When Mesquite has loaded, open your new Caminalcule_MR.nex file by navigating to your R working directory. When you open the file, your majority rule tree should appear. This tree is very simple, in that it contains only information about tree shape (i.e., the arrangement of Caminalcule taxa). Although the tree appears to have a root in the Mesquite GUI, this is only an artifact of how the tree is drawn: when we ran the analysis, we did not specify a root.
Will the absence of a root have any influence on ancestral state estimation?
To specify a root, find the Reroot at branch tool and click on Caminalcule 19. Now is a good time to save your work. Hit Save!
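For comparison, the same rooting step can be sketched in R with ape. The tree and the outgroup label "A" below are placeholders; in your own data the outgroup would be whatever label Caminalcule 19 carries in your file.

```r
library(ape)

# Hypothetical unrooted tree (three branches meet at the base).
toy.tree <- read.tree(text = "(A,B,(C,(D,E)));")
is.rooted(toy.tree)

# Root on a chosen outgroup; "A" is a placeholder for your outgroup label.
rooted.tree <- root(toy.tree, outgroup = "A", resolve.root = TRUE)
is.rooted(rooted.tree)

plot(rooted.tree)
```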
Take a minute right now to make a large sketch (fill most of the sheet) of your rooted phylogeny on the space remaining on the last page.
Next, we need to link your tree topology to your character matrix. Matching the data sets is possible because the taxon names in your topology are exactly the same as the ones in your character matrix. In the File menu of Mesquite, scroll down to Link File…, navigate to your R working directory and open your Caminalcule_data.nex character matrix. If you are prompted to decide whether to retain redundant taxon fields, choose not to… we only need one column (or field) for taxon names.
In Mesquite, navigate to the Project of “Caminalcule_MR.nex” tab. From here, you can explore the different types of data stored in this project. Show Matrix under the Character Matrix will display your characters. Return to View Trees to see how the characters fit to your majority rule tree.
Once in the Tree Window go to the Analysis menu at the top of your screen. Choose Trace Character History from the drop down menu. When prompted, choose Stored characters and Parsimony ancestral states. A pop-up legend should appear in your tree window; use the arrows to scroll through the characters.
To view ancestral states a different way, go to Drawing in the main menu bar, choose Tree form> and then choose Balls & Sticks.
What kinds of information are provided to you in the pop-up legend? Are there certain characters that support monophyletic groups particularly well?
Finally, go to Taxa&Trees in the main menu bar and choose New tree window> with trees from source to open a new tree window. In the new tree window, go to Analysis> Trace Character History, repeat the procedure, but choose Likelihood Ancestral States instead of parsimony.
How does the Maximum Likelihood estimate differ from the Parsimony estimate? To investigate this, try plotting the first five characters (with ancestral states at nodes) onto the tree you’ve drawn, first copying from your parsimony ancestor estimate and then from your maximum likelihood reconstruction.
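If you would like to see what a likelihood reconstruction produces numerically, here is a minimal sketch in R using ape's ace function with an equal-rates model. The tree, branch lengths, and character are all invented, and the model settings are an assumption; they are not guaranteed to match the settings Mesquite uses, so expect the numbers to differ from what you see there.

```r
library(ape)

# Hypothetical rooted, fully resolved tree with branch lengths.
toy.tree <- read.tree(text = "(((A:1,B:1):1,C:2):1,((D:1,E:1):1,F:2):1);")

# Hypothetical binary character, named by taxon.
x <- c(A = 0, B = 0, C = 0, D = 1, E = 1, F = 1)

# Maximum likelihood reconstruction under an equal-rates (ER) model.
fit <- ace(x, toy.tree, type = "discrete", model = "ER")

# One row per internal node, one column per state; each row sums to 1.
round(fit$lik.anc, 3)

# Pie charts at the nodes show the relative support for each state.
plot(toy.tree, label.offset = 0.1)
nodelabels(pie = fit$lik.anc, piecol = c("white", "black"), cex = 0.6)
tiplabels(pch = 21, bg = c("white", "black")[x[toy.tree$tip.label] + 1])
```

Unlike a parsimony reconstruction, which assigns one state (or an ambiguous set of states) to each node, the likelihood output gives every state a proportional level of support at every node.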
Draw your majority rule tree here: