Laboratory 5: Models of molecular evolution and Maximum Likelihood
In this lab, we will use what we learned last week about nucleotide evolution to estimate phylogenies under the maximum likelihood (ML) criterion. We will take a short departure from R to try out a few different applications for running ML analyses. You are already familiar with the Clusia dataset**, so we’ll stick with those data for today.
The goals of this lab are to:
- Familiarize yourself with tools available through the CIPRES web portal.
- Think about how different models of nucleotide evolution can lead to different phylogeny estimates.
- Use FigTree to visualize and modify trees.
- Familiarize yourself with the RAxML and Garli applications.
Please download the FigTree v1.4.0 application from http://tree.bio.ed.ac.uk/software/figtree/. We will use this application to view and modify our trees today.
Let’s take a look together at the CIPRES (Cyberinfrastructure for Phylogenetic Research) web portal, which gives us access to a computer cluster. Open a web browser on your computer and navigate to the CIPRES website.
Follow the links to Use CIPRES Science Gateway and create an account for yourself. When you have established a workspace for yourself on CIPRES, navigate to the tab called Toolkit. This is a list of applications that are available to you—as you can see, some of them are for homology assessment and some are for phylogeny estimation.
Uploading data to CIPRES
- Next, navigate to the Home tab, then Create New Folder to make a folder for this class. You may want to name it something like BOT370. Your new folder should appear in the left margin of your browser with two subfolders—Data and Tasks.
- Follow the link to Data and then to Upload Data in the new window.
- Find the link for Multiple Uploads and follow it. The Java applet is slow, but it lets you upload several files at one time. Find the two Clusia files I sent to you via email today (Clusia_total.phy and partition.txt) and upload them to CIPRES. Follow the Data (2) link in the left margin of your screen to view a list of your datasets.
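Before uploading, it can be worth checking that a PHYLIP file is internally consistent. The sketch below assumes a sequential (non-interleaved) PHYLIP layout: a header line with the number of taxa and the number of sites, then one line per taxon with a name, whitespace, and the sequence. The function name and the toy alignment are made up for illustration; they are not the Clusia data.

```python
def check_phylip(text):
    """Return (ntax, nchar) if a sequential PHYLIP alignment is consistent.

    Assumes one taxon per line: a name, whitespace, then the sequence.
    Raises ValueError when the header disagrees with the data.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ntax, nchar = (int(x) for x in lines[0].split())
    records = [ln.split(None, 1) for ln in lines[1:]]
    if len(records) != ntax:
        raise ValueError(f"header says {ntax} taxa, found {len(records)}")
    for record in records:
        if len(record) != 2:
            raise ValueError(f"no sequence found on line: {record[0]}")
        name, seq = record
        seq = "".join(seq.split())  # drop any spaces inside the sequence
        if len(seq) != nchar:
            raise ValueError(f"{name}: expected {nchar} sites, found {len(seq)}")
    return ntax, nchar


# Toy alignment for illustration only.
toy_phylip = """3 8
taxonA ACGTACGT
taxonB ACGTAC-T
taxonC ACGAACGT
"""
print(check_phylip(toy_phylip))
```

To check a real file, read it with open(...).read() and pass the text to check_phylip.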
Setting up tasks
- Navigate to the Tasks subfolder in the left margin of your screen. To start, we will set up an analysis in RAxML (Randomized Axelerated Maximum Likelihood) using the Clusia dataset.
- Navigate to Create New Task and select as your input data the Clusia_total.phy file.
- As your Tool, select RAxML-HPC BlackBox (7.6.3).
- Click on 8 Parameters Set and find the Use a mixed/partitioned model? menu. Under this menu, choose your partition.txt file.
- Save the parameters and run the analysis.
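The partitioned-model step above relies on a plain-text partition file. RAxML partition files list one block per line in the form "DNA, name = start-end", with 1-based, inclusive alignment positions. The sketch below formats such lines; the helper name, the partition names, and the boundaries are hypothetical and are not the contents of the partition.txt file used in class.

```python
def raxml_partition_lines(blocks, datatype="DNA"):
    """Format (name, start, end) blocks as RAxML-style partition lines.

    Positions are 1-based and inclusive.
    """
    lines = []
    for name, start, end in blocks:
        if start < 1 or end < start:
            raise ValueError(f"bad range for {name}: {start}-{end}")
        lines.append(f"{datatype}, {name} = {start}-{end}")
    return lines


# Hypothetical boundaries for illustration only.
example_blocks = [("part1", 1, 250), ("part2", 251, 410), ("part3", 411, 650)]
partition_lines = raxml_partition_lines(example_blocks)
print("\n".join(partition_lines))
```

Each partition gets its own set of model parameters in the analysis, so the boundaries should match real regions of your alignment.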
Looking at your results
In the Tasks window, you can see if your analysis has finished by clicking the Refresh Tasks button. You should have two files in your Output when the analysis has finished. Click on View (2) and download the file called RAxML_bipartitions.result. Move this file to your working directory and rename it RAxML_bipartitions.tre. We will now open the file with FigTree.
When prompted, name the branch labels bootstrap.
Reroot the tree using one of the outgroup taxa and the Reroot button in the menu across the top of your screen.
Ladderize the tree using the Order nodes function under Trees (left menu).
Finally, show bootstrap values on branches by selecting the appropriate measure in the Branch Labels menu. Can you see which clades are well supported and which ones are not?
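If you would like to list the support values outside FigTree, the sketch below pulls them from a Newick string. It assumes supports are written as numeric labels directly after a closing parenthesis (for example ")95:0.01"), which is how they appear in a bipartitions tree. The toy tree and the cutoff of 70 are assumptions for illustration; 70 is only a common rule of thumb, not a fixed standard.

```python
import re


def support_values(newick):
    """Return internal-node support labels in the order they appear.

    Assumes each support is a number written right after a ')' character.
    """
    return [float(m) for m in re.findall(r"\)(\d+(?:\.\d+)?)", newick)]


# Toy tree for illustration only (not a Clusia result).
toy_tree = "((A:0.1,B:0.2)100:0.05,(C:0.1,(D:0.1,E:0.1)62:0.02)88:0.03,F:0.3);"

supports = support_values(toy_tree)
well_supported = [s for s in supports if s >= 70]  # assumed cutoff
print(supports)
print(well_supported)
```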
Using Garli 0.95 for ML analyses
In this set of analyses, we will use the rate matrices you generated last week to specify a model of nucleotide evolution for your analyses. Garli is really fast but has been criticized for getting stuck in local optima, so we will run our analyses five different times and compare the likelihood scores of our runs.
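Comparing the five runs comes down to finding the highest log-likelihood (the value closest to zero) and seeing how far the other runs fall below it. A minimal sketch follows; the scores are made up for illustration and should be replaced with the values you record from your own runs.

```python
def summarize_runs(lnl_by_run):
    """Return the best run and each run's shortfall from the best score.

    Log-likelihoods are negative, so the best run has the largest value.
    """
    best_run = max(lnl_by_run, key=lnl_by_run.get)
    best_lnl = lnl_by_run[best_run]
    gaps = {run: best_lnl - lnl for run, lnl in lnl_by_run.items()}
    return best_run, gaps


# Made-up scores for illustration only.
scores = {
    "Run 1": -3861.42,
    "Run 2": -3859.97,
    "Run 3": -3861.40,
    "Run 4": -3859.97,
    "Run 5": -3866.10,
}

best_run, gaps = summarize_runs(scores)
print("Best run:", best_run)
for run, gap in gaps.items():
    print(f"{run}: {gap:.2f} log-likelihood units below the best")
```

Runs that land on nearly the same score suggest the search found the same optimum; a run that falls well below the others may have stalled in a local optimum.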
Execute the Garli 0.95 application by clicking on the icon. From the File menu, choose Open and locate your Clusia_total.nex file.
In the General screen, there is an option to set Bootstrap repetitions. We won’t use it now, but you can use this option in the future to estimate statistical support for the clades in a particular phylogeny estimate.
Go to the Model tab and set the Substitution model, Base frequencies, and Among-site variation using the GTR+I+G model you estimated last week in class:
unconstrained loglikelihood: -3859.258
Proportion of invariant sites: 0.09850988
Discrete gamma model
Number of rate categories: 4
Shape parameter: 0.6157474
Rate matrix:
         a         c         g        t
a 0.000000 1.1937673 3.5368278 1.221226
c 1.193767 0.0000000 0.5343681 6.839743
g 3.536828 0.5343681 0.0000000 1.000000
t 1.221226 6.8397431 1.0000000 0.000000
Base frequencies (a, c, g, t):
0.235116 0.2665526 0.2775702 0.2207611
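To see how these numbers fit together, the sketch below builds a GTR instantaneous rate matrix Q from the six exchangeabilities and the four base frequencies listed above. It uses the standard textbook construction, q_ij = r_ij * pi_j for i not equal to j, with each diagonal set so the row sums to zero, and then rescales Q to one expected substitution per site. The function and variable names are my own, and this is not a claim about how Garli or R scale the matrix internally.

```python
STATES = "acgt"

# Exchangeabilities taken from the rate matrix above.
RATES = {
    ("a", "c"): 1.1937673, ("a", "g"): 3.5368278, ("a", "t"): 1.221226,
    ("c", "g"): 0.5343681, ("c", "t"): 6.8397431, ("g", "t"): 1.0,
}
# Base frequencies in the order a, c, g, t.
FREQS = {"a": 0.235116, "c": 0.2665526, "g": 0.2775702, "t": 0.2207611}


def gtr_q_matrix(rates, freqs, states=STATES):
    """Build a GTR rate matrix scaled to one expected substitution per site."""
    n = len(states)
    q = [[0.0] * n for _ in range(n)]
    for i, x in enumerate(states):
        for j, y in enumerate(states):
            if i != j:
                # Exchangeabilities are symmetric, so look up either order.
                r = rates.get((x, y), rates.get((y, x)))
                q[i][j] = r * freqs[y]
        q[i][i] = -sum(q[i])  # diagonal makes the row sum to zero
    # Mean substitution rate at equilibrium, used for rescaling.
    mu = -sum(freqs[x] * q[i][i] for i, x in enumerate(states))
    return [[value / mu for value in row] for row in q]


Q = gtr_q_matrix(RATES, FREQS)
for state, row in zip(STATES, Q):
    print(state, " ".join(f"{value:9.5f}" for value in row))
```

The gamma shape parameter and the proportion of invariant sites are not part of Q; they describe how the overall rate varies among sites.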
Next, go to the Run tab and start the analysis. What you will see in the Run window are the real-time results of your analysis. What is being shown here?
| | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 |
|---|---|---|---|---|---|
| Length of run: | | | | | |
Viewing the results
The Garli phylogeny estimate will be output to your working directory! Navigate to your working directory and open the file called Clusia_total.best.tre in FigTree.
How does this tree compare to your RAxML tree?
For your homework, I will be providing you with instructions for running PhyML analyses. For now, please do the following:
- Run an additional analysis in Garli with 100 bootstrap replicates. Open the result in FigTree and modify the tree to show statistical bootstrap support. Root and ladderize the tree. Play around with branch thickness, font size and type, and some of the other options for visualizing trees.
- Using the Gustafsson and Bittrich (2002) tree showing subgenera, highlight each of the subgeneric clades in your Garli and RAxML trees. When you are satisfied with the amount and type of information shown in your trees, export the trees as image files and send them to me via email.
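As background for the bootstrap analysis in the first homework item, a bootstrap replicate is a new alignment built by sampling columns (sites) from the original alignment with replacement; a tree is then estimated from each replicate. The sketch below shows only the resampling step. It illustrates the concept and is not Garli's implementation; the toy alignment, the function name, and the random seed are assumptions.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate by resampling columns with replacement.

    `alignment` maps taxon names to aligned sequences of equal length.
    """
    nchar = len(next(iter(alignment.values())))
    columns = [rng.randrange(nchar) for _ in range(nchar)]
    return {name: "".join(seq[c] for c in columns)
            for name, seq in alignment.items()}


# Toy alignment for illustration only.
toy_alignment = {
    "taxonA": "ACGTACGT",
    "taxonB": "ACGTACTT",
    "taxonC": "ACGAACGT",
}

rng = random.Random(1)  # fixed seed so the example is repeatable
replicates = [bootstrap_alignment(toy_alignment, rng) for _ in range(100)]
print(replicates[0])
```

The bootstrap value on a branch is the percentage of replicate trees that contain that clade.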
**Gustafsson & Bittrich. 2002. Evolution of morphological diversity and resin secretion in flowers of Clusia (Clusiaceae): insights from ITS sequence variation. Nordic Journal of Botany 22(2): 183-203.