Prelab 2: Getting the software to work!
This week in lab we will begin to look at phylogenetic data using the computer programs Mesquite, R, and R Studio. You will need to download the software and conduct two simple analyses before class next week. Here goes!
Download instructions:
1. Navigate to http://mesquiteproject.org/mesquite/download/download.html in your internet browser and download the latest version of Mesquite for your operating system by clicking on MacOS or Windows. In the next window, you may be instructed to download the latest version of Java (go to java.com). When you’re ready, proceed to download the 1GB version of Mesquite—this version will allow you to dedicate more of your memory to Mesquite when running analyses. Read the installation directions and contact me if you run into any issues. I’m happy to help you with installation.
2. Navigate to http://www.r-project.org/ and follow the Download link to download R version 3.0.1 (“Good Sport”). You will be prompted to choose a CRAN (Comprehensive R Archive Network) mirror; choose one nearby by scrolling down to U.S.A. and clicking on UC Berkeley or UCLA.
- If you are running Mac OS, follow the link to download R-3.0.1.pkg and follow the installation instructions. Make sure that R ends up in your Applications folder.
- If you are running Windows, follow the link called base to download the R installer. Make sure that R is installed in your Program Files folder.
3. Next, download R Studio by navigating to http://www.rstudio.com and following the download link for your OS (Mac or Windows). If you are running a version of Mac OS earlier than 10.6, let me know and we’ll figure out what to do. Once it is installed, make a shortcut to R Studio on your desktop (Windows) or drag R Studio to your Dock (Mac) so that you can open the program easily. R Studio should find your R installation automatically when you open it; if it doesn’t, please contact Diana or Mark ASAP.
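Once R Studio is open, you can confirm which version of R it found by typing the lines below into the console. This is a small base-R sketch; the 3.0.1 threshold is simply the version named in this prelab.

```r
# Print the version of R that R Studio is using, e.g. "R version 3.0.1 (...)".
cat(R.version.string, "\n")

# Stop with an error if this R is older than the version the prelab assumes.
if (getRversion() < "3.0.1") {
  stop("This R is older than 3.0.1; please install a newer version of R.")
}
```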
Assignments:
1. Creating a data set. Locate the paper data matrix that you created for the Caminalcules lab project. Each of you should have a matrix of approximately 14 taxa by 15+ characters. We will begin by recreating this dataset in Microsoft Excel. Using the attached file, Caminalcule_data.xlsx, replace the column names with short names for your characters (like mine). It is probably easiest for one person to read the character state strings aloud while another enters the values into the spreadsheet (fewer errors!). IMPORTANT: Create a folder called Caminalcules in your Desktop or Documents directory, and put all your files in this folder. When you save your Excel data matrix, use Save As… to save it as a tab-delimited text file (Caminalcule_data.txt) in your Caminalcules folder.
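Before moving to Mesquite, it can help to confirm that the tab-delimited export has the size you expect. The sketch below uses base R only; check_matrix is a hypothetical helper (not one of the prelab files), and it assumes the first column of your spreadsheet holds taxon names and the first row holds character names.

```r
# Hypothetical helper: read a tab-delimited taxon-by-character matrix and
# report its size. Assumes column 1 = taxon names, row 1 = character names.
check_matrix <- function(path) {
  m <- read.delim(path, row.names = 1, colClasses = "character",
                  check.names = FALSE)  # keep states such as "-1" as text
  cat("Taxa:", nrow(m), "| Characters:", ncol(m), "\n")
  invisible(m)
}

# Demonstration on a tiny made-up matrix written to a temporary file.
demo_file <- tempfile(fileext = ".txt")
writeLines(c("Taxon\tC1\tC2\tC3",
             "T1\t1\t0\t-1",
             "T2\t0\t1\t2"), demo_file)
demo <- check_matrix(demo_file)
```

With your own data you would call check_matrix() on the full path to Caminalcule_data.txt (or just the file name, once the working directory is set in Assignment 3); it should report roughly 14 taxa and 15+ characters.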
2. Trees & character tracing in Mesquite. Mesquite is a software package for the study of evolutionary biology, with an emphasis on phylogenetic analysis. It incorporates a series of modules, each with a different function, but at its core are a spreadsheet for a taxon-by-character matrix and a tree window for manipulating phylogenetic trees. Mesquite does not estimate phylogenies from a data matrix (though it has some limited functions for branch-swapping), but it does allow you to analyze your data in the context of a phylogeny.
To begin, click on the Mesquite icon to start the program; it typically takes a minute or two to start up. Under the File menu, choose Open File…, locate Caminalcule_data.txt, and open it. When you are prompted to indicate what kind of data file this is, choose Tab-delimited categorical data file. The first line of your file contains character names, so click Yes when prompted. (At this point you may be prompted to save a file named “Caminalcule_data.txt.nex”; save it as “Caminalcule_data.nex” instead.) You will see a Project window. Take a moment to explore it. Note that there are rows titled Taxa and Character Matrix, and that under the title Taxa the number of taxa in the matrix is given (14). If you click on List & Manage Taxa, a new tab will open showing the taxon names (in the Taxon column) and columns for Taxon order and Group. Close the Taxa tab by clicking the x in its upper right corner; you should return to the Project window.
View your data matrix by clicking on Show Matrix, in the Character Matrix row. Move the cursor over the icons. Note that when the cursor is over an icon, the function of that icon is reported in the bar at the bottom of the window.
Okay, let’s quickly conduct a simplified tree estimate so that you can at least find your way around the program. Under the Taxa&Trees menu, choose Make new trees block from, then Tree Search, then Mesquite Heuristic (add & rearrange).
3. Using R Studio to run analyses in R. Open R Studio by clicking on its icon. The R Studio window should be divided into four quadrants (modules). The bottom left module is your R console. In the console, you should see that R version 3.0.1 (“Good Sport”) has been loaded automatically.
There are several tabs in your bottom right module. Click on the Packages tab to view a list of the packages that were installed automatically with R 3.0.1. Right now, we are interested in a package called ‘phangorn’ (Schliep & Paradis, 2013) that, among several other analyses, will let us estimate a tree using the parsimony criterion. To install this package, click on Install Packages (just under the Packages tab); a pop-up window should appear where you can enter the name of the package you want to install. At the bottom of the pop-up window should be a box called Install dependencies.
!! Make sure this box is checked !!
Next, enter the name phangorn in the Package field and click Install. You should be able to watch the progress of the installation in your R console (bottom left). When installation is complete, go to the list of Packages and make sure that the boxes next to the following packages are checked: phangorn, ape, igraph, lattice, Matrix, rgl
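If you prefer typing to clicking, phangorn can also be installed and loaded from the console. This is a sketch: the mirror URL https://cloud.r-project.org is an assumed choice (any CRAN mirror, such as UC Berkeley or UCLA, works), and by default install.packages also installs the packages phangorn requires.

```r
# Install phangorn only if it is not already available.
if (!requireNamespace("phangorn", quietly = TRUE)) {
  # Assumed mirror; replace with any CRAN mirror you like.
  install.packages("phangorn", repos = "https://cloud.r-project.org")
}

# Same effect as ticking the phangorn box in the Packages tab.
# phangorn depends on ape, so ape is attached as well.
library(phangorn)
```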
The last thing we want to do for now is to load a dataset. In the bottom right module, go to the Files tab to see a list of directories on your computer’s hard drive. Navigate to the directory that contains your Caminalcules folder and click on the Caminalcules folder. You should see the tab-delimited (.txt) and nexus (.nex) files that you’ve made so far.
Find the menu icon near the Files tab called More and select it to view the dropdown menu. Then select Set As Working Directory. In your R console, you should see the command setwd appear with the path to your Caminalcules folder. You have just specified that any files you use for analyses (e.g., your matrix) or save as results of your analyses (e.g., trees) will be read from and written to this folder. Good job!
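The same step can be done by typing in the console. The sketch below wraps setwd() in a hypothetical helper, set_project_dir, that also lists the .txt and .nex files it finds, so you can confirm you are in the right folder.

```r
# Hypothetical helper: make `dir` the working directory (what
# "Set As Working Directory" does) and list the .txt/.nex files in it.
set_project_dir <- function(dir) {
  if (!dir.exists(dir)) stop("Folder not found: ", dir)
  setwd(dir)
  list.files(pattern = "\\.(txt|nex)$")
}
```

For example, set_project_dir("~/Desktop/Caminalcules") assumes the folder sits on your Desktop; adjust the path if you put it in Documents. Typing getwd() afterwards prints the current working directory.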
Return to your R console (bottom left).
Next, copy the following text and paste it into the console after the prompt (>):
cam <- read.nexus.data("Caminalcule_data.nex")
To execute the command, press Return. You have just created an object in R called cam; this object is your character matrix. Let’s take a look at it by typing this into your console:
cam
Your matrix should look something like this:
$C1
[1] "1" "0" "0" "0" "0" "1" "0" "0"
$C2
[1] "0" "1" "1" "1" "1" "0" "1" "1"
$C3
[1] "0" "0" "0" "1" "0" "0" "0" "1"
$C4
[1] "0" "1" "1" "0" "1" "0" "1" "0"
$C9
[1] "0" "1" "1" "0" "0" "0" "1" "1"
$C12
[1] "1" "1" "0" "1" "1" "0" "0" "0"
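read.nexus.data() returns a named list with one vector of character states per taxon, which is why cam prints as $C1, $C2, and so on. To see it as a taxon-by-character table, you can stack those vectors with base R. The sketch uses a small stand-in, cam_demo, copied from the first rows of the example output; with your own data you would use do.call(rbind, cam).

```r
# A small stand-in for `cam`: one vector of character states per taxon.
cam_demo <- list(C1 = c("1", "0", "0", "0", "0", "1", "0", "0"),
                 C2 = c("0", "1", "1", "1", "1", "0", "1", "1"),
                 C3 = c("0", "0", "0", "1", "0", "0", "0", "1"))

# Stack the vectors row by row into a taxa-by-characters matrix.
cam_table <- do.call(rbind, cam_demo)
print(cam_table)
dim(cam_table)    # rows = taxa, columns = characters
table(cam_table)  # how often each character state occurs
```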
It’s not yet in a format that we can use for parsimony analysis, so let’s convert it by making a new object, cam1, that lists the character states used in the matrix (here -3 through 3):
cam1 <- phyDat(cam, type = "USER", levels = c("-3", "-2", "-1", "0", "1", "2", "3"), return.index = TRUE)
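The levels argument should cover every state that actually occurs in your matrix. A stray missing-data code such as "?" or a typo like "11" is worth finding before running phyDat(); see the phyDat() help page for how phangorn treats codes that are not in levels. The base-R sketch below uses a hypothetical helper, undeclared_states, to list any such codes.

```r
# Hypothetical helper: return the states in `x` (a list like `cam`) that are
# not among the `levels` passed to phyDat(). An empty result means every
# state in the matrix was declared.
undeclared_states <- function(x, levels) {
  setdiff(unique(unlist(x, use.names = FALSE)), levels)
}

state_levels <- c("-3", "-2", "-1", "0", "1", "2", "3")
```

With your data, undeclared_states(cam, state_levels) should return character(0).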
You can check the size of your new object (number of taxa and characters) by typing this into your console:
cam1
To estimate your first tree, we will use a simple method, Fitch parsimony. To do this, we’ll create a new object (your tree):
tree <- random.addition(cam1, method = "fitch")
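Roughly speaking, random.addition() builds a tree by adding taxa one at a time in random order, placing each where it keeps the Fitch parsimony length lowest. To make "Fitch parsimony" concrete, here is a toy base-R sketch of Fitch’s counting rule for small rooted binary trees written as nested lists. It is an illustration only, not phangorn’s implementation; the tree format, function names, and example matrix are all hypothetical.

```r
# Toy Fitch parsimony for ONE character on a rooted binary tree.
# A leaf is a taxon name (string); an internal node is list(left, right).
fitch_score <- function(node, states) {
  if (is.character(node)) {
    # Leaf: its state set is its observed state; no changes counted yet.
    return(list(set = states[[node]], cost = 0))
  }
  left <- fitch_score(node[[1]], states)
  right <- fitch_score(node[[2]], states)
  shared <- intersect(left$set, right$set)
  if (length(shared) > 0) {
    # Children can agree: keep the shared states, no extra change.
    list(set = shared, cost = left$cost + right$cost)
  } else {
    # Children disagree: take the union and count one state change.
    list(set = union(left$set, right$set), cost = left$cost + right$cost + 1)
  }
}

# Tree length = sum of Fitch costs over all characters (columns of `mat`);
# row names of `mat` must match the taxon names used in the tree.
fitch_length <- function(tree, mat) {
  sum(apply(mat, 2, function(col) fitch_score(tree, col)$cost))
}

# Made-up example: 4 taxa, 2 characters, two candidate trees.
mat <- rbind(A = c("0", "0"), B = c("0", "1"),
             C = c("1", "1"), D = c("1", "0"))
tree1 <- list(list("A", "B"), list("C", "D"))
tree2 <- list(list("A", "C"), list("B", "D"))
fitch_length(tree1, mat)  # 3 changes
fitch_length(tree2, mat)  # 4 changes, so tree1 is more parsimonious
```

For your real tree, phangorn reports this kind of score with parsimony(tree, cam1). Because taxa are added in a random order, running random.addition() again may return a different tree.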
To view your first tree in R, use the command:
plot(tree)
GREAT WORK! SEE YOU IN CLASS!