Prelab 3. Taxonomy and Phylogeny

The relationship between taxonomy and phylogeny is not always clear. On the one hand, we want our taxa and their names to reflect phylogeny, and more specifically monophyly. On the other hand, many plant groups show major discordance between phylogeny and morphology because of convergent evolution. One of our principal responsibilities as plant taxonomists is to disseminate information in a form that is useful to the people who need it. In the absence of hand-held barcoding devices (and the barcodes themselves), people rely on taxonomic keys that are based on morphology.

How do I make a key based on morphology?

You already know that discovering morphological characters that are also synapomorphies is tricky. Choosing characters to analyze is difficult, but making sure that you include an appropriate sample of plants in your analysis is also important!

When sampling, you should consider:

  1. Am I sampling from the entire known range of this taxon? Have I sampled across the different habitat types it occurs in?
  2. Have I collected a large enough number of individuals to constitute a statistical sample?
  3. Have I included the range of morphological variation exhibited by this taxon?

If you feel confident in these respects, it is time to choose the characters for analysis. Morphological analyses are conducted for many different reasons, to answer many different kinds of evolutionary questions. Therefore, the characters you choose to analyze will be ones that are relevant to your particular hypothesis.

In this example, I will demonstrate one way that I chose to address the specific question, “Can any morphological characters, or suites of characters, be used to distinguish among Californian Pyrola species?”

I am going to summarize the first few steps of the process.

1. Ask yourself: Have I observed variation in any particular characteristics? Do I expect variation in any particular characteristics as a result of natural selection? How much time do I have to complete this project? How many measurements per plant can I afford to make while still measuring enough individuals to constitute a statistical sample?

2. After deciding on some characters, begin your measurements. Measure only mature structures so that size differences among plants are not simply due to differences in developmental stage. When you have made several measurements of the same kind of structure on one plant (for example, there may be several full-size petals that can be used to measure length), calculate the mean for this character. Multiple measurements per plant give a more accurate estimate of size for that individual, but they are not independent data points because they all come from a single individual.
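The averaging in step 2 can be sketched in R as follows. The plant IDs, the petal_length column, and the values are all made up for illustration; they are not from the lab data set.

```r
# Hypothetical repeated measurements: three petals measured on each of three plants.
# All names and values below are illustrative only.
petals <- data.frame(
  plant_id     = rep(c("P01", "P02", "P03"), each = 3),
  petal_length = c(6.1, 6.3, 6.2,  5.4, 5.6, 5.5,  7.0, 6.8, 6.9)
)

# Collapse to one mean per plant, so each individual contributes one data point.
plant_means <- aggregate(petal_length ~ plant_id, data = petals, FUN = mean)
print(plant_means)
```

The result has one row per plant, which is the form you want before entering data into the spreadsheet.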

3. Enter your data into a spreadsheet (see attached Excel data file) with different characters in columns and individuals as rows. You may also want to include information about which subtaxon the individuals belong to or what geographic area they are from—this information can be included as additional columns.

Save your spreadsheet as a comma-separated values (.csv) file for later use in R.
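If you want to check what R will see, here is a small sketch of the layout described in step 3, written to a temporary .csv and read back. The column names and values are hypothetical, apart from TAXON, which is the grouping column used later in this exercise.

```r
# Hypothetical layout: one row per individual, one column per character,
# plus identifying columns. Names and values are illustrative only.
morph <- data.frame(
  ID           = c("P01", "P02", "P03"),
  TAXON        = c("A", "A", "B"),
  petal_length = c(6.2, 5.5, 6.9),
  leaf_width   = c(21.0, 19.5, 25.3)
)

# Write to a temporary .csv (without row names), then read it back
# to confirm that the rows and columns survive the round trip.
csv_path <- tempfile(fileext = ".csv")
write.csv(morph, csv_path, row.names = FALSE)
check <- read.csv(csv_path)
str(check)
```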

Now for today’s exercise—switching gears…

As usual, set up a working directory for your project before you launch RStudio. Make sure the dataset C_robustipina_2004_partial.csv is in your working directory.

Start RStudio.

Under the Files tab, find your project folder and set your working directory in the More… menu.

Install the following packages (if you don’t have them already) and load them by checking the boxes next to their names: gplots, vegan
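The same step can be done from the console instead of the Packages tab. This is a minimal sketch, assuming you have an internet connection and a CRAN mirror available:

```r
# Install each package only if it is not already present on this computer.
for (pkg in c("gplots", "vegan")) {
  if (!requireNamespace(pkg, quietly = TRUE)) install.packages(pkg)
}

# Load the packages for this session (equivalent to checking the boxes).
library(gplots)
library(vegan)
```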

Import the data set provided (or your own) using the read.csv command:

data <- read.csv("C_robustipina_2004_partial.csv")

Check the dimensions and column names (they should match those in the table below) for the data set to make sure that you imported the correct data set.
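One way to run this check is sketched below. So that the example runs without the lab file, it uses R's built-in iris data as a stand-in (the name example_data is mine); with the lab file, apply the same three functions to data.

```r
# Stand-in data: R's built-in iris measurements, used only so that this
# example runs on its own. For the lab, replace example_data with data.
example_data <- iris

dim(example_data)     # number of rows (individuals) and number of columns
names(example_data)   # column names, to compare against the expected ones
head(example_data)    # first six rows, as a quick sanity check
```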


If the dimensions and column names of the data look right, then continue to the next step.

Summary statistics

First, let’s examine some basic descriptive statistics for your characters. As you can see, columns 1:3 contain information concerning sample identity, while columns 4:19 contain the morphological measurements associated with those samples. Use the summary command to calculate summary statistics for each of the morphological characters:
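A minimal sketch of that call, again using the built-in iris data as a stand-in (its measurements are in columns 1:4 rather than 4:19):

```r
example_data <- iris   # stand-in; for the lab, use data

# Summary statistics for the measurement columns only.
# With the lab data, the equivalent call would be: summary(data[, 4:19])
stats <- summary(example_data[, 1:4])
print(stats)

# Mean minus median for each character: values far from zero hint at skew.
mean_minus_median <- sapply(example_data[, 1:4],
                            function(v) mean(v) - median(v))
print(mean_minus_median)
```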


For each character you should see several statistics, including the minimum measurement (Min.), the first quartile (1st Qu.), Median, Mean, the third quartile (3rd Qu.), and the maximum measurement (Max.).

Is the range of each character given?

What can we tell about the distribution of the data by comparing the median with the mean?

Checking characters for normality and correlation

Next, let’s make a script. Go to the top right corner of your RStudio screen and find the menu to make a new R Script (right). Paste the following text into the new window. Use the Save icon to store your new script in your working directory as paired.hist.cor

# Upper panels: print the absolute correlation, with text size scaled by its strength
panel.cor <- function(x, y, digits = 2, prefix = "", cex.cor, ...)
{
  usr <- par("usr")
  par(usr = c(0, 1, 0, 1))
  r <- abs(cor(x, y))
  txt <- format(c(r, 0.123456789), digits = digits)[1]
  txt <- paste0(prefix, txt)
  if (missing(cex.cor))
    cex.cor <- 0.85/strwidth(txt)
  text(0.5, 0.5, txt, cex = cex.cor * r)
}

# Diagonal panels: draw a histogram of each character, scaled to fit the panel
panel.hist <- function(x, ...)
{
  usr <- par("usr")
  par(usr = c(usr[1:2], 0, 1.5))
  h <- hist(x, plot = FALSE)
  breaks <- h$breaks
  nB <- length(breaks)
  y <- h$counts
  y <- y/max(y)
  rect(breaks[-nB], 0, breaks[-1], y, col = "cyan", ...)
}

Now that you’ve got your script, run it. Highlight the entire script (including the curly brackets) and click the Run icon in the small menu above your script window. This sends your script to the R console and defines the two functions, panel.cor and panel.hist; nothing is plotted yet. You may need to press Return on your keyboard to finish running the script.

Now that you’ve set up how you want to look at your characters, let’s access the character data by entering the following command:

pairs(data[,4:19], lower.panel=panel.smooth, diag.panel=panel.hist, upper.panel=panel.cor)

You can see from the command above that I specified only the morphological measurements, columns 4:19. In the plot window, the data are reported in three different ways: along the diagonal are histograms for each character.

  • Are the characters normally distributed?
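If a histogram is hard to judge by eye, a normal Q-Q plot and a Shapiro-Wilk test are two common follow-up checks. This sketch is my suggestion rather than part of the original exercise, and it uses one column of the built-in iris data as a stand-in for one of your characters.

```r
example_data <- iris              # stand-in; for the lab, use data
x <- example_data$Sepal.Width     # one character at a time

# Q-Q plot: points falling close to the line suggest approximate normality.
qqnorm(x)
qqline(x)

# Shapiro-Wilk test: a small p-value suggests departure from normality.
sw <- shapiro.test(x)
print(sw)
```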

Below the diagonal, pairs of characters are plotted against each other. Each point represents one individual, positioned by its values for the two characters (for example, AL and APD). A smoothed trend line (drawn by panel.smooth) is plotted through the points to show the relationship between each pair of characters across all of the individuals measured. Note that each point represents the mean value of a character for an individual, not every measurement you made on that individual. Above the diagonal are the correlation coefficients for each pair of characters, shown as absolute values. Our script scales the font size to the strength of the correlation (0 = uncorrelated, 1 = perfectly correlated).

  • Which character(s) show the highest degree of correlation?
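With 16 characters the largest numbers can be hard to spot in the plot, so you can also pull them out of the correlation matrix directly. This is a sketch using the built-in iris data as a stand-in; the object names are my own.

```r
example_data <- iris   # stand-in; for the lab, use data[, 4:19] below

# Absolute correlations, as printed by panel.cor.
r <- abs(cor(example_data[, 1:4]))

# Keep each pair once and drop the self-correlations on the diagonal.
r[upper.tri(r, diag = TRUE)] <- NA

# Locate the largest remaining value and report the pair of characters.
best <- which(r == max(r, na.rm = TRUE), arr.ind = TRUE)
top <- data.frame(char1 = rownames(r)[best[, "row"]],
                  char2 = colnames(r)[best[, "col"]],
                  abs_r = r[best])
print(top)
```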

Introduction to Ordination Analyses

One of the ways we assess the importance of particular characters or their ability to distinguish among taxa is by conducting ordination analyses on multiple characters at once. This affords us the chance to measure many characters without bias and find out later, via multivariate statistical analyses, which characters are useful. We will be talking about ordination techniques in class, but let’s make a simple ordination plot for now using the characters that meet the criterion of being (relatively) normally distributed. Enter the following commands to specify the dataset and function:
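The plotting commands that follow use an object called pca with a scores element, which is what princomp() returns, so a sketch along these lines should fit. Which columns to include is your call after inspecting the histograms. The iris columns here are a stand-in, and cor = TRUE is my assumption, not necessarily the setting intended for this lab.

```r
example_data <- iris             # stand-in; for the lab, use data
chars <- example_data[, 1:4]     # the characters you judged roughly normal

# cor = TRUE (an assumption here) standardizes the characters so that those
# measured on larger scales do not dominate the analysis.
example_pca <- princomp(chars, cor = TRUE)

summary(example_pca)    # proportion of variance explained by each component
loadings(example_pca)   # how each character contributes to each component

# For the lab, store the result as pca so the later commands work, e.g.
# pca <- princomp(data[, c(...your chosen columns...)], cor = TRUE)
```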


To view an ordination plot, you may use the function:

biplot(pca, pch=19)

What you are seeing here is the arrangement of your samples that best explains overall variation among the characters we used from the dataset. The numbers are row numbers (one per sample), and the arrows show how variation in each original character is associated with the transformed characters (components). Let’s look at how each of our sub-taxa is distributed in this plot:

# as.factor() lets R assign a color to each taxon even if TAXON was read in as text
plot(pca$scores, col = as.factor(data$TAXON))
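A legend makes the colors easier to interpret. The sketch below uses the built-in iris data as a stand-in, with Species playing the role of TAXON; the axis labels and legend position are my own choices.

```r
example_data <- iris                        # stand-in; for the lab, use data
example_pca <- princomp(example_data[, 1:4], cor = TRUE)
taxon <- as.factor(example_data$Species)    # data$TAXON in the lab data

# Plot the first two components, one color per taxon.
plot(example_pca$scores[, 1:2], col = taxon, pch = 19,
     xlab = "Component 1", ylab = "Component 2")

# Factor levels are colored 1, 2, 3, ... so the legend uses the same codes.
legend("topright", legend = levels(taxon),
       col = seq_along(levels(taxon)), pch = 19)
```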