ECB-ART-42909
PLoS One January 1, 2013; 8 (6): e66245.
A simple method for estimating informative node age priors for the fossil calibration of molecular divergence time analyses.
Molecular divergence time analyses often rely on the age of fossil lineages to calibrate node age estimates. Most divergence time analyses are now performed in a Bayesian framework, where fossil calibrations are incorporated as parametric prior probabilities on node ages. It is widely accepted that an ideal parameterization of such node age prior probabilities should be based on a comprehensive analysis of the fossil record of the clade of interest, but there is currently no generally applicable approach for calculating such informative priors. We provide here a simple and easily implemented method that employs fossil data to estimate the likely amount of missing history prior to the oldest fossil occurrence of a clade, which can be used to fit an informative parametric prior probability distribution on a node age. Specifically, our method uses the extant diversity and the stratigraphic distribution of fossil lineages confidently assigned to a clade to fit a branching model of lineage diversification. Conditioning this on a simple model of fossil preservation, we estimate the likely amount of missing history prior to the oldest fossil occurrence of a clade. The likelihood surface of missing history can then be translated into a parametric prior probability distribution on the age of the clade of interest. We show that the method performs well with simulated fossil distribution data, but that the likelihood surface of missing history can at times be too complex for the distribution-fitting algorithm employed by our software tool. An empirical example of the application of our method is performed to estimate echinoid node ages. A simulation-based sensitivity analysis using the echinoid data set shows that node age prior distributions estimated under poor preservation rates are significantly less informative than those estimated under high preservation rates.
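The last step described in the abstract, translating a likelihood surface of missing history into a parametric prior, can be illustrated with a minimal sketch. The abstract does not describe the distribution-fitting algorithm used by the authors' software, so the code below substitutes a simple method-of-moments gamma fit. The function name `fit_gamma_by_moments`, the grid, and the toy likelihood values are all hypothetical and chosen only for illustration.

```python
import math


def fit_gamma_by_moments(ages, likelihoods, oldest_fossil_age):
    """Moment-match a gamma distribution to a discretized likelihood curve.

    ASSUMPTION: this is a method-of-moments stand-in, not the fitting
    routine of the published software.

    ages              -- candidate node ages (Ma), each >= oldest_fossil_age
    likelihoods       -- non-negative likelihood evaluated at each age
    oldest_fossil_age -- age (Ma) of the oldest fossil, used as the offset

    Returns (shape, scale, offset); the gamma describes the missing
    history, i.e. node age minus offset.
    """
    total = sum(likelihoods)
    if total <= 0:
        raise ValueError("likelihood curve must have positive mass")

    # Normalize the curve so it can be treated as a discrete distribution.
    weights = [lik / total for lik in likelihoods]
    missing = [age - oldest_fossil_age for age in ages]

    mean = sum(w * m for w, m in zip(weights, missing))
    var = sum(w * (m - mean) ** 2 for w, m in zip(weights, missing))
    if mean <= 0 or var <= 0:
        raise ValueError("curve is too degenerate to fit a gamma")

    # For a gamma distribution: mean = shape * scale, var = shape * scale**2.
    shape = mean ** 2 / var
    scale = var / mean
    return shape, scale, oldest_fossil_age


# Toy example with made-up likelihood values (not data from the paper).
oldest = 55.8  # Ma
grid = [0.05 + 0.1 * i for i in range(1000)]            # missing history, Myr
toy_curve = [m * math.exp(-m / 4.0) for m in grid]      # invented shape
shape, scale, offset = fit_gamma_by_moments(
    [oldest + m for m in grid], toy_curve, oldest
)
print(f"gamma(shape={shape:.2f}, scale={scale:.2f}) offset by {offset} Ma")
```

Moment matching cannot follow a multimodal or otherwise complex likelihood surface, which is the kind of case the abstract flags as difficult for distribution fitting.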
PubMed ID: 23755303
PMC ID: PMC3673923
Article link: PLoS One
Species referenced: Echinodermata
Genes referenced: hpd irak1bp1
Article Images:
|Figure 1. Simplified Diagram of the Model. Our method provides an estimate for the length of time after the age of the MRCA of a clade but prior to the age of the oldest fossil (i.e. the missing history). This hypothetical clade has N = 11 lineages at time T, representing the current standing diversity of the group. Thick bars on the internal branches of the tree represent the preserved fossil history of the clade, such that n = 1 lineage is preserved at time t. The expressions for deriving the probability of the three key temporal durations in the history of a clade are shown.|
|Figure 2. Example informative divergence time priors estimated with the SNAPE v1.0 software. These likelihood curves and associated best-fit gamma distributions show some of the variation in prior shape that can be estimated using this method. The y-axis scale is the likelihood (or f for the best-fit gamma distribution) and the x-axis is in millions of years ago (MYA). Note that the scales of the discretized likelihood curve and the gamma distribution are not equivalent, so they must be rescaled to assist in visualization. A. Estimated prior distribution for the root node in the echinoid data set. Values of the discretized likelihood curve are shown in black, and the best-fit gamma distribution is shown in red. Horizontal lines representing the 95%, 75%, and 50% quantiles of the discretized likelihood curve are labeled on the figure. The quantile values are shown here only for reference when interpreting the simulation results shown in Figure 3. B. Estimated prior distribution for the MRCA of the mammalian order Rodentia. The input data for this prior estimate were assembled by searching the Paleobiology Database (www.pbdb.org) for all Rodentia occurrences (see File S2). This analysis assumed the existence of 400 extant genera in Rodentia. The oldest Rodentia fossil occurrence that met the input data criteria was 55.8 Ma. The vertical line shows the position of the Cretaceous/Paleogene (K/PG) boundary at 65.5 Ma. The analysis was performed once for each of four preservation rates: 0.1 = black; 0.2 = blue; 0.3 = orange; 0.4 = yellow. The best-fit gamma distribution for the likelihood curve assuming a 0.1 preservation rate is shown in red. This prior for the age of the MRCA of Rodentia was estimated solely for demonstration purposes. The results show that the preservation rate estimate provided by the user can have a large impact on the shape of the estimated prior.|
|Figure 3. Performance of the method with simulated data. For each of three preservation rate categories (0.1, 0.45, and 0.8), a total of 1000 simulation replicates were analyzed using the SNAPE v1.0 software. Method success was defined as the likelihood of the true TMRCA being greater than the 50% quantile of the discretized likelihood curve; the percentage of replicates meeting this standard is shown by the purple bars. The percentage of replicates in which the method failed to meet this standard is shown in red. Replicates that failed due to an inability to calculate origination and extinction rates are shown in black. Simulation replicates in which the method returned a prior in which the likelihood of the true TMRCA was greater than the 75% quantile were considered accurate, and these are shown in blue. Replicates in which the likelihood of the true TMRCA was greater than the 95% quantile were considered highly accurate, and the proportion of replicates meeting this standard is shown in green.|
|Figure 4. Echinoid divergence times estimated using two alternative node age prior calibration schemes. Bars on nodes represent the 95% HPD of the node age and are colored by the two prior calibration schemes used: red bars = uniform priors; blue bars = informative gamma priors; purple = overlap of 95% HPD from both approaches. The tree represents the highest a posteriori chronogram for the analyses run with informative gamma priors, and the nodes are placed at the mean of the posterior distribution of node age. The bright red vertical dash on each node bar represents the mean of that node's age from the posterior distribution of the analyses run with uniform priors. Nodes are numbered as in Table 2, and calibration nodes are indicated with an asterisk. The scale at the bottom of the figure is in millions of years before present (Ma), and the time scale is binned in 50 Ma intervals. The tips of the tree are labeled by genus name as in Smith et al. Posterior clade probabilities are provided in Figure S1.|
|Figure 5. Simulating the impacts of incomplete preservation on the estimation of informative node age priors. To test the sensitivity of our method of prior estimation to the quality of the fossil record (i.e. under varying rates of fossil preservation), we simulated fossil occurrences for all fossil lineages in each of the eight constraint nodes and sub-sampled these under four preservation rates (0.2, 0.4, 0.6, 0.8). We constructed node age priors for each simulated data set, and summarized the results using boxplots of the 95% density of the estimated gamma distributions (measured in millions of years) for each of the four preservation rates grouped by calibration node (following the node numbering scheme in Figure 2, and Tables 1 and 2). Note that higher rates of fossil preservation significantly reduce the 95% density of the gamma distribution, which shows that when provided with data of higher quality (i.e. more meaningful for calibrating the age of the node in question), the method provides a more informative prior distribution. Conversely, when the method is provided with less informative fossil data (i.e. data simulated under a poor preservation rate), it provides a less informative prior distribution, which is thus likely to have less of an impact on the resulting divergence time analysis.|
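The success criteria in the Figure 3 caption compare the likelihood of the true TMRCA against quantiles of the discretized likelihood curve. A minimal sketch of that comparison follows. It assumes the quantiles are taken over the likelihood values on the grid (consistent with the horizontal quantile lines described for Figure 2), uses linear-interpolation quantiles, and reads the likelihood at the grid point nearest the true age; these choices, and the names `quantile`, `likelihood_at`, and `classify_replicate`, are illustrative assumptions rather than details of the published software.

```python
import math


def quantile(values, q):
    """Linear-interpolation quantile of a list of numbers, 0 <= q <= 1."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def likelihood_at(ages, likelihoods, age):
    """Likelihood at the grid point nearest to `age` (an assumption)."""
    nearest = min(range(len(ages)), key=lambda i: abs(ages[i] - age))
    return likelihoods[nearest]


def classify_replicate(ages, likelihoods, true_tmrca):
    """Return the highest tier a replicate reaches under the caption's rules.

    The tiers nest: a "highly accurate" replicate is also "accurate" and a
    "success". Replicates for which no curve could be computed (the black
    bars in the figure) are not handled here.
    """
    lik = likelihood_at(ages, likelihoods, true_tmrca)
    if lik > quantile(likelihoods, 0.95):
        return "highly accurate"
    if lik > quantile(likelihoods, 0.75):
        return "accurate"
    if lik > quantile(likelihoods, 0.50):
        return "success"
    return "failure"
```

Returning only the highest tier keeps the function simple; tallying the nested percentages shown in the figure would mean counting a "highly accurate" replicate toward the lower tiers as well.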