ECBART42909
PLoS One. January 1, 2013; 8(6): e66245.
A simple method for estimating informative node age priors for the fossil calibration of molecular divergence time analyses.
Nowak MD, Smith AB, Simpson C, Zwickl DJ.
Abstract
Molecular divergence time analyses often rely on the age of fossil lineages to calibrate node age estimates. Most divergence time analyses are now performed in a Bayesian framework, where fossil calibrations are incorporated as parametric prior probabilities on node ages. It is widely accepted that an ideal parameterization of such node age prior probabilities should be based on a comprehensive analysis of the fossil record of the clade of interest, but there is currently no generally applicable approach for calculating such informative priors. We provide here a simple and easily implemented method that employs fossil data to estimate the likely amount of missing history prior to the oldest fossil occurrence of a clade, which can be used to fit an informative parametric prior probability distribution on a node age. Specifically, our method uses the extant diversity and the stratigraphic distribution of fossil lineages confidently assigned to a clade to fit a branching model of lineage diversification. Conditioning this on a simple model of fossil preservation, we estimate the likely amount of missing history prior to the oldest fossil occurrence of a clade. The likelihood surface of missing history can then be translated into a parametric prior probability distribution on the age of the clade of interest. We show that the method performs well with simulated fossil distribution data, but that the likelihood surface of missing history can at times be too complex for the distribution-fitting algorithm employed by our software tool. An empirical example of the application of our method is performed to estimate echinoid node ages. A simulation-based sensitivity analysis using the echinoid data set shows that node age prior distributions estimated under poor preservation rates are significantly less informative than those estimated under high preservation rates.
PubMed ID: 23755303
PMC ID: PMC3673923
Article link: PLoS One
Species referenced: Echinodermata
Genes referenced: hpd, irak1bp1
Article Images:

Figure 1. Simplified Diagram of the Model. Our method provides an estimate for the length of time after the age of the MRCA of a clade but prior to the age of the oldest fossil (i.e. the missing history). This hypothetical clade has N = 11 lineages at time T, representing the current standing diversity of the group. Thick bars on the internal branches of the tree represent the preserved fossil history of the clade, such that n = 1 lineage is preserved at time t. The expressions for deriving the probability of the three key temporal durations in the history of a clade are shown.


Figure 2. Example informative divergence time priors estimated with the SNAPE v1.0 software. These likelihood curves and associated best-fit gamma distributions show some of the variation in prior shape that can be estimated using this method. The y-axis scale is the likelihood (or f for the best-fit gamma distribution) and the x-axis is in millions of years ago (MYA). Note that the scales of the discretized likelihood curve and the gamma distribution are not equivalent, so the two must be rescaled to assist in visualization. A. Estimated prior distribution for the root node in the echinoid data set. Values of the discretized likelihood curve are shown in black, and the best-fit gamma distribution is shown in red. Horizontal lines representing the 95%, 75%, and 50% quantiles of the discretized likelihood curve are labeled on the figure. The quantile values are shown here only for reference when interpreting the simulation results shown in Figure 3. B. Estimated prior distribution for the MRCA of the mammalian order Rodentia. The input data for this prior estimate were assembled by searching the Paleobiology Database (www.pbdb.org) for all Rodentia occurrences (see File S2). This analysis assumed the existence of 400 extant genera in Rodentia. The oldest Rodentia fossil occurrence that met the input data criteria was 55.8 Ma. The vertical line shows the position of the Cretaceous/Paleogene (K/PG) boundary at 65.5 Ma. The analysis was performed once for each of four preservation rates: 0.1 = black; 0.2 = blue; 0.3 = orange; 0.4 = yellow. The best-fit gamma distribution for the likelihood curve assuming a 0.1 preservation rate is shown in red. This prior for the age of the MRCA of Rodentia was estimated solely for demonstration purposes. The results show how the preservation rate estimate provided by the user can have a large impact on the shape of the estimated prior.
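
As a rough illustration of how a discretized likelihood curve can be turned into a gamma prior, the sketch below matches the first two moments of the curve, measured as time before the oldest fossil. This is an assumption made for illustration only: the record does not describe the distribution-fitting algorithm used by SNAPE, and the function names, grid spacing, and curve values here are hypothetical. Only the 55.8 Ma oldest-fossil age is taken from the caption.

```python
import math


def gamma_pdf(x, shape, scale):
    """Density of a gamma distribution with the given shape and scale."""
    if x <= 0:
        return 0.0
    log_pdf = ((shape - 1) * math.log(x) - x / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)


def fit_gamma_by_moments(ages, likelihoods, offset):
    """Moment-match a gamma distribution to a discretized likelihood curve.

    ages        -- node ages (Ma) at which the likelihood was evaluated,
                   assumed to be evenly spaced
    likelihoods -- non-negative likelihood values, one per age
    offset      -- age of the oldest fossil (Ma); the gamma describes
                   the missing history, i.e. age - offset
    Returns (shape, scale).
    """
    total = sum(likelihoods)
    if total <= 0:
        raise ValueError("likelihood curve has no positive mass")
    # Normalizing removes the arbitrary vertical scale of the curve.
    weights = [v / total for v in likelihoods]
    gaps = [a - offset for a in ages]
    mean = sum(w * g for w, g in zip(weights, gaps))
    var = sum(w * (g - mean) ** 2 for w, g in zip(weights, gaps))
    if mean <= 0 or var <= 0:
        raise ValueError("curve is too degenerate to moment-match")
    return mean * mean / var, var / mean


# Hypothetical curve: an unnormalized likelihood evaluated every 0.05 Myr
# before a 55.8 Ma oldest fossil (values invented for this sketch).
oldest_fossil = 55.8
demo_ages = [oldest_fossil + 0.05 * i for i in range(1, 4001)]
demo_curve = [1e-12 * gamma_pdf(a - oldest_fossil, 2.0, 5.0) for a in demo_ages]
demo_shape, demo_scale = fit_gamma_by_moments(demo_ages, demo_curve, oldest_fossil)
```

Because the curve is normalized before the moments are taken, the mismatch in vertical scale between the likelihood curve and the gamma density noted in the caption does not affect the fitted parameters. Moment matching can fit poorly when the curve is multimodal or strongly irregular.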


Figure 3. Performance of the method with simulated data. For three different preservation rate categories (0.1, 0.45, and 0.8) a total of 1000 simulation replicates were analyzed using the SNAPE v1.0 software. Method success was determined by the likelihood of the true TMRCA being greater than the 50% quantile of the discretized likelihood curve, which is shown by the purple bars. The percentage of replicates in which the method failed to meet this standard is shown in red. Replicates that failed due to an inability to calculate origination and extinction rates are shown in black. Simulation replicates in which the method returned a prior in which the likelihood of the true TMRCA was greater than the 75% quantile were considered accurate, and these are shown in blue. Those replicates in which the prior showed the likelihood of the true TMRCA was greater than the 95% quantile were considered highly accurate, and the proportion of replicates meeting this standard is shown in green.
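
The success criterion in this caption can be sketched as a small classifier. The sketch assumes that the quantiles are taken over the likelihood values of the discretized curve (the horizontal lines in Figure 2A) and that the likelihood of the true TMRCA is read from the nearest grid point. The quantile interpolation rule, the function names, and the class labels are choices made for illustration, not details given in the record.

```python
def quantile(values, q):
    """Quantile of a list by linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac


def classify_replicate(ages, likelihoods, true_age):
    """Grade one simulation replicate against the 50%, 75%, and 95% quantiles.

    ages        -- node ages at which the likelihood curve was evaluated
    likelihoods -- likelihood value at each age
    true_age    -- the known (simulated) TMRCA
    """
    # Read the likelihood of the true age from the nearest grid point.
    nearest = min(range(len(ages)), key=lambda i: abs(ages[i] - true_age))
    value = likelihoods[nearest]
    if value > quantile(likelihoods, 0.95):
        return "highly accurate"
    if value > quantile(likelihoods, 0.75):
        return "accurate"
    if value > quantile(likelihoods, 0.50):
        return "success"
    return "failure"


# Hypothetical triangular likelihood curve peaking at age 50.
demo_ages = list(range(100))
demo_curve = [50 - abs(a - 50) for a in demo_ages]
demo_grade = classify_replicate(demo_ages, demo_curve, 40)
```

The categories are nested: any replicate graded "highly accurate" or "accurate" also meets the 50% success standard.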


Figure 4. Echinoid divergence times estimated using two alternative node age prior calibration schemes. Bars on nodes represent the 95% HPD of the node age and are colored by the two prior calibration schemes used: red bars = uniform priors; blue bars = informative gamma priors; purple = overlap of 95% HPD from both approaches. The tree represents the highest a posteriori chronogram for the analyses run with informative gamma priors, and the nodes are placed at the mean of the posterior distribution of node age. The bright red vertical dash on each node bar represents the mean of that node's age from the posterior distribution of the analyses run with uniform priors. Nodes are numbered as in Table 2, and calibration nodes are indicated with an asterisk. The scale at the bottom of the figure is in millions of years before present (Ma), and the time scale is binned by 50 Ma intervals. The tips of the tree are labeled by genus name as in Smith et al. [21], [38]. Posterior clade probabilities are provided in Figure S1.


Figure 5. Simulating the impacts of incomplete preservation on the estimation of informative node age priors. To test the sensitivity of our method of prior estimation to the quality of the fossil record (i.e. under varying rates of fossil preservation), we simulated fossil occurrences for all fossil lineages in each of the eight constraint nodes and subsampled these under four preservation rates (0.2, 0.4, 0.6, 0.8). We constructed node age priors for each simulated data set, and summarized the results using boxplots of the 95% density of the estimated gamma distributions (measured in millions of years) for each of the four preservation rates grouped by calibration node (following the node numbering scheme in Figure 2, and Tables 1 and 2). Note that higher rates of fossil preservation reduce the 95% density of the gamma distribution significantly, which shows that when provided with data of higher quality (i.e. more meaningful for calibrating the age of the node in question), the method provides a more informative prior distribution. Conversely, when the method is provided with less informative fossil data (i.e. data simulated under a poor preservation rate), it provides a prior distribution that is less informative, and thus likely to have less of an impact in the resulting divergence time analysis.
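
One way to compare how informative two gamma priors are is to measure the width, in millions of years, of the interval holding 95% of each distribution. The sketch below assumes a central (equal-tailed) interval and finds it by accumulating the density on a fine grid. The caption does not say whether "95% density" means a central or a highest-density interval, and the parameter values are hypothetical, so this is an illustration rather than the authors' calculation.

```python
import math


def gamma_pdf(x, shape, scale):
    """Density of a gamma distribution with the given shape and scale."""
    if x <= 0:
        return 0.0
    log_pdf = ((shape - 1) * math.log(x) - x / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)


def central_interval_width(shape, scale, mass=0.95, step=0.01):
    """Width of the equal-tailed interval containing `mass` of a gamma.

    The cumulative probability is accumulated with the midpoint rule
    until the lower and upper tail targets have both been crossed.
    """
    lower_target = (1 - mass) / 2
    upper_target = 1 - lower_target
    limit = scale * (shape + 50 * math.sqrt(shape) + 50)  # generous cutoff
    cumulative = 0.0
    lower = None
    x = 0.0
    while x < limit:
        cumulative += gamma_pdf(x + step / 2, shape, scale) * step
        x += step
        if lower is None and cumulative >= lower_target:
            lower = x
        if cumulative >= upper_target:
            return x - lower
    raise ValueError("upper quantile not reached; increase the cutoff")


# Two hypothetical priors with the same shape but different scales (Myr).
diffuse_width = central_interval_width(shape=2.0, scale=5.0)
tight_width = central_interval_width(shape=2.0, scale=2.0)
```

Under this measure a narrower interval corresponds to a more informative prior, which is the pattern the caption reports for higher preservation rates.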

References:
Alfaro ME, Nine exceptional radiations plus high turnover explain species diversity in jawed vertebrates. 2009.
Alroy J, Colloquium paper: dynamics of origination and extinction in the marine fossil record. 2008.
Benton MJ, Paleontological evidence to date the tree of life. 2007.
Clarke JT, Establishing a timescale for plant evolution. 2011.
Clayton JW, Recent long-distance dispersal overshadows ancient biogeographical patterns in a pantropical angiosperm family (Simaroubaceae, Sapindales). 2009.
Donoghue PC, Rocks and clocks: calibrating the Tree of Life using fossils and molecules. 2007.
Dornburg A, Integrating fossil preservation biases in the selection of calibrations for molecular divergence time estimation. 2011.
Drummond AJ, Bayesian phylogenetics with BEAUti and the BEAST 1.7. 2012.
Drummond AJ, Relaxed phylogenetics and dating with confidence. 2006.
Foote M, Fossil preservation and the stratigraphic ranges of taxa. 1996.
Foote M, Evolutionary and preservational constraints on origins of biologic groups: divergence times of eutherian mammals. 1999.
Friedman M, Sequences, stratigraphy and scenarios: what can we say about the fossil record of the earliest tetrapods? 2011.
Graur D, Reading the entrails of chickens: molecular timescales of evolution and the illusion of precision. 2004.
Heled J, Calibrated tree priors for relaxed phylogenetics and divergence time estimation. 2012.
Himmelmann L, TreeTime: an extensible C++ software package for Bayesian phylogeny reconstruction with time-calibration. 2009.
Ho SY, The effect of inappropriate calibration: three case studies in molecular ecology. 2008.
Ho SY, Accounting for calibration uncertainty in phylogenetic estimation of evolutionary divergence times. 2009.
Hugall AF, Calibration choice, rate smoothing, and the pattern of tetrapod diversification according to the long nuclear gene RAG1. 2007.
Inoue J, The impact of the representation of fossil calibrations on Bayesian estimation of species divergence times. 2010.
Kenrick P, Timescales and timetrees. 2011.
Lartillot N, PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating. 2009.
Lee MS, Phylogenetic uncertainty and molecular clock calibrations: a case study of legless lizards (Pygopodidae, Gekkota). 2009.
Liow LH, When can decreasing diversification rates be detected with molecular phylogenies and the fossil record? 2010.
Müller J, Four well-constrained calibration points from the vertebrate fossil record for molecular clock estimates. 2005.
Marshall CR, A simple method for bracketing absolute divergence times on molecular phylogenies using multiple fossil calibration points. 2008.
Parham JF, Best practices for justifying fossil calibrations. 2012.
Parham JF, Caveats on the use of fossil calibrations for molecular dating: a comment on Near et al. 2008.
Pyron RA, A likelihood method for assessing molecular divergence time estimates and the placement of fossil calibrations. 2010.
Rannala B, Inferring speciation times under an episodic molecular clock. 2007.
Renner SS, Relaxed molecular clocks for dating historical plant dispersal events. 2005.
Rutschmann F, Assessing calibration uncertainty in molecular dating: the assignment of fossils to alternative calibration points. 2007.
Sallam HM, Fossil and molecular evidence constrain scenarios for the early evolutionary and biogeographic history of hystricognathous rodents. 2009.
Sauquet H, Testing the impact of calibration on molecular divergence times using a fossil-rich group: the case of Nothofagus (Fagales). 2012.
Shaul S, Playing chicken (Gallus gallus): methodological inconsistencies of molecular divergence date estimates due to secondary calibration points. 2002.
Smith AB, Testing the molecular clock: molecular and paleontological estimates of divergence times in the Echinoidea (Echinodermata). 2006.
Tavaré S, Using the fossil record to estimate the age of the last common ancestor of extant primates. 2002.
Thorne JL, Estimating the rate of evolution of the rate of molecular evolution. 1998.
Warnock RC, Exploring uncertainty in the calibration of the molecular clock. 2012.
Wilkinson RD, Dating primate divergences through an integrated analysis of palaeontological and molecular data. 2011.
Yang Z, Bayesian estimation of species divergence times under a molecular clock using multiple fossil calibrations with soft bounds. 2006.
dos Reis M, Phylogenomic datasets provide both precision and accuracy in estimating the timescale of placental mammal phylogeny. 2012.