| Title: | Test for Differences in Diversification Rates over Time |
| Version: | 1.0.0 |
| Description: | Employ time-calibrated phylogenies and trait/range data to test for differences in diversification rates over evolutionary time. Extend the STRAPP test from BAMMtools::traitDependentBAMM() to any time step along phylogenies. See inst/COPYRIGHTS for details on third-party code. |
| License: | MIT + file LICENSE |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.2 |
| URL: | https://github.com/MaelDore/deepSTRAPP, https://maeldore.github.io/deepSTRAPP/ |
| BugReports: | https://github.com/MaelDore/deepSTRAPP/issues |
| Imports: | ape (≥ 5.8.0), BAMMtools (≥ 2.1.12), cladoRcpp (≥ 0.15.1), coda (≥ 0.19-4.1), cowplot (≥ 1.1.3), dplyr (≥ 1.1.4), dunn.test (≥ 1.3.6), geiger (≥ 2.0.11), ggplot2 (≥ 3.5.0), methods (≥ 4.4.2), phytools (≥ 2.0.0), plyr (≥ 1.8.9), purrr (≥ 1.0.2), qpdf (≥ 1.4.1), RColorBrewer (≥ 1.1-3), Rcpp (≥ 1.0.14), scales (≥ 1.3.0), stringr (≥ 1.5.1), tidyr (≥ 1.3.1) |
| Depends: | R (≥ 4.4) |
| LazyData: | true |
| LazyDataCompression: | xz |
| Suggests: | knitr, maps (≥ 3.4.2), parallel (≥ 4.4.2), BioGeoBEARS (≥ 1.1.3), rmarkdown |
| Additional_repositories: | https://maeldore.github.io/drat |
| LinkingTo: | Rcpp |
| VignetteBuilder: | knitr |
| NeedsCompilation: | yes |
| Packaged: | 2026-01-13 16:22:36 UTC; Mael |
| Author: | Maël Doré |
| Maintainer: | Maël Doré <mael.dore@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2026-01-19 17:30:02 UTC |
Template file for BAMM diversification analyses
Description
Template file for BAMM diversification analyses provided as character strings.
Source: bamm-2.5.0 References: http://bamm-project.org/; https://github.com/macroevolution/bamm
Usage
data(BAMM_template_diversification)
Format
A vector of 260 character strings.
Details
A vector of 260 character strings that can be displayed with print(BAMM_template_diversification).
This is the template used to generate the 'config_file.txt' controlling settings used for a BAMM run.
It provides detailed explanations of the additional_BAMM_settings that can be used in prepare_diversification_data()
to customize the BAMM run.
It is called internally by prepare_diversification_data() to produce
the custom 'config_file' used in the subsequent BAMM run.
References
Authors: Daniel Rabosky (BAMM) & Pascal Title ({BAMMtools}). Modified by Maël Doré for deepSTRAPP.
See Also
BAMM software website: http://bamm-project.org/
Convert Biogeographic Stochastic Map (BSM) to phytools SIMMAP stochastic map (SM) format
Description
These functions converts a Biogeographic Stochastic Map (BSM) output from BioGeoBEARS into
a simmap object from R package {phytools} (See phytools::make.simmap()).
They require a model fit with BioGeoBEARS::bears_optim_run() and the output of a Biogeographic Stochastic Mapping
performed with BioGeoBEARS::runBSM() to produce simmap objects as phylogenies with the associated
mapping of range evolution along branches across simulations.
-
BSM_to_phytools_simmap(): Produce onesimmapfor the required simulation (index of the simulation provided withsim_index). -
BSMs_to_phytools_simmaps(): Produce allsimmapobjects for all simulations stored in a uniquemultiSimmapobject.
Initial functions in R package BioGeoBEARS by Nicholas J. Matzke:
-
BioGeoBEARS::BSM_to_phytools_SM() -
BioGeoBEARS::BSMs_to_phytools_SMs()
Usage
BSM_to_phytools_simmap(model_fit, phylo, BSM_output, sim_index)
BSMs_to_phytools_simmaps(model_fit, phylo, BSM_output)
Arguments
model_fit |
A BioGeoBEARS results object, produced by ML inference via |
phylo |
Time-calibrated phylogeny used in the BioGeoBEARS analyses to produce the historical biogeographic inference
and run the Biogeographic Stochastic Mapping. Object of class |
BSM_output |
A list with two objects, a cladogenetic events table and an anagenetic events table, as the result of
Biogeographic Stochastic Mapping conducted with |
sim_index |
Integer. Index of the biogeographic simulation targeted to produce the |
Details
These functions are slight adaptations of original functions from the R Package BioGeoBEARS by N. Matzke.
Initial functions: BioGeoBEARS::BSM_to_phytools_SM() BioGeoBEARS::BSMs_to_phytools_SMs()
Changes:
Solves issue with differences in ranges allowed across time-strata.
Requires directly the output of
BioGeoBEARS::runBSM()instead of separated cladogenetic and anagenetic event tables.Update the documentation.
Value
The BSM_to_phytools_simmap() function returns a list with two elements:
-
$simmapA uniquesimmapfor a given biogeographic simulation as an object of classesc("simmap", "phylo"). This is a modified{ape}tree with additional elements to report range mapping, model parameters and likelihood.-
$mapsA list of named numerical vectors. Provides the mapping of ranges along each remaining edge. Names are the ranges. Values are residence times in each state across segments -
$mapped.edgeA numerical matrix. Provides the evolutionary time spent across ranges (columns) along the edges (rows). row.names() are the node ID at the rootward and tipward ends of each edge. -
$QNumerical matrix. The transition rates across ranges calculated from the ML parameter estimates of the model. -
$logLNumeric. The log-likelihood of the data under the ML model.
-
-
$residence_timesData.frame with two rows. Summarizes the residence time spent in each range along all branches, in (raw) evolutionary time (i.e., branch lengths), and in percentage (perc).
The BSMs_to_phytools_simmaps() function loop around the BSM_to_phytools_simmap() function to aggregate all simmaps
from all biogeographic simulations in a unique list of classes c("multiSimmap", "multiPhylo").
Each element in the
$simmapof a biogeographic simulation obtained withBSM_to_phytools_simmap().-
$residence_timessummary data.frames are not preserved.
Notes on BioGeoBEARS
The R package BioGeoBEARS is needed for this function to work with biogeographic data.
Please install it manually from: https://github.com/nmatzke/BioGeoBEARS.
Notes on using the resulting simmap object in phytools (adapted from Nicholas J. Matzke)
The phytools functions, like phytools::countSimmap(), will only count the anagenetic events
(range transitions occurring along branches) as it was written assuming purely anagenetic models.
It remains possible to extract cladogenetic events (range transitions occurring at speciation)
by comparing the last-state-below-a-node with the descendant-pairs-above-a-node.
However, it is recommended to use the built-in functions from BioGeoBEARS to summarize
the biogeographic history based on the tables of cladogenetic and anagenetic events obtained
from BioGeoBEARS::runBSM(). simmap objects should primarily be considered as a tool for visualization.
Associated functions in R package BioGeoBEARS:
-
BioGeoBEARS::simulate_source_areas_ana_clado(): To select randomly a unique area source for transition from a multi-area state to a single area. -
BioGeoBEARS::get_dmat_times_from_res(): To generate matrices of range expansion from source area to destination area. -
BioGeoBEARS::count_ana_clado_events(): To count the number and type of events from BSM tables. -
BioGeoBEARS::hist_event_counts(): To plot histograms of event counts across BSM tables.
Please note carefully that area-to-area dispersal events are not identical with the state transitions.
For example, a state can be a geographic range with multipleareas, but under the logic of DEC-type models,
a range-expansion event like ABC->ABCD actually means that a dispersal happened from some specific area (A, B, or C)
to the new area. BSMs track this area-to-area sourcing in its cladogenetic and anagenetic event tables,
at least if BioGeoBEARS::simulate_source_areas_ana_clado() has been run on the output of BioGeoBEARS::runBSM().
Author(s)
Nicholas J. Matzke. Contact: matzke@berkeley.edu
Changes by Maël Doré (see Details)
References
For BioGeoBEARS: Matzke, Nicholas J. (2018). BioGeoBEARS: BioGeography with Bayesian (and likelihood) Evolutionary Analysis with R Scripts. version 1.1.1, published on GitHub on November 6, 2018. doi:10.5281/zenodo.1478250. Website: http://phylo.wikidot.com/biogeobears.
See Also
phytools::countSimmap() phytools::make.simmap()
BioGeoBEARS::simulate_source_areas_ana_clado() BioGeoBEARS::get_dmat_times_from_res()
BioGeoBEARS::count_ana_clado_events() BioGeoBEARS::hist_event_counts()
Examples
if (deepSTRAPP::is_dev_version())
{
## Run only if you have R package 'BioGeoBEARS' installed.
# Please install it manually from: https://github.com/nmatzke/BioGeoBEARS")
## Load phylogeny and tip data
library(phytools)
data(eel.tree)
# (May take several minutes to run)
## Load directly output of prepare_trait_data() run on biogeographic data
data(eel_biogeo_data, package = "deepSTRAPP")
## Convert BSM output into a unique simmap, including residence times
simmap_1 <- BSM_to_phytools_simmap(model_fit = eel_biogeo_data$best_model_fit,
phylo = eel.tree,
BSM_output = eel_biogeo_data$BSM_output,
sim_index = 1)
# Explore output
str(simmap_1, max.level = 1)
# Print residence times in each range
simmap_1$residence_times
# Plot simmap
plot(simmap_1$simmap)
## Convert BSM output into all simmaps in a multiSimmap/multiPhylo object
all_simmaps <- BSMs_to_phytools_simmaps(model_fit = eel_biogeo_data$best_model_fit,
phylo = eel.tree,
BSM_output = eel_biogeo_data$BSM_output)
# Explore output
str(all_simmaps, max.level = 1)
# Plot simmap n°1
plot(all_simmaps[[1]])
}
Dataset providing biogeographic range data for extant ponerine ants
Description
A data.frame of range location for the 1534 extant ponerine ant taxa (Ponerinae subfamily).
Source: Doré, M., Borowiec, M. L., Branstetter, M. G., Camacho, G. P., Fisher, B. L., Longino, J. T., Ward, P. S., Blaimer, B. B. (2025). Evolutionary history of ponerine ants highlights how the timing of dispersal events shapes modern biodiversity. Nature Communications, 16, 8297. doi:10.1038/s41467-025-63709-3
Usage
data(Ponerinae_binary_range_table)
Format
A data.frame with 1534 rows and 10 columns.
Details
A data.frame of range locations covering the 1534 extant ponerine ant taxa (Ponerinae subfamily).
-
$TaxaCharacter string. Names of ponerinae ant taxa. -
$AfrotropicsLogical. Whether the range of the taxa extends to Afrotropics. -
$AustralasiaLogical. Whether the range of the taxa extends to Australasia. -
$IndomalayaLogical. Whether the range of the taxa extends to Indomalaya. -
$NearcticLogical. Whether the range of the taxa extends to Nearctic. -
$NeotropicsLogical. Whether the range of the taxa extends to Neotropics. -
$Eastern_PalearcticLogical. Whether the range of the taxa extends to Eastern Palearctic. -
$Western_PalearcticLogical. Whether the range of the taxa extends to Western Palearctic. -
$Old_WorldLogical. Whether the range of the taxa extends to the Old World: encompassing any bioregion among Afrotropics, Australasia, Indomalaya, Eastern Palearctic, or Western Palearctic. -
$New_WorldLogical. Whether the range of the taxa extends to the New World: encompassing any bioregion among Nearctic, or Neotropics.
References
Doré, M., Borowiec, M. L., Branstetter, M. G., Camacho, G. P., Fisher, B. L., Longino, J. T., Ward, P. S., Blaimer, B. B. (2025). Evolutionary history of ponerine ants highlights how the timing of dispersal events shapes modern biodiversity. Nature Communications, 16, 8297. doi:10.1038/s41467-025-63709-3
Data summarizing the evolution of geographic ranges in Ponerinae ants using an old ill-calibrated phylogeny for illustrative purposes
Description
A list containing geographic ranges data of Ponerinae mapped on the old ill-calibrated phylogeny,
modeled with R package BioGeoBEARS. Ranges are labeled between "Old World" (O) and New World (N).
This object was obtained with prepare_trait_data().
The phylogeny used is also NOT a properly time-calibrated phylogeny. It uses an ill-designed old calibration for illustrative purposes.
Source: Doré, M., Borowiec, M. L., Branstetter, M. G., Camacho, G. P., Fisher, B. L., Longino, J. T., Ward, P. S., Blaimer, B. B. (2025). Evolutionary history of ponerine ants highlights how the timing of dispersal events shapes modern biodiversity. Nature Communications, 16, 8297. doi:10.1038/s41467-025-63709-3
Usage
data(Ponerinae_biogeo_data_old_calib)
Format
A list with 6 elements.
Details
A list of five objects containing information on the evolution of feeding habits in eels.
This object was obtained with prepare_trait_data().
-
$densityMapsList of objects of class"densityMapthat contains a phylogenetic tree and associated mapping of probability to harbor a given range along branches. The list contains only a"densityMapper unique areas becausesplit_multi_area_rangeswas set to TRUE. In this case, unique areas are "N" (= "New World") and "O" (= "Old World) -
$densityMaps_all_rangesList of objects of class"densityMapthat contains a phylogenetic tree and associated mapping of probability to harbor a given range along branches. The list contains one"densityMapper range found along branches during the simulated biogeographic histories. Here those ranges are "N" (= "New World"), "O" (= "Old World), and "NO" for multi-area ranges encompassing both regions. -
$trait_data_typeCharacter string. Record the type of trait data. Here: "biogeographic". -
$aceNumerical matrix that record the posterior probabilities of ancestral ranges estimated at internal nodes. Only unique areas (i.e., "N" and "O") are considered among the ranges. Multi-area ranges (i.e., "NO") have been split among unique ranges. Rows are internal nodes. Columns are ranges. Values are posterior probabilities of each range per node. -
$ace_all_rangesNumerical matrix that record the posterior probabilities of ancestral ranges estimated at internal nodes. All ranges observed along branches during the simulated biogeographic histories are present (i.e., "N", "O", and "NO"). Rows are internal nodes. Columns are ranges. Values are posterior probabilities of each range per node. -
$model_selection_dfData.frame that summarizes model comparisons used to select the best fitting model.
References
Doré, M., Borowiec, M. L., Branstetter, M. G., Camacho, G. P., Fisher, B. L., Longino, J. T., Ward, P. S., Blaimer, B. B. (2025). Evolutionary history of ponerine ants highlights how the timing of dispersal events shapes modern biodiversity. Nature Communications, 16, 8297. doi:10.1038/s41467-025-63709-3
See Also
Data summarizing the evolution of fake size data in Ponerinae ants using a 2-level factor as categorical trait
Description
A list containing fake size data of Ponerinae ants mapped on the phylogeny,
modeled with geiger::fitDiscrete. This object was obtained with prepare_trait_data().
This is NOT real biological/ecological data. They were designed for illustrative purposes only.
The phylogeny used is also NOT a properly time-calibrated phylogeny. It uses an ill-designed old calibration for illustrative purposes.
Usage
data(Ponerinae_cat_2lvl_data_old_calib)
Format
A list with 5 elements.
Details
A list of five objects containing information on the evolution of fake size data in Ponerinae ants.
This object was obtained with prepare_trait_data().
-
$densityMapsList of two objects of class"densityMapthat contains a phylogenetic tree and associated mapping of probability to harbor a given state along branches. The list contains one"densityMapper state found in thetip_data(i.e., "large" and "small"). -
$trait_data_typeCharacter string. Record the type of trait data. Here: "categorical". -
$aceNumerical matrix that record the posterior probabilities of ancestral states (characters) estimates (ACE) at internal nodes. Rows are internal nodes. Columns are states (i.e., "large" and "small"). Values are posterior probabilities of each state per node. -
$best_model_fitList that provides the output of the best fitting model (Here: ARD model). -
$model_selection_dfData.frame that summarizes model comparisons used to select the best fitting model.
See Also
Data summarizing the evolution of fake habitat data in Ponerinae ants using a 3-level factor as categorical trait
Description
A list containing fake habitat data of Ponerinae ants mapped on the phylogeny,
modeled with geiger::fitDiscrete. This object was obtained with prepare_trait_data().
This is NOT real biological/ecological data. They were designed for illustrative purposes only.
The phylogeny used is also NOT a properly time-calibrated phylogeny. It uses an ill-designed old calibration for illustrative purposes.
Usage
data(Ponerinae_cat_3lvl_data_old_calib)
Format
A list with 5 elements.
Details
A list of five objects containing information on the evolution of fake habitat data in Ponerinae ants.
This object was obtained with prepare_trait_data().
-
$densityMapsList of three objects of class"densityMapthat contains a phylogenetic tree and associated mapping of probability to harbor a given state along branches. The list contains one"densityMapper state found in thetip_data(i.e., "arboreal", "subterranean", and "terricolous"). -
$trait_data_typeCharacter string. Record the type of trait data. Here: "categorical". -
$aceNumerical matrix that record the posterior probabilities of ancestral states (characters) estimates (ACE) at internal nodes. Rows are internal nodes. Columns are states (i.e., "large" and "small"). Values are posterior probabilities of each state per node. -
$best_model_fitList that provides the output of the best fitting model (Here: ARD model). -
$model_selection_dfData.frame that summarizes model comparisons used to select the best fitting model.
See Also
Data summarizing the evolution of a fake continuous trait in Ponerinae ants extracted for 10 Mya
Description
A list containing estimated values for a fake continuous trait mapped on the Ponerinae ant phylogeny,
modeled with geiger::fitContinuous. This object was obtained with extract_most_likely_trait_values_for_focal_time()
applied on trait evolution data obtained with prepare_trait_data().
The phylogeny used is NOT a properly time-calibrated phylogeny. It uses an ill-designed old calibration for illustrative purposes.
Usage
data(Ponerinae_trait_cont_tip_data_10My)
Format
A list with 4 elements.
Details
A list of four elements containing information on the evolution of a continuous trait in Ponerinae ants extracted for 10 Mya.
-
$trait_dataNamed vector of numerical values. Names are the taxa or internal tipward node ID associated with the values. Values are the continuous trait data estimated along branches for 10 Mya. -
$focal_timeNumerical. Time in the past at which the trait data where extracted. -
$trait_data_typeCharacter string. Record the type of trait data. Here: "continuous". -
$contMapA phylogenetic tree and associated mapping of estimated trait values. It was updated such as the tips correspond to lineages found 10 Mya (i.e., at the focal time in the past).
Dataset providing fake trait data for extant ponerine ants for illustrative purposes
Description
A data.frame of fake trait data covering the 1534 extant ponerine ant taxa (Ponerinae subfamily). This is NOT real biological/ecological data. They were designed for illustrative purposes only.
Usage
data(Ponerinae_trait_tip_data)
Format
A data.frame with 1534 rows and 4 columns.
Details
A data.frame of fake trait data covering the 1534 extant ponerine ant taxa (Ponerinae subfamily).
-
$TaxaCharacter string. Names of ponerinae ant taxa. -
$fake_cont_tip_dataNumeric. Fake continuous trait data. -
$fake_cat_2lvl_tip_dataVector of character strings. Fake categorical size data with two levels: "large" and "small". -
$fake_cat_3lvl_tip_dataVector of character strings. Fake categorical habitat data with three levels: "arboreal", "subterranean", and "terricolous".
Dataset providing the extensive time-calibrated phylogeny of extant ponerine ants
Description
A phylo object describing the time-calibrated phylogeny of the 1534 extant ponerine ants (Ponerinae subfamily).
Source: Doré, M., Borowiec, M. L., Branstetter, M. G., Camacho, G. P., Fisher, B. L., Longino, J. T., Ward, P. S., Blaimer, B. B. (2025). Evolutionary history of ponerine ants highlights how the timing of dispersal events shapes modern biodiversity. Nature Communications, 16, 8297. doi:10.1038/s41467-025-63709-3
Usage
data(Ponerinae_tree)
Format
A phylo object with 4 elements.
Details
A time-calibrated phylogeny as a phylo object with 4 elements.
-
$edgeMatrix of integers. Defines the tree topology by providing rootward and tipward node ID of each edge. -
$edge.lengthVector of numerical. Length of edges/branches. -
$NnodeInteger. Number of internal nodes. -
$tip.labelVector of character strings. Labels of all tips.
References
Doré, M., Borowiec, M. L., Branstetter, M. G., Camacho, G. P., Fisher, B. L., Longino, J. T., Ward, P. S., Blaimer, B. B. (2025). Evolutionary history of ponerine ants highlights how the timing of dispersal events shapes modern biodiversity. Nature Communications, 16, 8297. doi:10.1038/s41467-025-63709-3
Dataset providing the extensive time-calibrated phylogeny of extant ponerine ants using an old calibration for illustrative purposes
Description
A phylo object describing the time-calibrated phylogeny of the 1534 extant ponerine ants (Ponerinae subfamily).
This is NOT a properly time-calibrated phylogeny. It uses an ill-designed old calibration for illustrative purposes.
For a proper time-calibrated phylogeny of ponerine ants, see the Ponerinae_tree object in deepSTRAPP.
Source: Doré, M., Borowiec, M. L., Branstetter, M. G., Camacho, G. P., Fisher, B. L., Longino, J. T., Ward, P. S., Blaimer, B. B. (2025). Evolutionary history of ponerine ants highlights how the timing of dispersal events shapes modern biodiversity. Nature Communications, 16, 8297. doi:10.1038/s41467-025-63709-3
Usage
data(Ponerinae_tree_old_calib)
Format
A phylo object with 4 elements.
Details
A time-calibrated phylogeny as a phylo object with 4 elements.
-
$edgeMatrix of integers. Defines the tree topology by providing rootward and tipward node ID of each edge. -
$edge.lengthVector of numerical. Length of edges/branches. -
$NnodeInteger. Number of internal nodes. -
$tip.labelVector of character strings. Labels of all tips.
References
Doré, M., Borowiec, M. L., Branstetter, M. G., Camacho, G. P., Fisher, B. L., Longino, J. T., Ward, P. S., Blaimer, B. B. (2025). Evolutionary history of ponerine ants highlights how the timing of dispersal events shapes modern biodiversity. Nature Communications, 16, 8297. doi:10.1038/s41467-025-63709-3
Compute STRAPP to test for a relationship between diversification rates and trait data
Description
Carries out the appropriate statistical method to test for a relationship between
diversification rates and trait data for a given point in the past (i.e. the focal_time).
Tests are based on block-permutations: rates data are randomized across tips following blocks
defined by the diversification regimes identified on each tip (typically from a BAMM).
Such tests are called STructured RAte Permutations on Phylogenies (STRAPP) as described in Rabosky, D. L., & Huang, H. (2016). A robust semi-parametric test for detecting trait-dependent diversification. Systematic biology, 65(2), 181-193. doi:10.1093/sysbio/syv066.
The function is an extension of the original BAMMtools::traitDependentBAMM() function used to
carry out STRAPP test on extant time-calibrated phylogenies.
Tests can be carried out on speciation, extinction and net diversification rates.
deepSTRAPP::compute_STRAPP_test_for_focal_time() can handle three types of statistical tests depending on the type of trait data provided:
Continuous trait data
Tests for correlations between trait and rates carried out with deepSTRAPP::compute_STRAPP_test_for_continuous_data().
The associated test is the Spearman's rank correlation test (See stats::cor.test).
Binary trait data
For categorical and biogeographic trait data that have only two states (ex: 'Nearctic' vs. 'Neotropics').
Tests for differences in rates between states are carried out with deepSTRAPP::compute_STRAPP_test_for_binary_data().
The associated test is the Mann-Whitney-Wilcoxon rank-sum test (See stats::wilcox.test).
Multinominal trait data
For categorical and biogeographic trait data with more than two states (ex: 'No leg' vs. 'Two legs' vs. 'Four legs').
Tests for differences in rates between states are carried out with deepSTRAPP::compute_STRAPP_test_for_multinominal_data().
The associated test for all states is the Kruskal-Wallis H test (See stats::kruskal.test).
If posthoc_pairwise_tests = TRUE, post hoc pairwise tests between pairs of states will be carried out too.
The associated test for post hoc pairwise tests is the Dunn's post hoc pairwise rank-sum test (See dunn.test::dunn.test).
Usage
compute_STRAPP_test_for_focal_time(
BAMM_object,
trait_data_list,
rate_type = "net_diversification",
seed = NULL,
nb_permutations = NULL,
replace_samples = FALSE,
alpha = 0.05,
two_tailed = TRUE,
one_tailed_hypothesis = NULL,
posthoc_pairwise_tests = FALSE,
p.adjust_method = "none",
return_perm_data = FALSE,
nthreads = 1,
print_hypothesis = TRUE
)
Arguments
BAMM_object |
Object of class |
trait_data_list |
List obtained from |
rate_type |
A character string specifying the type of diversification rates to use. Must be one of 'speciation', 'extinction' or 'net_diversification' (default). |
seed |
Integer. Set the seed to ensure reproducibility. Default is |
nb_permutations |
Integer. To select the number of random permutations to perform during the tests. If NULL (default), all posterior samples will be used once. |
replace_samples |
Logical. To specify whether to allow 'replacement' (i.e., multiple use) of a posterior sample when drawing samples used to carry out the test. Default is |
alpha |
Numerical. Significance level to use to compute the |
two_tailed |
Logical. To define the type of tests. If
|
one_tailed_hypothesis |
A character string specifying the alternative hypothesis in the one-tailed test. For continuous data, it is either "negative" or "positive" correlation. For binary data, it lists the trait states with states ordered in increasing rates under the alternative hypothesis, separated by a greater-than such as c('A > B'). |
posthoc_pairwise_tests |
Logical. Only for multinominal data (with more than two states). If |
p.adjust_method |
A character string. Only for multinominal data (with more than two states). It specifies the type of correction to apply to the p-values
in the post hoc pairwise tests to account for multiple comparisons. See |
return_perm_data |
Logical. Whether to return the stats data computed from the posterior samples for observed and permuted data in the output.
This is needed to plot the histogram of the null distribution used to assess significance of the test with |
nthreads |
Integer. Number of threads to use for paralleled computing of the tests across the permutations. The R package |
print_hypothesis |
Logical. Whether to print information on what test is carried out, detailing the null and alternative hypotheses,
and what significant level is used to rejected or not the null hypothesis. Default is |
Details
These set of functions carries out the STructured RAte Permutations on Phylogenies (STRAPP) test as defined in Rabosky, D. L., & Huang, H. (2016). A robust semi-parametric test for detecting trait-dependent diversification. Systematic biology, 65(2), 181-193.
It is an extension of the original BAMMtools::traitDependentBAMM() function used to
carry out STRAPP test on extant time-calibrated phylogenies, but allowing here to test for
differences/correlations at any point in the past (i.e. the focal_time).
It takes an object of class "bammdata" (BAMM_object) that was updated such as
its diversification rates ($tipLambda and $tipMu) and regimes ($tipStates) are reflecting
values observed at at a specific time in the past (i.e. the $focal_time).
Similarly, it takes a list (trait_data_list) that provides $trait_data as observed on branches
at the same focal_time than the diversification rates and regimes.
A STRAPP test is carried out by drawing a random set of posterior samples from the BAMM_object, then randomly permuting rates
across blocks of tips defined by the macroevolutionary regimes. Test statistics are then computed across the initial observed data
and the permuted data for each sample.
In a two-tailed test, the p-value is the proportion of posterior samples in which the test stats is as extreme in the permuted than in the observed data.
In a one-tailed test, the p-value is the proportion of posterior samples in which the test stats is higher in the permuted than in the observed data.
———- Major changes compared to BAMMtools::traitDependentBAMM() ———-
Allow to choose if random sampling of posterior configurations must be done with replacement or not with
replace_samples.Add post hoc pairwise tests (Dunn test) for multinominal data. Use
posthoc_pairwise_tests = TRUE.Provide outputs tailored for histogram plots
plot_histogram_STRAPP_test_for_focal_time()and p-value time-series plotsplot_STRAPP_pvalues_over_time().Add prints detailing what test is carried out, what are the null and alternative hypotheses, and what significant level is used to rejected or not the null hypothesis. (Enabled with
print_hypothesis = TRUE).Split the function in multiple sub-functions according to the type of data (
$trait_data_type).Prevent using Pearson's correlation tests and applying log-transformation for continuous data. The rationale is that there is no reason to assume that tip rates are distributed normally or log-normally. Thus, a Spearman's rank correlation test is favored.
Value
The function returns a list with at least eight elements.
Summary elements for the main test:
-
$estimateNamed numerical. Value of the test statistic used to assess significance of the test according to the significance level provided (alpha). The test is significant if$estimateis higher than zero. -
$stats_medianNumerical. Median value of the distribution of test statistics across all selected posterior samples. -
$p-valueNumerical. P-value of the test. The test is considered significant if$p-valueis lower thanalpha. -
$methodCharacter string. The statistical method used to carry out the test. -
$rate_typeCharacter string. The type of diversification rates tested. One of 'speciation', 'extinction' or 'net_diversification'. -
$trait_data_typeCharacter string. The type of trait data as found in 'trait_data_list$trait_data_type'. One of 'continuous', 'categorical', or 'biogeographic'. -
$trait_data_type_for_statsCharacter string. The type of trait data used to select statistical method. One of 'continuous', 'binary', or 'multinominal'. -
$focal_timeThe time in the past at which the trait and rates data were tested.
If using continuous or binary data:
-
$two-tailedLogical. Record the type of test used: two-tailed ifTRUE, one-tailed ifFALSE. Ifone_tailed_hypothesisis provided (only for continuous and binary trait data): -
$one_tailed_hypothesisCharacter string. Record of the alternative hypothesis used for the one-tailed tests.
If posthoc_pairwise_tests = TRUE (only for multinomial trait data):
-
$posthoc_pairwise_testsList of at least 3 sub-elements:-
$summary_dfData.frame of five variables providing the summary results of post hoc pairwise tests -
$methodCharacter string. The statistical method used to carry out the test. Here, "Dunn". -
$two-tailedLogical. Record the type of post hoc pairwise tests used: two-tailed ifTRUE, one-tailed ifFALSE.
-
If return_perm_data = TRUE, the stats data computed from the posterior samples for observed and permuted data are provided.
This is needed to plot the histogram of the null distribution used to assess significance of the test with plot_histogram_STRAPP_test_for_focal_time().
-
$perm_data_dfA data.frame with four variables summarizing the data generated during the STRAPP test:-
$posterior_samples_random_IDInteger. ID of the posterior samples randomly drawn and used for the STRAPP test. -
$*_obsNumerical. Test stats computed from the observed data in the posterior samples. Name depends on the test used. -
$*_permNumerical. Test stats computed from the permuted data in the posterior samples. Name depends on the test used. -
$delta_*OR$abs_delta_*Numerical. Test stats computed for the STRAPP test comparing observed stats and permuted stats. Name depends on the test used and the type of tests (two-tailed compare absolute values; one-tailed compare raw values). Combined withposthoc_pairwise_tests = TRUE, the stats data are also provided for the post hoc pairwise tests:
-
-
$posthoc_pairwise_tests$perm_data_arrayA 3D array containing stats data for all post hoc pairwise tests in a similar format that$perm_data_df.
If no STRAPP test was performed in the case of categorical/biogeographic data with a single state/range at focal_time,
only the $trait_data_type, $trait_data_type_for_stats = "none", and $focal_time are returned.
Author(s)
Maël Doré
References
For STRAPP: Rabosky, D. L., & Huang, H. (2016). A robust semi-parametric test for detecting trait-dependent diversification. Systematic biology, 65(2), 181-193. doi:10.1093/sysbio/syv066.
For STRAPP in deep times: Doré, M., Borowiec, M. L., Branstetter, M. G., Camacho, G. P., Fisher, B. L., Longino, J. T., Ward, P. S., Blaimer, B. B., (2025), Evolutionary history of ponerine ants highlights how the timing of dispersal events shapes modern biodiversity, Nature Communications. doi:10.1038/s41467-025-63709-3
See Also
Associated functions in deepSTRAPP: extract_most_likely_trait_values_for_focal_time() update_rates_and_regimes_for_focal_time()
Original function in BAMMtools: BAMMtools::traitDependentBAMM()
Statistical tests: stats::cor.test() stats::wilcox.test() stats::kruskal.test() dunn.test::dunn.test()
For a guided tutorial, see this vignette: vignette("explore_STRAPP_test_types", package = "deepSTRAPP")
Examples
if (deepSTRAPP::is_dev_version())
{
# ------ Prepare data ------ #
## Load the BAMM_object summarizing 1000 posterior samples of BAMM with diversification rates
# for ponerine ants extracted for 10My ago.
data(Ponerinae_BAMM_object_10My, package = "deepSTRAPP")
## This dataset is only available in development versions installed from GitHub.
# It is not available in CRAN versions.
# Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.
# Plot the associated phylogeny with mapped rates
plot_BAMM_rates(Ponerinae_BAMM_object_10My)
## Load the object containing head width trait data for ponerine ants extracted for 10My ago.
data(Ponerinae_trait_cont_tip_data_10My, package = "deepSTRAPP")
# Plot the associated contMap (continuous trait stochastic map)
plot_contMap(Ponerinae_trait_cont_tip_data_10My$contMap)
# Check that objects are ordered in the same fashion
identical(names(Ponerinae_BAMM_object_10My$tipStates[[1]]),
names(Ponerinae_trait_cont_tip_data_10My$trait_data))
# Save continuous data
trait_data_continuous <- Ponerinae_trait_cont_tip_data_10My
## Transform trait data into binary and multinominal data
# Binarize data into two states
trait_data_binary <- trait_data_continuous
trait_data_binary$trait_data[trait_data_continuous$trait_data < 0] <- "state_A"
trait_data_binary$trait_data[trait_data_continuous$trait_data >= 0] <- "state_B"
trait_data_binary$trait_data_type <- "categorical"
table(trait_data_binary$trait_data)
# Categorize data into three states
trait_data_multinominal <- trait_data_continuous
trait_data_multinominal$trait_data[trait_data_continuous$trait_data < 0] <- "state_B"
trait_data_multinominal$trait_data[trait_data_continuous$trait_data < -1] <- "state_A"
trait_data_multinominal$trait_data[trait_data_continuous$trait_data >= 0] <- "state_C"
trait_data_multinominal$trait_data_type <- "categorical"
table(trait_data_multinominal$trait_data)
# (May take several minutes to run)
# ------ Compute STRAPP test for continuous data ------ #
plot(x = trait_data_continuous$trait_data, y = Ponerinae_BAMM_object_10My$tipLambda[[1]])
# Compute STRAPP test under the alternative hypothesis of a "negative" correlation
# between "net_diversification" rates and trait data
STRAPP_results <- compute_STRAPP_test_for_focal_time(
BAMM_object = Ponerinae_BAMM_object_10My,
trait_data_list = trait_data_continuous,
two_tailed = FALSE,
one_tailed_hypothesis = "negative",
return_perm_data = TRUE)
str(STRAPP_results, max.level = 2)
# Data from the posterior samples is available in STRAPP_results$perm_data_df
head(STRAPP_results$perm_data_df)
# ------ Compute STRAPP test for binary data ------ #
# Compute STRAPP test under the alternative hypothesis that "state_A" is associated
# with higher "net_diversification" that "state_B"
STRAPP_results <- compute_STRAPP_test_for_focal_time(
BAMM_object = Ponerinae_BAMM_object_10My,
trait_data_list = trait_data_binary,
two_tailed = FALSE,
one_tailed_hypothesis = c("state_A > state_B"))
str(STRAPP_results, max.level = 1)
# Compute STRAPP test under the alternative hypothesis that "state_B" is associated
# with higher "net_diversification" that "state_A"
STRAPP_results <- compute_STRAPP_test_for_focal_time(BAMM_object = Ponerinae_BAMM_object_10My,
trait_data_list = trait_data_binary,
two_tailed = FALSE,
one_tailed_hypothesis = c("state_B > state_A"))
str(STRAPP_results, max.level = 1)
# ------ Compute STRAPP test for multinominal data ------ #
# Compute STRAPP test between all three states, and compute post hoc tests
# for differences in rates between all possible pairs of states
# with a p-value adjusted for multiple comparison using Bonferroni's correction
STRAPP_results <- compute_STRAPP_test_for_focal_time(
BAMM_object = Ponerinae_BAMM_object_10My,
trait_data_list = trait_data_multinominal,
posthoc_pairwise_tests = TRUE,
two_tailed = TRUE,
p.adjust_method = "bonferroni")
str(STRAPP_results, max.level = 3)
# All post hoc pairwise test summaries are available in $summary_df
STRAPP_results$posthoc_pairwise_tests$summary_df
}
Cut the phylogeny and continuous trait mapping for a given focal time in the past
Description
Cuts off all the branches of the phylogeny which are
younger than a specific time in the past (i.e. the focal_time).
Branches overlapping the focal_time are shorten to the focal_time.
Likewise, remove continuous trait mapping for the cut off branches
by updating the $tree$maps and $tree$mapped.edge elements.
Usage
cut_contMap_for_focal_time(contMap, focal_time, keep_tip_labels = TRUE)
Arguments
contMap |
Object of class |
focal_time |
Numerical. The time, in terms of time distance from the present, for which the tree and mapping must be cut. It must be smaller than the root age of the phylogeny. |
keep_tip_labels |
Logical. Specify whether terminal branches with a single descendant tip must retained their initial |
Details
The phylogenetic tree is cut for a specific time in the past (i.e. the focal_time).
When a branch with a single descendant tip is cut and keep_tip_labels = TRUE,
the leaf left is labeled with the tip.label of the unique descendant tip.
When a branch with a single descendant tip is cut and keep_tip_labels = FALSE,
the leaf left is labeled with the node ID of the unique descendant tip.
In all cases, when a branch with multiple descendant tips (i.e., a clade) is cut, the leaf left is labeled with the node ID of the MRCA of the cut-off clade.
The continuous trait mapping is updated accordingly by removing mapping associated with the cut off branches.
Value
The function returns the cut contMap as an object of class "contMap".
It contains a $tree element of classes "simmap" and "phylo". This function updates and adds multiple useful sub-elements to the $tree element.
-
$mapsAn updated list of named numerical vectors. Provides the mapping of trait values along each remaining edge. -
$mapped.edgeAn updated matrix. Provides the evolutionary time spent across trait values (columns) along the remaining edges (rows). -
$root_ageInteger. Stores the age of the root of the tree. -
$nodes_ID_dfData.frame with two columns. Provides the conversion from thenew_node_IDto theinitial_node_ID. Each row is a node. -
$initial_nodes_IDVector of character strings. Provides the initial ID of internal nodes. Used to plot internal node IDs as labels withape::nodelabels(). -
$edges_ID_dfData.frame with two columns. Provides the conversion from thenew_edge_IDto theinitial_edge_ID. Each row is an edge/branch. -
$initial_edges_IDVector of character strings. Provides the initial ID of edges/branches. Used to plot edge/branch IDs as labels withape::edgelabels().
Author(s)
Maël Doré
See Also
cut_phylo_for_focal_time() extract_most_likely_trait_values_for_focal_time()
extract_most_likely_trait_values_from_contMap_for_focal_time()
For a guided tutorial, see this vignette: vignette("cut_phylogenies", package = "deepSTRAPP")
Examples
# ----- Prepare data ----- #
# Load mammals phylogeny and data from the R package motmot (data included in deepSTRAPP)
# Initial data source: Slater, 2013; DOI: 10.1111/2041-210X.12084
data(mammals, package = "deepSTRAPP")
mammals_tree <- mammals$mammal.phy
mammals_data <- setNames(object = mammals$mammal.mass$mean,
nm = row.names(mammals$mammal.mass))[mammals_tree$tip.label]
# Run a stochastic mapping based on a Brownian Motion model
# for Ancestral Trait Estimates to obtain a "contMap" object
mammals_contMap <- phytools::contMap(mammals_tree, x = mammals_data,
res = 100, # Number of time steps
plot = FALSE)
# Set focal time
focal_time <- 80
# ----- Example 1: keep_tip_labels = TRUE ----- #
# Cut contMap to 80 Mya while keeping tip.label
# on terminal branches with a unique descending tip.
updated_contMap <- cut_contMap_for_focal_time(contMap = mammals_contMap,
focal_time = focal_time,
keep_tip_labels = TRUE)
# Plot node labels on initial stochastic map with cut-off
plot_contMap(mammals_contMap, lwd = 2, fsize = c(0.5, 1))
ape::nodelabels(cex = 0.5)
abline(v = max(phytools::nodeHeights(mammals_contMap$tree)[,2]) - focal_time,
col = "red", lty = 2, lwd = 2)
# Plot initial node labels on cut stochastic map
plot_contMap(updated_contMap, fsize = c(0.8, 1))
ape::nodelabels(cex = 0.8, text = updated_contMap$tree$initial_nodes_ID)
# ----- Example 2: keep_tip_labels = FALSE ----- #
# Cut contMap to 80 Mya while NOT keeping tip.label.
updated_contMap <- cut_contMap_for_focal_time(contMap = mammals_contMap,
focal_time = focal_time,
keep_tip_labels = FALSE)
# Plot node labels on initial stochastic map with cut-off
plot_contMap(mammals_contMap, fsize = c(0.5, 1))
ape::nodelabels(cex = 0.5)
abline(v = max(phytools::nodeHeights(mammals_contMap$tree)[,2]) - focal_time,
col = "red", lty = 2, lwd = 2)
# Plot initial node labels on cut stochastic map
plot_contMap(updated_contMap, fsize = c(0.8, 1))
ape::nodelabels(cex = 0.8, text = updated_contMap$tree$initial_nodes_ID)
Cut the phylogeny and posterior probability mapping of a categorical trait for a given focal time in the past
Description
Cuts off all the branches of the phylogeny which are
younger than a specific time in the past (i.e. the focal_time).
Branches overlapping the focal_time are shorten to the focal_time.
Likewise, remove posterior probability mapping of the categorical trait for the cut off branches
by updating the $tree$maps and $tree$mapped.edge elements.
Usage
cut_densityMap_for_focal_time(densityMap, focal_time, keep_tip_labels = TRUE)
Arguments
densityMap |
Object of class |
focal_time |
Numerical. The time, in terms of time distance from the present, for which the tree and mapping must be cut. It must be smaller than the root age of the phylogeny. |
keep_tip_labels |
Logical. Specify whether terminal branches with a single descendant tip must retained their initial |
Details
The phylogenetic tree is cut for a specific time in the past (i.e. the focal_time).
When a branch with a single descendant tip is cut and keep_tip_labels = TRUE,
the leaf left is labeled with the tip.label of the unique descendant tip.
When a branch with a single descendant tip is cut and keep_tip_labels = FALSE,
the leaf left is labeled with the node ID of the unique descendant tip.
In all cases, when a branch with multiple descendant tips (i.e., a clade) is cut, the leaf left is labeled with the node ID of the MRCA of the cut-off clade.
The posterior probability mapping of a categorical trait is updated accordingly by removing mapping associated with the cut off branches.
Value
The function returns the cut densityMap as an object of class "densityMap" with three elements.
It contains a $tree element of classes "simmap" and "phylo". This function updates and adds multiple useful sub-elements to the $tree element.
* $maps An updated list of named numerical vectors. Provides the mapping of posterior probability of the state along each remaining edge.
* $mapped.edge An updated matrix. Provides the evolutionary time spent across posterior probabilities (columns) along the remaining edges (rows).
* $root_age Integer. Stores the age of the root of the tree.
* $nodes_ID_df Data.frame with two columns. Provides the conversion from the new_node_ID to the initial_node_ID. Each row is a node.
* $initial_nodes_ID Vector of character strings. Provides the initial ID of internal nodes. Used to plot internal node IDs as labels with ape::nodelabels().
* $edges_ID_df Data.frame with two columns. Provides the conversion from the new_edge_ID to the initial_edge_ID. Each row is an edge/branch.
* $initial_edges_ID Vector of character strings. Provides the initial ID of edges/branches. Used to plot edge/branch IDs as labels with ape::edgelabels().
The $col element describes the colors used to map each possible posterior probability value from 0 to 1000.
The $states element provide the name of the states. Here, the first value is the absence of the state labeled as "Not X" with X being the state.
The second value is the name of the state.
High posterior probability reflects high likelihood to harbor the state. Low probability reflects high likelihood to NOT harbor the state.
Author(s)
Maël Doré
See Also
cut_phylo_for_focal_time() extract_most_likely_trait_values_for_focal_time()
extract_most_likely_states_from_densityMaps_for_focal_time()
For a guided tutorial, see this vignette: vignette("cut_phylogenies", package = "deepSTRAPP")
Examples
# ----- Prepare data ----- #
# Load mammals phylogeny and data from the R package motmot included within deepSTRAPP
# Data source: Slater, 2013; DOI: 10.1111/2041-210X.12084
data("mammals", package = "deepSTRAPP")
# Obtain mammal tree
mammals_tree <- mammals$mammal.phy
# Convert mass data into categories
mammals_mass <- setNames(object = mammals$mammal.mass$mean,
nm = row.names(mammals$mammal.mass))[mammals_tree$tip.label]
mammals_data <- mammals_mass
mammals_data[seq_along(mammals_data)] <- "small"
mammals_data[mammals_mass > 5] <- "medium"
mammals_data[mammals_mass > 10] <- "large"
table(mammals_data)
# (May take several minutes to run)
# Produce densityMaps using stochastic character mapping based on an equal-rates (ER) Mk model
mammals_cat_data <- prepare_trait_data(tip_data = mammals_data, phylo = mammals_tree,
trait_data_type = "categorical",
evolutionary_models = "ER",
nb_simulations = 100,
plot_map = FALSE)
# Set focal time
focal_time <- 80
# Extract the density map for small mammals (state 3 = "small")
mammals_densityMap_small <- mammals_cat_data$densityMaps[[3]]
# ----- Example 1: keep_tip_labels = TRUE ----- #
# Cut densityMap to 80 Mya while keeping tip.label
# on terminal branches with a unique descending tip.
updated_mammals_densityMap_small <- cut_densityMap_for_focal_time(
densityMap = mammals_densityMap_small,
focal_time = focal_time,
keep_tip_labels = TRUE)
# Plot node labels on initial stochastic map with cut-off
phytools::plot.densityMap(mammals_densityMap_small, fsize = 0.7, lwd = 2)
ape::nodelabels(cex = 0.7)
abline(v = max(phytools::nodeHeights(mammals_densityMap_small$tree)[,2]) - focal_time,
col = "red", lty = 2, lwd = 2)
# Plot initial node labels on cut stochastic map
phytools::plot.densityMap(updated_mammals_densityMap_small, fsize = 0.8)
ape::nodelabels(cex = 0.8, text = updated_mammals_densityMap_small$tree$initial_nodes_ID)
# ----- Example 2: keep_tip_labels = FALSE ----- #
# Cut densityMap to 80 Mya while NOT keeping tip.label
updated_mammals_densityMap_small <- cut_densityMap_for_focal_time(
densityMap = mammals_densityMap_small,
focal_time = focal_time,
keep_tip_labels = FALSE)
# Plot node labels on initial stochastic map with cut-off
phytools::plot.densityMap(mammals_densityMap_small, fsize = 0.7, lwd = 2)
ape::nodelabels(cex = 0.7)
abline(v = max(phytools::nodeHeights(mammals_densityMap_small$tree)[,2]) - focal_time,
col = "red", lty = 2, lwd = 2)
# Plot initial node labels on cut stochastic map
phytools::plot.densityMap(updated_mammals_densityMap_small, fsize = 0.8)
ape::nodelabels(cex = 0.8, text = updated_mammals_densityMap_small$tree$initial_nodes_ID)
Cut phylogenies and posterior probability mapping of each state for a given focal time in the past
Description
Cuts off all the branches of the phylogeny which are
younger than a specific time in the past (i.e. the focal_time).
Branches overlapping the focal_time are shorten to the focal_time.
Likewise, remove posterior probability mapping of the categorical trait for the cut off branches
by updating the $tree$maps and $tree$mapped.edge elements.
Usage
cut_densityMaps_for_focal_time(densityMaps, focal_time, keep_tip_labels = TRUE)
Arguments
densityMaps |
List of objects of class |
focal_time |
Numerical. The time, in terms of time distance from the present, for which the tree and mapping must be cut. It must be smaller than the root age of the phylogeny. |
keep_tip_labels |
Logical. Specify whether terminal branches with a single descendant tip must retained their initial |
Details
The phylogenetic tree is cut for a specific time in the past (i.e. the focal_time).
When a branch with a single descendant tip is cut and keep_tip_labels = TRUE,
the leaf left is labeled with the tip.label of the unique descendant tip.
When a branch with a single descendant tip is cut and keep_tip_labels = FALSE,
the leaf left is labeled with the node ID of the unique descendant tip.
In all cases, when a branch with multiple descendant tips (i.e., a clade) is cut, the leaf left is labeled with the node ID of the MRCA of the cut-off clade.
The continuous trait mapping is updated accordingly by removing mapping associated with the cut off branches.
Value
The function returns an updated list of objects as cut densityMaps of class "densityMap".
Each densityMap object contains three elements:
-
$treeelement of classes"simmap"and"phylo". This function updates and adds multiple useful sub-elements to the$treeelement.-
$mapsAn updated list of named numerical vectors. Provides the mapping of posterior probability of the state along each remaining edge. -
$mapped.edgeAn updated matrix. Provides the evolutionary time spent across posterior probabilities (columns) along the remaining edges (rows). -
$root_ageInteger. Stores the age of the root of the tree. -
$nodes_ID_dfData.frame with two columns. Provides the conversion from thenew_node_IDto theinitial_node_ID. Each row is a node. -
$initial_nodes_IDVector of character strings. Provides the initial ID of internal nodes. Used to plot internal node IDs as labels withape::nodelabels(). -
$edges_ID_dfData.frame with two columns. Provides the conversion from thenew_edge_IDto theinitial_edge_ID. Each row is an edge/branch. -
$initial_edges_IDVector of character strings. Provides the initial ID of edges/branches. Used to plot edge/branch IDs as labels withape::edgelabels().
-
-
$colelement describes the colors used to map each possible posterior probability value from 0 to 1000. -
$stateselement provide the name of the states. Here, the first value is the absence of the state labeled as "Not X" with X being the state. The second value is the name of the state.
High posterior probability reflects high likelihood to harbor the state. Low probability reflects high likelihood to NOT harbor the state.
Author(s)
Maël Doré
See Also
cut_phylo_for_focal_time() cut_densityMap_for_focal_time()
extract_most_likely_trait_values_for_focal_time()
extract_most_likely_states_from_densityMaps_for_focal_time()
Examples
# ----- Prepare data ----- #
# Load mammals phylogeny and data from the R package motmot, and implemented in deepSTRAPP
# Data source: Slater, 2013; DOI: 10.1111/2041-210X.12084
data("mammals", package = "deepSTRAPP")
# Obtain mammal tree
mammals_tree <- mammals$mammal.phy
# Convert mass data into categories
mammals_mass <- setNames(object = mammals$mammal.mass$mean,
nm = row.names(mammals$mammal.mass))[mammals_tree$tip.label]
mammals_data <- mammals_mass
mammals_data[seq_along(mammals_data)] <- "small"
mammals_data[mammals_mass > 5] <- "medium"
mammals_data[mammals_mass > 10] <- "large"
table(mammals_data)
# (May take several minutes to run)
# Produce densityMaps using stochastic character mapping based on an equal-rates (ER) Mk model
mammals_cat_data <- prepare_trait_data(tip_data = mammals_data, phylo = mammals_tree,
trait_data_type = "categorical",
evolutionary_models = "ER",
nb_simulations = 100,
plot_map = FALSE)
# Set focal time
focal_time <- 80
# Extract the density maps
mammals_densityMaps <- mammals_cat_data$densityMaps
# ----- Example 1: keep_tip_labels = TRUE ----- #
# Cut densityMaps to 80 Mya while keeping tip.label
# on terminal branches with a unique descending tip.
updated_mammals_densityMaps <- cut_densityMaps_for_focal_time(
densityMaps = mammals_densityMaps,
focal_time = focal_time,
keep_tip_labels = TRUE)
## Plot density maps as overlay of all state posterior probabilities
# ?plot_densityMaps_overlay
# Plot initial density maps
plot_densityMaps_overlay(densityMaps = mammals_densityMaps, fsize = 0.5)
abline(v = max(phytools::nodeHeights(mammals_densityMaps[[1]]$tree)[,2]) - focal_time,
col = "red", lty = 2, lwd = 2)
# Plot updated/cut density maps
plot_densityMaps_overlay(densityMaps = updated_mammals_densityMaps, fsize = 0.8)
# ----- Example 2: keep_tip_labels = FALSE ----- #
# Cut densityMap to 80 Mya while NOT keeping tip.label
updated_mammals_densityMaps <- cut_densityMaps_for_focal_time(
densityMaps = mammals_densityMaps,
focal_time = focal_time,
keep_tip_labels = FALSE)
# Plot initial density maps
plot_densityMaps_overlay(densityMaps = mammals_densityMaps, fsize = 0.5)
abline(v = max(phytools::nodeHeights(mammals_densityMaps[[1]]$tree)[,2]) - focal_time,
col = "red", lty = 2, lwd = 2)
# Plot updated/cut density maps
plot_densityMaps_overlay(densityMaps = updated_mammals_densityMaps, fsize = 0.8)
Cut the phylogeny for a given time in the past
Description
Cuts off all the branches of the phylogeny which are
younger than a specific time in the past (i.e. the focal_time).
Branches overlapping the focal_time are shorten to the focal_time.
Usage
cut_phylo_for_focal_time(tree, focal_time, keep_tip_labels = TRUE)
Arguments
tree |
Object of class |
focal_time |
Numerical. The time, in terms of time distance from the present, for which the tree must be cut. It must be smaller than the root age of the phylogeny. |
keep_tip_labels |
Logical. Specify whether terminal branches with a single descendant tip must retained their initial |
Details
When a branch with a single descendant tip is cut and keep_tip_labels = TRUE,
the leaf left is labeled with the tip.label of the unique descendant tip.
When a branch with a single descendant tip is cut and keep_tip_labels = FALSE,
the leaf left is labeled with the node ID of the unique descendant tip.
In all cases, when a branch with multiple descendant tips (i.e., a clade) is cut, the leaf left is labeled with the node ID of the MRCA of the cut-off clade.
Value
The function returns the cut phylogeny as an object of class "phylo". It adds multiple useful elements to the object.
-
$root_ageInteger. Stores the age of the root of the tree. -
$nodes_ID_dfData.frame with two columns. Provides the conversion from thenew_node_IDto theinitial_node_ID. Each row is a node. -
$initial_nodes_IDVector of character strings. Provides the initial ID of internal nodes. Used to plot internal node IDs as labels withape::nodelabels(). -
$edges_ID_dfData.frame with two columns. Provides the conversion from thenew_edge_IDto theinitial_edge_ID. Each row is an edge/branch. -
$initial_edges_IDVector of character strings. Provides the initial ID of edges/branches. Used to plot edge/branch IDs as labels withape::edgelabels().
Author(s)
Maël Doré
See Also
cut_contMap_for_focal_time() cut_densityMaps_for_focal_time()
For a guided tutorial, see this vignette: vignette("cut_phylogenies", package = "deepSTRAPP")
Examples
# Load eel phylogeny from the R package phytools
# Source: Collar et al., 2014; DOI: 10.1038/ncomms6505
data("eel.tree", package = "phytools")
# ----- Example 1: keep_tip_labels = TRUE ----- #
# Cut tree to 30 Mya while keeping tip.label on terminal branches with a unique descending tip.
cut_eel.tree <- cut_phylo_for_focal_time(tree = eel.tree, focal_time = 30, keep_tip_labels = TRUE)
# Plot internal node labels on initial tree with cut-off
plot(eel.tree, cex = 0.5)
abline(v = max(phytools::nodeHeights(eel.tree)[,2]) - 30, col = "red", lty = 2, lwd = 2)
nb_tips <- length(eel.tree$tip.label)
nodelabels_in_cut_tree <- (nb_tips + 1):(nb_tips + eel.tree$Nnode)
nodelabels_in_cut_tree[!(nodelabels_in_cut_tree %in% cut_eel.tree$initial_nodes_ID)] <- NA
ape::nodelabels(text = nodelabels_in_cut_tree)
# Plot initial internal node labels on cut tree
plot(cut_eel.tree, cex = 0.8)
ape::nodelabels(text = cut_eel.tree$initial_nodes_ID)
# Plot edge labels on initial tree with cut-off
plot(eel.tree, cex = 0.5)
abline(v = max(phytools::nodeHeights(eel.tree)[,2]) - 30, col = "red", lty = 2, lwd = 2)
edgelabels_in_cut_tree <- 1:nrow(eel.tree$edge)
edgelabels_in_cut_tree[!(1:nrow(eel.tree$edge) %in% cut_eel.tree$initial_edges_ID)] <- NA
ape::edgelabels(text = edgelabels_in_cut_tree)
# Plot initial edge labels on cut tree
plot(cut_eel.tree, cex = 0.8)
ape::edgelabels(text = cut_eel.tree$initial_edges_ID)
# ----- Example 2: keep_tip_labels = FALSE ----- #
# Cut tree to 30 Mya without keeping tip.label on terminal branches with a unique descending tip.
# All tip.labels are converted to their descending/tipward node ID
cut_eel.tree <- cut_phylo_for_focal_time(tree = eel.tree, focal_time = 30, keep_tip_labels = FALSE)
plot(cut_eel.tree, cex = 0.8)
Data summarizing the evolution of geographic ranges in eels
Description
A list containing (fake) geographic ranges data of eels mapped on the phylogeny,
modeled with R package BioGeoBEARS. This object was obtained with prepare_trait_data().
Initial data based on feeding habits was altered to be transformed into range "A" and "B", and then adding arbitrarily multi-area "AB" ranges.
This is NOT real biogeographic data. Please refer to the initial article for real data.
Original data source: Collar, D. C., P. C. Wainwright, M. E. Alfaro, L. J. Revell, and R. S. Mehta (2014) Biting disrupts integration to spur skull evolution in eels. Nature Communications, 5, 5505. doi:10.1038/ncomms6505
Usage
data(eel_biogeo_data)
Format
A list with 9 elements.
Details
A list of 9 elements containing information on the evolution of geographic ranges in eels.
This object was obtained with prepare_trait_data().
-
$densityMapsList of objects of class"densityMapthat contains a phylogenetic tree and associated mapping of probability to harbor a given range along branches. The list contains only a"densityMapper unique areas becausesplit_multi_area_rangeswas set to TRUE. -
$densityMaps_all_rangesList of objects of class"densityMapthat contains a phylogenetic tree and associated mapping of probability to harbor a given range along branches. The list contains one"densityMapper range found along branches during the simulated biogeographic histories. -
$trait_data_typeCharacter string. Record the type of trait data. Here: "biogeographic". -
$aceNumerical matrix that record the posterior probabilities of ancestral ranges estimated at internal nodes. Only unique areas are considered among the ranges. Multi-area ranges have been split among unique ranges. Rows are internal nodes. Columns are ranges. Values are posterior probabilities of each range per node. -
$ace_all_rangesNumerical matrix that record the posterior probabilities of ancestral ranges estimated at internal nodes. All ranges observed along branches during the simulated biogeographic histories are present. Rows are internal nodes. Columns are ranges. Values are posterior probabilities of each range per node. -
$BSM_outputList of two lists that contains summary information of cladogenetic ($RES_caldo_events_tables) and anagenetic ($RES_ana_events_tables) events recording across the 1000 simulations of biogeographic histories performed during Biogeographic Stochastic Mapping (BSM). Each element of the list is a data.frame recording events occurring during one simulation. -
$simmapsList of 1000 objects of class"simmap". Each simmap object is a phylogeny with one simulated biogeographic history (i.e., transitions in geographic ranges) mapped along branches. -
$best_model_fitList that provides the output of the best fitting model (Here: DEC+J model). -
$model_selection_dfData.frame that summarizes model comparisons used to select the best fitting model.
References
Collar, D. C., P. C. Wainwright, M. E. Alfaro, L. J. Revell, and R. S. Mehta (2014) Biting disrupts integration to spur skull evolution in eels. Nature Communications, 5, 5505. doi:10.1038/ncomms6505
See Also
Data summarizing the evolution of feeding habits in eels using a 3-level factor as categorical trait
Description
A list containing feeding habits data of eels mapped on the phylogeny,
modeled with geiger::fitDiscrete. This object was obtained with prepare_trait_data().
Initial data was altered arbitrarily to create three categories, adding a "kiss" feeding habit to the initial
"bite" and "suction" data. This is NOT real biological data. Please refer to the initial article for real data.
Original data source: Collar, D. C., P. C. Wainwright, M. E. Alfaro, L. J. Revell, and R. S. Mehta (2014) Biting disrupts integration to spur skull evolution in eels. Nature Communications, 5, 5505. doi:10.1038/ncomms6505
Usage
data(eel_cat_3lvl_data)
Format
A list with 6 elements.
Details
A list of five objects containing information on the evolution of feeding habits in eels.
This object was obtained with prepare_trait_data().
-
$densityMapsList of objects of class"densityMapthat contains a phylogenetic tree and associated mapping of probability to harbor a given state/range along branches. The list contains one"densityMapper state/range found in thetip_data. -
$trait_data_typeCharacter string. Record the type of trait data. Here: "categorical". -
$simmapsList of 100 stochastic mapping simulations for trait evolution. Each element is a"simmap"object (see phytools::make.simmap) representing a possible evolutionary history that fits states observed on tips, infered ACE at internal nodes, and transition rates as estimated from the best fit model. -
$aceNumerical matrix that record the posterior probabilities of ancestral states/ranges (characters) estimates (ACE) at internal nodes. Rows are internal nodes. Columns are states/ranges. Values are posterior probabilities of each state per node. -
$best_model_fitList that provides the output of the best fitting model (Here: ER model). -
$model_selection_dfData.frame that summarizes model comparisons used to select the best fitting model.
References
Collar, D. C., P. C. Wainwright, M. E. Alfaro, L. J. Revell, and R. S. Mehta (2014) Biting disrupts integration to spur skull evolution in eels. Nature Communications, 5, 5505. doi:10.1038/ncomms6505
See Also
Extract diversification data from a BAMM_object
Description
Extracts regimes ID and tip rates from a BAMM_object that have been
updated to provide diversification data for a specific time in the past (i.e. the focal_time).
Use update_rates_and_regimes_for_focal_time() to obtain
a BAMM_object updated for a given focal_time.
Usage
extract_diversification_data_melted_df_for_focal_time(
BAMM_object,
verbose = TRUE
)
Arguments
BAMM_object |
Object of class |
verbose |
Logical. Should progression be displayed? A message will be printed for every batch of
100 BAMM posterior samples extracted. Default is |
Value
Returns a data.frame with six columns.
-
$focal_timeInteger. The time, in terms of time distance from the present, at which the trait data were extracted. Should be equal for all rows as a unique BAMM_object updated for a uniquefocal_timeis being extracted. -
$BAMM_sample_IDInteger. ID of the posterior samples from which the diversification data are extracted. -
$tip_IDCharacter string. Tip labels of the branches cut-off atfocal_time.If
keep_tip_labels = TRUEwas used inupdate_rates_and_regimes_for_focal_time(), cut-off branches with a single descendant tip retain their initialtip.label.If
keep_tip_labels = FALSEwas used inupdate_rates_and_regimes_for_focal_time(), all cut-off branches are labeled using their tipward node ID.
-
$regime_IDInteger. The regime ID on tips atfocal_time. IDs are integer. The root process equals "1", then they are incremented from oldest to youngest. Regime IDs are independent across posterior samples. -
$rate_typeCharacter string. Type of rates: "lambda" for speciation rates, "mu" for extinction rates, and "net_diversification" for net diversification rates (lambda - mu). -
$ratesNumerical. Rates in \[number of events / branch / evolutionary time\].
Author(s)
Maël Doré
Examples
if (deepSTRAPP::is_dev_version())
{
# Load the BAMM_object summarizing 1000 posterior samples of BAMM updated for a focal_time of 10 My
data(Ponerinae_BAMM_object_10My)
## This dataset is only available in development versions installed from GitHub.
# It is not available in CRAN versions.
# Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.
# (May take several minutes to run)
# Extract diversification data
diversification_data_df <- extract_diversification_data_melted_df_for_focal_time(
BAMM_object = Ponerinae_BAMM_object_10My,
verbose = TRUE)
# Print output
head(diversification_data_df)
}
Extract biogeographic range data mapped on a phylogeny at a given time in the past
Description
Extracts the most likely ranges found along branches
at a specific time in the past (i.e. the focal_time).
Optionally, the function can update the mapped phylogeny (densityMaps)
such as branches overlapping the focal_time are shorten to the focal_time,
and the range mapping for the cut off branches are removed
by updating the $tree$maps and $tree$mapped.edge elements.
Usage
extract_most_likely_ranges_from_densityMaps_for_focal_time(
densityMaps,
ace = NULL,
tip_data = NULL,
focal_time,
update_densityMaps = FALSE,
keep_tip_labels = TRUE
)
Arguments
densityMaps |
List of objects of class |
ace |
(Optional) Numerical matrix that record the posterior probabilities of ancestral ranges at internal nodes,
obtained with |
tip_data |
(Optional) Named character string vector of tip ranges. Names are nodes_ID of the internal nodes. Needed to provide accurate tip values. |
focal_time |
Integer. The time, in terms of time distance from the present, at which the tree and mapping must be cut. It must be smaller than the root age of the phylogeny. |
update_densityMaps |
Logical. Specify whether the mapped phylogeny ( |
keep_tip_labels |
Logical. Specify whether terminal branches with a single descendant tip
must retained their initial |
Details
The mapped phylogeny (densityMaps) is cut at a specific time in the past
(i.e. the focal_time) and the current trait values of the overlapping edges/branches are extracted.
—– Extract trait_data —–
Most likely ranges are extracted from the posterior probabilities displayed in the densityMaps.
The range with the highest probability is assigned to each tip and cut branches at focal_time.
True ML estimates will be used if tip_data and/or ace are provided as optional inputs.
In practice the discrepancy is negligible.
—– Update the densityMaps —–
To obtain updated densityMaps alongside the trait data, set update_densityMaps = TRUE.
The update consists in cutting off branches and mapping that are younger than the focal_time.
When a branch with a single descendant tip is cut and
keep_tip_labels = TRUE, the leaf left is labeled with the tip.label of the unique descendant tip.When a branch with a single descendant tip is cut and
keep_tip_labels = FALSE, the leaf left is labeled with the node ID of the unique descendant tip.In all cases, when a branch with multiple descendant tips (i.e., a clade) is cut, the leaf left is labeled with the node ID of the MRCA of the cut-off clade.
The ancestral range mapping in densityMap ($tree$maps and $tree$mapped.edge) is updated accordingly by removing mapping associated with the cut off branches.
Value
By default, the function returns a list with three elements.
-
$trait_dataA named character string vector with ML ranges found along branches overlapping thefocal_time. Names are the tip.label/tipward node ID. -
$focal_timeInteger. The time, in terms of time distance from the present, at which the trait data were extracted. -
$trait_data_typeCharacter string. Define the type of trait data as "biogeographic". Used in downstream analyses to select appropriate statistical processing.
If update_densityMaps = TRUE, the output is a list with four elements: $trait_data, $focal_time, $trait_data_type, and $densityMaps.
-
$densityMapsA list of objects of class"densityMap"that contains the updateddensityMapof each range/range, with branches and mapping that are younger than thefocal_timecut off. The function also adds multiple useful sub-elements to the$densityMaps$treeelements.-
$root_ageInteger. Stores the age of the root of the tree. -
$nodes_ID_dfData.frame with two columns. Provides the conversion from thenew_node_IDto theinitial_node_ID. Each row is a node. -
$initial_nodes_IDVector of character strings. Provides the initial ID of internal nodes. Used to plot internal node IDs as labels withape::nodelabels(). -
$edges_ID_dfData.frame with two columns. Provides the conversion from thenew_edge_IDto theinitial_edge_ID. Each row is an edge/branch. -
$initial_edges_IDVector of character strings. Provides the initial ID of edges/branches. Used to plot edge/branch IDs as labels withape::edgelabels().
-
Author(s)
Maël Doré
See Also
cut_phylo_for_focal_time() cut_densityMaps_for_focal_time()
Associated main function: extract_most_likely_trait_values_for_focal_time()
Sub-functions for other types of trait data:
extract_most_likely_trait_values_from_contMap_for_focal_time()
extract_most_likely_states_from_densityMaps_for_focal_time()
Examples
## Load biogeographic range data mapped on a phylogeny
data(eel_biogeo_data, package = "deepSTRAPP")
# Explore data
str(eel_biogeo_data, 1)
eel_biogeo_data$densityMaps # Two density maps: one per unique area: A, B.
eel_biogeo_data$densityMaps_all_ranges # Three density maps: one per range: A, B, and AB.
# Set focal time to 10 Mya
focal_time <- 10
# ----- Example 1: Using only unique areas ----- #
# (May take several minutes to run)
## Extract trait data and update densityMaps for the given focal_time
# Extract from the densityMaps
eel_biogeo_data_10My <- extract_most_likely_ranges_from_densityMaps_for_focal_time(
densityMaps = eel_biogeo_data$densityMaps,
# ace = eel_biogeo_data$ace,
focal_time = focal_time,
update_densityMaps = TRUE)
## Print trait data
str(eel_biogeo_data_10My, 1)
eel_biogeo_data_10My$trait_data
## Plot density maps as overlay of all range posterior probabilities
# Plot initial density maps with ACE pies
plot_densityMaps_overlay(densityMaps = eel_biogeo_data$densityMaps, fsize = 0.7)
abline(v = max(phytools::nodeHeights(eel_biogeo_data$densityMaps[[1]]$tree)[,2]) - focal_time,
col = "red", lty = 2, lwd = 2)
# Plot updated densityMaps with ACE pies
plot_densityMaps_overlay(eel_biogeo_data_10My$densityMaps, fsize = 0.7)
# ----- Example 2: Using all ranges ----- #
# (May take several minutes to run)
## Extract trait data and update densityMaps_all_ranges for the given focal_time
# Extract from the densityMaps
eel_biogeo_data_10My <- extract_most_likely_ranges_from_densityMaps_for_focal_time(
densityMaps = eel_biogeo_data$densityMaps_all_ranges,
# ace = eel_biogeo_data$ace_all_ranges,
focal_time = focal_time,
update_densityMaps = TRUE)
## Print trait data
str(eel_biogeo_data_10My, 1)
eel_biogeo_data_10My$trait_data
## Plot density maps as overlay of all range posterior probabilities
# Plot initial density maps with ACE pies
root_age <- max(phytools::nodeHeights(eel_biogeo_data$densityMaps_all_ranges[[1]]$tree)[,2])
plot_densityMaps_overlay(densityMaps = eel_biogeo_data$densityMaps_all_ranges, fsize = 0.7)
abline(v = root_age - focal_time,
col = "red", lty = 2, lwd = 2)
# Plot updated densityMaps with ACE pies
plot_densityMaps_overlay(eel_biogeo_data_10My$densityMaps, fsize = 0.7)
Extract categorical trait data mapped on a phylogeny at a given time in the past
Description
Extracts the most likely states found along branches
at a specific time in the past (i.e. the focal_time).
Optionally, the function can update the mapped phylogeny (densityMaps)
such as branches overlapping the focal_time are shorten to the focal_time,
and the trait mapping for the cut off branches are removed
by updating the $tree$maps and $tree$mapped.edge elements.
Usage
extract_most_likely_states_from_densityMaps_for_focal_time(
densityMaps,
ace = NULL,
tip_data = NULL,
focal_time,
update_densityMaps = FALSE,
keep_tip_labels = TRUE
)
Arguments
densityMaps |
List of objects of class |
ace |
(Optional) Numerical matrix that record the posterior probabilities of ancestral states at internal nodes,
obtained with |
tip_data |
(Optional) Named character string vector of tip states. Names are nodes_ID of the internal nodes. Needed to provide accurate tip values. |
focal_time |
Integer. The time, in terms of time distance from the present, at which the tree and mapping must be cut. It must be smaller than the root age of the phylogeny. |
update_densityMaps |
Logical. Specify whether the mapped phylogeny ( |
keep_tip_labels |
Logical. Specify whether terminal branches with a single descendant tip
must retained their initial |
Details
The mapped phylogeny (densityMaps) is cut at a specific time in the past
(i.e. the focal_time) and the current trait values of the overlapping edges/branches are extracted.
—– Extract trait_data —–
Most likely states are extracted from the posterior probabilities displayed in the densityMaps.
The state with the highest probability is assigned to each tip and cut branches at focal_time.
True ML estimates will be used if tip_data and/or ace are provided as optional inputs.
In practice the discrepancy is negligible.
—– Update the densityMaps —–
To obtain updated densityMaps alongside the trait data, set update_densityMaps = TRUE.
The update consists in cutting off branches and mapping that are younger than the focal_time.
When a branch with a single descendant tip is cut and
keep_tip_labels = TRUE, the leaf left is labeled with the tip.label of the unique descendant tip.When a branch with a single descendant tip is cut and
keep_tip_labels = FALSE, the leaf left is labeled with the node ID of the unique descendant tip.In all cases, when a branch with multiple descendant tips (i.e., a clade) is cut, the leaf left is labeled with the node ID of the MRCA of the cut-off clade.
The categorical trait mapping in densityMap ($tree$maps and $tree$mapped.edge) is updated accordingly by removing mapping associated with the cut off branches.
Value
By default, the function returns a list with three elements.
-
$trait_dataA named character string vector with ML states found along branches overlapping thefocal_time. Names are the tip.label/tipward node ID. -
$focal_timeInteger. The time, in terms of time distance from the present, at which the trait data were extracted. -
$trait_data_typeCharacter string. Define the type of trait data as "categorical". Used in downstream analyses to select appropriate statistical processing.
If update_densityMaps = TRUE, the output is a list with four elements: $trait_data, $focal_time, $trait_data_type, and $densityMaps.
-
$densityMapsA list of objects of class"densityMap"that contains the updateddensityMapof each state, with branches and mapping that are younger than thefocal_timecut off. The function also adds multiple useful sub-elements to the$densityMaps$treeelements.-
$root_ageInteger. Stores the age of the root of the tree. -
$nodes_ID_dfData.frame with two columns. Provides the conversion from thenew_node_IDto theinitial_node_ID. Each row is a node. -
$initial_nodes_IDVector of character strings. Provides the initial ID of internal nodes. Used to plot internal node IDs as labels withape::nodelabels(). -
$edges_ID_dfData.frame with two columns. Provides the conversion from thenew_edge_IDto theinitial_edge_ID. Each row is an edge/branch. -
$initial_edges_IDVector of character strings. Provides the initial ID of edges/branches. Used to plot edge/branch IDs as labels withape::edgelabels().
-
Author(s)
Maël Doré
See Also
cut_phylo_for_focal_time() cut_densityMaps_for_focal_time()
Associated main function: extract_most_likely_trait_values_for_focal_time()
Sub-functions for other types of trait data:
extract_most_likely_trait_values_from_contMap_for_focal_time()
extract_most_likely_ranges_from_densityMaps_for_focal_time()
Examples
# ----- Example 1: Only extent taxa (Ultrametric tree) ----- #
## Load categorical trait data mapped on a phylogeny
data(eel_cat_3lvl_data, package = "deepSTRAPP")
# Explore data
str(eel_cat_3lvl_data, 1)
eel_cat_3lvl_data$densityMaps # Three density maps: one per state
# Set focal time to 10 Mya
focal_time <- 10
# (May take several minutes to run)
## Extract trait data and update densityMaps for the given focal_time
# Extract from the densityMaps
eel_cat_3lvl_data_10My <- extract_most_likely_states_from_densityMaps_for_focal_time(
densityMaps = eel_cat_3lvl_data$densityMaps,
# ace = eel_cat_3lvl_data$ace,
focal_time = focal_time,
update_densityMaps = TRUE)
## Print trait data
str(eel_cat_3lvl_data_10My, 1)
eel_cat_3lvl_data_10My$trait_data
## Plot density maps as overlay of all state posterior probabilities
# Plot initial density maps with ACE pies
plot_densityMaps_overlay(densityMaps = eel_cat_3lvl_data$densityMaps, fsize = 0.5)
abline(v = max(phytools::nodeHeights(eel_cat_3lvl_data$densityMaps[[1]]$tree)[,2]) - focal_time,
col = "red", lty = 2, lwd = 2)
# Plot updated densityMaps with ACE pies
plot_densityMaps_overlay(eel_cat_3lvl_data_10My$densityMaps, fsize = 0.7)
# ----- Example 2: Include fossils (Non-ultrametric tree) ----- #
## Test with non-ultrametric trees like mammals in motmot
## Prepare data
# Load mammals phylogeny and data from the R package motmot included within deepSTRAPP
# Data source: Slater, 2013; DOI: 10.1111/2041-210X.12084
data("mammals", package = "deepSTRAPP")
# Obtain mammal tree
mammals_tree <- mammals$mammal.phy
# Convert mass data into categories
mammals_mass <- setNames(object = mammals$mammal.mass$mean,
nm = row.names(mammals$mammal.mass))[mammals_tree$tip.label]
mammals_data <- mammals_mass
mammals_data[seq_along(mammals_data)] <- "small"
mammals_data[mammals_mass > 5] <- "medium"
mammals_data[mammals_mass > 10] <- "large"
table(mammals_data)
# (May take several minutes to run)
## Produce densityMaps using stochastic character mapping based on an equal-rates (ER) Mk model
mammals_cat_data <- prepare_trait_data(tip_data = mammals_data, phylo = mammals_tree,
trait_data_type = "categorical",
evolutionary_models = "ER",
nb_simulations = 100,
plot_map = FALSE)
# Set focal time
focal_time <- 80
## Extract trait data and update densityMaps for the given focal_time
# Extract from the densityMaps
mammals_cat_data_80My <- extract_most_likely_states_from_densityMaps_for_focal_time(
densityMaps = mammals_cat_data$densityMaps,
focal_time = focal_time,
update_densityMaps = TRUE)
## Print trait data
str(mammals_cat_data_80My, 1)
mammals_cat_data_80My$trait_data
## Plot density maps as overlay of all state posterior probabilities
# Plot initial density maps with ACE pies
plot_densityMaps_overlay(densityMaps = mammals_cat_data$densityMaps, fsize = 0.7)
abline(v = max(phytools::nodeHeights(mammals_cat_data$densityMaps[[1]]$tree)[,2]) - focal_time,
col = "red", lty = 2, lwd = 2)
# Plot updated densityMaps with ACE pies
plot_densityMaps_overlay(mammals_cat_data_80My$densityMaps, fsize = 0.8)
Extract trait data mapped on a phylogeny at a given time in the past
Description
Extracts the most likely trait values found along branches
at a specific time in the past (i.e. the focal_time).
Optionally, the function can update the mapped phylogeny (contMap or densityMaps)
such as branches overlapping the focal_time are shorten to the focal_time,
and the trait mapping for the cut off branches are removed
by updating the $tree$maps and $tree$mapped.edge elements.
Usage
extract_most_likely_trait_values_for_focal_time(
contMap = NULL,
densityMaps = NULL,
ace = NULL,
tip_data = NULL,
trait_data_type,
focal_time,
update_map = FALSE,
keep_tip_labels = TRUE
)
Arguments
contMap |
For continuous trait data. Object of class |
densityMaps |
For categorical trait or biogeographic data. List of objects of class |
ace |
(Optional) Ancestral Character Estimates (ACE) at the internal nodes.
Obtained with
|
tip_data |
(Optional) Named vector of tip values of the trait.
|
trait_data_type |
Character string. Specify the type of trait data. Must be one of "continuous", "categorical", "biogeographic". |
focal_time |
Integer. The time, in terms of time distance from the present, at which the tree and mapping must be cut. It must be smaller than the root age of the phylogeny. |
update_map |
Logical. Specify whether the mapped phylogeny ( |
keep_tip_labels |
Logical. Specify whether terminal branches with a single descendant tip
must retained their initial |
Details
The mapped phylogeny (contMap or densityMaps) is cut at a specific time in the past
(i.e. the focal_time) and the current trait values of the overlapping edges/branches are extracted.
—– Extract trait_data —–
For continuous trait data:
If providing only the contMap trait values at tips and internal nodes will be extracted from
the mapping of the contMap leading to a slight discrepancy with the actual tip data
and estimated ancestral character values.
True ML trait estimates will be used if tip_data and/or ace are provided as optional inputs.
In practice the discrepancy is negligible.
For categorical trait and biogeographic data:
Most likely states/ranges are extracted from the posterior probabilities displayed in the densityMaps.
The states/ranges with the highest probability is assigned to each tip and cut branches at focal_time.
True ML states/ranges will be used if tip_data and/or ace are provided as optional inputs.
In practice the discrepancy is negligible.
—– Update the contMap/densityMaps —–
To obtain an updated contMap/densityMaps alongside the trait data, set update_map = TRUE.
The update consists in cutting off branches and mapping that are younger than the focal_time.
When a branch with a single descendant tip is cut and
keep_tip_labels = TRUE, the leaf left is labeled with the tip.label of the unique descendant tip.When a branch with a single descendant tip is cut and
keep_tip_labels = FALSE, the leaf left is labeled with the node ID of the unique descendant tip.In all cases, when a branch with multiple descendant tips (i.e., a clade) is cut, the leaf left is labeled with the node ID of the MRCA of the cut-off clade.
The mapping in contMap/densityMaps ($tree$maps and $tree$mapped.edge) is updated accordingly by removing mapping associated with the cut off branches.
A specific sub-function (that can be used independently) is called according to the type of trait data:
For continuous traits:
extract_most_likely_trait_values_from_contMap_for_focal_time()For categorical traits:
extract_most_likely_states_from_densityMaps_for_focal_time()For biogeographic ranges:
extract_most_likely_ranges_from_densityMaps_for_focal_time()
Value
By default, the function returns a list with three elements.
-
$trait_dataA named numerical vector with ML trait values found along branches overlapping thefocal_time. Names are the tip.label/tipward node ID. -
$focal_timeInteger. The time, in terms of time distance from the present, at which the trait data were extracted. -
$trait_data_typeCharacter string. Define the type of trait data as "continuous", "categorical", or "biogeographic". Used in downstream analyses to select appropriate statistical processing.
If update_map = TRUE, the output is a list with four elements: $trait_data, $focal_time, $trait_data_type, and $contMap or $densityMaps.
For continuous trait data:
-
$contMapAn object of class"contMap"that contains the updatedcontMapwith branches and mapping that are younger than thefocal_timecut off. The function also adds multiple useful sub-elements to the$contMap$treeelement.-
$root_ageInteger. Stores the age of the root of the tree. -
$nodes_ID_dfData.frame with two columns. Provides the conversion from thenew_node_IDto theinitial_node_ID. Each row is a node. -
$initial_nodes_IDVector of character strings. Provides the initial ID of internal nodes. Used to plot internal node IDs as labels withape::nodelabels(). -
$edges_ID_dfData.frame with two columns. Provides the conversion from thenew_edge_IDto theinitial_edge_ID. Each row is an edge/branch. -
$initial_edges_IDVector of character strings. Provides the initial ID of edges/branches. Used to plot edge/branch IDs as labels withape::edgelabels().
-
For categorical trait and biogeographic data:
-
$densityMapsA list of objects of class"densityMap"that contains the updateddensityMapof each state/range, with branches and mapping that are younger than thefocal_timecut off. The function also adds multiple useful sub-elements to the$densityMaps$treeelements.-
$root_ageInteger. Stores the age of the root of the tree. -
$nodes_ID_dfData.frame with two columns. Provides the conversion from thenew_node_IDto theinitial_node_ID. Each row is a node. -
$initial_nodes_IDVector of character strings. Provides the initial ID of internal nodes. Used to plot internal node IDs as labels withape::nodelabels(). -
$edges_ID_dfData.frame with two columns. Provides the conversion from thenew_edge_IDto theinitial_edge_ID. Each row is an edge/branch. -
$initial_edges_IDVector of character strings. Provides the initial ID of edges/branches. Used to plot edge/branch IDs as labels withape::edgelabels().
-
Author(s)
Maël Doré
See Also
cut_phylo_for_focal_time() cut_contMap_for_focal_time() cut_densityMaps_for_focal_time()
Associated sub-functions per type of trait data:
extract_most_likely_trait_values_from_contMap_for_focal_time()
extract_most_likely_states_from_densityMaps_for_focal_time()
extract_most_likely_ranges_from_densityMaps_for_focal_time()
Examples
# ----- Example 1: Continuous trait ----- #
## Prepare data
# Load eel data from the R package phytools
# Source: Collar et al., 2014; DOI: 10.1038/ncomms6505
library(phytools)
data(eel.tree)
data(eel.data)
# Extract body size
eel_data <- setNames(eel.data$Max_TL_cm,
rownames(eel.data))
# (May take several minutes to run)
## Get Ancestral Character Estimates based on a Brownian Motion model
# To obtain values at internal nodes
eel_ACE <- phytools::fastAnc(tree = eel.tree, x = eel_data)
## Run a Stochastic Mapping based on a Brownian Motion model
# to interpolate values along branches and obtain a "contMap" object
eel_contMap <- phytools::contMap(eel.tree, x = eel_data,
res = 100, # Number of time steps
plot = FALSE)
# Set focal time to 50 Mya
focal_time <- 50
## Extract trait data and update contMap for the given focal_time
# Extract from the contMap (values are not exact ML estimates)
eel_cont_50 <- extract_most_likely_trait_values_for_focal_time(
contMap = eel_contMap,
trait_data_type = "continuous",
focal_time = focal_time,
update_map = TRUE)
# Extract from tip data and ML estimates of ancestral characters (values are true ML estimates)
eel_cont_50 <- extract_most_likely_trait_values_for_focal_time(
contMap = eel_contMap,
ace = eel_ACE, tip_data = eel_data,
trait_data_type = "continuous",
focal_time = focal_time,
update_map = TRUE)
## Visualize outputs
# Print trait data
eel_cont_50$trait_data
# Plot node labels on initial stochastic map with cut-off
plot(eel_contMap, fsize = c(0.5, 1))
ape::nodelabels()
abline(v = max(phytools::nodeHeights(eel_contMap$tree)[,2]) - focal_time,
col = "red", lty = 2, lwd = 2)
# Plot updated contMap with initial node labels
plot(eel_cont_50$contMap)
ape::nodelabels(text = eel_cont_50$contMap$tree$initial_nodes_ID)
# ----- Example 2: Categorical trait ----- #
# (May take several minutes to run)
## Load categorical trait data mapped on a phylogeny
data(eel_cat_3lvl_data, package = "deepSTRAPP")
# Explore data
str(eel_cat_3lvl_data, 1)
eel_cat_3lvl_data$densityMaps # Three density maps: one per state
# Set focal time to 10 Mya
focal_time <- 10
## Extract trait data and update densityMaps for the given focal_time
# Extract from the densityMaps
eel_cat_3lvl_data_10My <- extract_most_likely_trait_values_for_focal_time(
densityMaps = eel_cat_3lvl_data$densityMaps,
trait_data_type = "categorical",
focal_time = focal_time,
update_map = TRUE)
## Print trait data
str(eel_cat_3lvl_data_10My, 1)
eel_cat_3lvl_data_10My$trait_data
## Plot density maps as overlay of all state posterior probabilities
# Plot initial density maps with ACE pies
plot_densityMaps_overlay(densityMaps = eel_cat_3lvl_data$densityMaps)
abline(v = max(phytools::nodeHeights(eel_cat_3lvl_data$densityMaps[[1]]$tree)[,2]) - focal_time,
col = "red", lty = 2, lwd = 2)
# Plot updated densityMaps with ACE pies
plot_densityMaps_overlay(eel_cat_3lvl_data_10My$densityMaps)
# ----- Example 3: Biogeographic ranges ----- #
# (May take several minutes to run)
## Load biogeographic range data mapped on a phylogeny
data(eel_biogeo_data, package = "deepSTRAPP")
# Explore data
str(eel_biogeo_data, 1)
eel_biogeo_data$densityMaps # Two density maps: one per unique area: A, B.
eel_biogeo_data$densityMaps_all_ranges # Three density maps: one per range: A, B, and AB.
# Set focal time to 10 Mya
focal_time <- 10
## Extract trait data and update densityMaps for the given focal_time
# Extract from the densityMaps
eel_biogeo_data_10My <- extract_most_likely_trait_values_for_focal_time(
densityMaps = eel_biogeo_data$densityMaps,
# ace = eel_biogeo_data$ace,
trait_data_type = "biogeographic",
focal_time = focal_time,
update_map = TRUE)
## Print trait data
str(eel_biogeo_data_10My, 1)
eel_biogeo_data_10My$trait_data
## Plot density maps as overlay of all range posterior probabilities
# Plot initial density maps with ACE pies
plot_densityMaps_overlay(densityMaps = eel_biogeo_data$densityMaps)
abline(v = max(phytools::nodeHeights(eel_biogeo_data$densityMaps[[1]]$tree)[,2]) - focal_time,
col = "red", lty = 2, lwd = 2)
# Plot updated densityMaps with ACE pies
plot_densityMaps_overlay(eel_biogeo_data_10My$densityMaps)
Extract continuous trait data mapped on a phylogeny at a given time in the past
Description
Extracts the most likely trait values found along branches
at a specific time in the past (i.e. the focal_time).
Optionally, the function can update the mapped phylogeny (contMap) such as
branches overlapping the focal_time are shorten to the focal_time, and
the continuous trait mapping for the cut off branches are removed
by updating the $tree$maps and $tree$mapped.edge elements.
Usage
extract_most_likely_trait_values_from_contMap_for_focal_time(
contMap,
ace = NULL,
tip_data = NULL,
focal_time,
update_contMap = FALSE,
keep_tip_labels = TRUE
)
Arguments
contMap |
Object of class |
ace |
Named numeric vector (Optional). Ancestral Character Estimates (ACE) of the internal nodes,
typically generated with |
tip_data |
Named numeric vector (Optional). Tip values of the trait. Names are nodes_ID of the internal nodes. Needed to provide accurate tip values. |
focal_time |
Integer. The time, in terms of time distance from the present, at which the tree and mapping must be cut. It must be smaller than the root age of the phylogeny. |
update_contMap |
Logical. Specify whether the mapped phylogeny ( |
keep_tip_labels |
Logical. Specify whether terminal branches with a single descendant tip
must retained their initial |
Details
The mapped phylogeny (contMap) is cut at a specific time in the past
(i.e. the focal_time) and the current trait values of the overlapping edges/branches are extracted.
—– Extract trait_data —–
If providing only the contMap trait values at tips and internal nodes will be extracted from
the mapping of the contMap leading to a slight dependency with the actual tip data
and estimated ancestral character values.
True ML estimates will be used if tip_data and/or ace are provided as optional inputs.
In practice the discrepancy is negligible.
—– Update the contMap —–
To obtain an updated contMap alongside the trait data, set update_contMap = TRUE.
The update consists in cutting off branches and mapping that are younger than the focal_time.
When a branch with a single descendant tip is cut and
keep_tip_labels = TRUE, the leaf left is labeled with the tip.label of the unique descendant tip.When a branch with a single descendant tip is cut and
keep_tip_labels = FALSE, the leaf left is labeled with the node ID of the unique descendant tip.In all cases, when a branch with multiple descendant tips (i.e., a clade) is cut, the leaf left is labeled with the node ID of the MRCA of the cut-off clade.
The continuous trait mapping contMap ($tree$maps and $tree$mapped.edge) is updated accordingly by removing mapping associated with the cut off branches.
Value
By default, the function returns a list with three elements.
-
$trait_dataA named numerical vector with ML trait values found along branches overlapping thefocal_time. Names are the tip.label/tipward node ID. -
$focal_timeInteger. The time, in terms of time distance from the present, at which the trait data were extracted. -
$trait_data_typeCharacter string. Define the type of trait data as "continuous". Used in downstream analyses to select appropriate statistical processing.
If update_contMap = TRUE, the output is a list with four elements: $trait_data, $focal_time, $trait_data_type, and $contMap.
-
$contMapAn object of class that contains the updatedcontMapwith branches and mapping that are younger than thefocal_timecut off. The function also adds multiple useful sub-elements to the$contMap$treeelement.-
$root_ageInteger. Stores the age of the root of the tree. -
$nodes_ID_dfData.frame with two columns. Provides the conversion from thenew_node_IDto theinitial_node_ID. Each row is a node. -
$initial_nodes_IDVector of character strings. Provides the initial ID of internal nodes. Used to plot internal node IDs as labels withape::nodelabels(). -
$edges_ID_dfData.frame with two columns. Provides the conversion from thenew_edge_IDto theinitial_edge_ID. Each row is an edge/branch. -
$initial_edges_IDVector of character strings. Provides the initial ID of edges/branches. Used to plot edge/branch IDs as labels withape::edgelabels().
-
Author(s)
Maël Doré
See Also
cut_phylo_for_focal_time() cut_contMap_for_focal_time()
Associated main function: extract_most_likely_trait_values_for_focal_time()
Sub-functions for other types of trait data:
extract_most_likely_states_from_densityMaps_for_focal_time()
extract_most_likely_ranges_from_densityMaps_for_focal_time()
Examples
# ----- Example 1: Only extent taxa (Ultrametric tree) ----- #
## Prepare data
# Load eel data from the R package phytools
# Source: Collar et al., 2014; DOI: 10.1038/ncomms6505
library(phytools)
data(eel.tree)
data(eel.data)
# Extract body size
eel_data <- setNames(eel.data$Max_TL_cm,
rownames(eel.data))
# Get Ancestral Character Estimates based on a Brownian Motion model
# To obtain values at internal nodes
eel_ACE <- phytools::fastAnc(tree = eel.tree, x = eel_data)
# Run a Stochastic Mapping based on a Brownian Motion model
# to interpolate values along branches and obtain a "contMap" object
eel_contMap <- phytools::contMap(eel.tree, x = eel_data,
res = 100, # Number of time steps
plot = FALSE)
# Set focal time to 50 Mya
focal_time <- 50
# (May take several minutes to run)
## Extract trait data and update contMap for the given focal_time
# Extract from the contMap (values are not exact ML estimates)
eel_test <- extract_most_likely_trait_values_from_contMap_for_focal_time(
contMap = eel_contMap,
focal_time = focal_time,
update_contMap = TRUE)
# Extract from tip data and ML estimates of ancestral characters (values are true ML estimates)
eel_test <- extract_most_likely_trait_values_from_contMap_for_focal_time(
contMap = eel_contMap,
ace = eel_ACE, tip_data = eel_data,
focal_time = focal_time,
update_contMap = TRUE)
## Visualize outputs
# Print trait data
eel_test$trait_data
# Plot node labels on initial stochastic map with cut-off
plot(eel_contMap, fsize = c(0.5, 1))
ape::nodelabels()
abline(v = max(phytools::nodeHeights(eel_contMap$tree)[,2]) - focal_time,
col = "red", lty = 2, lwd = 2)
# Plot updated contMap with initial node labels
plot(eel_test$contMap)
ape::nodelabels(text = eel_test$contMap$tree$initial_nodes_ID)
# ----- Example 2: Include fossils (Non-ultrametric tree) ----- #
## Test with non-ultrametric trees like mammals in motmot
## Prepare data
# Load mammals phylogeny and data from the R package motmot included within deepSTRAPP
# Data source: Slater, 2013; DOI: 10.1111/2041-210X.12084
data("mammals", package = "deepSTRAPP")
mammals_tree <- mammals$mammal.phy
mammals_data <- setNames(object = mammals$mammal.mass$mean,
nm = row.names(mammals$mammal.mass))[mammals_tree$tip.label]
# Get Ancestral Character Estimates based on a Brownian Motion model
# To obtain values at internal nodes
mammals_ACE <- phytools::fastAnc(tree = mammals_tree, x = mammals_data)
# Run a Stochastic Mapping based on a Brownian Motion model
# to interpolate values along branches and obtain a "contMap" object
mammals_contMap <- phytools::contMap(mammals_tree, x = mammals_data,
res = 100, # Number of time steps
plot = FALSE)
# Set focal time to 80 Mya
focal_time <- 80
# (May take several minutes to run)
## Extract trait data and update contMap for the given focal_time
# Extract from the contMap (values are not exact ML estimates)
mammals_test <- extract_most_likely_trait_values_from_contMap_for_focal_time(
contMap = mammals_contMap,
focal_time = focal_time,
update_contMap = TRUE)
# Extract from tip data and ML estimates of ancestral characters (values are true ML)
mammals_test <- extract_most_likely_trait_values_from_contMap_for_focal_time(
contMap = mammals_contMap,
ace = mammals_ACE, tip_data = mammals_data,
focal_time = focal_time,
update_contMap = TRUE)
## Visualize outputs
# Print trait data
mammals_test$trait_data
# Plot node labels on initial stochastic map with cut-off
phytools::plot.contMap(mammals_contMap, fsize = c(0.5, 1))
ape::nodelabels()
abline(v = max(phytools::nodeHeights(mammals_contMap$tree)[,2]) - focal_time,
col = "red", lty = 2, lwd = 2)
# Plot updated contMap with initial node labels
phytools::plot.contMap(mammals_test$contMap, fsize = c(0.8, 1))
ape::nodelabels(text = mammals_test$contMap$tree$initial_nodes_ID)
Check whether deepSTRAPP is a development version
Description
Detect whether the current deepSTRAPP installation corresponds to a development version (e.g., version sourced from GitHub), as opposed to a CRAN release.
The development versions include deepSTRAPP outputs as additional internal datasets to help produces examples and vignettes outputs. These additional datasets are removed from the CRAN releases because their size is not compatible with CRAN policies.
This function is only used to check if an example must be ran or a vignette chunk evaluated to produce output from data.
Usage
is_dev_version(pkg = "deepSTRAPP")
Arguments
pkg |
Character string. Name of the R package for which the version type is inspected. Default is "deepSTRAPP". |
Value
Logical. TRUE if running a development version.
Author(s)
Maël Doré
Examples
# Check the current deepSTRAPP installation is a development version
is_dev_version(pkg = "deepSTRAPP")
Phylogeny and body mass data for extant and extinct mammals families/genera from Slater, 2013
Description
A list containing two elements:
-
$mammal.massA data.frame with mean and standard error of mammal body masses in ln(mass in g). -
$mammal.phyA time-calibrated phylogeny of extinct and extant mammal families/genera.
Source: Slater, G. J. (2013). Phylogenetic evidence for a shift in the mode of mammalian body size evolution at the Cretaceous-Palaeogene boundary. Methods in Ecology and Evolution, 4(8), 734-744. doi:10.1111/2041-210X.12084
Usage
data(mammals)
Format
A list with 2 elements.
-
$mammal.massA data.frame of 213 observations and two columns. Values are numerical. -
$mammal.phyA time-calibrated phylogeny of 211 tips.
Details
Initial dataset from Slater, G. J. (2013). The R object was imported from R package motmot to include in deepSTRAPP.
Plot diversification rates and regime shifts from BAMM on phylogeny
Description
Plot on a time-calibrated phylogeny the evolution of diversification rates and
the location of regime shifts estimated from a BAMM (Bayesian Analysis of Macroevolutionary Mixtures).
Each branch is colored accroding to the estimated rates of speciation, extinction, or net diversification
stored in an object of class bammdata. Rates can vary along time, thus colors evolved along individual branches.
This function is a wrapper of original functions from the R package {BAMMtools}:
Step 1: Use
BAMMtools::plot.bammdata()to map rates on the phylogeny.Step 2: Add the location of regime shifts with
BAMMtools::addBAMMshifts()(ifadd_regime_shifts = TRUE).
Usage
plot_BAMM_rates(
BAMM_object,
rate_type = "net_diversification",
method = "phylogram",
add_regime_shifts = TRUE,
configuration_type = "MAP",
sample_index = 1,
adjust_size_to_prob = TRUE,
regimes_fill = "grey",
regimes_size = 1,
regimes_pch = 21,
regimes_border_col = "black",
regimes_border_width = 1,
...,
display_plot = TRUE,
PDF_file_path = NULL
)
Arguments
BAMM_object |
Object of class |
rate_type |
A character string specifying the type of diversification rates to plot. Must be one of 'speciation', 'extinction' or 'net_diversification' (default). |
method |
A character string indicating the method for plotting the phylogenetic tree.
|
add_regime_shifts |
Logical. Whether to add the location of regime shifts on the phylogeny (Step 2). Default is |
configuration_type |
A character string specifying how to select the location of regime shifts across posterior samples.
|
sample_index |
Integer. Index of the posterior samples to use to plot the location of regime shifts.
Used only if |
adjust_size_to_prob |
Logical. Whether to scale the size of the symbols showing the location of regime shifts according to
the marginal shift probability of the shift happening on each location/branch. This will only works if there is an |
regimes_fill |
Character string. Set the color of the background of the symbols showing the location of regime shifts.
Equivalent to the |
regimes_size |
Numerical. Set the size of the symbols showing the location of regime shifts.
Equivalent to the |
regimes_pch |
Integer. Set the shape of the symbols showing the location of regime shifts.
Equivalent to the |
regimes_border_col |
Character string. Set the color of the border of the symbols showing the location of regime shifts.
Equivalent to the |
regimes_border_width |
Numerical. Set the width of the border of the symbols showing the location of regime shifts.
Equivalent to the |
... |
Additional graphical arguments to pass down to |
display_plot |
Logical. Whether to display the plot generated in the R console. Default is |
PDF_file_path |
Character string. If provided, the plot will be saved in a PDF file following the path provided here. The path must end with ".pdf". |
Details
The main input BAMM_object is the typical output of prepare_diversification_data().
It provides information on rates and regimes shifts across the posterior samples of a BAMM.
$MAP_BAMM_object and $MSC_BAMM_object elements are required to plot regime shift locations following the
"MAP" or "MSC" configuration_type respectively.
A $MSP_tree element is required to scale the size of the symbols showing the location of regime shifts according marginal shift probabilities.
(If adjust_size_to_prob = TRUE).
The default option to display regime shift is to use the average locations from the posterior samples with the Maximum A Posteriori probability (MAP) configuration.
However, sometimes, multiple configurations have similarly high frequency in the posterior samples (See BAMMtools::credibleShiftSet().
An alternative is to use the average locations from posterior samples with the Maximum Shift Credibility (MSC) configuration instead.
This regime shift configuration has the highest product of marginal probabilities across branches where a shift is estimated.
It may differ from the MAP configuration. (See BAMMtools::maximumShiftCredibility()).
Value
The function returns (invisibly) a list with three three elements similarly to BAMMtools::plot.bammdata().
-
$coords: A matrix of plot coordinates. Rows correspond to branches. Columns 1-2 are starting (x,y) coordinates of each branch and columns 3-4 are ending (x,y) coordinates of each branch. If method = "polar" a fifth column gives the angle(in radians) of each branch. -
$colorbreaks: A vector of percentiles used to group macroevolutionary rates into color bins. -
$colordens: A matrix of the kernel density estimates (column 2) of evolutionary rates (column 1) and the color (column 3) corresponding to each rate value.
Author(s)
Maël Doré
Original functions by Mike Grundler & Pascal Title in R package {BAMMtools}.
See Also
Initial functions in BAMMtools: BAMMtools::plot.bammdata() BAMMtools::addBAMMshifts()
Associated functions in deepSTRAPP: prepare_diversification_data() update_rates_and_regimes_for_focal_time() run_deepSTRAPP_for_focal_time() run_deepSTRAPP_over_time()
Examples
# Load BAMM output
data(whale_BAMM_object, package = "deepSTRAPP")
## Plot overall mean rates with MAP configuration for regime shifts
# (rates are averaged only all posterior samples)
plot_BAMM_rates(whale_BAMM_object, add_regime_shifts = TRUE,
configuration_type = "MAP", bg = "black",
regimes_size = 3)
## Plot overall mean rates with MSC configuration for regime shifts
# (rates are averaged only all posterior samples)
plot_BAMM_rates(whale_BAMM_object, add_regime_shifts = TRUE,
configuration_type = "MSC", bg = "black",
regimes_size = 3)
## Plot mean MAP rates with regime shifts
# (rates are averaged only across MAP samples)
plot_BAMM_rates(whale_BAMM_object$MAP_BAMM_object, add_regime_shifts = TRUE,
configuration_type = "index",
# Set to index to use the regime shift data from the '$MAP_BAMM_object'
regimes_size = 3, bg = "black")
## Plot mean MSC rates (rates averaged only across MSC samples) with regime shifts
# (rates averaged only across MSC samples)
plot_BAMM_rates(whale_BAMM_object$MSC_BAMM_object, add_regime_shifts = TRUE,
configuration_type = "index",
# Set to index to use the regime shift data from the '$MSC_BAMM_object'
regimes_size = 3, bg = "black")
Plot evolution of p-values of STRAPP tests over time
Description
Plot the evolution of the p-values of STRAPP tests
carried out for across multiple time_steps, obtained from
run_deepSTRAPP_over_time().
By default, return a plot with a single line for p-values of overall tests.
If plot_posthoc_tests = TRUE, it will return a plot with multiple lines, one per pair in post hoc tests
(only for multinominal data, with more than two states).
If a PDF file path is provided in PDF_file_path, the plot will be saved directly in a PDF file.
Usage
plot_STRAPP_pvalues_over_time(
deepSTRAPP_outputs,
time_range = NULL,
pvalues_max = NULL,
alpha = 0.05,
display_plot = TRUE,
plot_significant_time_frame = TRUE,
plot_posthoc_tests = FALSE,
select_posthoc_pairs = "all",
plot_adjusted_pvalues = FALSE,
PDF_file_path = NULL
)
Arguments
deepSTRAPP_outputs |
List of elements generated with |
time_range |
Vector of two numerical values. Time boundaries used for X-axis the plot.
If |
pvalues_max |
Numerical. Set the max boundary used for the Y-axis of the plot.
If |
alpha |
Numerical. Significance level to display as a red dashed line on the plot. If set to |
display_plot |
Logical. Whether to display the plot generated in the R console. Default is |
plot_significant_time_frame |
Logical. Whether to display a green band over the time frame that yields significant results according to the chosen alpha level. Default is |
plot_posthoc_tests |
Logical. For multinominal data only. Whether to plot the p-values for the overall Kruskal-Wallis test across all states ( |
select_posthoc_pairs |
Vector of character strings used to specify the pairs to include in the plot. Names of pairs must match the pairs found in
|
plot_adjusted_pvalues |
Logical. Whether to display the p-values adjusted for multiple testing rather than the raw p-values.
See argument 'p.adjust_method' in |
PDF_file_path |
Character string. If provided, the plot will be saved in a PDF file following the path provided here. The path must end with ".pdf". |
Details
Plots are build based on the p-values recorded in summary_df provided by run_deepSTRAPP_over_time().
For overall tests, those p-values are found in $pvalues_summary_df.
For multinominal data (categorical or biogeographic data with more than 2 states), it is possible to plot p-values of post hoc pairwise tests.
Set plot_posthoc_tests = TRUE to generate plots for the pairwise post hoc Dunn's test across pairs of states.
To achieve this, the deepSTRAPP_outputs input object must contain a $pvalues_summary_df_for_posthoc_pairwise_tests element that summarizes p-values
computed across pairs of states for all post hoc tests. This is obtained from run_deepSTRAPP_over_time() when setting
posthoc_pairwise_tests = TRUE to carry out post hoc tests.
Value
The function returns a list of classes gg and ggplot.
This object is a ggplot that can be displayed on the console with print(output).
It corresponds to the plot being displayed on the console when the function is run, if display_plot = TRUE,
and can be further modify for aesthetics using the ggplot2 grammar.
If a PDF_file_path is provided, the function will also generate a PDF file of the plot.
Author(s)
Maël Doré
See Also
Examples
if (deepSTRAPP::is_dev_version())
{
## Load results of run_deepSTRAPP_over_time() for categorical data with 3-levels
data(Ponerinae_deepSTRAPP_cat_3lvl_old_calib_0_40, package = "deepSTRAPP")
## This dataset is only available in development versions installed from GitHub.
# It is not available in CRAN versions.
# Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.
## Plot results of overall Kruskal-Wallis / Mann-Whitney-Wilcoxon tests across all time-steps
plot_overall <- plot_STRAPP_pvalues_over_time(
deepSTRAPP_outputs = Ponerinae_deepSTRAPP_cat_3lvl_old_calib_0_40,
alpha = 0.10,
time_range = c(0, 30), # Adjust time range if needed
display_plot = FALSE)
# Adjust aesthetics a posteriori
plot_overall <- plot_overall +
ggplot2::theme(
plot.title = ggplot2::element_text(size = 16))
print(plot_overall)
## Plot results of post hoc pairwise Dunn's tests between selected pairs of states
plot_posthoc <- plot_STRAPP_pvalues_over_time(
deepSTRAPP_outputs = Ponerinae_deepSTRAPP_cat_3lvl_old_calib_0_40,
alpha = 0.10,
plot_posthoc_tests = TRUE,
# PDF_file_path = "./pvalues_over_time.pdf",
select_posthoc_pairs = c("arboreal != subterranean",
"arboreal != terricolous"),
display_plot = FALSE)
# Adjust aesthetics a posteriori
plot_posthoc <- plot_posthoc +
ggplot2::theme(
plot.title = ggplot2::element_text(size = 16),
legend.title = ggplot2::element_text(size = 14),
legend.position.inside = c(0.25, 0.25))
print(plot_posthoc)
}
Plot continuous trait evolution on the tree
Description
Plot on a time-calibrated phylogeny the evolution of a continuous trait as
summarized in a contMap object typically generated with prepare_trait_data().
This function is a wrapper of original functions from the R package {phytools}:
Step 1: Use
phytools::setMap()to update the color scale if requested.Step 2: Use
phytools::plot.contMap()to plot the mapped phylogeny.
Usage
plot_contMap(
contMap,
color_scale = NULL,
...,
display_plot = TRUE,
PDF_file_path = NULL
)
Arguments
contMap |
List of class |
color_scale |
Vector of character string. List of colors to use to build the color scale with |
... |
Additional arguments to pass down to |
display_plot |
Logical. Whether to display the plot generated in the R console. Default is |
PDF_file_path |
Character string. If provided, the plot will be saved in a PDF file following the path provided here. The path must end with ".pdf". |
Details
This function is a wrapper of original functions phytools::setMap() and phytools::plot.contMap(). Additions are listed below:
The color scale can be controlled directly with the argument
color_scale.The plot can be exported in PDF using
PDF_file_pathto define the output file.
Value
If display_plot = TRUE, the function plots the time-calibrated phylogeny displaying the evolution of a continuous trait.
If PDF_file_path is provided, the function exports the plot into a PDF file.
An object of class "contMap" with an (optionally) updated color scale ($cols) is returned invisibly.
Author(s)
Maël Doré
Original functions by Liam Revell in R package {phytools}. Contact: liam.revell@umb.edu
See Also
phytools::plot.contMap() plot_densityMaps_overlay()
Examples
# Load phylogeny
data(Ponerinae_trait_tip_data, package = "deepSTRAPP")
# Load trait df
data(Ponerinae_tree, package = "deepSTRAPP")
## Prepare trait data
# (May take several minutes to run)
# Extract continuous trait data as a named vector
Ponerinae_cont_tip_data <- setNames(object = Ponerinae_trait_tip_data$fake_cont_tip_data,
nm = Ponerinae_trait_tip_data$Taxa)
# Get Ancestral Character Estimates based on a Brownian Motion model
# To obtain values at internal nodes
Ponerinae_ACE <- phytools::fastAnc(tree = Ponerinae_tree, x = Ponerinae_cont_tip_data)
# Infer Ancestral Character Estimates based on a Brownian Motion model
# and run a Stochastic Mapping to interpolate values along branches and obtain a "contMap" object
Ponerinae_contMap <- phytools::contMap(Ponerinae_tree, x = Ponerinae_cont_tip_data,
res = 100, # Number of time steps
plot = FALSE)
# Plot contMap with the phytools method
plot(x = Ponerinae_contMap, fsize = c(0.5, 1))
# Plot contMap with an updated color scale
plot_contMap(contMap = Ponerinae_contMap, fsize = c(0.5, 1),
color_scale = c("darkgreen", "limegreen", "orange", "red"))
# PDF_file_path = "Ponerinae_contMap.pdf")
Plot posterior probabilities of states/ranges on phylogeny from densityMaps
Description
Plot on a time-calibrated phylogeny the evolution of a categorical trait/biogeographic ranges
summarized from densityMaps typically generated with prepare_trait_data().
Each branch is colored according to the posterior probability of being in a given state/range.
Color for each state/range are overlaid using transparency to produce a single plot for all states/ranges.
Usage
plot_densityMaps_overlay(
densityMaps,
colors_per_levels = NULL,
add_ACE_pies = TRUE,
cex_pies = 0.5,
ace = NULL,
...,
display_plot = TRUE,
PDF_file_path = NULL
)
Arguments
densityMaps |
List of objects of class |
colors_per_levels |
Named character string. To set the colors to use to map each state/range posterior probabilities. Names = states/ranges; values = colors.
If |
add_ACE_pies |
Logical. Whether to add pies of posterior probabilities of states/ranges at internal nodes on the mapped phylogeny. Default = |
cex_pies |
Numerical. To adjust the size of the ACE pies. Default = |
ace |
Numerical matrix. To provide the posterior probabilities of ancestral states/ranges (characters) estimates (ACE) at internal nodes
used to plot the ACE pies. Rows are internal nodes. Columns are states/ranges. Values are posterior probabilities of each state per node.
Typically generated with |
... |
Additional arguments to pass down to |
display_plot |
Logical. Whether to display the plot generated in the R console. Default is |
PDF_file_path |
Character string. If provided, the plot will be saved in a PDF file following the path provided here. The path must end with ".pdf". |
Value
If display_plot = TRUE, the function plots a time-calibrated phylogeny displaying the evolution of a categorical trait/biogeographic ranges.
If PDF_file_path is provided, the function exports the plot into a PDF file.
Author(s)
Maël Doré
Original functions by Liam Revell in R package {phytools}. Contact: liam.revell@umb.edu
See Also
phytools::plot.densityMap() phytools::plotSimmap()
Examples
# Load phylogeny and tip data
library(phytools)
data(eel.tree)
data(eel.data)
# Transform feeding mode data into a 3-level factor
eel_data <- stats::setNames(eel.data$feed_mode, rownames(eel.data))
eel_data <- as.character(eel_data)
eel_data[c(1, 5, 6, 7, 10, 11, 15, 16, 17, 24, 25, 28, 30, 51, 52, 53, 55, 58, 60)] <- "kiss"
eel_data <- stats::setNames(eel_data, rownames(eel.data))
table(eel_data)
# Manually define a Q_matrix for rate classes of state transition to use in the 'matrix' model
# Does not allow transitions from state 1 ("bite") to state 2 ("kiss") or state 3 ("suction")
# Does not allow transitions from state 3 ("suction") to state 1 ("bite")
# Set symmetrical rates between state 2 ("kiss") and state 3 ("suction")
Q_matrix = rbind(c(NA, 0, 0), c(1, NA, 2), c(0, 2, NA))
# Set colors per states
colors_per_levels <- c("limegreen", "orange", "dodgerblue")
names(colors_per_levels) <- c("bite", "kiss", "suction")
# (May take several minutes to run)
# Run evolutionary models to prepare trait data
eel_cat_3lvl_data <- prepare_trait_data(tip_data = eel_data, phylo = eel.tree,
trait_data_type = "categorical",
colors_per_levels = colors_per_levels,
evolutionary_models = c("ER", "SYM", "ARD", "meristic", "matrix"),
Q_matrix = Q_matrix,
nb_simulations = 1000,
plot_map = TRUE,
plot_overlay = TRUE,
return_best_model_fit = TRUE,
return_model_selection_df = TRUE)
# Load directly output
data(eel_cat_3lvl_data, package = "deepSTRAPP")
# Plot densityMaps one by one
plot(eel_cat_3lvl_data$densityMaps[[1]]) # densityMap for state n°1 ("bite")
plot(eel_cat_3lvl_data$densityMaps[[2]]) # densityMap for state n°1 ("kiss")
plot(eel_cat_3lvl_data$densityMaps[[3]]) # densityMap for state n°1 ("suction")
# Plot overlay of all densityMaps
plot_densityMaps_overlay(densityMaps = eel_cat_3lvl_data$densityMaps)
Plot histogram of STRAPP test statistics to assess results
Description
Plot an histogram of the distribution of the test statistics
obtained from a STRAPP test carried out for a unique focal_time.
(See compute_STRAPP_test_for_focal_time() and
run_deepSTRAPP_for_focal_time()).
Returns a single histogram for overall tests.
If plot_posthoc_tests = TRUE, it will return a faceted plot with an histogram per post hoc tests.
If a PDF file path is provided in PDF_file_path, the plot will be saved directly in a PDF file.
Usage
plot_histogram_STRAPP_test_for_focal_time(
deepSTRAPP_outputs,
focal_time = NULL,
display_plot = TRUE,
plot_posthoc_tests = FALSE,
PDF_file_path = NULL
)
Arguments
deepSTRAPP_outputs |
List of elements generated with |
focal_time |
Numerical. (Optional) If |
display_plot |
Logical. Whether to display the histogram(s) generated in the R console. Default is |
plot_posthoc_tests |
Logical. For multinominal data only. Whether to plot the histogram for the overall Kruskal-Wallis test across all states ( |
PDF_file_path |
Character string. If provided, the plot will be saved in a PDF file following the path provided here. The path must end with ".pdf". |
Details
The main input deepSTRAPP_outputs is the typical output of run_deepSTRAPP_for_focal_time().
It provides information on results of a STRAPP test performed at a given focal_time.
Histograms are built based on the distribution of the test statistics.
Such distributions are recorded in the outputs of a deepSTRAPP run carried out with run_deepSTRAPP_for_focal_time()
when return_perm_data = TRUE so that the distributions of test stats computed across posterior samples are returned
among the outputs under $STRAPP_results$perm_data_df.
For multinominal data (categorical or biogeographic data with more than 2 states), it is possible to plot the histograms of post hoc pairwise tests.
Set plot_posthoc_tests = TRUE to generate histograms for all the pairwise post hoc Dunn's test across pairs of states.
To achieve this, the deepSTRAPP_outputs input object must contain a $STRAPP_results$posthoc_pairwise_tests$perm_data_array element
that summarizes test statistics computed across posterior samples for all pairwise post hoc tests.
This is obtained from run_deepSTRAPP_for_focal_time() when setting both posthoc_pairwise_tests = TRUE to carry out post hoc tests,
and return_perm_data = TRUE to record distributions of test statistics.
Alternatively, the main input deepSTRAPP_outputs can be the output of run_deepSTRAPP_over_time(),
providing results of STRAPP tests over multiple time-steps. In this case, you must provide a focal_time to select the
unique time-step used for plotting.
-
return_perm_datamust be set toTRUEso that the permutation data used to compute the tests are returned among the outputs under$STRAPP_results_over_time[[i]]$perm_data_df. -
posthoc_pairwise_testsmust be set toTRUEso that the permutation data used to performed the post hoc tests are also returned among the outputs under$STRAPP_results_over_time[[i]]$posthoc_pairwise_tests$perm_data_array.
For plotting all time-steps at once, see plot_histograms_STRAPP_tests_over_time().
Value
By default, the function returns a list of classes gg and ggplot.
This object is a ggplot that can be displayed on the console with print(output).
It corresponds to the histogram being displayed on the console when the function is run, if display_plot = TRUE, and can be further
modify for aesthetics using the ggplot2 grammar.
If using multinominal data and set plot_posthoc_tests = TRUE, the function will return a list of objects.
Each object is the ggplot associated with a pairwise post hoc test.
To plot each histogram i individually, use print(output_list[[i]]).
To plot all histograms at once in a multifaceted plot, as displayed on the console if display_plot = TRUE, use cowplot::plot_grid(plotlist = output_list).
Each plot also displays summary statistics for the STRAPP test associated with the data displayed.
The quantile of null statistic distribution at the significant threshold used to define test significance. This is the value found on the red dashed line. The test will be considered significant (i.e., the null hypothesis is rejected) if this value is higher than zero (the black dashed line).
The p-value of the STRAPP test which correspond the proportion of cases in which the statistics was lower than expected under the null hypothesis (i.e., the proportion of the histogram found below / on the left-side of the black dashed line).
If a PDF_file_path is provided, the function will also generate a PDF file of the plot. For post hoc tests, this will save the multifaceted plot.
Author(s)
Maël Doré
See Also
Associated functions in deepSTRAPP: run_deepSTRAPP_for_focal_time() plot_histograms_STRAPP_tests_over_time()
Examples
if (deepSTRAPP::is_dev_version())
{
# ----- Example 1: Continuous trait ----- #
# Load fake trait df
data(Ponerinae_trait_tip_data, package = "deepSTRAPP")
# Load phylogeny with old calibration
data(Ponerinae_tree_old_calib, package = "deepSTRAPP")
# Load the BAMM_object summarizing 1000 posterior samples of BAMM
data(Ponerinae_BAMM_object_old_calib, package = "deepSTRAPP")
## This dataset is only available in development versions installed from GitHub.
# It is not available in CRAN versions.
# Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.
## Prepare trait data
# Extract continuous trait data as a named vector
Ponerinae_cont_tip_data <- setNames(object = Ponerinae_trait_tip_data$fake_cont_tip_data,
nm = Ponerinae_trait_tip_data$Taxa)
# Select a color scheme from lowest to highest values
color_scale = c("darkgreen", "limegreen", "orange", "red")
# Get Ancestral Character Estimates based on a Brownian Motion model
# To obtain values at internal nodes
Ponerinae_ACE <- phytools::fastAnc(tree = Ponerinae_tree_old_calib, x = Ponerinae_cont_tip_data)
# (May take several minutes to run)
# Run a Stochastic Mapping based on a Brownian Motion model
# to interpolate values along branches and obtain a "contMap" object
Ponerinae_contMap <- phytools::contMap(Ponerinae_tree, x = Ponerinae_cont_tip_data,
res = 100, # Number of time steps
plot = FALSE)
# Plot contMap = stochastic mapping of continuous trait
plot_contMap(contMap = Ponerinae_contMap,
color_scale = color_scale)
## Set focal time to 10 Mya
focal_time <- 10
## Run deepSTRAPP on net diversification rates for focal time = 10 Mya.
Ponerinae_deepSTRAPP_cont_old_calib_10My <- run_deepSTRAPP_for_focal_time(
contMap = Ponerinae_contMap,
ace = Ponerinae_ACE,
tip_data = Ponerinae_cont_tip_data,
trait_data_type = "continuous",
BAMM_object = Ponerinae_BAMM_object_old_calib,
focal_time = focal_time,
rate_type = "net_diversification",
return_perm_data = TRUE,
extract_diversification_data_melted_df = TRUE,
return_updated_trait_data_with_Map = TRUE,
return_updated_BAMM_object = TRUE)
## Explore output
str(Ponerinae_deepSTRAPP_cont_old_calib_10My, max.level = 1)
# ----- Plot histogram of STRAPP overall test results from run_deepSTRAPP_for_focal_time() ----- #
histogram_ggplot <- plot_histogram_STRAPP_test_for_focal_time(
deepSTRAPP_outputs = Ponerinae_deepSTRAPP_cont_old_calib_10My,
display_plot = TRUE,
# PDF_file_path = "./plot_STRAPP_histogram_10My.pdf",
plot_posthoc_tests = FALSE)
# Adjust aesthetics a posteriori
histogram_ggplot_adj <- histogram_ggplot +
ggplot2::theme(plot.title = ggplot2::element_text(color = "red", size = 15))
print(histogram_ggplot_adj)
# ----- Plot histogram of STRAPP overall test results from run_deepSTRAPP_over_time() ----- #
## Load directly outputs from run_deepSTRAPP_over_time()
data(Ponerinae_deepSTRAPP_cont_old_calib_0_40, package = "deepSTRAPP")
## This dataset is only available in development versions installed from GitHub.
# It is not available in CRAN versions.
# Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.
# Select focal_time = 10My
focal_time <- 10
## Plot histogram for overall test
histogram_ggplot <- plot_histogram_STRAPP_test_for_focal_time(
deepSTRAPP_outputs = Ponerinae_deepSTRAPP_cont_old_calib_0_40,
focal_time = focal_time,
display_plot = TRUE,
# PDF_file_path = "./plot_STRAPP_histogram_10My.pdf",
plot_posthoc_tests = FALSE)
# Adjust aesthetics a posteriori
histogram_ggplot_adj <- histogram_ggplot +
ggplot2::theme(plot.title = ggplot2::element_text(color = "red", size = 15))
print(histogram_ggplot_adj)
# ----- Example 2: Categorical trait ----- #
## Load data
# Load phylogeny
data(Ponerinae_tree, package = "deepSTRAPP")
# Load trait df
data(Ponerinae_trait_tip_data, package = "deepSTRAPP")
# Load the BAMM_object summarizing 1000 posterior samples of BAMM
data(Ponerinae_BAMM_object_old_calib, package = "deepSTRAPP")
## This dataset is only available in development versions installed from GitHub.
# It is not available in CRAN versions.
# Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.
## Prepare trait data
# Extract categorical data with 3-levels
Ponerinae_cat_3lvl_tip_data <- setNames(object = Ponerinae_trait_tip_data$fake_cat_3lvl_tip_data,
nm = Ponerinae_trait_tip_data$Taxa)
table(Ponerinae_cat_3lvl_tip_data)
# Select color scheme for states
colors_per_states <- c("forestgreen", "sienna", "goldenrod")
names(colors_per_states) <- c("arboreal", "subterranean", "terricolous")
# (May take several minutes to run)
## Produce densityMaps using stochastic character mapping based on an ARD Mk model
Ponerinae_cat_3lvl_data_old_calib <- prepare_trait_data(
tip_data = Ponerinae_cat_3lvl_tip_data,
phylo = Ponerinae_tree_old_calib,
trait_data_type = "categorical",
colors_per_states = colors_per_states,
evolutionary_models = "ARD", # Use default ARD model
nb_simulations = 100, # Reduce number of simulations to save time
seed = 1234, # Seet seed for reproducibility
return_best_model_fit = TRUE,
return_model_selection_df = TRUE,
plot_map = FALSE)
# Load directly output
data(Ponerinae_cat_3lvl_data_old_calib, package = "deepSTRAPP")
## Set focal time to 10 Mya
focal_time <- 10
# (May take several minutes to run)
## Run deepSTRAPP on net diversification rates for focal time = 10 Mya.
Ponerinae_deepSTRAPP_cat_3lvl_old_calib_10My <- run_deepSTRAPP_for_focal_time(
densityMaps = Ponerinae_cat_3lvl_data_old_calib$densityMaps,
ace = Ponerinae_cat_3lvl_data_old_calib$ace,
tip_data = Ponerinae_cat_3lvl_tip_data,
trait_data_type = "categorical",
BAMM_object = Ponerinae_BAMM_object_old_calib,
focal_time = focal_time,
rate_type = "net_diversification",
posthoc_pairwise_tests = TRUE,
return_perm_data = TRUE,
extract_diversification_data_melted_df = TRUE,
return_updated_trait_data_with_Map = TRUE,
return_updated_BAMM_object = TRUE)
## Explore output
str(Ponerinae_deepSTRAPP_cat_3lvl_old_calib_10My, max.level = 1)
# ------ Plot histogram of STRAPP overall test results ------ #
histogram_ggplot <- plot_histogram_STRAPP_test_for_focal_time(
deepSTRAPP_outputs = Ponerinae_deepSTRAPP_cat_3lvl_old_calib_10My,
display_plot = TRUE,
# PDF_file_path = "./plot_STRAPP_histogram_overall_test.pdf",
plot_posthoc_tests = FALSE)
# Adjust aesthetics a posteriori
histogram_ggplot_adj <- histogram_ggplot +
ggplot2::theme(plot.title = ggplot2::element_text(color = "red", size = 15))
print(histogram_ggplot_adj)
# ------ Plot histograms of STRAPP post hoc test results ------ #
histograms_ggplot_list <- plot_histogram_STRAPP_test_for_focal_time(
deepSTRAPP_outputs = Ponerinae_deepSTRAPP_cat_3lvl_old_calib_10My,
display_plot = TRUE,
# PDF_file_path = "./plot_STRAPP_histograms_posthoc_tests.pdf",
plot_posthoc_tests = TRUE)
# Plot all histograms one by one
print(histograms_ggplot_list)
# Plot all histograms on one faceted plot
cowplot::plot_grid(plotlist = histograms_ggplot_list)
}
Plot multiple histograms of STRAPP test statistics over time-steps
Description
Plot an histogram of the distribution of the test statistics
obtained from a deepSTRAPP workflow carried out for each focal time in $time_steps.
Main input = output of a deepSTRAPP run over time using run_deepSTRAPP_over_time()).
Returns one histogram for overall tests for each focal time in $time_steps.
If plot_posthoc_tests = TRUE, it will return one faceted plot with an histogram
per post hoc tests for each focal time in $time_steps.
If a PDF file path is provided in PDF_file_path, the plots will be saved directly in a PDF file,
with one page per focal time in $time_steps.
Usage
plot_histograms_STRAPP_tests_over_time(
deepSTRAPP_outputs,
display_plots = TRUE,
plot_posthoc_tests = FALSE,
PDF_file_path = NULL
)
Arguments
deepSTRAPP_outputs |
List of elements generated with |
display_plots |
Logical. Whether to display the histograms generated in the R console. Default is |
plot_posthoc_tests |
Logical. For multinominal data only. Whether to plot the histograms for the overall Kruskal-Wallis test across all states ( |
PDF_file_path |
Character string. If provided, the plots will be saved in a unique PDF file following the path provided here. The path must end with ".pdf".
Each page of the PDF corresponds to a focal time in |
Details
The main input deepSTRAPP_outputs is the typical output of run_deepSTRAPP_over_time().
It provides information on results of a STRAPP tests performed over multiple time-steps.
Histograms are built based on the distribution of the test statistics.
Such distributions are recorded in the outputs of a deepSTRAPP run carried out with run_deepSTRAPP_over_time()
when return_STRAPP_results = TRUE AND return_perm_data = TRUE. The $STRAPP_results_over_time objects provided within the input are lists that must contain
a $perm_data_df element that summarizes test statistics computed across posterior samples.
For multinominal data (categorical or biogeographic data with more than 2 states), it is possible to plot the histograms of post hoc pairwise tests.
Set plot_posthoc_tests = TRUE to generate histograms for all the pairwise post hoc Dunn's test across pairs of states.
To achieve this, the $STRAPP_results_over_time objects must contain a $posthoc_pairwise_tests$perm_data_array element that summarizes test statistics
computed across posterior samples for all pairwise post hoc tests. This is obtained from run_deepSTRAPP_over_time() when setting
return_STRAPP_results = TRUE to return the STRAPP results, posthoc_pairwise_tests = TRUE to carry out post hoc tests,
and return_perm_data = TRUE to record distributions of test statistics.
Time-steps for which the data do not yield more than two states/ranges will show a warning and generate no plot.
Value
By default, the function returns a list of sub-lists of classes gg and ggplot ordered as in $time_steps.
Each sub-list corresponds to a ggplot for a given focal_time i that can be displayed on the console with print(output[[i]]).
If display_plots = TRUE, the histograms are being displayed on the console one by one while generated.
If using multinominal data and set plot_posthoc_tests = TRUE, the function will return a list of sub-lists of objects ordered as in $time_steps.
Each sub-list is a list of the ggplots associated with pairwise post hoc tests carried out for this a given focal_time.
For a given focal_time i, to plot each histogram j individually, use print(output_list[[i]][[j]]).
To plot all histograms of a given focal_time i at once in a multifaceted plot, as displayed sequentially on the console if display_plots = TRUE,
use cowplot::plot_grid(plotlist = output_list[[i]]).
Each plot also displays summary statistics for the STRAPP test associated with the data displayed.
The quantile of null statistic distribution at the significant threshold used to define test significance. This is the value found on the red dashed line. The test will be considered significant (i.e., the null hypothesis is rejected) if this value is higher than zero (the black dashed line).
The p-value of the STRAPP test which correspond the proportion of cases in which the statistics was lower than expected under the null hypothesis (i.e., the proportion of the histogram found below / on the left-side of the black dashed line).
If a PDF_file_path is provided, the function will also generate a PDF file of the plots with one page per $time_steps.
For post hoc tests, this will save the multifaceted plots.
Author(s)
Maël Doré
See Also
Associated functions in deepSTRAPP: run_deepSTRAPP_over_time() plot_histogram_STRAPP_test_for_focal_time()
Examples
if (deepSTRAPP::is_dev_version())
{
# ----- Example 1: Continuous trait ----- #
# Load fake trait df
data(Ponerinae_trait_tip_data, package = "deepSTRAPP")
# Load phylogeny with old calibration
data(Ponerinae_tree_old_calib, package = "deepSTRAPP")
# Load the BAMM_object summarizing 1000 posterior samples of BAMM
data(Ponerinae_BAMM_object_old_calib, package = "deepSTRAPP")
## This dataset is only available in development versions installed from GitHub.
# It is not available in CRAN versions.
# Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.
## Prepare trait data
# Extract continuous trait data as a named vector
Ponerinae_cont_tip_data <- setNames(object = Ponerinae_trait_tip_data$fake_cont_tip_data,
nm = Ponerinae_trait_tip_data$Taxa)
# Select a color scheme from lowest to highest values
color_scale = c("darkgreen", "limegreen", "orange", "red")
# Get Ancestral Character Estimates based on a Brownian Motion model
# To obtain values at internal nodes
Ponerinae_ACE <- phytools::fastAnc(tree = Ponerinae_tree_old_calib, x = Ponerinae_cont_tip_data)
# (May take several minutes to run)
# Run a Stochastic Mapping based on a Brownian Motion model
# to interpolate values along branches and obtain a "contMap" object
Ponerinae_contMap <- phytools::contMap(Ponerinae_tree, x = Ponerinae_cont_tip_data,
res = 100, # Number of time steps
plot = FALSE)
# Plot contMap = stochastic mapping of continuous trait
plot_contMap(contMap = Ponerinae_contMap,
color_scale = color_scale)
## Set for time steps of 5 My. Will generate deepSTRAPP workflows for 0 to 40 Mya.
# nb_time_steps <- 5
time_step_duration <- 5
time_range <- c(0, 40)
## Run deepSTRAPP on net diversification rates
Ponerinae_deepSTRAPP_cont_old_calib_0_40 <- run_deepSTRAPP_over_time(
contMap = Ponerinae_contMap,
ace = Ponerinae_ACE,
tip_data = Ponerinae_cont_tip_data,
trait_data_type = "continuous",
BAMM_object = Ponerinae_BAMM_object_old_calib,
# nb_time_steps = nb_time_steps,
time_range = time_range,
time_step_duration = time_step_duration,
return_perm_data = TRUE,
extract_trait_data_melted_df = TRUE,
extract_diversification_data_melted_df = TRUE,
return_STRAPP_results = TRUE,
return_updated_trait_data_with_Map = TRUE,
return_updated_BAMM_object = TRUE,
verbose = TRUE,
verbose_extended = TRUE)
## Load directly trait data output
data(Ponerinae_deepSTRAPP_cont_old_calib_0_40, package = "deepSTRAPP")
## This dataset is only available in development versions installed from GitHub.
# It is not available in CRAN versions.
# Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.
## Plot histograms of STRAPP overall test results
# Tests are Spearman's rank correlation tests
# Plot all histograms
histogram_ggplots <- plot_histograms_STRAPP_tests_over_time(
deepSTRAPP_outputs = Ponerinae_deepSTRAPP_cont_old_calib_0_40,
display_plot = TRUE,
# PDF_file_path = "./plot_STRAPP_histogram_overall_test.pdf",
plot_posthoc_tests = FALSE)
# Print histogram for time step 1 = 0 My
print(histogram_ggplots[[1]])
# Adjust aesthetics of plot for time step 1 a posteriori
histogram_ggplot_adj <- histogram_ggplots[[1]] +
ggplot2::theme(plot.title = ggplot2::element_text(color = "red", size = 15))
print(histogram_ggplot_adj)
# ----- Example 2: Categorical data ----- #
## Load data
# Load trait df
data(Ponerinae_trait_tip_data, package = "deepSTRAPP")
# Load phylogeny
data(Ponerinae_tree_old_calib, package = "deepSTRAPP")
# Load the BAMM_object summarizing 1000 posterior samples of BAMM
data(Ponerinae_BAMM_object_old_calib, package = "deepSTRAPP")
## This dataset is only available in development versions installed from GitHub.
# It is not available in CRAN versions.
# Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.
## Prepare trait data
# Extract categorical data with 3-levels
Ponerinae_cat_3lvl_tip_data <- setNames(object = Ponerinae_trait_tip_data$fake_cat_3lvl_tip_data,
nm = Ponerinae_trait_tip_data$Taxa)
table(Ponerinae_cat_3lvl_tip_data)
# Select color scheme for states
colors_per_states <- c("forestgreen", "sienna", "goldenrod")
names(colors_per_states) <- c("arboreal", "subterranean", "terricolous")
# (May take several minutes to run)
## Produce densityMaps using stochastic character mapping based on an ARD Mk model
Ponerinae_cat_3lvl_data_old_calib <- prepare_trait_data(
tip_data = Ponerinae_cat_3lvl_tip_data,
phylo = Ponerinae_tree_old_calib,
trait_data_type = "categorical",
colors_per_levels = colors_per_states,
evolutionary_models = "ARD",
nb_simulations = 100,
return_best_model_fit = TRUE,
return_model_selection_df = TRUE,
plot_map = FALSE)
# Load directly trait data output
data(Ponerinae_cat_3lvl_data_old_calib, package = "deepSTRAPP")
## Set for time steps of 5 My. Will generate deepSTRAPP workflows for 0 to 40 Mya.
# nb_time_steps <- 5
time_step_duration <- 5
time_range <- c(0, 40)
# (May take several minutes to run)
## Run deepSTRAPP on net diversification rates across time-steps.
Ponerinae_deepSTRAPP_cat_3lvl_old_calib_0_40 <- run_deepSTRAPP_over_time(
densityMaps = Ponerinae_cat_3lvl_data_old_calib$densityMaps,
ace = Ponerinae_cat_3lvl_data_old_calib$ace,
tip_data = Ponerinae_cat_3lvl_tip_data,
trait_data_type = "categorical",
BAMM_object = Ponerinae_BAMM_object_old_calib,
# nb_time_steps = nb_time_steps,
time_range = time_range,
time_step_duration = time_step_duration,
rate_type = "net_diversification",
seed = 1234, # Set for reproducibility
alpha = 0.10, # Select a generous level of significance for the sake of the example
posthoc_pairwise_tests = TRUE,
return_perm_data = TRUE,
extract_trait_data_melted_df = TRUE,
extract_diversification_data_melted_df = TRUE,
return_STRAPP_results = TRUE,
return_updated_trait_data_with_Map = TRUE,
return_updated_BAMM_object = TRUE,
verbose = TRUE,
verbose_extended = TRUE)
## Load directly deepSTRAPP output
data(Ponerinae_deepSTRAPP_cat_3lvl_old_calib_0_40, package = "deepSTRAPP")
## This dataset is only available in development versions installed from GitHub.
# It is not available in CRAN versions.
# Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.
## Explore output
str(Ponerinae_deepSTRAPP_cat_3lvl_old_calib_0_40, max.level = 1)
## Plot histograms of STRAPP overall test results #
# Tests are Kruskall-Wallis H tests when more than two states/ranges are present.
# Tests are Mann–Whitney–Wilcoxon rank-sum tests when only two states/ranges are present.
histogram_ggplots <- plot_histograms_STRAPP_tests_over_time(
deepSTRAPP_outputs = Ponerinae_deepSTRAPP_cat_3lvl_old_calib_0_40,
display_plot = TRUE,
# PDF_file_path = "./plot_STRAPP_histograms_overall_tests.pdf",
plot_posthoc_tests = FALSE)
# Print histogram for time step 1 = 0 My
print(histogram_ggplots[[1]])
# Adjust aesthetics of plot for time step 1 a posteriori
histogram_ggplot_adj <- histogram_ggplots[[1]] +
ggplot2::theme(plot.title = ggplot2::element_text(color = "red", size = 15))
print(histogram_ggplot_adj)
## Plot histograms of STRAPP post hoc test results ------ #
# Tests are Dunn's multiple comparison pairwise post hoc tests possible
# only when more than two states/ranges are present.
histograms_ggplots_list <- plot_histograms_STRAPP_tests_over_time(
deepSTRAPP_outputs = Ponerinae_deepSTRAPP_cat_3lvl_old_calib_0_40,
display_plot = TRUE,
# PDF_file_path = "./plot_STRAPP_histograms_posthoc_tests.pdf",
plot_posthoc_tests = TRUE)
# Print all histograms for time step 1 (= 0 My) one by one
print(histograms_ggplots_list[[1]])
# Plot all histograms for time step 1 (= 0 My) on one faceted plot
cowplot::plot_grid(plotlist = histograms_ggplots_list[[1]])
}
Plot evolution of diversification rates in relation to trait values over time
Description
Plot the evolution of diversification rates in relation to trait values
extracted for multiple time_steps with run_deepSTRAPP_over_time().
Rates are averaged across branches at each time step (i.e., focal_time).
For continuous data, branches are grouped by ranges of trait values defined by
quantile_ranges.For categorical data, branches are grouped by trait states.
For biogeographic data, branches are grouped by ranges.
Usage
plot_rates_through_time(
deepSTRAPP_outputs,
rate_type = "net_diversification",
quantile_ranges = c(0, 0.25, 0.5, 0.75, 1),
select_trait_levels = "all",
time_range = NULL,
color_scale = NULL,
colors_per_levels = NULL,
plot_CI = FALSE,
CI_type = "fuzzy",
CI_quantiles = 0.95,
display_plot = TRUE,
PDF_file_path = NULL,
return_mean_data_per_samples_df = FALSE,
return_median_data_across_samples_df = FALSE
)
Arguments
deepSTRAPP_outputs |
List of elements generated with |
rate_type |
A character string specifying the type of diversification rates to use.
Must be one of 'speciation', 'extinction' or 'net_diversification' (default).
Even if the |
quantile_ranges |
Vector of numerical. Only for continuous trait data. Quantiles used as thresholds to group branches
by trait values. It must start with 0 and finish with 1. Default is |
select_trait_levels |
(Vector of) character string. Only for categorical and biogeographic trait data.
To provide a list of a subset of states/ranges to plot. Names must match the ones found in
|
time_range |
Vector of two numerical values. Time boundaries used for the plot.
If |
color_scale |
Vector of character string. List of colors to use to build the color scale with |
colors_per_levels |
Named character string. To set the colors to use to plot rates of each state/range. Names = states/ranges; values = colors.
If |
plot_CI |
Logical. Whether to plot a confidence interval (CI) based on the distribution of rates found in posterior samples. Default is |
CI_type |
Character string. To select the type of confidence interval (CI) to plot.
|
CI_quantiles |
Numerical. Proportion of rate values across posterior samples encompassed by the confidence interval. Only if |
display_plot |
Logical. Whether to display the plot generated in the R console. Default is |
PDF_file_path |
Character string. If provided, the plot will be saved in a PDF file following the path provided here. The path must end with ".pdf". |
return_mean_data_per_samples_df |
Logical. Whether to include in the output the data.frame of mean rates per trait values computed for
each posterior sample at each time-step (aggregated across groups of branches based on trait data). This is used to draw the confidence interval. Default is |
return_median_data_across_samples_df |
Logical. Whether to include in the output the data.frame of median rates per trait values
across posterior samples computed for at each time-step (aggregated across groups of branches based on trait data AND posterior samples).
This is used to draw the lines on the plot. Default is |
Value
The function returns a list with at least one element.
-
rates_TT_ggplotAn object of classesggandggplot. This is a ggplot that can be displayed on the console withprint(output$rates_TT_ggplot). It corresponds to the plot being displayed on the console when the function is run, ifdisplay_plot = TRUE, and can be further modify for aesthetics using the ggplot2 grammar.
Optional summary data frames:
-
mean_data_per_samples_dfA data.frame with four columns providing the$mean_ratesobserved along branches with a similar$trait_value(if categorical or biogeographic) or falling into the same$quantile_ranges. Data are extracted for each posterior sample ($BAMM_sample_ID) at each time-step (i.e.,$focal_time). This is used to draw the confidence interval. Included ifreturn_mean_data_per_samples_df = TRUE. -
$median_data_across_samples_dfA data.frame with three columns providing the$median_ratesobserved across all posterior samples in$mean_data_per_samples_df. This is used to draw the lines on the plot. Included ifreturn_median_data_across_samples_df = TRUE.
If a PDF_file_path is provided, the function will also generate a PDF file of the plot.
Author(s)
Maël Doré
See Also
For a guided tutorial, see this vignette: vignette("plot_rates_through_time", package = "deepSTRAPP")
Examples
# ------ Example 1: Plot rates through time for continuous data ------ #
if (deepSTRAPP::is_dev_version())
{
## Load results of run_deepSTRAPP_over_time()
data(Ponerinae_deepSTRAPP_cont_old_calib_0_40, package = "deepSTRAPP")
## This dataset is only available in development versions installed from GitHub.
# It is not available in CRAN versions.
# Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.
# Visualize trait data
hist(Ponerinae_deepSTRAPP_cont_old_calib_0_40$trait_data_df_over_time$trait_value,
xlab = "Trait values", main = NULL)
# Generate plot
plotTT_continuous <- plot_rates_through_time(
deepSTRAPP_outputs = Ponerinae_deepSTRAPP_cont_old_calib_0_40,
quantile_ranges = c(0, 0.25, 0.5, 0.75, 1.0),
time_range = c(0, 50), # Control range of the X-axis
# color_scale = c("limegreen", "red"),
plot_CI = TRUE,
CI_type = "quantiles_rect",
CI_quantiles = 0.9,
display_plot = FALSE,
# PDF_file_path = "./plotTT_continuous.pdf",
return_mean_data_per_samples_df = TRUE,
return_median_data_across_samples_df = TRUE)
# Explore output
# str(plotTT_continuous, max.level = 1)
# Plot
print(plotTT_continuous$rates_TT_ggplot)
# Adjust aesthetics of plot a posteriori
plotTT_continuous_adj <- plotTT_continuous$rates_TT_ggplot +
ggplot2::theme(
plot.title = ggplot2::element_text(color = "red", size = 15),
axis.title = ggplot2::element_text(size = 14),
axis.text = ggplot2::element_text(size = 12))
# Plot again
print(plotTT_continuous_adj)
}
# ------ Example 2: Plot rates through time for categorical data ------ #
if (deepSTRAPP::is_dev_version())
{
## Load results of run_deepSTRAPP_over_time()
data(Ponerinae_deepSTRAPP_cat_3lvl_old_calib_0_40, package = "deepSTRAPP")
## This dataset is only available in development versions installed from GitHub.
# It is not available in CRAN versions.
# Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.
# Explore trait data
table(Ponerinae_deepSTRAPP_cat_3lvl_old_calib_0_40$trait_data_df_over_time$trait_value)
# Set colors to use
colors_per_states <- c("forestgreen", "sienna", "goldenrod")
names(colors_per_states) <- c("arboreal", "subterranean", "terricolous")
# Generate plot only for "arboreal" and "terricolous"
plotTT_categorical <- plot_rates_through_time(
deepSTRAPP_outputs = Ponerinae_deepSTRAPP_cat_3lvl_old_calib_0_40,
select_trait_levels = c("arboreal", "terricolous"),
time_range = c(0, 50),
colors_per_levels = colors_per_states,
plot_CI = TRUE,
CI_type = "quantiles_rect",
CI_quantiles = 0.9,
display_plot = FALSE,
# PDF_file_path = "./plotTT_categorical.pdf",
return_mean_data_per_samples_df = TRUE,
return_median_data_across_samples_df = TRUE)
# Explore output
# str(plotTT_categorical, max.level = 1)
# Adjust aesthetics of plot a posteriori
plotTT_categorical_adj <- plotTT_categorical$rates_TT_ggplot +
ggplot2::theme(
plot.title = ggplot2::element_text(size = 15),
axis.title = ggplot2::element_text(size = 14),
axis.text = ggplot2::element_text(size = 12))
print(plotTT_categorical_adj)
}
# ------ Example 3: Plot rates through time for biogeographic data ------ #
if (deepSTRAPP::is_dev_version())
{
## Load results of run_deepSTRAPP_over_time()
data(Ponerinae_deepSTRAPP_biogeo_old_calib_0_40, package = "deepSTRAPP")
## This dataset is only available in development versions installed from GitHub.
# It is not available in CRAN versions.
# Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.
# Explore range data
table(Ponerinae_deepSTRAPP_biogeo_old_calib_0_40$trait_data_df_over_time$trait_value)
# Set colors to use
colors_per_ranges <- c("mediumpurple2", "peachpuff2")
names(colors_per_ranges) <- c("N", "O")
plotTT_biogeographic <- plot_rates_through_time(
deepSTRAPP_outputs = Ponerinae_deepSTRAPP_biogeo_old_calib_0_40,
select_trait_levels = "all",
time_range = c(0, 50),
colors_per_levels = colors_per_ranges,
plot_CI = TRUE,
CI_type = "quantiles_rect",
CI_quantiles = 0.9,
display_plot = FALSE,
# PDF_file_path = "./plotTT_biogeographic.pdf",
return_mean_data_per_samples_df = TRUE,
return_median_data_across_samples_df = TRUE)
# Explore output
# str(plotTT_biogeographic, max.level = 1)
# Adjust aesthetics of plot a posteriori
plotTT_biogeographic_adj <- plotTT_biogeographic$rates_TT_ggplot +
ggplot2::theme(
plot.title = ggplot2::element_text(size = 15),
axis.title = ggplot2::element_text(size = 14),
axis.text = ggplot2::element_text(size = 12))
print(plotTT_biogeographic_adj)
}
Plot rates vs. trait data for a given focal time
Description
Plot rates vs. trait data as extracted for a given focal time.
Data are extracted from the output of a deepSTRAPP run carried out with
run_deepSTRAPP_for_focal_time() or
run_deepSTRAPP_over_time()).
Returns a single plot showing rates vs. trait data for a given focal time. If the trait data are 'continuous', the plot is a scatter plot. If the trait data are 'categorical' or 'biogeographic', the plot is a boxplot.
If a PDF file path is provided in PDF_file_path, the plot will be saved directly in a PDF file.
Usage
plot_rates_vs_trait_data_for_focal_time(
deepSTRAPP_outputs,
focal_time = NULL,
rate_type = "net_diversification",
select_trait_levels = "all",
color_scale = NULL,
colors_per_levels = NULL,
display_plot = TRUE,
PDF_file_path = NULL,
return_mean_rates_vs_trait_data_df = FALSE
)
Arguments
deepSTRAPP_outputs |
List of elements generated with |
focal_time |
Numerical. (Optional) If |
rate_type |
A character string specifying the type of diversification rates to plot.
Must be one of 'speciation', 'extinction' or 'net_diversification' (default).
Even if the |
select_trait_levels |
(Vector of) character string. Only for categorical and biogeographic trait data.
To provide a list of a subset of states/ranges to plot. Names must match the ones found in the |
color_scale |
Vector of character string. List of colors to use to build the color scale with |
colors_per_levels |
Named character string. To set the colors to use to plot data points and box for each state/range. Names = states/ranges; values = colors.
If |
display_plot |
Logical. Whether to display the plot generated in the R console. Default is |
PDF_file_path |
Character string. If provided, the plot will be saved in a PDF file following the path provided here. The path must end with ".pdf". |
return_mean_rates_vs_trait_data_df |
Logical. Whether to include in the output the data.frame of mean rates per trait values/states/ranges computed for
each posterior sample at the focal time. Default is |
Details
The main input deepSTRAPP_outputs is the typical output of run_deepSTRAPP_for_focal_time().
It provides information on results of a STRAPP test performed at a given focal_time.
Plots are built based on both trait data and diversification data as extracted for the given focal_time.
Such data are recorded in the outputs of a deepSTRAPP run carried out with run_deepSTRAPP_for_focal_time()
when return_updated_trait_data_with_Map = TRUE for trait data, and extract_diversification_data_melted_df = TRUE for diversification data.
Please ensure to select those arguments when running deepSTRAPP.
Alternatively, the main input deepSTRAPP_outputs can be the output of run_deepSTRAPP_over_time(),
providing results of STRAPP tests over multiple time-steps. In this case, you must provide a focal_time to select the
unique time-step used for plotting.
-
return_updated_trait_data_with_Mapmust be set toTRUEso that the trait data used to compute the tests are returned among the outputs under$updated_trait_data_with_Map_over_time. Alternatively, and more efficiently,extract_trait_data_melted_dfcan be set toTRUEso that trait data are already returned in a melted data.frame among the outputs under$trait_data_df_over_time. -
extract_diversification_data_melted_dfmust be set toTRUEso that the diversification rates are returned among the outputs under$diversification_data_df_over_time.
For plotting all time-steps at once, see plot_rates_vs_trait_data_over_time().
Value
The function returns a list with at least one element.
-
rates_vs_trait_ggplotAn object of classesggandggplot. This is a ggplot that can be displayed on the console withprint(output$rates_vs_trait_ggplot). It corresponds to the plot being displayed on the console when the function is run, ifdisplay_plot = TRUE, and can be further modify for aesthetics using the ggplot2 grammar.
If the trait data are 'continuous', the plot is a scatter plot showing how diversification rates varies with trait values. If the trait data are 'categorical' or 'biogeographic', the plot is a boxplot showing diversification rates per states/ranges.
Each plot also displays summary statistics for the STRAPP test associated with the data displayed:
An observed statistic computed across the mean traits/ranges and rates values shown on the plot. This is not the statistic of the STRAPP test itself, which is conducted across all BAMM posterior samples.
The quantile of null statistic distribution at the significant threshold used to define test significance. The test will be considered significant (i.e., the null hypothesis is rejected) if this value is higher than zero.
The p-value of the associated STRAPP test.
Optional summary data.frame:
-
mean_rates_vs_trait_data_dfA data.frame with three columns providing the$mean_ratesand$trait_valueobserved along branches atfocal_time. Rates are averaged across all BAMM posterior samples. This is the raw data used to draw the plot. Included ifreturn_mean_rates_vs_trait_data_df = TRUE.
If a PDF_file_path is provided, the function will also generate a PDF file of the plot.
Author(s)
Maël Doré
See Also
Associated functions in deepSTRAPP: run_deepSTRAPP_for_focal_time() plot_rates_vs_trait_data_over_time()
Examples
if (deepSTRAPP::is_dev_version())
{
# ----- Example 1: Continuous trait ----- #
# Load fake trait df
data(Ponerinae_trait_tip_data, package = "deepSTRAPP")
# Load phylogeny with old calibration
data(Ponerinae_tree_old_calib, package = "deepSTRAPP")
# Load the BAMM_object summarizing 1000 posterior samples of BAMM
data(Ponerinae_BAMM_object_old_calib, package = "deepSTRAPP")
## This dataset is only available in development versions installed from GitHub.
# It is not available in CRAN versions.
# Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.
## Prepare trait data
# Extract continuous trait data as a named vector
Ponerinae_cont_tip_data <- setNames(object = Ponerinae_trait_tip_data$fake_cont_tip_data,
nm = Ponerinae_trait_tip_data$Taxa)
# Select a color scheme from lowest to highest values
color_scale = c("darkgreen", "limegreen", "orange", "red")
# Get Ancestral Character Estimates based on a Brownian Motion model
# To obtain values at internal nodes
Ponerinae_ACE <- phytools::fastAnc(tree = Ponerinae_tree_old_calib, x = Ponerinae_cont_tip_data)
# (May take several minutes to run)
# Run a Stochastic Mapping based on a Brownian Motion model
# to interpolate values along branches and obtain a "contMap" object
Ponerinae_contMap <- phytools::contMap(Ponerinae_tree_old_calib, x = Ponerinae_cont_tip_data,
res = 100, # Number of time steps
plot = FALSE)
# Plot contMap = stochastic mapping of continuous trait
plot_contMap(contMap = Ponerinae_contMap,
color_scale = color_scale)
## Set focal time to 10 Mya
focal_time <- 10
## Run deepSTRAPP on net diversification rates for focal time = 10 Mya.
Ponerinae_deepSTRAPP_cont_old_calib_10My <- run_deepSTRAPP_for_focal_time(
contMap = Ponerinae_contMap,
ace = Ponerinae_ACE,
tip_data = Ponerinae_cont_tip_data,
trait_data_type = "continuous",
BAMM_object = Ponerinae_BAMM_object_old_calib,
focal_time = focal_time,
rate_type = "net_diversification",
return_perm_data = TRUE,
# Need to be set to TRUE to save diversification data
extract_diversification_data_melted_df = TRUE,
# Need to be set to TRUE to save trait data
return_updated_trait_data_with_Map = TRUE,
return_updated_BAMM_object = TRUE)
## Explore output
str(Ponerinae_deepSTRAPP_cont_old_calib_10My, max.level = 1)
# ----- Plot histogram of STRAPP overall test results from run_deepSTRAPP_for_focal_time() ----- #
# Get plot
rates_vs_trait_output <- plot_rates_vs_trait_data_for_focal_time(
deepSTRAPP_outputs = deepPonerinae_deepSTRAPP_cont_old_calib_10My,
color_scale = c("grey80", "orange"),
display_plot = TRUE,
# PDF_file_path = "./plot_rates_vs_trait_10My.pdf"
return_mean_rates_vs_trait_data_df = TRUE)
# Adjust aesthetics a posteriori
rates_vs_trait_ggplot_adj <- rates_vs_trait_output$rates_vs_trait_ggplot +
ggplot2::theme(plot.title = ggplot2::element_text(color = "red", size = 15))
print(rates_vs_trait_ggplot_adj)
# Explore melted data.frame of mean rates and trait values extracted for the given focal time.
head(rates_vs_trait_output$mean_rates_vs_trait_data_df)
# ----- Plot histogram of STRAPP overall test results from run_deepSTRAPP_over_time() ----- #
## Load directly outputs from run_deepSTRAPP_over_time()
data(Ponerinae_deepSTRAPP_cont_old_calib_0_40, package = "deepSTRAPP")
## This dataset is only available in development versions installed from GitHub.
# It is not available in CRAN versions.
# Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.
# Select focal_time = 10My
focal_time <- 10
# Get plot
rates_vs_trait_output <- plot_rates_vs_trait_data_for_focal_time(
deepSTRAPP_outputs = Ponerinae_deepSTRAPP_cont_old_calib_0_40,
focal_time = focal_time,
color_scale = c("grey80", "purple"),
display_plot = TRUE)
# PDF_file_path = "./plot_rates_vs_trait_10My.pdf"
# Adjust aesthetics a posteriori
rates_vs_trait_ggplot_adj <- rates_vs_trait_output$rates_vs_trait_ggplot +
ggplot2::theme(plot.title = ggplot2::element_text(color = "red", size = 15))
print(rates_vs_trait_ggplot_adj)
# ----- Example 2: Categorical trait ----- #
## Load data
# Load phylogeny
data(Ponerinae_tree, package = "deepSTRAPP")
# Load trait df
data(Ponerinae_trait_tip_data, package = "deepSTRAPP")
# Load the BAMM_object summarizing 1000 posterior samples of BAMM
data(Ponerinae_BAMM_object_old_calib, package = "deepSTRAPP")
## This dataset is only available in development versions installed from GitHub.
# It is not available in CRAN versions.
# Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.
## Prepare trait data
# Extract categorical data with 3-levels
Ponerinae_cat_3lvl_tip_data <- setNames(object = Ponerinae_trait_tip_data$fake_cat_3lvl_tip_data,
nm = Ponerinae_trait_tip_data$Taxa)
table(Ponerinae_cat_3lvl_tip_data)
# Select color scheme for states
colors_per_states <- c("forestgreen", "sienna", "goldenrod")
names(colors_per_states) <- c("arboreal", "subterranean", "terricolous")
# (May take several minutes to run)
## Produce densityMaps using stochastic character mapping based on an ARD Mk model
Ponerinae_cat_3lvl_data_old_calib <- prepare_trait_data(
tip_data = Ponerinae_cat_3lvl_tip_data,
phylo = Ponerinae_tree_old_calib,
trait_data_type = "categorical",
colors_per_states = colors_per_states,
evolutionary_models = "ARD", # Use default ARD model
nb_simulations = 100, # Reduce number of simulations to save time
seed = 1234, # Seet seed for reproducibility
return_best_model_fit = TRUE,
return_model_selection_df = TRUE,
plot_map = FALSE)
# Load directly output
data(Ponerinae_cat_3lvl_data_old_calib, package = "deepSTRAPP")
## Set focal time to 10 Mya
focal_time <- 10
## Run deepSTRAPP on net diversification rates for focal time = 10 Mya.
Ponerinae_deepSTRAPP_cat_3lvl_old_calib_10My <- run_deepSTRAPP_for_focal_time(
densityMaps = Ponerinae_cat_3lvl_data_old_calib$densityMaps,
ace = Ponerinae_cat_3lvl_data_old_calib$ace,
tip_data = Ponerinae_cat_3lvl_tip_data,
trait_data_type = "categorical",
BAMM_object = Ponerinae_BAMM_object_old_calib,
focal_time = focal_time,
rate_type = "net_diversification",
posthoc_pairwise_tests = TRUE,
return_perm_data = TRUE,
# Need to be set to TRUE to save diversification data
extract_diversification_data_melted_df = TRUE,
# Need to be set to TRUE to save trait data
return_updated_trait_data_with_Map = TRUE,
return_updated_BAMM_object = TRUE)
## Explore output
str(Ponerinae_deepSTRAPP_cat_3lvl_old_calib_10My, max.level = 1)
## Plot rates vs. states
rates_vs_trait_output <- plot_rates_vs_trait_data_for_focal_time(
deepSTRAPP_outputs = Ponerinae_deepSTRAPP_cat_3lvl_old_calib_10My,
focal_time = 10,
select_trait_levels = c("arboreal", "terricolous"), # Select only two levels
colors_per_levels = colors_per_states[c("arboreal", "terricolous")], # Adjust colors
display_plot = TRUE,
# PDF_file_path = "./plot_rates_vs_trait_10My.pdf",
return_mean_rates_vs_trait_data_df = TRUE)
# Adjust aesthetics a posteriori
rates_vs_trait_ggplot_adj <- rates_vs_trait_output$rates_vs_trait_ggplot +
ggplot2::theme(plot.title = ggplot2::element_text(color = "red", size = 15))
print(rates_vs_trait_ggplot_adj)
# Explore melted data.frame of mean rates and states extracted for the given focal time.
head(rates_vs_trait_output$mean_rates_vs_trait_data_df)
}
Plot rates vs. trait data over time-steps
Description
Plot rates vs. trait data as extracted for all trait_data_dftime-steps.
Data are extracted from the output of a deepSTRAPP run carried out with
run_deepSTRAPP_over_time()) over multiple time-steps.
For each time-step, returns a plot showing rates vs. trait data. If the trait data are 'continuous', the plot is a scatter plot. If the trait data are 'categorical' or 'biogeographic', the plot is a boxplot.
If a PDF file path is provided in PDF_file_path, all plots will be saved directly in a unique PDF file,
with one page per plot/time-step.
Usage
plot_rates_vs_trait_data_over_time(
deepSTRAPP_outputs,
rate_type = "net_diversification",
select_trait_levels = "all",
color_scale = NULL,
colors_per_levels = NULL,
display_plot = TRUE,
PDF_file_path = NULL,
return_mean_rates_vs_trait_data_df = FALSE
)
Arguments
deepSTRAPP_outputs |
List of elements generated with |
rate_type |
A character string specifying the type of diversification rates to plot.
Must be one of 'speciation', 'extinction' or 'net_diversification' (default).
Even if the |
select_trait_levels |
(Vector of) character string. Only for categorical and biogeographic trait data.
To provide a list of a subset of states/ranges to plot. Names must match the ones found in the |
color_scale |
Vector of character string. List of colors to use to build the color scale with |
colors_per_levels |
Named character string. To set the colors to use to plot data points and box for each state/range. Names = states/ranges; values = colors.
If |
display_plot |
Logical. Whether to display the plot generated in the R console. Default is |
PDF_file_path |
Character string. If provided, the plot will be saved in a PDF file following the path provided here. The path must end with ".pdf". |
return_mean_rates_vs_trait_data_df |
Logical. Whether to include in the output the data.frame of mean rates per trait values/states/ranges computed for
each posterior sample over all time-steps. Default is |
Details
The main input deepSTRAPP_outputs is the typical output of run_deepSTRAPP_over_time().
It provides information on results of a STRAPP test performed over multiple time-steps.
Plots are built based on both trait data and diversification data as extracted for the time-steps.
Such data are recorded in the outputs of a deepSTRAPP run carried out with run_deepSTRAPP_over_time().
-
return_updated_trait_data_with_Mapmust be set toTRUEso that the trait data used to compute the tests are returned among the outputs under$updated_trait_data_with_Map_over_time. Alternatively, and more efficiently,extract_trait_data_melted_dfcan be set toTRUEso that trait data are already returned in a melted data.frame among the outputs under$trait_data_df_over_time. -
extract_diversification_data_melted_dfmust be set toTRUEso that the diversification rates are returned among the outputs under$diversification_data_df_over_time.
For plotting a single focal_time, see plot_rates_vs_trait_data_for_focal_time().
Value
The function returns a list with at least one element.
-
rates_vs_trait_ggplotsA list of objects of classesggandggplotordered as in$time_steps. Each element corresponds to a ggplot for a givenfocal_time. They can be displayed on the console withprint(output$rates_vs_trait_ggplots[[i]]). They correspond to the plots being displayed on the console one by one when the function is run, ifdisplay_plot = TRUE, and can be further modify for aesthetics using the ggplot2 grammar.
If the trait data are 'continuous', the plots are scatter plots showing how diversification rates varies with trait values. If the trait data are 'categorical' or 'biogeographic', the plots are boxplots showing diversification rates per states/ranges.
Each plot also displays summary statistics for the STRAPP test associated with the data displayed:
An observed statistic computed across the mean traits/ranges and rates values shown on the plot. This is not the statistic of the STRAPP test itself, which is conducted across all BAMM posterior samples.
The quantile of null statistic distribution at the significant threshold used to define test significance. The test will be considered significant (i.e., the null hypothesis is rejected) if this value is higher than zero.
The p-value of the associated STRAPP test.
Optional summary data.frame:
-
mean_rates_vs_trait_data_dfA data.frame with three columns providing the$mean_ratesand$trait_valueobserved along branches at the differentfocal_time. Rates are averaged across all BAMM posterior samples. This is the raw data used to draw each plot for eachfocal_time. Included ifreturn_mean_rates_vs_trait_data_df = TRUE.
If a PDF_file_path is provided, the function will also generate a unique PDF file with one plot/page per $time_steps.
Author(s)
Maël Doré
See Also
Associated functions in deepSTRAPP: run_deepSTRAPP_over_time() plot_rates_vs_trait_data_for_focal_time()
Examples
if (deepSTRAPP::is_dev_version())
{
# ----- Example 1: Continuous trait ----- #
# Load fake trait df
data(Ponerinae_trait_tip_data, package = "deepSTRAPP")
# Load phylogeny with old calibration
data(Ponerinae_tree_old_calib, package = "deepSTRAPP")
# Load the BAMM_object summarizing 1000 posterior samples of BAMM
data(Ponerinae_BAMM_object_old_calib, package = "deepSTRAPP")
## This dataset is only available in development versions installed from GitHub.
# It is not available in CRAN versions.
# Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.
## Prepare trait data
# Extract continuous trait data as a named vector
Ponerinae_cont_tip_data <- setNames(object = Ponerinae_trait_tip_data$fake_cont_tip_data,
nm = Ponerinae_trait_tip_data$Taxa)
# Select a color scheme from lowest to highest values
color_scale = c("darkgreen", "limegreen", "orange", "red")
# Get Ancestral Character Estimates based on a Brownian Motion model
# To obtain values at internal nodes
Ponerinae_ACE <- phytools::fastAnc(tree = Ponerinae_tree_old_calib, x = Ponerinae_cont_tip_data)
# (May take several minutes to run)
# Run a Stochastic Mapping based on a Brownian Motion model
# to interpolate values along branches and obtain a "contMap" object
Ponerinae_contMap <- phytools::contMap(Ponerinae_tree_old_calib, x = Ponerinae_cont_tip_data,
res = 100, # Number of time steps
plot = FALSE)
# Plot contMap = stochastic mapping of continuous trait
plot_contMap(contMap = Ponerinae_contMap,
color_scale = color_scale)
## Set for time steps of 5 My. Will generate deepSTRAPP workflows for 0 to 40 Mya.
# nb_time_steps <- 5
time_step_duration <- 5
time_range <- c(0, 40)
## Run deepSTRAPP on net diversification rates
Ponerinae_deepSTRAPP_cont_old_calib_0_40 <- run_deepSTRAPP_over_time(
contMap = Ponerinae_contMap,
ace = Ponerinae_ACE,
tip_data = Ponerinae_cont_tip_data,
trait_data_type = "continuous",
BAMM_object = Ponerinae_BAMM_object_old_calib,
# nb_time_steps = nb_time_steps,
time_range = time_range,
time_step_duration = time_step_duration,
return_perm_data = TRUE,
# Need to be set to TRUE to save trait data
extract_trait_data_melted_df = TRUE,
# Need to be set to TRUE to save diversification data
extract_diversification_data_melted_df = TRUE,
return_STRAPP_results = TRUE,
return_updated_trait_data_with_Map = TRUE,
return_updated_BAMM_object = TRUE,
verbose = TRUE,
verbose_extended = TRUE)
## Load directly trait data output
data(Ponerinae_deepSTRAPP_cont_old_calib_0_40, package = "deepSTRAPP")
## This dataset is only available in development versions installed from GitHub.
# It is not available in CRAN versions.
# Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.
# Explore output
str(Ponerinae_deepSTRAPP_cont_old_calib_0_40, max.level = 1)
## Plot for all time-steps
rates_vs_trait_outputs <- plot_rates_vs_trait_data_over_time(
deepSTRAPP_outputs = Ponerinae_deepSTRAPP_cont_old_calib_0_40,
color_scale = c("grey80", "purple"), # Adjust color scale
display_plot = TRUE,
# PDF_file_path = "./plot_rates_vs_trait_0_40My.pdf",
return_mean_rates_vs_trait_data_df = TRUE)
## Print plot for time step 3 = 10 My
print(rates_vs_trait_outputs$rates_vs_trait_ggplots[[3]])
## Explore melted data.frame of rates and trait data
head(rates_vs_trait_outputs$mean_rates_vs_trait_data_df)
# ----- Example 2: Categorical data ----- #
## Load data
# Load trait df
data(Ponerinae_trait_tip_data, package = "deepSTRAPP")
# Load phylogeny
data(Ponerinae_tree_old_calib, package = "deepSTRAPP")
# Load the BAMM_object summarizing 1000 posterior samples of BAMM
data(Ponerinae_BAMM_object_old_calib, package = "deepSTRAPP")
## This dataset is only available in development versions installed from GitHub.
# It is not available in CRAN versions.
# Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.
## Prepare trait data
# Extract categorical data with 3-levels
Ponerinae_cat_3lvl_tip_data <- setNames(object = Ponerinae_trait_tip_data$fake_cat_3lvl_tip_data,
nm = Ponerinae_trait_tip_data$Taxa)
table(Ponerinae_cat_3lvl_tip_data)
# Select color scheme for states
colors_per_states <- c("forestgreen", "sienna", "goldenrod")
names(colors_per_states) <- c("arboreal", "subterranean", "terricolous")
# (May take several minutes to run)
## Produce densityMaps using stochastic character mapping based on an ARD Mk model
Ponerinae_cat_3lvl_data_old_calib <- prepare_trait_data(
tip_data = Ponerinae_cat_3lvl_tip_data,
phylo = Ponerinae_tree_old_calib,
trait_data_type = "categorical",
colors_per_levels = colors_per_states,
evolutionary_models = "ARD",
nb_simulations = 100,
return_best_model_fit = TRUE,
return_model_selection_df = TRUE,
plot_map = FALSE)
# Load directly trait data output
data(Ponerinae_cat_3lvl_data_old_calib, package = "deepSTRAPP")
## Set for time steps of 5 My. Will generate deepSTRAPP workflows for 0 to 40 Mya.
# nb_time_steps <- 5
time_step_duration <- 5
time_range <- c(0, 40)
# (May take several minutes to run)
## Run deepSTRAPP on net diversification rates across time-steps.
Ponerinae_deepSTRAPP_cat_3lvl_old_calib_0_40 <- run_deepSTRAPP_over_time(
densityMaps = Ponerinae_cat_3lvl_data_old_calib$densityMaps,
ace = Ponerinae_cat_3lvl_data_old_calib$ace,
tip_data = Ponerinae_cat_3lvl_tip_data,
trait_data_type = "categorical",
BAMM_object = Ponerinae_BAMM_object_old_calib,
# nb_time_steps = nb_time_steps,
time_range = time_range,
time_step_duration = time_step_duration,
rate_type = "net_diversification",
seed = 1234, # Set for reproducibility
alpha = 0.10, # Select a generous level of significance for the sake of the example
posthoc_pairwise_tests = TRUE,
return_perm_data = TRUE,
# Need to be set to TRUE to save trait data
extract_trait_data_melted_df = TRUE,
# Need to be set to TRUE to save diversification data
extract_diversification_data_melted_df = TRUE,
return_STRAPP_results = TRUE,
return_updated_trait_data_with_Map = TRUE,
return_updated_BAMM_object = TRUE,
verbose = TRUE,
verbose_extended = TRUE)
## Load directly deepSTRAPP output
data(Ponerinae_deepSTRAPP_cat_3lvl_old_calib_0_40, package = "deepSTRAPP")
## This dataset is only available in development versions installed from GitHub.
# It is not available in CRAN versions.
# Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.
# Explore output
str(Ponerinae_deepSTRAPP_cat_3lvl_old_calib_0_40, max.level = 1)
# Adjust color scheme
colors_per_states <- c("orange", "dodgerblue", "red")
names(colors_per_states) <- c("arboreal", "subterranean", "terricolous")
## Plot for all time-steps
rates_vs_trait_outputs <- plot_rates_vs_trait_data_over_time(
deepSTRAPP_outputs = Ponerinae_deepSTRAPP_cat_3lvl_old_calib_0_40,
colors_per_levels = colors_per_states, # Adjust color scheme
display_plot = TRUE,
# PDF_file_path = "./plot_rates_vs_trait_0_40My.pdf",
return_mean_rates_vs_trait_data_df = TRUE)
## Print plot for time step 3 = 10 My
print(rates_vs_trait_outputs$rates_vs_trait_ggplots[[3]])
## Explore melted data.frame of rates and states
head(rates_vs_trait_outputs$mean_rates_vs_trait_data_df)
}
Plot trait/range evolution vs. diversification rates and regime shifts on phylogeny
Description
Plot two mapped phylogenies with evolutionary data with branches cut off at focal_time.
Left facet: plot the evolution of trait data/geographic ranges on the left time-calibrated phylogeny.
For continuous data: Each branch is colored according to the estimates value of the traits.
For categorical and biogeographic data: Each branch is colored according to the posterior probability of being in a given state/range. Color for each state/range are overlaid using transparency to produce a single plot for all states/ranges.
Right facet: plot the evolution of diversification rates and location of regime shits estimated from a BAMM (Bayesian Analysis of Macroevolutionary Mixtures). Each branch is colored according to the estimated rates of speciation, extinction, or net diversification stored in an object of class
bammdata. Rates can vary along time, thus colors evolved along individual branches.
This function is a wrapper multiple plotting functions:
For continuous traits:
plot_contMap()For categorical and biogeographic data:
plot_densityMaps_overlay()For BAMM rates and regime shifts:
plot_BAMM_rates()
Usage
plot_traits_vs_rates_on_phylogeny_for_focal_time(
deepSTRAPP_outputs,
focal_time = NULL,
color_scale = NULL,
colors_per_levels = NULL,
add_ACE_pies = TRUE,
cex_pies = 0.5,
rate_type = "net_diversification",
keep_initial_colorbreaks = FALSE,
add_regime_shifts = TRUE,
configuration_type = "MAP",
sample_index = 1,
adjust_size_to_prob = TRUE,
regimes_fill = "grey",
regimes_size = 1,
regimes_pch = 21,
regimes_border_col = "black",
regimes_border_width = 1,
...,
display_plot = TRUE,
PDF_file_path = NULL
)
Arguments
deepSTRAPP_outputs |
List of elements generated with |
focal_time |
Numerical. (Optional) If |
color_scale |
Vector of character string. List of colors to use to build the color scale with |
colors_per_levels |
Named character string. To set the colors to use to map each state/range posterior probabilities. Names = states/ranges; values = colors.
If |
add_ACE_pies |
Logical. Whether to add pies of posterior probabilities of states/ranges at internal nodes on the mapped phylogeny. Default = |
cex_pies |
Numerical. To adjust the size of the ACE pies. Default = |
rate_type |
A character string specifying the type of diversification rates to plot. Must be one of 'speciation', 'extinction' or 'net_diversification' (default). |
keep_initial_colorbreaks |
Logical. Whether to keep the same color breaks as used for the most recent focal time. Typically, the current time (t = 0).
This will only works if you provide the output of |
add_regime_shifts |
Logical. Whether to add the location of regime shifts on the phylogeny (Step 2). Default is |
configuration_type |
A character string specifying how to select the location of regime shifts across posterior samples.
|
sample_index |
Integer. Index of the posterior samples to use to plot the location of regime shifts.
Used only if |
adjust_size_to_prob |
Logical. Whether to scale the size of the symbols showing the location of regime shifts according to
the marginal shift probability of the shift happening on each location/branch. This will only works if there is an |
regimes_fill |
Character string. Set the color of the background of the symbols showing the location of regime shifts.
Equivalent to the |
regimes_size |
Numerical. Set the size of the symbols showing the location of regime shifts.
Equivalent to the |
regimes_pch |
Integer. Set the shape of the symbols showing the location of regime shifts.
Equivalent to the |
regimes_border_col |
Character string. Set the color of the border of the symbols showing the location of regime shifts.
Equivalent to the |
regimes_border_width |
Numerical. Set the width of the border of the symbols showing the location of regime shifts.
Equivalent to the |
... |
Additional graphical arguments to pass down to |
display_plot |
Logical. Whether to display the plot generated in the R console. Default is |
PDF_file_path |
Character string. If provided, the plot will be saved in a PDF file following the path provided here. The path must end with ".pdf". |
Details
The main input deepSTRAPP_outputs is the typical output of run_deepSTRAPP_for_focal_time().
It provides information on results of a STRAPP test performed at a given focal_time, and can also encompass
updated phylogenies with mapped trait evolution and diversification rates and regimes shifts if appropriate arguments are set.
-
return_updated_trait_data_with_Mapmust be set toTRUEso that the trait data extracted for the givenfocal_timeand the updated version of mapped phylogeny (contMap/densityMaps) are returned among the outputs under$updated_trait_data_with_Map. The updatedcontMap/densityMapsconsists in cutting off branches and mappings that are younger than thefocal_time. -
return_updated_BAMM_objectmust be set toTRUEso that theupdated_BAMM_objectwith phylogeny and mapped diversification rates cut-off at thefocal_timeare returned among the outputs under$updated_BAMM_object.
$MAP_BAMM_object and $MSC_BAMM_object elements are required in $updated_BAMM_object to plot regime shift locations following the
"MAP" or "MSC" configuration_type respectively.
A $MSP_tree element is required to scale the size of the symbols showing the location of regime shifts according marginal shift probabilities.
(If adjust_size_to_prob = TRUE).
Alternatively, the main input deepSTRAPP_outputs can be the output of run_deepSTRAPP_over_time(),
providing results of STRAPP tests over multiple time-steps. In this case, you must provide a focal_time to select the
unique time-step used for plotting.
-
return_updated_trait_data_with_Mapmust be set toTRUEso that the trait data extracted and the updated version of mapped phylogenies (contMap/densityMaps) are returned among the outputs under$updated_trait_data_with_Map_over_time. -
return_updated_BAMM_objectmust be set toTRUEso that theBAMM_objectswith phylogeny and mapped diversification rates cut-off at the specified time-steps are returned among the outputs under$updated_BAMM_objects_over_time.
For plotting all time-steps at once, see plot_traits_vs_rates_on_phylogeny_over_time().
Value
If display_plot = TRUE, the function displays a plot with two facets in the R console:
(Left) A time-calibrated phylogeny displaying the evolution of trait/biogeographic data.
(Right) A time-calibrated phylogeny displaying diversification rates and regime shifts.
If PDF_file_path is provided, the plot will be exported in a PDF file.
Author(s)
Maël Doré
See Also
phytools::plot.densityMap() plot_densityMaps_overlay() plot_BAMM_rates()
Functions in deepSTRAPP needed to produce the deepSTRAPP_outputs as input: run_deepSTRAPP_for_focal_time() run_deepSTRAPP_over_time()
Function in deepSTRAPP to plot all time-steps at once: plot_traits_vs_rates_on_phylogeny_over_time()
Examples
if (deepSTRAPP::is_dev_version())
{
# ----- Example 1: Continuous trait ----- #
## Load data
# Load trait df
data(Ponerinae_trait_tip_data, package = "deepSTRAPP")
# Load phylogeny with old calibration
data(Ponerinae_tree_old_calib, package = "deepSTRAPP")
# Load the BAMM_object summarizing 1000 posterior samples of BAMM
data(Ponerinae_BAMM_object_old_calib, package = "deepSTRAPP")
## This dataset is only available in development versions installed from GitHub.
# It is not available in CRAN versions.
# Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.
## Prepare trait data
# Extract continuous trait data as a named vector
Ponerinae_cont_tip_data <- setNames(object = Ponerinae_trait_tip_data$fake_cont_tip_data,
nm = Ponerinae_trait_tip_data$Taxa)
# Select a color scheme from lowest to highest values
color_scale = c("darkgreen", "limegreen", "orange", "red")
# Get Ancestral Character Estimates based on a Brownian Motion model
# To obtain values at internal nodes
Ponerinae_ACE <- phytools::fastAnc(tree = Ponerinae_tree_old_calib, x = Ponerinae_cont_tip_data)
# (May take several minutes to run)
# Run a Stochastic Mapping based on a Brownian Motion model
# to interpolate values along branches and obtain a "contMap" object
Ponerinae_contMap <- phytools::contMap(Ponerinae_tree, x = Ponerinae_cont_tip_data,
res = 100, # Number of time steps
plot = FALSE)
# Plot contMap = stochastic mapping of continuous trait
plot_contMap(contMap = Ponerinae_contMap,
color_scale = color_scale)
## Set focal time to 10 Mya
focal_time <- 10
## Run deepSTRAPP on net diversification rates for focal time = 10 Mya.
Ponerinae_deepSTRAPP_cont_old_calib_10My <- run_deepSTRAPP_for_focal_time(
contMap = Ponerinae_contMap,
ace = Ponerinae_ACE,
tip_data = Ponerinae_cont_tip_data,
trait_data_type = "continuous",
BAMM_object = Ponerinae_BAMM_object_old_calib,
focal_time = focal_time,
rate_type = "net_diversification",
return_perm_data = TRUE,
extract_diversification_data_melted_df = TRUE,
return_updated_trait_data_with_Map = TRUE,
return_updated_BAMM_object = TRUE)
## Explore output
str(Ponerinae_deepSTRAPP_cont_old_calib_10My, max.level = 1)
## Plot updated contMap vs. updated diversification rates
plot_traits_vs_rates_on_phylogeny_for_focal_time(
deepSTRAPP_outputs = Ponerinae_deepSTRAPP_cont_old_calib_10My,
keep_initial_colorbreaks = TRUE, # To use the same color breaks as for t = 0 My
color_scale = c("limegreen", "orange", "red"), # Adjust color scale on contMap
legend = TRUE, labels = TRUE, # Show legend and label on BAMM plot
cex = 0.7) # Adjust label size on contMap
# PDF_file_path = "Updated_maps_cont_10My.pdf")
# ----- Example 2: Biogeographic data ----- #
## Load data
# Load phylogeny
data(Ponerinae_tree_old_calib, package = "deepSTRAPP")
# Load trait df
data(Ponerinae_binary_range_table, package = "deepSTRAPP")
# Load the BAMM_object summarizing 1000 posterior samples of BAMM
data(Ponerinae_BAMM_object_old_calib, package = "deepSTRAPP")
## This dataset is only available in development versions installed from GitHub.
# It is not available in CRAN versions.
# Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.
## Prepare range data for Old World vs. New World
# No overlap in ranges
table(Ponerinae_binary_range_table$Old_World, Ponerinae_binary_range_table$New_World)
Ponerinae_NO_data <- stats::setNames(object = Ponerinae_binary_range_table$Old_World,
nm = Ponerinae_binary_range_table$Taxa)
Ponerinae_NO_data <- as.character(Ponerinae_NO_data)
Ponerinae_NO_data[Ponerinae_NO_data == "TRUE"] <- "O" # O = Old World
Ponerinae_NO_data[Ponerinae_NO_data == "FALSE"] <- "N" # N = New World
names(Ponerinae_NO_data) <- Ponerinae_binary_range_table$Taxa
table(Ponerinae_NO_data)
colors_per_levels <- c("mediumpurple2", "peachpuff2")
names(colors_per_levels) <- c("N", "O")
# (May take several minutes to run)
## Run evolutionary models
Ponerinae_biogeo_data <- prepare_trait_data(
tip_data = Ponerinae_NO_data,
trait_data_type = "biogeographic",
phylo = Ponerinae_tree_old_calib,
evolutionary_models = "DEC+J", # Default = "DEC" for biogeographic
BioGeoBEARS_directory_path = tempdir(), # Ex: "./BioGeoBEARS_directory/"
keep_BioGeoBEARS_files = FALSE,
prefix_for_files = "Ponerinae_old_calib",
max_range_size = 2,
split_multi_area_ranges = TRUE, # Set to TRUE to display the two outputs
nb_simulations = 100, # Reduce to save time (Default = '1000')
colors_per_levels = colors_per_levels,
return_model_selection_df = TRUE,
verbose = TRUE)
# Load directly output
data(Ponerinae_biogeo_data_old_calib, package = "deepSTRAPP")
## Explore output
str(Ponerinae_biogeo_data_old_calib, 1)
## Set focal time to 10 Mya
focal_time <- 10
# (May take several minutes to run)
## Run deepSTRAPP on net diversification rates for focal time = 10 Mya.
Ponerinae_deepSTRAPP_biogeo_old_calib_10My <- run_deepSTRAPP_for_focal_time(
densityMaps = Ponerinae_biogeo_data_old_calib$densityMaps,
ace = Ponerinae_biogeo_data_old_calib$ace,
tip_data = Ponerinae_NO_data,
trait_data_type = "biogeographic",
BAMM_object = Ponerinae_BAMM_object_old_calib,
focal_time = focal_time,
rate_type = "net_diversification",
return_perm_data = TRUE,
extract_diversification_data_melted_df = TRUE,
return_updated_trait_data_with_Map = TRUE,
return_updated_BAMM_object = TRUE)
## Explore output
str(Ponerinae_deepSTRAPP_biogeo_old_calib_10My, max.level = 1)
## Plot updated contMap vs. updated diversification rates
plot_traits_vs_rates_on_phylogeny_for_focal_time(
deepSTRAPP_outputs = Ponerinae_deepSTRAPP_biogeo_old_calib_10My,
# Adjust colors on densityMaps
colors_per_levels = c("N" = "dodgerblue2", "O" = "orange"),
# Show legend and label on BAMM plot
legend = TRUE, labels = TRUE,
cex_pies = 0.2, # Adjust size of ACE pies on densityMaps
cex = 0.7) # Adjust size of tip labels on BAMM plot
# PDF_file_path = "Updated_maps_biogeo_old_calib_10My.pdf")
# ----- Example 3: With outputs over_time ----- #
## Load directly outputs from run_deepSTRAPP_over_time()
data(Ponerinae_deepSTRAPP_biogeo_old_calib_0_40, package = "deepSTRAPP")
## This dataset is only available in development versions installed from GitHub.
# It is not available in CRAN versions.
# Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.
## Explore output
str(Ponerinae_deepSTRAPP_biogeo_old_calib_0_40, max.level = 1)
## Plot updated contMap vs. updated diversification rates for focal_time = 40My
plot_traits_vs_rates_on_phylogeny_for_focal_time(
deepSTRAPP_outputs = Ponerinae_deepSTRAPP_biogeo_old_calib_0_40,
focal_time = 40, # Select focal_time = 40My
# Adjust colors on densityMaps
colors_per_levels = c("N" = "dodgerblue2", "O" = "orange"),
# Show legend and label on BAMM plot
legend = TRUE, labels = TRUE,
cex_pies = 0.2, # Adjust size of ACE pies on densityMaps
cex = 0.7) # Adjust size of tip labels on BAMM plot
# PDF_file_path = "Updated_maps_biogeo_old_calib_40My.pdf")
}
Plot multiple mapped phylogenies of trait/range evolution vs. diversification rates and regime shifts over time-steps
Description
Plot two mapped phylogenies with evolutionary data with branches cut off for each time-step.
Main input = output of a deepSTRAPP run over time using run_deepSTRAPP_over_time()).
Left facet: plot the evolution of trait data/geographic ranges on the left time-calibrated phylogeny.
For continuous data: Each branch is colored according to the estimates value of the traits.
For categorical and biogeographic data: Each branch is colored according to the posterior probability of being in a given state/range. Color for each state/range are overlaid using transparency to produce a single plot for all states/ranges.
Right facet: plot the evolution of diversification rates and location of regime shits estimated from a BAMM (Bayesian Analysis of Macroevolutionary Mixtures). Each branch is colored according to the estimated rates of speciation, extinction, or net diversification stored in an object of class
bammdata. Rates can vary along time, thus colors evolved along individual branches.
This function is a wrapper of run_deepSTRAPP_for_focal_time() used to plot mapped phylogenies for
a unique given focal_time.
Returns one plot per focal time in $time_steps.
If a PDF file path is provided in PDF_file_path, the plots will be saved directly in a PDF file,
with one page per focal time in $time_steps.
Usage
plot_traits_vs_rates_on_phylogeny_over_time(
deepSTRAPP_outputs,
color_scale = NULL,
colors_per_levels = NULL,
add_ACE_pies = TRUE,
cex_pies = 0.5,
rate_type = "net_diversification",
keep_initial_colorbreaks = FALSE,
add_regime_shifts = TRUE,
configuration_type = "MAP",
sample_index = 1,
adjust_size_to_prob = TRUE,
regimes_fill = "grey",
regimes_size = 1,
regimes_pch = 21,
regimes_border_col = "black",
regimes_border_width = 1,
...,
display_plot = TRUE,
PDF_file_path = NULL
)
Arguments
deepSTRAPP_outputs |
List of elements generated with |
color_scale |
Vector of character string. List of colors to use to build the color scale with |
colors_per_levels |
Named character string. To set the colors to use to map each state/range posterior probabilities. Names = states/ranges; values = colors.
If |
add_ACE_pies |
Logical. Whether to add pies of posterior probabilities of states/ranges at internal nodes on the mapped phylogeny. Default = |
cex_pies |
Numerical. To adjust the size of the ACE pies. Default = |
rate_type |
A character string specifying the type of diversification rates to plot. Must be one of 'speciation', 'extinction' or 'net_diversification' (default). |
keep_initial_colorbreaks |
Logical. Whether to keep the same color breaks as used for the most recent focal time. Typically, the current time (t = 0). Default = |
add_regime_shifts |
Logical. Whether to add the location of regime shifts on the phylogeny (Step 2). Default is |
configuration_type |
A character string specifying how to select the location of regime shifts across posterior samples.
|
sample_index |
Integer. Index of the posterior samples to use to plot the location of regime shifts.
Used only if |
adjust_size_to_prob |
Logical. Whether to scale the size of the symbols showing the location of regime shifts according to
the marginal shift probability of the shift happening on each location/branch. This will only works if there is an |
regimes_fill |
Character string. Set the color of the background of the symbols showing the location of regime shifts.
Equivalent to the |
regimes_size |
Numerical. Set the size of the symbols showing the location of regime shifts.
Equivalent to the |
regimes_pch |
Integer. Set the shape of the symbols showing the location of regime shifts.
Equivalent to the |
regimes_border_col |
Character string. Set the color of the border of the symbols showing the location of regime shifts.
Equivalent to the |
regimes_border_width |
Numerical. Set the width of the border of the symbols showing the location of regime shifts.
Equivalent to the |
... |
Additional graphical arguments to pass down to |
display_plot |
Logical. Whether to display the plot generated in the R console. Default is |
PDF_file_path |
Character string. If provided, the plot will be saved in a PDF file following the path provided here. The path must end with ".pdf". |
Details
The main input deepSTRAPP_outputs is the typical output of run_deepSTRAPP_over_time().
It provides information on results of a STRAPP run performed over multiple time-steps, and can also encompass
updated phylogenies with mapped trait evolution and diversification rates and regimes shifts if appropriate arguments are set.
-
return_updated_trait_data_with_Mapmust be set toTRUEso that the trait data extracted for the given$time_stepsand the list of updated versions of mapped phylogenies (contMap/densityMaps) is returned among the outputs under$updated_trait_data_with_Map_over_time. The updatedcontMap/densityMapsconsist in cutting off branches and mappings that are younger than the successive$time_steps. -
return_updated_BAMM_objectmust be set toTRUEso that the list ofupdated_BAMM_objectwith phylogenies and mapped diversification rates cut-off at the successive$time_stepsare returned among the outputs under$updated_BAMM_objects_over_time.
$MAP_BAMM_object and $MSC_BAMM_object elements are required in the elements of $updated_BAMM_objects_over_time to plot regime shift locations
following the "MAP" or "MSC" configuration_type respectively.
A $MSP_tree element is required to scale the size of the symbols showing the location of regime shifts according marginal shift probabilities.
(If adjust_size_to_prob = TRUE).
Alternatively, the main input deepSTRAPP_outputs can be the output of run_deepSTRAPP_over_time(),
providing results of STRAPP tests over multiple time-steps. In this case, you must provide a focal_time to select the
unique time-step used for plotting.
-
return_updated_trait_data_with_Mapmust be set toTRUEso that the trait data extracted and the updated version of mapped phylogenies (contMap/densityMaps) are returned among the outputs under$updated_trait_data_with_Map_over_time. -
return_updated_BAMM_objectmust be set toTRUEso that theBAMM_objectswith phylogeny and mapped diversification rates cut-off at the specified time-steps are returned among the outputs under$updated_BAMM_objects_over_time.
For plotting a single time-step (i.e., $focal_time) at once, see plot_traits_vs_rates_on_phylogeny_for_focal_time().
Value
If display_plot = TRUE, the function displays the successive plots with two facets in the R console:
(Left) A time-calibrated phylogeny displaying the evolution of trait/biogeographic data.
(Right) A time-calibrated phylogeny displaying diversification rates and regime shifts. A plot per time-steps is generated and displayed successively in the console.
If PDF_file_path is provided, the plots will be exported in a unique PDF file with one page per time-step.
Author(s)
Maël Doré
See Also
plot_contMap() phytools::plot.densityMap() plot_densityMaps_overlay() plot_BAMM_rates()
Function in deepSTRAPP needed to produce the deepSTRAPP_outputs as input: run_deepSTRAPP_over_time()
Function in deepSTRAPP to a single time-step at once: plot_traits_vs_rates_on_phylogeny_for_focal_time()
Examples
if (deepSTRAPP::is_dev_version())
{
# ----- Example 1: Continuous trait ----- #
## Load data
# Load trait df
data(Ponerinae_trait_tip_data, package = "deepSTRAPP")
# Load phylogeny with old calibration
data(Ponerinae_tree_old_calib, package = "deepSTRAPP")
# Load the BAMM_object summarizing 1000 posterior samples of BAMM
data(Ponerinae_BAMM_object_old_calib, package = "deepSTRAPP")
## This dataset is only available in development versions installed from GitHub.
# It is not available in CRAN versions.
# Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.
## Prepare trait data
# Extract continuous trait data as a named vector
Ponerinae_cont_tip_data <- setNames(object = Ponerinae_trait_tip_data$fake_cont_tip_data,
nm = Ponerinae_trait_tip_data$Taxa)
# Select a color scheme from lowest to highest values
color_scale = c("darkgreen", "limegreen", "orange", "red")
# Get Ancestral Character Estimates based on a Brownian Motion model
# To obtain values at internal nodes
Ponerinae_ACE <- phytools::fastAnc(tree = Ponerinae_tree_old_calib, x = Ponerinae_cont_tip_data)
# (May take several minutes to run)
# Run a Stochastic Mapping based on a Brownian Motion model
# to interpolate values along branches and obtain a "contMap" object
Ponerinae_contMap <- phytools::contMap(Ponerinae_tree, x = Ponerinae_cont_tip_data,
res = 100, # Number of time steps
plot = FALSE)
# Plot contMap = stochastic mapping of continuous trait
plot_contMap(contMap = Ponerinae_contMap,
color_scale = color_scale)
## Set for time steps of 5 My. Will generate deepSTRAPP workflows for 0 to 40 Mya.
# nb_time_steps <- 5
time_step_duration <- 5
time_range <- c(0, 40)
## Run deepSTRAPP on net diversification rates
Ponerinae_deepSTRAPP_cont_old_calib_0_40 <- run_deepSTRAPP_over_time(
contMap = Ponerinae_contMap,
ace = Ponerinae_ACE,
tip_data = Ponerinae_cont_tip_data,
trait_data_type = "continuous",
BAMM_object = Ponerinae_BAMM_object_old_calib,
# nb_time_steps = nb_time_steps,
time_range = time_range,
time_step_duration = time_step_duration,
return_perm_data = TRUE,
extract_trait_data_melted_df = TRUE,
extract_diversification_data_melted_df = TRUE,
return_STRAPP_results = TRUE,
return_updated_trait_data_with_Map = TRUE,
return_updated_BAMM_object = TRUE,
verbose = TRUE,
verbose_extended = TRUE)
## Load directly trait data output
data(Ponerinae_deepSTRAPP_cont_old_calib_0_40, package = "deepSTRAPP")
## This dataset is only available in development versions installed from GitHub.
# It is not available in CRAN versions.
# Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.
## Plot updated contMap vs. updated diversification rates
plot_traits_vs_rates_on_phylogeny_over_time(
deepSTRAPP_outputs = Ponerinae_deepSTRAPP_cont_old_calib_0_40,
# To use the same color breaks as for t = 0 My in all BAMM plots
keep_initial_colorbreaks = TRUE,
color_scale = c("limegreen", "orange", "red"), # Adjust color scale on contMap
legend = TRUE, labels = TRUE, # Show legend and label on BAMM plot
cex = 0.7) # Adjust label size on contMap
# PDF_file_path = "Updated_maps_cont_old_calib_0_40My.pdf")
# ----- Example 2: Biogeographic data ----- #
## Load data
# Load phylogeny
data(Ponerinae_tree_old_calib, package = "deepSTRAPP")
# Load trait df
data(Ponerinae_binary_range_table, package = "deepSTRAPP")
# Load the BAMM_object summarizing 1000 posterior samples of BAMM
data(Ponerinae_BAMM_object_old_calib, package = "deepSTRAPP")
## This dataset is only available in development versions installed from GitHub.
# It is not available in CRAN versions.
# Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.
## Prepare range data for Old World vs. New World
# No overlap in ranges
table(Ponerinae_binary_range_table$Old_World, Ponerinae_binary_range_table$New_World)
Ponerinae_NO_data <- stats::setNames(object = Ponerinae_binary_range_table$Old_World,
nm = Ponerinae_binary_range_table$Taxa)
Ponerinae_NO_data <- as.character(Ponerinae_NO_data)
Ponerinae_NO_data[Ponerinae_NO_data == "TRUE"] <- "O" # O = Old World
Ponerinae_NO_data[Ponerinae_NO_data == "FALSE"] <- "N" # N = New World
names(Ponerinae_NO_data) <- Ponerinae_binary_range_table$Taxa
table(Ponerinae_NO_data)
colors_per_levels <- c("mediumpurple2", "peachpuff2")
names(colors_per_levels) <- c("N", "O")
# (May take several minutes to run)
## Run evolutionary models
Ponerinae_biogeo_data <- prepare_trait_data(
tip_data = Ponerinae_NO_data,
trait_data_type = "biogeographic",
phylo = Ponerinae_tree_old_calib,
evolutionary_models = "DEC+J", # Default = "DEC" for biogeographic
BioGeoBEARS_directory_path = tempdir(), # Ex: "./BioGeoBEARS_directory/"
keep_BioGeoBEARS_files = FALSE,
prefix_for_files = "Ponerinae_old_calib",
max_range_size = 2,
split_multi_area_ranges = TRUE, # Set to TRUE to use only unique areas "A" and "B"
nb_simulations = 100, # Reduce to save time (Default = '1000')
colors_per_levels = colors_per_levels,
return_model_selection_df = TRUE,
verbose = TRUE)
# Load directly output
data(Ponerinae_biogeo_data_old_calib, package = "deepSTRAPP")
## Set for time steps of 5 My. Will generate deepSTRAPP workflows from 0 to 40 Mya.
time_range <- c(0, 40)
time_step_duration <- 5
# (May take several minutes to run)
## Run deepSTRAPP on net diversification rates for time-steps = 0 to 40 Mya.
Ponerinae_deepSTRAPP_biogeo_old_calib_0_40 <- run_deepSTRAPP_over_time(
densityMaps = Ponerinae_biogeo_data_old_calib$densityMaps,
ace = Ponerinae_biogeo_data_old_calib$ace,
tip_data = Ponerinae_ON_tip_data,
trait_data_type = "biogeographic",
BAMM_object = Ponerinae_BAMM_object_old_calib,
time_range = time_range,
time_step_duration = time_step_duration,
seed = 1234, # Set seed for reproducibility
alpha = 0.10, # Select a generous level of significance for the sake of the example
rate_type = "net_diversification",
return_perm_data = TRUE,
extract_trait_data_melted_df = TRUE,
extract_diversification_data_melted_df = TRUE,
return_STRAPP_results = TRUE,
return_updated_trait_data_with_Map = TRUE,
return_updated_BAMM_object = TRUE,
verbose = TRUE,
verbose_extended = TRUE)
## Load directly output
data(Ponerinae_deepSTRAPP_biogeo_old_calib_0_40, package = "deepSTRAPP")
## This dataset is only available in development versions installed from GitHub.
# It is not available in CRAN versions.
# Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.
## Explore output
str(Ponerinae_deepSTRAPP_biogeo_old_calib_0_40, max.level = 1)
## Plot updated contMap vs. updated diversification rates
plot_traits_vs_rates_on_phylogeny_over_time(
deepSTRAPP_outputs = Ponerinae_deepSTRAPP_biogeo_old_calib_0_40,
# Adjust colors on densityMaps
colors_per_levels = c("N" = "dodgerblue2", "O" = "orange"),
legend = TRUE, labels = TRUE, # Show legend and label on BAMM plot
cex_pies = 0.2, # Adjust size of ACE pies on densityMaps
cex = 0.7) # Adjust label size on contMap
# PDF_file_path = "Updated_maps_biogeo_old_calib_0_40My.pdf")
}
Run a full BAMM (Bayesian Analysis of Macroevolutionary Mixtures) workflow
Description
Run a full BAMM (Bayesian Analysis of Macroevolutionary Mixtures) workflow
to produce a BAMM_object that contains a phylogenetic tree and associated diversification rates
mapped along branches, across selected posterior samples:
Step 1: Set BAMM - Record BAMM settings and generate all input files needed for BAMM.
Step 2: Run BAMM - Run BAMM and move output files in dedicated directory.
Step 3: Evaluate BAMM - Produce evaluation plots and ESS data.
Step 4: Import BAMM outputs - Load
BAMM_objectin R and subset posterior samples.Step 5: Clean BAMM files - Remove files generated during the BAMM run.
The BAMM_object output is typically used as input to run deepSTRAPP with run_deepSTRAPP_for_focal_time()
or run_deepSTRAPP_over_time(). Diversification rates and regimes shift can be visualized with plot_BAMM_rates().
BAMM is a model of diversification for time-calibrated phylogenies that explores complex diversification dynamics by allowing multiple regime shifts across clades without a priori hypotheses on the location of such shifts. It uses reversible jump Markov chain Monte Carlo (rjMCMC) to automatically explore a vast range of models with different speciation and extinction rates, and different number and location of regime shits.
This function will work only if you have the BAMM C++ program installed in your machine.
See the BAMM website: http://bamm-project.org/ and the companion R package {BAMMtools}.
Usage
prepare_diversification_data(
BAMM_install_directory_path,
phylo,
prefix_for_files = NULL,
seed = NULL,
numberOfGenerations = 10^7,
globalSamplingFraction = 1,
sampleProbsFilename = NULL,
expectedNumberOfShifts = NULL,
eventDataWriteFreq = NULL,
burn_in = 0.25,
nb_posterior_samples = 1000,
additional_BAMM_settings = list(),
BAMM_output_directory_path = NULL,
keep_BAMM_outputs = TRUE,
MAP_odd_ratio_threshold = 5,
skip_evaluations = FALSE,
plot_evaluations = TRUE,
save_evaluations = TRUE
)
Arguments
BAMM_install_directory_path |
Character string. The path to the directory where BAMM is. Use '/' to separate directory and sub-directories. The path must end with '/'. |
phylo |
Time-calibrated phylogeny. Object of class |
prefix_for_files |
Character string. Prefix to add to all BAMM files stored in the |
seed |
Integer. Set the seed to ensure reproducibility. Default is |
numberOfGenerations |
Integer. Number of steps in the MCMC run. It should be set high enough to reach the equilibrium distribution
and allows posterior samples to be uncorrelated. Check the Effective Sample Size of parameters with coda::effectiveSize() in the Evaluation step.
Default value is |
globalSamplingFraction |
Numerical. Global sampling fraction representing the overall proportion of terminals in the phylogeny compared to
the estimated overall richness in the clade. It acts as a multiplier on the rates needed to achieve such extant diversity.
Default is |
sampleProbsFilename |
Character string. The path to the |
expectedNumberOfShifts |
Integer. Set the expected number of regime shifts. It acts as an hyperparameter controlling the exponential prior distribution
used to modulate reversible jumps across model configurations in the rjMCMC run.
If set to |
eventDataWriteFreq |
Integer. Set the frequency in which to write the event data to the output file = the sampling frequency of posterior samples.
If set to |
burn_in |
Numerical. Proportion of posterior samples removed from the BAMM output to ensure that the remaining samples where drawn once the equilibrium distribution was reached.
This can be evaluated looking at the MCMC trace (see Evaluation step). Default is |
nb_posterior_samples |
Numerical. Number of posterior samples to extract, after removing the burn-in, in the final |
additional_BAMM_settings |
List of named elements. Additional settings options for BAMM provided as a list of named arguments.
Ex: |
BAMM_output_directory_path |
Character string. The path to the directory used to store input/output files generated. Use '/' to separate directory and subdirectories. It must end with '/'. |
keep_BAMM_outputs |
Logical. Whether the |
MAP_odd_ratio_threshold |
Numerical. Controls the definition of 'core-shifts' used to distinguish across configurations when fetching the MAP samples.
Shifts that have an odd-ratio of marginal posterior probability / prior lower than |
skip_evaluations |
Logical. Whether to skip the Evaluation step including MCMC trace, ESS, and prior/posterior comparisons for expected number of shifts. Default = |
plot_evaluations |
Logical. Whether to display the plots generated during the Evaluation step: MCMC trace, and prior/posterior comparisons for expected number of shifts. Default = |
save_evaluations |
Logical. Whether to save the outputs of evaluations in a table (ESS), and PDFs (MCMC trace, and prior/posterior comparisons for expected number of shifts)
in the |
Details
This function runs a full BAMM (Bayesian Analysis of Macroevolutionary Mixtures) workflow
to produce a BAMM_object that contains a phylogenetic tree and associated diversification rates
mapped along branches, across selected posterior samples.
Step 1: Set BAMM
Produces a tree file for the phylogeny. Default file: 'phylogeny.tree'.
Save configuration settings used for the BAMM run. Default file: 'config_file.txt'.
Save default priors generated by BAMMtools::setBAMMpriors based on the phylogeny. Default file: 'priors.txt'.
Step 2: Run BAMM
Run BAMM using the system console
Move output files in dedicated
BAMM_output_directory. Default directory is./BAMM_outputs/.'run_info.txt' containing a summary of your parameters/settings.
'mcmc_log.txt' containing raw MCMC information useful in diagnosing convergence.
'event_data.txt' containing all evolutionary rate parameters and their topological mappings.
'chain_swap.txt' containing data about each chain swap proposal (when a proposal occurred, which chains might be swapped, and whether the swap was accepted).
'acceptance_info.txt' containing the history of acceptance/proposal of MCMC steps (If additional setting
outputAcceptanceInfois set to 1).
Step 3: Evaluate BAMM
Plot the MCMC trace = evolution of logLik across MCMC generations. Output file = 'MCMC_trace_logLik.pdf'.
Compute the Effective Sample Size (ESS) across posterior samples (after removing burn-in) using
coda::effectiveSize(). This is a way to evaluate if your MCMC runs has enough generations to produce robust estimates. Ideally, ESS should be higher than 200. Output file = 'ESS_df.csv'.Plot the comparison of prior and posterior distributions of the number of regime shifts with BAMMtools::plotPrior. Output file = 'PP_nb_shifts_plot.pdf'. A good value for
expectedNumberOfShiftsis one with high similarities between the distributions hinting that the information in the data coincides with your expectations for the number of regime shifts. The best practice consists in trying several values to control if it affects or not the final output.
Step 4: Import BAMM outputs
Load BAMM outputs with BAMMtools::getEventData.
Subset posterior samples to the requested
nb_posterior_sampleswith BAMMtools::subsetEventData.Record the
$expectedNumberOfShiftsused to set the prior. This is useful for downstream analyses involving comparison of prior vs. posterior probabilities (SeeBAMMtools::distinctShiftConfigurations()).Record the marginal posterior probability of regime shift along branches based on the proportion of samples harboring a regime shift along each branch. (See
BAMMtools::marginalShiftProbsTree()). Result is stored in$MSP_treeas phylogenetic tree with$edge.lengthscaled to the marginal posterior probability.Extract the Maximum A Posteriori probability (MAP) configuration = the configuration of regime shift location found the most frequently among the posterior samples. (See
BAMMtools::getBestShiftConfiguration()). This ignores shifts that have an odd-ratio of marginal posterior probability / prior lower thanMAP_odd_ratio_thresholdto avoid noise from non-core shifts. MAP sample indices are stored in$MAP_indices. Diversification rates and shift locations on branches are then averaged across all MAP samples and recorded as an object of class"bammdata"in$MAP_BAMM_objectwith a single$eventDatatable used to plot regime shifts on the phylogeny withplot_BAMM_rates().Extract the Maximum Shift Credibility (MSC) configuration = the configuration of regime shift location with the highest product of marginal probabilities across branches. (See
BAMMtools::maximumShiftCredibility()). MSC sample indices are stored in$MSC_indices. Diversification rates and shift locations on branches are then averaged across all MSC samples and recorded as an object of class"bammdata"in$MSC_BAMM_objectwith a single$eventDatatable used to plot regime shifts on the phylogeny withplot_BAMM_rates().
Step 5: Clean BAMM files
Remove files generated in Steps 1 & 2 if
keep_BAMM_outputs = FALSE.Delete the
BAMM_output_directoryif empty after cleaning files.
The BAMM_object output:
is typically used as input to run deepSTRAPP with
run_deepSTRAPP_for_focal_time()orrun_deepSTRAPP_over_time().can be used to extract rates and regimes for any
focal_timein the past withupdate_rates_and_regimes_for_focal_time().can be used to map diversification rates and regime shifts on the phylogeny with
plot_BAMM_rates().
Value
The function returns a BAMM_object of class "bammdata" which is a list with at least 22 elements.
Phylogeny-related elements used to plot a phylogeny with ape::plot.phylo():
-
$edgeMatrix of integers. Defines the tree topology by providing rootward and tipward node ID of each edge. -
$NnodeInteger. Number of internal nodes. -
$tip.labelVector of character strings. Labels of all tips. -
$edge.lengthVector of numerical. Length of edges/branches. -
$node.labelVector of character strings. Labels of all internal nodes. (Present only if present in the initialBAMM_object)
BAMM internal elements used for tree exploration:
-
$beginVector of numerical. Absolute time since root of edge/branch start (rootward). -
$endVector of numerical. Absolute time since root of edge/branch end (tipward). -
$downseqVector of integers. Order of node visits when using a pre-order tree traversal. -
$lastvisitID of the last node visited when starting from the node in the corresponding position in$downseq.
BAMM elements summarizing diversification data:
-
$numberEventsVector of integer. Number of events/macroevolutionary regimes (k+1) recorded in each posterior configuration. k = number of shifts. -
$eventDataList of data.frames. One per posterior sample. Records shift events and macroevolutionary regimes parameters. 1st line = Background root regime. -
$eventVectorsList of integer vectors. One per posterior sample. Record regime ID per branches. -
$tipStatesList of named integer vectors. One per posterior sample. Record regime ID per tips. -
$tipLambdaList of named numerical vectors. One per posterior sample. Record speciation rates per tips. -
$tipMuList of named numerical vectors. One per posterior sample. Record extinction rates per tips. -
$eventBranchSegsList of matrix of numerical. One per posterior sample. Record regime ID per segments of branches. -
$meanTipLambdaVector of named numerical. Mean tip speciation rates across all posterior configurations of tips. -
$meanTipMuVector of named numerical. Mean tip extinction rates across all posterior configurations of tips. -
$typeCharacter string. Set the type of data modeled with BAMM. Should be "diversification".
Additional elements providing key information for downstream analyses:
-
$expectedNumberOfShiftsInteger. The expected number of regime shifts used to set the prior in BAMM. -
$MSP_treeObject of classphylo. List of 4 elements duplicating information from the Phylogeny-related elements above, except$MSP_tree$edge.lengthis recording the Marginal Shift Probability of each branch (i.e., the probability of a regime shift to occur along each branch) -
$MAP_indicesVector of integers. The indices of the Maximum A Posteriori probability (MAP) configurations among the posterior samples. -
$MAP_BAMM_object. List of 18 elements of class '"bammdata" recording the mean rates and regime shift locations found across the Maximum A Posteriori probability (MAP) configurations. All BAMM elements summarizing diversification data holds a single entry describing this mean diversification history. -
$MSC_indicesVector of integers. The indices of the Maximum Shift Credibility (MSC) configurations among the posterior samples. -
$MSC_BAMM_objectList of 18 elements of class '"bammdata" recording the mean rates and regime shift locations found across the Maximum Shift Credibility (MSC) configurations. All BAMM elements summarizing diversification data holds a single entry describing this mean diversification history.
The function also produces files listed in the Details section and stored in the the BAMM_output_directory.
Note on diversification models for time-calibrated phylogenies
This function relies on BAMM to provide a reliable solution to map diversification rates and regime shifts on a time-calibrated phylogeny
and obtain the BAMM_object object needed to run the deepSTRAPP workflow (run_deepSTRAPP_for_focal_time, run_deepSTRAPP_over_time).
However, it is one option among others for modeling diversification on phylogenies.
You may wish to explore alternatives models such as LSBDS model in RevBayes (Höhna et al., 2016), the MTBD model (Barido-Sottani et al., 2020),
or the ClaDS2 model (Maliet et al., 2019) for your own data.
However, you will need Bayesian models that infer regime shifts to be able to perform STRAPP tests (Rabosky & Huang, 2016).
Additionally, you need to format the model output such as in BAMM_object, so it can be used in a deepSTRAPP workflow.
This function perform a single BAMM run to infer diversification rates and regime shifts. Due to the stochastic nature of the exploration of the parameter space with MCMC process, best practice recommend to ran multiple runs and check for convergence of the MCMC traces, ensuring that the region of high probability has been reached by your MCMC runs.
Author(s)
Maël Doré
References
For BAMM: Rabosky, D. L. (2014). Automatic detection of key innovations, rate shifts, and diversity-dependence on phylogenetic trees. PloS one, 9(2), e89543. doi:10.1371/journal.pone.0089543. Website: http://bamm-project.org/.
For {BAMMtools}: Rabosky, D. L., Grundler, M., Anderson, C., Title, P., Shi, J. J., Brown, J. W., ... & Larson, J. G. (2014).
BAMM tools: an R package for the analysis of evolutionary dynamics on phylogenetic trees. Methods in Ecology and Evolution, 5(7), 701-707.
doi:10.1111/2041-210X.12199
See Also
run_deepSTRAPP_for_focal_time() run_deepSTRAPP_over_time() update_rates_and_regimes_for_focal_time() prepare_trait_data() plot_BAMM_rates()
For a guided tutorial, see this vignette: vignette("model_diversification_dynamics", package = "deepSTRAPP")
Examples
# ----- Example 1: Whale phylogeny ----- #
library(phytools)
data(whale.tree)
## Not run:
## You need to install the BAMM C++ software locally prior to run this function
# Visit the official BAMM website (\url{http://bamm-project.org/}) for information.
# Run BAMM workflow with deepSTRAPP
whale_BAMM_object <- prepare_diversification_data(
BAMM_install_directory_path = "./software/bamm-2.5.0/",
phylo = whale.tree,
prefix_for_files = "whale",
numberOfGenerations = 100000, # Set low for the example
BAMM_output_directory_path = tempdir(), # Can be adjusted such as "./BAMM_outputs/"
keep_BAMM_outputs = FALSE, # Adjust if needed
)
## End(Not run)
# Load directly the result
data(whale_BAMM_object)
# Explore output
str(whale_BAMM_object, 1)
# Plot mean net diversification rates and regime shifts on the phylogeny
plot_BAMM_rates(whale_BAMM_object, cex = 0.5,
labels = TRUE, legend = TRUE)
# ----- Example 2: Ponerinae phylogeny ----- #
# Load phylogeny
data("Ponerinae_tree", package = "deepSTRAPP")
plot(Ponerinae_tree, show.tip.label = FALSE)
## Not run:
## You need to install the BAMM C++ software locally prior to run this function
# Visit the official BAMM website (http://bamm-project.org/) for information.
# Run BAMM workflow with deepSTRAPP
Ponerinae_BAMM_object <- prepare_diversification_data(
BAMM_install_directory_path = "./software/bamm-2.5.0/",
phylo = Ponerinae_tree,
prefix_for_files = "Ponerinae",
numberOfGenerations = 10^7, # Set high for optimal run, but will take ages
BAMM_output_directory_path = tempdir(), # Can be adjusted such as "./BAMM_outputs/"
keep_BAMM_outputs = FALSE, # Adjust if needed
)
## End(Not run)
if (deepSTRAPP::is_dev_version())
{
# Load directly the result
data(Ponerinae_BAMM_object)
## This dataset is only available in development versions installed from GitHub.
# It is not available in CRAN versions.
# Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.
# Explore output
str(Ponerinae_BAMM_object, 1)
# Plot mean net diversification rates and regime shifts on the phylogeny
plot_BAMM_rates(Ponerinae_BAMM_object,
labels = FALSE, legend = TRUE)
}
Map trait evolution on a time-calibrated phylogeny
Description
Map trait evolution on a time-calibrated phylogeny in several steps:
Step 1: Fit evolutionary models to trait data using Maximum Likelihood.
Step 2: Select the best fitting model comparing AICc.
Step 3: Infer ancestral characters estimates (ACE) at nodes.
Step 4: Run stochastic mapping simulations to generate evolutionary histories compatible with the best model and inferred ACE. (Only for categorical and biogeographic data)
Step 5: Infer ancestral states along branches.
For continuous traits: use interpolation to produce a
contMap.For categorical and biogeographic data: compute posterior frequencies of each state/range to produce a
densityMapfor each state/range.
Usage
prepare_trait_data(
tip_data,
trait_data_type,
phylo,
seed = NULL,
evolutionary_models = NULL,
Q_matrix = NULL,
BioGeoBEARS_directory_path = NULL,
keep_BioGeoBEARS_files = TRUE,
prefix_for_files = NULL,
nb_cores = 1,
max_range_size = 2,
split_multi_area_ranges = FALSE,
...,
res = 100,
nb_simulations = 1000,
color_scale = NULL,
colors_per_levels = NULL,
plot_map = TRUE,
plot_overlay = TRUE,
add_ACE_pies = TRUE,
PDF_file_path = NULL,
return_ace = TRUE,
return_BSM = FALSE,
return_simmaps = FALSE,
return_best_model_fit = FALSE,
return_model_selection_df = FALSE,
verbose = TRUE
)
Arguments
tip_data |
Named numerical or character string vector of trait values/states/ranges at tips.
Names should be ordered as the tip labels in the phylogeny found in |
trait_data_type |
Character string. Type of trait data. Either: "continuous", "categorical" or "biogeographic". |
phylo |
Time-calibrated phylogeny. Object of class |
seed |
Integer. Set the seed to ensure reproducibility. Default is |
evolutionary_models |
(Vector of) character string(s). To provide the set of evolutionary models to fit on the data.
|
Q_matrix |
Custom Q-matrix for categorical data representing transition classes between states.
Transitions with similar integers are estimated with a shared rate parameter.
Transitions with |
BioGeoBEARS_directory_path |
Character string. The path to the directory used to store input/output files generated for/by BioGeoBEARS during biogeographic historical inferences. Use '/' to separate directory and sub-directories. It must end with '/'. Only for biogeographic data. |
keep_BioGeoBEARS_files |
Logical. Whether the |
prefix_for_files |
Character string. Prefix to add to all BioGeoBEARS files stored in the |
nb_cores |
Interger. Number of cores to use for parallel computation during BioGeoBEARS runs. Default = |
max_range_size |
Integer. Maximum number of unique areas encompassed by multi-range areas. Default = |
split_multi_area_ranges |
Logical. Whether to split multi-area ranges across unique areas when mapping ranges.
Ex: For range EW, posterior probabilities will be split equally between Eastern Palearctic (E) and Western Palearctic (W).
Default = |
... |
Additional arguments to be passed down to the functions used to fit models (See |
res |
Integer. Define the number of time steps used to interpolate/estimate trait value/state/range in |
nb_simulations |
Integer. Define the number of simulations generated for stochastic mapping. Default = |
color_scale |
Vector of character string. List of colors to use to build the color scale with |
colors_per_levels |
Named character string. To set the colors to use to map each state/range posterior probabilities. Names = states/ranges; values = colors.
If |
plot_map |
Logical. Whether to plot or not the phylogeny with mapped trait evolution. Default = |
plot_overlay |
Logical. If |
add_ACE_pies |
Logical. Whether to add pies of posterior probabilities of states/ranges at internal nodes on the mapped phylogeny. Default = |
PDF_file_path |
Character string. If provided, the plot will be saved in a PDF file following the path provided here. The path must end with ".pdf". |
return_ace |
Logical. Whether the named vector of ancestral characters estimates (ACE) at internal nodes should be returned in the output. Default = |
return_BSM |
Logical. (Only for Biogeographic data) Whether the summary tables of anagenetic and cladogenetic events generated during the Biogeographic Stochastic Mapping (BSM)
process should be returned in the output. Default = |
return_simmaps |
Logical. Whether the evolutionary histories simulated during stochastic mapping (i.e., |
return_best_model_fit |
Logical. Whether to include the output of the best fitting model in the function output. Default = |
return_model_selection_df |
Logical. Whether to include the data.frame summarizing model comparisons used to select the best fitting model should be returned in the output. Default = |
verbose |
Logical. Should progression be displayed? A message will be printed for every steps in the process. Default is |
Details
Map trait evolution on a time-calibrated phylogeny in several steps:
Step 1: Models are fit using Maximum Likelihood approach:
For "continuous" data models are fit with
geiger::fitContinuous(): "BM", "OU", "EB", "rate_trend", "lambda", "kappa", "delta". Default is"BM".For "categorical" data models are fit with
geiger::fitDiscrete(): "ER", "SYM", "ARD", "meristic", "matrix". Default is"ARD".For "biogeographic" data models are fit with R package
BioGeoBEARS: "BAYAREALIKE", "DIVALIKE", "DEC", "BAYAREALIKE+J", "DIVALIKE+J", "DEC+J". Default is"DEC".
Step 2: Best model is identified among the list of evolutionary_models by comparing the corrected AIC (AICc)
and selecting the model with lowest AICc.
Step 3: For continuous traits: Ancestral characters estimates (ACE) are inferred with phytools::fastAnc on a tree
with modified branch lengths scaled to reflect the evolutionary rates estimated from the best model using phytools::rescale().
Step 4: Stochastic Mapping.
For categorical and biogeographic data, stochastic mapping simulations are performed to generate evolutionary histories compatible with the best model and inferred ACE. Node states/ranges are drawn from the scaled marginal likelihoods of ACE, and states/ranges shifts along branches are simulated according to the transition matrix Q estimated from the best fitting model.
Step 5: Infer ancestral states along branches.
For continuous traits: ancestral trait values along branches are interpolated with
phytools::contMap(). This provides quick estimates of trait value at any point in time, but it does not provide accurate ML estimates in case of models that are time or trait-value dependent (such as "EB" or "OU") as the interpolation used to built the contMap is assuming a constant rate along each branch. However, ancestral trait values at nodes remain accurateFor categorical and biogeographic data: compute posterior frequencies of each state/range among the simulated evolutionary histories (
simmaps) to produce adensityMapfor each state/range that reflects the changes along branches in probability of harboring a given state/range.
Value
The function returns a list with at least two elements.
-
$contMap(For "continuous" data) Object of class"contMap", typically generated withphytools::contMap(), that contains a phylogenetic tree and associated continuous trait mapping. -
$densityMaps(For "categorical" and "biogeographic" data) List of objects of class"densityMap, typically generated withphytools::densityMap(), that contains a phylogenetic tree and associated mapping of probability to harbor a given state/range along branches. The list contains one"densityMapper state/range found in thetip_data. -
$densityMaps_all_ranges(For "biogeographic" data only, ifsplit_multi_area_ranges = TRUE) Same as$densityMaps, but for all ranges including the multi-areas ranges (e.g., AB) while$densityMapswill display posterior probabilities for unique areas only (e.g., A and B), with multi-areas ranges split across the unique areas they encompass. -
$trait_data_typeCharacter string. Record the type of trait data. Either: "continuous", "categorical" or "biogeographic".
If return_ace = TRUE,
-
$aceFor continuous traits: Named vector that record the ancestral characters estimates (ACE) at internal nodes. For categorical and biogeographic data: Matrix that record the posterior probabilities of ancestral states/ranges (characters) estimates (ACE) at internal nodes. Rows are internal nodes. Columns are states/ranges. Values are posterior probabilities of each state per node. -
$ace_all_rangesFor biogeographic data, ifsplit_multi_area_ranges = TRUE: Named vector that record the ancestral characters estimates (ACE) at internal nodes, but including all ranges observed in the simmaps, including the multi-areas ranges (e.g., AB), while$acewill display posterior probabilities for unique areas only (e.g., A and B), with multi-areas ranges split across the unique areas they encompass.
If return_BSM = TRUE, (Only for biogeographic data)
-
$BSM_outputList of two lists that contains summary information of cladogenetic ($RES_caldo_events_tables) and anagenetic ($RES_ana_events_tables) events recording across the N simulations of biogeographic histories performed during Biogeographic Stochastic Mapping (BSM). Each element of the list is a data.frame recording events occurring during one simulation.
If return_simmaps = TRUE, (Only for categorical and biogeographic data)
-
$simmapsList that contains as many objects of class"simmap"thatnb_simulationswere requested. Each simmap object is a phylogeny with one simulated geographic history (i.e., transitions in geographic ranges) mapped along branches.
If return_best_model_fit = TRUE,
-
$best_model_fitList that provides the output of the best fitting model.
If model_selection_df = TRUE,
-
$model_selection_dfData.frame that summarizes model comparisons used to select the best fitting model.
For biogeographic data, the function also produce input and output files associated with BioGeoBEARS and stored in the directory
specified in BioGeoBEARS_directory_path. The directory and its content are kept if keep_BioGeoBEARS_files = TRUE
Note on macroevolutionary models of trait evolution
This function provides an easy solution to map trait evolution on a time-calibrated phylogeny
and obtain the contMap/densityMaps objects needed to run the deepSTRAPP workflow (run_deepSTRAPP_for_focal_time, run_deepSTRAPP_over_time).
However, it does not explore the most complex options for trait evolution. You may need to explore more complex models to capture the dynamics of trait evolution.
such as trait-dependent multi-rate models (phytools::brownie.lite(), OUwie::OUwie), Bayesian MCMC implementations allowing a thorough exploration
of location and number of regime shifts (Ex: BayesTraits, RevBayes), or RRphylo for a penalized phylogenetic ridge regression approach that allows regime shifts across all branches.
Note on macroevolutionary models of biogeographic history
This function provides an easy solution to infer ancestral geographic ranges using the BioGeoBEARS framework. It allows to directly compare model fits across 6 models: DEC, DEC+J, DIVALIKE, DIVALIKE+J, BAYAREALIKE, BAYAREALIKE+J. It uses a step-wise approach by using MLE estimates of previous runs as staring parameter values when increasing complexity (adding +J parameters). However, it does not explore the most complex options for historical biogeography. You may need to explore more complex models to capture the dynamics of range evolution such as time-stratification with adjacency matrices, dispersal multipliers (+W), distance-based dispersal probabilities (+X), or other features. See for instance, http://phylo.wikidot.com/biogeobears).
The R package BioGeoBEARS is needed for this function to work with biogeographic data.
Please install it manually from: https://github.com/nmatzke/BioGeoBEARS.
Author(s)
Maël Doré
References
For macroevolutionary models in geiger: Pennell, M. W., Eastman, J. M., Slater, G. J., Brown, J. W., Uyeda, J. C., FitzJohn, R. G., ... & Harmon, L. J. (2014). geiger v2. 0: an expanded suite of methods for fitting macroevolutionary models to phylogenetic trees. Bioinformatics, 30(15), 2216-2218. doi:10.1093/bioinformatics/btu181.
For BioGeoBEARS: Matzke, Nicholas J. (2018). BioGeoBEARS: BioGeography with Bayesian (and likelihood) Evolutionary Analysis with R Scripts. version 1.1.1, published on GitHub on November 6, 2018. doi:10.5281/zenodo.1478250. Website: http://phylo.wikidot.com/biogeobears.
See Also
geiger::fitContinuous() geiger::fitDiscrete() phytools::contMap() phytools::densityMap()
For a guided tutorial, see the associated vignettes:
For continuous trait data:
vignette("model_continuous_trait_evolution", package = "deepSTRAPP")For categorical trait data:
vignette("model_categorical_trait_evolution", package = "deepSTRAPP")For biogeographic range data:
vignette("model_biogeographic_range_evolution", package = "deepSTRAPP")
Examples
# ----- Example 1: Continuous data ----- #
## Load phylogeny and tip data
# Load eel phylogeny and tip data from the R package phytools
# Source: Collar et al., 2014; DOI: 10.1038/ncomms6505
data("eel.tree", package = "phytools")
data("eel.data", package = "phytools")
# Extract body size
eel_data <- stats::setNames(eel.data$Max_TL_cm,
rownames(eel.data))
# (May take several minutes to run)
## Map trait evolution on the phylogeny
mapped_cont_traits <- prepare_trait_data(
tip_data = eel_data,
trait_data_type = "continuous",
phylo = eel.tree,
evolutionary_models = c("BM", "OU", "lambda", "kappa"),
# Example of an additional argument ('control') that can be provided to geiger::fitContinuous()
control = list(niter = 200),
color_scale = c("darkgreen", "limegreen", "orange", "red"),
plot_map = FALSE,
return_best_model_fit = TRUE,
return_model_selection_df = TRUE,
verbose = TRUE)
## Explore output
plot_contMap(mapped_cont_traits$contMap) # contMap with interpolated trait values
mapped_cont_traits$model_selection_df # Summary of model selection
# Parameter estimates and optimization summary of the best model
# (Here, the best model is Pagel's lambda)
mapped_cont_traits$best_model_fit$opt
mapped_cont_traits$ace # Ancestral character estimates at internal nodes
# ----- Example 2: Categorical data ----- #
## Load phylogeny and tip data
# Load eel phylogeny and tip data from the R package phytools
# Source: Collar et al., 2014; DOI: 10.1038/ncomms6505
data("eel.tree", package = "phytools")
data("eel.data", package = "phytools")
# Transform feeding mode data into a 3-level factor
eel_data <- stats::setNames(eel.data$feed_mode, rownames(eel.data))
eel_data <- as.character(eel_data)
eel_data[c(1, 5, 6, 7, 10, 11, 15, 16, 17, 24, 25, 28, 30, 51, 52, 53, 55, 58, 60)] <- "kiss"
eel_data <- stats::setNames(eel_data, rownames(eel.data))
table(eel_data)
# Manually define a Q_matrix for rate classes of state transition to use in the 'matrix' model
# Does not allow transitions from state 1 ("bite") to state 2 ("kiss") or state 3 ("suction")
# Does not allow transitions from state 3 ("suction") to state 1 ("bite")
# Set symmetrical rates between state 2 ("kiss") and state 3 ("suction")
Q_matrix = rbind(c(NA, 0, 0), c(1, NA, 2), c(0, 2, NA))
# Set colors per states
colors_per_states <- c("limegreen", "orange", "dodgerblue")
names(colors_per_states) <- c("bite", "kiss", "suction")
# (May take several minutes to run)
## Run evolutionary models
eel_cat_3lvl_data <- prepare_trait_data(tip_data = eel_data, phylo = eel.tree,
trait_data_type = "categorical",
colors_per_levels = colors_per_states,
evolutionary_models = c("ER", "SYM", "ARD", "meristic", "matrix"),
Q_matrix = Q_matrix,
nb_simulations = 1000,
plot_map = TRUE,
plot_overlay = TRUE,
return_best_model_fit = TRUE,
return_model_selection_df = TRUE)
# Load directly output
data(eel_cat_3lvl_data, package = "deepSTRAPP")
## Explore output
plot(eel_cat_3lvl_data$densityMaps[[1]]) # densityMap for state n°1 ("bite")
eel_cat_3lvl_data$model_selection_df # Summary of model selection
# Parameter estimates and optimization summary of the best model
# (Here, the best model is ER)
print(eel_cat_3lvl_data$best_model_fit)$ # Summary of the best evolutionary model
eel_cat_3lvl_data$ace # Posterior probabilities of each state (= ACE) at internal nodes
# ----- Example 3: Biogeographic data ----- #
if (deepSTRAPP::is_dev_version())
{
## The R package 'BioGeoBEARS' is needed for this function to work with biogeographic data.
# Please install it manually from: https://github.com/nmatzke/BioGeoBEARS.
## Load phylogeny and tip data
# Load eel phylogeny and tip data from the R package phytools
# Source: Collar et al., 2014; DOI: 10.1038/ncomms6505
data("eel.tree", package = "phytools")
data("eel.data", package = "phytools")
# Transform feeding mode data into biogeographic data with ranges A, B, and AB.
eel_data <- stats::setNames(eel.data$feed_mode, rownames(eel.data))
eel_data <- as.character(eel_data)
eel_data[eel_data == "bite"] <- "A"
eel_data[eel_data == "suction"] <- "B"
eel_data[c(5, 6, 7, 15, 25, 32, 33, 34, 50, 52, 57, 58, 59)] <- "AB"
eel_data <- stats::setNames(eel_data, rownames(eel.data))
table(eel_data)
colors_per_ranges <- c("dodgerblue3", "gold")
names(colors_per_ranges) <- c("A", "B")
# (May take several minutes to run)
## Run evolutionary models
eel_biogeo_data <- prepare_trait_data(
tip_data = eel_data,
trait_data_type = "biogeographic",
phylo = eel.tree,
# Default = "DEC" for biogeographic
evolutionary_models = c("BAYAREALIKE", "DIVALIKE", "DEC",
"BAYAREALIKE+J", "DIVALIKE+J", "DEC+J"),
BioGeoBEARS_directory_path = tempdir(), # Ex: "./BioGeoBEARS_directory/"
keep_BioGeoBEARS_files = FALSE,
prefix_for_files = "eel",
max_range_size = 2,
split_multi_area_ranges = TRUE, # Set to TRUE to display the two outputs
# Reduce the number of Stochastic Mapping simulations to save time (Default = '1000')
nb_simulations = 100,
colors_per_levels = colors_per_ranges,
return_simmaps = TRUE,
return_best_model_fit = TRUE,
return_model_selection_df = TRUE,
verbose = TRUE)
# Load directly output
data(eel_biogeo_data, package = "deepSTRAPP")
## Explore output
str(eel_biogeo_data, 1)
eel_biogeo_data$model_selection_df # Summary of model selection
# Parameter estimates and optimization summary of the best model
# (Here, the best model is DEC+J)
eel_biogeo_data$best_model_fit$optim_result
# Posterior probabilities of each state (= ACE) at internal nodes
eel_biogeo_data$ace # Only with unique areas
eel_biogeo_data$ace_all_ranges # Including multi-area ranges (Here, AB)
## Plot densityMaps
# densityMap for range n°1 ("A")
plot(eel_biogeo_data$densityMaps[[1]])
# densityMaps with all unique areas overlaid
plot_densityMaps_overlay(eel_biogeo_data$densityMaps)
# densityMaps with all ranges (including multi-area ranges) overlaid
plot_densityMaps_overlay(eel_biogeo_data$densityMaps_all_ranges)
}
Run deepSTRAPP to test for a relationship between diversification rates and trait data at a given focal time
Description
Wrapper function to run deepSTRAPP workflow for a given point in the past (i.e. the focal_time).
It starts from traits mapped on a phylogeny (trait data) and BAMM output (diversification data)
and carries out the appropriate statistical method to test for a relationship between diversification rates and trait data.
Tests are based on block-permutations: rates data are randomized across tips following blocks
defined by the diversification regimes identified on each tip (typically from a BAMM).
Such tests are called STructured RAte Permutations on Phylogenies (STRAPP) as described in Rabosky, D. L., & Huang, H. (2016). A robust semi-parametric test for detecting trait-dependent diversification. Systematic biology, 65(2), 181-193. doi:10.1093/sysbio/syv066.
See the original BAMMtools::traitDependentBAMM() function used to
carry out STRAPP test on extant time-calibrated phylogenies.
Tests can be carried out on speciation, extinction and net diversification rates.
Usage
run_deepSTRAPP_for_focal_time(
contMap = NULL,
densityMaps = NULL,
ace = NULL,
tip_data = NULL,
trait_data_type,
BAMM_object,
focal_time,
keep_tip_labels = TRUE,
rate_type = "net_diversification",
seed = NULL,
nb_permutations = NULL,
replace_samples = FALSE,
alpha = 0.05,
two_tailed = TRUE,
one_tailed_hypothesis = NULL,
posthoc_pairwise_tests = FALSE,
p.adjust_method = "none",
return_perm_data = FALSE,
nthreads = 1,
print_hypothesis = TRUE,
extract_diversification_data_melted_df = FALSE,
return_updated_trait_data_with_Map = FALSE,
return_updated_BAMM_object = FALSE,
verbose = TRUE
)
Arguments
contMap |
For continuous trait data. Object of class |
densityMaps |
For categorical trait or biogeographic data. List of objects of class |
ace |
(Optional) Ancestral Character Estimates (ACE) at the internal nodes.
Obtained with
|
tip_data |
(Optional) Named vector of tip values of the trait.
|
trait_data_type |
Character string. Specify the type of trait data. Must be one of "continuous", "categorical", "biogeographic". |
BAMM_object |
Object of class |
focal_time |
Numerical. The time, in terms of time distance from the present, at which data must be extracted and the phylogeny and mappings must be cut. It must be smaller than the root age of the phylogeny. |
keep_tip_labels |
Logical. Specify whether terminal branches with a single descendant tip
must retained their initial |
rate_type |
A character string specifying the type of diversification rates to use. Must be one of 'speciation', 'extinction' or 'net_diversification' (default). |
seed |
Integer. Set the seed to ensure reproducibility. Default is |
nb_permutations |
Integer. To select the number of random permutations to perform during the tests. If NULL (default), all posterior samples will be used once. |
replace_samples |
Logical. To specify whether to allow 'replacement' (i.e., multiple use) of a posterior sample
when drawing samples used to carry out the STRAPP test. Default is |
alpha |
Numerical. Significance level to use to compute the |
two_tailed |
Logical. To define the type of tests. If
|
one_tailed_hypothesis |
A character string specifying the alternative hypothesis in the one-tailed test. For continuous data, it is either "negative" or "positive" correlation. For binary data, it lists the trait states with states ordered in increasing rates under the alternative hypothesis, separated by a greater-than such as c('A > B'). |
posthoc_pairwise_tests |
Logical. Only for multinominal data (with more than two states). If |
p.adjust_method |
A character string. Only for multinominal data (with more than two states). It specifies the type of correction to apply to the p-values
in the post hoc pairwise tests to account for multiple comparisons. See |
return_perm_data |
Logical. Whether to return the stats data computed from the posterior samples for observed and permuted data in the output.
This is needed to plot the histogram of the null distribution used to assess significance of the test with |
nthreads |
Integer. Number of threads to use for paralleled computing of the STRAPP tests across the permutations.
The R package |
print_hypothesis |
Logical. Whether to print information on what test is carried out, detailing the null and alternative hypotheses,
and what significant level is used to rejected or not the null hypothesis. Default is |
extract_diversification_data_melted_df |
Logical. Specify whether diversification data (regimes ID and tip rates) must be extracted from the |
return_updated_trait_data_with_Map |
Logical. Specify whether the |
return_updated_BAMM_object |
Logical. Specify whether the |
verbose |
Logical. Should progression be displayed? A message will be printed at each stepof the deepSTRAPP workflow,
and for every batch of 100 BAMM posterior samples whose rates are regimes are updated, and optionally extracted in a melted data.frame
(if |
Details
The function encapsulates several functions carrying out each step of the deepSTRAPP workflow:
Extract trait data
extract_most_likely_trait_values_for_focal_time() extracts the most likely trait values
found along branches at the focal_time.
Optionally, the function can update the mapped phylogeny (contMap/densityMaps) such as
branches overlapping the focal_time are shorten to the focal_time, and
the trait mapping for the cut off branches are removed
by updating the $tree$maps and $tree$mapped.edge elements.
Extract diversification data
update_rates_and_regimes_for_focal_time() updates the BAMM_object to obtain
the diversification rates/regimes found along branches the focal_time.
Optionally, the function can update the BAMM_object to display a mapped phylogeny
such as branches overlapping the focal_time are shorten to the focal_time
Extract diversification data in a melted df
If requested (extract_diversification_data_melted_df = TRUE), extract_diversification_data_melted_df_for_focal_time()
will be used to extract regimes ID and tip rates from the updated_BAMM_object and provide a melted data.frame summarizing the diversification data
as found on the phylogeny for the focal_time.
Compute STRAPP test
compute_STRAPP_test_for_focal_time() carries out the appropriate statistical method to test for
a relationship between diversification rates and trait data for a given point in the past (i.e. the focal_time).
It can handle three types of statistical tests depending on the type of trait data provided:
Continuous trait data: Test for correlations with the Spearman's rank correlation test (See stats::cor.test).
Binary trait data (two states): Test for differences in rates between states with the Mann-Whitney-Wilcoxon rank-sum test (See stats::wilcox.test).
Multinominal trait data (More than two states): Test for differences in rates across all states with the Kruskal-Wallis H test (See stats::kruskal.test). If
posthoc_pairwise_tests = TRUE, Dunn's post hoc pairwise rank-sum tests between pairs of states will be carried out too (See dunn.test::dunn.test).
Value
The function returns a list with at least two elements.
-
$STRAPP_resultsList with at least eight elements summarizing the results of the STRAPP tests. Seecompute_STRAPP_test_for_focal_time()for a detailed description of the output. -
$focal_timeInteger. The time, in terms of time distance from the present, at which the data were extracted and the STRAPP test carried out.
Optional formatted output:
-
$diversification_data_dfA data.frame with six columns summarizing the diversification data as found on the phylogeny for thefocal_time. Seeextract_diversification_data_melted_df_for_focal_time()for a detailed description of the output.
Optional data updated for the focal_time:
-
$updated_trait_data_with_MapA list with four elements that contains trait data found at thefocal_timeand an updatedcontMapordensityMapsthat can be used as input ofplot_contMap()orplot_densityMaps_overlay()to display a phylogeny mapped with trait values/states/ranges with branches cut at thefocal_time. Seeextract_most_likely_trait_values_for_focal_time()for a detailed description of the output. -
$updated_BAMM_objectAn updatedBAMM_objectof class"bammdata"that contains rates and regimes ID found at thefocal_time. Can be used as input ofplot_BAMM_rates()to display a phylogeny mapped with diversification rates with branches cut at thefocal_time. Seeupdate_rates_and_regimes_for_focal_time()for a detailed description of the output.
Author(s)
Maël Doré
See Also
extract_most_likely_trait_values_for_focal_time() update_rates_and_regimes_for_focal_time()
extract_diversification_data_melted_df_for_focal_time() compute_STRAPP_test_for_focal_time()
For a guided tutorial on complete deepSTRAPP workflow, see the associated vignettes:
For continuous trait data:
vignette("deepSTRAPP_continuous_data", package = "deepSTRAPP")For categorical trait data:
vignette("deepSTRAPP_categorical_3lvl_data", package = "deepSTRAPP")For biogeographic range data:
vignette("deepSTRAPP_biogeographic_data", package = "deepSTRAPP")
Examples
if (deepSTRAPP::is_dev_version())
{
# ----- Example 1: Continuous trait ----- #
## Load data
# Load trait df
data(Ponerinae_trait_tip_data, package = "deepSTRAPP")
# Load phylogeny with old calibration
data(Ponerinae_tree_old_calib, package = "deepSTRAPP")
# Load the BAMM_object summarizing 1000 posterior samples of BAMM
data(Ponerinae_BAMM_object_old_calib, package = "deepSTRAPP")
## This dataset is only available in development versions installed from GitHub.
# It is not available in CRAN versions.
# Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.
## Prepare trait data
# Extract continuous trait data as a named vector
Ponerinae_cont_tip_data <- setNames(object = Ponerinae_trait_tip_data$fake_cont_tip_data,
nm = Ponerinae_trait_tip_data$Taxa)
# Select a color scheme from lowest to highest values
color_scale = c("darkgreen", "limegreen", "orange", "red")
# (May take several minutes to run)
# Get Ancestral Character Estimates based on a Brownian Motion model
# To obtain values at internal nodes
Ponerinae_ACE <- phytools::fastAnc(tree = Ponerinae_tree_old_calib, x = Ponerinae_cont_tip_data)
# Run a Stochastic Mapping based on a Brownian Motion model
# to interpolate values along branches and obtain a "contMap" object
Ponerinae_contMap <- phytools::contMap(Ponerinae_tree_old_calib, x = Ponerinae_cont_tip_data,
res = 100, # Number of time steps
plot = FALSE)
# Plot contMap = stochastic mapping of continuous trait
plot_contMap(contMap = Ponerinae_contMap,
color_scale = color_scale)
## Set focal time to 10 Mya
focal_time <- 10
## Run deepSTRAPP on net diversification rates for focal time = 10 Mya.
deepSTRAPP_output <- run_deepSTRAPP_for_focal_time(
contMap = Ponerinae_contMap,
ace = Ponerinae_ACE,
tip_data = Ponerinae_cont_tip_data,
trait_data_type = "continuous",
BAMM_object = Ponerinae_BAMM_object_old_calib,
focal_time = focal_time,
rate_type = "net_diversification",
return_perm_data = TRUE,
extract_diversification_data_melted_df = TRUE,
return_updated_trait_data_with_Map = TRUE,
return_updated_BAMM_object = TRUE)
## Explore output
str(deepSTRAPP_output, max.level = 1)
# Access deepSTRAPP results
str(deepSTRAPP_output$STRAPP_results)
# Access trait data
head(deepSTRAPP_output$updated_trait_data_with_Map$trait_data)
# Access the diversification data in a melted data.frame
head(deepSTRAPP_output$diversification_data_df)
# Plot rates vs. trait values across branches
plot_rates_vs_trait_data_for_focal_time(deepSTRAPP_output)
# Plot updated contMap
plot_contMap(deepSTRAPP_output$updated_trait_data_with_Map$contMap)
ape::nodelabels(text =
deepSTRAPP_output$updated_trait_data_with_Map$contMap$tree$initial_nodes_ID)
# Plot diversification rates on updated phylogeny
plot_BAMM_rates(deepSTRAPP_output$updated_BAMM_object, labels = TRUE)
# Plot histogram of test stats
plot_histogram_STRAPP_test_for_focal_time(
deepSTRAPP_outputs = deepSTRAPP_output)
# ----- Example 2: Categorical trait ----- #
## Load data
# Load phylogeny
data(Ponerinae_tree, package = "deepSTRAPP")
# Load trait df
data(Ponerinae_trait_tip_data, package = "deepSTRAPP")
# Load the BAMM_object summarizing 1000 posterior samples of BAMM
data(Ponerinae_BAMM_object_old_calib, package = "deepSTRAPP")
## This dataset is only available in development versions installed from GitHub.
# It is not available in CRAN versions.
# Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.
## Prepare trait data
# Extract categorical data with 3-levels
Ponerinae_cat_3lvl_tip_data <- setNames(object = Ponerinae_trait_tip_data$fake_cat_3lvl_tip_data,
nm = Ponerinae_trait_tip_data$Taxa)
table(Ponerinae_cat_3lvl_tip_data)
# Select color scheme for states
colors_per_states <- c("forestgreen", "sienna", "goldenrod")
names(colors_per_states) <- c("arboreal", "subterranean", "terricolous")
# (May take several minutes to run)
## Produce densityMaps using stochastic character mapping based on an equal-rates (ER) Mk model
Ponerinae_cat_3lvl_data_old_calib <- prepare_trait_data(
tip_data = Ponerinae_cat_3lvl_tip_data,
phylo = Ponerinae_tree_old_calib,
trait_data_type = "categorical",
colors_per_levels = colors_per_states,
evolutionary_models = "ARD", # Use default ARD model
nb_simulations = 100, # Reduce number of simulations to save time
seed = 1234, # Seet seed for reproducibility
return_best_model_fit = TRUE,
return_model_selection_df = TRUE,
plot_map = FALSE)
# Load directly output
data(Ponerinae_cat_3lvl_data_old_calib, package = "deepSTRAPP")
## Set focal time to 10 Mya
focal_time <- 10
# (May take several minutes to run)
## Run deepSTRAPP on net diversification rates for focal time = 10 Mya.
deepSTRAPP_output <- run_deepSTRAPP_for_focal_time(
densityMaps = Ponerinae_cat_data_old_calib$densityMaps,
ace = Ponerinae_cat_data_old_calib$ace,
tip_data = Ponerinae_cat_3lvl_tip_data,
trait_data_type = "categorical",
BAMM_object = Ponerinae_BAMM_object_old_calib,
focal_time = focal_time,
rate_type = "net_diversification",
posthoc_pairwise_tests = TRUE,
return_perm_data = TRUE,
extract_diversification_data_melted_df = TRUE,
return_updated_trait_data_with_Map = TRUE,
return_updated_BAMM_object = TRUE)
## Explore output
str(deepSTRAPP_output, max.level = 1)
# Access deepSTRAPP results
str(deepSTRAPP_output$STRAPP_results, max.level = 2)
# Result for overall Kruskal-Wallis test
deepSTRAPP_output$STRAPP_results[1:3]
# Results for posthoc pairwise Dunn's tests
deepSTRAPP_output$STRAPP_results$posthoc_pairwise_tests$summary_df
# Access trait data
head(deepSTRAPP_output$updated_trait_data_with_Map$trait_data)
# Access the diversification data in a melted data.frame
head(deepSTRAPP_output$diversification_data_df)
# Plot rates vs. states across branches
plot_rates_vs_trait_data_for_focal_time(
deepSTRAPP_outputs = deepSTRAPP_output,
colors_per_levels = colors_per_states)
# Plot updated densityMaps cut at focal time
plot_densityMaps_overlay(deepSTRAPP_output$updated_trait_data_with_Map$densityMaps)
# Plot diversification rates on updated phylogeny
plot_BAMM_rates(BAMM_object = deepSTRAPP_output$updated_BAMM_object, legend = TRUE, labels = FALSE,
colorbreaks = deepSTRAPP_output$updated_BAMM_object$initial_colorbreaks$net_diversification)
# Plot histogram of Kruskal-Wallis overall test stats
plot_histogram_STRAPP_test_for_focal_time(
deepSTRAPP_outputs = deepSTRAPP_output)
# Plot histograms of posthoc pairwise Dunn's test stats
plot_histogram_STRAPP_test_for_focal_time(
deepSTRAPP_outputs = deepSTRAPP_output,
plot_posthoc_tests = TRUE)
# ----- Example 3: Biogeographic ranges ----- #
## Load data
# Load phylogeny
data(Ponerinae_tree_old_calib, package = "deepSTRAPP")
# Load trait df
data(Ponerinae_binary_range_table, package = "deepSTRAPP")
# Load the BAMM_object summarizing 1000 posterior samples of BAMM
data(Ponerinae_BAMM_object_old_calib, package = "deepSTRAPP")
## This dataset is only available in development versions installed from GitHub.
# It is not available in CRAN versions.
# Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.
## Prepare range data for Old World vs. New World
# No overlap in ranges
table(Ponerinae_binary_range_table$Old_World, Ponerinae_binary_range_table$New_World)
Ponerinae_NO_data <- stats::setNames(object = Ponerinae_binary_range_table$Old_World,
nm = Ponerinae_binary_range_table$Taxa)
Ponerinae_NO_data <- as.character(Ponerinae_NO_data)
Ponerinae_NO_data[Ponerinae_NO_data == "TRUE"] <- "O" # O = Old World
Ponerinae_NO_data[Ponerinae_NO_data == "FALSE"] <- "N" # N = New World
names(Ponerinae_NO_data) <- Ponerinae_binary_range_table$Taxa
table(Ponerinae_NO_data)
colors_per_ranges <- c("mediumpurple2", "peachpuff2")
names(colors_per_ranges) <- c("N", "O")
# (May take several minutes to run)
## Run evolutionary models
Ponerinae_biogeo_data <- prepare_trait_data(
tip_data = Ponerinae_NO_data,
trait_data_type = "biogeographic",
phylo = Ponerinae_tree_old_calib,
evolutionary_models = "DEC+J", # Default = "DEC" for biogeographic
BioGeoBEARS_directory_path = tempdir(), # Ex: "./BioGeoBEARS_directory/"
keep_BioGeoBEARS_files = FALSE,
prefix_for_files = "Ponerinae_old_calib",
max_range_size = 2,
split_multi_area_ranges = TRUE, # Set to TRUE to display the two outputs
nb_simulations = 100, # Reduce to save time (Default = '1000')
colors_per_levels = colors_per_ranges,
return_model_selection_df = TRUE,
verbose = TRUE)
# Load directly output
data(Ponerinae_biogeo_data_old_calib, package = "deepSTRAPP")
## Explore output
str(Ponerinae_biogeo_data_old_calib, 1)
## Set focal time to 10 Mya
focal_time <- 10
# (May take several minutes to run)
## Run deepSTRAPP on net diversification rates for focal time = 10 Mya.
deepSTRAPP_output <- run_deepSTRAPP_for_focal_time(
densityMaps = Ponerinae_biogeo_data_old_calib$densityMaps,
ace = Ponerinae_biogeo_data_old_calib$ace,
tip_data = Ponerinae_NO_data,
trait_data_type = "biogeographic",
BAMM_object = Ponerinae_BAMM_object_old_calib,
focal_time = focal_time,
rate_type = "net_diversification",
return_perm_data = TRUE,
extract_diversification_data_melted_df = TRUE,
return_updated_trait_data_with_Map = TRUE,
return_updated_BAMM_object = TRUE)
## Explore output
str(deepSTRAPP_output, max.level = 1)
# Access deepSTRAPP results
str(deepSTRAPP_output$STRAPP_results, max.level = 2)
# Result for Mann-Whitney-Wilcoxon test
deepSTRAPP_output$STRAPP_results[1:3]
# Access trait data
head(deepSTRAPP_output$updated_trait_data_with_Map$trait_data)
# Access the diversification data in a melted data.frame
head(deepSTRAPP_output$diversification_data_df)
# Plot rates vs. ranges across branches
plot_rates_vs_trait_data_for_focal_time(
deepSTRAPP_outputs = deepSTRAPP_output,
colors_per_levels = colors_per_ranges)
# Plot updated densityMaps cut at focal time
plot_densityMaps_overlay(deepSTRAPP_output$updated_trait_data_with_Map$densityMaps)
# Plot diversification rates on updated phylogeny
plot_BAMM_rates(BAMM_object = deepSTRAPP_output$updated_BAMM_object, legend = TRUE, labels = FALSE,
colorbreaks = deepSTRAPP_output$updated_BAMM_object$initial_colorbreaks$net_diversification)
# Plot histogram of Mann-Whitney-Wilcoxon test
plot_histogram_STRAPP_test_for_focal_time(
STRAPP_results = deepSTRAPP_output$STRAPP_results)
}
Run deepSTRAPP to test for a relationship between diversification rates and trait data over multiple time steps
Description
Wrapper function to run deepSTRAPP workflows over multiple time steps in the past.
It starts from traits mapped on a phylogeny (trait data) and BAMM output (diversification data)
and carries out the appropriate statistical method to test for a relationship between diversification rates and trait data.
The workflow is repeated over multiple points in time (i.e. the time_steps) and results are summarized in a data.frame.
The function can also provide summaries of trait values and diversification rates
extracted along branches over the different time_steps.
Statistical tests are based on block-permutations: rates data are randomized across tips following blocks defined by the diversification regimes identified on each tip (typically from a BAMM). Such tests are called STructured RAte Permutations on Phylogenies (STRAPP) as described in Rabosky, D. L., & Huang, H. (2016). A robust semi-parametric test for detecting trait-dependent diversification. Systematic biology, 65(2), 181-193. doi:10.1093/sysbio/syv066.
See the original BAMMtools::traitDependentBAMM() function used to
carry out STRAPP test on extant time-calibrated phylogenies.
Tests can be carried out on speciation, extinction and net diversification rates.
Usage
run_deepSTRAPP_over_time(
contMap = NULL,
densityMaps = NULL,
ace = NULL,
tip_data = NULL,
trait_data_type,
BAMM_object,
time_steps = NULL,
time_range = NULL,
nb_time_steps = NULL,
time_step_duration = NULL,
keep_tip_labels = TRUE,
rate_type = "net_diversification",
seed = NULL,
nb_permutations = NULL,
replace_samples = FALSE,
alpha = 0.05,
two_tailed = TRUE,
one_tailed_hypothesis = NULL,
posthoc_pairwise_tests = FALSE,
p.adjust_method = "none",
return_perm_data = FALSE,
nthreads = 1,
print_hypothesis = TRUE,
extract_trait_data_melted_df = FALSE,
extract_diversification_data_melted_df = FALSE,
return_STRAPP_results = FALSE,
return_updated_trait_data_with_Map = FALSE,
return_updated_BAMM_object = FALSE,
verbose = TRUE,
verbose_extended = FALSE
)
Arguments
contMap |
For continuous trait data. Object of class |
densityMaps |
For categorical trait or biogeographic data. List of objects of class |
ace |
(Optional) Ancestral Character Estimates (ACE) at the internal nodes.
Obtained with
|
tip_data |
(Optional) Named vector of tip values of the trait.
|
trait_data_type |
Character string. Specify the type of trait data. Must be one of "continuous", "categorical", "biogeographic". |
BAMM_object |
Object of class |
time_steps |
Numerical vector. Time steps at which the STRAPP tests should be carried out. If |
time_range |
Vector of two numerical values. Time boundaries within with the |
nb_time_steps, time_step_duration |
Numerical. Number of time steps and duration of each time step used to generate |
keep_tip_labels |
Logical. Specify whether terminal branches with a single descendant tip
must retained their initial |
rate_type |
A character string specifying the type of diversification rates to use. Must be one of 'speciation', 'extinction' or 'net_diversification' (default). |
seed |
Integer. Set the seed to ensure reproducibility. Default is |
nb_permutations |
Integer. To select the number of random permutations to perform during the tests. If NULL (default), all posterior samples will be used once. |
replace_samples |
Logical. To specify whether to allow 'replacement' (i.e., multiple use) of a posterior sample
when drawing samples used to carry out the STRAPP test. Default is |
alpha |
Numerical. Significance level to use to compute the |
two_tailed |
Logical. To define the type of tests. If
|
one_tailed_hypothesis |
A character string specifying the alternative hypothesis in the one-tailed test. For continuous data, it is either "negative" or "positive" correlation. For binary data, it lists the trait states with states ordered in increasing rates under the alternative hypothesis, separated by a greater-than such as c('A > B'). |
posthoc_pairwise_tests |
Logical. Only for multinominal data (with more than two states). If |
p.adjust_method |
A character string. Only for multinominal data (with more than two states). It specifies the type of correction to apply to the p-values
in the post hoc pairwise tests to account for multiple comparisons. See |
return_perm_data |
Logical. Whether to return the stats data computed from the posterior samples for observed and permuted data in the output.
This is needed to plot the histograms of the null distribution used to assess significance of the tests with |
nthreads |
Integer. Number of threads to use for paralleled computing of the STRAPP tests across the permutations.
The R package |
print_hypothesis |
Logical. Whether to print information on what test is carried out, detailing the null and alternative hypotheses,
and what significant level is used to rejected or not the null hypothesis. Default is |
extract_trait_data_melted_df |
Logical. Specify whether trait data must be extracted from the |
extract_diversification_data_melted_df |
Logical. Specify whether diversification data (regimes ID and tip rates) must be extracted from the |
return_STRAPP_results |
Logical. Specify whether the |
return_updated_trait_data_with_Map |
Logical. Specify whether the |
return_updated_BAMM_object |
Logical. Specify whether the |
verbose |
Logical. Should progression per |
verbose_extended |
Should progression per |
Details
The function is a wrapper of run_deepSTRAPP_for_focal_time() that runs the
deepSTRAPP workflow over multiple time_steps.
The deepSTRAPP workflow is described step by step in the run_deepSTRAPP_for_focal_time() documentation.
Its main output is the $pvalues_summary_df: a data.frame providing test stat estimates and p-values obtained across all time_steps,
that can be passed down to plot_STRAPP_pvalues_over_time() to generate a plot showing the evolution of the test results across time.
If using multinominal data (with more than two states) and posthoc_pairwise_tests = TRUE, the output will also contain
a data.frame providing test stat estimates and p-values for post hoc pairwise tests in $pvalues_summary_df_for_posthoc_pairwise_tests.
The function offers options to generate summary data.frames of the data extracted across time_steps:
If
extract_trait_data_melted_df = TRUE, a data.frame of trait values found along branches at each time step is provided in$trait_data_df_over_time.If
extract_diversification_data_melted_df = TRUE, a data.frame of diversification data (regimes ID and tip rates) found along branches at each time step is provided in$diversification_data_df_over_time.Those data.frames can be passed down to
plot_rates_through_time()to generate a plot showing the evolution diversification rates across trait values over time.
The function also allows to keep records of the intermediate objects generated during the STRAPP workflow:
If
return_STRAPP_results = TRUE, a list of STRAPP test outputs is provided in$STRAPP_results_over_time. Combined withreturn_perm_data = TRUE, it allows to plot the histograms of the null distributions used to assess significance of the tests withplot_histogram_STRAPP_test_for_focal_time(). (for a singlefocal_time) andplot_histograms_STRAPP_tests_over_time()(for multipletime_steps).If
return_updated_trait_data_with_Map = TRUE, a list of objects containing trait data and updatedcontMapordensityMapsis provided in$updated_trait_data_with_Map_over_time. UpdatedcontMap/densityMapscan be respectively plotted withplot_contMap()orplot_densityMaps_overlay(), to display a phylogeny mapped with trait values with branches cut at eachfocal_time.If
return_updated_BAMM_object = TRUE, a list of updatedBAMM_objectof class"bammdata"that contains rates and regimes ID found at eachfocal_time. UpdatedBAMM_objectcan be plotted withplot_BAMM_rates()to display a phylogeny mapped with diversification rates with branches cut at eachfocal_time.
Value
The function returns a list with at least five elements.
-
$pvalues_summary_dfData.frame with three columns providing test stat$estimateand$p_valueobtained for each time step (i.e.,$focal_time), that can be passed down toplot_STRAPP_pvalues_over_time()to generate a plot showing the evolution of the test results across time. -
$time_stepsNumerical vector. Time steps at which the STRAPP tests were carried out in the same order as the objects returned in the output lists. -
$trait_data_typeCharacter string. Specify the type of trait data. Possible values are: "continuous", "categorical", "biogeographic". -
$trait_data_type_for_statsCharacter string. The type of trait data used to select statistical method. One of 'continuous', 'binary', or 'multinominal'. -
$rate_typeCharacter string. The type of diversification rates used in the tests: 'speciation', 'extinction' or 'net_diversification'.
Optional summary df for multinominal data, if posthoc_pairwise_tests = TRUE:
-
$pvalues_summary_df_for_posthoc_pairwise_testsData.frame with four or five columns providing test stat$estimate,$p_value, and$p_value_adjusted(ifp.adjust_methodused is not "none") for each$pairof states involved in post hoc Dunn's tests obtained for each time step (i.e.,$focal_time). This data.frame can be passed down toplot_STRAPP_pvalues_over_time()to generate a plot showing the evolution of the post hoc test results across time.
Optional melted data.frames:
-
$trait_data_df_over_timeData.frame with three columns providing$trait_valueassociated with each$tip_IDfound along each time step (i.e.,$focal_time). Setextract_trait_data_melted_df = TRUEto include it in the output. -
$diversification_data_df_over_timeData.frame with six columns providing diversification regimes ($regime_ID) and$ratessorted by$rate_typealong tips ($tip_ID) found across all posterior samples ($BAMM_sample_ID) over each time step (i.e.,$focal_time). Setextract_diversification_data_melted_df = TRUEto include it in the output. Those data.frames can be passed down to
plot_rates_through_time()to generate a plot showing the evolution diversification rates across trait values over time.
Optional objects generated for each time step (i.e., focal_time) and ordered as in $time_steps:
-
$STRAPP_results_over_timeList of objects summarizing the results of the STRAPP tests Seecompute_STRAPP_test_for_focal_time()for a detailed description of the elements in each object. Setreturn_STRAPP_results = TRUEto include it in the output. Combined withreturn_perm_data = TRUE, it allows to plot the histograms of the null distributions used to assess significance of the tests withplot_histogram_STRAPP_test_for_focal_time(). (for a singlefocal_time) andplot_histograms_STRAPP_tests_over_time()(for multipletime_steps). -
$updated_trait_data_with_Map_over_timeList of objects containing trait data and updatedcontMap/densityMaps. UpdatedcontMap/densityMapscan be respectively plotted withplot_contMap()orplot_densityMaps_overlay(), to display a phylogeny mapped with trait values with branches cut at eachfocal_time. -
$updated_BAMM_objects_over_timeList of objects containing rates and regimes ID mapped on phylogeny. UpdatedBAMM_objectcan be plotted withplot_BAMM_rates()to display a phylogeny mapped with diversification rates with branches cut at eachfocal_time.
Author(s)
Maël Doré
See Also
To run the deepSTRAPP workflow for a single focal_time: run_deepSTRAPP_for_focal_time()
extract_most_likely_trait_values_for_focal_time() update_rates_and_regimes_for_focal_time()
extract_diversification_data_melted_df_for_focal_time() compute_STRAPP_test_for_focal_time()
For a guided tutorial on complete deepSTRAPP workflow, see the associated vignettes:
For continuous trait data:
vignette("deepSTRAPP_continuous_data", package = "deepSTRAPP")For categorical trait data:
vignette("deepSTRAPP_categorical_3lvl_data", package = "deepSTRAPP")For biogeographic range data:
vignette("deepSTRAPP_biogeographic_data", package = "deepSTRAPP")
Examples
if (deepSTRAPP::is_dev_version())
{
# ----- Example 1: Continuous trait ----- #
## Load data
# Load trait df
data(Ponerinae_trait_tip_data, package = "deepSTRAPP")
# Load phylogeny with old calibration
data(Ponerinae_tree_old_calib, package = "deepSTRAPP")
# Load the BAMM_object summarizing 1000 posterior samples of BAMM
data(Ponerinae_BAMM_object_old_calib, package = "deepSTRAPP")
## This dataset is only available in development versions installed from GitHub.
# It is not available in CRAN versions.
# Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.
## Prepare trait data
# Extract continuous trait data as a named vector
Ponerinae_cont_tip_data <- setNames(object = Ponerinae_trait_tip_data$fake_cont_tip_data,
nm = Ponerinae_trait_tip_data$Taxa)
# Select a color scheme from lowest to highest values
color_scale = c("darkgreen", "limegreen", "orange", "red")
# Get Ancestral Character Estimates based on a Brownian Motion model
# To obtain values at internal nodes
Ponerinae_ACE <- phytools::fastAnc(tree = Ponerinae_tree_old_calib, x = Ponerinae_cont_tip_data)
# (May take several minutes to run)
# Run a Stochastic Mapping based on a Brownian Motion model
# to interpolate values along branches and obtain a "contMap" object
Ponerinae_contMap <- phytools::contMap(Ponerinae_tree, x = Ponerinae_cont_tip_data,
res = 100, # Number of time steps
plot = FALSE)
# Plot contMap = stochastic mapping of continuous trait
plot_contMap(contMap = Ponerinae_contMap,
color_scale = color_scale)
## Set for time steps of 5 My. Will generate deepSTRAPP workflows for 0 to 40 Mya.
# nb_time_steps <- 5
time_step_duration <- 5
time_range <- c(0, 40)
## Run deepSTRAPP on net diversification rates
Ponerinae_deepSTRAPP_cont_old_calib_0_40 <- run_deepSTRAPP_over_time(
contMap = Ponerinae_contMap,
ace = Ponerinae_ACE,
tip_data = Ponerinae_cont_tip_data,
trait_data_type = "continuous",
BAMM_object = Ponerinae_BAMM_object_old_calib,
# nb_time_steps = nb_time_steps,
time_range = time_range,
time_step_duration = time_step_duration,
return_perm_data = TRUE,
extract_trait_data_melted_df = TRUE,
extract_diversification_data_melted_df = TRUE,
return_STRAPP_results = TRUE,
return_updated_trait_data_with_Map = TRUE,
return_updated_BAMM_object = TRUE,
verbose = TRUE,
verbose_extended = TRUE)
## Load directly trait data output
data(Ponerinae_deepSTRAPP_cont_old_calib_0_40, package = "deepSTRAPP")
## This dataset is only available in development versions installed from GitHub.
# It is not available in CRAN versions.
# Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.
## Explore output
str(Ponerinae_deepSTRAPP_cont_old_calib_0_40, max.level = 1)
# Display test summary
# Can be passed down to [deepSTRAPP::plot_STRAPP_pvalues_over_time()] to generate a plot
# showing the evolution of the test results across time.
Ponerinae_deepSTRAPP_cont_old_calib_0_40$pvalues_summary_df
# Access trait data in a melted data.frame
head(Ponerinae_deepSTRAPP_cont_old_calib_0_40$trait_data_df_over_time)
# Access the diversification data in a melted data.frame
head(Ponerinae_deepSTRAPP_cont_old_calib_0_40$diversification_data_df_over_time)
# Access deepSTRAPP results
str(Ponerinae_deepSTRAPP_cont_old_calib_0_40$STRAPP_results, max.level = 2)
## Visualize updated phylogenies
# (May take time to plot)
# Plot updated contMap for time step n°2
deepSTRAPP_outputs <- Ponerinae_deepSTRAPP_cont_old_calib_0_40
contMap_step2 <- deepSTRAPP_outputs$updated_trait_data_with_Map_over_time[[2]]
plot_contMap(contMap_step2$contMap, ftype = "off")
# Plot diversification rates on updated phylogeny for time step n°2
BAMM_object_step2 <- deepSTRAPP_outputs$updated_BAMM_objects_over_time[[2]]
plot_BAMM_rates(BAMM_object = BAMM_object_step2,
legend = TRUE, labels = FALSE,
colorbreaks = BAMM_object_step2$initial_colorbreaks$net_diversification)
## Visualize test results
# (May take time to plot)
# Plot p-values of Spearman tests across all time-steps
plot_STRAPP_pvalues_over_time(
deepSTRAPP_outputs = Ponerinae_deepSTRAPP_cont_old_calib_0_40,
alpha = 0.10)
# Plot evolution of mean rates through time
plot_rates_through_time(deepSTRAPP_outputs = Ponerinae_deepSTRAPP_cont_old_calib_0_40)
# Plot rates vs. trait values across branches for time step = 10 My
plot_rates_vs_trait_data_for_focal_time(
deepSTRAPP_outputs = Ponerinae_deepSTRAPP_cont_old_calib_0_40,
focal_time = 10)
# Plot histogram of Spearman test stats for time step = 10 My
plot_histogram_STRAPP_test_for_focal_time(
deepSTRAPP_outputs = Ponerinae_deepSTRAPP_cont_old_calib_0_40,
focal_time = 10)
# Plot histograms of Spearman test results (One plot per time-step)
plot_histograms_STRAPP_tests_over_time(
deepSTRAPP_outputs = Ponerinae_deepSTRAPP_cont_old_calib_0_40,
display_plots = TRUE)
# ----- Example 2: Categorical trait ----- #
## Load data
# Load trait df
data(Ponerinae_trait_tip_data, package = "deepSTRAPP")
# Load phylogeny
data(Ponerinae_tree_old_calib, package = "deepSTRAPP")
# Load the BAMM_object summarizing 1000 posterior samples of BAMM
data(Ponerinae_BAMM_object_old_calib, package = "deepSTRAPP")
## This dataset is only available in development versions installed from GitHub.
# It is not available in CRAN versions.
# Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.
## Prepare trait data
# Extract categorical data with 3-levels
Ponerinae_cat_3lvl_tip_data <- setNames(object = Ponerinae_trait_tip_data$fake_cat_3lvl_tip_data,
nm = Ponerinae_trait_tip_data$Taxa)
table(Ponerinae_cat_3lvl_tip_data)
# Select color scheme for states
colors_per_states <- c("forestgreen", "sienna", "goldenrod")
names(colors_per_states) <- c("arboreal", "subterranean", "terricolous")
# (May take several minutes to run)
## Produce densityMaps using stochastic character mapping based on an ARD Mk model
Ponerinae_cat_3lvl_data_old_calib <- prepare_trait_data(
tip_data = Ponerinae_cat_3lvl_tip_data,
phylo = Ponerinae_tree_old_calib,
trait_data_type = "categorical",
colors_per_levels = colors_per_states,
evolutionary_models = "ARD",
nb_simulations = 100,
return_best_model_fit = TRUE,
return_model_selection_df = TRUE,
plot_map = FALSE)
# Load directly trait data output
data(Ponerinae_cat_3lvl_data_old_calib, package = "deepSTRAPP")
## Set for time steps of 5 My. Will generate deepSTRAPP workflows for 0 to 40 Mya.
# nb_time_steps <- 5
time_step_duration <- 5
time_range <- c(0, 40)
# (May take several minutes to run)
## Run deepSTRAPP on net diversification rates across time-steps.
Ponerinae_deepSTRAPP_cat_3lvl_old_calib_0_40 <- run_deepSTRAPP_over_time(
densityMaps = Ponerinae_cat_3lvl_data_old_calib$densityMaps,
ace = Ponerinae_cat_3lvl_data_old_calib$ace,
tip_data = Ponerinae_cat_3lvl_tip_data,
trait_data_type = "categorical",
BAMM_object = Ponerinae_BAMM_object_old_calib,
# nb_time_steps = nb_time_steps,
time_range = time_range,
time_step_duration = time_step_duration,
rate_type = "net_diversification",
seed = 1234,
alpha = 0.10, # Select a generous level of significance for the sake of the example
posthoc_pairwise_tests = TRUE,
return_perm_data = TRUE,
extract_trait_data_melted_df = TRUE,
extract_diversification_data_melted_df = TRUE,
return_STRAPP_results = TRUE,
return_updated_trait_data_with_Map = TRUE,
return_updated_BAMM_object = TRUE,
verbose = TRUE,
verbose_extended = TRUE)
## Load directly deepSTRAPP output
data(Ponerinae_deepSTRAPP_cat_3lvl_old_calib_0_40, package = "deepSTRAPP")
deepSTRAPP_outputs <- Ponerinae_deepSTRAPP_cat_3lvl_old_calib_0_40
## This dataset is only available in development versions installed from GitHub.
# It is not available in CRAN versions.
# Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.
## Explore output
str(deepSTRAPP_outputs, max.level = 1)
# Display test summaries
# Can be passed down to [deepSTRAPP::plot_STRAPP_pvalues_over_time()] to generate a plot
# showing the evolution of the test results across time.
deepSTRAPP_outputs$pvalues_summary_df
# Results for posthoc pairwise Dunn's tests over time-steps
deepSTRAPP_outputs$pvalues_summary_df_for_posthoc_pairwise_tests
# Access trait data in a melted data.frame
head(deepSTRAPP_outputs$trait_data_df_over_time)
# Access the diversification data in a melted data.frame
head(deepSTRAPP_outputs$diversification_data_df_over_time)
# Access details of deepSTRAPP results
str(deepSTRAPP_outputs$STRAPP_results, max.level = 2)
## Visualize updated phylogenies
# (May take time to plot)
# Plot updated densityMaps for time step n°2 = 10My
densityMaps_10My <- deepSTRAPP_outputs$updated_trait_data_with_Map_over_time[[2]]
plot_densityMaps_overlay(densityMaps_10My$densityMaps)
# Plot diversification rates on updated phylogeny for time step n°2
BAMM_object_10My <- deepSTRAPP_outputs$updated_BAMM_objects_over_time[[2]]
plot_BAMM_rates(BAMM_object = BAMM_object_10My,
legend = TRUE, labels = FALSE,
colorbreaks = BAMM_object_10My$initial_colorbreaks$net_diversification)
## Visualize test results
# (May take time to plot)
# Plot p-values of overall Kruskal-Wallis test across all time-steps
plot_STRAPP_pvalues_over_time(
deepSTRAPP_outputs = deepSTRAPP_outputs,
alpha = 0.10)
# Plot p-values of post hoc pairwise Dunn's tests between pairs of tests across all time-steps
plot_STRAPP_pvalues_over_time(
deepSTRAPP_outputs = deepSTRAPP_outputs,
alpha = 0.10,
plot_posthoc_tests = TRUE)
# Plot evolution of mean rates through time
plot_rates_through_time(deepSTRAPP_outputs = deepSTRAPP_outputs,
colors_per_levels = colors_per_states)
# Plot rates vs. trait values across branches for time step n°2 = 10 My
plot_rates_vs_trait_data_for_focal_time(
deepSTRAPP_outputs = deepSTRAPP_outputs,
focal_time = 10,
colors_per_levels = colors_per_states)
# Plot histogram of overall Kruskal-Wallis test for time step n°2 = 10 My
plot_histogram_STRAPP_test_for_focal_time(
deepSTRAPP_outputs = deepSTRAPP_outputs,
focal_time = 10)
# Plot histograms of overall Kruskal-Wallis test results across all time-steps
# (One plot per time-step)
plot_histograms_STRAPP_tests_over_time(
deepSTRAPP_outputs = deepSTRAPP_outputs,
plot_posthoc_tests = FALSE)
# Plot histograms of post hoc pairwise Dunn's test results across all time-steps
# (One multifaceted plot per time-step)
plot_histograms_STRAPP_tests_over_time(
deepSTRAPP_outputs = deepSTRAPP_outputs,
plot_posthoc_tests = TRUE)
# ----- Example 3: Biogeographic ranges ----- #
## Load data
# Load phylogeny
data(Ponerinae_tree_old_calib, package = "deepSTRAPP")
# Load trait df
data(Ponerinae_binary_range_table, package = "deepSTRAPP")
# Load the BAMM_object summarizing 1000 posterior samples of BAMM
data(Ponerinae_BAMM_object_old_calib, package = "deepSTRAPP")
## This dataset is only available in development versions installed from GitHub.
# It is not available in CRAN versions.
# Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.
## Prepare range data for Old World vs. New World
# No overlap in ranges
table(Ponerinae_binary_range_table$Old_World, Ponerinae_binary_range_table$New_World)
Ponerinae_NO_data <- stats::setNames(object = Ponerinae_binary_range_table$Old_World,
nm = Ponerinae_binary_range_table$Taxa)
Ponerinae_NO_data <- as.character(Ponerinae_NO_data)
Ponerinae_NO_data[Ponerinae_NO_data == "TRUE"] <- "O" # O = Old World
Ponerinae_NO_data[Ponerinae_NO_data == "FALSE"] <- "N" # N = New World
names(Ponerinae_NO_data) <- Ponerinae_binary_range_table$Taxa
table(Ponerinae_NO_data)
colors_per_ranges <- c("mediumpurple2", "peachpuff2")
names(colors_per_ranges) <- c("N", "O")
# (May take several minutes to run)
## Run evolutionary models
Ponerinae_biogeo_data <- prepare_trait_data(
tip_data = Ponerinae_NO_data,
trait_data_type = "biogeographic",
phylo = Ponerinae_tree_old_calib,
evolutionary_models = "DEC+J", # Default = "DEC" for biogeographic
BioGeoBEARS_directory_path = tempdir(), # Ex: "./BioGeoBEARS_directory/"
keep_BioGeoBEARS_files = FALSE,
prefix_for_files = "Ponerinae_old_calib",
max_range_size = 2,
split_multi_area_ranges = TRUE, # Set to TRUE to use only unique areas "A" and "B"
nb_simulations = 100, # Reduce to save time (Default = '1000')
colors_per_levels = colors_per_ranges,
return_model_selection_df = TRUE,
verbose = TRUE)
# Load directly output
data(Ponerinae_biogeo_data_old_calib, package = "deepSTRAPP")
## Set for time steps of 5 My. Will generate deepSTRAPP workflows from 0 to 40 Mya.
time_range <- c(0, 40)
time_step_duration <- 10
# (May take several minutes to run)
## Run deepSTRAPP on net diversification rates for time-steps = 0 to 40 Mya.
Ponerinae_deepSTRAPP_biogeo_old_calib_0_40 <- run_deepSTRAPP_over_time(
densityMaps = Ponerinae_biogeo_data_old_calib$densityMaps,
ace = Ponerinae_biogeo_data_old_calib$ace,
tip_data = Ponerinae_ON_tip_data,
trait_data_type = "biogeographic",
BAMM_object = Ponerinae_BAMM_object_old_calib,
time_range = time_range,
time_step_duration = time_step_duration,
seed = 1234, # Set seed for reproducibility
alpha = 0.10, # Select a generous level of significance for the sake of the example
rate_type = "net_diversification",
return_perm_data = TRUE,
extract_trait_data_melted_df = TRUE,
extract_diversification_data_melted_df = TRUE,
return_STRAPP_results = TRUE,
return_updated_trait_data_with_Map = TRUE,
return_updated_BAMM_object = TRUE,
verbose = TRUE,
verbose_extended = TRUE)
## Load directly output
data(Ponerinae_deepSTRAPP_biogeo_old_calib_0_40, package = "deepSTRAPP")
deepSTRAPP_outputs <- Ponerinae_deepSTRAPP_biogeo_old_calib_0_40
## This dataset is only available in development versions installed from GitHub.
# It is not available in CRAN versions.
# Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.
## Explore output
str(deepSTRAPP_outputs, max.level = 1)
# Display test summaries
# Can be passed down to [deepSTRAPP::plot_STRAPP_pvalues_over_time()] to generate a plot
# showing the evolution of the test results across time.
deepSTRAPP_outputs$pvalues_summary_df
# Access bioregeographic range data in a melted data.frame
head(deepSTRAPP_outputs$trait_data_df_over_time)
# Access the diversification data in a melted data.frame
head(deepSTRAPP_outputs$diversification_data_df_over_time)
# Access details of deepSTRAPP results
str(deepSTRAPP_outputs$STRAPP_results, max.level = 2)
## Visualize updated phylogenies
# (May take time to plot)
# Plot updated densityMaps for time step n°2 = 10My
densityMaps_10My <- deepSTRAPP_outputs$updated_trait_data_with_Map_over_time[[2]]
plot_densityMaps_overlay(densityMaps_10My$densityMaps)
# Plot diversification rates on updated phylogeny for time step n°2
BAMM_object_10My <- deepSTRAPP_outputs$updated_BAMM_objects_over_time[[2]]
plot_BAMM_rates(BAMM_object = BAMM_object_10My,
legend = TRUE, labels = FALSE,
colorbreaks = BAMM_object_10My$initial_colorbreaks$net_diversification)
## Visualize test results
# (May take time to plot)
# Plot p-values of Mann-Whitney-Wilcoxon tests across all time-steps
plot_STRAPP_pvalues_over_time(
deepSTRAPP_outputs = deepSTRAPP_outputs,
alpha = 0.05)
# Plot evolution of mean rates through time
plot_rates_through_time(deepSTRAPP_outputs = deepSTRAPP_outputs,
colors_per_levels = colors_per_ranges)
# Plot rates vs. trait values across branches for time step n°2 = 10 My
plot_rates_vs_trait_data_for_focal_time(
deepSTRAPP_outputs = deepSTRAPP_outputs,
focal_time = 10,
colors_per_levels = colors_per_ranges)
# Plot histogram of Mann-Whitney-Wilcoxon test stats for time step n°2 = 10My
plot_histogram_STRAPP_test_for_focal_time(
deepSTRAPP_outputs = deepSTRAPP_outputs,
focal_time = 10)
# Plot histograms of Mann-Whitney-Wilcoxon test stats for all time-steps (One plot per time-step)
plot_histograms_STRAPP_tests_over_time(
deepSTRAPP_outputs = deepSTRAPP_outputs,
display_plots = TRUE,
plot_posthoc_tests = FALSE)
}
Compare model fits with AICc and Akaike's weights
Description
Compare models fit with BioGeoBEARS::bear_optim_run() using AICc and Akaike's weights.
Generate a data.frame summarizing information. Identify the best model and extract its results.
Usage
select_best_model_from_BioGeoBEARS(list_model_fits)
Arguments
list_model_fits |
Named list with the results of a model fit with |
Value
The function returns a list with three elements.
-
$model_comparison_dfData.frame summarizing information to compare model fits. It includes the model name ($model), the log-likelihood ($logLik), the number of free-parameters ($k), the AIC ($AIC), the corrected AIC ($AICc), the delta to the best/lowest AICc ($delta_AICc), the Akaike weights ($Akaike_weights), and their rank based on AICc ($rank). -
$best_model_nameCharacter string. Name of the best model. -
$best_model_fitList containing the output ofBioGeoBEARS::bear_optim_run()for the model with the best fit.
Author(s)
Maël Doré
See Also
BioGeoBEARS::bear_optim_run() BioGeoBEARS::get_LnL_from_BioGeoBEARS_results_object() BioGeoBEARS::AICstats_2models()
Examples
if (deepSTRAPP::is_dev_version())
{
## The R package 'BioGeoBEARS' is needed for this function to work with biogeographic data.
# Please install it manually from: https://github.com/nmatzke/BioGeoBEARS.
# Load phylogeny and tip data
library(phytools)
data(eel.tree)
data(eel.data)
# Transform feeding mode data into biogeographic data with ranges A, B, and AB.
eel_data <- stats::setNames(eel.data$feed_mode, rownames(eel.data))
eel_data <- as.character(eel_data)
eel_data[eel_data == "bite"] <- "A"
eel_data[eel_data == "suction"] <- "B"
eel_data[c(5, 6, 7, 15, 25, 32, 33, 34, 50, 52, 57, 58, 59)] <- "AB"
eel_data <- stats::setNames(eel_data, rownames(eel.data))
table(eel_data)
colors_per_levels <- c("dodgerblue3", "gold")
names(colors_per_levels) <- c("A", "B")
# (May take several minutes to run)
## Prepare phylo
# Set path to BioGeoBEARS directory
BioGeoBEARS_directory_path = "./BioGeoBEARS_directory/"
# Export phylo
path_to_phylo <- BioGeoBEARS::np(paste0(BioGeoBEARS_directory_path, "phylo.tree"))
write.tree(phy = eel.tree, file = path_to_phylo)
## Prepare range data
tip_data <- eel_data
# Convert tip_data to df
ranges_df <- as.data.frame(tip_data)
# Get list of all unique areas
unique_areas_in_ranges_list <- strsplit(x = tip_data, split = "")
unique_areas <- unique(unlist(unique_areas_in_ranges_list))
unique_areas <- unique_areas[order(unique_areas)]
# Loop per unique areas
for (i in seq_along(unique_areas))
{
# i <- 1
# Extract unique area
unique_area_i <- unique_areas[i]
# Detect presence in ranges
binary_match_i <- unlist(lapply(X = unique_areas_in_ranges_list,
FUN = function (x) { unique_area_i %in% x } ))
# Add to ranges_df
ranges_df <- cbind(ranges_df, binary_match_i)
}
# Extract binary df of presence/absence
binary_df <- ranges_df[, -1]
# Convert character strings into numerical factors
binary_df_num <- as.data.frame(apply(X = binary_df, MARGIN = 2, FUN = as.numeric))
row.names(binary_df_num) <- names(tip_data)
names(binary_df_num) <- unique_areas
# Produce tipranges object from numeric df
Taxa_bioregions_tipranges_obj <- BioGeoBEARS::define_tipranges_object(tmpdf = binary_df_num)
# Set path to tip ranges object
path_to_tip_ranges <- BioGeoBEARS::np(paste0(BioGeoBEARS_directory_path,"tip_ranges.data"))
# Export tip ranges in Lagrange/PHYLIP format
BioGeoBEARS::save_tipranges_to_LagrangePHYLIP(
tipranges_object = Taxa_bioregions_tipranges_obj,
lgdata_fn = path_to_tip_ranges,
areanames = colnames(Taxa_bioregions_tipranges_obj@df))
## Prepare models
# Prepare DEC model run
DEC_run <- BioGeoBEARS::define_BioGeoBEARS_run(
num_cores_to_use = 1,
max_range_size = 2, # To set the maximum number of bioregion encompassed
# by a lineage range at any time
trfn = path_to_phylo, # To provide path to the input tree file
geogfn = path_to_tip_ranges, # To provide path to the LagrangePHYLIP file with binary ranges
return_condlikes_table = TRUE, # To ask to obtain all marginal likelihoods
# computed by the model and used to display ancestral states
print_optim = TRUE)
# Check that starting parameter values are inside the min/max
DEC_run <- BioGeoBEARS::fix_BioGeoBEARS_params_minmax(BioGeoBEARS_run_object = DEC_run)
# Check validity of set-up before run
BioGeoBEARS::check_BioGeoBEARS_run(DEC_run)
# Use the DEC model as template for DEC+J model
DEC_J_run <- DEC_run
# Update status of jump speciation parameter to be estimated
DEC_J_run$BioGeoBEARS_model_object@params_table["j","type"] <- "free"
# Set initial value of J for optimization to an arbitrary low non-null value
j_start <- 0.0001
DEC_J_run$BioGeoBEARS_model_object@params_table["j","init"] <- j_start
DEC_J_run$BioGeoBEARS_model_object@params_table["j","est"] <- j_start
# Check validity of set-up before run
DEC_J_run <- BioGeoBEARS::fix_BioGeoBEARS_params_minmax(BioGeoBEARS_run_object = DEC_J_run)
invisible(BioGeoBEARS::check_BioGeoBEARS_run(DEC_J_run))
## Run models
# Run DEC model
DEC_fit <- BioGeoBEARS::bears_optim_run(DEC_run)
# Set starting values for optimization of DEC+J model based on MLE in the DEC model
d_start <- DEC_fit$outputs@params_table["d","est"]
e_start <- DEC_fit$outputs@params_table["e","est"]
DEC_J_run$BioGeoBEARS_model_object@params_table["d","init"] <- d_start
DEC_J_run$BioGeoBEARS_model_object@params_table["d","est"] <- d_start
DEC_J_run$BioGeoBEARS_model_object@params_table["e","init"] <- e_start
DEC_J_run$BioGeoBEARS_model_object@params_table["e","est"] <- e_start
# Run DEC+J model
DEC_J_fit <- BioGeoBEARS::bears_optim_run(DEC_J_run)
## Store model outputs
list_model_fits <- list(DEC = DEC_fit, DEC_J = DEC_J_fit)
## Compare models
model_comparison_output <- select_best_model_from_BioGeoBEARS(list_model_fits = list_model_fits)
# Explore output
str(model_comparison_output, max.level = 1)
# Print comparison
print(model_comparison_output$models_comparison_df)
# Print best model fit
print(model_comparison_output$best_model_fit$outputs)
## Clean local files
unlink(BioGeoBEARS_directory_path, recursive = TRUE, force = TRUE)
}
Compare trait evolutionary model fits with AICc and Akaike's weights
Description
Compare evolutionary models fit with geiger::fitContinuous() or geiger::fitDiscrete()
using AICc and Akaike's weights. Generate a data.frame summarizing information.
Identify the best model and extract its results.
Usage
select_best_trait_model_from_geiger(list_model_fits)
Arguments
list_model_fits |
Named list with the results of a model fit with |
Value
The function returns a list with three elements.
-
$model_comparison_dfData.frame summarizing information to compare model fits. It includes the model name ($model), the log-likelihood ($logLik), the number of free-parameters ($k), the AIC ($AICc), the corrected AIC ($AICc), the delta to the best/lowest AICc ($delta_AICc), the Akaike weights ($Akaike_weights), and their rank based on AICc ($rank). -
$best_model_nameCharacter string. Name of the best model. -
$best_model_fitList containing the output ofgeiger::fitContinuous()orgeiger::fitDiscrete()for the model with the best fit.
Author(s)
Maël Doré
See Also
geiger::fitContinuous() geiger::fitDiscrete()
Examples
# ----- Example 1: Continuous data ----- #
# Load phylogeny and tip data
library(phytools)
data(eel.tree)
data(eel.data)
# Extract body size
eel_data <- stats::setNames(eel.data$Max_TL_cm,
rownames(eel.data))
# (May take several minutes to run)
# Fit BM model
BM_fit <- geiger::fitContinuous(phy = eel.tree, dat = eel_data, model = "BM")
# Fit EB model
EB_fit <- geiger::fitContinuous(phy = eel.tree, dat = eel_data, model = "EB")
# Fit OU model
OU_fit <- geiger::fitContinuous(phy = eel.tree, dat = eel_data, model = "OU")
# Store models
list_model_fits <- list(BM = BM_fit, EB = EB_fit, OU = OU_fit)
# Compare models
model_comparison_output <- select_best_trait_model_from_geiger(list_model_fits = list_model_fits)
# Explore output
str(model_comparison_output, max.level = 2)
# Print comparison
print(model_comparison_output$models_comparison_df)
# Print best model fit
print(model_comparison_output$best_model_fit)
# ----- Example 2: Categorical data ----- #
# Load phylogeny and tip data
library(phytools)
data(eel.tree)
data(eel.data)
# Extract feeding mode data
eel_data <- stats::setNames(eel.data$feed_mode, rownames(eel.data))
table(eel_data)
# (May take several minutes to run)
# Fit ER model
ER_fit <- geiger::fitDiscrete(phy = eel.tree, dat = eel_data, model = "ER")
# Fit ARD model
ARD_fit <- geiger::fitDiscrete(phy = eel.tree, dat = eel_data, model = "ARD")
# Store models
list_model_fits <- list(ER = ER_fit, ARD = ARD_fit)
# Compare models
model_comparison_output <- select_best_trait_model_from_geiger(list_model_fits = list_model_fits)
# Explore output
str(model_comparison_output, max.level = 2)
# Print comparison
print(model_comparison_output$models_comparison_df)
# Print best model fit
print(model_comparison_output$best_model_fit)
Update diversification rates/regimes mapped on a phylogeny up to a given time in the past
Description
Updates an object of class "bammdata" to obtain the diversification rates/regimes
found along branches at a specific time in the past (i.e. the focal_time).
Optionally, the function can update the object to display a mapped phylogeny such as
branches overlapping the focal_time are shorten to the focal_time.
Usage
update_rates_and_regimes_for_focal_time(
BAMM_object,
focal_time,
update_rates = TRUE,
update_regimes = TRUE,
update_tree = FALSE,
update_plot = FALSE,
update_all_elements = FALSE,
keep_tip_labels = TRUE,
verbose = TRUE
)
Arguments
BAMM_object |
Object of class |
focal_time |
Numerical. The time, in terms of time distance from the present, at which the tree and rate mapping must be cut. It must be smaller than the root age of the phylogeny. |
update_rates |
Logical. Specify whether diversification rates stored in
|
update_regimes |
Logical. Specify whether diversification regimes stored in
|
update_tree |
Logical. Specify whether to update the phylogeny such as
all branches that are younger than the |
update_plot |
Logical. Specify whether to update the phylogeny AND the elements
used by |
update_all_elements |
Logical. Specify whether to update all the elements in the object, including
rates/regimes/phylogeny/elements for |
keep_tip_labels |
Logical. Should terminal branches with a single descendant tip
retain their initial |
verbose |
Logical. Should progression be displayed? A message will be printed for every batch of
100 BAMM posterior samples updated. Default is |
Details
The object of class "bammdata" (BAMM_object) is cut-off at a specific time in the past
(i.e. the focal_time) and the current diversification rate values of the overlapping edges/branches are extracted.
—– Update diversification rate data —–
If update_rates = TRUE, diversification rates of branches overlapping with focal_time
will be updated. Each cut-off branches form a new tip present at focal_time with updated associated
diversification rate data. Fossils older than focal_time do not yield any data.
-
$tipLambdacontains speciation rates. -
$tipMucontains extinction rates.
If update_regimes = TRUE, diversification regimes of branches overlapping with focal_time
will be updated. Each cut-off branches form a new tip present at focal_time with updated associated
diversification regime ID found in $tipStates. Fossils older than focal_time do not yield any data.
—– Update the phylogeny —–
If update_tree = TRUE, elements defining the phylogeny, as in an "phylo" object
will be updated such as all branches that are younger than the focal_time are cut-off:
-
$edgedefines the tree topology. -
$Nnodedefines the number of internal nodes. -
$tip.labelprovides the labels of all tips, including fossils older thanfocal_timeif present. -
$edge.lengthprovides length of all branches. -
$node.labelprovides the labels of all internal nodes. (Optional)
—– Update the plot from plot_BAMM_rates() —–
If update_plot = TRUE, elements used to plot diversification rates on the phylogeny
using plot_BAMM_rates() will be updated such as all branches that are younger
than the focal_time are cut-off:
-
$beginprovides absolute time since root of edge/branch start (rootward). -
$endprovides absolute time since root of edge/branch end (tipward). -
$eventVectorsprovides regime membership per branches in each posterior sample configuration. -
$eventBranchSegsprovides regime membership per segments of branches in each posterior sample configuration. -
$dtratesprovides mean speciation and extinction rates along segments of branches, and resolution fraction (tau) describing the fraction of each segment length compared to the full depth of the initial tree (i.e., the root_age).
Value
The function returns a list as an updated BAMM_object of class "bammdata".
Phylogeny-related elements used to plot a phylogeny with ape::plot.phylo():
-
$edgeMatrix of integers. Defines the tree topology by providing rootward and tipward node ID of each edge. -
$NnodeInteger. Number of internal nodes. -
$tip.labelVector of character strings. Labels of all tips, including fossils older thanfocal_timeif present.If
keep_tip_labels = TRUE, cut-off branches with a single descendant tip retain their initialtip.label.If
keep_tip_labels = FALSE, all cut-off branches are labeled using their tipward node ID.
-
$edge.lengthVector of numerical. Length of edges/branches. -
$node.labelVector of character strings. Labels of all internal nodes. (Present only if present in the initialBAMM_object)
BAMM internal elements used for tree exploration:
-
$beginVector of numerical. Absolute time since root of edge/branch start (rootward). -
$endVector of numerical. Absolute time since root of edge/branch end (tipward). -
$downseqVector of integers. Order of node visits when using a pre-order tree traversal. -
$lastvisitID of the last node visited when starting from the node in the corresponding position in$downseq.
BAMM elements summarizing diversification data:
-
$numberEventsVector of integer. Number of events/macroevolutionary regimes (k+1) recorded in each posterior configuration. k = number of shifts. -
$eventDataList of data.frames. One per posterior sample. Records shift events and macroevolutionary regimes parameters. 1st line = Background root regime. -
$eventVectorsList of integer vectors. One per posterior sample. Record regime ID per branches. -
$tipStatesList of named integer vectors. One per posterior sample. Record regime ID per tips present atfocal_time. Updated ifupdate_regimes = TRUE. -
$tipLambdaList of named numerical vectors. One per posterior sample. Record speciation rates per tips present atfocal_time. Updated ifupdate_rates = TRUE. -
$tipMuList of named numerical vectors. One per posterior sample. Record extinction rates per tips present atfocal_time. Updated ifupdate_rates = TRUE. -
$eventBranchSegsList of matrix of numerical. One per posterior sample. Record regime ID per segments of branches. -
$meanTipLambdaVector of named numerical. Mean tip speciation rates across all posterior configurations of tips present atfocal_time(does not includes older fossils). -
$meanTipMuVector of named numerical. Mean tip extinction rates across all posterior configurations of tips present atfocal_time(does not includes older fossils). -
$typeCharacter string. Set the type of data modeled with BAMM. Should be "diversification".
Additional elements providing key information for downstream analyses:
-
$expectedNumberOfShiftsInteger. The expected number of regime shifts used to set the prior in BAMM. -
$MSP_treeObject of classphylo. List of 4 elements duplicating information from the Phylogeny-related elements above, except$MSP_tree$edge.lengthis recording the Marginal Shift Probability of each branch (i.e., the probability of a regime shift to occur along each branch) whose origin is older thatfocal_time. -
$MAP_indicesVector of integers. The indices of the Maximum A Posteriori probability (MAP) configurations among the posterior samples. -
$MAP_BAMM_object. List of 18 elements of class"bammdata" recording the mean rates and regime shift locations found across the Maximum A Posteriori probability (MAP) configuration. All BAMM elements summarizing diversification data holds a single entry describing this the mean diversification history, updated for the focal_time'. -
$MSC_indicesVector of integers. The indices of the Maximum Shift Credibility (MSC) configurations among the posterior samples. -
$MSC_BAMM_objectList of 18 elements of class"bammdata" recording the mean rates and regime shift locations found across the Maximum Shift Credibility (MSC) configurations. All BAMM elements summarizing diversification data holds a single entry describing this mean diversification history, updated for the focal_time'.
New elements added to provide update information:
-
$root_ageInteger. Stores the age of the root of the tree. -
$nodes_ID_dfData.frame with two columns. Provides the conversion from thenew_node_IDto theinitial_node_ID. Each row is a node. -
$initial_nodes_IDVector of character strings. Provides the initial ID of internal nodes. Used to plot internal node IDs as labels withape::nodelabels(). -
$edges_ID_dfData.frame with two columns. Provides the conversion from thenew_edge_IDto theinitial_edge_ID. Each row is an edge/branch. -
$initial_edges_IDVector of character strings. Provides the initial ID of edges/branches. Used to plot edge/branch IDs as labels withape::edgelabels(). -
$dtratesList of three elements.1/
$dtrates$tauNumerical. Resolution factor describing the fraction of each segment length used inplot_BAMM_rates()compared to the full depth of the initial tree (i.e., the root_age)2/
$dtrates$ratesList of two numerical vectors. Speciation and extinction rates along segments used byplot_BAMM_rates().3/
$dtrates$tmatMatrix of numerical. Start and end times of segments in term of distance to the root.
-
$initial_colorbreaksList of three vectors of numerical. Rate values of the percentiles delimiting the bins for mapping rates to colors withBAMMtools::plot.bammdata(). Each element provides values for different type of rates ($speciation,$extinction,$net_diversification). -
$focal_timeInteger. The time, in terms of time distance from the present, at which the rates/regimes were extracted and the tree was eventually cut.
Author(s)
Maël Doré
See Also
cut_phylo_for_focal_time() plot_BAMM_rates()
Examples
# ----- Example 1: Extant whales (87 taxa) ----- #
## Load the BAMM_object summarizing 1000 posterior samples of BAMM
data(whale_BAMM_object, package = "deepSTRAPP")
## Set focal-time to 5 My
focal_time = 5
# (May take several minutes to run)
## Update the BAMM object
whale_BAMM_object_5My <- update_rates_and_regimes_for_focal_time(
BAMM_object = whale_BAMM_object,
focal_time = 5,
update_rates = TRUE, update_regimes = TRUE,
update_tree = TRUE, update_plot = TRUE,
update_all_elements = TRUE,
keep_tip_labels = TRUE,
verbose = TRUE)
# Add "phylo" class to be compatible with phytools::nodeHeights()
class(whale_BAMM_object) <- unique(c(class(whale_BAMM_object), "phylo"))
root_age <- max(phytools::nodeHeights(whale_BAMM_object)[,2])
# Remove temporary "phylo" class
class(whale_BAMM_object) <- setdiff(class(whale_BAMM_object), "phylo")
## Plot initial BAMM_object for t = 0 My
plot_BAMM_rates(whale_BAMM_object, add_regime_shifts = TRUE,
labels = TRUE, legend = TRUE, cex = 0.5,
par.reset = FALSE) # Keep plotting parameters in memory to use abline().
abline(v = root_age - focal_time,
col = "red", lty = 2, lwd = 2)
## Plot updated BAMM_object for t = 5 My
plot_BAMM_rates(whale_BAMM_object_5My, add_regime_shifts = TRUE,
labels = TRUE, legend = TRUE, cex = 0.8)
# ----- Example 2: Extant Ponerinae (1,534 taxa) ----- #
if (deepSTRAPP::is_dev_version())
{
## Load the BAMM_object summarizing 1000 posterior samples of BAMM
data(Ponerinae_BAMM_object, package = "deepSTRAPP")
## This dataset is only available in development versions installed from GitHub.
# It is not available in CRAN versions.
# Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.
## Set focal-time to 10 My
focal_time = 10
# (May take several minutes to run)
## Update the BAMM object
Ponerinae_BAMM_object_10My <- update_rates_and_regimes_for_focal_time(
BAMM_object = Ponerinae_BAMM_object,
focal_time = focal_time,
update_rates = TRUE, update_regimes = TRUE,
update_tree = TRUE, update_plot = TRUE,
update_all_elements = TRUE,
keep_tip_labels = TRUE,
verbose = TRUE)
## Load results to save time
data(Ponerinae_BAMM_object_10My, package = "deepSTRAPP")
## This dataset is only available in development versions installed from GitHub.
# It is not available in CRAN versions.
# Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.
## Extract root age
# Add temporarily the "phylo" class to be compatible with phytools::nodeHeights()
class(Ponerinae_BAMM_object) <- unique(c(class(Ponerinae_BAMM_object), "phylo"))
root_age <- max(phytools::nodeHeights(Ponerinae_BAMM_object)[,2])
# Remove temporary "phylo" class
class(Ponerinae_BAMM_object) <- setdiff(class(Ponerinae_BAMM_object), "phylo")
## Plot diversification rates on the initial tree
plot_BAMM_rates(Ponerinae_BAMM_object,
legend = TRUE, labels = FALSE)
abline(v = root_age - focal_time,
col = "red", lty = 2, lwd = 2)
## Plot diversification rates and regime shifts on the updated tree (cut-off for 10 My)
# Keep the initial color scheme
plot_BAMM_rates(Ponerinae_BAMM_object_10My, legend = TRUE, labels = FALSE,
colorbreaks = Ponerinae_BAMM_object_10My$initial_colorbreaks$net_diversification)
# Use a new color scheme mapped on the new distribution of rates
plot_BAMM_rates(Ponerinae_BAMM_object_10My, legend = TRUE, labels = FALSE)
}
Dataset summarizing 1000 posterior samples of BAMM for extant whales
Description
An object of class "bammdata" containing information of diversification dynamics
of extant whales (Cetacea order) modeled with BAMM.
Source: Steeman, M. E., M. B. Hebsgaard, R. E. Fordyce, S. Y. W. Ho, D. L. Rabosky, R. Nielsen, C. Rahbek, H. Glenner, M. V. Sorensen, and E. Willerslev (2009) Radiation of extant cetaceans driven by restructuring of the oceans. Systematic Biology, 58, 573-585.
Usage
data(whale_BAMM_object)
Format
A list with 24 elements.
Details
An object of class "bammdata" containing information of diversification dynamics
of extant ponerine ants (Ponerinae subfamily) modeled with BAMM.
Phylogeny-related elements used to plot a phylogeny with ape::plot.phylo():
-
$edgeMatrix of integers. Defines the tree topology by providing rootward and tipward node ID of each edge. -
$NnodeInteger. Number of internal nodes. -
$tip.labelVector of character strings. Labels of all tips. -
$edge.lengthVector of numerical. Length of edges/branches.
BAMM internal elements used for tree exploration:
-
$beginVector of numerical. Absolute time since root of edge/branch start (rootward). -
$endVector of numerical. Absolute time since root of edge/branch end (tipward). -
$downseqVector of integers. Order of node visits when using a pre-order tree traversal. -
$lastvisitID of the last node visited when starting from the node in the corresponding position in$downseq.
BAMM elements summarizing diversification data:
-
$numberEventsVector of integer. Number of events/macroevolutionary regimes (k+1) recorded in each posterior configuration. k = number of shifts. -
$eventDataList of data.frames. One per posterior sample. Records shift events and macroevolutionary regimes parameters. 1st line = Background root regime. -
$eventVectorsList of integer vectors. One per posterior sample. Record regime ID per branches. -
$tipStatesList of integer vectors. One per posterior sample. Record regime ID per tips. -
$tipLambdaList of numerical vectors. One per posterior sample. Record speciation rates per tips. -
$tipMuList of numerical vectors. One per posterior sample. Record extinction rates per tips. -
$eventBranchSegsList of matrix of numerical. One per posterior sample. Record regime ID per segments of branches. -
$meanTipLambdaVector of numerical. Mean tip speciation rates across all posterior configurations of tips. -
$meanTipMuVector of numerical. Mean tip extinction rates across all posterior configurations of tips. -
$typeCharacter string. Set the type of data modeled with BAMM. Here, type = "diversification".
Additional elements providing key information for downstream analyses:
-
$expectedNumberOfShiftsInteger. The expected number of regime shifts used to set the prior in BAMM. -
$MSP_treeObject of classphylo. List of 4 elements duplicating information from the Phylogeny-related elements above, except$MSP_tree$edge.lengthis recording the Marginal Shift Probability of each branch (i.e., the probability of a regime shift to occur along each branch) -
$MAP_indicesVector of integers. The indices of the Maximum A Posteriori probability (MAP) configurations among the posterior samples. -
$MAP_BAMM_object. List of 18 elements of class '"bammdata" recording the mean rates and regime shift locations found across the Maximum A Posteriori probability (MAP) configurations. All BAMM elements summarizing diversification data holds a single entry describing this mean diversification history. -
$MSC_indicesVector of integers. The indices of the Maximum Shift Credibility (MSC) configurations among the posterior samples. -
$MSC_BAMM_objectList of 18 elements of class '"bammdata" recording the mean rates and regime shift locations found across the Maximum Shift Credibility (MSC) configurations. All BAMM elements summarizing diversification data holds a single entry describing this mean diversification history.
References
Steeman, M. E., M. B. Hebsgaard, R. E. Fordyce, S. Y. W. Ho, D. L. Rabosky, R. Nielsen, C. Rahbek, H. Glenner, M. V. Sorensen, and E. Willerslev (2009) Radiation of extant cetaceans driven by restructuring of the oceans. Systematic Biology, 58, 573-585.
See Also
BAMM software website: http://bamm-project.org/