Help for package PhylogeneticEM

Title:

Automatic Shift Detection using a Phylogenetic EM

Version:

1.8.0

Description:

Implementation of the automatic shift detection method for Brownian Motion (BM) or Ornstein–Uhlenbeck (OU) models of trait evolution on phylogenies. Some tools to handle equivalent shifts configurations are also available. See Bastide et al. (2017) <doi:10.1111/rssb.12206> and Bastide et al. (2018) <doi:10.1093/sysbio/syy005>.

Depends:

ape (≥ 5.3), Matrix (≥ 1.2.18), R (≥ 3.6.0),

Imports:

capushe (≥ 1.1.1), foreach (≥ 1.4.3), gglasso (≥ 1.4), glmnet (≥ 2.0.5), graphics (≥ 3.6.0), grDevices (≥ 3.6.0), LINselect (≥ 1.1.1), MASS (≥ 7.3.45), methods (≥ 3.6.0), plyr (≥ 1.8.4), Rcpp (≥ 1.0.2), robustbase (≥ 0.92.6), stats (≥ 3.6.0), utils (≥ 3.6.0)

Suggests:

combinat (≥ 0.0.8), doParallel (≥ 1.0.10), phytools (≥ 0.5.38), testthat (≥ 1.0.2), TreeSim (≥ 2.2), knitr, rmarkdown

LinkingTo:

Rcpp, RcppArmadillo

License:

GPL-2 | GPL-3 | file LICENSE [expanded from: GPL (≥ 2) | file LICENSE]

Encoding:

UTF-8

LazyData:

true

RoxygenNote:

7.3.2

URL:

https://github.com/pbastide/PhylogeneticEM, https://pbastide.github.io/PhylogeneticEM/

BugReports:

https://github.com/pbastide/PhylogeneticEM/issues

VignetteBuilder:

knitr

NeedsCompilation:

yes

Packaged:

2025-02-14 17:47:49 UTC; bastidep

Author:

Paul Bastide [aut, cre], Mahendra Mariadassou [ctb]

Maintainer:

Paul Bastide <paul.bastide@m4x.org>

Repository:

CRAN

Date/Publication:

2025-02-14 23:10:02 UTC

Model Estimation with Detection of Shifts

Description

PhyloEM is the main function of the package. It uses maximum likelihood methods to fit a BM or an OU process for several traits evolving along a phylogenetic tree, with automatic shift detection on the branches of the tree. This function can handle missing data.

Usage

PhyloEM(
  phylo,
  Y_data,
  process = c("BM", "OU", "scOU", "rBM"),
  check_postorder = TRUE,
  independent = FALSE,
  K_max = max(floor(sqrt(length(phylo$tip.label))), 10),
  use_previous = FALSE,
  order = TRUE,
  method.selection = c("LINselect", "DDSE", "Djump"),
  C.BM1 = 0.1,
  C.BM2 = 2.5,
  C.LINselect = 1.1,
  method.variance = c("upward_downward", "simple"),
  method.init = "lasso",
  method.init.alpha = "estimation",
  method.init.alpha.estimation = c("regression", "regression.MM", "median"),
  methods.segmentation = c("lasso", "best_single_move"),
  alpha_grid = TRUE,
  nbr_alpha = 10,
  random.root = TRUE,
  stationary.root = random.root,
  alpha = NULL,
  check.tips.names = TRUE,
  progress.bar = TRUE,
  estimates = NULL,
  save_step = FALSE,
  rescale_OU = TRUE,
  parallel_alpha = FALSE,
  Ncores = 3,
  K_lag_init = 5,
  light_result = TRUE,
  tol_tree = .Machine$double.eps^0.5,
  allow_negative = FALSE,
  option_is.ultrametric = 1,
  trait_correlation_threshold = 0.9,
  ...
)

Arguments

phylo

A phylogenetic tree of class phylo (from package ape).

Y_data

Matrix of data at the tips, size p x ntaxa. Each line is a trait, and each column is a tip. The column names are checked against the tip names of the tree.

process

The model used for the fit. One of "BM" (for a full BM model, univariate or multivariate); "OU" (for an OU with independent traits, univariate or multivariate); or "scOU" (for a "scalar OU" model, see details).

check_postorder

Re-order the tree in post-order. If the Upward-Downward algorithm is used, the tree need to be in post-order. Default to TRUE if the upward-downward is used, otherwise automatically set to FALSE.

independent

Are the trait assumed to be independent from one another? Default to FALSE. OU in a multivariate setting only works if TRUE.

K_max

The maximum number of shifts to be considered. Default to max(|\sqrt ntaxa|, 10).

use_previous

Should the initialization for K+1 shifts use the estimation for $K$ shifts already obtained? Default to FALSE.

order

Should the estimations be done for K increasing (TRUE) or K decreasing (FALSE)? If use_previous=FALSE, this has no influence, except if one initialization fails. Default to TRUE.

method.selection

Method selection to be used. Several ones can be used at the same time. One of "LINselect" for the Baraud Giraud Huet LINselect method; "DDSE" for the Slope Heuristic or "Djump" for the Jump Heuristic, last two based the Birgé Massart method.

C.BM1

Multiplying constant to be used for the BigeMassart1 method. Need to be positive. Default to 0.1.

C.BM2

Multiplying constant to be used for the BigeMassart2 method. Default to 2.5.

C.LINselect

Multiplying constant to be used for the LINselect method. Need to be greater than 1. Default to 1.1.

method.variance

Algorithm to be used for the moments computations at the E step. One of "simple" for the naive method; of "upward_downward" for the Upward Downward method (usually faster). Default to "upward_downward".

method.init

The initialization method. One of "lasso" for the LASSO base initialization method; or "default" for user-specified initialization values. Default to "lasso".

method.init.alpha

For OU model, initialization method for the selection strength alpha. One of "estimation" for a cherry-based initialization, using nlrob; or "default" for user-specified initialization values. Default to "estimation".

method.init.alpha.estimation

If method.init.alpha="estimation", choice of the estimation(s) methods to be used. Choices among "regression", (method="M" is passed to nlrob); "regression.MM" (method="MM" is passed to nlrob) or "median" (nlrob is not used, a simple median is taken). Default to all of them.

methods.segmentation

For OU, method(s) used at the M step to find new candidate shifts positions. Choices among "lasso" for a LASSO-based algorithm; and "best_single_move" for a one-move at a time based heuristic. Default to both of them. Using only "lasso" might speed up the function a lot.

alpha_grid

whether to use a grid for alpha values. Default to TRUE. This is the only available method for scOU. This method is not available for OU with multivariate traits. OU with univariate traits can take both TRUE or FALSE. If TRUE, a grid based on the branch length of the tree is automatically computed, using function find_grid_alpha.

nbr_alpha

If alpha_grid=TRUE, the number of alpha values on the grid. Default to 10.

random.root

whether the root is assumed to be random (TRUE) of fixed (FALSE). Default to TRUE

stationary.root

whether the root is assumed to be in the stationary state. Default to TRUE.

alpha

If the estimation is done with a fixed alpha (either known, or on a grid), the possible value for alpha. Default to NULL.

check.tips.names

whether to check the tips names of the tree against the column names of the data. Default to TRUE.

progress.bar

whether to display a progress bar of the computations. Default to TRUE.

estimates

The result of a previous run of this same function. This function can be re-run for other model election method. Default to NULL.

save_step

If alpha_grid=TRUE, whether to save the intermediate results for each value of alpha (in a temporary file). Useful for long computations. Default to FALSE.

rescale_OU

For the Univariate OU, should the tree be re-scaled to use a BM ? This can speed up the computations a lot. However, it can make it harder for the EM to explore the space of parameters, and hence lead to a sub-optimal solution. Default to TRUE.

parallel_alpha

If alpha_grid=TRUE, whether to run the estimations with different values of alpha on separate cores. Default to FALSE. If TRUE, the log is written as a temporary file.

Ncores

If parallel_alpha=TRUE, number of cores to be used.

K_lag_init

Number of extra shifts to be considered at the initialization step. Increases the accuracy, but can make computations quite slow of taken too high. Default to 5.

light_result

if TRUE (the default), the object returned is made light, without easily computable quantities. If FALSE, the object can be very heavy, but its subsequent manipulations can be faster (especially for plotting).

tol_tree

tolerance to consider a branch length significantly greater than zero, or two lineages lengths to be different, when checking for ultrametry. (Default to .Machine$double.eps^0.5). See is.ultrametric and di2multi.

allow_negative

whether to allow negative values for alpha (Early Burst). See details. Default to FALSE.

option_is.ultrametric

option for is.ultrametric check. Default to 1.

trait_correlation_threshold

the trait correlation threshold to stop the analysis. Default to 0.9.

...

Further arguments to be passed to estimateEM, including tolerance parameters for stopping criteria, maximal number of iterations, etc.

Details

Several models can be used:

BM with fixed root, univariate or multivariate.
OU with fixed or stationary root, univariate or multivariate.

For the OU in the multivariate setting, two assumptions can be made:

Independent traits. This amounts to diagonal rate and selection matrices.
"Scalar OU" (scOU): the rate matrix can be full, but the selection strength matrix is assumed to be scalar, i.e. all the traits are supposed to go to their optimum values with the same speed.

Note that the "scalar OU" model can also be seen as a re-scaling of the tree. The selection strength parameter alpha can then be interpreted as a measure of the "phylogenetic signal":

If alpha is close to 0, then the process is similar to a BM on the original tree, and the signal is strong.
If alpha is large, then the re-scaled tree is similar to a star-tree, and the signal is weak.

When there are no shifts, and the root is taken to be constant, this model is actually equivalent to an AC model (Uyeda et al. 2015). With this interpretation in mind, one might want to explore negative values of alpha, in order to fit a DC (or Early Burst) model. With no shift and a fixed root, the same proof shows that the scOU with alpha negative is equivalent to the DC model. There are two strong caveats in doing that.

The interpretation of the OU as modeling the dynamic of a trait undergoing stabilizing selection is lost. In this case, the scOU can only be seen as a re-scaling of the tree, similar to Pagel's delta.
The values of the "optimal values", and of the shifts on them, cannot be interpreted as such (the process is actually going away from this values, instead of being attracted). When looking at these values, one should only use the un-normalized values happening of the underlying BM. You can extract those using the params_process function with rBM = TRUE.

Value

An object of class PhyloEM. Relevant quantities can be extracted from it using helper functions params_process.PhyloEM, imputed_traits.PhyloEM

Examples

## Not run: 
## Load Data
data(monkeys)
## Run method
# Note: use more alpha values for better results.
res <- PhyloEM(Y_data = monkeys$dat,        ## data
               phylo = monkeys$phy,         ## phylogeny
               process = "scOU",            ## scalar OU
               random.root = TRUE,          ## root is stationary
               stationary.root = TRUE,
               K_max = 10,                  ## maximal number of shifts
               nbr_alpha = 4,               ## number of alpha values
               parallel_alpha = TRUE,       ## parallelize on alpha values
               Ncores = 2)
## Plot selected solution (LINselect)
plot(res) # three shifts
## Plot selected solution (DDSE)
plot(res, method.selection = "DDSE") # no shift
## Extract and solution with 5 shifts
params_5 <- params_process(res, K = 5)
plot(res, params = params_5)
## Show all equivalent solutions
eq_sol <- equivalent_shifts(monkeys$phy, params_5)
plot(eq_sol)

## End(Not run)

Add several entries, when only one is not NA.

Description

add_complementary return the only non NA value of a given vector.

Usage

add_complementary(z)

Arguments

z

a vector containing at most only one non-NA value.

Details

This function is used in function merge_complementary_vectors

Value

row vector of the same size as entry matrix.

Allocation of regimes to nodes.

Description

allocate_regimes_from_shifts allocate a number (from 0 to the number of shifts) to each node, corresponding to its regime : all nodes below shift i are numbered by i.

Usage

allocate_regimes_from_shifts(phylo, shifts_edges)

Arguments

phylo

a phylogenetic tree, class phylo.

shifts_edges

edges were the shifts are.

Value

Vector of size (ntaxa + Nnode) of the regimes of each node and tip.

Allocation of shifts to edges

Description

allocate_shifts_from_regimes returns the position of the shifts induced by the allocation of the regimes. Only works in an "infinite site" model.

Usage

allocate_shifts_from_regimes(phylo, regimes)

Arguments

phylo

a phylogenetic tree, class phylo.

regimes

: vector of size (ntaxa + Nnode) of the regimes of each node and tip.

Value

Vector of edges numbers where the shifts are.

Iteration allocation

Description

allocate_subset_node.simulate slice the data correctly and allocate the right part. To be used in function UpdateDown.

Usage

allocate_subset_node.simulate(node, array, value)

Arguments

node

node on which to slice

array

structure to be sliced

value

value to be attributed to the slice

Value

array: array p x Nnode x 2 (BM), with slice corresponding to node filled with value

Check selection strength

Description

check.selection.strength checks, if the process is an OU, if the selection strength is not too low, in which case the process is replaced with a BM.

Usage

check.selection.strength(process, selection.strength = NA, eps = 10^(-6), ...)

Arguments

process

: "BM" or "OU"

selection.strength

the selection strength parameter (if OU)

eps

the tolerance for the selection strength

Details

This function return a process in a form of a character. If the entry process is "BM", then nothing is done. If it is "OU", then the verification of the selection strength is done.

Value

character : "BM" or "OU"

Test the format of data entry.

Description

check_data tests if the data matrix has the right format, and if it is correctly ordered to match the tips names.

Usage

check_data(phylo, Y_data, check.tips.names)

Arguments

phylo

a phylogenetic tree, class phylo.

Y_data

matrix of data at the tips (pxntaxa)

check.tips.names

(bool) whether to check the tips names or not

Value

Y_data a re-ordered matrix of data (if necessary)

Check dimensions of the parameters

Description

check_dimensions checks dimensions of the parameters. If wrong, throw an error.

Usage

check_dimensions(
  p,
  root.state,
  shifts,
  variance,
  selection.strength = NULL,
  optimal.value = NULL
)

Arguments

p

dimension of the trait simulated

root.state

(list) state of the root, with: random : random state (TRUE) or deterministic state (FALSE) value.root : if deterministic, value of the character at the root exp.root : if random, expectation of the character at the root var.root : if random, variance of the character at the root

shifts

(list) position and values of the shifts : edges : vector of the K id of edges where the shifts are values : matrix p x K of values of the shifts on the edges (one column = one shift) relativeTimes : vector of dimension K of relative time of the shift from the parent node of edges

variance

variance-covariance matrix size p x p

selection.strength

matrix of selection strength size p x p (OU)

optimal.value

vector of p optimal values at the root (OU)

Value

Nothing

Check Parsimony, assuming no homoplasy

Description

check_parsimony take a vector of shifts edges, and check whether the number of groups of the tips induced by this allocation is exactly the number of shifts plus one. This is equivalent to parsimony when there is no homoplasy (i.e. no convergent regimes).

Usage

check_parsimony(tree, edges, ...)

Arguments

tree

phylogenetic tree

edges

a vector of edges of the tree, where the shifts are

...

possibly, a list giving the descendant tips of each edge

Details

This function computes explicitly the clustering of the tips, using function clusters_from_shifts. By default, this function uses enumerate_tips_under_edges to compute the list of tips under each edge, but a list can be provided (to avoid extra computation, if many tests on the same tree are done).

Value

boolean : TRUE if the allocation is parsimonious.

Check whether an allocation of the shifts is parsimonious, in the "infinite site model".

Description

check_parsimony_clusters takes a vector clusters of the tips, and checks whether the number of groups of the tips induced by this allocation is exactly the number of shifts plus one.

Usage

check_parsimony_clusters(tree, edges, clusters)

Arguments

tree

phylogenetic tree.

clusters

a vector of clusters of the tips of the tree (result of function clusters_from_shifts).

Details

This function computes explicitly the clustering of the tips, using function check_parsimony. By default, this function uses enumerate_tips_under_edges to compute the list of tips under each edge, but a list can be provided (if many tests are done).

Value

boolean : TRUE if the allocation is parsimonious.

Check range of alpha

Description

Check that the chosen values of alpha are not too large or too small, in order to avoid numerical instabilities.

Usage

check_range_alpha(alpha, h_tree)

Arguments

alpha

a vector of alpha values.

h_tree

the total height of the tree.

Clustering associated to a shift allocation, assuming no homoplasy.

Description

clusters_from_shifts take a vector of shifts edges, and gives the clustering of the tips induced by them, in a "no homoplasy" model (i.e. no convergence is allowed).

Usage

clusters_from_shifts(tree, edges, part.list = enumerate_tips_under_edges(tree))

Arguments

tree

phylogenetic tree

edges

a vector of edges of the tree, where the shifts are

part.list

a list giving the descendant tips of each edge

Details

By default, this function uses enumerate_tips_under_edges to compute the list of tips under each edge.

Value

list of size n+m-1, entry i is the vector of tips bellow edge i.

E step

Description

compute_E.simple computes the E step in the simple case where the invert matrix Sigma_YY_inv is given

Usage

compute_E.simple(
  phylo,
  times_shared,
  distances_phylo,
  process,
  params_old,
  masque_data = c(rep(TRUE, attr(params_old, "p_dim") * length(phylo$tip.label)),
    rep(FALSE, attr(params_old, "p_dim") * phylo$Nnode)),
  F_moments,
  Y_data_vec_known,
  miss = rep(FALSE, attr(params_old, "p_dim") * length(phylo$tip.label)),
  Y_data,
  U_tree,
  ...
)

Arguments

phylo

Input tree.

Details

This function takes parameters sim, Sigma and Sigma_YY_inv from compute_mean_variance.simple. It uses functions extract_variance_covariance, extract_covariance_parents, and extract_simulate_internal to extract the needed quantities from these objects.

Value

conditional_law_X (list) : list of conditional statistics : "expectation" : matrix of size p x (ntaxa+Nnode), with ntaxa first columns set to Y_data (tips), and from ntaxa+1 to conditional expectation of the nodes conditioned to the tips E[Z_j|Y] "variances" : array of size p x p x (ntaxa+Nnode) with ntaxa first matrices of zeros (tips) and conditional variance of the nodes conditioned to the tips Var[Z_j|Y] "covariances" : array of size p x p x (ntaxa+Nnode) with ntaxa first matrices of zeros (tips) and conditional covariance of the nodes and their parents conditioned to the tips Cov[Z_j,Z_pa(j)|Y], with NA for the root. "optimal.values" : matrix of size p x ntaxa+Nnode of optimal values beta(t_j)

Compute Matrix W of actualization (Ultrametric case)

Description

compute_actualization_matrix_ultrametric computes a squares p*Nedge bloc diagonal matrix of the (I_p - exp(-A * (h - t_pa(j))))_{j node}.

Usage

compute_actualization_matrix_ultrametric(
  tree,
  selection.strength,
  times_shared = compute_times_ca(tree)
)

Arguments

tree

a phylogenetic tree.

selection.strength

the selection strength of the process.

times_shared

a matrix, result of function compute_times_ca.

Details

Careful: the root is not taken into account in this function.

Value

Matrix of size p*Nedge

Computation of the optimal values at nodes and tips.

Description

compute_betas_from_shifts computes the optimal values at the nodes and tips of the tree, given the value at the root and the list of shifts occurring in the tree. It assumes an OU model.

Usage

compute_betas_from_shifts(phylo, optimal.value, shifts)

Arguments

phylo

a phylogenetic tree, class phylo.

optimal.value

the optimal value at the root of the tree.

shifts

position and values of the shifts .

Details

Note that this is intended to be an internal function, and should not be used. In general, use node_optimal_values to get optimal values from a set of parameters.

Value

Vector of size (ntaxa + Nnode) of the optimal values at the tips of the tree.

Compute differences of expectations between node and parent.

Description

compute_diff_exp compute the differences of conditional expectations between all the nodes and their parents.

Usage

compute_diff_exp.BM(phylo, conditional_law_X)

Arguments

phylo

a phylogenetic tree

conditional_law_X

result of function compute_E

Value

matrix p x Nedge containing, for each edge e finishing at node i, the quantity E[Z_i|Y]-E[Z_pa(i)|Y].

Phylogenetic Distances

Description

compute_dist_phy computes the phylogenetic distances d_ij between all the tips i, j.

Usage

compute_dist_phy(phy)

Arguments

phy

a phylogenetic tree of class phylo.

Details

This function relies on ape function dist.nodes.

Value

a matrix of phylogenetic distances, ordered as the tips of the tree. The matrix is of type symmetricMatrix-class.

Compute the expected states of a BM

Description

compute_expectations.BM use the matrix formulation to compute the expected values at all the nodes.

Usage

compute_expectations.BM(phylo, root.state, shifts, U_tree = NULL)

Arguments

phylo

Input tree.

root.state

(list): state of the root, with: random : random state (TRUE) or deterministic state (FALSE) value.root : if deterministic, value of the character at the root exp.root : if random, expectation of the character at the root var.root : if random, variance of the character at the root (pxp matrix)

shifts

Value

paramSimu: array p x Nnode x 2 (BM). For each trait t, 1 <= t <= p, paramSimu[t, , ] has two columns, both containing the expected values for all the nodes.

Compute the expected states of a scOU

Description

compute_expectations.scOU use the matrix formulation to compute the expected values at all the nodes. Assumes a stationary root.

Usage

compute_expectations.scOU(
  phylo,
  root.state,
  shifts,
  alpha,
  U_tree = NULL,
  times_shared = NULL
)

Arguments

phylo

Input tree.

root.state

shifts

Value

paramSimu: array p x Nnode x 2 (BM). For each trait t, 1 <= t <= p, paramSimu[t, , ] has two columns, both containing the expected values for all the nodes.

Compute fixed moments for E step.

Description

compute_fixed_moments compute the fixed matrices used in E step when there is no missing data.

Usage

compute_fixed_moments(times_shared, ntaxa)

Arguments

times_shared

matrix of the tree, result of function compute_times_ca

Value

F_means the matrix to use for actualization of means.

F_vars the matrix to use for the actualization of variances.

Do a lm on top of a lasso regression.

Description

compute_gauss_lasso takes the variables selected by a lasso procedure, and uses them to do a simple linear least square regression. Function used is lm for non-transformed data (root = NULL), and lm.fit for transformed data (root = an integer).

Usage

compute_gauss_lasso(
  Ypt,
  Xp,
  delta,
  root,
  projection = which(rowSums(delta) != 0)
)

Arguments

Ypt

(transformed) data

Xp

(transformed) matrix of regression

delta

regression coefficients obtained with a lasso regression

root

the position of the root (intercept) in delta

Details

Depending on the value of root, the behavior is different. If root is null, then we fit a linear regression with an intercept. If root is equal to an integer, then the "intercept" column of the matrix Xp (that has possibly been trough a multiplication with a Cholesky decomposition of the variance) is included, rather than the intercept.

Value

Named list, with "E0.gauss" the intercept (value at the root); "shifts.gauss" the list of shifts found on the branches; and "residuals" the residuals of the regression

Log Likelihood

Description

compute_log_likelihood.simple computes the log-likelihood of the data in the simple case where the inverse of the variance matrix is given.

Usage

compute_log_likelihood.simple(
  phylo,
  Y_data_vec,
  sim,
  Sigma,
  Sigma_YY_chol_inv,
  miss = rep(FALSE, dim(sim)[1] * length(phylo$tip.label)),
  masque_data = c(rep(TRUE, dim(sim)[1] * length(phylo$tip.label)), rep(FALSE,
    dim(sim)[1] * phylo$Nnode)),
  ...
)

Arguments

phylo

Input tree.

Sigma

: variance-covariance matrix, result of function compute_variance_covariance

Details

Value

log likelihood of the data

Squared Mahalanobis Distance

Description

compute_mahalanobis_distance.simple computes the squared Mahalanobis distance between the data and mean at tips of the data in the simple case where the inverse of the variance matrix is given.

Usage

compute_mahalanobis_distance.simple(
  phylo,
  Y_data_vec,
  sim,
  Sigma_YY_chol_inv,
  miss = rep(FALSE, dim(sim)[1] * length(phylo$tip.label)),
  ...
)

Arguments

phylo

Input tree.

Y_data_vec

: vector indicating the data at the tips.

sim

(list) : result of function simulate.

Sigma_YY_chol_inv

: invert of the cholesky variance-covariance matrix of the data.

Details

This function takes parameters sim and Sigma_YY_inv from compute_mean_variance.simple. It uses function extract_simulate_internal to extract the needed quantities from these objects.

Value

squared Mahalanobis distance between data and mean at the tips.

Compute moments of params_old

Description

compute_mean_variance.simple computes the quantities needed to compute mean and variance matrix with parameters params_old.

Usage

compute_mean_variance.simple(
  phylo,
  times_shared,
  distances_phylo,
  process = c("BM", "OU", "rBM", "scOU"),
  params_old,
  masque_data = c(rep(TRUE, attr(params_old, "p_dim") * length(phylo$tip.label)),
    rep(FALSE, attr(params_old, "p_dim") * phylo$Nnode)),
  sim = NULL,
  U_tree = NULL,
  ...
)

Arguments

phylo

Input tree.

times_shared

(matrix) : times of shared ancestry, result of function compute_times_ca

distances_phylo

(matrix) : phylogenetic distance, result of function compute_dist_phy

process

a two letter string indicating the process to consider

params_old

a list of parameters to be used in the computations

Details

This function is used by functions compute_E.simple and compute_log_likelihood.simple.

Value

sim (list) : result of function simulate with the appropriate parameters

Sigma matrix of variance covariance, result of function compute_variance_covariance

Sigma_YY_chol_inv invert of Cholesky matrix of Sigma_YY: (Sigma_YY)^(-1) = tcrossprod(Sigma_YY_chol_inv)

Residuals

Description

compute_residuals.simple computes the residuals after the fit of the data in the simple case where the inverse of the variance matrix is given.

Usage

compute_residuals.simple(phylo, Y_data_vec, sim, Sigma_YY_chol_inv, miss)

Arguments

phylo

Input tree.

Y_data_vec

: vector indicating the data at the tips.

sim

(list) : result of function simulate.

Sigma_YY_chol_inv

: invert of the Cholesky variance-covariance matrix of the data.

Details

This function takes parameters sim and Sigma_YY_inv from compute_mean_variance.simple. It uses function extract_simulate_internal to extract the needed quantities from these objects.

Computation of shifts from the vector of optimal values

Description

compute_shifts_from_betas computes the list of shifts corresponding to the vector of optimal values on nodes.

Usage

compute_shifts_from_betas(phylo, betas)

Arguments

phylo

a phylogenetic tree, class phylo.

betas

vector of size (ntaxa + Nnode) of the optimal values at each node and tip.

Details

This function uses function fun on each row of matrix of edges.

Value

vector of shifts.

List of potential daughter states when parent is in state k.

Description

compute_state_filter compute the admissible daughters states, i.e. states that realize the minimum cost for the tree parent -> daughter -> subtree(daughter), when the parent node is in state k.

Usage

compute_state_filter(cost, k)

Arguments

cost

a (ndaughters) x (nclus) matrix of the cost of each state for the daughters nodes.

k

the parental state considered.

Details

This function is used in functions parsimonyNumber and enumerate_parsimony.

Value

A (ndaughters) x (nclus) binary matrix indicating the admissible states for the daughters node when parent node is in state k.

Compute the stationary variance matrix

Description

compute_stationary_variance computes the stationary variance matrix of an OU process.

Usage

compute_stationary_variance(variance, selection.strength)

Arguments

variance

the variance (rate matrix) of the process.

selection.strength

the selection strength (alpha) matrix of the process.

Value

A positive definite Matrix of class dpoMatrix-class.

Compute weighted sum of var_diff

Description

compute_sum_var_diff computes sum_{e edge} ell_j * Var[X_j - X_pa(j) | Y]

Usage

compute_sum_var_diff(phylo, var_diff)

Arguments

phylo

a phylogenetic tree

var_diff

result of function compute_var_diff.BM

Value

matrix p x p

Common Ancestors Times

Description

compute_times_ca computes the times t_ij between the root and the common ancestor of two tips i, j.

Usage

compute_times_ca(phy)

Arguments

phy

a phylogenetic tree of class phylo.

Details

This function relies on ape functions node.depth.edgelength and mrca.

Value

a matrix of times of shared evolution, ordered as the tips of the tree. The matrix is of type symmetricMatrix-class.

Matrix of tree-induced correlations for the BM

Description

compute_tree_correlations_matrix.BM returns times_shared its provided argument.

Usage

compute_tree_correlations_matrix.BM(times_shared, params_old, ...)

Arguments

times_shared

times of shared ancestry of all nodes and tips, result of function compute_times_ca

Value

times_shared

Matrix of tree-induced correlations for the scOU

Description

compute_tree_correlations_matrix.scOU computes the (n+m)x(m+n) matrix of correlations induced by the tree. It takes two cases in consideration: root fixed, or root in stationary state.

Usage

compute_tree_correlations_matrix.scOU(
  times_shared,
  distances_phylo,
  params_old,
  ...
)

Arguments

times_shared

times of shared ancestry of all nodes and tips, result of function compute_times_ca

distances_phylo

(matrix) : phylogenetic distance, result of function compute_dist_phy

params_old

(list) : old parameters to be used in the E step

Value

matrix of variance covariance for the scOU

Computation of the variance.

Description

compute_var_M.BM finds the variance that is the maximum of likelihood

Usage

compute_var_M.BM(phylo, var_diff, diff_exp, edges_max, random.root, mu)

Arguments

phylo

Tree

var_diff

variances of differences result of function compute_var_diff.BM

diff_exp

differences of expectations result of function compute_diff_exp.BM

edges_max

Edges where the shifts occur result of function segmentation.BM

Details

Given the variances, the costs0 and the edges where the shifts occurs, the computation of the maximum of likelihood in the variance is simple.

Value

a p x p matrix : the computed variance

Compute variances of differences between nodes and parents.

Description

compute_var_diff computes variances of differences between all the nodes and their parents.

Usage

compute_var_diff.BM(phylo, conditional_law_X)

Arguments

phylo

a phylogenetic tree

conditional_law_X

result of function compute_E

Value

array p x p x Nedge containing, for each edge e finishing at node i, the quantity Var[Z_i-Z_pa(i)|Y].

Tips Variances for the BM

Description

compute_variance_block_diagonal.BM computes the n p*p variance matrices of each tip vector.

Usage

compute_variance_block_diagonal.BM(times_shared, params_old, ntaxa, ...)

Arguments

times_shared

times of shared ancestry of all nodes and tips, result of function compute_times_ca

params_old

(list) : old parameters to be used in the E step

Value

p * p * ntaxa array with ntaxa variance matrices

Tips Variances for the OU

Description

compute_variance_block_diagonal.OU computes the n p*p variance matrices of each tip vector.

Usage

compute_variance_block_diagonal.OU(times_shared, params_old, ntaxa, ...)

Arguments

times_shared

times of shared ancestry of all nodes and tips, result of function compute_times_ca

params_old

(list) : old parameters to be used in the E step

Value

p * p * ntaxa array with ntaxa variance matrices

Complete variance covariance matrix for BM

Description

compute_variance_covariance.BM computes the (n+m)*p squared variance covariance matrix of vec(X).

Usage

compute_variance_covariance.BM(times_shared, params_old, ...)

Arguments

times_shared

times of shared ancestry of all nodes and tips, result of function compute_times_ca

params_old

(list) : old parameters to be used in the E step

Value

matrix of variance covariance for the BM

Complete variance covariance matrix for OU

Description

compute_variance_covariance.OU computes the (n+m)*p squared variance covariance matrix of vec(X).

Usage

compute_variance_covariance.OU(times_shared, distances_phylo, params_old, ...)

Arguments

times_shared

times of shared ancestry of all nodes and tips, result of function compute_times_ca

params_old

(list) : old parameters to be used in the E step

Value

matrix of variance covariance for the OU

Complete variance covariance matrix for OU

Description

compute_variance_covariance.OU computes the (n+m)*p squared variance covariance matrix of vec(X).

Usage

compute_variance_covariance.OU.nonsym(
  times_shared,
  distances_phylo,
  params_old,
  ...
)

Arguments

times_shared

times of shared ancestry of all nodes and tips, result of function compute_times_ca

params_old

(list) : old parameters to be used in the E step

Value

matrix of variance covariance for the OU

Complete variance covariance matrix for scOU

Description

compute_variance_covariance.scOU computes the (n+m)*p squared variance covariance matrix of vec(X).

Usage

compute_variance_covariance.scOU(
  times_shared,
  distances_phylo,
  params_old,
  ...
)

Arguments

times_shared

times of shared ancestry of all nodes and tips, result of function compute_times_ca

distances_phylo

(matrix) : phylogenetic distance, result of function compute_dist_phy

params_old

(list) : old parameters to be used in the E step

Value

matrix of variance covariance for the scOU

Correspondence between edges numbers

Description

correspondenceEdges takes edges numbers on an input tree, and gives back their corresponding numbers on the output tree.

Usage

correspondenceEdges(edges, from, to)

Arguments

edges

vector of index of edges in the tree "from"

from

initial input tree (format "phylo")

to

aimed output tree (format "phylo")

Value

vector of index of edges in the tree "to"

Make the result lighter

Description

enlight.PhyloEM takes an object of class PhyloEM, and returns the same object, without saving the quantities that can be easily re-computed using function imputed_traits.PhyloEM.

Usage

enlight(x)

## S3 method for class 'PhyloEM'
enlight(x)

Arguments

x

an object of class PhyloEM.

Details

The resulting object can be much lighter, saving a lot of memory space, but each call to the function imputed_traits.PhyloEM will be longer. As function plot.PhyloEM relies on this function, this makes the plotting also longer. This has the same effect as setting the option "light_result=TRUE" in the call of PhyloEM.

Value

Same as entry, lighter.

Methods (by class)

enlight(PhyloEM): PhyloEM object

Enumerate all the possible regime allocations, given a clustering of the tips.

Description

enumerate_parsimony enumerate all the equivalent allocation of the regimes in the tree, a clustering of the tips being given. The number of such equivalent regimes is given by parsimonyNumber (which is faster).

Usage

enumerate_parsimony(phylo, clusters = rep(1, length(phylo$tip.label)))

Arguments

phylo

a phylogenetic tree, class phylo.

clusters

a vector representing the group of each tip. (Default to only one group with all the tips.)

Details

Function extract.enumerate_parsimony furnishes the result in a human readable form (for any subtree). Function plot.enumerate_parsimony plots all the solutions found on the tree.

Value

an S3 object of class "enumerate_parsimony", with:

nbrReconstructions: an object of class "parsimonyCost", result of function parsimonyCost.
allocations: a list of size Nnode + ntaxa. Each entry i of the list represents the solutions for the subtree starting at node i. It is a list with nclus entries, each entry being a matrix. A line of the kth matrix for the ith node is one possible allocation of the shifts, starting with regime k for node i.
phylo: the entry phylogenetic tree

Examples

tree <- read.tree(text="(((A,B),C),D);")
plot(tree)
clusters <- c(0, 1, 2, 2)
sols <- enumerate_parsimony(tree, clusters)
plot(sols)

## Extract the parsimonious solutions from the root
extract(sols) # each line is a solution, with states of each node

## Extract the number of solutions from the root
extract(sols, what = "number")
extract(parsimonyNumber(tree, clusters)) # same result, more efficient

## Extract the cost of the solutions from the root
extract(sols, what = "cost")
extract(parsimonyCost(tree, clusters)) # same result, more efficient:

## Extract for the sub-tree below node 7
extract(sols, 7) # NAs: non-existing nodes in the sub-tree

Tips descendants of nodes.

Description

enumerate_tips_under_edges gives, for each edge of the tree, the labels of the tips that have this edge as an ancestor.

Usage

enumerate_tips_under_edges(tree)

Arguments

tree

phylogenetic tree, class phylo.

Details

This function uses function prop.part from package ape.

Value

list of size Nedge, entry i is the vector of tips bellow edge i.

Find all equivalent shifts allocations and values.

Description

equivalent_shifts computes the equivalent shifts positions and their corresponding values, assuming an ultrametric tree.

Usage

equivalent_shifts(
  phylo,
  params,
  T_tree = incidence.matrix(phylo),
  part.list = enumerate_tips_under_edges(phylo),
  times_shared = NULL
)

Arguments

phylo

a phylogenetic tree, of class phylo.

params

an object of class params_process, result inference by function PhyloEM, or constructed through function params_process

T_tree

(optional) matrix of incidence of the tree, result of function incidence.matrix

part.list

(optional) list of partition of the tree, result of function enumerate_tips_under_edges.

times_shared

(optional) a matrix, result of function compute_times_ca.

Details

This function is only valid for ultrametric trees, and for models: BM, OU with fixed root or stationary root. It assumes that there are no homoplasies.

Value

object of class equivalent_shifts, with entries:

eq_shifts_edges: matrix of equivalent shifts
shifts_and_betas: matrix of corresponding shifts values
phylo: the entry phylogenetic tree
p: the dimension

Examples

if (requireNamespace("TreeSim", quietly = TRUE)) {
  ## Simualte a tree
  set.seed(17920902)
  ntaxa = 20
  phylo <- TreeSim::sim.bd.taxa.age(n = ntaxa, numbsim = 1, lambda = 0.1,
                                   mu = 0, age = 1, mrca = TRUE)[[1]]
  
  ## Define parameters (BM, fixed root)
  params <- params_BM(p = 4, edges = c(6, 17, 31),
                     values = cbind(1:4, -(1:4), rep(1, 4)))
  ## Find equivalent solutions and plot them
  eq_shifts <- equivalent_shifts(phylo, params)
  eq_shifts
  plot(eq_shifts)
  ## Extract the values
  # Shifts values for trait 2, for the three shifts (rows), and three solutions (columns)
  extract(eq_shifts, trait = 2, what = "shifts_values")
  # Root values for trait 4, for the tree solutions (columns)
  extract(eq_shifts, trait = 4, what = "root_values")
  ## Define parameters (OU, stationary root
  params <- params_OU(p = 4, edges = c(6, 17, 31),
                     selection.strength = 0.1,
                     values = cbind(1:4, -(1:4), rep(1, 4)),
                     random = TRUE)
  ## Find equivalent solutions and plot them
  eq_shifts <- equivalent_shifts(phylo, params)
  eq_shifts
  plot(eq_shifts)
  ## Extract the values
  # Shifts values for trait 2, for the three shifts (rows), and three solutions (columns)
  extract(eq_shifts, trait = 2, what = "shifts_values")
  # Root values for trait 4, for the three solutions (columns)
  extract(eq_shifts, trait = 4, what = "root_values")
}

Find all the equivalent shift edges allocations.

Description

equivalent_shifts_edges uses function enumerate_parsimony to find all the shifts positions that are equivalent to a given one.

Usage

equivalent_shifts_edges(
  phylo,
  shifts_edges,
  part.list = enumerate_tips_under_edges(phylo)
)

Arguments

phylo

a phylogenetic tree, class phylo.

shifts_edges

a vector of shifts positions on the edges of the tree.

part.list

(optional) list of partition of the tree, result of function enumerate_tips_under_edges.

Details

This function is uses functions enumerate_parsimony for the actual computation of equivalent regimes, clusters_from_shifts for the clustering of the tips induced by the original set of shifts given, and allocate_shifts_from_regimes to convert back a parametrization in term of regimes to a parametrization in term of shifts.

Value

a matrix with as many columns as equivalent allocation, each column representing a possible parsimonious allocation of shifts on the tree.

Find values given edges. OU stationary case. Ultrametric tree.

Description

equivalent_shifts_values computes the values of the shifts given all the possible allocations computed by function equivalent_shifts_edges.

Usage

equivalent_shifts_values(phylo, eq_shifts_edges, T_tree_ac, vect_Y, p)

Arguments

phylo

a phylogenetic tree, class phylo.

eq_shifts_edges

matrix (optional) result of function equivalent_shifts_edges.

T_tree_ac

matrix of incidence of the tree, result of function incidence.matrix, actualized with coefficients computed by function incidence_matrix_actualization_factors.

Details

This function uses the linear representation of the problem. It fist compute the mean at the tips given by the original shifts positions and values, and then uses function qr.solve to find back the values of the shifts, given their various positions, and the means at the tips. Function compute_actualization_factors is used to compute the actualization factor that multiplies the shifts values at the tips. Careful, only works for ULTRAMETRIC trees.

Value

Named list, with "shifts_values" a matrix of shifts values corresponding to the shifts positions in eq_shifts_edges; and "betas_0" a vector of corresponding root optimal values.

Function to estimate alpha

Description

optimize to maximize the conditional expectation log likelihood in alpha. The interval is set to [alpha_old/2, 2*alpha_old], supposing that the previous guess of alpha_old is not far from reality.

Usage

estimate.alpha(
  phylo,
  conditional_law_X,
  sigma2,
  mu,
  shifts,
  alpha_old,
  max_selection.strength
)

Arguments

phylo

Input tree.

conditional_law_X

result of function compute_E.OU

sigma2

variance of params

mu

mean of the root state

shifts

list of shifts on the tree

alpha_old

previous estimation of the selection strength

max_selection.strength

the maximal value of alpha authorized by the user

Details

This function uses functions compute_var_diff.OU and compute_diff_exp.OU in the process. Careful : only works if the root is stationary, and shifts at nodes.

Value

double : estimation of alpha

Perform One EM

Description

EstimateEM performs one EM for one given number of shifts. It is called from function PhyloEM. Its use is mostly internal, and most user should not need it.

Usage

estimateEM(
  phylo,
  Y_data,
  Y_data_imp = Y_data,
  process = c("BM", "OU", "scOU", "rBM"),
  independent = FALSE,
  tol_EM = list(variance = 10^(-2), value.root = 10^(-2), exp.root = 10^(-2), var.root =
    10^(-2), selection.strength = 10^(-2), normalized_half_life = 10^(-2), log_likelihood
    = 10^(-2)),
  Nbr_It_Max = 500,
  method.variance = c("simple", "upward_downward"),
  method.init = c("default", "lasso"),
  method.init.alpha = c("default", "estimation"),
  method.init.alpha.estimation = c("regression", "regression.MM", "median"),
  nbr_of_shifts = 0,
  random.root = TRUE,
  stationary.root = TRUE,
  alpha_known = FALSE,
  eps = 10^(-3),
  known.selection.strength = 1,
  init.selection.strength = 1,
  max_selection.strength = 100,
  use_sigma_for_lasso = TRUE,
  max_triplet_number = 10000,
  min_params = list(variance = 0, value.root = -10^(5), exp.root = -10^(5), var.root = 0,
    selection.strength = 0),
  max_params = list(variance = 10^(5), value.root = 10^(5), exp.root = 10^(5), var.root =
    10^(5), selection.strength = 10^(5)),
  var.init.root = diag(1, nrow(Y_data)),
  variance.init = diag(1, nrow(Y_data), nrow(Y_data)),
  methods.segmentation = c("lasso", "same_shifts", "best_single_move"),
  check.tips.names = FALSE,
  times_shared = NULL,
  distances_phylo = NULL,
  subtree.list = NULL,
  T_tree = NULL,
  U_tree = NULL,
  h_tree = NULL,
  F_moments = NULL,
  tol_half_life = TRUE,
  warning_several_solutions = TRUE,
  convergence_mode = c("relative", "absolute"),
  check_convergence_likelihood = TRUE,
  sBM_variance = FALSE,
  method.OUsun = c("rescale", "raw"),
  K_lag_init = 0,
  allow_negative = FALSE,
  trait_correlation_threshold = 0.9,
  ...
)

Arguments

phylo

A phylogenetic tree of class phylo (from package ape).

Y_data

Matrix of data at the tips, size p x ntaxa. Each line is a trait, and each column is a tip. The column names are checked against the tip names of the tree.

Y_data_imp

(optional) imputed data if previously computed, same format as Y_data. Mostly here for internal calls.

process

independent

Are the trait assumed to be independent from one another? Default to FALSE. OU in a multivariate setting only works if TRUE.

tol_EM

the tolerance for the convergence of the parameters. A named list, with items:

variance: default to 10^(-2)
value.root: default to 10^(-2)
exp.root: default to 10^(-2)
var.root: default to 10^(-2)
selection.strength: default to 10^(-2)
normalized_half_life: default to 10^(-2)
log_likelihood: default to 10^(-2)

Nbr_It_Max

the maximal number of iterations of the EM allowed. Default to 500 iterations.

method.variance

method.init

The initialization method. One of "lasso" for the LASSO base initialization method; or "default" for user-specified initialization values. Default to "lasso".

method.init.alpha

method.init.alpha.estimation

nbr_of_shifts

the number of shifts allowed.

random.root

whether the root is assumed to be random (TRUE) of fixed (FALSE). Default to TRUE

stationary.root

whether the root is assumed to be in the stationary state. Default to TRUE.

alpha_known

is the selection strength assumed to be known ? Default to FALSE.

eps

tolerance on the selection strength value before switching to a BM. Default to 10^(-3).

known.selection.strength

if alpha_known=TRUE, the value of the known selection strength.

init.selection.strength

(optional) a starting point for the selection strength value.

max_selection.strength

the maximal value allowed of the selection strength. Default to 100.

use_sigma_for_lasso

whether to use the first estimation of the variance matrix in the lasso regression. Default to TRUE.

max_triplet_number

for the initialization of the selection strength value (when estimated), the maximal number of triplets of tips to be considered.

min_params

a named list containing the minimum allowed values for the parameters. If the estimation is smaller, then the EM stops, and is considered to be divergent. Default values:

variance: default to 0
value.root: default to -10^(5)
exp.root: default to -10^(5)
var.root: default to 0
selection.strength: default to 0

max_params

a named list containing the maximum allowed values for the parameters. If the estimation is larger, then the EM stops, and is considered to be divergent. Default values:

variance: default to 10^(5)
value.root: default to 10^(5)
exp.root: default to 10^(5)
var.root: default to 10^(5)
selection.strength: default to 10^(5)

var.init.root

optional initialization value for the variance of the root.

variance.init

optional initialization value for the variance.

methods.segmentation

check.tips.names

whether to check the tips names of the tree against the column names of the data. Default to TRUE.

times_shared

(optional) times of shared ancestry of all nodes and tips, result of function compute_times_ca

distances_phylo

(optional) phylogenetic distances, result of function compute_dist_phy.

subtree.list

(optional) tips descendants of all the edges, result of function enumerate_tips_under_edges.

T_tree

(optional) matrix of incidence of the tree, result of function incidence.matrix.

U_tree

(optional) full matrix of incidence of the tree, result of function incidence.matrix.full.

h_tree

(optional) total height of the tree.

F_moments

(optional, internal)

tol_half_life

should the tolerance criterion be applied to the phylogenetic half life (TRUE, default) or to the raw selection strength ?

warning_several_solutions

whether to issue a warning if several equivalent solutions are found (default to TRUE).

convergence_mode

one of "relative" (the default) or "absolute". Should the tolerance be applied to the raw parameters, or to the renormalized ones ?

check_convergence_likelihood

should the likelihood be taken into consideration for convergence assessment ? (default to TRUE).

sBM_variance

Is the root of the BM supposed to be random and "stationary"? Used for BM equivalent computations. Default to FALSE.

method.OUsun

Method to be used in univariate OU. One of "rescale" (rescale the tree to fit a BM) or "raw" (directly use an OU, only available for univariate processes).

K_lag_init

Number of extra shifts to be considered at the initialization step. Increases the accuracy, but can make computations quite slow of taken too high. Default to 5.

allow_negative

whether to allow negative values for alpha (Early Burst). See documentation of PhyloEM for more details. Default to FALSE.

trait_correlation_threshold

the trait correlation threshold to stop the analysis. Default to 0.9.

...

Further arguments to be passed to estimateEM, including tolerance parameters for stopping criteria, maximal number of iterations, etc.

Details

See documentation of PhyloEM for further details. All the parameters monitoring the EM (like tol_EM, Nbr_It_Max, etc.) can be called from PhyloEM.

Value

An object of class EstimateEM.

Extraction function

Description

extract the needed quantities out of an S3 object.

Usage

extract(x, ...)

Arguments

x

an S3 object.

...

further arguments to be passed to the specific method.

Value

An integer giving the number of equivalent parsimonious solutions.

Extract the result of `enumerate_parsimony` at a node.

Description

extract.enumerate_parsimony returns a matrix containing all the possible regime allocations for the nodes of a given subtree.

Usage

## S3 method for class 'enumerate_parsimony'
extract(
  x,
  node = attr(x$allocations, "ntaxa") + 1,
  what = c("solutions", "number", "cost"),
  ...
)

Arguments

x

an object of class "enumerate_parsimony", result of function enumerate_parsimony.

node

the node where to retrieve the parsimony number. Default to the root of the tree.

what

the quantity to retrieve. Either "solutions" for the full solutions, "number" for the number of solutions, or "cost" for the minimal cost of a solution. Default to "solutions"

...

unused

Value

A matrix with ntaxa + Nnode columns, and as many rows as the number of possible parsimonious reconstructions.

Extract the shifts values for one trait.

Description

extract.equivalent_shifts takes an object of class equivalent_shifts, result of function equivalent_shifts, and returns the shifts of root values for a given trait.

Usage

## S3 method for class 'equivalent_shifts'
extract(x, trait = 1, what = c("shifts_values", "root_values"), ...)

Arguments

x

an object of class equivalent_shifts, result of function equivalent_shifts

trait

the number of the trait to be extracted. Default to 1.

what

one of "shifts_values" or "root_values".

...

unused.

Value

A matrix with the values of the shifts (what = "shifts_values") or the root (what = "root_values") for the trait for each equivalent configuration. Each column is one configuration.

Extraction of the actual number of solutions.

Description

extract.parsimonyCost takes an object of class "parsimonyCost", result of function parsimonyCost, and computes the minimum cost at the given node.

Usage

## S3 method for class 'parsimonyCost'
extract(x, node = attr(x, "ntaxa") + 1, ...)

Arguments

x

an object of class "parsimonyCost", result of function parsimonyCost.

node

the root node of the subtree. By default, the root of the tree.

...

unused

Value

An integer giving the minimum cost of the subtree.

Extraction of the actual number of solutions.

Description

extract.parsimonyNumber takes the two matrices computed by parsimonyNumber, and compute the actual number of parsimonious solution for any subtree starting from a given node.

Usage

## S3 method for class 'parsimonyNumber'
extract(
  x,
  node = attr(x$nbrReconstructions, "ntaxa") + 1,
  what = c("number", "cost"),
  ...
)

Arguments

x

an object of class "parsimonyNumber", result of function parsimonyNumber.

node

the root node of the subtree. By default, the root of the tree.

what

the quantity to retrieve. Either "number" for the number of solutions, or "cost" for the minimal cost of a solution. Default to "number".

...

unused

Details

The parsimonious solutions are the one with the minimum number of shifts (that are given by matrix costReconstructions). This function sums the number of solutions (given in matrix nbrReconstructions) that have the minimum number of shifts.

Value

An integer giving the number of equivalent parsimonious solutions.

Extract from object `partitionsNumber`

Description

extract.partitionsNumber extracts the number of partitions for a given sub-tree, either marked or non-marked.

Usage

## S3 method for class 'partitionsNumber'
extract(
  x,
  node = attr(x, "ntaxa") + 1,
  npart = attr(x, "npart"),
  marked = FALSE,
  ...
)

Arguments

x

an object of class partitionsNumber, result of function partitionsNumber.

node

the root node of the subtree where to get the result. Default to the root of the tree.

npart

the number of partitions (colors) allowed at the tips. Default to the value used in the call of function partitionsNumber (the maximum).

marked

whether to extract the marked (TRUE) or un-marked (FALSE) partitions. The number of models is the number of un-marked partitions. Default to FALSE.

...

unused.

Value

the number of partitions with npart colors, on the sub-tree starting at node, marked or not.

Extraction of simulated traits

Description

extract.simul_process takes an object of class "simul_process", result of function simul_process, and extracts the traits values, expectations or optimal values at the tips or the internal nodes.

Usage

## S3 method for class 'simul_process'
extract(
  x,
  where = c("tips", "nodes"),
  what = c("states", "expectations", "optimal.values"),
  ...
)

Arguments

x

an object of class "simul_process", result of function simul_process.

where

one of "tips" (the default) or "nodes". Where to extract the results.

what

one of "states" (the default), "expectation", or "optimal.values".

...

unused

Details

Value

A matrix giving the selected quantities at the selected nodes or tips. If the tips or nods are labeled, then the colnames of the matrix are set accordingly.

Extract sub-matrices of variance.

Description

extract_variance_covariance return the adequate sub-matrix.

Usage

extract_variance_covariance(
  struct,
  what = c("YY", "YZ", "ZZ"),
  masque_data = c(rep(TRUE, attr(struct, "ntaxa") * attr(struct, "p_dim")), rep(FALSE,
    (dim(struct)[1] - attr(struct, "ntaxa")) * attr(struct, "p_dim")))
)

Arguments

struct

structural matrix of size (ntaxa+Nnode)*p, result of function compute_variance_covariance

what

sub-matrix to be extracted: "YY" : sub-matrix of tips (p*ntaxa first lines and columns) "YZ" : sub matrix tips x nodes (p*Nnode last rows and p*ntaxa first columns) "ZZ" : sub matrix of nodes (p*Nnode last rows and columns)

masque_data

Mask of missing data

Value

sub-matrix of variance covariance.

Find a reasonable grid for alpha

Description

Grid so that 2*ln(2)*quantile(d_ij)/factor_up_alpha < t_1/2 < factor_down_alpha * ln(2) * h_tree, with t_1/2 the phylogenetic half life: t_1/2 = log(2)/alpha. Ensures that for alpha_min, it is almost a BM, and for alpha_max, almost all the tips are decorrelated.

Usage

find_grid_alpha(
  phy,
  alpha = NULL,
  nbr_alpha = 10,
  factor_up_alpha = 2,
  factor_down_alpha = 3,
  quantile_low_distance = 1e-04,
  log_transform = TRUE,
  allow_negative = FALSE,
  ...
)

Arguments

phy

phylogenetic tree of class "phylo"

alpha

fixed vector of alpha values if already known. Default to NULL.

nbr_alpha

the number of elements in the grid

factor_up_alpha

factor for up scalability

factor_down_alpha

factor for down scalability

quantile_low_distance

quantile for min distance

log_transform

whether to take a log scale for the spacing of alpha values. Default to TRUE.

allow_negative

whether to allow negative values for alpha (Early Burst). See documentation of PhyloEM for more details. Default to FALSE.

...

not used.

Details

If quantile_low_distance=0, then quantile(d_ij)=min(d_ij), and, for any two tips i,j, the correlation between i and j is bounded by exp(-factor_up_alpha/2). Those values of alpha will be used for the re-scaling of the tree, which has an exponential term in exp(2*alpha*h). The function makes sure that this number is below the maximal float allowed (equals to .Machine$double.xmax).

Value

A grid of alpha values

Given a regularization path, find K selected independent variables.

Description

find_independent_regression_vectors tries to find a situation where K variables are selected, so that the selected columns of matrix Xp are independent.

Usage

find_independent_regression_vectors.glmnet_multivariate(Xp, K, fit, root)

Arguments

Xp

(transformed) matrix of regression

K

number of non-zero components allowed

root

integer, position of the root column (intercept) excluded from the fit. null if no root column.

Details

To do that, if a set of selected is not independent, we go back to the previous selected variables, and forbid the moves that led to non-independence for the rest of the path.

Value

delta a vector of regression with K non-zero coefficients.

Test for rotation invariant datasets

Description

find_rotation takes two fits from from PhyloEM, and test if their datasets are equal up to a rotation.

Usage

find_rotation(res1, res2, tol = NULL)

Arguments

res1

an object of class PhyloEM.

res2

an object of class PhyloEM.

tol

relative numerical tolerance. Default to .Machine$double.eps^(0.5).

Value

If appropriate, the rotation matrix rot such that dat1 = rot

Find values given edges. OU stationary case. Ultrametric tree.

Description

find_actualized_shift_values computes the values of the shifts their positions, and the mean values at the tips. Warning : this function does not check for consistency. Please make sure that the shifts positions and the mean values are compatible.

Usage

find_shift_values(shifts_edges, T_tree_ac, vect_Y, p, ntaxa)

Arguments

shifts_edges

a vector of positions of shifts on the tree.

T_tree_ac

matrix of incidence of the tree, result of function incidence.matrix.

Details

This function uses qr.solve for rectangular linear system solving.

Value

vector, with first entry the values at the root, and other entries the values of the shifts.

Run the EM for several values of K

Description

estimateEM_several_K.OUsr uses function estimateEM on the data, for all values of K between 0 and K_max.

Usage

format_output(results_estim_EM, phylo, time = NA)

Arguments

results_estim_EM

output of function estimateEM

time

to run the function

Details

The EM is first launched for K=0, with alpha and gamma estimated. The estimated values of alpha, gamma and beta_0 found by this first EM are then used as initialization parameters for all the other runs of the EM for other K. The EMs are parallelized thanks to packages foreach and doParallel. WARNING : this code only work of OU with stationary root, on an ultrametric tree.

Value

summary a data frame with K_max lines, and columns: - alpha_estim the estimated selection strength - gamma_estim the estimated root variance - beta_0_estim the estimated value of root optimum - EM_steps number of iterations needed before convergence - DV_estim has the EM diverged ? - CV_estim has the EM converged ? - log_likelihood log likelihood of the data using the estimated parameters - mahalanobis_distance_data_mean the Mahalanobis distance between the data and the estimated means at the tips - least_squares the Mahalanobis distance, renormalized by gamma^2: mahalanobis_distance_data_mean * gamma_estim. - mean_number_new_shifts the mean number of shifts that changed over the iterations of the EM - number_equivalent_solutions the number of equivalent solutions to the solution found. - K_try the number of shifts allowed. - complexity the complexity for K_try - time the CPU time needed.

params a list of inferred parameters

params_init a list of initial parameters

alpha_0 initial values of alpha

gamma_0 initial values of gamma

Zhat reconstructed node states

m_Y_estim reconstructed tip states

edge.quality for each edge, relative number of iterations in which they were present.

Get Model Selection Criterion

Description

This function takes an object of class PhyloEM, result of function PhyloEM, and return the values of the model selection criterion for each value of K.

Usage

get_criterion(res, method.selection = NULL)

Arguments

res

an object of class PhyloEM, result of function PhyloEM.

method.selection

select the parameters to plot. One of "LINselect", "DDSE", "Djump" or "likelihood" (for un-penalized likelihood). Default to "LINselect". See params_process.PhyloEM.

Value

A named vector with the values of the criterion for each number of shift K.

Get variance matrix of a node

Description

get_variance_node returns the conditional variance of a node, or the conditional covariance of a node and its parent.

Usage

get_variance_node(node, vars)

Arguments

node

for which to extract the matrix.

vars

matrix of size p x p*(ntaxa+Nnode) result of function compute_E.simple, entry "variances" or "covariances".

Value

sub-matrix of variance for the node.

Scale the parameters back to the original process

Description

go_back_to_original_process takes the inferred parameters with a BM on a rescaled tree, and gives back the equivalent parameters of the OU on the original process.

Usage

go_back_to_original_process(
  phy_original,
  known.selection.strength,
  sBM_variance,
  params
)

Arguments

phy_original

the original phylogenetic tree

known.selection.strength

the known selection strength of the original OU.

sBM_variance

boolean. Is the root random ?

params

the inferred parameters of the BM on the re-scaled tree.

Value

params_scOU the equivalent parameters of the OU on the original tree.

Ancestral State Reconstruction

Description

imputed_traits.PhyloEM takes an object of class PhyloEM, and returns the imputed traits values, either at the internal nodes (ancestral state reconstruction) or at the tips (data imputation)

Usage

imputed_traits(x, ...)

## S3 method for class 'PhyloEM'
imputed_traits(
  x,
  trait = 1,
  save_all = FALSE,
  where = c("nodes", "tips"),
  what = c("imputed", "variances", "expectations"),
  params = NULL,
  method.selection = NULL,
  reconstructed_states = NULL,
  ...
)

Arguments

x

an object of class PhyloEM.

...

further arguments to be passed on to params_process.PhyloEM

trait

an integer giving the trait to extract. Default to 1.

save_all

if TRUE, arguments where and what are ignored, and all the moments are kept for further extraction with the same function, specifying the argument reconstructed_states. Default to FALSE.

where

either "nodes" for ancestral state reconstruction, or "tips" for data imputation.

what

the quantity to retrieve. Either the imputed traits (default), their conditional variances, or the simple expectations under the selected process.

params

(optional) some user-specified parameters. Must be of class params_process. If left blank, they are extracted using the method.selection argument (see below).

method.selection

(optional) the method selection to be used. One of "LINselect", "DDSE", "Djump". Default to "LINselect".

reconstructed_states

if the reconstructed states have already been computed (by a previous call of the function, with save_all=TRUE), they can be passed on here (avoids multiple computations of the E step).

Value

A matrix or array with the computed quantities.

Methods (by class)

imputed_traits(PhyloEM): PhyloEM object

Incidence matrix of a tree.

Description

incidence.matrix computes the incidence matrix T of a tree : for a lineage i and a branch b, T[i,b]=1 if b is in the lineage i, and 0 otherwise.

Usage

incidence.matrix(phylo)

Arguments

phylo

a phylogenetic tree, class phylo.

Value

Matrix of incidence, size Nedge x ntaxa.

Incidence matrix of a tree.

Description

incidence.matrix.full computes the incidence matrix U of a tree : for a node i and a branch b, U[i,b]=1 if b is in the lineage i, and 0 otherwise.

Usage

incidence.matrix.full(phylo)

Arguments

phylo

a phylogenetic tree, class phylo.

Value

Matrix of incidence, size ntaxa + Nnode.

Compute the actualization factors to apply to the incidence matrix.

Description

incidence_matrix_actualization_factors computes a ntaxa x Nedge matrix of the (1 - exp(-alpha * (t_i - t_pa(j) - nu_j * l_j)))_{i tip, j node}. This matrix is to be multiplied to the incidence matrix with an outer product.

Usage

incidence_matrix_actualization_factors(
  tree,
  selection.strength,
  relativeTimes_tree = 0,
  times_shared = compute_times_ca(tree)
)

Arguments

tree

a phylogenetic tree.

selection.strength

the selection strength of the process.

relativeTimes_tree

a Nedge vector of relative times associated with the branches.

times_shared

a matrix, result of function compute_times_ca.

Value

Matrix of size ntaxa x Nedge

Initialization of the shifts using Lasso.

Description

init.EM.lasso does the following regression : ||Y_data-T.delta||_(Sigma_YY^(-1)) + lambda |delta|_1 using the function glmnet::glmnet of package glmnet, through function lasso_regression_K_fixed. T is the incidence matrix of the tree, and delta the vectorial representation of the shifts (see functions incidence.matrix and shifts.list_to_vector for further details).

Usage

init.EM.lasso(
  phylo,
  Y_data,
  Y_data_imp = Y_data,
  Y_data_vec_known = as.vector(Y_data),
  process,
  times_shared = compute_times_ca(phylo),
  distances_phylo,
  nbr_of_shifts,
  K_lag_init = 0,
  use_sigma = TRUE,
  params_sigma = NULL,
  variance.init = diag(1, p, p),
  random.init = FALSE,
  value.root.init = rep(0, p),
  exp.root.init = rep(1, p),
  var.root.init = diag(1, p, p),
  edges.init = NULL,
  values.init = matrix(0, p, length(edges.init)),
  relativeTimes.init = NULL,
  selection.strength.init = 1,
  optimal.value.init = rep(0, p),
  T_tree = incidence.matrix(phylo),
  subtree.list = NULL,
  miss = FALSE,
  sBM_variance = FALSE,
  stationary.root.init = FALSE,
  masque_data,
  independent = FALSE,
  ...
)

Arguments

Y_data

data at the tips.

times_shared

(matrix) : times of shared ancestry, result of function compute_times_ca.

distances_phylo

(matrix) : phylogenetic distance, result of function compute_dist_phy

nbr_of_shifts

number of shifts used in the EM algorithm

Details

A Cholesky decomposition of function Sigma_YY^(-1) is used. lambda is chosen so that delta has the right number of non zero components.

Value

params_init the list of initial parameters to be used, in the right format.

Initialization for the allocation of shifts.

Description

init.allocate_regimes_from_shifts initialize the vector of regimes at nodes and tips, with the regime (0) at the root.

Usage

init.allocate_regimes_from_shifts(phy, ...)

Arguments

phy

Input tree.

Details

This function is used in function allocate_regimes_from_shifts and is designed to furnish function update.allocate_regimes_from_shifts with the right structure of data.

Value

Matrix of size (Nnode + ntaxa)x1 of NAs, with the 0 at the root.

Initialization the selection strength alpha using robust estimation

Description

init.alpha.estimation fits (Y_i-Y_j)^2 ~ gamma^2(1-exp(-alpha*d_ij)) for all couples of tips (i,j) that have the same mean, i.e than are not separated by a shift. Shifts are initialized thanks to a lasso (function init.EM.lasso).

Usage

init.alpha.gamma.estimation(
  phylo,
  Y_data,
  nbr_of_shifts,
  times_shared,
  distances_phylo,
  T_tree,
  subtree.list,
  max_triplet_number,
  alpha_known,
  method.init.alpha.estimation,
  tol_EM,
  h_tree,
  miss,
  masque_data,
  independent,
  ...
)

Arguments

phylo

a phylogenetic tree, class phylo.

Y_data

data at the tips.

nbr_of_shifts

number of shifts used in the EM algorithm

distances_phylo

(matrix) : phylogenetic distance, result of function compute_dist_phy

Details

Function robustbase::nlrob is used for the robust fit.

Value

params_init the list of initial parameters to be used, in the right format.

Initialization for the computation of the optimal values

Description

init.compute_betas_from_shifts initialize the vector of optimal values at nodes and tips, with the value at the root.

Usage

init.compute_betas_from_shifts(phy, optimal.value, ...)

Arguments

phy

Input tree.

optimal.value

the optimal value at the root of the tree

Details

This function is used in function compute_betas_from_shifts and is designed to furnish function update.compute_betas_from_shifts with the right structure of data.

Value

Matrix of size (Nnode + ntaxa)x1 of NAs, with the optimal value at the root.

Initialization for the enumeration of parsimonious solutions.

Description

init.enumerate_parsimony is used in function enumerate_parsimony, and initialize the correct data structure.

Usage

init.enumerate_parsimony(phy, clusters, pos)

Arguments

phy

Input tree.

clusters

a vector representing the group of each tip.

Details

This function returns a list with Nnode + ntaxa entries. Entries corresponding to the tips are initialized with a list of nclus matrices. For tip i of group k, all matrices are set to NULL, except for the kth, set to a vector of size Nnode + ntaxa, with entry i set to k, and all the others to NA.

Value

A list of size Nnode + ntaxa, as described above.

Initialization for incidence matrix

Description

init.incidence.matrix initialize the matrix updated in update.incidence.matrix for the computation of the incidence matrix in incidence.matrix.

Usage

init.incidence.matrix(phy)

Arguments

phy

Input tree.

Details

The initialized matrix has ntaxa column and Nnode rows. Each node represent its parental branch. A row corresponding to a tip i is initialized to a vector of zeros, with only entry i equal to one. (Branch ending at tip i is only in the i^th lineage)

Value

Matrix with Nnode rows and ntaxa column.

Initialization for incidence matrix (full tree)

Description

init.incidence.matrix.full initialize the matrix updated in update.incidence.matrix.full for the computation of the incidence matrix of the full tree in incidence.matrix.full.

Usage

init.incidence.matrix.full(phy)

Arguments

phy

Input tree.

Details

The initialized matrix is squared of size ntaxa + Nnode. Each node represent its parental branch. A row corresponding to a tip i is initialized to a vector of zeros, with only entry i equal to one. (Branch ending at tip i is only in the i^th lineage)

Value

Matrix of size ntaxa + Nnode.

Initialization for parsimonyCost.

Description

init.parsimonyCost initialize a (ntaxa + Nnode) x (nclus) matrix with NAs everywhere, except for the tips.

Usage

init.parsimonyCost(phy, clusters)

Arguments

phy

a phylogenetic tree, class phylo.

clusters

the vector of the clusters of the tips.

Details

At a tip i in state k, the line-vector is initialized as follow : (1 - Ind(k=p)_{1<=p<=nclus})*Inf (where Inf * 0 = 0)

Value

A (ntaxa + Nnode)x(nclus) matrix, with ntaxa first lines initialized as described.

Initialization for parsimonyNumber.

Description

init.parsimonyNumber initialize a (ntaxa + Nnode)x(nclus) matrix with NAs everywhere, except for the tips.

Usage

init.parsimonyNumber(phy, clusters)

Arguments

phy

phylogenetic tree.

clusters

the vector of the clusters of the tips.

Details

At a tip i in state k, the line-vector is initialized as follow : Ind(k=p)_{1<=p<=nclus}

Value

A (ntaxa + Nnode)x(nclus) matrix, with ntaxa first lines initialized as described.

Initialize BM

Description

Function used in simulate for BM initialization.

Usage

init.simulate.BM(phy, p, root.state, simulate_random, ...)

Arguments

phy

Input tree.

root.state

(list) State of the root, with: random : random state (TRUE) or deterministic state (FALSE) value.root : if deterministic, value of the character at the root exp.root : if random, expectation of the character at the root var.root : if random, variance of the character at the root (pxp matrix)

Value

paramSimu Array p x Nnode x 2 (BM), filled with NAs. Slice paramSimu[, ntaxa + 1, ] (array p x 2) is initialized with simulated states and root expectations for all the traits.

Initialize state and expectation matrices

Description

Function used in simulate for OU initialization.

Usage

init.simulate.OU(phy, p, root.state, optimal.value, simulate_random, ...)

Arguments

phy

Input tree.

p

dimension of the trait simulated

root.state

Value

paramSimu: array p x Nnode x 3, filled with NAs. Slice paramSimu[, ntaxa + 1, ] (array p x 3) is initialized with simulated states, root expectations, and optimal values for all the traits.

Initialize state and expectation matrices

Description

Function used in simulate for BM/OU initializations.

Usage

init.simulate.StateAndExp(phy, p, root.state, simulate_random)

Arguments

phy

Input tree.

p

dimension of the trait simulated

root.state

Value

paramSimu: array p x Nnode x 2 (BM), filled with NAs. Slice paramSimu[, ntaxa + 1, ] (array p x 2) is initialized with simulated states and root expectations for all the traits.

Check whether parameters are in ranges.

Description

is.in.ranges.params checks whether calculated parameters in the EM are in the defined ranges.

Usage

is.in.ranges(p, min, max)

Arguments

p

list of parameters with the correct structure

min

list of minimum values for the parameters

max

list of maximum values for the parameters

Details

This function is used to test the convergence of the algorithm.

Value

boolean

Do a lasso regression with the number of non-zero variables fixed.

Description

lasso_regression_K_fixed does the following regression : ||Yp-Xp.delta|| + lambda |delta|_1 using the function glmnet::glmnet of package glmnet, where delta is a vector representing the shifts occurring on the branches. It does a Gauss lasso regression using function lm on top of it. This function is used in functions init.EM.lasso, segmentation.OU.specialCase.lasso, ...

Usage

lasso_regression_K_fixed.glmnet_multivariate(
  Yp,
  Xp,
  K,
  root = NULL,
  penscale = rep(1, ncol(Xp)),
  K_lag = 0
)

Arguments

Yp

(transformed) data

Xp

(transformed) matrix of regression

K

number of non-zero components allowed

Details

lambda is chosen so that delta has the right number of non zero components. If not possible, either temporarily raise the number of shifts and then select only the shifts with the highest modulus, or if not possible, throw an error.

Value

E0.gauss the intercept (value at the root)

shifts.gauss the list of shifts found on the branches

Log Likelihood of a fitted object

Description

log_likelihood computes the log likelihood of some parameters.

Usage

log_likelihood(x, ...)

## S3 method for class 'params_process'
log_likelihood(x, Y_data, phylo, ...)

## S3 method for class 'PhyloEM'
log_likelihood(x, ...)

Arguments

x

an object of class params_process or PhyloEM.

...

for a PhyloEM object, further arguments to be passed on to params_process.PhyloEM (to choose which parameters to extract from the results, see documentation of this function).

Y_data

matrix of data at the tips, size p x ntaxa. Each line is a trait, and each column is a tip. The column names are checked against the tip names of the tree.

phylo

a phylogenetic tree, class phylo. incidence.matrix.full. Can be specified to avoid extra computations.

Value

The log likelihood of the data with the provided parameters on the tree.

Methods (by class)

log_likelihood(params_process): params_process object
log_likelihood(PhyloEM): PhyloEM object

Compute parent matrix from possibles daughter matrices.

Description

matrix_of_possibles is used in function update.enumerate_parsimony to compute, from the list of possible matrices for the daughters, the matrix for the node (a group for the parent being fixed).

Usage

matrix_of_possibles(matrices)

Arguments

matrices

a list of matrices with ndaughters entries.

Details

This function select all possible combinations of rows from all the daughters, and merge then into one using function merge_complementary_vectors.

Value

Matrix of all possible regimes for the subtree bellow node parent.

Merge PhyloEM fits on various grids of alpha values

Description

merge_alpha_grids takes several fits from PhyloEM, and merge them so as to take into account all alpha values. This can be used to break down computations into smaller chunks to be run independently.

Usage

merge_alpha_grids(...)

Arguments

...

objects of class PhyloEM fitted on the same dataset with the same parameters, but different grids of alpha values.

Value

An object of class PhyloEM, result of the merge.

Examples

## Not run: 
## Load Data
data(monkeys)
## First fit with coarse grid
res1 <- PhyloEM(Y_data = monkeys$dat,
                phylo = monkeys$phy,
                process = "scOU",
                random.root = TRUE,
                stationary.root = TRUE,
                K_max = 10,
                alpha = c(0.2, 0.3),
                parallel_alpha = TRUE,
                Ncores = 2)
## Second fit with finer grid
res2 <- PhyloEM(Y_data = monkeys$dat,
                phylo = monkeys$phy,
                process = "scOU",
                random.root = TRUE,
                stationary.root = TRUE,
                K_max = 10,
                alpha = c(0.22, 0.24),
                parallel_alpha = TRUE,
                Ncores = 2)
## Thrid fit with redundancies
res3 <- PhyloEM(Y_data = monkeys$dat,
                phylo = monkeys$phy,
                process = "scOU",
                random.root = TRUE,
                stationary.root = TRUE,
                K_max = 10,
                alpha = c(0.26, 0.3),
                parallel_alpha = TRUE,
                Ncores = 2)

## Merge the three
res_merge <- merge_alpha_grids(res1, res2, res3)
## Plot the selected result
plot(res_merge)
## Plot the model selection criterion
plot_criterion(res_merge)

## End(Not run)

Merge several complementary vectors into one.

Description

merge_complementary_vectors take several vectors with complementary entries (i.e, vector of same length, that are such that only one vector has a non NA value for each entry), and merge them into one.

Usage

merge_complementary_vectors(comb, mat)

Arguments

comb

vector giving the rows to be kept.

mat

matrix containing the vectors as rows.

Details

The vectors are selected from the entry matrix by comb. At each entry, the vectors are added using function add_complementary.

Value

row vector of the same size as entry matrix.

Merge a list of independent parameters into into one parameter

Description

merge_params_independent merges a list of p params objects into one param object of dimension p The reverse operation is done by split_params_independent

Usage

merge_params_independent(params_split)

Arguments

params_split

a list of parameters

Value

A parameter object

Merge fits from independent runs of PhyloEM.

Description

merge_rotations takes several fits from PhyloEM, and merge them according to the best score (maximum likelihood or least squares). For each number of shifts, The datasets needs to be equal up to a rotation. This is tested thanks to a QR decomposition, see function find_rotation.

Usage

merge_rotations(..., method.selection = NULL, tol = NULL)

Arguments

...

objects of class PhyloEM fitted on datasets that are equal up to a rotation.

method.selection

(optional) selection method to be applied to the merged fit. See params_process.PhyloEM.

tol

(optional) relative numerical tolerance. See find_rotation.

Value

An object of class PhyloEM, result of the merge.

Examples

## Not run: 
## Load Data
data(monkeys)
## Run method
# Note: use more alpha values for better results.
res <- PhyloEM(Y_data = monkeys$dat,        ## data
               phylo = monkeys$phy,         ## phylogeny
               process = "scOU",            ## scalar OU
               random.root = TRUE,          ## root is stationary
               stationary.root = TRUE,
               K_max = 10,                  ## maximal number of shifts
               nbr_alpha = 4,               ## number of alpha values
               parallel_alpha = TRUE,       ## parallelize on alpha values
               Ncores = 2)
## Rotate dataset
rot <- matrix(c(cos(pi/4), -sin(pi/4), sin(pi/4), cos(pi/4)), nrow= 2, ncol = 2)
Yrot <- t(rot) %*% monkeys$dat
rownames(Yrot) <- rownames(monkeys$dat)
## Fit rotated dataset
# Note: use more alpha values for better results.
res_rot <- PhyloEM(Y_data = Yrot,               ## rotated data
                   phylo = monkeys$phy,         
                   process = "scOU",            
                   random.root = TRUE,          
                   stationary.root = TRUE,
                   K_max = 10,                  
                   nbr_alpha = 4,               
                   parallel_alpha = TRUE,       
                   Ncores = 2)
## Merge the two
res_merge <- merge_rotations(res, res_rot)
## Plot the selected result
plot(res_merge)
## Plot the model selection criterion
plot_criterion(res_merge)

## End(Not run)

Model Selection of a fitted object

Description

model_selection does the model selection on a fitted PhyloEM object, and returns the same fitted object.

Usage

model_selection(x, ...)

## S3 method for class 'PhyloEM'
model_selection(
  x,
  method.selection = c("LINselect", "DDSE", "Djump"),
  C.BM1 = 0.1,
  C.BM2 = 2.5,
  C.LINselect = 1.1,
  independent = FALSE,
  ...
)

Arguments

x

a fitted PhyloEM object

...

Further arguments to be passed to estimateEM, including tolerance parameters for stopping criteria, maximal number of iterations, etc.

method.selection

C.BM1

Multiplying constant to be used for the BigeMassart1 method. Need to be positive. Default to 0.1.

C.BM2

Multiplying constant to be used for the BigeMassart2 method. Default to 2.5.

C.LINselect

Multiplying constant to be used for the LINselect method. Need to be greater than 1. Default to 1.1.

independent

Are the trait assumed to be independent from one another? Default to FALSE. OU in a multivariate setting only works if TRUE.

Value

The same object, but with a slot corresponding to the model selection used. See function params_process.PhyloEM to retrieve the selected parameters.

Methods (by class)

model_selection(PhyloEM): PhyloEM object

New World Monkeys dataset

Description

Morphometric dataset and phylogeny for brain shape variation of 50 species of New World monkeys (platyrrhine).

Usage

monkeys

Format

A list containing two objects:

phy: The Phylogenetic tree for the platyrrhine species, pruned to match the species in the morphometric dataset
dat: First two PC scores from a PCA of the species-averaged Procrustes coordinates

References

Aristide, L., dos Reis, S. F., Machado, A. C., Lima, I., Lopes, R. T. & Perez, S. I. (2016). Brain shape convergence in the adaptive radiation of New World monkeys. Proceedings of the National Academy of Sciences, 113(8), 2158–2163. http://doi.org/10.1073/pnas.1514473113

Computation of the optimal values at nodes and tips.

Description

compute_betas_from_shifts computes the optimal values at the nodes and tips of the tree, given the value at the root and the list of shifts occurring in the tree. It assumes an OU model.

Usage

node_optimal_values(param, phylo)

Arguments

param

an object of class params_process.

phylo

a phylogenetic tree, class phylo.

Value

Matrix of size ntraits x (ntaxa + Nnode) of the optimal values at the node and tips of the tree. Column names correspond to the number of the node in the phylo object.

Examples

set.seed(1792)
ntaxa = 10
tree <- rphylo(ntaxa, 1, 0.1)
# parameters of the process
par <- params_process("BM",                             ## Process
                      p = 2,                            ## Dimension
                      variance = diag(0.5, 2, 2) + 0.5, ## Rate matrix
                      edges = c(4, 10, 15),             ## Positions of the shifts
                      values = cbind(c(5, 4),           ## Values of the shifts
                                     c(-4, -5),
                                     c(5, -3)))
plot(par, phylo = tree, traits = 1, value_in_box = TRUE,
     shifts_bg = "white", root_bg = "white", ancestral_as_shift = TRUE, root_adj = 5)
nodelabels()
node_optimal_values(par, tree)

Create an object `params_process` for a BM

Description

params_BM creates a coherent object params_process from user provided values of the parameters. Non specified parameters are set to default values.

Usage

params_BM(
  p = 1,
  variance = diag(1, p, p),
  random = FALSE,
  value.root = rep(0, p),
  exp.root = rep(0, p),
  var.root = diag(1, p, p),
  edges = NULL,
  values = matrix(0, p, length(edges)),
  relativeTimes = NULL,
  nbr_of_shifts = length(edges),
  phylo = NULL,
  sBM_variance = FALSE,
  trait_names = NULL,
  ...
)

Arguments

p

the dimension (number of traits) of the parameters. Default to 1.

variance

the variance (rate matrix) of the BM. Default to diag(1, p, p).

random

whether the root of the BM is random (TRUE) or fixed (FALSE). Default to FALSE.

value.root

if random=FALSE, the root value. Default to 0.

exp.root

if random=TRUE, the root expectation. Default to 0.

var.root

if random=TRUE, the root variance. Default to diag(1, p, p).

edges

a vector of edges where the shifts occur. Default to NULL (no shift).

values

a matrix of shift values, with p lines and as many columns as the number of shifts. Each column is the p values for one shift. Default to matrix(0, p, length(edges)).

relativeTimes

(unused) the relative position of the shift on the branch, between 0 (beginning of the branch) and 1 (end of the branch). Default to 0.

nbr_of_shifts

the number of shifts to use (randomly drawn). Use only if edges is not specified. In that case, a phylogenetic tree must be provided (to allow a random sampling of its edges).

phylo

a phylogenetic tree of class phylo. Needed only if the shifts edges are not specified, or if sBM_variance=TRUE. Default to NULL. If sBM_variance=TRUE, it must have a specified value for the root branch length (slot root.edge).

sBM_variance

if the root is random, does it depend on the length of the root edge ? (For equivalent purposes with a rescaled OU). Default to FALSE. If TRUE, a phylogenetic tree with root edge length must be provided.

trait_names

vector of trait names values. Must be of length p.

...

unused.

Value

an object of class params_process.

Create an object `params_process` for an OU

Description

params_OU creates a coherent object params_process from user provided values of the parameters. Non specified parameters are set to default values.

Usage

params_OU(
  p = 1,
  variance = diag(1, p, p),
  selection.strength = diag(1, p, p),
  optimal.value = rep(0, p),
  random = TRUE,
  stationary.root = TRUE,
  value.root = rep(0, p),
  exp.root = rep(0, p),
  var.root = diag(1, p, p),
  edges = NULL,
  values = matrix(0, p, length(edges)),
  relativeTimes = NULL,
  nbr_of_shifts = length(edges),
  phylo = NULL,
  trait_names = NULL,
  ...
)

Arguments

p

the dimension (number of traits) of the parameters. Default to 1.

variance

the variance (rate matrix) of the BM. Default to diag(1, p, p).

selection.strength

the selection strength matrix. Default to diag(1, p, p).

optimal.value

the vector of the optimal values at the root. Default to rep(0, p).

random

whether the root of the OU is random (TRUE) or fixed (FALSE). Default to TRUE.

stationary.root

whether the root of the OU is stationary (TRUE) or not. Default to TRUE.

value.root

if random=FALSE, the root value. Default to 0.

exp.root

if random=TRUE, the root expectation. Default to 0. If stationary.root=TRUE, default to optimal.value.

var.root

if random=TRUE, the root variance. Default to diag(1, p, p). If stationary.root=TRUE, default to the stationary variance computed from variance and selection.strength, see function compute_stationary_variance.

edges

a vector of edges where the shifts occur. Default to NULL (no shift).

values

a matrix of shift values, with p lines and as many columns as the number of shifts. Each column is the p values for one shift. Default to matrix(0, p, length(edges)).

relativeTimes

(unused) the relative position of the shift on the branch, between 0 (beginning of the branch) and 1 (end of the branch). Default to 0.

nbr_of_shifts

the number of shifts to use (randomly drawn). Use only if edges is not specified. In that case, a phylogenetic tree must be provided (to allow a random sampling of its edges).

phylo

a phylogenetic tree of class phylo. Needed only if the shifts edges are not specified.

trait_names

vector of trait names values. Must be of length p.

...

unused.

Value

an object of class params_process.

Create an object params_process

Description

params_process creates or extracts a set of parameters of class params_process.

Usage

params_process(x, ...)

Arguments

x

an S3 object.

...

further arguments to be passed to the specific method.

Value

An S3 object of class params_process. This is essentially a list containing the following entries:

process

The model used. One of "BM" (for a full BM model, univariate or multivariate); "OU" (for a full OU model, univariate or multivariate); or "scOU" (for a "scalar OU" model).

p

Dimension of the trait.

root.state

List describing the state of the root, with:

random: random state (TRUE) or deterministic state (FALSE)
value.root: if deterministic, value of the character at the root
exp.root: if random, expectation of the character at the root
var.root: if random, variance of the character at the root (pxp matrix)

shifts

List with position and values of the shifts:

edges: vector of the K id of edges where the shifts are
values: matrix p x K of values of the shifts on the edges (one column = one shift)
relativeTimes: vector of dimension K of relative time of the shift from the parent node of edges

variance

Variance-covariance matrix size p x p.

selection.strength

Matrix of selection strength size p x p (OU).

optimal.value

Vector of p optimal values at the root (OU).

Parameter estimates

Description

params takes an object of class PhyloEM, and returns the inferred parameters of the process.

Usage

## S3 method for class 'PhyloEM'
params_process(
  x,
  method.selection = NULL,
  K = NULL,
  alpha = NULL,
  rBM = FALSE,
  init = FALSE,
  ...
)

Arguments

x

an object of class PhyloEM

method.selection

(optional) the method selection to be used. One of "LINselect", "DDSE", "Djump". Default to "LINselect".

K

(optional) an integer giving the number of shifts for which to retrieve the parameters. Default to NULL (automatically selected number of shifts, see method.selection argument).

alpha

(optional) a value of alpha for which to retrieve the parameters. Can be an (un-ambiguous) estimation of the true value. If specified, then K must be precised too. Default to NULL (automatically selected value, see method.selection argument).

rBM

(optional) if TRUE, and if the process is "scOU", returns the raw parameters of the BM on the re-scaled tree. Default to FALSE, except if the selection strength is negative (see doc of PhyloEM for an explanation of this particular case).

init

(optional) if TRUE, gives the parameters from the initialization of the EM. Default to FALSE. This has no effect if K is not specified.

...

unused.

Value

An object of class params_process.

Create an object `params_process`

Description

params_process creates a coherent object params_process from user provided values of the parameters.

Usage

## S3 method for class 'character'
params_process(x, ...)

Arguments

x

one of "BM" or "OU"

...

specified parameters, see functions params_BM and params_OU for details.

Value

an object of class params_process.

Minimal number of shifts needed to get a clustering.

Description

parsimonyCost is an implementation of the Sankoff algorithm, when the cost of transition between two state is always one. It is used in functions parsimonyNumber and enumerate_parsimony to count or enumerate all the parsimonious solutions given one clustering of the tips.

Usage

parsimonyCost(phylo, clusters = rep(1, length(phylo$tip.label)))

Arguments

phylo

a phylogenetic tree, class phylo.

clusters

the vector of the clusters of the tips. (Default to all the tips in a single group).

Value

An S3 class "parsimonyCost" containing a (ntaxa + Nnode) x (nclus) matrix of the total number of shifts needed to get the clustering, if starting from a node in state k. The cost can be extract from any subtree with function extract.parsimonyCost.

Examples

tree <- read.tree(text="(((1,1),2),2);")
plot(tree); nodelabels()
clusters <- c(1, 1, 2, 2)
costs <- parsimonyCost(tree, clusters)
costs

## Extract the parsimony cost at the root
extract(costs)

## Extract the cost for the sub-tree below node 7
extract(costs, 7)

Number of equivalent parsimonious allocations.

Description

parsimonyNumber aims at finding the number of equivalent allocations of the shifts on the tree, i.e allocations that are parsimonious and compatible with a given clustering of the tips.

Usage

parsimonyNumber(phylo, clusters = rep(1, length(phylo$tip.label)))

Arguments

phylo

phylogenetic tree, class phylo.

clusters

the vector of the clusters of the tips. Default to all the tips in one single cluster.

Details

This function does a recursion up the tree. The function extract.parsimonyNumber gives the result sought for any subtree. The matrix of costs of the states (number of shifts) is also required, it is computed by function parsimonyCost.

Value

an object of S3 class "parsimonyNumber" with:

nbrReconstructions: a (ntaxa + Nnode) x (nclus) matrix of locally parsimonious solutions starting from a cluster k at a given node
costReconstructions: an object of class "parsimonyCost", result of function parsimonyCost.

Examples

tree <- read.tree(text="(((0,1),2),2);")
plot(tree); nodelabels()
clusters <- c(0, 1, 2, 2)
n_sols <- parsimonyNumber(tree, clusters)
n_sols

## Extract the number of parsimonious solutions at the root
extract(n_sols)

## Extract the cost of the solutions from the root
extract(n_sols, what = "cost")
extract(parsimonyCost(tree, clusters)) # same, more efficient

## Extract for the sub-tree below node 7
extract(n_sols, 7) # Result: 2 (the ancestral state is either "0" or "1").

Number of different models

Description

partitionsNumber computes the number of different models with a given number of shifts K. It is also the number of colorings of the tips to the tree in npart = K + 1 colors.

Usage

partitionsNumber(phylo, npart)

Arguments

phylo

a phylogenetic tree of class phylo.

npart

the numbers of partitions (colors) allowed at the tips. This is the number of shifts plus one (npart = K + 1).

Value

an object of class partitionsNumber. This is made of a matrix with (Nnode + ntaxa) rows and (2*npart) columns. Each column contains two vectors: for k=1:npart it contains the number of partitions with k groups compatible with the tree and the shift process; and for k=(npart+1):2*npart, it contains the number of "marked" partitions with (k-npart) groups compatible with the tree and the shift process. The actual number can be extracted with function extract.partitionsNumber (see examples below).

Examples

if (requireNamespace("combinat", quietly = TRUE)) {
  npart <- 8 # number of colors at the tips allowed
  tree <- read.tree(text="(A,(A,(A,A,A),A,A));") # a tree with polytomies
  plot(tree)
  parts_num <- partitionsNumber(tree, npart)
  parts_num
  
  ## Number of possible colorings of the tips in npart colors
  extract(parts_num)
  
  ## Get all the solutions for colorings with 1 to nparts colors
  extract(parts_num, npart = 1:npart)
  
  ## Number of possible colorings of the tips in npart colors
  ## For the sub-tree starting at node 17
  extract(parts_num, node = 10)
  
  ## Number of possible colorings of the tips in npart colors
  ## with one marked color
  extract(parts_num, marked = TRUE)
}

Penalty function type Baraud Giraud Huet.

Description

penalty_BaraudGiraudHuet_likelihood is the penalty defined by : pen' = ntaxa * log(1 + pen/(ntaxa - K)) with pen = C * (ntaxa - K)/(ntaxa - K - 1) * EDkhi[K + 1; ntaxa - K - 1; exp(-Delta_K)/(K + 1)] and Delta = log(model_complexity) + log(K + 1) such that sum(exp(-Delta_K)) < infty.

Usage

penalty_BaraudGiraudHuet_likelihood(K, model_complexity, ntaxa, C = 1.1)

Arguments

K

the dimension of the model.

model_complexity

the complexity of the set of models with dimension K.

ntaxa

the number of tips.

C

a constant, C > 1. Default is C = 1.1 (as suggested in Baraud Giraud Huet (2009))

Details

See Baraud Giraud Huet (2009, 2011). Must be applied to log-likelihood criterion. Function pen is computed using function penalty from package LINselect.

Value

value of the penalty.

Penalty function type Birgé-Massart 1

Description

penalty_BirgeMassart_shape1 is the penalty shape defined by : pen_shape = (sqrt(K) + sqrt(2 * K * L_K))^2 with sum(exp(- K * L_K)) < infty : L_K = B + 1/K * log(model_complexity).

Usage

penalty_BirgeMassart_shape1(K, p, model_complexity, B = 0.1)

Arguments

K

the number of shifts

p

the dimension of the data

model_complexity

the complexity of the set of models with dimension K

B

a non-negative constant. Default is 0.1 (as suggested in Cleynen & Lebarbier 2015)

Details

See Birgé Massart (2001). Must be applied to least-square criterion. This penalty should be calibrated using the slope heuristic.

Value

value of the penalty

Penalty function type Birgé-Massart 2

Description

penalty_BirgeMassart_shape2 is the penalty shape defined by : pen_shape = C*K_try + log(model_complexity). It dominates the penalty defined by penalty_BirgeMassart_shape1.

Usage

penalty_BirgeMassart_shape2(K, p, model_complexity, C = 2.5)

Arguments

K

the number of shifts

p

the dimension of the data

model_complexity

the complexity of the set of models with dimension K.

C

a non-negative constant. Default is 2.5 (as suggested in Lebarbier 2005)

Details

See Birgé Massart (2001). Must be applied to least-square criterion. This penalty should be calibrated using the slope heuristic.

Value

value of the penalty.

Penalty function type pBIC

Description

penalty_pBIC_scalarOU is the pBIC.

Usage

penalty_pBIC(
  all_params,
  model_complexity,
  independent,
  tree,
  times_shared,
  distances_phylo,
  T_tree,
  p,
  K,
  ntaxa,
  process
)

Arguments

model_complexity

the complexity of the set of models with dimension K.

K

the dimension of the model.

ntaxa

the number of tips.

Value

value of the penalty.

Plot for class `PhyloEM`

Description

This function takes an object of class PhyloEM, result of function PhyloEM, and plots the result of the inference.

Usage

## S3 method for class 'PhyloEM'
plot(
  x,
  traits = 1:(x$p),
  params = NULL,
  method.selection = NULL,
  automatic_colors = TRUE,
  color_characters = "black",
  color_edges = "black",
  plot_ancestral_states = FALSE,
  name_trait = "Trait Value",
  imposed_scale,
  ancestral_cex = 2,
  ancestral_pch = 19,
  value_in_box = FALSE,
  ancestral_as_shift = FALSE,
  shifts_cex = 0.6,
  shifts_bg = "chocolate4",
  root_bg = "chocolate4",
  shifts_adj = 0,
  root_adj = 1,
  color_shifts_regimes = FALSE,
  regime_boxes = FALSE,
  alpha_border = 70,
  show.tip.label = FALSE,
  label_cex = 0.5,
  label_font = 1,
  label_offset = 0,
  axis_cex = 0.7,
  axis_las = 0,
  show_axis_traits = TRUE,
  edge.width = 1,
  margin_plot = NULL,
  gray_scale = FALSE,
  root.edge = TRUE,
  ...
)

Arguments

x

an object of class PhyloEM, result of function PhyloEM.

traits

a vector of integers giving the numbers of the trait to be plotted. Default to 1:p (all the traits).

params

(optional) some user-specified parameters. Must be of class params_process. If left blank, they are extracted using the method.selection argument (see below).

method.selection

select the parameters to plot. One of "LINselect", "DDSE", "Djump". Default to "LINselect". See params_process.PhyloEM.

automatic_colors

whether to color the edges automatically according to their regimes. Default to TRUE. If FALSE, colors can be manually specified through arguments color_characters and colro_edges (see below).

color_characters

if automatic_colors=FALSE, a vector of colors for the tips of the tree.

color_edges

if automatic_colors=FALSE, a vector of colors for the edges of the tree.

plot_ancestral_states

whether to plot the ancestral traits inferred at the internal nodes of the tree. Only available if only one trait is plotted. Default to FALSE.

name_trait

name of the trait scale bar for the ancestral states plotting. Default to "Trait Value".

imposed_scale

if plot_ancestral_states=TRUE, a vector specifying the imposed scale for the ancestral states plotting. Useful to make comparisons. Default to the plotted trait.

ancestral_cex

if plot_ancestral_states=TRUE, the size of the ancestral states on the tree. Default to 2.

ancestral_pch

if plot_ancestral_states=TRUE, the symbol used of the ancestral states. Default to circles (pch=19).

value_in_box

whether to plot the value of the shift in a box on the edges. Only available when only one trait is plotted. Can be difficult to read on big trees. The size of the text in the boxes is controlled by parameter. Default to FALSE.

ancestral_as_shift

whether to represent the ancestral value at the root as an ancestral shift on the root edge. Default to FALSE. shifts_cex (see below).

shifts_cex

if value_in_box=TRUE, the size of the text in the boxes. Default to 0.8.

shifts_bg

if value_in_box=TRUE, the background color of the boxes.

root_bg

if value_in_box=TRUE and ancestral_as_shift=TRUE, the background color of the ancestral box.

shifts_adj

the adj parameter for the shifts position on the edges. Default to 0 (beginning of the edge).

root_adj

if ancestral_as_shift=TRUE, the adj parameter for the ancestral value position on the root edge. Default to 1.

color_shifts_regimes

whether to color each shift according to its regime (default to the same color of the edge it's on). Default to FALSE.

regime_boxes

whether to draw a box showing all the tips below a given. The transparency of the border of the box is controlled by parameter alpha_border (see below).

alpha_border

if regime_boxes=TRUE, the alpha parameter of the border of the box. Default to 70.

show.tip.label

whether to show the tip labels. Default to FALSE.

label_cex

if show.tip.label=TRUE, the size of the labels. Default to 0.5.

label_font

if show.tip.label=TRUE, the font of the labels (see par).

label_offset

if show.tip.label=TRUE, the size of the offset between the tree and the labels. Default to 0.

axis_cex

cex for the label values of the plot. Default to 0.7.

axis_las

las for the label values of the plot. Default to 0 (see par).

show_axis_traits

control whether the trait values axis is plotted (default to TRUE).

edge.width

width of the edge. Default to 1.

margin_plot

vector giving the margin to around the plot. Default to c(0, 0, 0, 0).

gray_scale

if TRUE, the colors are replaced by a gray scale. Default to FALSE.

root.edge

a logical indicating whether to draw the root edge (defaults to TRUE)

...

further arguments to be passed to plot.phylo.

Plot all the equivalent solutions.

Description

plot.enumerate_parsimony plots a representation of all the equivalent solutions.

Usage

## S3 method for class 'enumerate_parsimony'
plot(x, numbering = FALSE, nbr_col = 3, ...)

Arguments

x

an object of class enumerate_parsimony, result of function enumerate_parsimony

numbering

whether to number the solutions. Default to FALSE.

nbr_col

the number of columns on which to display the plot. Default to 3.

...

further arguments to be passed to plot.phylo or nodelabels.

Details

This function uses functions plot.phylo and nodelabels for the actual plotting of the trees.

Value

A plot of the equivalent shifts allocations.

Plot all the equivalent solutions.

Description

plot.equivalent_shifts plots a representation of all the equivalent shifts allocations, with a representation of the shifts and their values, and a coloration of the branches in term of regimes.

Usage

## S3 method for class 'equivalent_shifts'
plot(
  x,
  trait = 1,
  show_shifts_values = TRUE,
  numbering = FALSE,
  colors_tips = NULL,
  nbr_col = 3,
  gray_scale = FALSE,
  edge.width = 2,
  shifts_cex = 1.2,
  ...
)

Arguments

x

an object of class equivalent_shifts, result of function equivalent_shifts

trait

(integer) the trait to be plotted, if multivariate. Default to 1.

show_shifts_values

whether to show the equivalent shifts values or not. Default to FALSE.

numbering

whether to number the solutions. Default to FALSE.

colors_tips

user-provided colors for the tips of the tree. A vector vector with as many colors as there are tips. Will be automatically computed if not provided.

nbr_col

the number of columns on which to display the plot. Default to 3.

gray_scale

if TRUE, the colors are replaced by a gray scale. Default to FALSE.

edge.width

width of the edge. Default to 1.

shifts_cex

if value_in_box=TRUE, the size of the text in the boxes. Default to 0.8.

...

further arguments to be passed to plot.phylo.

Details

This function uses function plot.phylo for the actual plotting of the trees.

Value

A plot of the equivalent shifts allocations.

Plot for class `simul_process`

Description

This function takes an object of class params_process, and plots them along with some data at the tips of the tree.

Usage

## S3 method for class 'params_process'
plot(
  x,
  phylo,
  data = NULL,
  traits,
  automatic_colors = TRUE,
  color_characters = "black",
  color_edges = "black",
  plot_ancestral_states = FALSE,
  ancestral_states = NULL,
  imposed_scale,
  ancestral_cex = 2,
  ancestral_pch = 19,
  value_in_box = FALSE,
  ancestral_as_shift = FALSE,
  shifts_cex = 0.6,
  shifts_bg = "chocolate4",
  root_bg = "chocolate4",
  shifts_adj = 0,
  root_adj = 1,
  color_shifts_regimes = FALSE,
  regime_boxes = FALSE,
  alpha_border = 70,
  show.tip.label = FALSE,
  label_cex = 0.5,
  label_offset = 0,
  axis_cex = 0.7,
  edge.width = 1,
  margin_plot = NULL,
  gray_scale = FALSE,
  ...
)

Arguments

x

an object of class params_process.

phylo

a phylogenetic tree.

data

a matrix of data at the tips of the tree. Must have p rows and ntaxa columns. If these are simulated, use the extract.simul_process function.

traits

a vector of integers giving the numbers of the trait to be plotted. Default to 1:p (all the traits).

automatic_colors

whether to color the edges automatically according to their regimes. Default to TRUE. If FALSE, colors can be manually specified through arguments color_characters and colro_edges (see below).

color_characters

if automatic_colors=FALSE, a vector of colors for the tips of the tree.

color_edges

if automatic_colors=FALSE, a vector of colors for the edges of the tree.

plot_ancestral_states

whether to plot the ancestral traits inferred at the internal nodes of the tree. Only available if only one trait is plotted. Default to FALSE.

ancestral_states

if plot_ancestral_states=TRUE, the ancestral states must be specified. If these are simulated, use the extract.simul_process function.

imposed_scale

if plot_ancestral_states=TRUE, a vector specifying the imposed scale for the ancestral states plotting. Useful to make comparisons. Default to the plotted trait.

ancestral_cex

if plot_ancestral_states=TRUE, the size of the ancestral states on the tree. Default to 2.

ancestral_pch

if plot_ancestral_states=TRUE, the symbol used of the ancestral states. Default to circles (pch=19).

value_in_box

ancestral_as_shift

whether to represent the ancestral value at the root as an ancestral shift on the root edge. Default to FALSE. shifts_cex (see below).

shifts_cex

if value_in_box=TRUE, the size of the text in the boxes. Default to 0.8.

shifts_bg

if value_in_box=TRUE, the background color of the boxes.

root_bg

if value_in_box=TRUE and ancestral_as_shift=TRUE, the background color of the ancestral box.

shifts_adj

the adj parameter for the shifts position on the edges. Default to 0 (beginning of the edge).

root_adj

if ancestral_as_shift=TRUE, the adj parameter for the ancestral value position on the root edge. Default to 1.

color_shifts_regimes

whether to color each shift according to its regime (default to the same color of the edge it's on). Default to FALSE.

regime_boxes

whether to draw a box showing all the tips below a given. The transparency of the border of the box is controlled by parameter alpha_border (see below).

alpha_border

if regime_boxes=TRUE, the alpha parameter of the border of the box. Default to 70.

show.tip.label

whether to show the tip labels. Default to FALSE.

label_cex

if show.tip.label=TRUE, the size of the labels. Default to 0.5.

label_offset

if show.tip.label=TRUE, the size of the offset between the tree and the labels. Default to 0.

axis_cex

cex for the label values of the plot. Default to 0.7.

edge.width

width of the edge. Default to 1.

margin_plot

vector giving the margin to around the plot. Default to c(0, 0, 0, 0).

gray_scale

if TRUE, the colors are replaced by a gray scale. Default to FALSE.

...

further arguments to be passed to plot.phylo.

Plot Model Selection Criterion

Description

This function takes an object of class PhyloEM, result of function PhyloEM, and plots a model selection criterion.

Usage

plot_criterion(
  res,
  method.selection = NULL,
  add = FALSE,
  select.col = "red",
  ...
)

Arguments

res

an object of class PhyloEM, result of function PhyloEM.

method.selection

select the parameters to plot. One of "LINselect", "DDSE", "Djump" or "likelihood" (for un-penalized likelihood). Default to "LINselect". See params_process.PhyloEM.

add

boolean: should the points be added to a current plot (default to FALSE).

select.col

the color of the point selected by the criterion. Default to "red".

...

further argument to be passed to base plot.

Product of elements of a matrix

Description

prod_index return the product of chosen elements of a matrix.

Usage

prod_index(X, Id)

Arguments

X

a matrix with p rows and K column. Each row contains the number of partition in 1<=k<=K groups for one of the p children of a given node

Id

a vector of length p, result of the function xsimplex.

Details

This function is to be used in sum_simplex to be applied to all the elements given by xsimplex. Performs the product : X[1,Id[1]]*X[2,Id[2]]*...*X[p,Id[p]] if all the elements of Id are positive. Otherwise, just return 0.

Value

double : the result of the product.

exact qr.solve

Description

This is the same function as qr.solve, but it throws an error if an exact fit cannot be found (instead of returning a least square fitted value, which is the default behavior of qr.solve).

Usage

qr_solve_exact(a, b, tol = 1e-07)

Arguments

a

a QR decomposition or a rectangular matrix.

b

a vector or matrix of right-hand sides of equations. incidence.matrix.

tol

the tolerance for detecting linear dependencies in the columns of x. Only used if LAPACK is false and x is real.

Generic recursion down the tree.

Description

recursionDown uses the function updateDown to compute daughters rows of matrix param.

Usage

recursionDown(phy, params, updateDown, ...)

Arguments

phy

Input tree, in cladewise order.

params

Matrix of parameters to update by the recursion

updateDown

Function to be used for the update

...

Arguments to be used by the function updateDown

Details

This function is to be used in other more complex function that need to update a quantity from the root to the tips of a tree. Note that the input tree must be in cladewise order.

Value

Matrix of parameters updated.

Residuals of a fitted object

Description

residuals computes the residuals of some parameters.

Usage

## S3 method for class 'PhyloEM'
residuals(object, ...)

Arguments

object

an object of class params_process or PhyloEM. trait, and each column is a tip. The column names are checked against the tip names of the tree. incidence.matrix.full. Can be specified to avoid extra computations.

...

for a PhyloEM object, further arguments to be passed on to params_process.PhyloEM (to choose which parameters to extract from the results, see documentation of this function).

Value

The log likelihood of the data with the provided parameters on the tree.

Sample equally spaced edges.

Description

sample_edges_intervals samples K shifts, each in one of the K intervals regularly spaced on the height of the tree.

Usage

sample_edges_intervals(tree, K)

Arguments

tree

: input tree

K

: number of edges to be sampled.

Details

In case where the tree is not ultrametric, its "height" is defined as the minimum tip height.

Value

vector of edges

Sample shifts edges in a parsimonious way.

Description

sample_shifts_edges attempts to find K shifts in the tree which allocations are parsimonious. The actual generation of edges is done by function sample_edges_intervals.

Usage

sample_shifts_edges(tree, K, part.list = enumerate_tips_under_edges(tree))

Arguments

tree

: input tree

K

: number of edges to be sampled.

Details

This function uses function enumerate_tips_under_edges to generate a list of tips under each edge, and function check_parsimony to check for parsimony of a given solution, under the assumption of an "infinite site model".

Value

vector of edges

Scale variance and selection strength from a linear transform

Description

Used for process equivalencies on re-scaled trees.

Usage

scale_params(params, f)

Arguments

params

Parameters list

f

Factor of the linear transform. If t' = f * t, the function takes parameters from phylo' back to phylo.

Value

re-scaled parameters

Segmentation in the BM case

Description

segmentation.BM performs the segmentation algorithm described.

Usage

segmentation.BM(nbr_of_shifts, costs0, diff_exp)

Arguments

nbr_of_shifts

Number of shifts on the phylogeny allowed

costs0

Cost of each edge

diff_exp

Difference of expectations

Details

This function takes the largest values of costs0, and make them null, thanks to delta and tau.

Value

List containing : edges_max : array of nbr_of_shifts edges where costs0 is maximal. shifts:list containing the computed tau and delta

Segmentation in the OU special case, using lasso regression

Description

segmentation.OU.specialCase.lasso performs the segmentation using a lasso regression to select for the edges where the shifts are added.

Usage

segmentation.OU.specialCase.lasso(
  phylo,
  nbr_of_shifts,
  D,
  Xp,
  penscale = rep(1, (nrow(phylo$edge) + 1)),
  ...
)

Arguments

phylo

a phylogenetic tree

nbr_of_shifts

Number of shifts on the phylogeny allowed

Details

This function re-write the sum of costs to be minimized as a least squares regression problem, and uses a lasso regression to solve it. It uses functions incidence.matrix.full to express the problem as a linear model.

Value

List containing : beta_0 : the optimal value at the root shifts : list containing the computed tau and delta costs : vector of costs

Segmentation in the OU special case, conserving the same shifts position.

Description

segmentation.OU.specialCase.same_shifts keeps the same shifts position, and optimize the sum of costs using function best_scenario

Usage

segmentation.OU.specialCase.same_shifts(phylo, shifts_old, D, Xp, ...)

Arguments

phylo

a phylogenetic tree

shifts_old

the previous list of shifts (only position is used)

Details

This is the best move if keeping the previous shifts positions.

Value

List containing : beta_0 : the optimal value at the root shifts : list containing the computed tau and delta costs : vector of costs

Compute the matrix of shifts.

Description

shifts.list_to_matrix takes the list description of the shifts to give the matrix representation of the shifts : the b th element of the lth line has the value of the shift on character l occurring on that branch b

Usage

shifts.list_to_matrix(phy, shifts, p = nrow(shifts$values))

Arguments

phy

Input tree.

shifts

list description of the shifts : shifts$edges, shifts$values.

p

number of traits (optional, needed when shifts = NULL).

Value

Matrix p x Nedge of length nbranch.

Compute the vector of shifts.

Description

shifts.list_to_vector takes the list description of the shifts to give the vectorial representation of the shifts : the b th element of the vector has the value of the shift occurring on that branch b.

Usage

shifts.list_to_vector(phy, shifts)

Arguments

phy

Input tree.

shifts

: list description of the shifts : shifts$edges, shifts$values

Value

Vector of length nbranch.

Compute the list of shifts.

Description

shifts.matrix_to_list takes the vectorial description of the shifts to create the list description of the shifts.

Usage

shifts.matrix_to_list(delta)

Arguments

delta

matrix description of the shift.

Value

List describing shifts.

Compute the list of shifts.

Description

shifts.vector_to_list takes the vectorial description of the shifts to create the list description of the shifts.

Usage

shifts.vector_to_list(delta)

Arguments

delta

: vector description of the shift.

Value

Vector of length nbranch.

Simmap format mapping from list of edges

Description

shifts_to_simmap takes a vector of edges where the shifts occur, and return a simmap formatted tree, mapped with corresponding regimes.

Usage

shifts_to_simmap(tree, shifts_edges)

Arguments

tree

input tree in phylo format

shifts_edges

shifts positions on the edges

Details

Ancestral state is always 0, and other states are consecutive integers.

Value

tree a simmap object

Simulate a Stochastic Process on a tree

Description

simulate simulate a stochastic process on a tree.

Usage

simul_process(x, ...)

## S3 method for class 'params_process'
simul_process(
  x,
  phylo,
  simulate_random = TRUE,
  checks = TRUE,
  U_tree = NULL,
  times_shared = NULL,
  ...
)

## S3 method for class 'PhyloEM'
simul_process(
  x,
  simulate_random = TRUE,
  checks = TRUE,
  U_tree = NULL,
  times_shared = NULL,
  ...
)

Arguments

x

an object of class params_process or PhyloEM.

...

for a PhyloEM object, further arguments to be passed on to params_process.PhyloEM (to choose which parameters to extract from the results, see documentation of this function).

phylo

a phylogenetic tree, class phylo.

simulate_random

set to FALSE if only the expected values are needed (and not the random sample). Default to TRUE.

checks

whether to check the entry parameters for consistency. Default to TRUE.

U_tree

optional, full incidence matrix of the tree, result of function incidence.matrix.full. Can be specified to avoid extra computations.

times_shared

optional, times of shared ancestry of all nodes and tips, result of function compute_times_ca. Can be specified to avoid extra computations.

Value

An S3 object of class simul_process. This contains:

sim_traits: an array with dimensions p x Nnode x 2 (BM) or p x Nnode x 3 (OU). For each trait t, 1 <= t <= p, sim_traits[t, , ] has tree columns, containing respectively the simulated state, expected value and optimal value for all the nodes.
phylo: the phylogenetic tree used for the simulations (class phylo).
params: the parameters used for the simulations (class params_proces).

Methods (by class)

simul_process(params_process): params_process object
simul_process(PhyloEM): PhyloEM object

Simulate a Stochastic Process on a tree

Description

simulate_internal simulate a stochastic process on a tree.

Usage

simulate_internal(
  phylo,
  process = c("BM", "OU", "scOU", "OUBM", "StudentOU"),
  p = 1,
  root.state = list(random = FALSE, stationary.root = FALSE, value.root = NA, exp.root =
    NA, var.root = NA),
  shifts = list(edges = NULL, values = NULL, relativeTimes = NULL),
  eps = 10^(-6),
  selection.strength = NULL,
  variance = NULL,
  optimal.value = NULL,
  checks = TRUE,
  simulate_random = TRUE,
  U_tree = NULL,
  times_shared = NULL,
  df = 1
)

Arguments

phylo

a phylogenetic tree, class phylo.

process

The model used for the simulation. One of "BM" (for a full BM model, univariate or multivariate); "OU" (for a full OU model, univariate or multivariate); or "scOU" (for a "scalar OU" model).

p

Dimension of the simulated trait

root.state

List describing the state of the root, with:

random: random state (TRUE) or deterministic state (FALSE)
value.root: if deterministic, value of the character at the root
exp.root: if random, expectation of the character at the root
var.root: if random, variance of the character at the root (pxp matrix)

shifts

List with position and values of the shifts :

edges: vector of the K id of edges where the shifts are
values: matrix p x K of values of the shifts on the edges (one column = one shift)
relativeTimes: vector of dimension K of relative time of the shift from the parent node of edges

eps

Tolerance for the value of the norm 1 of the selection strength matrix for OU

selection.strength

Matrix of selection strength size p x p (OU)

variance

Variance-covariance matrix size p x p

optimal.value

Vector of p optimal values at the root (OU)

checks

whether to check the entry parameters for consistency. Default to TRUE.

simulate_random

set to FALSE if only the expected values are needed (and not the random sample). Default to TRUE.

U_tree

optional, full incidence matrix of the tree, result of function incidence.matrix.full.

times_shared

optional, times of shared ancestry of all nodes and tips, result of function compute_times_ca. Can be specified to avoid extra computations.

df

if the process is "StudentOU", the number of degree of freedom of the chosen student law. default to 1.

Value

paramSimu An array with dimensions p x Nnode x 2 (BM) or p x Nnode x 3 (OU). For each trait t, 1 <= t <= p, paramSimu[t, , ] has tree columns, containing respectively the simulated state, expected value and optimal value for all the nodes.

Split independent parameters into a list of parameters

Description

split_params_independent split a params object for a process with p independent traits into p params objects. The reverse operation is done by merge_params_independent

Usage

split_params_independent(params)

Arguments

params

parameters

Value

A list of p parameters

Sum on all subsets.

Description

sum_partitions returns the sum on all I subset of [1,p], with |I|>m.

Usage

sum_partitions(A, N, K, p, m)

Arguments

A

a matrix with p rows and K column. Each row contains the number of marked partition in 1<=k<=K groups for one of the p children of a given node.

N

a matrix with p rows and K column. Each row contains the number of partition in 1<=k<=K groups for one of the p children of a given node

K

an integer. The number of groups wanted.

p

an integer. The number of daughters of a node.

m

an integer. The minimum cardinal of a subset allowed.

Details

This function applies sum_partitions.cardFixed to all integer between m and p, and sum the results.

Value

double : the result of the sum.

Sum on subsets of a given cardinal.

Description

sum_partitions.cardFixed returns the sum on I subset of [1,p], |I| fixed, of the sums computed by sum_prod.comb.

Usage

sum_partitions.cardFixed(A, N, K, p, cardI)

Arguments

A

a matrix with p rows and K column. Each row contains the number of marked partition in 1<=k<=K groups for one of the p children of a given node.

N

a matrix with p rows and K column. Each row contains the number of partition in 1<=k<=K groups for one of the p children of a given node

K

an integer. The number of groups wanted.

p

an integer. The number of daughters of a node.

cardI

an integer. The cardinal of the subset wanted.

Details

This function uses combn to enumerate all the subsets I of [1,p] of a given cardinal, and then performs the wanted sum on these subsets.

Value

double : the result of the sum.

Sum on a simplex

Description

sum_prod.comb returns the sum on k_1+...+k_p=K+|I|-1, k_i>0 of the products of prod(A[i,k_i], i in I)*prod(N[i,k_i], i not in I).

Usage

sum_prod.comb(I, A, N, K, p)

Arguments

I

a vector of integers representing a subset of [1,p]

A

a matrix with p rows and K column. Each row contains the number of marked partition in 1<=k<=K groups for one of the p children of a given node.

N

a matrix with p rows and K column. Each row contains the number of partition in 1<=k<=K groups for one of the p children of a given node

K

an integer. The number of groups wanted.

p

an integer. The number of daughters of a node.

Details

This function uses sum_simplex to perform the wanted sum on a ad-hoc matrix, combination of rows of A and N.

Value

double : the result of the sum.

Sum on a simplex

Description

sum_simplex returns the sum on k_1+...+k_p=K, k_i>0 of the products of NN[i,k_i].

Usage

sum_simplex(NN, K, p)

Arguments

NN

a matrix with p rows and K column. Each row contains the number of partition in 1<=k<=K groups for one of the p children of a given node

K

an integer. The number of groups wanted.

p

an integer. The number of daughters of a node.

Details

This function uses xsimplex to perform the product of NN[i,k_i] for all combination of k_i such that k_1+...+k_p=K, k_i>0, using function prod_index. Then sum all the products.

Value

double : the result of the sum.

Test state of root.

Description

test.root.state test whether the parameters of root.state given by the user are coherent. If not, it returns a new corrected list to define root.state.

Usage

test.root.state(root.state, process = c("BM", "OU", "scOU", "OUBM"), ...)

Arguments

root.state

A list giving the root state

process

"BM", "OU" or "scOU"

...

parameters of the process (if OU)

Details

To test coherence, the following priorities are applied: random > stationary.root > values.root = exp.root = var.root

Value

Coherent list root.state.

Transform branch length for a re-scaled BM

Description

Re-scale the branch length of the tree so that a BM running on the new tree produces the same observations at the tips than an OU with parameter alpha.

Usage

transform_branch_length(phylo, alp)

Arguments

phylo

A phylogenetic tree of class phylo, with branch lengths.

alp

Value of the selection strength.

Value

phylo The same phylogenetic tree, with transformed branch lengths.

Transform the shift values

Description

transform_shifts_values takes the shifts generating a given expectation structure given an OU with alpha = from, and gives back the equivalent shifts values that produce the same structure with an OU with alpha = to. If from or to is 0, then the process is supposed to be a BM.

Usage

transform_shifts_values(shifts, from = 0, to, phylo)

Arguments

shifts

the shifts on the original process

from

alpha value of the original process. If equals 0, then the original process is taken to be a BM.

to

alpha value of the destination process

phylo

the phylogenetic tree (un-scaled)

Update function for regime allocation.

Description

update.allocate_regimes_from_shifts computes the regime of a daughter node, knowing the regime at the parent node and the vector of shifts positions

Usage

update.allocate_regimes_from_shifts(edgeNbr, ancestral, shifts_edges, ...)

Arguments

edgeNbr

: Number of the edge considered

ancestral

: regime of the parent node

shifts_edges

positions on edges

Details

This function is used in function allocate_regimes_from_shifts and is designed to furnish function recursionDown with the right structure of data.

Value

regime of the daughter node.

Update function for optimal value computation

Description

update.compute_betas_from_shifts computes the optimal value at a daughter node, knowing the optimal value at the parent node and the vector of shifts.

Usage

update.compute_betas_from_shifts(edgeNbr, ancestral, shifts, ...)

Arguments

edgeNbr

: Number of the edge considered

ancestral

: Computed vector for the parental node

shifts

position and values of the shifts

Details

This function is used in function compute_betas_from_shifts and is designed to furnish function recursionDown with the right structure of data.

Value

Updated matrix of size (Nnode + ntaxa)x1.

Actualization of the enumeration.

Description

update.enumerate_parsimony is used in function enumerate_parsimony, and compute the solution for the parent node, given its children.

Usage

update.enumerate_parsimony(
  daughters,
  daughtersParams,
  parent,
  cost,
  clus,
  pos,
  ...
)

Arguments

daughters

vector of daughters nodes.

daughtersParams

list with length(daughters) entries, each entry being a list of k matrices representing the possible allocations starting from daughter.

parent

the parent node.

Details

This function takes a list with L entries corresponding to the children of a node, and compute, for all the regimes, the possible allocations starting with parent node in that regime. It uses functions select.matrices to select the possible states of the children, and matrix_of_possibles to find the possible states.

Value

A list of size nclus, each entry being a matrix representing the possible allocations starting with node parent in state k.

Update function for incidence matrix

Description

update.incidence.matrix updates the matrix initialized in init.incidence.matrix for the computation of the incidence matrix in incidence.matrix.

Usage

update.incidence.matrix(daughtersParams, ...)

Arguments

daughtersParams

: rows of updated matrix corresponding to the daughters of the current node.

Details

A node belongs to all the lineages of its daughters.

Value

Vector of length ntaxa, indicating to which lineages the branch above the current node belongs to.

Update function for incidence matrix

Description

update.incidence.matrix.full updates the matrix initialized in init.incidence.matrix.full for the computation of the incidence matrix in incidence.matrix.full.

Usage

update.incidence.matrix.full(daughtersParams, parent, ...)

Arguments

daughtersParams

: rows of updated matrix corresponding to the daughters of the current node.

Details

A node belongs to all the lineages of its daughters.

Value

Vector of length ntaxa + Nnode, indicating to which lineages the branch above the current node belongs to.

Actualization for parsimonyCost.

Description

update.parsimonyCost compute the line vector of a parent node, given the vectors of its daughters.

Usage

update.parsimonyCost(daughtersParams, ...)

Arguments

daughtersParams

a (ndaughters) x (nclus) matrix with the line vectors of the cost for the daughters node.

Details

This function computes the cost of putting the parent in a state k, as the minimum number of shifts needed to get the given clustering of the trees bellow parental node.

Value

A line vector corresponding to the parent node.

Actualization for parsimonyNumber.

Description

update.parsimonyNumber compute the line vector of a parent node, given the vectors of its daughters.

Usage

update.parsimonyNumber(daughters, daughtersParams, cost, ...)

Arguments

daughters

the identifiers of the daughters nodes.

daughtersParams

a ndaughters x (nclus) matrix with the line vectors of the number of solutions for the daughters nodes.

cost

the (ntaxa + Nnode) x nclus matrix of costs (computed by parsimonyCost).

Details

This function uses function compute_state_filter to find all the admissible states of the daughters, given a starting state for the parent.

Value

A line vector corresponding to the parent node.

Update formula in the general case

Description

update.partitionsNumber.gen apply the actualization formula to get Nk and Ak of a node given its daughters.

Usage

update.partitionsNumber.gen(daughtersParams, ...)

Arguments

daughtersParams

: matrix with 2*npart columns. Each row contains the result of partitionsNumber for the children of a given node

Details

Uses functions sum_partitions and sum_simplex to compute the needed sums.

Value

vector of size 2*npart. For k=1:npart it contains the number of partitions with k groups compatible with the sub tree starting at the current node ; and for k=(npart+1):2*npart, it contains the number of "marked" partitions with (k-npart) groups compatible with the sub tree starting at the current node.

Wrapper for E step in EM

Description

wrapper_E_step is used in the EM algorithm. It calls itself recursively in case of independent parameters.

Usage

wrapper_E_step(
  phylo,
  times_shared,
  distances_phylo,
  process,
  params_old,
  masque_data,
  F_moments,
  independent,
  Y_data_vec_known,
  miss,
  Y_data,
  U_tree,
  compute_E
)

Model Estimation with Detection of Shifts

Description

Usage

Arguments

Details

Value

See Also

Examples

Add several entries, when only one is not NA.

Description

Usage

Arguments

Details

Value

Allocation of regimes to nodes.

Description

Usage

Arguments

Value

Allocation of shifts to edges

Description

Usage

Arguments

Value

Iteration allocation

Description

Usage

Arguments

Value

Check selection strength

Description

Usage

Arguments

Details

Value

Test the format of data entry.

Description

Usage

Arguments

Value

Check dimensions of the parameters

Description

Usage

Arguments

Value

Check Parsimony, assuming no homoplasy

Description

Usage

Arguments

Details

Value

Check whether an allocation of the shifts is parsimonious, in the "infinite site model".

Description

Usage

Arguments

Details

Value

Check range of alpha

Description

Usage

Arguments

Clustering associated to a shift allocation, assuming no homoplasy.

Description

Usage

Arguments

Details

Value

E step

Description

Usage

Arguments

Details

Value

Compute Matrix W of actualization (Ultrametric case)

Description

Usage

Arguments

Details

Value

Computation of the optimal values at nodes and tips.