Introduction to Phylogenetic Meta-analysis

Meta-analyses in comparative physiology almost always synthesise effects across a multitude of species (Chamberlain et al., 2012; Hadfield, 2010; Nakagawa and Santos, 2012). Species share an evolutionary history, which is described by a phylogeny (Chamberlain et al., 2012; Gurevitch and Hedges, 1999; Lajeunesse, 2009). As a result, the samples (and the effect sizes obtained from these samples) are not independent, which violates the independence assumption underlying conventional meta-analytic models. For example, the standard fixed- and random-effects models often used for ecological meta-analyses (Lajeunesse, 2009; Nakagawa and Santos, 2012) assume independence among the effect sizes and therefore do not account for phylogeny (Chamberlain et al., 2012; Noble et al., 2017). Including a ‘species-level’ random effect will not deal with this type of non-independence, because the species ‘effects’ are themselves correlated according to the amount of evolutionary history that each pair of species shares (see the Complex Non-independence Tutorial).

Recent simulations by Cinar et al. (2021) emphasise the importance of modelling phylogeny in meta-analytic models. Even when there is little phylogenetic signal in the data, including a phylogeny will not compromise the analysis, but it will protect you against Type I errors. In this tutorial, we extend our multilevel meta-analytic models to account for phylogenetic relatedness. Part of the challenge in controlling for phylogeny in a meta-analysis is that the taxa included are often highly diverse, and there is little resolution on the exact phylogenetic relationships among them. Having said that, there are tools that will probably do a fairly good job of delineating these relationships, and we show how to use them to build a rough phylogeny that can be included in the model.

Formal Definition of the Phylogenetic Meta-analytic Model

Recall our multilevel meta-analytic model we previously discussed:

\[ y_{i} = \mu + s_{j[i]} + spp_{k[i]} + e_{i} + m_{i} \\ m_{i} \sim N(0, v_{i}\textbf{I}) \\ s_{j} \sim N(0, \tau^2\textbf{I}) \\ spp_{k} \sim N(0, \sigma_{k}^2\textbf{I}) \\ e_{i} \sim N(0, \sigma_{e}^2\textbf{I}) \]

Again, \(y_{i}\) is the ith effect size estimate, \(\mu\) is the overall meta-analytic mean, and \(m_{i}\) is the sampling error for effect size i (the deviation of \(y_{i}\) from its true effect), with known sampling variance \(v_{i}\). \(e_{i}\) is the effect-size-specific (within-study) effect (that’s a mouthful!) applied to each effect size i. \(s_{j[i]}\) is the effect of study j applied to the ith effect size, where j = 1,…,\(N_{studies}\), and \(spp_{k[i]}\) is the effect of species k applied to the ith effect size, where k = 1,…,\(N_{species}\). As we discussed in the previous tutorial on non-independence, \(\textbf{I}\) is an identity matrix, denoting that the effects at each level are independent of one another.
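To make the model concrete, the sketch below generates effect sizes exactly as the equation above describes. It is a minimal illustration assuming NumPy; the function name `simulate_multilevel` and every parameter value are hypothetical. Note that the study and species effects are drawn independently of one another, which is what the \(\textbf{I}\) matrices encode.

```python
import numpy as np


def simulate_multilevel(n_studies=20, n_species=10, es_per_study=3, mu=0.3,
                        tau2=0.05, sigma2_spp=0.04, sigma2_e=0.02, seed=1):
    """Simulate y_i = mu + s_j[i] + spp_k[i] + e_i + m_i (all values hypothetical)."""
    rng = np.random.default_rng(seed)
    n = n_studies * es_per_study
    # Index vectors: the study j[i] and species k[i] of each effect size i
    study = np.repeat(np.arange(n_studies), es_per_study)
    species = rng.integers(0, n_species, size=n)
    # Study and species effects are drawn independently (covariance tau^2 I and sigma_k^2 I)
    s = rng.normal(0.0, np.sqrt(tau2), size=n_studies)
    spp = rng.normal(0.0, np.sqrt(sigma2_spp), size=n_species)
    # Effect-size-level (within-study) effects e_i
    e = rng.normal(0.0, np.sqrt(sigma2_e), size=n)
    # Sampling variances v_i (treated as known) and sampling errors m_i ~ N(0, v_i)
    v = rng.uniform(0.01, 0.1, size=n)
    m = rng.normal(0.0, np.sqrt(v))
    y = mu + s[study] + spp[species] + e + m
    return y, v, study, species


y, v, study, species = simulate_multilevel()
```

Because `spp` is drawn independently for each species, two closely related species are, in this model, no more alike than two distantly related ones.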

This model accounts for the fact that effect sizes from the same study or the same species are clustered (i.e., non-independent), because they share the same ‘effect’ sampled from the same distribution. However, it does not account for the fact that some species effects are more similar to each other than others because of the evolutionary history they share (Hadfield, 2010; Lajeunesse, 2009; Nakagawa and Santos, 2012).

To account for shared evolutionary history between species, we need to model the correlation between ‘species effects’ rather than assume, as the \(\textbf{I}\) matrix does, that they are independent. This can be done by adding a new phylogenetic random effect (\(a_{k}\)) whose correlation structure is given by an \(\textbf{A}\) matrix in place of \(\textbf{I}\):

\[ y_{i} = \mu + s_{j[i]} + spp_{k[i]} + a_{k[i]} + e_{i} + m_{i} \\ m_{i} \sim N(0, v_{i}\textbf{I}) \\ s_{j} \sim N(0, \tau^2\textbf{I}) \\ spp_{k} \sim N(0, \sigma_{k}^2\textbf{I}) \\ a_{k} \sim N(0, \sigma_{a}^2\textbf{A}) \\ e_{i} \sim N(0, \sigma_{e}^2\textbf{I}) \]

Here, \(a_{k[i]}\) is the phylogenetic effect of species k applied to effect size i. These effects are sampled from a normal distribution (denoted by N) with a mean of 0 and a variance of \(\sigma_{a}^2\), the phylogenetic variance. Unlike the other random effects, however, their correlation matrix is \(\textbf{A}\), an \(N_{species}\) by \(N_{species}\) matrix derived from the branch lengths of a phylogenetic tree. Its off-diagonal elements are non-zero, so species effects are correlated, and the correlation between two species increases with the distance from the root of the tree to their most recent common ancestor (Hadfield, 2010; Nakagawa and Santos, 2012; Noble et al., 2017).
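How \(\textbf{A}\) arises from a tree can be sketched for a toy four-species example. This is a minimal Python sketch assuming a Brownian-motion model of evolution and a hypothetical ultrametric tree ((sp1:1,sp2:1):2,(sp3:2,sp4:2):1) with branch lengths in arbitrary units; the helper names `root_path` and `phylo_corr` are illustrative only. In R, a matrix like this is usually obtained from a tree object with `ape::vcv(tree, corr = TRUE)`. Covariances are the shared root-to-MRCA path lengths, rescaled to correlations, and the final lines draw one set of phylogenetic effects \(a_{k}\).

```python
import numpy as np

# Hypothetical ultrametric tree ((sp1:1,sp2:1):2,(sp3:2,sp4:2):1).
# Each node maps to (parent node, length of the branch leading to it).
PARENT = {
    "sp1": ("n1", 1.0), "sp2": ("n1", 1.0),
    "sp3": ("n2", 2.0), "sp4": ("n2", 2.0),
    "n1": ("root", 2.0), "n2": ("root", 1.0),
}


def root_path(node, parent):
    """Return {edge: branch length} for every edge between `node` and the root."""
    edges = {}
    while node in parent:
        up, length = parent[node]
        edges[node] = length  # an edge is identified by its child node
        node = up
    return edges


def phylo_corr(tips, parent):
    """Correlation matrix A under Brownian motion, from shared root-to-MRCA path lengths."""
    paths = [root_path(t, parent) for t in tips]
    n = len(tips)
    C = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            # Branch length shared by tips i and j = root-to-MRCA distance
            C[i, j] = sum(length for edge, length in paths[i].items() if edge in paths[j])
    d = np.sqrt(np.diag(C))
    return C / np.outer(d, d)  # rescale covariances to correlations


tips = ["sp1", "sp2", "sp3", "sp4"]
A = phylo_corr(tips, PARENT)

# Draw one set of phylogenetic effects a_k ~ N(0, sigma_a^2 * A) via a Cholesky factor
sigma2_a = 0.1  # hypothetical phylogenetic variance
rng = np.random.default_rng(42)
L = np.linalg.cholesky(sigma2_a * A)
a = L @ rng.standard_normal(len(tips))
```

In this toy tree, sp1 and sp2 share 2 of their 3 units of root-to-tip path, so their correlation is 2/3; sp3 and sp4 share 1 of 3 units (1/3); and species in different clades (e.g. sp1 and sp3) share no path and are uncorrelated.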

You will notice that we have two species-level random effects in our model. We are ‘trying’ (and we may not succeed) to distinguish between species-specific effects driven by shared evolutionary history (i.e., \(a_{k}\)) and non-phylogenetic species-specific effects (i.e., \(spp_{k}\)), driven by, say, shared ecology. It may not always be possible to estimate both effects in a single model. This is worth keeping in mind, as these are data-hungry models and estimating both parameters requires a large sample size, in particular many species. Often, it is necessary to resort to some form of model selection or to use a priori judgement about which species-specific effect is most important.
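One way to see why the two species-level effects can be hard to separate is to write out the covariance among effect sizes implied by the model. Writing \(\textbf{D} = \textrm{diag}(v_{i})\) for the known sampling variances, and \(\textbf{Z}_{s}\) and \(\textbf{Z}_{sp}\) for design matrices that map each effect size to its study and species (notation introduced here for illustration), the marginal variance-covariance matrix of the effect sizes is:

\[ \textbf{V} = \textbf{D} + \tau^2\textbf{Z}_{s}\textbf{Z}_{s}^{\top} + \sigma_{k}^2\textbf{Z}_{sp}\textbf{Z}_{sp}^{\top} + \sigma_{a}^2\textbf{Z}_{sp}\textbf{A}\textbf{Z}_{sp}^{\top} + \sigma_{e}^2\textbf{I} \]

For two effect sizes from different studies, the covariance is \(\sigma_{k}^2 + \sigma_{a}^2\) if they come from the same species (the diagonal of \(\textbf{A}\) is 1), and \(\sigma_{a}^2 A_{kl}\) if they come from different species k and l. The non-phylogenetic variance \(\sigma_{k}^2\) is therefore informed only by repeated measures on the same species, whereas \(\sigma_{a}^2\) is also informed by how similar related species are. With few species, or a tree close to a star phylogeny (\(\textbf{A}\) close to \(\textbf{I}\)), the two terms become nearly indistinguishable.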

Example of a Phylogenetic Multilevel Meta-analysis