Meta-analysis often reveals substantial heterogeneity among effect estimates included in the analysis (Senior et al., 2016). While many meta-analysts are interested in ‘overall mean effects’ we need to couch these effects in context by reporting on their variability and work hard to understand what factors drive effect variability and why (Lajeunesse, 2010; Noble et al., 2022). For meta-analyses in comparative physiology, explaining variation in effects should probably be the main goal of every analysis. Sampling variance often explains only a little amount of the total heterogeneity, so it’s important that we think hard, a priori, about what factors are likely to explain effect size variation.
As already indicated, some variation in effects could be explained by nuisance heterogeneity, which may or may not be of interest. Other sources of variability could be methodological (Noble et al., 2022). Do different chemicals that result in oxidative stress induce stronger or weaker responses? Understanding how methodological choices impact upon effect sizes can change how research is done in a field.
While methodological moderators are important biological variables are usually our main interest. We might be interested in understanding how much variation among species exist or how different life-histories, body sizes, ecological niches, feeding habits or thermal strategies (e.g., ectotherms vs endotherms) differ in their response to some treatment or the association between two variables.
Exploring population-level / average changes in effect size magnitude (and direction) as a function of ‘moderators’ (what are typically called predictors or fixed effects in other literature) is a critical aspects of meta-analysis in ecology and evolution. The models used to test how average effects change as some function of a moderator is called meta-regression.
We can extend our multi-level meta-analytic model, which only has an intercept, to include moderators quite easily:
\[ y_{i} = \mu + \sum_{i = 1}^{N_{m}}\beta_{m}x_{m} + s_{j[i]} + spp_{k[i]} + e_{i} + m_{i} \\ m_{i} \sim N(0, v_{i}) \\ s_{j} \sim N(0, \tau^2) \\ s_{k} \sim N(0, \sigma_{k}^2) \\ e_{i} \sim N(0, \sigma_{e}^2) \]
Here, \(\beta_{m}\) is the slope (or contrast) for the m th moderator (where m = 1, …, \(N_{moderator}\) or \(N_{m}\)). We can include different types of continuous and categorical moderators in our model. Some of these moderators might control for nuisance hetereogenity, others might be to understand how different experimental designs affect the magnitude and direction of mean effect size and some may simply be biological in nature – they might compare ectotherms to endotherms, for example.
We’ll turn again to the study by Wu and Seebacher (2022). We can run the same phylogenetic meta-analysis that we did in the Phylogenetic meta-analytic models tutorial, but this time we can include moderators or predictors that we think will explain some variation in effects. From the original study aim, the first thing Wu and Seebacher (2022) were interested in is the relationship of physiology with activity, exploration and dispersal (Zr). In addition to the random effects – es_ID
(effect size identity), study_ID
(unique study identity), species
(unique species identity), and species_OTL
(phylogenetic relatedness), they were also interested in comparing whether the correlation (or Zr) differences depending on whether the behavioural trait of interest was activity, exploration or dispersal. As such, a phylogenetic meta-regression model was fit with a moderator disp_trait
(activity, exploration, or dispersal) to explicitly test whether Zr varied across these effect types.
We’ll start again by loading the raw data and reconstructing the phylogenetic tree, which will be used to control for shared evolutionary history. The other random effects are already in the dataset.
# Load packages
pacman::p_load(rotl, ape, tidyverse, devtools, R.rsp)
# Download the orchaRd package
devtools::install_github("daniel1noble/orchaRd", force = TRUE, build_vignettes = TRUE)
library(orchaRd)
zr_data <- read.csv("https://raw.githubusercontent.com/daniel1noble/meta-workshop/gh-pages/data/ind_disp_raw_data.csv") %>%
dplyr::select(-dispersal_def) %>% # remove irrelevant columns
tibble::rowid_to_column("es_ID") %>% # add effect size ID
dplyr::mutate(species_OTL = stringr::str_replace_all(species_OTL, "_", " ")) # remove underscore for some species with missing underscore. Some inconsistency when curating the database.
# Calculate Zr and sampling variance
zr_data <- metafor::escalc(measure = "ZCOR", ri = corr_coeff, ni = sample_size, data = zr_data, var.names = c("Zr", "V_Zr"))
# Create species list based on OTL names and match with OTL database
species <- sort(unique(as.character(zr_data$species_OTL))) # generate list of species (as character format)
taxa <- rotl::tnrs_match_names(names = species) # match taxonomic names to the OTL
# Retrieving phylogenetic relationships among taxa in the form of a trimmed sub-tree
tree <- rotl::tol_induced_subtree(ott_ids = ott_id(taxa), label_format = "name")
set.seed(1)
tree <- ape::compute.brlen(tree, method = "Grafen", power = 1) # compute branch lengths
tree <- ape::multi2di(tree, random = TRUE) # use a randomization approach to deal with polytomies
# Create correlation matrix for analysis
phylo_cor <- vcv(tree, cor = T)
We can now fit our multi-level meta-regression model including the disp_trait
moderator:
Note: Remember to add an underscore to the species OTL because the tree tip names have underscores and these need to match exactly.
overall_model <- metafor::rma.mv(yi = Zr, V = V_Zr, mod = ~-1 + disp_trait, random = list(~1 |
study_ID, ~1 | es_ID, ~1 | species, ~1 | species_OTL), R = list(species_OTL = phylo_cor),
method = "REML", test = "t", dfs = "contain", data = zr_data %>%
dplyr::mutate(species_OTL = stringr::str_replace_all(species_OTL, " ", "_")))
summary(overall_model)
##
## Multivariate Meta-Analysis Model (k = 272; method: REML)
##
## logLik Deviance AIC BIC AICc
## -159.7703 319.5405 333.5405 358.7035 333.9696
##
## Variance Components:
##
## estim sqrt nlvls fixed factor R
## sigma^2.1 0.0071 0.0842 88 no study_ID no
## sigma^2.2 0.1181 0.3436 272 no es_ID no
## sigma^2.3 0.0164 0.1279 79 no species no
## sigma^2.4 0.0000 0.0000 74 no species_OTL yes
##
## Test for Residual Heterogeneity:
## QE(df = 269) = 2450.9991, p-val < .0001
##
## Test of Moderators (coefficients 1:3):
## F(df1 = 3, df2 = 269) = 3.0408, p-val = 0.0295
##
## Model Results:
##
## estimate se tval df pval ci.lb ci.ub
## disp_traitActivity 0.1129 0.0432 2.6129 269 0.0095 0.0278 0.1980
## disp_traitDispersal 0.1185 0.0796 1.4882 269 0.1379 -0.0383 0.2752
## disp_traitExploration 0.0479 0.0569 0.8421 269 0.4005 -0.0641 0.1599
##
## disp_traitActivity **
## disp_traitDispersal
## disp_traitExploration
##
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Is there a relationship between physiology with either activity, exploration, and dispersal?
We can see from above that the model estimates a positive relationship between physiology and activity with an overall mean of 0.113 with a 95% confidence interval (95% CI) of 0.028 to 0.198. Exploration and dispersal was also, on average, positively associated with physiology, but exploration had higher level of uncertainty (Exploration = 0.048 [95% CI -0.064, 0.16], Dispersal = 0.118, [95% CI -0.038, 0.275]). Heterogeneity remained high in this analysis based on the Q-test.
You might have noticed that we suppressed the intercept in this model (i.e., -1
). This is very typical in meta-analysis because we are interested in the overall mean effect size in each of the three levels of the moderator. However, you may instead want to explicitly compare whether there are significant differences in Zr between categories. To estimate these contrasts you can add the intercept back into the model and the resulting parameters are now contrasts between the intercept (in this case Activity
) and Dispersal
and Exploration
.
overall_model <- metafor::rma.mv(yi = Zr, V = V_Zr, mod = ~disp_trait, random = list(~1 |
study_ID, ~1 | es_ID, ~1 | species, ~1 | species_OTL), R = list(species_OTL = phylo_cor),
method = "REML", test = "t", dfs = "contain", data = zr_data %>%
dplyr::mutate(species_OTL = stringr::str_replace_all(species_OTL, " ", "_")))
summary(overall_model)
##
## Multivariate Meta-Analysis Model (k = 272; method: REML)
##
## logLik Deviance AIC BIC AICc
## -159.7703 319.5405 333.5405 358.7035 333.9696
##
## Variance Components:
##
## estim sqrt nlvls fixed factor R
## sigma^2.1 0.0071 0.0842 88 no study_ID no
## sigma^2.2 0.1181 0.3436 272 no es_ID no
## sigma^2.3 0.0164 0.1279 79 no species no
## sigma^2.4 0.0000 0.0000 74 no species_OTL yes
##
## Test for Residual Heterogeneity:
## QE(df = 269) = 2450.9991, p-val < .0001
##
## Test of Moderators (coefficients 2:3):
## F(df1 = 2, df2 = 269) = 0.5380, p-val = 0.5846
##
## Model Results:
##
## estimate se tval df pval ci.lb ci.ub
## intrcpt 0.1129 0.0432 2.6129 71 0.0110 0.0267 0.1991
## disp_traitDispersal 0.0056 0.0901 0.0619 269 0.9507 -0.1718 0.1829
## disp_traitExploration -0.0650 0.0658 -0.9885 269 0.3238 -0.1945 0.0645
##
## intrcpt *
## disp_traitDispersal
## disp_traitExploration
##
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We can see that the model coefficients have changed. The intrcpt
now takes on the mean Zr for Activity
. In other words, this is the mean Zr for the correlations between physiology and activity. As such, this is the same value as the previous model. For the intrcpt
, the null hypothesis being tested is whether the mean differs significantly from zero, which it does. In contrast, disp_traitDispersal
and disp_traitExploration
are no longer meta-analytic means for Dispersal
and Exploration
as in the previous model. Rather, they are now contrasts, or differences, between the mean Zr for Activity
and the mean Zr for Dispersal
and Exploration
. They are now testing the null hypothesis that the difference between Activity and Dispersal (i.e., disp_traitDispersal
) and Activity and Exploration (i.e., disp_traitExploration
) are different from zero.
Does the mean Zr for Exploration and Dispersal differ from Activity?
We can see that the mean Z-transformed correlation (Zr) between physiology and activity do not differ significantly from Zr between Activity and Dispersal and Activity and Exploration. The model also explains only 0.006% (\(R_{marginal}^2\)) of the variation, so we can be confident that there is little predictive power from this moderator.
We can also visually depict the differences in r between these groups using an orchard plot (Nakagawa et al., 2021) after back-transforming from Zr:
orchaRd::orchard_plot(overall_model, group = "study_ID", mod = "disp_trait", data = zr_data,
xlab = "Correlation Coefficient (r)", transfm = "tanh", angle = 45)
R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
attached base packages: stats, graphics, grDevices, utils, datasets, methods and base
other attached packages: R.rsp(v.0.44.0), devtools(v.2.4.3), usethis(v.2.1.5), ape(v.5.6-2), rotl(v.3.0.11), magick(v.2.7.2), vembedr(v.0.1.5), equatags(v.0.1.0), mathjaxr(v.1.6-0), pander(v.0.6.4), orchaRd(v.2.0), forcats(v.0.5.1), stringr(v.1.4.0), dplyr(v.1.0.9), purrr(v.0.3.4), readr(v.2.1.2), tidyr(v.1.2.0), tibble(v.3.1.7), ggplot2(v.3.3.6), tidyverse(v.1.3.1), flextable(v.0.6.6), metafor(v.3.4-0), metadat(v.1.3-0) and Matrix(v.1.3-4)
loaded via a namespace (and not attached): readxl(v.1.4.0), uuid(v.1.1-0), backports(v.1.4.1), systemfonts(v.1.0.2), splines(v.4.0.3), rncl(v.0.8.4), TH.data(v.1.0-10), digest(v.0.6.29), htmltools(v.0.5.2), fansi(v.1.0.3), magrittr(v.2.0.3), memoise(v.2.0.1), tzdb(v.0.3.0), remotes(v.2.4.2), modelr(v.0.1.8), R.utils(v.2.11.0), xslt(v.1.4.3), officer(v.0.3.18), sandwich(v.3.0-1), prettyunits(v.1.1.1), colorspace(v.2.0-3), rvest(v.1.0.2), rappdirs(v.0.3.3), haven(v.2.5.0), xfun(v.0.31), callr(v.3.7.0), crayon(v.1.5.1), jsonlite(v.1.8.0), survival(v.3.2-11), zoo(v.1.8-10), glue(v.1.6.2), gtable(v.0.3.0), emmeans(v.1.7.5), R.cache(v.0.15.0), pkgbuild(v.1.3.1), rentrez(v.1.2.3), scales(v.1.2.0), mvtnorm(v.1.1-3), DBI(v.1.1.3), Rcpp(v.1.0.8.3), xtable(v.1.8-4), progress(v.1.2.2), klippy(v.0.0.0.9500), latex2exp(v.0.9.4), httr(v.1.4.3), ellipsis(v.0.3.2), pkgconfig(v.2.0.3), XML(v.3.99-0.9), R.methodsS3(v.1.8.1), farver(v.2.1.0), sass(v.0.4.1), dbplyr(v.2.2.1), utf8(v.1.2.2), tidyselect(v.1.1.2), labeling(v.0.4.2), rlang(v.1.0.3), munsell(v.0.5.0), cellranger(v.1.1.0), tools(v.4.0.3), cachem(v.1.0.6), cli(v.3.3.0), generics(v.0.1.2), pacman(v.0.5.1), broom(v.1.0.0), evaluate(v.0.15), fastmap(v.1.1.0), yaml(v.2.3.5), processx(v.3.6.1), knitr(v.1.39), fs(v.1.5.2), zip(v.2.2.0), nlme(v.3.1-152), formatR(v.1.11), R.oo(v.1.24.0), xml2(v.1.3.3), brio(v.1.1.3), compiler(v.4.0.3), rstudioapi(v.0.13), beeswarm(v.0.4.0), curl(v.4.3.2), testthat(v.3.1.4), reprex(v.2.0.1), bslib(v.0.3.1), stringi(v.1.7.6), highr(v.0.9), ps(v.1.7.1), desc(v.1.4.1), gdtools(v.0.2.3), lattice(v.0.20-44), vctrs(v.0.4.1), pillar(v.1.7.0), lifecycle(v.1.0.1), jquerylib(v.0.1.4), locatexec(v.0.1.1), estimability(v.1.3), data.table(v.1.14.2), R6(v.2.5.1), bookdown(v.0.22), vipor(v.0.4.5), sessioninfo(v.1.2.2), codetools(v.0.2-18), MASS(v.7.3-54), assertthat(v.0.2.1), pkgload(v.1.2.4), rprojroot(v.2.0.3), withr(v.2.5.0), multcomp(v.1.4-17), parallel(v.4.0.3), hms(v.1.1.1), grid(v.4.0.3), coda(v.0.19-4), rmarkdown(v.2.14), lubridate(v.1.8.0), base64enc(v.0.1-3) and ggbeeswarm(v.0.6.0)