Citation

This supplement is associated with a special issue review paper and can be cited as follows:

Daniel W.A. Noble, Patrice Pottier, Malgorzata Lagisz, Samantha Burke, Szymon M. Drobniak, Rose E. O’Dea, Shinichi Nakagawa (2022) Meta-analytic approaches and effect sizes to account for ‘nuisance heterogeneity’ in comparative physiology. Journal of Experimental Biology

Introduction

In this supplement, we will demonstrate some of the principles we describe in our review paper. We’ll show readers ways they can apply some of the ‘new’ effect sizes we describe to account for nuisance heterogeneity in combination with multilevel meta-analytic models (MLMA) and/or multilevel meta-regression models. We realise that nuisance heterogeneity may be a misnomer in some senses because such forms of heterogeneity may actually be the main interest of a given meta-analysis; in which case, multilevel meta-regressions are the way to go!

We will walk through two case studies. The first shows readers how to account for known drivers of effect heterogeneity at the level of the effect size, and then how to fit a MLMA (intercept-only) model to estimate an overall meta-analytic mean. We then show readers how to interpret this mean and its associated heterogeneity. In the second case study, we use a common effect size metric and instead model the nuisance heterogeneity using a multilevel meta-regression approach. We again focus on showing readers how to interpret the effects from this model along with heterogeneity.

Throughout the supplement R code is contained within code blocks. We provide example data associated with this tutorial which readers can find on the Open Science Framework, but the code chunks should automatically download and load these for you. Importantly, all relevant code chunks have a ‘copy and paste’ button in the top right corner so that the code can be easily transferred to your own R console.

Loading Data and R Packages

Before embarking on the case studies, readers will need to load the necessary R packages. Note that the first three lines are commented out. If you don't have these packages installed, install them by removing the #.

# install.packages('pacman')
# devtools::install_github('itchyshin/orchard_plot', subdir = 'orchaRd', force = TRUE)
# devtools::install_github('daniel1noble/metaAidR')
pacman::p_load(devtools, png, jpeg, pander, cowplot, flextable, formatR, magick,
    tidyverse, metafor, orchaRd, here, osfr, magrittr, metaAidR)

Case Study 1: Multilevel Meta-analysis Approaches to Control for Nuisance Heterogeneity at the Effect Size Level

Here, we will demonstrate how to analyse variation and differences in slopes between two groups. In this example, we will account for effects of known drivers of nuisance heterogeneity at the level of the effect size. We use a case study by Pottier et al. (2021) which investigated the association between acclimation temperature, thermal tolerance and sex. We first demonstrate how to use meta-analytic approaches to investigate the variation in the relationship between acclimation temperature and thermal tolerance (i.e. variation in slope), and then demonstrate how to meta-analyse differences in slopes between two groups (i.e. sexes).

Multilevel Meta-analysis of Slopes

We first download the data from OSF and load this into R. You can do this by copying and pasting the code below.

# Download the data file from OSF
osfr::osf_retrieve_file("https://osf.io/qn2af/") %>%
    osfr::osf_download(conflicts = "overwrite")

# Load the data file
data_CS1a <- read.csv(here::here("CTmax_ARR.csv"))

Calculate Effect Size

In this section, we demonstrate how to calculate slope effect sizes and their associated sampling variances. Note that calculating slopes requires the response variable to be measured in the same units across studies, and so this approach does not apply to all meta-analyses. While this example focuses on thermal tolerance, the meta-analysis of slopes may also be applicable to investigating changes in thermal preference or metabolic scaling, for example.

In this example, we use a simplified data set from Pottier et al. (2021), and we demonstrate how to investigate the mean effect of acclimation temperature on heat tolerance. Because heat tolerance was consistently measured in degrees Celsius (CTmax) in the studies compiled, we can calculate the slope between acclimation temperature and heat tolerance, which is termed the acclimation response ratio (ARR) in the literature. This effect size is calculated as the difference between the heat tolerance at the higher (mean_high) and the lower (mean_low) acclimation temperatures, divided by the difference between acclimation temperatures (acc_temp_high - acc_temp_low). The sampling variance of the slope (Var_ARR) can then be defined using properties of variance as per Box 1 in the main manuscript.

In this example, the slopes were calculated with stepwise comparisons between acclimation treatments. This method created a source of non-independence, with some treatments being involved in multiple comparisons (shared_trt_ID). To account for this, we halved the sample sizes of the observations involved in these non-independent comparisons (n_low_adj and n_high_adj), thereby halving the weight of these estimates.

In this example, we will only use data measured on independent groups of animals, thus excluding data measured twice on the same cohort of animals (i.e. animals sharing the same shared_measurement_ID in the data). Note that it is possible to model this dependency and examples are provided in Noble et al. (2017) and Pottier et al. (2021).
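Before applying these formulas to the data set, we can sanity-check them with made-up summary statistics (all numbers below are hypothetical, purely for illustration):

```r
# Hypothetical summary statistics for one comparison
mean_low  <- 38.0; sd_low  <- 0.5; n_low  <- 10   # CTmax at low acclimation temp
mean_high <- 39.2; sd_high <- 0.6; n_high <- 10   # CTmax at high acclimation temp
acc_temp_low <- 20; acc_temp_high <- 28           # acclimation temperatures (deg C)

# Slope (ARR): change in CTmax per degree change in acclimation temperature
ARR <- (mean_high - mean_low) / (acc_temp_high - acc_temp_low)

# Sampling variance of the slope, following Box 1
Var_ARR <- (1 / (acc_temp_high - acc_temp_low))^2 *
    (sd_low^2 / n_low + sd_high^2 / n_high)

ARR      # 0.15
Var_ARR  # ~0.00095
```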

# Calculate the effect sizes
data_CS1a <- data_CS1a %>%
    mutate(ARR = (mean_high - mean_low)/(acc_temp_high - acc_temp_low), Var_ARR = ((1/(acc_temp_high -
        acc_temp_low))^2 * (sd_low^2/n_low_adj + sd_high^2/n_high_adj)))

# Remove dependent effect sizes
data_CS1a <- filter(data_CS1a, es_ID != "es85" & es_ID != "es86" & es_ID != "es87" &
    es_ID != "es98" & es_ID != "es282" & es_ID != "es283" & es_ID != "es284" & es_ID !=
    "es286")

Meta-analysis

Then, we can run the multi-level meta-analytic model using the rma.mv function in the metafor package.

In this model, we used two random effects, species_ID and es_ID, to account for between-species variation and the variation associated with the residuals. Note that it is possible to assess the variation due to phylogenetic relatedness, but we will keep it simple for demonstration purposes. We also did not assess variation due to between-study differences because the levels of this variable were almost entirely overlapping with species_ID (i.e. species_ID and study_ID were indistinguishable).

While we only present an intercept-only model, this structure can be used to fit meta-regressions by fitting moderator variables.

# Multi-level meta-analytic model
model_CS1a_MLMA <- metafor::rma.mv(ARR ~ 1, V = Var_ARR, method = "REML", random = list(~1 |
    species_ID, ~1 | es_ID), test = "t", data = data_CS1a)
print(model_CS1a_MLMA)
## 
## Multivariate Meta-Analysis Model (k = 238; method: REML)
## 
## Variance Components:
## 
##             estim    sqrt  nlvls  fixed      factor 
## sigma^2.1  0.0156  0.1248     29     no  species_ID 
## sigma^2.2  0.0078  0.0883    238     no       es_ID 
## 
## Test for Heterogeneity:
## Q(df = 237) = 6719.3999, p-val < .0001
## 
## Model Results:
## 
## estimate      se    tval   df    pval   ci.lb   ci.ub 
##   0.1332  0.0249  5.3465  237  <.0001  0.0841  0.1823  *** 
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
B <- coef(model_CS1a_MLMA)
# Calculate Prediction Intervals
PI_CS1a_MLMA <- predict(model_CS1a_MLMA)
print(PI_CS1a_MLMA)
## 
##    pred     se  ci.lb  ci.ub   pi.lb  pi.ub 
##  0.1332 0.0249 0.0841 0.1823 -0.1720 0.4384
# Calculate I2
I2_CS1a <- orchaRd::i2_ml(model_CS1a_MLMA)
print(I2_CS1a)
##      I2_total I2_species_ID      I2_es_ID 
##          0.98          0.65          0.33
# Plot the results
orchard_plot(model_CS1a_MLMA, mod = "Int", xlab = "ARR")

We can see from above that the model estimates an overall meta-analytic mean of 0.13 with a 95% CI from 0.08 to 0.18. In other words, for each degree change in acclimation temperature, the heat tolerance of the animals included in this data set changes by 0.13 degrees on average. Based on the prediction interval, 95% of the time we expect effect sizes from comparable studies to fall between -0.17 and 0.44.

Our prediction intervals suggest that heterogeneity is high, which is reflected in our \(I_{total}^2\) estimate of 98.09%. Approximately 65.39% of the total heterogeneity is the result of differences between species, and 32.7% is associated with the residuals. With this effect size (slope), the nuisance heterogeneity is accounted for, to some extent, at the level of the effect size.
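To make the link between the variance components and the prediction interval explicit, the PI reported by predict() above can be reconstructed by hand. A minimal base-R check, using the rounded values from the printed model output (so the result is only approximate):

```r
# Values taken from the printed model output above (rounded)
b      <- 0.1332                                     # meta-analytic mean
se_b   <- 0.0249                                     # its standard error
sigma2 <- c(species_ID = 0.0156, es_ID = 0.0078)     # variance components
df     <- 237                                        # t-distribution df

# 95% PI: mean +/- t * sqrt(SE^2 + sum of random-effect variances)
t_crit <- qt(0.975, df)
pi_lb  <- b - t_crit * sqrt(se_b^2 + sum(sigma2))
pi_ub  <- b + t_crit * sqrt(se_b^2 + sum(sigma2))
round(c(pi_lb, pi_ub), 2)  # approximately -0.17 and 0.44
```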

Multi-level Meta-analysis of Slope Differences between Two Groups

We can imagine cases where one might be interested in comparing the slopes of two groups or treatments. In this example, we demonstrate how to derive and analyse effect sizes when investigating differences in slopes. As per the previous example, such analysis requires response variables to be measured with the same unit. We use the same data set as before, but formatted in a slightly different way (i.e. columns were re-organised in a “wide” format to calculate differences between sexes). In this example, rather than investigating overall variation in ARR, we will investigate the difference in ARR between males and females.

# Download the data file from OSF
osfr::osf_retrieve_file("https://osf.io/zubky/") %>%
    osfr::osf_download(conflicts = "overwrite")

# Load the data file
data_CS1b <- read.csv(here::here("CTmax_ARR_sex.csv"))

Calculate Effect Size

First, we need to calculate the ARR of females (ARR_f) and males (ARR_m) with the formula we presented earlier. We can then use the difference between ARR_f and ARR_m as an effect size, which we term here ARRD. The sampling variance is then calculated as per Box 1 in the main manuscript.
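As a quick sanity check, the ARRD formula can be verified on made-up numbers before applying it to the data (all values are hypothetical, and the per-group sample size is assumed equal for simplicity):

```r
# Hypothetical summary statistics for one female/male comparison
acc_temp_low <- 20; acc_temp_high <- 28
mean_low_f <- 38.0; mean_high_f <- 39.2; sd_low_f <- 0.5; sd_high_f <- 0.6
mean_low_m <- 37.8; mean_high_m <- 38.6; sd_low_m <- 0.5; sd_high_m <- 0.6
n <- 10  # per-group sample size (same for all four groups here)

ARR_f <- (mean_high_f - mean_low_f) / (acc_temp_high - acc_temp_low)  # 0.15
ARR_m <- (mean_high_m - mean_low_m) / (acc_temp_high - acc_temp_low)  # 0.10
ARRD  <- ARR_f - ARR_m                                                # 0.05

# Sampling variance: the four group-level variance terms simply add up
Var_ARRD <- (1 / (acc_temp_high - acc_temp_low))^2 *
    (sd_low_f^2 / n + sd_high_f^2 / n + sd_low_m^2 / n + sd_high_m^2 / n)
```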

As before, we will only use data measured on independent groups of animals, thus excluding data measured twice on the same cohort of animals (i.e. animals sharing the same shared_measurement_ID in the data).

# calculate effect sizes
data_CS1b <- data_CS1b %>%
    mutate(ARR_f = ((mean_high_f - mean_low_f)/(acc_temp_high - acc_temp_low)), ARR_m = ((mean_high_m -
        mean_low_m)/(acc_temp_high - acc_temp_low)), ARRD = ARR_f - ARR_m, Var_ARRD = ((1/(acc_temp_high -
        acc_temp_low))^2 * (sd_low_f^2/n_low_f_adj + sd_high_f^2/n_high_f_adj + sd_low_m^2/n_low_m_adj +
        sd_high_m^2/n_high_m_adj)))

# Remove dependent effect sizes
data_CS1b <- filter(data_CS1b, es_ID != "es85" & es_ID != "es86" & es_ID != "es87" &
    es_ID != "es98")

Meta-analysis

We then fit a multi-level meta-analytic model using a similar structure as demonstrated before. Note that this model can be extended to meta-regressions by fitting moderator variables.

# Multi-level meta-analytic model
model_CS1b_MLMA <- metafor::rma.mv(yi = ARRD ~ 1, V = Var_ARRD, method = "REML",
    random = list(~1 | species_ID, ~1 | es_ID), test = "t", data = data_CS1b)
print(model_CS1b_MLMA)
## 
## Multivariate Meta-Analysis Model (k = 119; method: REML)
## 
## Variance Components:
## 
##             estim    sqrt  nlvls  fixed      factor 
## sigma^2.1  0.0007  0.0269     29     no  species_ID 
## sigma^2.2  0.0003  0.0160    119     no       es_ID 
## 
## Test for Heterogeneity:
## Q(df = 118) = 186.1915, p-val < .0001
## 
## Model Results:
## 
## estimate      se    tval   df    pval    ci.lb   ci.ub 
##   0.0023  0.0086  0.2695  118  0.7880  -0.0147  0.0194    
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Calculate Prediction Intervals
PI_CS1b_MLMA <- predict(model_CS1b_MLMA)
print(PI_CS1b_MLMA)
## 
##    pred     se   ci.lb  ci.ub   pi.lb  pi.ub 
##  0.0023 0.0086 -0.0147 0.0194 -0.0619 0.0665
# Calculate I2
I2_CS1b <- orchaRd::i2_ml(model_CS1b_MLMA)
print(I2_CS1b)
##      I2_total I2_species_ID      I2_es_ID 
##          0.45          0.33          0.12
# Plot the results
orchard_plot(model_CS1b_MLMA, mod = "Int", xlab = "ARR")

We can see from above that the model estimates an overall meta-analytic mean of 0.002 with a 95% CI from -0.01 to 0.02. In other words, for each degree change in acclimation temperature, the heat tolerance of females changes by an average of 0.002 degrees more than that of males. Note that the confidence interval overlaps zero, which indicates that the difference between females and males is small and not statistically significant. Based on the prediction interval, 95% of the time we expect effect sizes from comparable studies to fall between -0.06 and 0.07. Our prediction intervals suggest that heterogeneity is small to moderate, reflected in our \(I_{total}^2\) estimate of 44.55%. Approximately 32.93% of the total heterogeneity is the result of differences between species, and 11.62% is associated with the residuals.

Case Study 2: Multilevel Meta-regression Approaches to Control for Dosages

In the next case study we’ll demonstrate how to use multilevel meta-regression models to correct the overall meta-analytic mean for dosage differences applied across studies. The data in question were collected by Podmokła et al. (2018). The meta-analysis reviewed published studies reporting the outcomes of experiments that manipulated levels of steroid hormones (testosterone and corticosterone) and recorded the impact of such manipulations on offspring traits. The aim was to establish the generality and homogeneity of such experimental effects, and their dependence on several moderator variables, most importantly the type of measured offspring traits, the type of hormone used, and several methodological factors (administration route, hormone vehicle, offspring developmental stage, etc.; for details please consult the original study). The data used here are a subset of the original data, limited to effect sizes recorded in experiments manipulating the hormones in ovo.

Retrieve and Load the Data

We’ll first download the data from OSF and load this into R. You can do this by copying and pasting the code below.

# Download the data file from OSF
osfr::osf_retrieve_file("https://osf.io/j85h3/") %>%
    osfr::osf_download(conflicts = "overwrite")

# Load the data file.
data_CS2_all <- read.csv(here::here("Dosage_EggHormones_CS2.csv"), sep = ";")

Interpreting Meta-analytic Means Controlling for Dosages

The data contain two categorical moderator variables to be included in the model: hormone1, which identifies the type of steroid hormone (androgen vs. corticosterone), and trait1, the type of offspring trait measured (physiology, behaviour, survival, reproduction). Random effects include observation ID (ID), species ID (species_lat) and study ID (paper).

The dosage variables are reported separately for each hormone in ng/g yolk (total_dose_T, total_dose_A4 and total_dose_CORT) and as fractions of the natural per-yolk concentration (total_dose_p_T, etc.). The latter values are available only for studies that reported natural hormone concentrations (T_yolk, A4_yolk, CORT_yolk).

Below, we consider a subset of the data related to testosterone manipulation; this hormone is often mixed with androstenedione, which further justifies analysing it separately.

# Centre the dosage data on the mean. Note that, for comparability, we'll
# exclude 'NA' in dose so that models are fit with the exact same data.
data_CS2 <- data_CS2_all %>%
    mutate(z_total_dose_T = scale(total_dose_T, center = TRUE, scale = FALSE), z_total_dose_p_T = scale(total_dose_p_T,
        center = TRUE, scale = FALSE)) %>%
    filter(!is.na(z_total_dose_T))

# Model that ignores 'nuisance heterogeneity' for dosage applied
model_CS2_MLMA_1 <- metafor::rma.mv(d ~ 1, V = var_d, method = "REML", random = list(~1 |
    ID, ~1 | paper, ~1 | species_lat), test = "t", dfs = "contain", data = data_CS2)
print(model_CS2_MLMA_1)

# Calculate Prediction Intervals
pis_MLMA_1 <- predict(model_CS2_MLMA_1)
print(pis_MLMA_1)

# Calculate I2 as well
I2_1 <- orchaRd::i2_ml(model_CS2_MLMA_1)
print(I2_1)

# Plot the results
orchard_plot(model_CS2_MLMA_1, mod = "Int", xlab = "Cohen's d")

We can see from above that the model ‘ignoring’ the nuisance heterogeneity (model_CS2_MLMA_1), i.e., the variability in effects resulting from dosage differences across studies, estimates an overall meta-analytic mean of 0.07 with a 95% CI from 0 to 0.14. Our prediction intervals suggest that heterogeneity is high, reflected in our \(I_{total}^2\) estimate of 76.07%; 95% of the time we expect effect sizes from comparable studies to fall between -0.56 and 0.7. Approximately 39.36% of the total heterogeneity is the result of differences between studies. Given that we have not controlled for dosage differences across studies, all we can say is that the average SMD is rather small in the sample of studies that applied these sets of dosages. In other words, on average, the means from the treatment and control groups differ by 0.07 standard deviation units.

We can now model these dosage effects to control for nuisance heterogeneity and make the overall mean more interpretable. We’ll first start by fitting and interpreting the coefficient just for the raw total dosage applied in a given study. To do this, we can now fit a multilevel meta-regression model as follows:

# Model that re-calibrates the overall meta-analytic mean (Intercept) to
# control for dosage administration
model_CS2_nuisance1 <- metafor::rma.mv(d ~ total_dose_T, V = var_d, method = "REML",
    random = list(~1 | ID, ~1 | paper, ~1 | species_lat), test = "t", dfs = "contain",
    data = data_CS2)
print(model_CS2_nuisance1)
## 
## Multivariate Meta-Analysis Model (k = 628; method: REML)
## 
## Variance Components:
## 
##             estim    sqrt  nlvls  fixed       factor 
## sigma^2.1  0.0412  0.2029    628     no           ID 
## sigma^2.2  0.0424  0.2059     67     no        paper 
## sigma^2.3  0.0000  0.0000     22     no  species_lat 
## 
## Test for Residual Heterogeneity:
## QE(df = 626) = 1414.2897, p-val < .0001
## 
## Test of Moderators (coefficient 2):
## F(df1 = 1, df2 = 626) = 6.6111, p-val = 0.0104
## 
## Model Results:
## 
##               estimate      se     tval   df    pval    ci.lb    ci.ub 
## intrcpt         0.0822  0.0301   2.7287   20  0.0129   0.0194   0.1450  * 
## total_dose_T   -0.0000  0.0000  -2.5712  626  0.0104  -0.0000  -0.0000  * 
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We can see from our model that controls for nuisance heterogeneity (model_CS2_nuisance1) that we now estimate an overall meta-analytic mean of 0.08 with a 95% CI from 0.02 to 0.15. Importantly, this is the mean effect size when the total dosage applied is 0, which doesn’t make much sense. We can center the total dosage applied to make the mean more ‘meaningful’ (mind the pun).
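The effect of centering on the intercept is a general property of regression, not something specific to metafor: with a mean-centered predictor, the intercept becomes the expected effect at the average predictor value, while the slope is unchanged. A minimal base-R sketch with simulated (entirely hypothetical) data:

```r
set.seed(42)
dose <- runif(100, 0, 8000)                          # hypothetical raw dosages
eff  <- 0.1 - 2e-05 * dose + rnorm(100, sd = 0.2)    # hypothetical effect sizes

m_raw     <- lm(eff ~ dose)                          # intercept: expected effect at dose = 0
m_centred <- lm(eff ~ scale(dose, scale = FALSE))    # intercept: expected effect at the mean dose

coef(m_raw)[1]      # extrapolates to dose = 0
coef(m_centred)[1]  # evaluated at the average dose; equals mean(eff) under OLS
```

The slopes of the two models are identical; only the meaning of the intercept changes, which is exactly what happens with z_total_dose_T in the meta-regression below.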

# Model that re-calibrates the overall meta-analytic mean (Intercept) to
# control for dosage administration but uses centered dosage
model_CS2_nuisance2 <- metafor::rma.mv(d ~ z_total_dose_T, V = var_d, method = "REML",
    random = list(~1 | ID, ~1 | paper, ~1 | species_lat), test = "t", dfs = "contain",
    data = data_CS2)
print(model_CS2_nuisance2)
## 
## Multivariate Meta-Analysis Model (k = 628; method: REML)
## 
## Variance Components:
## 
##             estim    sqrt  nlvls  fixed       factor 
## sigma^2.1  0.0412  0.2029    628     no           ID 
## sigma^2.2  0.0424  0.2059     67     no        paper 
## sigma^2.3  0.0000  0.0000     22     no  species_lat 
## 
## Test for Residual Heterogeneity:
## QE(df = 626) = 1414.2897, p-val < .0001
## 
## Test of Moderators (coefficient 2):
## F(df1 = 1, df2 = 626) = 6.6111, p-val = 0.0104
## 
## Model Results:
## 
##                 estimate      se     tval   df    pval    ci.lb    ci.ub 
## intrcpt           0.0625  0.0304   2.0571   20  0.0530  -0.0009   0.1259  . 
## z_total_dose_T   -0.0000  0.0000  -2.5712  626  0.0104  -0.0000  -0.0000  * 
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Calculate Prediction Intervals
pis_nuisance2 <- predict(model_CS2_nuisance2, newmods = 0)
print(pis_nuisance2)
## 
##    pred     se   ci.lb  ci.ub   pi.lb  pi.ub 
##  0.0625 0.0304 -0.0009 0.1259 -0.5437 0.6687
# Calculate I2 as well
I2_nuisance2 <- orchaRd::i2_ml(model_CS2_nuisance2)
print(I2_nuisance2)
##       I2_total          I2_ID       I2_paper I2_species_lat 
##        7.5e-01        3.7e-01        3.8e-01        1.7e-08

Now that we have centered the total dosage around its mean (which is 4187.78 ng/g yolk), our overall meta-analytic mean (now estimated to be 0.06, with a 95% CI from \(-8.78 \times 10^{-4}\) to 0.13) is interpreted differently. Overall, for studies applying the average dosage of 4187.78 ng/g yolk, the means of treatment and control groups differ by 0.06 standard deviation units. It’s still a small overall effect size, but it has a clearer interpretation now. We can again calculate PIs and \(I^2\) using the model. Overall, our prediction intervals suggest that heterogeneity is still high even after accounting for dosage, reflected also in our \(I_{total}^2\) estimate of 74.55%; 95% of the time we expect effect sizes to fall between -0.54 and 0.67 when the dosage applied is the average of 4187.78 ng/g yolk. Notably, the prediction interval shrinks slightly after modelling the nuisance heterogeneity.

An alternative formulation of the dosage effect could take into account the fact that the absolute dose may not reflect each dose’s biological effect. In other words, the effect will likely depend on the original, natural concentration of the hormone, and how much the experiment modifies it. The model below uses this alternative dosage formulation.

# Model that re-calibrates the overall meta-analytic mean (Intercept) to
# control for dosage administration but uses centered dosage (expressed as the
# fraction of natural hormone concentration)
data_CS2 <- data_CS2 %>%
    filter(!is.na(z_total_dose_p_T))
model_CS2_nuisance3 <- metafor::rma.mv(d ~ z_total_dose_p_T, V = var_d, method = "REML",
    random = list(~1 | ID, ~1 | paper, ~1 | species_lat), test = "t", dfs = "contain",
    data = data_CS2)
print(model_CS2_nuisance3)
## 
## Multivariate Meta-Analysis Model (k = 604; method: REML)
## 
## Variance Components:
## 
##             estim    sqrt  nlvls  fixed       factor 
## sigma^2.1  0.0387  0.1968    604     no           ID 
## sigma^2.2  0.0434  0.2084     66     no        paper 
## sigma^2.3  0.0000  0.0000     22     no  species_lat 
## 
## Test for Residual Heterogeneity:
## QE(df = 602) = 1358.0829, p-val < .0001
## 
## Test of Moderators (coefficient 2):
## F(df1 = 1, df2 = 602) = 12.1690, p-val = 0.0005
## 
## Model Results:
## 
##                   estimate      se     tval   df    pval    ci.lb    ci.ub 
## intrcpt             0.0566  0.0309   1.8329   20  0.0817  -0.0078   0.1209    . 
## z_total_dose_p_T   -0.0008  0.0002  -3.4884  602  0.0005  -0.0012  -0.0003  *** 
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Calculate Prediction Intervals
pis_nuisance3 <- predict(model_CS2_nuisance3, newmods = 0)
print(pis_nuisance3)
## 
##    pred     se   ci.lb  ci.ub   pi.lb  pi.ub 
##  0.0566 0.0309 -0.0078 0.1209 -0.5448 0.6579
# Calculate I2 as well
I2_nuisance3 <- orchaRd::i2_ml(model_CS2_nuisance3)
print(I2_nuisance3)
##       I2_total          I2_ID       I2_paper I2_species_lat 
##        7.5e-01        3.5e-01        3.9e-01        1.4e-08

Follow-up steps could include expanding the model to comprise both hormone types (testosterone and corticosterone), modelling their impact using a categorical predictor (hormone1) and also possibly interacting the dosage effect with the type of hormone applied.

data_CS2b <- data_CS2_all %>%
    mutate(z_total_dose = scale(total_dose, center = TRUE, scale = FALSE)) %>%
    filter(!is.na(z_total_dose))
# Model that re-calibrates the overall meta-analytic mean (Intercept) to
# control for dosage administration but uses centered dosage
model_CS2_nuisance4 <- metafor::rma.mv(d ~ hormone1 * z_total_dose, V = var_d, method = "REML",
    random = list(~1 | ID, ~1 | paper, ~1 | species_lat), test = "t", dfs = "contain",
    data = data_CS2b)
print(model_CS2_nuisance4)
## 
## Multivariate Meta-Analysis Model (k = 905; method: REML)
## 
## Variance Components:
## 
##             estim    sqrt  nlvls  fixed       factor 
## sigma^2.1  0.0684  0.2615    905     no           ID 
## sigma^2.2  0.0545  0.2335     89     no        paper 
## sigma^2.3  0.0000  0.0001     22     no  species_lat 
## 
## Test for Residual Heterogeneity:
## QE(df = 901) = 2686.1034, p-val < .0001
## 
## Test of Moderators (coefficients 2:4):
## F(df1 = 3, df2 = 85) = 3.2497, p-val = 0.0258
## 
## Model Results:
## 
##                            estimate      se     tval   df    pval    ci.lb 
## intrcpt                      0.0595  0.0343   1.7373   18  0.0994  -0.0125 
## hormone1CORT                -0.1155  0.0686  -1.6848   85  0.0957  -0.2519 
## z_total_dose                -0.0000  0.0000  -2.3575  901  0.0186  -0.0000 
## hormone1CORT:z_total_dose    0.0000  0.0000   0.6632  901  0.5074  -0.0000 
##                              ci.ub 
## intrcpt                     0.1315  . 
## hormone1CORT                0.0208  . 
## z_total_dose               -0.0000  * 
## hormone1CORT:z_total_dose   0.0000    
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Supplementary Material for Box 1

In Box 1 we briefly mention how ARR can be derived for studies manipulating a single variable, temperature. In reality, studies often employ a fully factorial design, manipulating two variables at once (e.g., temperature and pH). We can easily derive ARR to accommodate the estimation of the main effects of, and even the interaction between, the two variables. Recall that ARR is defined as follows:

\[ \text{ARR} = \frac{M_{1} - M_{2}}{T_2 - T_1}, \]

If we have a fully factorial study manipulating temperature and pH we can estimate the main effect of temperature across the pH treatments and its sampling error (assuming groups are independent and equal sample sizes) as:

\[ \text{ARR}_{m} = \frac{ \left( M_{pH1,T1} + M_{pH2,T1} \right) - \left( M_{pH1,T2} + M_{pH2,T2} \right)}{2\left( T_2 - T_1 \right)} \] \[ \sigma^2_{ARR_{m}} = \left( \frac{1} {2\left( T_2 - T_1 \right)} \right)^2 \left( \left(\frac{SD^2_{pH1,T1}} {N_{pH1,T1}} + \frac{SD^2_{pH2,T1}} {N_{pH2,T1}}\right)+ \left(\frac{SD^2_{pH1,T2}} {N_{pH1,T2}} + \frac{SD^2_{pH2,T2}} {N_{pH2,T2}}\right) \right) \]

The interaction (i), in turn, can be calculated as:

\[ \text{ARR}_{i} = \frac{ \left( M_{pH1,T1} - M_{pH2,T1} \right) - \left( M_{pH1,T2} - M_{pH2,T2} \right)}{T_2 - T_1 } \] \[ \sigma^2_{ARR_{i}} = \left( \frac{1} {T_2 - T_1} \right)^2 \left( \frac{SD^2_{pH1,T1}} {N_{pH1,T1}} + \frac{SD^2_{pH1,T2}} {N_{pH1,T2}}+ \frac{SD^2_{pH2,T1}} {N_{pH2,T1}} + \frac{SD^2_{pH2,T2}} {N_{pH2,T2}} \right) \]

We can apply similar logic to other effect size measures of interest.
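As a numerical check on the factorial formulas above, here is a small base-R sketch that plugs hypothetical cell means from a 2 x 2 (pH x temperature) design into the expressions as written (all values are made up for illustration):

```r
# Hypothetical cell means, SDs and sample sizes from a 2x2 factorial design
T1 <- 20; T2 <- 28
M_pH1_T1 <- 38.0; M_pH2_T1 <- 37.5   # means at the lower temperature
M_pH1_T2 <- 39.2; M_pH2_T2 <- 38.3   # means at the higher temperature
SD <- c(0.5, 0.5, 0.6, 0.6)          # group SDs (pH1_T1, pH2_T1, pH1_T2, pH2_T2)
N  <- c(10, 10, 10, 10)              # group sample sizes

# Main effect of temperature, averaged over the pH treatments
ARR_main <- ((M_pH1_T1 + M_pH2_T1) - (M_pH1_T2 + M_pH2_T2)) / (2 * (T2 - T1))
Var_main <- (1 / (2 * (T2 - T1)))^2 * sum(SD^2 / N)

# Interaction: does the temperature effect differ between pH treatments?
ARR_int <- ((M_pH1_T1 - M_pH2_T1) - (M_pH1_T2 - M_pH2_T2)) / (T2 - T1)
Var_int <- (1 / (T2 - T1))^2 * sum(SD^2 / N)
```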

Supplementary Materials for Survey

Details on Study Selection

Here we provide greater detail on the search strategy, keyword selection, and the number of studies included at each step of the screening process for our survey. We searched for studies that included ‘meta-analysis’, ‘meta regression’, ‘comparative analysis’, ‘comprehensive analysis’, ‘global analysis’ or ‘macro physiology’ in their title or abstract, restricting our search to key comparative physiology journals (detailed in Tables S1 and S2 - Supplementary Materials) within the last 6 years. The journals used and their ISSN numbers are provided in Table 2. The full search strings used to refine our search query can be found in Table 1. Initial searches were conducted in Scopus between May 16 and 18, 2021. We updated our survey with an additional search on September 1, 2021, with the addition of new journals (see Tables 1 and 2). We expected this to provide a contemporary overview of quantitative syntheses in comparative physiology while keeping the survey and screening feasible. Note that the results from our final searches are the ones we used.

All title, abstract and full-text screening was done by two people (DWAN and ML). Any disagreements were resolved by discussion to reach a consensus. Of the 426 papers originally identified, 80 full texts were screened for eligibility, and we included a total of 63 for final data extraction. We excluded N = 17 meta-/comparative analyses because they: 1) did not include physiological traits (N = 4); 2) were empirical studies that collected their own data and did not collate data from the literature (N = 10); 3) were not actual quantitative syntheses, but rather systematic reviews (N = 1); or 4) were studies with a focus on biomechanics (N = 2). Figure 1 outlines our PRISMA flowchart detailing the number of papers retrieved, their sources, and how many records were excluded and included at each screening phase.

We included comparative studies / meta-analyses that were focused on physiological questions. Narrative reviews and comments were not considered because they did not collate quantitative data from the literature. Studies needed to have collected physiological trait data, such as hormone levels, thermal physiological parameters (critical thermal maximum (CTmax), critical thermal minimum (CTmin), thermal preferences, thermal breadth, lethal thermal tolerance limit (LT50), lethal thresholds), metabolic rates (routine, maximum, and minimum metabolic rate, aerobic scope), enzyme reaction rates and concentrations, oxidative stress measures and consequences (e.g., DNA damage, telomere attrition), performance (e.g., maximum heart rates, sprint speed, swim speed) and growth rates, body composition (e.g., fatty acid profiles) and immune traits (e.g., cell function, antibodies). Studies were also included if they measured the relationship between physiological traits and body size (e.g., metabolic scaling), reproductive allocation, or survival across taxa. Studies on behavior were only included if the synthesis was explicitly interested in the proximate physiological drivers of behavioral variation. We also included meta-analyses that quantified the change in physiological traits in response to environmental variation (i.e., phenotypic plasticity). While many studies in comparative physiology collect data as part of a comparative analysis, we focus here only on studies collating data from existing literature because this is the major data source in quantitative syntheses. We excluded quantitative syntheses interested in phenology, biomechanics, sexual size dimorphism (except if the study was interested in sexual differences in physiological traits), those focused on validating physiological methodology, or characterising population genetic structure, gene transcription, and quantitative genetics.
There were also meta-analyses focused on colour evolution; these varied in their goals but were most often not focused on physiological questions, so they were also excluded. We restricted our focus to animal studies because most plant synthesis papers were focused on ecological (e.g., quantified changes in abundance and distribution of species) or agricultural questions (e.g., studies on soil microbial function, nitrogen and phosphorus capture, methane output, and greenhouse gas emissions).

PRISMA flowchart for study selection and inclusion

Figure 1: PRISMA flowchart for study selection and inclusion

Supplementary Survey Results

We collected data on additional questions about studies that may be of use. Below, we present these results.

Sample of 63 papers

Meta-analysis

Self describing as a meta-analysis?

Answer to the question “Is paper self-describing as containing a meta-analysis (or meta-regression) or actually using meta-analysis/regression approach?”.

dat %>%
    dplyr::group_by(meta_analysis_self_described) %>%
    dplyr::tally() %>%
    ggplot() + tm + geom_col(aes(x = meta_analysis_self_described, y = n), width = 0.5,
    fill = background.colour, colour = text.colour, size = 0.5) + scale_y_continuous(expand = c(0,
    0)) + labs(x = "meta-analysis or meta-regression", y = "number of papers")

Comparative physiology

Answer to the question “Is the study’s main focus on comparative physiology?”

  • yes = very clearly focused only on physiological traits
  • no = does have physiological traits, but is interested in broader questions involving many different traits
dat %>%
    ggplot() + tm + geom_bar(aes(x = comparative_physiology_main_focus, fill = meta_analysis_self_described,
    y = ..count../sum(..count..)), width = 0.5, colour = text.colour, size = 0.5) +
    scale_y_continuous(expand = c(0, 0)) + scale_fill_viridis_d() + labs(x = "Main focus on comparative physiology?",
    y = "Frequency", fill = "Meta-analysis")

Meta-analytic models

Note that single papers can report more than one approach (e.g., use different effect sizes and different models).
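Because a single paper can list several approaches in one response field, the tallies below match substrings with `grepl()` rather than tabulating exact values. A minimal sketch with made-up responses (the labels are illustrative, not the actual survey codes):

```r
# Hypothetical multi-response column: each paper may list several
# effect sizes separated by "; ".
responses <- c("SMD; lnCVR", "Response Ratio", "SMD", "correlation; slope")

# Exact tabulation treats "SMD; lnCVR" as its own category...
table(responses)

# ...whereas substring matching counts every paper mentioning a label.
sum(grepl("SMD", responses))
```

Here `sum(grepl("SMD", responses))` returns 2, because two papers mention SMD even though one pairs it with lnCVR.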

Effect sizes used

dat %>%
    dplyr::group_by(effect_size_type) %>%
    janitor::tabyl(effect_size_type) %>%
    dplyr::summarise(SMD = dplyr::if_else(grepl("SMD", effect_size_type), n, 0),
        lnRR = dplyr::if_else(grepl("Response Ratio", effect_size_type), n, 0), Zr = dplyr::if_else(grepl("correlation",
            effect_size_type), n, 0), Other = dplyr::if_else(grepl("other|Q10", effect_size_type),
            n, 0), Means = dplyr::if_else(grepl("mean/proportion/raw", effect_size_type),
            n, 0), lnCVR = dplyr::if_else(grepl("lnCVR", effect_size_type), n, 0),
        Slope = dplyr::if_else(grepl("slope", effect_size_type), n, 0)) %>%
    colSums(.) %>%
    data.frame(effect = names(.), n = .) %>%
    mutate(prop = n/sum(n)) %>%
    arrange(prop) %>%
    mutate(effect = factor(effect, levels = unique(effect))) %>%
    ggplot() + tm + theme(axis.title.y = element_blank(), axis.text.y = element_text(size = 10)) +
    geom_col(aes(x = effect, y = n), width = 0.5, fill = background.colour, colour = text.colour,
        size = 0.5) + scale_y_continuous(expand = c(0, 0), breaks = seq(1, 20, 2)) +
    labs(x = "Effect Size", y = "Number of Papers") + coord_flip()

Statistical models used

What statistical models were used?

summary <- dat %>%
    dplyr::group_by(stat_model_type) %>%
    janitor::tabyl(stat_model_type) %>%
    dplyr::summarise(`Multilevel Meta-analysis Model` = dplyr::if_else(grepl("multilevel meta-analysis",
        stat_model_type), n, 0), `Multilevel Metaregression Model` = dplyr::if_else(grepl("meta-regression models",
        stat_model_type), n, 0), `Linear Regression` = dplyr::if_else(grepl("linear regression",
        stat_model_type), n, 0), Other = dplyr::if_else(grepl("unclear/other", stat_model_type),
        n, 0), `Vote Counting` = dplyr::if_else(grepl("vote counting", stat_model_type),
        n, 0), `Weighted Regression` = dplyr::if_else(grepl("weighted regression including mixed effects models",
        stat_model_type), n, 0), `Random Effects Model` = dplyr::if_else(grepl("random-effects meta-analysis",
        stat_model_type), n, 0)) %>%
    colSums(.) %>%
    data.frame(effect = names(.), n = .) %>%
    mutate(prop = (n/sum(n))) %>%
    arrange(prop) %>%
    mutate(effect = factor(effect, levels = unique(effect)))
rownames(summary) <- NULL

summary %>%
    ggplot() + tm + theme(axis.title.y = element_blank(), axis.text.y = element_text(size = 10)) +
    geom_col(aes(x = effect, y = prop), width = 0.5, fill = background.colour, colour = text.colour,
        size = 0.5) + scale_y_continuous(expand = c(0, 0), limits = c(0, 0.4)) +
    labs(title = "Analytical Approach", y = "Frequency") + coord_flip()

Weighted models?

Were effect sizes used in the analysis weighted by sampling variance?

dat %>%
    janitor::tabyl(meta_analysis_weighted) %>%
    ggplot() + tm + theme(axis.title.y = element_blank(), axis.text.y = element_text(size = 10)) +
    geom_col(aes(x = meta_analysis_weighted, y = percent), width = 0.5, fill = background.colour,
        colour = text.colour, size = 0.5) + scale_y_continuous(expand = c(0, 0),
    limits = c(0, 1)) + labs(y = "Frequency") + coord_flip()
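As a reminder of what "weighted" means here: in a standard meta-analysis, each effect size is weighted by the inverse of its sampling variance, so more precise estimates contribute more to the pooled mean. A minimal fixed-effect sketch with made-up numbers:

```r
yi <- c(0.2, 0.5, -0.1)    # hypothetical effect sizes
vi <- c(0.01, 0.04, 0.02)  # their sampling variances
wi <- 1 / vi               # inverse-variance weights

# Weighted pooled estimate and its standard error
pooled <- sum(wi * yi) / sum(wi)
se     <- sqrt(1 / sum(wi))
c(pooled = pooled, se = se)
```

Unweighted syntheses skip this step and treat all effect sizes as equally informative, which is why weighting is worth tracking in the survey.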

How do the two categorisations (meta-analysis and weighting) combine?

dat %>%
    ggplot() + tm + geom_bar(aes(x = meta_analysis_self_described, fill = meta_analysis_weighted,
    y = ..count../sum(..count..)), width = 0.5, colour = text.colour, size = 0.5) +
    scale_y_continuous(expand = c(0, 0), limits = c(0, 1)) + scale_fill_viridis_d() +
    labs(x = "Meta-analysis?", y = "Proportion", fill = "Weighted?")

Heterogeneity

Effect size impacted by ‘nuisance heterogeneity’?

Could effect size be impacted by an additional continuous predictor (e.g., temperature in thermal limit studies, dosage in hormone manipulation studies)?

If “yes”, does the study investigate any continuous environmental or methodological factors as moderators?

dat %>%
    mutate(cont_moderator_variable_modelled = if_else(cont_moderator_variable_modelled ==
        "", "N/A", cont_moderator_variable_modelled)) %>%
    ggplot() + tm + geom_bar(aes(x = factor(cont_moderator_variable, levels = c("yes",
    "no", "unclear")), fill = cont_moderator_variable_modelled, y = ..count../sum(..count..)),
    width = 0.5, colour = text.colour, size = 0.5) + scale_y_continuous(expand = c(0,
    0), limits = c(0, 1)) + scale_fill_viridis_d() + labs(x = "Nuisance Heterogeneity?",
    y = "Frequency", fill = "Modeled Nuisance\n Heterogeneity?")

Reported Heterogeneity?

Of the studies that used weighted analyses, how many actually reported measures of heterogeneity?

dat %>%
    filter(!heterogeneity_reported == "N/A (e.g. they conducted unweighted meta-analysis)") %>%
    mutate(heterogeneity_reported = if_else(heterogeneity_reported == "unclear/other (add comment)",
        "other", heterogeneity_reported)) %>%
    janitor::tabyl(heterogeneity_reported) %>%
    dplyr::arrange(-n) %>%
    ggplot() + tm + geom_col(aes(x = factor(heterogeneity_reported, levels = heterogeneity_reported),
    y = percent), width = 0.5, fill = background.colour, colour = text.colour, size = 0.5) +
    scale_y_continuous(expand = c(0, 0), limits = c(0, 1)) + labs(x = "Heterogeneity reported?",
    y = "Frequency")

Non-independence and publication bias

Handling sources of non-independence

Did the analysis account for - or mention - additional non-independence arising from, e.g., phylogenetic relatedness or shared control treatments? The actual way of dealing with non-independence (n-i) will vary from study to study.

dat %>%
    janitor::tabyl(nonindependence_mentioned) %>%
    dplyr::arrange(n) %>%
    ggplot() + tm + theme(axis.title.y = element_blank(), axis.text.y = element_text(size = 10)) +
    geom_col(aes(x = factor(nonindependence_mentioned, levels = nonindependence_mentioned),
        y = percent), width = 0.5, fill = background.colour, colour = text.colour,
        size = 0.5) + scale_y_continuous(expand = c(0, 0), limits = c(0, 1)) + scale_x_discrete(labels = c("n-i dealt with w/o modelling",
    "n-i modeled", "unclear/other/not mentioned")) + labs(title = "Non-independence handling",
    y = "Frequency") + coord_flip()

Publication bias

What type of publication bias analysis was used in the analysis? Multiple metrics can be used per paper, hence frequencies won’t sum to 100%.

dat %>%
    dplyr::group_by(pub_bias_type) %>%
    janitor::tabyl(pub_bias_type) %>%
    dplyr::summarise(GraphicalTest = dplyr::if_else(grepl("graphical tests", pub_bias_type),
        n, 0), RegressionTest = dplyr::if_else(grepl("regression test", pub_bias_type),
        n, 0), TrimAndFill = dplyr::if_else(grepl("trim-and-fill", pub_bias_type),
        n, 0), TimeLag = dplyr::if_else(grepl("time-lag", pub_bias_type), n, 0),
        OtherUnclear = dplyr::if_else(grepl("unclear", pub_bias_type), n, 0), None = dplyr::if_else(grepl("no",
            pub_bias_type), n, 0)) %>%
    colSums(.) %>%
    data.frame(effect = names(.), n = .) %>%
    mutate(prop = (n/sum(n) * 100)) %>%
    arrange(n) %>%
    mutate(effect = factor(effect, levels = unique(effect))) %>%
    ggplot() + tm + theme(axis.title.y = element_blank(), axis.text.y = element_text(size = 10)) +
    geom_col(aes(x = effect, y = n), width = 0.5, fill = background.colour, colour = text.colour,
        size = 0.5) + scale_y_continuous(expand = c(0, 0), breaks = seq(1, 30, 2)) +
    labs(x = "Effect Size", y = "Number of Papers") + coord_flip()

Other plots

plot1 <- dat %>% 
  ggplot() + tm2 + theme(legend.position = "top") +
  geom_bar(aes(x = comparative_physiology_main_focus,
               fill = cont_moderator_variable,
               y = ..count../sum(..count..)), width = 0.5,
           colour = NA, size = 0.5) +
  scale_y_continuous(expand=c(0,0)) +
  scale_fill_brewer(palette = "Set2") +
  labs(x = "Main focus on comparative physiology",
       y = "Frequency",
       fill = "Continuous predictor present")
plot1

plot2 <- dat %>% 
  ggplot() + tm2 + theme(legend.position = "top") +
  geom_bar(aes(x = pub_year,
               fill = meta_analysis_self_described,
               y = ..count../sum(..count..)), width = 0.5,
           colour = NA, size = 0.5) +
  scale_y_continuous(expand=c(0,0)) +
  scale_fill_brewer(palette = "Set2") +
  labs(x = "Year of publication",
       y = "Frequency",
       fill = "Meta-analysis")
plot2

plot3 <- dat %>%
  ggplot(mapping = aes(x = comparative_physiology_main_focus,
               y = meta_analysis_self_described)) + tm2 +
  theme(panel.border = element_rect(size = 0.5, fill = NA),
        panel.grid.major = element_line(colour = "grey92"),
        axis.text.y = element_text(angle = 90)) +
  geom_count(mapping = aes(size = ..n.., colour = "A")) +
  scale_size_continuous(range = c(20,40), guide = F) +
  scale_colour_brewer(palette = "Set2", guide = F) +
  # scale_y_continuous(expand=c(0,0)) +
  # scale_fill_viridis_d() +
  labs(x = "Main focus on comparative physiology?",
       y = "Self described as a meta-analysis?")
plot3

plot4 <- dat %>%
  ggplot(aes(axis1 = meta_analysis_self_described,
             axis2 = comparative_physiology_main_focus,
             axis3 = cont_moderator_variable)) +
  ggalluvial::geom_alluvium(aes(fill = comparative_physiology_main_focus), width = 1/4, alpha = 0.8) +
  tm2 + theme(axis.line = element_blank(), axis.ticks = element_blank()) +
  theme(legend.position = "none", panel.grid = element_blank(), axis.text.y = element_blank()) +
  scale_fill_brewer(palette = "Set2") +
  ggalluvial::geom_stratum(fill = 'white', colour = 'black', width = 1/4) +
  geom_text(stat = 'stratum',
            aes(label = after_stat(stratum)),
            # label = c('weighted', 'unweighted', 'unclear'),
            colour = 'black') +
  scale_x_discrete(expand = c(.1, .1), limits = c("Meta-analysis", "Comparative physiol.", "Cont. moderator"))
plot4

panels <- ggarrange(ggarrange(plot1, plot2, nrow = 1),
                    ggarrange(plot3, plot4, nrow = 1, widths = c(1,2)),
                    nrow = 2, ncol = 1)
ggsave(panels, filename = "multipanel.pdf", device = "pdf")

Session Information

R version 4.0.5 (2021-03-31)

Platform: x86_64-apple-darwin17.0 (64-bit)

attached base packages: stats, graphics, grDevices, utils, datasets, methods and base

other attached packages: ggpubr(v.0.4.0), ggalluvial(v.0.12.3), janitor(v.2.1.0), metaAidR(v.0.0.0.9000), magrittr(v.2.0.1), osfr(v.0.2.8), here(v.1.0.1), orchaRd(v.0.0.0.9000), metafor(v.3.1-8), Matrix(v.1.3-4), forcats(v.0.5.1), stringr(v.1.4.0), dplyr(v.1.0.7), purrr(v.0.3.4), readr(v.1.4.0), tidyr(v.1.1.3), tibble(v.3.1.2), ggplot2(v.3.3.5), tidyverse(v.1.3.1), magick(v.2.7.2), formatR(v.1.11), flextable(v.0.6.6), cowplot(v.1.1.1), pander(v.0.6.4), jpeg(v.0.1-8.1), png(v.0.1-7), devtools(v.2.4.2) and usethis(v.2.0.1)

loaded via a namespace (and not attached): colorspace(v.2.0-2), ggsignif(v.0.6.2), ellipsis(v.0.3.2), rio(v.0.5.27), rprojroot(v.2.0.2), snakecase(v.0.11.0), base64enc(v.0.1-3), fs(v.1.5.0), rstudioapi(v.0.13), httpcode(v.0.3.0), farver(v.2.1.0), remotes(v.2.4.0), fansi(v.0.5.0), lubridate(v.1.7.10), mathjaxr(v.1.4-0), xml2(v.1.3.2), codetools(v.0.2-18), cachem(v.1.0.5), knitr(v.1.33), pkgload(v.1.2.1), jsonlite(v.1.7.2), broom(v.0.7.8), dbplyr(v.2.1.1), compiler(v.4.0.5), httr(v.1.4.2), backports(v.1.2.1), assertthat(v.0.2.1), fastmap(v.1.1.0), cli(v.3.0.0), htmltools(v.0.5.1.1), prettyunits(v.1.1.1), tools(v.4.0.5), gtable(v.0.3.0), glue(v.1.4.2), rappdirs(v.0.3.3), Rcpp(v.1.0.7), carData(v.3.0-4), cellranger(v.1.1.0), jquerylib(v.0.1.4), vctrs(v.0.3.8), crul(v.1.1.0), nlme(v.3.1-152), xfun(v.0.24), ps(v.1.6.0), openxlsx(v.4.2.4), testthat(v.3.0.4), rvest(v.1.0.0), lifecycle(v.1.0.0), rstatix(v.0.7.0), scales(v.1.1.1), hms(v.1.1.0), RColorBrewer(v.1.1-2), yaml(v.2.2.1), curl(v.4.3.2), memoise(v.2.0.0), gdtools(v.0.2.3), sass(v.0.4.0), stringi(v.1.6.2), highr(v.0.9), equatags(v.0.1.0), desc(v.1.3.0), pkgbuild(v.1.2.0), zip(v.2.2.0), rlang(v.0.4.11), pkgconfig(v.2.0.3), systemfonts(v.1.0.2), evaluate(v.0.14), lattice(v.0.20-44), labeling(v.0.4.2), processx(v.3.5.2), tidyselect(v.1.1.1), bookdown(v.0.22), R6(v.2.5.0), generics(v.0.1.0), DBI(v.1.1.1), pillar(v.1.6.1), haven(v.2.4.1), foreign(v.0.8-81), withr(v.2.4.2), abind(v.1.4-5), modelr(v.0.1.8), crayon(v.1.4.1), car(v.3.0-11), locatexec(v.0.1.1), uuid(v.0.1-4), utf8(v.1.2.1), rmarkdown(v.2.9), officer(v.0.3.18), grid(v.4.0.5), readxl(v.1.3.1), data.table(v.1.14.0), callr(v.3.7.0), xslt(v.1.4.3), reprex(v.2.0.0), digest(v.0.6.27), munsell(v.0.5.0), viridisLite(v.0.4.0), bslib(v.0.2.5.1) and sessioninfo(v.1.1.1)

References

Podmokła, E., Drobniak, S. and Rutkowska, J. (2018). Chicken or egg? Outcomes of experimental manipulations of maternally transmitted hormones depend on administration method – a meta-analysis. Biological Reviews 93, 1499–1517.

Pottier, P., Burke, S., Drobniak, S. M., Lagisz, M. and Nakagawa, S. (2021). Sexual (in)equality? A meta-analysis of sex differences in thermal acclimation capacity across ectotherms. Functional Ecology in press, https://doi.org/10.1111/1365-2435.13899.