Meta-analysis quantitatively aggregates effect size data collected from existing research. The effect size one uses for a study question needs to be chosen carefully. Each make certain assumptions about the underlying data. It’s important to be weary of these assumptions and identify when something might have gone wrong.
The effect size chosen must satisfy two important features. First, the effect size must be comparable across studies. This means that they must be placed on the same scale so that they can be aggregated. Some effect size estimates can be inter-converted to each other (e.g., Zr, Hedges’ g), but this isn’t the case for all of them. Second, we must be able to derive or approximate the sampling variance for a given effect size. As already indicated, the sampling variance is very important if one wishes to use meta-analytic models that account for sampling variance and is needed to report certain measures of heterogeneity.
Meta-analyses in comparative physiology, and indeed, ecology and evolution more generally, often use highly heterogeneous data (Noble et al., 2022; Senior et al., 2016). This poses other challenges in aggregating effect size data that make their comparability questionable. We’ll discuss some of these issues in later tutorials and identify some ways in which some of this heterogeneity can be dealt with when deriving the effect size itself.
By the end of this tutorial, we hope that you will feel comfortable:
What is an effect size? The answer to this question depends on what kind of study one wishes to synthesise. For example, an effect size could be a contrast between experimental treatments. We might extract mean differences from experiments manipulating, say, bisphenol-A (BPA) and looking at the effect of BPA on aquatic ectotherm phenotype relative to some control (Wu and Seebacher, 2020). Alternatively, the effect of interest might be the magnitude and direction of a correlation between two observed variables, such as metabolism and behaviour (Holtmann et al., 2016). At times, we might even just be interested in analysing the mean or variance itself.
“An effect size is a statistical parameter that can be used to compare, on the same scale, the results of different studies in which a common effect of interest has been measured” (Rosenberg et al., 2013)
There are many types of effect sizes that are commonplace in ecological and evolutionary research (Table 1 describes many of the more common ones from Noble et al., 2022).
Effect Measure | Definition | Sampling Variance | Examples |
Mean |
|
| CTmin, CTmax, LC50, LT50, Metabolic Rate (MR) |
Log Standard deviation, lnSD |
|
| Variability in CTmin, CTmax, Metabolic Rate (MR) |
Log Response Ratio, lnRR |
|
| Ratio between pollutant exposed treatment (e.g., BPA-exposed group) and control (no pollutant) |
Standardised Mean Difference, SMDa |
|
| Difference in immune response between males and females, performance difference in the presence of stressor compared to absence of stressor |
Zr (Fisher Transformation of |
| Relationship between sex hormones and immune responses or metabolic rate and behaviour | |
a ; a |
We’ll use metafor
to demonstrate how to calculate commonly used effect size estimates in the field of comparative physiology, including Z-transformed correlation coefficients (Zr), log response ratios and Hedge’s g.
To start, well need to load some packages and data. The data is all stored on GitHub. It’s downloaded directly from the web. You won’t need to have these on hand. We’ll first load the packages that we need.
# Load packages install.packages('pacman') # If you haven't already installed
# the pacman package do so with this code
pacman::p_load(metafor, flextable, tidyverse, orchaRd, pander, mathjaxr, equatags)
# To use mathjaxr you need to run equatags::mathjax_install()
Background: Physiology is expected to drive movement, including short-term activity, exploration of unfamiliar environments, and larger scale dispersal which can influence species distributions in an environmentally sensitive manner. To test whether there is a relationship between physiology and movement pattern (separated as activity, exploration, and dispersal), we extracted data from the literature on common physiological traits measured (e.g., metabolic rate, hormone levels, body condition) with either activity, exploration, and dispersal at the individual level. As we are interested in relationships at the individual level, we extracted correlation data in the form of correlation coefficient and sample size.
For complete detail on the study aim and design, please see Wu and Seebacher (2022)
Download the data. The following code illustrates how to calculate the Fisher z-transformed effect size (Zr) and sampling variance (V_Zr) from correlation coefficients (ri) and sample size (ni) using 10 rows of data from the main dataset.
# Download and clean data
zr_data <- read.csv("https://raw.githubusercontent.com/daniel1noble/meta-workshop/gh-pages/data/ind_disp_raw_data.csv") %>%
dplyr::select(study_ID, taxa, species, trait, response, response_unit, disp_trait, disp_unit, corr_coeff, sample_size) %>% # remove irrelevant columns for this tutorial
dplyr::top_n(10) # select first 10 effect sizes to illustrate escalc function
# Calculate Fisher's r-to-z transformed correlation coefficient (ZCOR) as yi = effect size and vi = sampling variances, where ri = raw correlation coefficients, and ni = sample size.
zr_data <- metafor::escalc(measure = "ZCOR", ri = corr_coeff, ni = sample_size, data = zr_data, var.names = c("Zr", "V_Zr"))
zr_data
##
## study_ID taxa species trait response
## 1 5 Mammalia Tamias_striatus Metabolism Resting MR
## 2 13 Aves Parus_major Metabolism Basal MR
## 3 13 Aves Parus_major Metabolism Basal MR
## 4 45 Mammalia Syncerus_caffer Condition Body condition
## 5 45 Mammalia Syncerus_caffer Condition Body condition
## 6 55 Aves Phoenicopterus_ruber Condition Body condition
## 7 87 Aves Cyanistes caeruleus Immunity WBC
## 8 87 Aves Cyanistes caeruleus Cardiovascular Haematocrit
## 9 87 Aves Cyanistes caeruleus Immunity WBC
## 10 87 Aves Cyanistes caeruleus Cardiovascular Haematocrit
## response_unit disp_trait disp_unit corr_coeff sample_size Zr V_Zr
## 1 residual mW Exploration residual -0.1430 296 -0.1440 0.0034
## 2 mass-corrected W Exploration 0.0555 345 0.0555 0.0029
## 3 mass-corrected W Exploration -0.1397 335 -0.1406 0.0030
## 4 index Dispersal % -0.4000 415 -0.4236 0.0024
## 5 index Dispersal % -0.5300 508 -0.5901 0.0020
## 6 Dispersal 0.8730 462 1.3456 0.0022
## 7 n Activity n 0.0650 485 0.0651 0.0021
## 8 % Activity n 0.0150 485 0.0150 0.0021
## 9 n Activity n 0.1700 485 0.1717 0.0021
## 10 % Activity n -0.1300 485 -0.1307 0.0021
We want to convert to Zr instead of using the correlation (r) itself because Zr will satisfy assumptions of normality (see below) that are inherent to meta-analytic models. We can, however, convert back to the original correlation coefficient quite easily as follows:
# We can easily convert back to r as follows
zr_data$r <- tanh(zr_data$Zr)
zr_data %>%
select(corr_coeff, r)
##
## corr_coeff r
## 1 -0.1430 -0.1430
## 2 0.0555 0.0555
## 3 -0.1397 -0.1397
## 4 -0.4000 -0.4000
## 5 -0.5300 -0.5300
## 6 0.8730 0.8730
## 7 0.0650 0.0650
## 8 0.0150 0.0150
## 9 0.1700 0.1700
## 10 -0.1300 -0.1300
The correlation coefficient itself is a unit less measure that captures both the strength and direction of relationship between two variables. The raw correlation coefficient ranges from -1 to 1. If a correlation is positive it indicates that, as the value of one variable increases so to does the value of the second variable. The magnitude of the value indicates how strong they co-vary. For example, a correlation of 1 indicates that variable 1 perfectly co-varies with variable 2.
Background: If movement relied on physiological capacities, it may be expected that the variation in physiological characteristics introduces differences in the tendency to move among individuals of the same populations. Such individual differences can have consequences for gene flow and genetic diversification within a species. Individuals with a greater tendency to move may have particular physiological characteristics, leading to core-edge differences. Populations at the range edge is therefore hypothesised to have different physiological phenotype than populations from the core range.
To compare differences in the physiological characteristics between individuals from an established range core to their expanding range edge we collected data from studies that reported physiological response from the range core (core) and range edge (front) in the form of mean, sample size, and variance (SD, SE, or 95% CI). This information is used to calculate the log response ratio (lnRR).
For complete detail on the study aim and design, please see Wu and Seebacher (2022)
Download the data. The following code illustrates how to calculate the log response ratio (yi
) and sampling variance (vi) from the mean (mi
), sample size (ni
) and standard deviation (sdi
) from the edge population (1), and core population (2) using 10 row of data from the main dataset.
# Download and clean data
ratio_data <- read.csv("https://raw.githubusercontent.com/daniel1noble/meta-workshop/gh-pages/data/pop_disp_raw_data.csv") %>%
dplyr::select(study_ID, taxa, species, trait, response, response_unit, mean_core, sd_core, n_core, mean_front, sd_front, n_front) %>% # remove irrelevant columns for this tutorial
dplyr::top_n(10) # select first 10 effect sizes to illustrate escalc function
# Calculate log response ratio (lnRR) as yi = effect size and vi = sampling variances, where m1i = mean of edge population, n1i = sample size of edge population, sd1i = standard deviation of edge population, m2i = mean of core population, n2i = sample size of core population, sd2i = standard deviation of core population.
ratio_data <- metafor::escalc(measure = "ROM",
m1i = mean_front, n1i = n_front, sd1i = sd_front,
m2i = mean_core, n2i = n_core, sd2i = sd_core,
data = ratio_data)
ratio_data
##
## study_ID taxa species trait
## 1 9 Osteichthyes Neogobius_melanostomus Activity
## 2 24 Amphibia Xenopus_laevis Activity
## 3 34 Invertebrate Harmonia_axyridis Locomotor capacity
## 4 34 Invertebrate Harmonia_axyridis Locomotor capacity
## 5 37 Osteichthyes Neogobius_melanostomus Condition
## 6 37 Osteichthyes Neogobius_melanostomus Condition
## 7 38 Osteichthyes Neogobius_melanostomus Condition
## 8 38 Osteichthyes Neogobius_melanostomus Condition
## 9 38 Osteichthyes Neogobius_melanostomus Condition
## 10 39 Invertebrate Pacifastacus leniusculus Condition
## 11 39 Invertebrate Pacifastacus leniusculus Condition
## response response_unit mean_core sd_core n_core mean_front sd_front
## 1 Activity 0.826 0.413 98 0.953 0.581
## 2 Activity 0.433 0.028 116 0.512 0.027
## 3 Endurance m 2491.324 619.795 94 2528.995 603.984
## 4 Endurance m 2491.324 619.795 94 2850.913 749.387
## 5 Body condition 1.380 0.180 2757 1.440 0.180
## 6 Body condition 1.360 0.170 2478 1.410 0.160
## 7 Body condition 1.390 0.130 228 1.460 0.200
## 8 Body condition 1.370 0.160 200 1.430 0.190
## 9 Body condition 1.220 0.590 86 1.190 0.420
## 10 Body condition 0.032 0.003 188 0.035 0.003
## 11 Body condition 0.029 0.002 251 0.032 0.002
## n_front yi vi
## 1 161 0.1420 0.0049
## 2 117 0.1661 0.0001
## 3 96 0.0150 0.0013
## 4 96 0.1348 0.0014
## 5 306 0.0426 0.0001
## 6 215 0.0361 0.0001
## 7 238 0.0491 0.0001
## 8 137 0.0429 0.0002
## 9 130 -0.0249 0.0037
## 10 188 0.0896 0.0001
## 11 251 0.0984 0.0000
Now that we have calculated the log response ratio lets just make sure we understand what is being done. We’ll take this subset of data and do some back calculations to make sure we understand how to inter convert. To demonstrate, we’ll just convert (and back convert) the first row of data.
Using the first row of the data, calculate the log response ratio for this observation by hand. What does the direction of the effect size tell you about the mean at the marginal compared to core?
The effect size is positive indicating that the mean of the marginal (
mean_front
) is larger than the mean of the core poulaton (mean_core
). We can see that by calculating the effect size in two ways, the second of which is easier to see why.
### Let's back calculate effects to make sure we understand why they are
### interpreted as percentage differences
# First, lets make sure we understand how it's calculated
with(ratio_data, log(mean_front[1]/mean_core[1]))
## [1] 0.142
# Alternatively....
with(ratio_data, log(mean_front[1]) - log(mean_core[1]))
## [1] 0.142
What does the value of the effect size tell you about how much the mean differences between marginal and core populations?
What this tells us is that the numerator is 1.15 times the denominator, or, that the marginal (front) mean is 15% larger compared to the core mean. We can see that this is true as follows:
# Now lets back-transform to odds.
with(ratio_data, exp(log(mean_front[1]) - log(mean_core[1])))
## [1] 1.15
# Interpretation: What this tells us is that the numerator is 1.15 times the
# denominator, or, that the marginal (front) mean is 15% larger compared to the
# core mean. We can see that this is true as follows:
with(ratio_data, exp(log(mean_front[1]) - log(mean_core[1])) * mean_core[1])
## [1] 0.953
# This value should now match the marginal mean value, which it does
with(ratio_data, mean_front[1])
## [1] 0.953
A simple way to remember how to interpret this is to think that a converted lnRR = 1 means the two are the same. If you are ’subtracting` the marginal population from the core population and the effect size is positive, than 1.15 indicates that the marginal population is ~15% larger, whereas if the effect size is -1.15 than it indicates that the core population is 15% larger.
Hopefully this is useful to interpret the meaning of the effect size data at the level of individual observations. The same principles will apply (for the most part) when converting back to the original scale from the mean estimates from meta-analytic models.
We could also use Wu and Seebacher (2022)’s data to calculate Hedge’s g. Hedge’s g is very similar to Cohen’s d, but it has a small sample correction. This can also be done with metafor::escalc()
by setting measure to “SMD”. We can use the same data here as all the summary statistics that are needed for calculating lnRR are the same as Hedge’s g. We can calculate as follows.
# Calculate hedges g as yi = effect size and vi = sampling variances, where m1i
# = mean of edge population, n1i = sample size of edge population, sd1i =
# standard deviation of edge population, m2i = mean of core population, n2i =
# sample size of core population, sd2i = standard deviation of core population.
hedge_data <- metafor::escalc(measure = "SMD", m1i = mean_front, n1i = n_front, sd1i = sd_front,
m2i = mean_core, n2i = n_core, sd2i = sd_core, data = ratio_data, var.names = c("d",
"d_vi"))
hedge_data
##
## study_ID taxa species trait
## 1 9 Osteichthyes Neogobius_melanostomus Activity
## 2 24 Amphibia Xenopus_laevis Activity
## 3 34 Invertebrate Harmonia_axyridis Locomotor capacity
## 4 34 Invertebrate Harmonia_axyridis Locomotor capacity
## 5 37 Osteichthyes Neogobius_melanostomus Condition
## 6 37 Osteichthyes Neogobius_melanostomus Condition
## 7 38 Osteichthyes Neogobius_melanostomus Condition
## 8 38 Osteichthyes Neogobius_melanostomus Condition
## 9 38 Osteichthyes Neogobius_melanostomus Condition
## 10 39 Invertebrate Pacifastacus leniusculus Condition
## 11 39 Invertebrate Pacifastacus leniusculus Condition
## response response_unit mean_core sd_core n_core mean_front sd_front
## 1 Activity 0.826 0.413 98 0.953 0.581
## 2 Activity 0.433 0.028 116 0.512 0.027
## 3 Endurance m 2491.324 619.795 94 2528.995 603.984
## 4 Endurance m 2491.324 619.795 94 2850.913 749.387
## 5 Body condition 1.380 0.180 2757 1.440 0.180
## 6 Body condition 1.360 0.170 2478 1.410 0.160
## 7 Body condition 1.390 0.130 228 1.460 0.200
## 8 Body condition 1.370 0.160 200 1.430 0.190
## 9 Body condition 1.220 0.590 86 1.190 0.420
## 10 Body condition 0.032 0.003 188 0.035 0.003
## 11 Body condition 0.029 0.002 251 0.032 0.002
## n_front yi vi d d_vi
## 1 161 0.1420 0.0049 0.2398 0.0165
## 2 117 0.1661 0.0001 2.8366 0.0344
## 3 96 0.0150 0.0013 0.0613 0.0211
## 4 96 0.1348 0.0014 0.5203 0.0218
## 5 306 0.0426 0.0001 0.3333 0.0036
## 6 215 0.0361 0.0001 0.2954 0.0051
## 7 238 0.0491 0.0001 0.4125 0.0088
## 8 137 0.0429 0.0002 0.3464 0.0125
## 9 130 -0.0249 0.0037 -0.0604 0.0193
## 10 188 0.0896 0.0001 0.9980 0.0120
## 11 251 0.0984 0.0000 1.4977 0.0102
Visualising and interpreting Hedge’s g can be slightly less intuitive compared to log response ratios. There’s a great online resource by Kristoffer Magnusson with some visuals that can help you decipher the meaning of Cohen’s d effects, which is the same as Hedge’s g.
R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
attached base packages: stats, graphics, grDevices, utils, datasets, methods and base
other attached packages: vembedr(v.0.1.5), equatags(v.0.1.0), mathjaxr(v.1.6-0), pander(v.0.6.4), orchaRd(v.2.0), forcats(v.0.5.1), stringr(v.1.4.0), dplyr(v.1.0.9), purrr(v.0.3.4), readr(v.2.1.2), tidyr(v.1.2.0), tibble(v.3.1.7), ggplot2(v.3.3.6), tidyverse(v.1.3.1), flextable(v.0.6.6), metafor(v.3.4-0), metadat(v.1.3-0) and Matrix(v.1.3-4)
loaded via a namespace (and not attached): nlme(v.3.1-152), fs(v.1.5.2), lubridate(v.1.8.0), httr(v.1.4.3), tools(v.4.0.3), backports(v.1.4.1), bslib(v.0.3.1), utf8(v.1.2.2), R6(v.2.5.1), DBI(v.1.1.3), colorspace(v.2.0-3), withr(v.2.5.0), tidyselect(v.1.1.2), compiler(v.4.0.3), cli(v.3.3.0), rvest(v.1.0.2), formatR(v.1.11), pacman(v.0.5.1), xml2(v.1.3.3), officer(v.0.3.18), bookdown(v.0.22), sass(v.0.4.1), scales(v.1.2.0), rappdirs(v.0.3.3), systemfonts(v.1.0.2), digest(v.0.6.29), rmarkdown(v.2.14), base64enc(v.0.1-3), pkgconfig(v.2.0.3), htmltools(v.0.5.2), dbplyr(v.2.2.1), fastmap(v.1.1.0), rlang(v.1.0.3), readxl(v.1.4.0), rstudioapi(v.0.13), jquerylib(v.0.1.4), generics(v.0.1.2), jsonlite(v.1.8.0), zip(v.2.2.0), magrittr(v.2.0.3), Rcpp(v.1.0.8.3), munsell(v.0.5.0), fansi(v.1.0.3), gdtools(v.0.2.3), lifecycle(v.1.0.1), stringi(v.1.7.6), yaml(v.2.3.5), grid(v.4.0.3), crayon(v.1.5.1), lattice(v.0.20-44), haven(v.2.5.0), hms(v.1.1.1), knitr(v.1.39), klippy(v.0.0.0.9500), pillar(v.1.7.0), uuid(v.1.1-0), reprex(v.2.0.1), xslt(v.1.4.3), glue(v.1.6.2), evaluate(v.0.15), data.table(v.1.14.2), modelr(v.0.1.8), vctrs(v.0.4.1), tzdb(v.0.3.0), cellranger(v.1.1.0), gtable(v.0.3.0), assertthat(v.0.2.1), locatexec(v.0.1.1), xfun(v.0.31), broom(v.1.0.0) and ellipsis(v.0.3.2)