Calculating and interpreting effect sizes

Introduction

Meta-analysis quantitatively aggregates effect size data collected from existing research. The effect size one uses for a study question needs to be chosen carefully. Each make certain assumptions about the underlying data. It’s important to be weary of these assumptions and identify when something might have gone wrong.

The effect size chosen must satisfy two important features. First, the effect size must be comparable across studies. This means that they must be placed on the same scale so that they can be aggregated. Some effect size estimates can be inter-converted to each other (e.g., Zr, Hedges’ g), but this isn’t the case for all of them. Second, we must be able to derive or approximate the sampling variance for a given effect size. As already indicated, the sampling variance is very important if one wishes to use meta-analytic models that account for sampling variance and is needed to report certain measures of heterogeneity.

Meta-analyses in comparative physiology, and indeed, ecology and evolution more generally, often use highly heterogeneous data (Noble et al., 2022; Senior et al., 2016). This poses other challenges in aggregating effect size data that make their comparability questionable. We’ll discuss some of these issues in later tutorials and identify some ways in which some of this heterogeneity can be dealt with when deriving the effect size itself.

By the end of this tutorial, we hope that you will feel comfortable:

Calculating different effect sizes from your data.
Interpreting the meaning of different effect size statistics
Transforming effect size estimates back to their original scales when needed

Common Effect Sizes

What is an effect size? The answer to this question depends on what kind of study one wishes to synthesise. For example, an effect size could be a contrast between experimental treatments. We might extract mean differences from experiments manipulating, say, bisphenol-A (BPA) and looking at the effect of BPA on aquatic ectotherm phenotype relative to some control (Wu and Seebacher, 2020). Alternatively, the effect of interest might be the magnitude and direction of a correlation between two observed variables, such as metabolism and behaviour (Holtmann et al., 2016). At times, we might even just be interested in analysing the mean or variance itself.

“An effect size is a statistical parameter that can be used to compare, on the same scale, the results of different studies in which a common effect of interest has been measured” (Rosenberg et al., 2013)

There are many types of effect sizes that are commonplace in ecological and evolutionary research (Table 1 describes many of the more common ones from Noble et al., 2022).

Table 1: **Common effect sizes used throughout meta-analyses in comparative physiology, their associated sampling variances and examples on when they might be used**. M: mean; SD: standard deviation; N: sample size; CTmin: critical thermal minimum; CTmax: critical thermal maximum; LC_50: median lethal concentration; LT_50: median lethal temperature; BPA: bisphenol A.
Effect Measure	Definition	Sampling Variance	Examples
Mean			CTmin, CTmax, LC50, LT50, Metabolic Rate (MR)
Log Standard deviation, lnSD			Variability in CTmin, CTmax, Metabolic Rate (MR)
Log Response Ratio, lnRR			Ratio between pollutant exposed treatment (e.g., BPA-exposed group) and control (no pollutant)
Standardised Mean Difference, SMDa			Difference in immune response between males and females, performance difference in the presence of stressor compared to absence of stressor
Zr (Fisher Transformation of Correlation Coefficient, r)			Relationship between sex hormones and immune responses or metabolic rate and behaviour
a ; a

We’ll use metafor to demonstrate how to calculate commonly used effect size estimates in the field of comparative physiology, including Z-transformed correlation coefficients (Zr), log response ratios and Hedge’s g.

Setting up

To start, well need to load some packages and data. The data is all stored on GitHub. It’s downloaded directly from the web. You won’t need to have these on hand. We’ll first load the packages that we need.

# Load packages install.packages('pacman') # If you haven't already installed
# the pacman package do so with this code
pacman::p_load(metafor, flextable, tidyverse, orchaRd, pander, mathjaxr, equatags)
# To use mathjaxr you need to run equatags::mathjax_install()

Correlation between physiology and movement patterns: Z-transformed Correlation Coefficient (Zr)

Background: Physiology is expected to drive movement, including short-term activity, exploration of unfamiliar environments, and larger scale dispersal which can influence species distributions in an environmentally sensitive manner. To test whether there is a relationship between physiology and movement pattern (separated as activity, exploration, and dispersal), we extracted data from the literature on common physiological traits measured (e.g., metabolic rate, hormone levels, body condition) with either activity, exploration, and dispersal at the individual level. As we are interested in relationships at the individual level, we extracted correlation data in the form of correlation coefficient and sample size.

For complete detail on the study aim and design, please see Wu and Seebacher (2022)

Download the data. The following code illustrates how to calculate the Fisher z-transformed effect size (Zr) and sampling variance (V_Zr) from correlation coefficients (ri) and sample size (ni) using 10 rows of data from the main dataset.

# Download and clean data
zr_data <- read.csv("https://raw.githubusercontent.com/daniel1noble/meta-workshop/gh-pages/data/ind_disp_raw_data.csv") %>%
  dplyr::select(study_ID, taxa, species, trait, response, response_unit, disp_trait, disp_unit, corr_coeff, sample_size) %>% # remove irrelevant columns for this tutorial
  dplyr::top_n(10) # select first 10 effect sizes to illustrate escalc function

# Calculate Fisher's r-to-z transformed correlation coefficient (ZCOR) as yi = effect size and vi = sampling variances, where ri = raw correlation coefficients, and ni = sample size.
zr_data <- metafor::escalc(measure = "ZCOR", ri = corr_coeff, ni = sample_size, data = zr_data, var.names = c("Zr", "V_Zr"))

zr_data

## 
##    study_ID     taxa              species          trait       response 
## 1         5 Mammalia      Tamias_striatus     Metabolism     Resting MR 
## 2        13     Aves          Parus_major     Metabolism       Basal MR 
## 3        13     Aves          Parus_major     Metabolism       Basal MR 
## 4        45 Mammalia      Syncerus_caffer      Condition Body condition 
## 5        45 Mammalia      Syncerus_caffer      Condition Body condition 
## 6        55     Aves Phoenicopterus_ruber      Condition Body condition 
## 7        87     Aves  Cyanistes caeruleus       Immunity            WBC 
## 8        87     Aves  Cyanistes caeruleus Cardiovascular    Haematocrit 
## 9        87     Aves  Cyanistes caeruleus       Immunity            WBC 
## 10       87     Aves  Cyanistes caeruleus Cardiovascular    Haematocrit 
##       response_unit  disp_trait disp_unit corr_coeff sample_size      Zr   V_Zr 
## 1       residual mW Exploration  residual    -0.1430         296 -0.1440 0.0034 
## 2  mass-corrected W Exploration               0.0555         345  0.0555 0.0029 
## 3  mass-corrected W Exploration              -0.1397         335 -0.1406 0.0030 
## 4             index   Dispersal         %    -0.4000         415 -0.4236 0.0024 
## 5             index   Dispersal         %    -0.5300         508 -0.5901 0.0020 
## 6                     Dispersal               0.8730         462  1.3456 0.0022 
## 7                 n    Activity         n     0.0650         485  0.0651 0.0021 
## 8                 %    Activity         n     0.0150         485  0.0150 0.0021 
## 9                 n    Activity         n     0.1700         485  0.1717 0.0021 
## 10                %    Activity         n    -0.1300         485 -0.1307 0.0021

We want to convert to Zr instead of using the correlation (r) itself because Zr will satisfy assumptions of normality (see below) that are inherent to meta-analytic models. We can, however, convert back to the original correlation coefficient quite easily as follows:

# We can easily convert back to r as follows
zr_data$r <- tanh(zr_data$Zr)

zr_data %>%
    select(corr_coeff, r)

## 
##    corr_coeff       r 
## 1     -0.1430 -0.1430 
## 2      0.0555  0.0555 
## 3     -0.1397 -0.1397 
## 4     -0.4000 -0.4000 
## 5     -0.5300 -0.5300 
## 6      0.8730  0.8730 
## 7      0.0650  0.0650 
## 8      0.0150  0.0150 
## 9      0.1700  0.1700 
## 10    -0.1300 -0.1300

The correlation coefficient itself is a unit less measure that captures both the strength and direction of relationship between two variables. The raw correlation coefficient ranges from -1 to 1. If a correlation is positive it indicates that, as the value of one variable increases so to does the value of the second variable. The magnitude of the value indicates how strong they co-vary. For example, a correlation of 1 indicates that variable 1 perfectly co-varies with variable 2.

Difference in physiology between populations from range core and edge: Log response ratio (lnRR)

Background: If movement relied on physiological capacities, it may be expected that the variation in physiological characteristics introduces differences in the tendency to move among individuals of the same populations. Such individual differences can have consequences for gene flow and genetic diversification within a species. Individuals with a greater tendency to move may have particular physiological characteristics, leading to core-edge differences. Populations at the range edge is therefore hypothesised to have different physiological phenotype than populations from the core range.

To compare differences in the physiological characteristics between individuals from an established range core to their expanding range edge we collected data from studies that reported physiological response from the range core (core) and range edge (front) in the form of mean, sample size, and variance (SD, SE, or 95% CI). This information is used to calculate the log response ratio (lnRR).

For complete detail on the study aim and design, please see Wu and Seebacher (2022)

Download the data. The following code illustrates how to calculate the log response ratio (yi) and sampling variance (vi) from the mean (mi), sample size (ni) and standard deviation (sdi) from the edge population (1), and core population (2) using 10 row of data from the main dataset.

# Download and clean data
ratio_data <- read.csv("https://raw.githubusercontent.com/daniel1noble/meta-workshop/gh-pages/data/pop_disp_raw_data.csv") %>%
  dplyr::select(study_ID, taxa, species, trait, response, response_unit, mean_core, sd_core, n_core, mean_front, sd_front, n_front) %>% # remove irrelevant columns for this tutorial
  dplyr::top_n(10) # select first 10 effect sizes to illustrate escalc function

# Calculate log response ratio (lnRR) as yi = effect size and vi = sampling variances, where m1i = mean of edge population, n1i = sample size of edge population, sd1i = standard deviation of edge population, m2i = mean of core population, n2i = sample size of core population, sd2i = standard deviation of core population.
ratio_data <- metafor::escalc(measure = "ROM", 
                              m1i = mean_front, n1i = n_front, sd1i = sd_front,
                              m2i = mean_core, n2i = n_core, sd2i = sd_core, 
                              data = ratio_data)
ratio_data

## 
##    study_ID         taxa                  species              trait 
## 1         9 Osteichthyes   Neogobius_melanostomus           Activity 
## 2        24     Amphibia           Xenopus_laevis           Activity 
## 3        34 Invertebrate        Harmonia_axyridis Locomotor capacity 
## 4        34 Invertebrate        Harmonia_axyridis Locomotor capacity 
## 5        37 Osteichthyes   Neogobius_melanostomus          Condition 
## 6        37 Osteichthyes   Neogobius_melanostomus          Condition 
## 7        38 Osteichthyes   Neogobius_melanostomus          Condition 
## 8        38 Osteichthyes   Neogobius_melanostomus          Condition 
## 9        38 Osteichthyes   Neogobius_melanostomus          Condition 
## 10       39 Invertebrate Pacifastacus leniusculus          Condition 
## 11       39 Invertebrate Pacifastacus leniusculus          Condition 
##          response response_unit mean_core sd_core n_core mean_front sd_front 
## 1        Activity                   0.826   0.413     98      0.953    0.581 
## 2        Activity                   0.433   0.028    116      0.512    0.027 
## 3       Endurance             m  2491.324 619.795     94   2528.995  603.984 
## 4       Endurance             m  2491.324 619.795     94   2850.913  749.387 
## 5  Body condition                   1.380   0.180   2757      1.440    0.180 
## 6  Body condition                   1.360   0.170   2478      1.410    0.160 
## 7  Body condition                   1.390   0.130    228      1.460    0.200 
## 8  Body condition                   1.370   0.160    200      1.430    0.190 
## 9  Body condition                   1.220   0.590     86      1.190    0.420 
## 10 Body condition                   0.032   0.003    188      0.035    0.003 
## 11 Body condition                   0.029   0.002    251      0.032    0.002 
##    n_front      yi     vi 
## 1      161  0.1420 0.0049 
## 2      117  0.1661 0.0001 
## 3       96  0.0150 0.0013 
## 4       96  0.1348 0.0014 
## 5      306  0.0426 0.0001 
## 6      215  0.0361 0.0001 
## 7      238  0.0491 0.0001 
## 8      137  0.0429 0.0002 
## 9      130 -0.0249 0.0037 
## 10     188  0.0896 0.0001 
## 11     251  0.0984 0.0000

Now that we have calculated the log response ratio lets just make sure we understand what is being done. We’ll take this subset of data and do some back calculations to make sure we understand how to inter convert. To demonstrate, we’ll just convert (and back convert) the first row of data.

Testing your Understanding of \(lnRR\)

Task!

Using the first row of the data, calculate the log response ratio for this observation by hand. What does the direction of the effect size tell you about the mean at the marginal compared to core?

Answer!

The effect size is positive indicating that the mean of the marginal (mean_front) is larger than the mean of the core poulaton (mean_core). We can see that by calculating the effect size in two ways, the second of which is easier to see why.

### Let's back calculate effects to make sure we understand why they are
### interpreted as percentage differences

# First, lets make sure we understand how it's calculated
with(ratio_data, log(mean_front[1]/mean_core[1]))

## [1] 0.142

# Alternatively....
with(ratio_data, log(mean_front[1]) - log(mean_core[1]))

## [1] 0.142

Task!

What does the value of the effect size tell you about how much the mean differences between marginal and core populations?

Answer!

What this tells us is that the numerator is 1.15 times the denominator, or, that the marginal (front) mean is 15% larger compared to the core mean. We can see that this is true as follows:

# Now lets back-transform to odds.
with(ratio_data, exp(log(mean_front[1]) - log(mean_core[1])))

## [1] 1.15

# Interpretation: What this tells us is that the numerator is 1.15 times the
# denominator, or, that the marginal (front) mean is 15% larger compared to the
# core mean. We can see that this is true as follows:
with(ratio_data, exp(log(mean_front[1]) - log(mean_core[1])) * mean_core[1])

## [1] 0.953

# This value should now match the marginal mean value, which it does
with(ratio_data, mean_front[1])

## [1] 0.953

A simple way to remember how to interpret this is to think that a converted lnRR = 1 means the two are the same. If you are ’subtracting` the marginal population from the core population and the effect size is positive, than 1.15 indicates that the marginal population is ~15% larger, whereas if the effect size is -1.15 than it indicates that the core population is 15% larger.

Hopefully this is useful to interpret the meaning of the effect size data at the level of individual observations. The same principles will apply (for the most part) when converting back to the original scale from the mean estimates from meta-analytic models.

Cohen’s d and Hedges g – Visualising Their Meaning

We could also use Wu and Seebacher (2022)’s data to calculate Hedge’s g. Hedge’s g is very similar to Cohen’s d, but it has a small sample correction. This can also be done with metafor::escalc() by setting measure to “SMD”. We can use the same data here as all the summary statistics that are needed for calculating lnRR are the same as Hedge’s g. We can calculate as follows.

# Calculate hedges g as yi = effect size and vi = sampling variances, where m1i
# = mean of edge population, n1i = sample size of edge population, sd1i =
# standard deviation of edge population, m2i = mean of core population, n2i =
# sample size of core population, sd2i = standard deviation of core population.
hedge_data <- metafor::escalc(measure = "SMD", m1i = mean_front, n1i = n_front, sd1i = sd_front,
    m2i = mean_core, n2i = n_core, sd2i = sd_core, data = ratio_data, var.names = c("d",
        "d_vi"))
hedge_data

## 
##    study_ID         taxa                  species              trait 
## 1         9 Osteichthyes   Neogobius_melanostomus           Activity 
## 2        24     Amphibia           Xenopus_laevis           Activity 
## 3        34 Invertebrate        Harmonia_axyridis Locomotor capacity 
## 4        34 Invertebrate        Harmonia_axyridis Locomotor capacity 
## 5        37 Osteichthyes   Neogobius_melanostomus          Condition 
## 6        37 Osteichthyes   Neogobius_melanostomus          Condition 
## 7        38 Osteichthyes   Neogobius_melanostomus          Condition 
## 8        38 Osteichthyes   Neogobius_melanostomus          Condition 
## 9        38 Osteichthyes   Neogobius_melanostomus          Condition 
## 10       39 Invertebrate Pacifastacus leniusculus          Condition 
## 11       39 Invertebrate Pacifastacus leniusculus          Condition 
##          response response_unit mean_core sd_core n_core mean_front sd_front 
## 1        Activity                   0.826   0.413     98      0.953    0.581 
## 2        Activity                   0.433   0.028    116      0.512    0.027 
## 3       Endurance             m  2491.324 619.795     94   2528.995  603.984 
## 4       Endurance             m  2491.324 619.795     94   2850.913  749.387 
## 5  Body condition                   1.380   0.180   2757      1.440    0.180 
## 6  Body condition                   1.360   0.170   2478      1.410    0.160 
## 7  Body condition                   1.390   0.130    228      1.460    0.200 
## 8  Body condition                   1.370   0.160    200      1.430    0.190 
## 9  Body condition                   1.220   0.590     86      1.190    0.420 
## 10 Body condition                   0.032   0.003    188      0.035    0.003 
## 11 Body condition                   0.029   0.002    251      0.032    0.002 
##    n_front      yi     vi       d   d_vi 
## 1      161  0.1420 0.0049  0.2398 0.0165 
## 2      117  0.1661 0.0001  2.8366 0.0344 
## 3       96  0.0150 0.0013  0.0613 0.0211 
## 4       96  0.1348 0.0014  0.5203 0.0218 
## 5      306  0.0426 0.0001  0.3333 0.0036 
## 6      215  0.0361 0.0001  0.2954 0.0051 
## 7      238  0.0491 0.0001  0.4125 0.0088 
## 8      137  0.0429 0.0002  0.3464 0.0125 
## 9      130 -0.0249 0.0037 -0.0604 0.0193 
## 10     188  0.0896 0.0001  0.9980 0.0120 
## 11     251  0.0984 0.0000  1.4977 0.0102

Visualising and interpreting Hedge’s g can be slightly less intuitive compared to log response ratios. There’s a great online resource by Kristoffer Magnusson with some visuals that can help you decipher the meaning of Cohen’s d effects, which is the same as Hedge’s g.

References

Holtmann, B., Lagisz, M. and Nakagawa, S. (2016). Metabolic rates, and not hormone levels, are a likely mediator of between-individual differences in behaviour: A meta-analysis. Functional Ecology 31, 685–696.

Noble, D. W. A., Pottier, P., Lagisz, M., Burke, S., Drobniak, S. M., O’Dea, R. E. and Nakagawa, S. (2022). Meta-analytic approaches and effect sizes to account for “nuisance heterogeneity” in comparative physiology. Journal of Experimental Biology 225, jeb243225.

Rosenberg, M. S., Rothstein, H. R. and Gurevitch, J. (2013). Effect sizes: Conventional choices and calculations. In: Handbook of Meta-Analysis in Ecology and Evolution (eds J. Koricheva, J. Gurevitch & K. Mengersen), Princeton University Press, Princeton, New Jersey. pp 61 – 71,.

Senior, A. M., Grueber, C. E., Kamiya, T., Lagisz, M., O’dwyer, K., Santos, E. S. A. and Nakagawa, S. (2016). Heterogeneity in ecological and evolutionary meta‐analyses: Its magnitude and implications. Ecology 97, 3293–3299.

Wu, N. C. and Seebacher, F. (2020). Effect of the plastic pollutant bisphenol a on the biology of aquatic organisms: A meta‐analysis. Global Change Biology 26, 3821–3833.

Wu, N. C. and Seebacher, F. (2022). Physiology can predict animal activity, exploration, and dispersal. Communications Biology 5, 1–11.

Session Information

R version 4.0.3 (2020-10-10)

Platform: x86_64-apple-darwin17.0 (64-bit)

attached base packages: stats, graphics, grDevices, utils, datasets, methods and base

other attached packages: vembedr(v.0.1.5), equatags(v.0.1.0), mathjaxr(v.1.6-0), pander(v.0.6.4), orchaRd(v.2.0), forcats(v.0.5.1), stringr(v.1.4.0), dplyr(v.1.0.9), purrr(v.0.3.4), readr(v.2.1.2), tidyr(v.1.2.0), tibble(v.3.1.7), ggplot2(v.3.3.6), tidyverse(v.1.3.1), flextable(v.0.6.6), metafor(v.3.4-0), metadat(v.1.3-0) and Matrix(v.1.3-4)

loaded via a namespace (and not attached): nlme(v.3.1-152), fs(v.1.5.2), lubridate(v.1.8.0), httr(v.1.4.3), tools(v.4.0.3), backports(v.1.4.1), bslib(v.0.3.1), utf8(v.1.2.2), R6(v.2.5.1), DBI(v.1.1.3), colorspace(v.2.0-3), withr(v.2.5.0), tidyselect(v.1.1.2), compiler(v.4.0.3), cli(v.3.3.0), rvest(v.1.0.2), formatR(v.1.11), pacman(v.0.5.1), xml2(v.1.3.3), officer(v.0.3.18), bookdown(v.0.22), sass(v.0.4.1), scales(v.1.2.0), rappdirs(v.0.3.3), systemfonts(v.1.0.2), digest(v.0.6.29), rmarkdown(v.2.14), base64enc(v.0.1-3), pkgconfig(v.2.0.3), htmltools(v.0.5.2), dbplyr(v.2.2.1), fastmap(v.1.1.0), rlang(v.1.0.3), readxl(v.1.4.0), rstudioapi(v.0.13), jquerylib(v.0.1.4), generics(v.0.1.2), jsonlite(v.1.8.0), zip(v.2.2.0), magrittr(v.2.0.3), Rcpp(v.1.0.8.3), munsell(v.0.5.0), fansi(v.1.0.3), gdtools(v.0.2.3), lifecycle(v.1.0.1), stringi(v.1.7.6), yaml(v.2.3.5), grid(v.4.0.3), crayon(v.1.5.1), lattice(v.0.20-44), haven(v.2.5.0), hms(v.1.1.1), knitr(v.1.39), klippy(v.0.0.0.9500), pillar(v.1.7.0), uuid(v.1.1-0), reprex(v.2.0.1), xslt(v.1.4.3), glue(v.1.6.2), evaluate(v.0.15), data.table(v.1.14.2), modelr(v.0.1.8), vctrs(v.0.4.1), tzdb(v.0.3.0), cellranger(v.1.1.0), gtable(v.0.3.0), assertthat(v.0.2.1), locatexec(v.0.1.1), xfun(v.0.31), broom(v.1.0.0) and ellipsis(v.0.3.2)

Calculating and interpreting effect sizes

Daniel W.A. Noble, Nicholas C. Wu

2022-07-14

Introduction

Common Effect Sizes

Setting up

Correlation between physiology and movement patterns: Z-transformed Correlation Coefficient (Zr)

Difference in physiology between populations from range core and edge: Log response ratio (lnRR)

Testing your Understanding of \(lnRR\)

Task!

Answer!

Task!

Answer!

Cohen’s d and Hedges g – Visualising Their Meaning

References

Session Information

Back to Table of Contents