Heterogeneity in Meta-analysis

Introduction to Heterogeneity in Meta-analysis

In physiology, ecology and evolution, we deal with many different populations, species, experimental designs and so on. As such, we're not only interested in understanding what the overall effect is (in many cases we may not even care); rather, we are mainly focused on understanding which factors (e.g., biological, methodological) explain variation in effects (Gurevitch et al., 2018; Lajeunesse, 2010; Noble et al., 2022). Quantifying how much variability in effects exists, testing whether it is more than we expect by chance, and identifying the factors that explain it is a primary goal of meta-analysis.

Meta-analytic mean estimates need to be interpreted in the context of how much variation in effects exists within and across studies. As such, reporting measures of variability, or what is referred to as 'heterogeneity' in meta-analysis, is essential and should never be ignored (even though it often is, unfortunately) (Borenstein, 2019; Gurevitch et al., 2018; Nakagawa and Santos, 2012; Nakagawa et al., 2017a; O'Dea et al., 2021).

There are a number of important metrics of heterogeneity that are commonly reported in the meta-analytic literature (Borenstein, 2019; Nakagawa and Santos, 2012; Nakagawa et al., 2017a). In this tutorial we'll show how to calculate and interpret many of the common types, and what they mean for interpreting the meta-analytic mean. Understanding the consistency of results across studies tells us a great deal about how likely an effect is to be detected in future studies and whether we can draw broad, general conclusions.

Measures of Heterogeneity

The most commonly encountered measures of heterogeneity in comparative physiology, and indeed ecology and evolution more generally, are results of Q tests, \(I^2\) metrics (or raw \(\tau^2\) or variance estimates), and less commonly, prediction intervals. These measures of heterogeneity (even if just a few) should always be presented alongside meta-analytic mean effect size estimates because they tell the reader a great deal about the ‘consistency’ of results (e.g., prediction intervals, total heterogeneity) or the relative contribution of different factors to effect size variation (i.e., different measures of \(I^2\)). We’ll overview these different metrics and show how to calculate and interpret them in meta-analyses.

For the purposes of this tutorial we'll assume the following multilevel meta-analytic model, as described in the multilevel model tutorial. If you can't quite remember the notation, we recommend reviewing that page.

\[ \begin{aligned} y_{i} &= \mu + s_{j[i]} + spp_{k[i]} + e_{i} + m_{i} \\ m_{i} &\sim N(0, v_{i}) \\ s_{j} &\sim N(0, \tau^2) \\ spp_{k} &\sim N(0, \sigma_{k}^2) \\ e_{i} &\sim N(0, \sigma_{e}^2) \end{aligned} \] We will return to the meta-analysis by Pottier et al. (2021) that we already detailed in the multilevel meta-analysis tutorial. We’ll walk through the different types of heterogeneity statistics that we can calculate for the model.

We can re-load that data again and get it ready for analysis using the code below:

Code
# install.packages("pacman") # uncomment this line if you haven't already installed 'pacman'
pacman::p_load(metafor, tidyverse, orchaRd, devtools, patchwork, R.rsp, emmeans, flextable)

asr_dat <- read.csv("https://osf.io/qn2af/download")

#' @title arr
#' @description Calculates the acclimation response ratio (ARR).  
#' @param t2_l  Lowest of the two treatment temperatures
#' @param t1_h  Highest of the two treatment temperatures
#' @param x1_h  Mean trait value at high temperature
#' @param x2_l  Mean trait value at low temperature
#' @param sd1_h Standard deviation of mean trait value at high temperature
#' @param sd2_l Standard deviation of mean trait value at low temperature
#' @param n1_h  Sample size at high temperature
#' @param n2_l  Sample size at low temperature

arr <- function(x1_h, x2_l, sd1_h, sd2_l, n1_h, n2_l, t1_h, t2_l){
    ARR   <- (x1_h - x2_l) / (t1_h - t2_l)
    V_ARR <- (1 / (t1_h - t2_l))^2 * (sd2_l^2 / n2_l + sd1_h^2 / n1_h)
    return(data.frame(ARR, V_ARR))
}

# Calculate the effect sizes
asr_dat <- asr_dat %>%
    mutate(ARR   = arr(x1_h = mean_high, x2_l = mean_low, t1_h = acc_temp_high, t2_l = acc_temp_low,
                       sd1_h = sd_high, sd2_l = sd_low, n1_h = n_high_adj, n2_l = n_low_adj)[, 1],
           V_ARR = arr(x1_h = mean_high, x2_l = mean_low, t1_h = acc_temp_high, t2_l = acc_temp_low,
                       sd1_h = sd_high, sd2_l = sd_low, n1_h = n_high_adj, n2_l = n_low_adj)[, 2]) %>%
    filter(sex == "female")

# Re-fit the multilevel meta-analytic model
MLMA <- metafor::rma.mv(yi = ARR ~ 1, V = V_ARR,
                        method = "REML",
                        random = list(~1 | species_ID,
                                      ~1 | study_ID,
                                      ~1 | es_ID),
                        dfs = "contain",
                        test = "t",
                        data = asr_dat)

We’ll now use this model to calculate various heterogeneity statistics in addition to the ones already calculated (which we will describe below). Before doing so, it helps to visualise the data and the model output together.

Visualising Heterogeneity with an Orchard Plot

The orchaRd package provides the orchard_plot() function, which is a great tool for communicating meta-analytic results and heterogeneity at the same time (Nakagawa et al., 2021; Nakagawa et al., 2023). The plot shows:

  • the meta-analytic mean as a point estimate (the filled circle in Figure 7.1);
  • a thick bar spanning the 95% confidence interval (uncertainty around the mean);
  • a thin bar spanning the 95% prediction interval (the expected range of a future effect size — the key to visualising heterogeneity);
  • individual effect sizes are plotted as points scaled by their precision (\(1/\sqrt{v_i}\)), so precise estimates appear larger.
Code
orchaRd::orchard_plot(MLMA, group = "species_ID", xlab = "Acclimation Response Ratio (ARR, °C / °C)")
Figure 7.1: Orchard plot of acclimation response ratio (ARR) estimates across ectotherm species. The filled circle is the meta-analytic mean, the thick bar is the 95% confidence interval, and the thin bar is the 95% prediction interval. Individual effect sizes are scaled by precision.

The wide prediction interval in Figure 7.1 immediately makes clear that there is substantial heterogeneity in ARR across species — the spread of effects is far greater than sampling error alone would produce. The numerical heterogeneity metrics below quantify this formally.

Proportion of Total Heterogeneity: \(I_{total}^2\)

\(I^2\) estimates are probably most commonly presented in the literature (Borenstein, 2019; Higgins and Thompson, 2002; Higgins et al., 2003; Nakagawa et al., 2017a; Senior et al., 2016). There are different forms of \(I^2\) that can be calculated, but the one that describes the proportion of effect size variation after accounting for total sampling variation is \(I_{total}^2\) (Nakagawa and Santos, 2012). Assuming we’re using our multilevel model described above, it’s calculated as follows:

\[ I^2_{total} = \frac{\sigma^2_{study} + \sigma^2_{phylogeny} + \sigma^2_{species} + \sigma^2_{residual}}{\sigma^2_{study} + \sigma^2_{phylogeny} + \sigma^2_{species} + \sigma^2_{residual} +\sigma^2_{m}} \tag{7.1}\]

where \(\sigma^2_{total} = \sigma^2_{study} + \sigma^2_{phylogeny} + \sigma^2_{species} + \sigma^2_{residual} +\sigma^2_{m}\) is the total effect size variance and \(\sigma^2_{m}\) is the ‘typical’ sampling error variance calculated as:

Note

Equation 7.1 is the general form and may include a phylogenetic variance component (\(\sigma^2_{phylogeny}\)) when a phylogenetic correlation structure is added to the model. In the model fitted here, no phylogenetic term is included, so \(\sigma^2_{phylogeny} = 0\) and the numerator reduces to \(\sigma^2_{study} + \sigma^2_{species} + \sigma^2_{residual}\).

\[ \sigma_{m}^2 = \sum w_{i}\left( k-1\right) / \left[ \left( \sum w_{i}\right)^2 - \sum w_{i}^2\right] \tag{7.2}\]

where \(k\) is the number of effect sizes and the weights, \(w_{i} = \frac{1}{v_{i}}\), are the inverse of the sampling variance (\(v_{i}\)) for each effect size, i. Below, we'll make use of the orchaRd R package (Nakagawa et al., 2021; Nakagawa et al., 2023) to calculate various \(I^2\) metrics. In fact, we can simply change what is in the numerator of Equation 7.1 to calculate a multitude of different \(I^2\) metrics that can be useful in understanding the relative importance of factors explaining effect size variation. We'll do this below:
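To make Equations 7.1 and 7.2 concrete, here is a minimal sketch that computes the 'typical' sampling variance and \(I^2_{total}\) by hand from the `MLMA` model and `asr_dat` data fitted above (functions like `i2_ml()` do the equivalent internally; the exact value here should closely match theirs):

```r
# Manual calculation of the 'typical' sampling variance (Equation 7.2)
# and total I2 (Equation 7.1), using the fitted MLMA model from above.
wi <- 1 / asr_dat$V_ARR           # inverse-variance weights, w_i = 1 / v_i
k  <- length(wi)                  # number of effect sizes

# 'Typical' sampling variance (Higgins & Thompson, 2002)
sigma2_m <- sum(wi) * (k - 1) / (sum(wi)^2 - sum(wi^2))

# Sum of the random-effect variance components (species, study, residual)
sigma2_re <- sum(MLMA$sigma2)

I2_total <- sigma2_re / (sigma2_re + sigma2_m)
round(100 * I2_total, 2)          # should closely match orchaRd::i2_ml(MLMA)
```

A useful sanity check on Equation 7.2: when all sampling variances are equal to some value \(v\), the 'typical' sampling variance reduces exactly to \(v\).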

Code
# The orchaRd package has some convenient functions for calculating various I2 estimates including total. We'll load and install that package
#install.packages("pacman")
pacman::p_load(devtools, tidyverse, metafor, patchwork, R.rsp, emmeans, flextable)

#devtools::install_github("daniel1noble/orchaRd", force = TRUE, build_vignettes = TRUE)
library(orchaRd)

orchaRd::i2_ml(MLMA)
     I2_Total I2_species_ID   I2_study_ID      I2_es_ID 
        98.42          2.98         58.48         36.96 

Interpreting \(I^2\) Estimates

Interpret the meaning of \(I_{Total}^2\) from the multilevel meta-analytic model

Overall, we have highly heterogeneous effect size data: sampling variation contributes only 1.58% of the total variation in effects (i.e., \(I_{total}^2\) = 98.42%).

Interpret the meaning of \(I_{study}^2\) from the multilevel meta-analytic model

From the multilevel meta-analytic model we find that 58.481% of the total variation in effect size estimates is the result of differences between studies.


Bootstrapping \(I^2\) Estimates

There are also times that we may want to estimate, and present, uncertainty about these heterogeneity estimates. We can do that by bootstrapping 1000 times:

Code
# The orchaRd package has some convenient functions for calculating various I2 estimates including total. We'll load and install that package
#install.packages("pacman")
pacman::p_load(devtools, tidyverse, metafor, patchwork, R.rsp, emmeans, flextable)

#devtools::install_github("daniel1noble/orchaRd", force = TRUE, build_vignettes = TRUE)
library(orchaRd)

orchaRd::i2_ml(MLMA, boot = 1000)
               Est. 2.5% 97.5%
I2_Total      98.35 97.1  99.0
I2_species_ID  1.86  0.0  27.7
I2_study_ID   55.67 15.9  76.9
I2_es_ID      37.81 20.0  66.0

Here, we can now see that \(I_{total}^2\) has a fairly narrow 95% confidence interval of ~97% to 99%.

Beyond \(I^2\): Mean-Standardised Heterogeneity (\(CVH_2\) and \(M_2\))

\(I^2\) is a variance-partitioning metric: it tells you what fraction of total variance is genuine heterogeneity, not sampling error. That is useful, but it says nothing about whether the heterogeneity is large or small relative to the mean effect. Consider two meta-analyses that both have \(I^2_{total} = 0.90\). If one has a mean ARR of 0.001 °C/°C, the absolute spread of effects is biologically negligible; if the other has a mean of 1 °C/°C, the same 90% translates into enormous between-study variation. Yang et al. (2025) propose two mean-standardised alternatives that capture this information:

\[ CVH_2 = \frac{\sum \sigma^2}{\hat{\mu}^2} \tag{7.3}\]

\[ M_2 = \frac{\sum \sigma^2}{\hat{\mu}^2 + \sum \sigma^2} \tag{7.4}\]

where \(\hat{\mu}\) is the meta-analytic mean and \(\sum \sigma^2\) is the sum of all random-effect variance components (excluding sampling variance \(\sigma^2_m\)). \(CVH_2\) (Equation 7.3) is analogous to a squared coefficient of variation: values \(> 1\) indicate that heterogeneity variance exceeds the mean effect squared. \(M_2\) (Equation 7.4) is bounded \([0, 1]\) and can be read as the proportion of total ‘signal’ (mean\(^2\) + variance) that is heterogeneity. Yang et al. (2025) recommend reporting both \(CVH_2\) and \(M_2\) alongside \(I^2\) for a more complete picture. These metrics are implemented in orchaRd:

Code
orchaRd::cvh2_ml(MLMA)   # Mean-standardised heterogeneity (variance scale)
     CVH2_Total CVH2_species_ID   CVH2_study_ID      CVH2_es_ID 
         0.9317          0.0282          0.5536          0.3499 
Code
orchaRd::m2_ml(MLMA)     # Heterogeneity as fraction of total signal
     M2_Total M2_species_ID   M2_study_ID      M2_es_ID 
       0.4823        0.0146        0.2866        0.1811 
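These values can be verified directly against Equations 7.3 and 7.4 using the components of the fitted model. A minimal sketch (using the `MLMA` object from above; small rounding differences against the printed `orchaRd` totals are expected):

```r
# Manual check of CVH2 and M2 (Equations 7.3 and 7.4)
mu_hat    <- as.numeric(MLMA$b)   # meta-analytic mean (the model intercept)
sigma2_re <- sum(MLMA$sigma2)     # sum of random-effect variances
                                  # (species + study + residual; excludes sampling variance)

CVH2 <- sigma2_re / mu_hat^2                  # heterogeneity variance relative to mean^2
M2   <- sigma2_re / (mu_hat^2 + sigma2_re)    # heterogeneity as a fraction of total 'signal'

round(c(CVH2 = CVH2, M2 = M2), 4)  # should closely match CVH2_Total and M2_Total above
```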

Interpreting \(CVH_2\) and \(M_2\)

Given the values of \(CVH_2\) and \(M_2\) above, what do they tell you about ARR heterogeneity that \(I^2_{total}\) does not?

\(I^2_{total}\) tells us that almost all of the variance in ARR estimates is real heterogeneity rather than sampling noise. But it does not tell us whether that heterogeneity is large or small on a biologically meaningful scale. \(CVH_2\) answers that: here \(CVH_2 \approx 0.93\), meaning the heterogeneity variance is roughly as large as the squared mean effect — the spread of true effects is comparable in magnitude to the average effect itself. Consistently, \(M_2 \approx 0.48\): about half of the total 'signal' (mean\(^2\) + variance) is heterogeneity. An \(M_2\) near 1 would mean variation dominates the signal; near 0, that the mean is large and stable relative to the scatter. Together these metrics clarify whether a high \(I^2\) reflects a lot of noise around a strong signal or a weak signal swamped by variation.


Prediction Intervals

Prediction intervals (PI) are probably the best and most intuitive way to report heterogeneity of meta-analytic results (Borenstein, 2019; Nakagawa et al., 2021; Nakagawa et al., 2023; Noble et al., 2022). Prediction intervals tell us how much we can expect a given effect size to vary across studies. More specifically, if we were to conduct another study or experiment, they tell us the range within which the new effect size estimate is expected to fall 95% of the time (Borenstein, 2019; Nakagawa et al., 2021; Noble et al., 2022).

Prediction intervals can be calculated in a similar way to confidence intervals but instead of just using the standard error, like we do with confidence interval construction, we add in the extra random-effect variance estimates. For our model above, the total PI — spanning all sources of variance — is:

\[ PI_{total} \sim \bar{\mu} \pm 1.96 \sqrt{SE^2 + \sigma^2_{study} + \sigma^2_{species} + \sigma^2_{residual}} \tag{7.5}\]

However, Yang et al. (2024) highlight that the total PI conflates two distinct questions: (1) how variable are individual effect size estimates across all studies (which includes within-study residual variance, \(\sigma^2_{residual}\)), and (2) how variable are study-level means across different biological contexts (which excludes within-study residual variance). The latter — called the study-level PI — is often the more relevant question for biological generality, because it captures how much we expect the average result to shift if we replicated the study in a new species or population context:

\[ PI_{study} \sim \bar{\mu} \pm 1.96 \sqrt{SE^2 + \sigma^2_{study} + \sigma^2_{species}} \tag{7.6}\]

Yang et al. (2024) found that across 512 ecological and evolutionary meta-analyses, most of the wide total PI came from within-study (residual) variance, and that study-level generality was far more common than the total PI implied.

We can compute both types easily:

Code
# Total PI — all variance components (predict() uses all sigma2 by default)
predict(MLMA)

   pred     se  ci.lb  ci.ub   pi.lb  pi.ub 
 0.1668 0.0316 0.1010 0.2327 -0.1755 0.5091 
Code
# Study-level PI — between-study variance only (species_ID + study_ID, excluding es_ID)
# sigma2 order matches the random= list: [species_ID, study_ID, es_ID]
vc <- MLMA$sigma2
pi_study_se <- sqrt(sum(vc[1:2]) + MLMA$se^2)
cat("Study-level 95% PI: [",
    round(MLMA$b - 1.96 * pi_study_se, 3), ",",
    round(MLMA$b + 1.96 * pi_study_se, 3), "]\n")
Study-level 95% PI: [ -0.09 , 0.424 ]
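As a cross-check, the total PI reported by `predict()` can be reconstructed by hand from Equation 7.5. One subtlety: because the model was fitted with `test = "t"` and `dfs = "contain"`, metafor uses a t critical value (with df = 20, as shown in the model output) rather than 1.96, so a hand calculation with `qt()` reproduces the `pi.lb`/`pi.ub` values more closely:

```r
# Reconstruct the total 95% prediction interval (Equation 7.5) by hand
mu_hat <- as.numeric(MLMA$b)
t_crit <- qt(0.975, df = 20)                   # 'contain' df from the model output
pi_se  <- sqrt(MLMA$se^2 + sum(MLMA$sigma2))   # SE^2 + all random-effect variances

round(c(pi.lb = mu_hat - t_crit * pi_se,
        pi.ub = mu_hat + t_crit * pi_se), 3)   # should closely match predict(MLMA)
```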

Interpreting Prediction Intervals

The meta-analytic mean ARR is 0.167. Compare the total PI to the study-level PI. What does each tell you, and why might one be more useful for assessing biological generality?

The total 95% PI is wide — from -0.175 to 0.509 — suggesting a lot of inconsistency across individual effect size estimates. But much of that width comes from within-study residual variance (variation among effect sizes within a single study). The study-level PI, which retains only between-study variance (species and study components), is narrower and tells us how much the average result is expected to shift if we repeated the study in a new biological context. For biological generality, the study-level PI is the more relevant question: if most replications in new species or populations land near the mean, the phenomenon generalises well even if individual effect sizes are noisy.


Q-tests: Inferential Tests to Determine Excess Heterogeneity

So far we've discussed heterogeneity statistics that describe and quantify the proportion of variability (\(I^2\)) or the expected range of plausible effect size values incorporating all the variance estimates (PI). But how do we know that the heterogeneity in the data is greater than what we would expect by chance? If \(I_{total}^2\) is very high (e.g., >85%) there is clearly a lot of variability in effects within and across studies. While these descriptive statistics are often sufficient, Q-tests can be used to make a more formal inferential statement about whether heterogeneity is significantly greater than chance.

We have already encountered how to calculate the Q-statistic when discussing fixed and random effects models. We can now compare the Q statistic against a \(\chi^2\)-distribution to calculate a p-value. Wolfgang Viechtbauer's documentation for metafor (?rma.mv) describes what the Q-test means:

Cochran's Q-test, which tests whether the variability in the observed effect sizes or outcomes is larger than one would expect based on sampling variability (and the given covariances among the sampling errors) alone. A significant test suggests that the true effects/outcomes are heterogeneous.


We don’t need to do anything special to get this test. In fact, this is automatically done by metafor:

Code
print(MLMA)

Multivariate Meta-Analysis Model (k = 123; method: REML)

Variance Components:

            estim    sqrt  nlvls  fixed      factor 
sigma^2.1  0.0008  0.0280     29     no  species_ID 
sigma^2.2  0.0154  0.1241     21     no    study_ID 
sigma^2.3  0.0097  0.0987    123     no       es_ID 

Test for Heterogeneity:
Q(df = 122) = 3941.0055, p-val < .0001

Model Results:

estimate      se    tval  df    pval   ci.lb   ci.ub      
  0.1668  0.0316  5.2857  20  <.0001  0.1010  0.2327  *** 

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
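If you want the Q statistic programmatically (e.g., for a results table), it can be extracted from the fitted object and its p-value recomputed against the \(\chi^2\) reference distribution. A sketch using the `QE` slot of the `rma.mv` object:

```r
# Extract Cochran's Q from the fitted model and recompute its p-value
# against a chi-squared distribution with k - p = 123 - 1 = 122 df
Q  <- MLMA$QE
df <- MLMA$k - MLMA$p   # number of effect sizes minus number of fixed effects
p  <- pchisq(Q, df = df, lower.tail = FALSE)
c(Q = round(Q, 2), df = df)   # matches the 'Test for Heterogeneity' line above
```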

Describing the Q-test

How would you describe in a paper the Q-test from the metafor model above?

Of course, there are a multitude of ways that we could describe this test in words, but one way to describe it in your results section might be as follows:


The overall mean ARR across ectotherms was 0.167 and differed significantly from zero (95% CI: 0.101 to 0.233, df = 20, t = 5.286, p < 0.001). However, there was significant residual heterogeneity (Q = 3941.005, df = 122, p < 0.001) with estimates expected to range from -0.175 to 0.509 (95% Prediction Interval) (\(I_{total}^2\) = 98.419).


Proportion of Variation Explained by the Model: \(R^2_{marginal}\) and \(R^2_{conditional}\)

You will probably have noticed that \(I^2\) looks awfully familiar. It looks a lot like the way we might calculate \(R^2\) as described by Nakagawa and Schielzeth (2013). Well, you would be correct! Often we also want to quantify how much variation is explained by the moderators and random effects in our multilevel meta-regression model (Nakagawa and Schielzeth, 2013; Nakagawa et al., 2017b). This is done using \(R^2\). We can calculate different \(R^2\) values depending on whether we ignore or include random effects in the numerator. If only the variation explained by moderators is included in the numerator then we call this \(R_{marginal}^2\), which is defined as:

\[ R^2_{marginal} = \frac{\sigma^2_{fixed}}{\sigma^2_{fixed} + \sigma^2_{study} + \sigma^2_{phylogeny} + \sigma^2_{species} + \sigma^2_{residual}} \tag{7.7}\]

Note that this formula does not include \(\sigma^2_{m}\) as sampling error variance is assumed to be known in meta-analysis. We won’t go into the details on how to calculate and interpret these because we’ll cover this in a later tutorial on publication bias. Suffice it to say that this is a useful statistic for describing how much variation your moderators or model explain.
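Equation 7.7 can be sketched directly from the fitted model. Note that \(\sigma^2_{fixed}\) is the variance of the fixed-effect predictions (the Nakagawa–Schielzeth approach), so for our intercept-only model it is exactly zero; the commented line shows where orchaRd's `r2_ml()` would automate this for a model with moderators:

```r
# Marginal R2 (Equation 7.7) by hand: variance explained by the fixed effects
# relative to fixed + random-effect variance (sampling variance excluded).
sigma2_fixed <- var(as.vector(model.matrix(MLMA) %*% MLMA$b))  # 0 here: intercept only
R2_marginal  <- sigma2_fixed / (sigma2_fixed + sum(MLMA$sigma2))
R2_marginal   # 0 for the intercept-only model; > 0 once moderators are added

# For a meta-regression with moderators, orchaRd automates this:
# orchaRd::r2_ml(model_with_moderators)
```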

Conclusion

Hopefully it's clear from this tutorial why explicitly estimating and reporting heterogeneity is so critically important in meta-analysis. Of course, some of these measures can only be calculated when the sampling variances are known, which requires fitting a weighted meta-analytic model.

No single metric tells the whole story. Yang et al. (2025) recommend a pluralistic approach to reporting heterogeneity:

  • \(I^2_{total}\) — tells you what fraction of variance is real signal (not sampling noise);
  • \(CVH_2\) and \(M_2\) — tell you how large that heterogeneity is relative to the mean effect, answering the question “does the average matter if effects are so variable?”;
  • Prediction intervals — give the most intuitive summary of heterogeneity as the expected range of a new result. Reporting both the total PI and the study-level PI (Yang et al., 2024) helps distinguish noise within studies from genuine variation across biological contexts;
  • An orchard plot — visualises all of the above at once and should accompany any numerical heterogeneity summary (Nakagawa et al., 2021; Nakagawa et al., 2023).

Together these metrics give a far richer picture of heterogeneity than \(I^2\) alone.

References

Borenstein, M. (2019). Heterogeneity in meta-analysis. In The Handbook of Research Synthesis and Meta-Analysis (eds. Cooper, H., Hedges, L. V. and Valentine, J. C.), pp. 454–466. New York: Russell Sage Foundation.
Gurevitch, J., Koricheva, J., Nakagawa, S. and Stewart, G. (2018). Meta-analysis and the science of research synthesis. Nature 555, 176–182.
Higgins, J. P. T. and Thompson, S. G. (2002). Quantifying heterogeneity in a meta-analysis. Statistics in Medicine 21, 1539–1558.
Higgins, J. P. T., Thompson, S. G., Deeks, J. J. and Altman, D. G. (2003). Measuring inconsistency in meta-analyses. British Medical Journal 327, 557–560.
Lajeunesse, M. J. (2010). Achieving synthesis with meta-analysis by combining and comparing all available studies. Ecology 91, 2561–2564.
Nakagawa, S. and Santos, E. S. (2012). Methodological issues and advances in biological meta-analysis. Evolutionary Ecology 26, 1253–1274.
Nakagawa, S. and Schielzeth, H. (2013). A general and simple method for obtaining R2 from generalized linear mixed‐effects models. Methods in Ecology and Evolution 4, 133–142.
Nakagawa, S., Noble, D. W. A., Senior, A. M. and Lagisz, M. (2017a). Meta-evaluation of meta-analysis: Ten appraisal questions for biologists. BMC Biology 15, 18, doi: 10.1186/s12915-017-0357-7.
Nakagawa, S., Johnson, P. C. and Schielzeth, H. (2017b). The coefficient of determination R2 and intra-class correlation coefficient from generalized linear mixed-effects models revisited and expanded. Journal of the Royal Society Interface 14, 20170213.
Nakagawa, S., Lagisz, M., O’Dea, R. E., Rutkowska, J., Yang, Y., Noble, D. W. A. and Senior, A. M. (2021). The orchard plot: Cultivating forest plots for use in ecology, evolution and beyond. Research Synthesis Methods 12, 4–12.
Nakagawa, S., Lagisz, M., O’Dea, R. E., Rutkowska, J., Yang, Y., Noble, D. W. A. and Senior, A. M. (2023). orchaRd 2.0: An R package for visualising meta-analyses with orchard plots. Methods in Ecology and Evolution 14, 2003–2010.
Noble, D. W. A., Pottier, P., Lagisz, M., Burke, S., Drobniak, S. M., O’Dea, R. E. and Nakagawa, S. (2022). Meta-analytic approaches and effect sizes to account for “nuisance heterogeneity” in comparative physiology. Journal of Experimental Biology 225, jeb243225.
O’Dea, R. E., Lagisz, M., Jennions, M. D., Koricheva, J., Noble, D. W. A., Parker, T. H., Gurevitch, J., Page, M. J., Stewart, G., Moher, D., et al. (2021). Preferred reporting items for systematic reviews and meta-analyses in ecology and evolutionary biology: A PRISMA extension. Biological Reviews, doi: 10.1111/brv.12721.
Pottier, P., Burke, S., Drobniak, S. M., Lagisz, M. and Nakagawa, S. (2021). Sexual (in)equality? A meta-analysis of sex differences in thermal acclimation capacity across ectotherms. Functional Ecology 35, 2663–2678, doi: 10.1111/1365-2435.13899.
Senior, A. M., Grueber, C. E., Kamiya, T., Lagisz, M., O’dwyer, K., Santos, E. S. A. and Nakagawa, S. (2016). Heterogeneity in ecological and evolutionary meta‐analyses: Its magnitude and implications. Ecology 97, 3293–3299.
Yang, Y., Noble, D. W. A., Senior, A. M., Lagisz, M. and Nakagawa, S. (2024). Interpreting prediction intervals and distributions for decoding biological generality in meta-analyses. bioRxiv.
Yang, Y., Noble, D. W. A., Spake, R., Senior, A. M., Lagisz, M. and Nakagawa, S. (2025). A pluralistic framework for measuring, interpreting and decomposing heterogeneity in meta-analysis. Methods in Ecology and Evolution 16, 2710–2725.


Session Information

R version 4.5.3 (2026-03-11)

Platform: aarch64-apple-darwin20

attached base packages: stats, graphics, grDevices, utils, datasets, methods and base

other attached packages: emmeans(v.2.0.3), R.rsp(v.0.46.0), patchwork(v.1.3.1), devtools(v.2.4.5), usethis(v.3.2.1), magick(v.2.9.1), equatags(v.0.2.1), mathjaxr(v.2.0-0), pander(v.0.6.6), orchaRd(v.2.2.0), lubridate(v.1.9.5), forcats(v.1.0.1), stringr(v.1.6.0), dplyr(v.1.2.1), purrr(v.1.2.2), readr(v.2.2.0), tidyr(v.1.3.2), tibble(v.3.3.1), ggplot2(v.4.0.3), tidyverse(v.2.0.0), flextable(v.0.9.9), metafor(v.4.8-0), numDeriv(v.2016.8-1.1), metadat(v.1.4-0) and Matrix(v.1.7-4)

loaded via a namespace (and not attached): remotes(v.2.5.0), sandwich(v.3.1-1), rlang(v.1.2.0), magrittr(v.2.0.5), multcomp(v.1.4-28), otel(v.0.2.0), compiler(v.4.5.3), systemfonts(v.1.3.2), vctrs(v.0.7.3), profvis(v.0.4.0), pkgconfig(v.2.0.3), fastmap(v.1.2.0), ellipsis(v.0.3.3), labeling(v.0.4.3), promises(v.1.5.0), rmarkdown(v.2.31), sessioninfo(v.1.2.3), tzdb(v.0.5.0), ggbeeswarm(v.0.7.3), ragg(v.1.5.2), xfun(v.0.57), cachem(v.1.1.0), jsonlite(v.2.0.0), later(v.1.4.8), uuid(v.1.2-2), R6(v.2.6.1), stringi(v.1.8.7), RColorBrewer(v.1.1-3), pkgload(v.1.5.1), estimability(v.1.5.1), Rcpp(v.1.1.1-1.1), knitr(v.1.51), zoo(v.1.8-14), pacman(v.0.5.1), R.utils(v.2.13.0), splines(v.4.5.3), httpuv(v.1.6.16), R.cache(v.0.17.0), timechange(v.0.4.0), tidyselect(v.1.2.1), yaml(v.2.3.12), codetools(v.0.2-20), miniUI(v.0.1.2), curl(v.7.1.0), pkgbuild(v.1.4.8), lattice(v.0.22-9), shiny(v.1.13.0), withr(v.3.0.2), S7(v.0.2.2), askpass(v.1.2.1), coda(v.0.19-4.1), evaluate(v.1.0.5), survival(v.3.8-6), urlchecker(v.1.0.1), zip(v.2.3.3), xml2(v.1.5.2), pillar(v.1.11.1), generics(v.0.1.4), hms(v.1.1.4), scales(v.1.4.0), xtable(v.1.8-8), glue(v.1.8.1), gdtools(v.0.4.2), tools(v.4.5.3), data.table(v.1.18.2.1), fs(v.2.1.0), mvtnorm(v.1.3-7), grid(v.4.5.3), nlme(v.3.1-168), beeswarm(v.0.4.0), vipor(v.0.4.7), latex2exp(v.0.9.8), cli(v.3.6.6), textshaping(v.1.0.5), officer(v.0.6.10), fontBitstreamVera(v.0.1.1), V8(v.6.0.4), katex(v.1.5.0), gtable(v.0.3.6), R.methodsS3(v.1.8.2), digest(v.0.6.39), fontquiver(v.0.2.1), TH.data(v.1.1-3), xslt(v.1.5.1), htmlwidgets(v.1.6.4), farver(v.2.1.2), memoise(v.2.0.1), htmltools(v.0.5.9), R.oo(v.1.27.1), lifecycle(v.1.0.5), mime(v.0.13), MASS(v.7.3-65), fontLiberation(v.0.1.0) and openssl(v.2.4.0)