Introduction

In this tutorial, we demonstrate how meta-analysts can implement approaches for correcting inflated Type I error rates when using multilevel models with non-independent effect size data. For ease of presentation, we use simulated data to walk through a few easy-to-implement solutions.

R Packages Required

First, we’ll load some of the packages that we’ll need.

# Clean workspace
rm(list = ls())

# Loading packages & Functions

# install.packages('pacman')

# Note: to get the latest development version of metafor, use the following code
# remotes::install_github('wviechtb/metafor'); library(metafor)

# Load packages
pacman::p_load(tidyverse, MASS, kableExtra, gridExtra, metafor, MCMCglmm, brms, robumeta, 
    clubSandwich, pander)

Simulating Non-independent Effect Size Data

Here, we will simulate some meta-analytic data. We will keep this very simple, just for demonstration purposes. We assume that we have collected data from a total of 30 studies and were able to extract n = 5 effect sizes from each of these studies. In total, we have a data set that contains n = 150 effect sizes.

# Simulate a dataset composed of 30 papers, each having 5 effects from the same
# study.

set.seed(123)

# Parameters
no.paper = 30  # Number of unique papers
n.effects = 5  # Number of effects per paper. We will keep this simple and balanced
rho.e = 0.8  # Correlation among sampling errors within studies
study_id = rep(1:no.paper, each = n.effects)  # Study IDs
var_paper = 0.5  # Between-study variance (i.e., tau2)
var_effect = 0.2  # Effect size (or within-study) variance
mu = 0.4  # Average, or overall, effect size
rho = c(0.2, 0.4, 0.6, 0.8)  # Candidate within-study correlations among effect sizes

# Add sampling variance. First, sample a sample size for each paper
n.sample <- rnbinom(no.paper, mu = 40, size = 0.5) + 4

# Assume effect sizes are logRR, whose sampling variance can be approximated
# from the coefficient of variation (CV): vi ~ CV^2/n1 + CV^2/n2, i.e., 2*CV^2/n here
cv <- sample(seq(0.3, 0.35, by = 0.001), no.paper, replace = TRUE)

# Assuming sample size is the same for all effects within study (pretty sensible)
rep_sample <- rep(n.sample, each = n.effects)

# Sampling variances
vi <- (2 * cv^2)/rep_sample

# Create the sampling variance matrix, which accounts for the correlation between
# effect sizes within study
sigma_vi <- as.matrix(Matrix::bdiag(impute_covariance_matrix(vi = vi, cluster = study_id, 
    r = rho.e)))

# Now create the effect-size-level covariance matrix. This would simulate a
# situation where, for example, you have effect sizes on reading and maths that
# are clearly correlated, but to varying extents across studies

# Create list of matrices, 1 for each study
matrices <- list()
for (i in 1:no.paper) {
    r <- sample(rho, size = 1, replace = TRUE)
    tmp <- matrix(r, nrow = n.effects, ncol = n.effects)
    diag(tmp) <- 1
    matrices[[i]] <- tmp
}

cor_yi <- as.matrix(Matrix::bdiag(matrices))

# Convert the correlation matrix to a covariance matrix; all effects share the
# same within-study variance (var_effect)
cov_yi <- cor_yi * sqrt(var_effect) * sqrt(var_effect)

# Derive the effect sizes: overall mean + between-paper effects + correlated
# within-study effects + correlated sampling errors
yi <- mu + rep(rnorm(no.paper, 0, sqrt(var_paper)), each = n.effects) + mvrnorm(n = 1, 
    mu = rep(0, n.effects * no.paper), cov_yi) + mvrnorm(n = 1, mu = rep(0, n.effects * 
    no.paper), sigma_vi)

# Create the data. We'll just assume that meta-analysts have already derived
# their effect size and sampling variance
data <- data.frame(study_id = study_id, yi = yi, vi = vi, obs = c(1:(n.effects * 
    no.paper)))
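
A quick look at the first few rows confirms the structure of the data: a study identifier, an effect size (yi), its sampling variance (vi), and a unique observation-level identifier (obs) for the within-study random effect.

# Peek at the simulated data
head(data)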

Now that we have our simulated (i.e., toy) data, we can demonstrate a few corrections that can be applied to multilevel meta-analytic models to protect against inflated Type I error rates.

Fit the Multi-level Meta-analytic (MLMA) Model

First, let's fit our multilevel meta-analytic (MLMA) model. We can do that using our simulated data as follows:

mod_multilevel = metafor::rma.mv(yi = yi, V = vi, mods = ~1, random = list(~1 | study_id, 
    ~1 | obs), data = data, test = "t")
summary(mod_multilevel)
## 
## Multivariate Meta-Analysis Model (k = 150; method: REML)
## 
##   logLik  Deviance       AIC       BIC      AICc 
## -87.3457  174.6914  180.6914  189.7033  180.8569   
## 
## Variance Components:
## 
##             estim    sqrt  nlvls  fixed    factor 
## sigma^2.1  0.8567  0.9256     30     no  study_id 
## sigma^2.2  0.0779  0.2791    150     no       obs 
## 
## Test for Heterogeneity:
## Q(df = 149) = 18903.9961, p-val < .0001
## 
## Model Results:
## 
## estimate      se    tval   df    pval    ci.lb   ci.ub 
##   0.2143  0.1708  1.2551  149  0.2114  -0.1231  0.5518    
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We have fit a simple MLMA model that estimates the overall meta-analytic mean. The true mean is 0.4, and our estimate (0.21), while somewhat lower, is within sampling error of that value; the 95% confidence interval (-0.12 to 0.55) comfortably covers it. Our random-effect variance estimates are likewise in the right ballpark for a single simulated data set. However, we know that this MLMA model ignores the dependence arising from shared sampling variance among effect sizes, and we expect this 'should' (at least on average) inflate Type I error rates. Even if we did not know the true parameters, we would want to account for this dependence. Below, we describe a few corrections that meta-analysts can apply to overcome the problems associated with ignoring effect size dependence.
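
As a quick optional sanity check (a sketch using standard metafor accessors; the names on the left are ours), we can compare the model's point estimates directly against the true simulation parameters:

# Compare estimates against the true values (mu = 0.4, var_paper = 0.5,
# var_effect = 0.2)
round(c(mu_hat = unname(coef(mod_multilevel)), tau2_hat = mod_multilevel$sigma2[1], 
    within_hat = mod_multilevel$sigma2[2]), 3)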

Before moving on to some useful corrections, users should be aware that the most up-to-date version of metafor (version 2.5-101) now provides some protection against Type I errors. Instead of basing the degrees of freedom on the number of effect sizes, we can base them on the total number of papers. We show in our simulations that a "papers - 1" degrees of freedom performs fairly well. This can be implemented as follows after installing the development version of metafor (see "R Packages Required" above):

mod_multilevel_pdf = rma.mv(yi = yi, V = vi, mods = ~1, random = list(~1 | study_id, 
    ~1 | obs), data = data, test = "t", dfs = "contain")
summary(mod_multilevel_pdf)
## 
## Multivariate Meta-Analysis Model (k = 150; method: REML)
## 
##   logLik  Deviance       AIC       BIC      AICc 
## -87.3457  174.6914  180.6914  189.7033  180.8569   
## 
## Variance Components:
## 
##             estim    sqrt  nlvls  fixed    factor 
## sigma^2.1  0.8567  0.9256     30     no  study_id 
## sigma^2.2  0.0779  0.2791    150     no       obs 
## 
## Test for Heterogeneity:
## Q(df = 149) = 18903.9961, p-val < .0001
## 
## Model Results:
## 
## estimate      se    tval  df    pval    ci.lb   ci.ub 
##   0.2143  0.1708  1.2551  29  0.2195  -0.1349  0.5636    
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We can see here that the df for the model has changed from 149 to 29, and the p-value has been adjusted accordingly.

Correction using Robust Variance Estimator (RVE) with Satterthwaite Degrees of Freedom Correction

A very simple solution is to make use of robust variance estimation (RVE). This can be done in a few packages, but very easily using the clubSandwich package (Hedges et al. 2010, Tipton and Pustejovsky 2015, Pustejovsky 2020), which works elegantly with metafor (Viechtbauer 2010) model objects and applies a Satterthwaite degrees of freedom correction (Satterthwaite 1946, Tipton and Pustejovsky 2015). The benefit of this approach is that we need not make any assumptions about what the correlation between effect sizes actually is (assuming we didn't know the true correlation) (Hedges et al. 2010, Tipton and Pustejovsky 2015). It also accounts for possible heteroscedasticity. The correction can be applied to the MLMA model we fit in the above section as follows:

mod_RVE <- coef_test(mod_multilevel, vcov = "CR2", cluster = data$study_id)
print(mod_RVE)
##     Coef. Estimate    SE t-stat d.f. p-val (Satt) Sig.
## 1 intrcpt    0.214 0.171   1.26   29        0.219

With this simple (and well-balanced) data set, the RVE approach doesn't change the results much.
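
As noted above, RVE is implemented in a few packages. For comparison, here is a minimal sketch of the same intercept-only model fitted directly in robumeta (loaded earlier), using the correlated-effects weights of Hedges et al. (2010); rho is the assumed within-study correlation, and small = TRUE requests the small-sample correction:

# RVE with correlated-effects weights in robumeta
mod_robu <- robumeta::robu(yi ~ 1, data = data, studynum = study_id, var.eff.size = vi, 
    modelweights = "CORR", rho = 0.8, small = TRUE)
print(mod_robu)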

Correction by Applying Bayesian Multi-level Meta-analytic Model

As we describe in our comment, Bayesian approaches, assuming one has a good sample size, do a very good job of correcting for inflated Type I errors across a variety of situations. Bayesian MLMA models can be fit in various packages; probably the most flexible for meta-analysts are MCMCglmm (Hadfield 2010) and brms (Bürkner 2017, 2018). Here, we demonstrate how to fit the same MLMA model using MCMCglmm, whose syntax differs from that used in packages such as metafor and lme4 (Bates et al. 2015), which meta-analysts might be more accustomed to.

# Priors: a weakly informative inverse-Wishart prior on the within-study
# (residual) variance and a parameter-expanded prior on the between-study variance
prior <- list(R = list(V = 1, nu = 0.002), G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, 
    alpha.V = 25^2)))

bayes_multilevel <- MCMCglmm(yi ~ 1, mev = data$vi, random = ~study_id, data = data, 
    prior = prior, verbose = FALSE)
summary(bayes_multilevel)
## 
##  Iterations = 3001:12991
##  Thinning interval  = 10
##  Sample size  = 1000 
## 
##  DIC: 90 
## 
##  G-structure:  ~study_id
## 
##          post.mean l-95% CI u-95% CI eff.samp
## study_id      1.01    0.465     1.66      186
## 
##  R-structure:  ~units
## 
##       post.mean l-95% CI u-95% CI eff.samp
## units    0.0803   0.0583    0.102     1000
## 
##  Location effects: yi ~ 1 
## 
##             post.mean l-95% CI u-95% CI eff.samp pMCMC
## (Intercept)     0.213   -0.205    0.550     1106  0.24

As expected, our credible intervals get a little bit wider.
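
For those who prefer brms, the same MLMA model can be written using its se() syntax. Below is a sketch only (not run here; it compiles and samples a Stan model, so it is slower). Note that se() expects the sampling standard error, hence sqrt(vi), and the observation-level term (1 | obs) plays the role of metafor's ~1 | obs:

# Sketch of the equivalent Bayesian MLMA model in brms (not run)
data$sei <- sqrt(data$vi)  # sampling standard errors
bayes_brms <- brms::brm(yi | se(sei) ~ 1 + (1 | study_id) + (1 | obs), data = data, 
    chains = 4, iter = 4000, warmup = 1000)
summary(bayes_brms)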

Correction by Modeling the Entire Sampling Covariance Matrix

Of course, we can also take the approach proposed by Noble et al. (2017), where we model the sampling covariance matrix directly by simply assuming that effect sizes from the same study are correlated at r = 0.5. One could change this correlation depending on the situation and context, but r = 0.5 will probably suffice in many situations. This assumes, however, that the degree of correlation among effect sizes within a study is the same across studies, an assumption that is relaxed in the RVE approaches described above. We can also guard against violations of this assumption by combining the modeled covariance matrix with a clubSandwich estimator. We can build the matrix and implement this approach as follows:

vcv <- impute_covariance_matrix(vi = data$vi, cluster = data$study_id, r = 0.5)

mod_multilevel_vcv <- metafor::rma.mv(yi = yi, V = vcv, mods = ~1, random = list(~1 | 
    study_id, ~1 | obs), data = data, test = "t")

mod_multilevel_vcv <- coef_test(mod_multilevel_vcv, vcov = "CR2", cluster = data$study_id)
print(mod_multilevel_vcv)
##     Coef. Estimate   SE t-stat d.f. p-val (Satt) Sig.
## 1 intrcpt    0.212 0.17   1.24   29        0.223
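
To see exactly what we assumed, we can inspect the imputed sampling covariance block for a single study; the off-diagonal elements are r * sqrt(vi_i * vi_j) with r = 0.5. Here impute_covariance_matrix() returns a list with one block per study (as in the simulation code above); if your clubSandwich version returns a single block-diagonal matrix instead, use vcv[1:5, 1:5].

# The assumed 5 x 5 sampling covariance block for the first study
round(vcv[[1]], 4)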

Conclusions

The goal of our short tutorial was to dispel the idea that implementing solutions to deal with non-independent effect sizes in multilevel meta-analytic models is challenging. Our simulations (Nakagawa et al. 2021) show that there are a number of very easily implemented solutions. As such, meta-analysts can harness the power of MLMA models without the need to average effect sizes, as suggested by Song et al. (2020).

Session Info

R version 4.0.5 (2021-03-31)

Platform: x86_64-apple-darwin17.0 (64-bit)

locale: en_AU.UTF-8||en_AU.UTF-8||en_AU.UTF-8||C||en_AU.UTF-8||en_AU.UTF-8

attached base packages: stats, graphics, grDevices, utils, datasets, methods and base

other attached packages: pander(v.0.6.3), clubSandwich(v.0.5.3), robumeta(v.2.0), brms(v.2.15.0), Rcpp(v.1.0.6), MCMCglmm(v.2.32), ape(v.5.4-1), coda(v.0.19-4), metafor(v.2.5-101), Matrix(v.1.3-2), gridExtra(v.2.3), kableExtra(v.1.3.4), MASS(v.7.3-53.1), forcats(v.0.5.1), stringr(v.1.4.0), dplyr(v.1.0.5), purrr(v.0.3.4), readr(v.1.4.0), tidyr(v.1.1.3), tibble(v.3.1.0), ggplot2(v.3.3.3) and tidyverse(v.1.3.0)

loaded via a namespace (and not attached): readxl(v.1.3.1), backports(v.1.2.1), systemfonts(v.1.0.1), plyr(v.1.8.6), igraph(v.1.2.6), splines(v.4.0.5), crosstalk(v.1.1.1), TH.data(v.1.0-10), rstantools(v.2.1.1), inline(v.0.3.17), digest(v.0.6.27), htmltools(v.0.5.1.1), rsconnect(v.0.8.16), fansi(v.0.4.2), magrittr(v.2.0.1), remotes(v.2.3.0), modelr(v.0.1.8), RcppParallel(v.5.0.3), matrixStats(v.0.58.0), sandwich(v.3.0-0), xts(v.0.12.1), svglite(v.2.0.0), prettyunits(v.1.1.1), colorspace(v.2.0-0), rvest(v.1.0.0), haven(v.2.3.1), xfun(v.0.22), callr(v.3.6.0), crayon(v.1.4.1), jsonlite(v.1.7.2), lme4(v.1.1-26), survival(v.3.2-10), zoo(v.1.8-9), glue(v.1.4.2), gtable(v.0.3.0), emmeans(v.1.5.5-1), webshot(v.0.5.2), V8(v.3.4.0), pkgbuild(v.1.2.0), rstan(v.2.21.1), abind(v.1.4-5), scales(v.1.1.1), mvtnorm(v.1.1-1), DBI(v.1.1.1), miniUI(v.0.1.1.1), viridisLite(v.0.4.0), xtable(v.1.8-4), klippy(v.0.0.0.9500), StanHeaders(v.2.21.0-7), stats4(v.4.0.5), DT(v.0.18), htmlwidgets(v.1.5.3), httr(v.1.4.2), threejs(v.0.3.3), ellipsis(v.0.3.1), pkgconfig(v.2.0.3), loo(v.2.4.1), sass(v.0.3.1), dbplyr(v.2.1.1), utf8(v.1.2.1), tidyselect(v.1.1.0), rlang(v.0.4.10), reshape2(v.1.4.4), later(v.1.1.0.1), munsell(v.0.5.0), cellranger(v.1.1.0), tools(v.4.0.5), cli(v.2.4.0), generics(v.0.1.0), pacman(v.0.5.1), broom(v.0.7.6), mathjaxr(v.1.4-0), ggridges(v.0.5.3), evaluate(v.0.14), fastmap(v.1.1.0), yaml(v.2.2.1), processx(v.3.5.1), knitr(v.1.31), fs(v.1.5.0), nlme(v.3.1-152), mime(v.0.10), projpred(v.2.0.2), formatR(v.1.9), xml2(v.1.3.2), compiler(v.4.0.5), bayesplot(v.1.8.0), shinythemes(v.1.2.0), rstudioapi(v.0.13), curl(v.4.3), gamm4(v.0.2-6), reprex(v.2.0.0), statmod(v.1.4.35), bslib(v.0.2.4), stringi(v.1.5.3), ps(v.1.6.0), Brobdingnag(v.1.2-6), cubature(v.2.0.4.1), lattice(v.0.20-41), nloptr(v.1.2.2.2), markdown(v.1.1), shinyjs(v.2.0.0), tensorA(v.0.36.2), vctrs(v.0.3.7), pillar(v.1.5.1), lifecycle(v.1.0.0), jquerylib(v.0.1.3), bridgesampling(v.1.0-0), estimability(v.1.3), corpcor(v.1.6.9), httpuv(v.1.5.5), R6(v.2.5.0), bookdown(v.0.21), promises(v.1.2.0.1), codetools(v.0.2-18), boot(v.1.3-27), colourpicker(v.1.1.0), gtools(v.3.8.2), assertthat(v.0.2.1), withr(v.2.4.1), shinystan(v.2.5.0), multcomp(v.1.4-16), mgcv(v.1.8-34), parallel(v.4.0.5), hms(v.1.0.0), grid(v.4.0.5), minqa(v.1.2.4), rmarkdown(v.2.7), shiny(v.1.6.0), lubridate(v.1.7.10), base64enc(v.0.1-3) and dygraphs(v.1.1.1.6)

References

Bates, D., M. Maechler, B. Bolker, and S. Walker. 2015. Fitting linear mixed-effects models using lme4. Journal of Statistical Software 67:1–48. doi:10.18637/jss.v067.i01.

Bürkner, P.-C. 2017. brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software 80:1–28. doi:10.18637/jss.v080.i01.

Bürkner, P.-C. 2018. Advanced Bayesian multilevel modeling with the R package brms. The R Journal 10:395–411. doi:10.32614/RJ-2018-017.

Hadfield, J. 2010. MCMC methods for multi-response generalised linear mixed models: The MCMCglmm R package. Journal of Statistical Software 33:1–22.

Hedges, L. V., E. Tipton, and M. C. Johnson. 2010. Robust variance estimation in meta-regression with dependent effect size estimates. Research Synthesis Methods 1:39–65.

Nakagawa, S., A. M. Senior, W. Viechtbauer, and D. W. A. Noble. 2021. An assessment of statistical methods for non-independent data in ecological meta-analyses: Comment. Ecology in press.

Noble, D. W., M. Lagisz, R. E. O'Dea, and S. Nakagawa. 2017. Non-independence and sensitivity analyses in ecological and evolutionary meta-analyses. Molecular Ecology 26:2410–2425.

Pustejovsky, J. 2020. ClubSandwich: Cluster-robust (sandwich) variance estimators with small-sample corrections. R package version 0.5.1. https://CRAN.R-project.org/package=clubSandwich.

Satterthwaite, F. E. 1946. An approximate distribution of estimates of variance components. Biometrics Bulletin 2:110–114.

Song, C., S. D. Peacor, C. W. Osenberg, and J. R. Bence. 2020. An assessment of statistical methods for nonindependent data in ecological meta-analyses. Ecology online: e03184.

Tipton, E., and J. E. Pustejovsky. 2015. Small-sample adjustments for tests of moderators and model fit using robust variance estimation in meta-regression. Journal of Educational and Behavioral Statistics 40:604–634.

Viechtbauer, W. 2010. Conducting meta-analyses in R with the metafor package. Journal of Statistical Software 36:1–48.