Analytical Approaches for Assessment of Publication Bias in Meta-analysis

In the last tutorial we covered one of the most common visual tools for assessing whether publication bias might be present: funnel plots (Jennions et al., 2013; Nakagawa et al., 2022a; Rothstein et al., 2005). But how do we know whether publication bias is actually a problem? Do we have less subjective tools that can both quantitatively test for and correct for publication bias? The short answer is: we do! The caveat is that these tools still do not provide definitive proof of publication bias, and we must keep that in mind. Any results from publication bias tests should be interpreted with caution and viewed as a sensitivity analysis (Koricheva et al., 2013; Noble et al., 2017), because we will never truly know how many effect sizes are actually missing.

In this tutorial, we’ll overview how multilevel meta-regression can be used to test quantitatively, and more objectively, whether publication bias is present. We can also use these statistical tools to attempt to ‘correct’ for it.

Statistically Testing and Correcting for Publication Bias

Introduction

We’re now going to expand upon the meta-analysis by Arnold et al. (2021) to explore more rigorously whether publication bias is present and, if so, try to re-interpret the effects as if such bias did not exist.

Download the Data

# Packages
pacman::p_load(tidyverse, metafor, orchaRd)

# Download the data
arnold_data <- read.csv("https://raw.githubusercontent.com/pieterarnold/fitness-rmr-meta/main/MR_Fitness_Data_revised.csv")

# Exclude rows with NAs in the sample size (n.rep) and correlation (r) columns
arnold_data <- arnold_data[complete.cases(arnold_data$n.rep) & complete.cases(arnold_data$r), ]

# Calculate the effect size, ZCOR
arnold_data <- metafor::escalc(measure = "ZCOR", ri = r, ni = n.rep, data = arnold_data,
    var.names = c("Zr", "Zr_v"))

# Lets subset to endotherms for demonstration purposes
arnold_data_endo <- arnold_data %>%
    mutate(endos = ifelse(Class %in% c("Mammalia", "Aves"), "endo", "ecto")) %>%
    filter(endos == "endo" & Zr <= 3)  # Note that one sample that was an extreme outlier was removed in the paper.

# Add an observation-level (residual) identifier
arnold_data_endo$obs <- 1:nrow(arnold_data_endo)
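
As a quick sanity check, we can confirm how many effect sizes and studies remain after filtering; these counts should match k and the number of Ref levels reported in the model output below.

# Quick check of the filtered data
nrow(arnold_data_endo)  # number of effect sizes (k = 209 in the models below)
length(unique(arnold_data_endo$Ref))  # number of studies (62 levels of Ref)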

Meta-regression Approaches to Publication Bias

Nakagawa et al. (2022a), building on work by Stanley and Doucouliagos (2012, 2014), outline a method for testing and accounting for publication bias using multilevel meta-regression. Here, we include the sampling standard error (SE), or the sampling variance (V), directly as a moderator in our meta-regression model. We can then explicitly test whether sampling SE or V significantly explains variation in our effect size. To be more explicit, we can adjust our multilevel meta-regression model as follows:

\[ y_{i} = \mu + \beta_{se}se_{i} + s_{j[i]} + spp_{k[i]} + e_{i} + m_{i} \\ m_{i} \sim N(0, v_{i}) \\ s_{j} \sim N(0, \tau^2) \\ spp_{k} \sim N(0, \sigma_{k}^2) \\ e_{i} \sim N(0, \sigma_{e}^2) \]

Of course, we could also include a suite of other moderators in the analysis to account for more heterogeneity, as we have shown in past tutorials:

\[ y_{i} = \mu + \sum_{m = 1}^{N_{m}}\beta_{m}x_{m[i]} + \beta_{se}se_{i} + s_{j[i]} + spp_{k[i]} + e_{i} + m_{i} \]

Here, \(\beta_{m}\) is the slope (or contrast) for the \(m^{th}\) moderator (where m = 1, …, \(N_{m}\)). The \(\beta_{se}se_{i}\) term could, of course, be folded into this sum, but we separate it out to make clear that SE (or V) is included as a moderator in addition to the other moderators. Stanley and Doucouliagos (2012, 2014) have shown that the intercept (\(\mu\)) of a model that estimates \(\beta_{se}\) provides the best estimate of the meta-analytic mean adjusted for the possibility of publication bias (Nakagawa et al., 2022a; see the correction to the original paper, Nakagawa et al., 2022b). However, they have also shown that when \(\mu\) is significant, SE is best replaced with the sampling variance (i.e., \(SE^2\)) as the moderator, because the intercept estimated using SE is downwardly biased.

As such, Nakagawa et al. (2022a) recommend a two-step approach: first, fit a model with SE as the moderator; if the intercept of that model is significant, re-fit the model using V (i.e., \(SE^2\)), as this will give a better estimate of the adjusted mean effect size.
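
To make this rule concrete, here is a minimal sketch. It assumes the two model objects, metareg_se (SE as moderator) and metareg_v (V as moderator), that we fit in full below.

# Two-step decision rule (sketch); metareg_se and metareg_v are fitted below
if (metareg_se$pval[1] < 0.05) {
    # Intercept significant: take the adjusted mean from the V model
    adj_mean <- coef(metareg_v)["intrcpt"]
} else {
    # Intercept not significant: the SE model's intercept is the adjusted mean
    adj_mean <- coef(metareg_se)["intrcpt"]
}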

Why does including sampling variance in the model adjust the overall meta-analytic mean?

Task!

Think about how the intercept, \(\mu\), changes its meaning when a continuous moderator, such as \(se\), is included in the model. When we estimate \(\beta_{se}\) explicitly, how do we now interpret the intercept, or meta-analytic mean?


Answer!

The reason the intercept (\(\mu\)) is an adjusted meta-analytic mean is that, when we include a continuous moderator, the intercept itself takes on a different meaning. For example, if we fit the multilevel meta-regression model using SE, then the intercept estimated in that model is the overall meta-analytic mean when SE is zero; in other words, when there is no sampling variability or uncertainty around the estimates. We can see that this is true by setting SE = 0 in the equation:

\[ y_{i} = \mu + \beta_{se} \times 0 = \mu \]


Fitting a Multilevel Meta-Regression Model to Test and Correct for Publication Bias

Let’s now apply the methods proposed above to the data from Arnold et al. (2021). To simplify the interpretation of the intercept, we are going to drop the other moderators and fit only SE and/or V, but we could always include the fitness trait type moderator in these models as well.
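
For reference, a model that also includes a trait-type moderator might look like the following sketch. Here, Trait is a hypothetical column name standing in for the fitness trait type variable; the actual column name in the dataset may differ.

# Hypothetical: fitness trait type plus sampling SE in the same model
metareg_se_trait <- rma.mv(yi = Zr, V = Zr_v, mods = ~Trait + sqrt(Zr_v), test = "t",
    dfs = "contain", data = arnold_data_endo, random = list(~1 | Ref, ~1 | obs))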

# Including sampling standard error as moderator
metareg_se <- rma.mv(yi = Zr, V = Zr_v, mods = ~sqrt(Zr_v), test = "t", dfs = "contain",
    data = arnold_data_endo, random = list(~1 | Ref, ~1 | obs))
summary(metareg_se)
## 
## Multivariate Meta-Analysis Model (k = 209; method: REML)
## 
##   logLik  Deviance       AIC       BIC      AICc   
## -77.9464  155.8928  163.8928  177.2237  164.0908   
## 
## Variance Components:
## 
##             estim    sqrt  nlvls  fixed  factor 
## sigma^2.1  0.0626  0.2503     62     no     Ref 
## sigma^2.2  0.0380  0.1951    209     no     obs 
## 
## Test for Residual Heterogeneity:
## QE(df = 207) = 844.4496, p-val < .0001
## 
## Test of Moderators (coefficient 2):
## F(df1 = 1, df2 = 207) = 14.3314, p-val = 0.0002
## 
## Model Results:
## 
##             estimate      se     tval   df    pval    ci.lb   ci.ub      
## intrcpt      -0.0668  0.0727  -0.9184   60  0.3621  -0.2122  0.0787      
## sqrt(Zr_v)    1.3908  0.3674   3.7857  207  0.0002   0.6665  2.1151  *** 
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We’ll also fit the model with sampling variance as the moderator for good measure but, given that the intrcpt of the SE model is not significant, we can interpret the first model directly.

# Including sampling variance as moderator
metareg_v <- rma.mv(yi = Zr, V = Zr_v, mods = ~Zr_v, test = "t", dfs = "contain",
    data = arnold_data_endo, random = list(~1 | Ref, ~1 | obs))
summary(metareg_v)
## 
## Multivariate Meta-Analysis Model (k = 209; method: REML)
## 
##   logLik  Deviance       AIC       BIC      AICc   
## -76.1349  152.2698  160.2698  173.6006  160.4678   
## 
## Variance Components:
## 
##             estim    sqrt  nlvls  fixed  factor 
## sigma^2.1  0.0559  0.2364     62     no     Ref 
## sigma^2.2  0.0381  0.1953    209     no     obs 
## 
## Test for Residual Heterogeneity:
## QE(df = 207) = 840.6907, p-val < .0001
## 
## Test of Moderators (coefficient 2):
## F(df1 = 1, df2 = 207) = 18.7982, p-val < .0001
## 
## Model Results:
## 
##          estimate      se    tval   df    pval    ci.lb   ci.ub      
## intrcpt    0.0469  0.0467  1.0028   60  0.3200  -0.0466  0.1403      
## Zr_v       3.0051  0.6931  4.3357  207  <.0001   1.6387  4.3716  *** 
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Interpreting the Results

Task!

Is there evidence for publication bias? If so, what is the adjusted meta-analytic mean estimate?


Answer!

Yes, there is evidence for publication bias because the slope estimate for sqrt(Zr_v), the sampling standard error, is significant. Given that the intrcpt is not significant, we can interpret the model using SE instead of V. We can see from this model that the adjusted Zr, when there is no uncertainty, is -0.067, with a 95% confidence interval that overlaps zero (95% CI = -0.212 to 0.079). In other words, if no uncertainty around estimates existed, or we had a very highly powered set of studies, then we would expect the correlation to be, on average, -0.067.
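
We can confirm this interpretation with metafor’s predict() function: the prediction at SE = 0 reproduces the intercept (output not shown).

# Predicted mean effect size when sampling SE (sqrt(Zr_v)) is zero; this
# reproduces the intercept of metareg_se (-0.067, 95% CI -0.212 to 0.079)
predict(metareg_se, newmods = 0)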


Conclusions

Hopefully it is now clear how we can more objectively test, and correct, for the possibility of publication bias. Again, these methods should be viewed cautiously and used only as a sensitivity analysis. An important point that we have not yet mentioned is that we need to use different moderators depending on the type of effect size used in the meta-analysis. Zr is rather straightforward, but Nakagawa et al. (2022c) discuss the challenges of using standardised mean differences (SMD) and the log response ratio (lnRR) when fitting models with SE and V. They recommend instead using the effective sample size in meta-regression models.
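
As a rough, hypothetical sketch of what that could look like for a two-group design (the data frame dat and the sample size columns n1 and n2 are assumptions for illustration; see Nakagawa et al. (2022c) for the exact recommendations):

# Effective sample size for a two-group comparison (e.g., SMD or lnRR)
dat$e_n <- with(dat, (n1 * n2)/(n1 + n2))
# Use the square root of the inverse effective sample size in place of SE
dat$sqrt_inv_en <- sqrt(1/dat$e_n)
# metareg_en <- rma.mv(yi = yi, V = vi, mods = ~sqrt_inv_en, data = dat, ...)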

References

Arnold, P. A., Delean, S., Cassey, P. and White, C. R. (2021). Meta‐analysis reveals that resting metabolic rate is not consistently related to fitness and performance in animals. Journal of Comparative Physiology B 191, 1097–1110.
Jennions, M. D., Lortie, C. J., Rosenberg, M. S. and Rothstein, H. R. (2013). Publication and related biases. In: Handbook of Meta-Analysis in Ecology and Evolution (eds J. Koricheva, J. Gurevitch & K. Mengersen). Princeton University Press, Princeton and Oxford. pp 207–236.
Koricheva, J., Gurevitch, J. and Mengersen, K. (2013). Handbook of meta-analysis in ecology and evolution. Princeton University Press, Princeton, New Jersey.
Nakagawa, S., Lagisz, M., Jennions, M. D., Koricheva, J., Noble, D. W. A., Parker, T. H., Sánchez-Tójar, A., Yang, Y. and O’Dea, R. E. (2022b). CORRIGENDUM: Methods for testing publication bias in ecological and evolutionary meta-analyses. Methods in Ecology and Evolution 13, 4–21, DOI: 10.1111/2041-210X.13908.
Nakagawa, S., Senior, A. M., Viechtbauer, W. and Noble, D. W. A. (2022c). An assessment of statistical methods for non-independent data in ecological meta-analyses: comment. Ecology 103, e03490, https://doi.org/10.1002/ecy.3490.
Nakagawa, S., Lagisz, M., Jennions, M. D., Koricheva, J., Noble, D. W. A., Parker, T. H., Sánchez-Tójar, A., Yang, Y. and O’Dea, R. E. (2022a). Methods for testing publication bias in ecological and evolutionary meta-analyses. Methods in Ecology and Evolution 13, 4–21.
Noble, D. W. A., Lagisz, M., O’Dea, R. E. and Nakagawa, S. (2017). Non‐independence and sensitivity analyses in ecological and evolutionary meta‐analyses. Molecular Ecology 26, 2410–2425.
Rothstein, H. R., Sutton, A. J. and Borenstein, M. (2005). Publication bias in meta-analysis: Prevention, assessment and adjustments. Wiley, Chichester.
Stanley, T. D. and Doucouliagos, H. (2012). Meta-regression analysis in economics and business. Routledge.
Stanley, T. D. and Doucouliagos, H. (2014). Meta-regression approximations to reduce publication selection bias. Research Synthesis Methods 5, 60–78.


Session Information

R version 4.2.0 (2022-04-22)

Platform: aarch64-apple-darwin20 (64-bit)

attached base packages: stats, graphics, grDevices, utils, datasets, methods and base

other attached packages: magick(v.2.7.3), vembedr(v.0.1.5), equatags(v.0.1.1), mathjaxr(v.1.6-0), pander(v.0.6.5), orchaRd(v.2.0), forcats(v.0.5.2), stringr(v.1.4.1), dplyr(v.1.0.10), purrr(v.0.3.4), readr(v.2.1.2), tidyr(v.1.2.1), tibble(v.3.1.8), ggplot2(v.3.3.6), tidyverse(v.1.3.2), flextable(v.0.7.1), metafor(v.3.8-1), metadat(v.1.2-0) and Matrix(v.1.4-1)

loaded via a namespace (and not attached): nlme(v.3.1-157), fs(v.1.5.2), lubridate(v.1.8.0), httr(v.1.4.4), tools(v.4.2.0), backports(v.1.4.1), bslib(v.0.4.0), utf8(v.1.2.2), R6(v.2.5.1), DBI(v.1.1.3), colorspace(v.2.0-3), withr(v.2.5.0), tidyselect(v.1.1.2), curl(v.4.3.2), compiler(v.4.2.0), cli(v.3.4.0), rvest(v.1.0.3), formatR(v.1.12), pacman(v.0.5.1), xml2(v.1.3.3), officer(v.0.4.2), bookdown(v.0.28), sass(v.0.4.2), scales(v.1.2.1), systemfonts(v.1.0.4), digest(v.0.6.29), rmarkdown(v.2.16.1), base64enc(v.0.1-3), pkgconfig(v.2.0.3), htmltools(v.0.5.3), dbplyr(v.2.2.1), fastmap(v.1.1.0), rlang(v.1.0.5), readxl(v.1.4.1), rstudioapi(v.0.14), jquerylib(v.0.1.4), generics(v.0.1.3), jsonlite(v.1.8.0), zip(v.2.2.0), googlesheets4(v.1.0.1), magrittr(v.2.0.3), Rcpp(v.1.0.9), munsell(v.0.5.0), fansi(v.1.0.3), gdtools(v.0.2.4), lifecycle(v.1.0.2), stringi(v.1.7.8), yaml(v.2.3.5), grid(v.4.2.0), crayon(v.1.5.1), lattice(v.0.20-45), haven(v.2.5.1), hms(v.1.1.2), knitr(v.1.40), klippy(v.0.0.0.9500), pillar(v.1.8.1), uuid(v.1.1-0), xslt(v.1.4.3), reprex(v.2.0.2), glue(v.1.6.2), evaluate(v.0.16), data.table(v.1.14.2), remotes(v.2.4.2), modelr(v.0.1.9), vctrs(v.0.4.1), tzdb(v.0.3.0), cellranger(v.1.1.0), gtable(v.0.3.1), assertthat(v.0.2.1), locatexec(v.0.1.1), cachem(v.1.0.6), xfun(v.0.33), broom(v.1.0.1), googledrive(v.2.0.0), gargle(v.1.2.1) and ellipsis(v.0.3.2)