Introduction to Publication Bias in Meta-analysis

Meta-analysts have worked hard to develop tools for detecting and understanding different forms of publication practices and biases within the scientific literature. Such biases can occur when studies reporting non-significant results, or results opposite to what is predicted, are not found in systematic searches (i.e., the ‘file-drawer’ problem; Jennions et al., 2013). Alternatively, biases can result from selective reporting or ‘p-hacking’.

Visual and quantitative tools have been developed to try to identify and ‘correct’ for such biases in meta-analytic results (Jennions et al., 2013; Nakagawa et al., 2022; Rothstein et al., 2005). Having said that, aside from working hard to incorporate ‘gray literature’ (unpublished theses, government reports, etc.) and studies published in languages other than English, there is little one can truly do to counteract publication bias beyond a few simple tools (all of which have their own limitations). In many cases we cannot know for certain what has gone unpublished, or how a sample of existing work on a topic might be biased. Nonetheless, exploring the possibility of publication bias and its possible effects on conclusions is a core component of meta-analysis (O’Dea et al., 2021).

In this tutorial, we’ll overview some visual tools we can use to attempt to understand whether publication bias is present. In the next tutorial, we will cover analytical approaches that can be used as sensitivity analyses to explicitly test whether publication bias is present and to estimate what the effect size might have been in its absence. Of course, we often cannot know whether such biases truly exist, and high heterogeneity can produce apparent publication bias where none exists. The goal here is to conduct a formal thought experiment: if publication bias were present, what form would we expect it to take, and how would our conclusions change if we had access to all studies regardless of their significance or power?

Visually Assessing Publication Bias

Introduction

We’re going to have a look at a meta-analysis by Arnold et al. (2021) that explores the relationship between resting metabolic rate and fitness in animals. Publication bias is fairly subtle in this particular meta-analysis, but it does appear to be present in some form, both visually and analytically. We’ll start off this tutorial by visually exploring for evidence of publication bias and discussing what it might look like and why.

Download the Data

# Packages
pacman::p_load(tidyverse, metafor, orchaRd)

# Download the data. Exclude NA in r and sample size columns
arnold_data <- read.csv("https://raw.githubusercontent.com/pieterarnold/fitness-rmr-meta/main/MR_Fitness_Data_revised.csv")

# Exclude some NA's in sample size and r
arnold_data <- arnold_data[complete.cases(arnold_data$n.rep) & complete.cases(arnold_data$r), ]

# Calculate the effect size, ZCOR
arnold_data <- metafor::escalc(measure = "ZCOR", ri = r, ni = n.rep, data = arnold_data,
    var.names = c("Zr", "Zr_v"))

# Lets subset to endotherms for demonstration purposes
arnold_data_endo <- arnold_data %>%
    mutate(endos = ifelse(Class %in% c("Mammalia", "Aves"), "endo", "ecto")) %>%
    filter(endos == "endo" & Zr <= 3)  # Note that one sample that was an extreme outlier was removed in the paper.

# Add an observation-level (residual) identifier
arnold_data_endo$obs <- 1:nrow(arnold_data_endo)

Contour-enhanced Funnel Plots

Funnel plots are by far the most common visual tool for assessing the possibility of publication bias (Nakagawa et al., 2022). Like any exploratory analysis, they are simply visual tools. Funnel plots plot the effect size (x-axis) against some measure of uncertainty around the effect size, such as its sampling variance or precision (y-axis). While we acknowledge that many other types of plots exist for exploring the possibility of publication bias (Jennions et al., 2013; Nakagawa et al., 2022; Rothstein et al., 2005), we will only cover this most common type. Let’s have a look at a funnel plot of the data.

If no publication bias exists, then we would expect the plot to look fairly symmetrical and funnel shaped (hence why it’s called a funnel plot!). The shape is a funnel because the sampling variance is expected to decrease (or the precision to increase) as the sample size, and thus power, increases. These ‘high-powered’ studies sit at the top of the ‘funnel’, in the narrow-necked region so to speak, because we expect the effect sizes from these studies to fluctuate very little due to sampling alone. In contrast, as the power of studies decreases, and therefore their sampling variance increases, we expect the spread of effect sizes to increase, simply because small sample sizes result in greater variability among effects, and in effects that are larger in magnitude, by chance alone.
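To build some intuition, here is a small simulation sketch (ours, not part of the original analysis): we draw Zr values that differ only by sampling error, using the fact that the sampling variance of Zr is approximately 1/(n - 3), and plot them as a funnel. The object names (n_i, zr_i) are purely illustrative.

# Simulation sketch: a funnel under sampling error alone (no publication bias)
set.seed(42)
n_i <- sample(10:300, 200, replace = TRUE)  # hypothetical sample sizes
zr_i <- rnorm(200, mean = 0.2, sd = sqrt(1/(n_i - 3)))  # effects vary by sampling error only
metafor::funnel(x = zr_i, vi = 1/(n_i - 3), yaxis = "seinv", las = 1,
    xlab = "Simulated Zr (sampling error only)")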

# Let's make a funnel plot to visualize the data in relation to precision (the
# inverse sampling standard error)
metafor::funnel(x = arnold_data_endo$Zr, vi = arnold_data_endo$Zr_v, yaxis = "seinv",
    digits = 2, level = c(0.1, 0.05, 0.01), shade = c("white", "gray55", "gray75"),
    las = 1, xlab = "Correlation Coefficient (r)", atransf = tanh, legend = TRUE)

Figure 1: Funnel plot depicting the correlation between metabolism and fitness as a function of precision (1 / SE). The dotted lines are the theoretical 95% sampling variance intervals - the interval within which we expect effect size estimates to fall if only sampling variance drives differences among effects. Shaded regions represent the p-values of studies: the white region indicates studies where the p-value is between 0.1 and 1; the dark gray region where the p-value is between 0.05 and 0.1; and the lighter gray region where the p-value is significant.

We can see the typical funnel shape in Fig. 1. You will notice that most effects lie in positive correlation space; in other words, most studies report a positive correlation between metabolic rate and fitness. However, we also find some studies showing the opposite pattern. We expect that based on sampling theory alone, and indeed many of these effects fall close to the dotted sampling error intervals. Studies in the light grey regions are those where the p-value was significant.
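To see which shaded band each effect falls into, we can approximate each study's two-sided p-value from its Zr and sampling variance (a quick sketch; p_approx is simply a new object we create here for illustration).

# Approximate two-sided p-value for each effect from its z-statistic
p_approx <- 2 * pnorm(abs(arnold_data_endo$Zr)/sqrt(arnold_data_endo$Zr_v), lower.tail = FALSE)

# How many effects fall within each shaded region of the funnel plot?
table(cut(p_approx, breaks = c(0, 0.01, 0.05, 0.1, 1)))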

What do we expect if publication bias were present?

Task!

Think about what you would expect the funnel plot to look like and why.


Answer!

We might expect, under a file-drawer scenario (i.e., where researchers stash away studies showing non-significant or opposite effects in their desk drawers), that studies with low power (i.e., low precision, wide standard errors, and small sample sizes) and non-significant correlations will go unpublished. This should be particularly true for studies that show the opposite of what theory predicts: specifically, non-significant negative correlations from studies with small sample sizes and low precision. This is one factor that can drive what we call funnel asymmetry, leaving a gap of missing effect sizes in the bottom left corner of the funnel. The simulation sketch below shows what such a censored funnel can look like.
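One way to make this concrete is a small simulation sketch (ours, not from Arnold et al.): generate effects under sampling error alone, then ‘file-drawer’ the small, non-significant studies and watch the bottom left corner of the funnel empty out. The object names and the censoring rule are purely illustrative.

# Sketch of a file-drawer process: censor low-powered, non-significant effects
set.seed(1)
n_i <- sample(10:300, 500, replace = TRUE)
vi_i <- 1/(n_i - 3)
zr_i <- rnorm(500, mean = 0.2, sd = sqrt(vi_i))
p_i <- 2 * pnorm(abs(zr_i)/sqrt(vi_i), lower.tail = FALSE)

# Suppose non-significant effects below the mean from small studies go unpublished
published <- !(p_i > 0.05 & zr_i < 0.2 & n_i < 50)
metafor::funnel(x = zr_i[published], vi = vi_i[published], yaxis = "seinv", las = 1,
    xlab = "Simulated Zr after 'file-drawering' small, non-significant effects")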


Interpreting our Funnel Plot

If we look at Fig. 1 we do see some hint of this scenario. There is a noticeable blank space in the bottom left corner, suggesting that small-to-moderate negative correlations based on very small sample sizes are going unpublished. The contour enhancement also tells us that these would be studies that failed to find a significant correlation. Interestingly, we also see that negative correlations large enough in magnitude do get published, even with small sample sizes, and for the most part these are significant at 0.05. We can only speculate as to why, or whether this is even a real signature of publication bias. However, it might suggest that when researchers estimate large correlations in the opposite direction to what one might expect, these arguably ‘surprising’ results are more likely to be published than weak correlations in the opposite direction.


Funnel Plot with Meta-analytic Residuals

Visually identifying publication bias in funnel plots can be extremely challenging. That’s because high heterogeneity, artefacts, data irregularities (mistakes, fraud), and simple chance can just as easily produce apparent funnel asymmetry (Nakagawa et al., 2022). In fact, publication bias can even be present without any asymmetry at all (see Sánchez-Tójar et al., 2018). Given that heterogeneity is very high in ecological and evolutionary meta-analyses (see Senior et al., 2016), we should attempt to account for it (as best we can, anyway) and determine whether patterns in the funnel plot remain. We can achieve this to some extent by plotting a funnel plot of meta-analytic residuals, taken from a full multilevel meta-regression model that explicitly accounts for as much heterogeneity as possible. Nakagawa et al. (2022) provide a thorough supplement that walks through different publication bias methods, how to apply them, and how to plot different types of funnel plots.

# Fitness type
metareg <- rma.mv(yi = Zr, V = Zr_v, mods = ~FitnessClassification, data = arnold_data_endo,
    random = list(~1 | Ref, ~1 | obs))
summary(metareg)
## 
## Multivariate Meta-Analysis Model (k = 209; method: REML)
## 
##   logLik  Deviance       AIC       BIC      AICc  ​ 
## -63.9162  127.8324  147.8324  180.8654  148.9903   
## 
## Variance Components:
## 
##             estim    sqrt  nlvls  fixed  factor 
## sigma^2.1  0.0828  0.2878     62     no     Ref 
## sigma^2.2  0.0260  0.1611    209     no     obs 
## 
## Test for Residual Heterogeneity:
## QE(df = 201) = 846.1346, p-val < .0001
## 
## Test of Moderators (coefficients 2:8):
## QM(df = 7) = 44.9882, p-val < .0001
## 
## Model Results:
## 
##                                         estimate      se     zval    pval​ 
## intrcpt                                   0.2282  0.0606   3.7692  0.0002 
## FitnessClassificationAerobic capacity    -0.2366  0.1002  -2.3616  0.0182 
## FitnessClassificationBoldness             0.2767  0.1986   1.3933  0.1635 
## FitnessClassificationDominance            0.2147  0.1398   1.5361  0.1245 
## FitnessClassificationGrowth              -0.0326  0.1270  -0.2567  0.7974 
## FitnessClassificationMovement/activity   -0.3791  0.0652  -5.8172  <.0001 
## FitnessClassificationReproduction        -0.1157  0.0838  -1.3812  0.1672 
## FitnessClassificationSurvival            -0.1828  0.0864  -2.1141  0.0345 
##                                           ci.lb    ci.ub 
## intrcpt                                  0.1096   0.3469  *** 
## FitnessClassificationAerobic capacity   -0.4330  -0.0402    * 
## FitnessClassificationBoldness           -0.1125   0.6658      
## FitnessClassificationDominance          -0.0593   0.4887      
## FitnessClassificationGrowth             -0.2814   0.2162      
## FitnessClassificationMovement/activity  -0.5069  -0.2514  *** 
## FitnessClassificationReproduction       -0.2799   0.0485      
## FitnessClassificationSurvival           -0.3522  -0.0133    * 
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We can have a look at how much variation this model explains by calculating marginal (\(R_{marginal}^2\)) and conditional (\(R_{conditional}^2\)) \(R^2\).

# How much variation?
orchaRd::r2_ml(metareg) * 100
##    R2_marginal R2_conditional 
##           26.8           82.5

What is the difference between marginal and conditional \(R^2\)?

Task!

Interpret the meaning of both values output from our \(R^2\) calculations.


Answer!

From the output above we can see that the moderator (fixed effect) explains 26.773% of the variation in Zr estimates (as indicated by \(R_{marginal}^2\)), while the fixed and random effects together explain 82.525% of the variation in Zr estimates (as indicated by \(R_{conditional}^2\)). Interestingly, we still have a significant amount of residual heterogeneity that the model cannot explain (\(QE\) = 846.135, \(p = 5.477 \times 10^{-80}\)).
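As a rough cross-check (a sketch assuming orchaRd::r2_ml follows the Nakagawa & Schielzeth logic), we can reconstruct these values from the model components: the variance explained by the moderators is the variance of the fixed-effect fitted values, and the random-effect variances are the model's sigma^2 estimates.

# Sketch: reconstruct marginal and conditional R2 from the model components
sigma2_fixed <- var(as.vector(fitted(metareg)))  # variance explained by the moderators
sigma2_random <- metareg$sigma2  # c(between-study, observation-level) variances
R2_marginal <- sigma2_fixed/(sigma2_fixed + sum(sigma2_random))
R2_conditional <- (sigma2_fixed + sigma2_random[1])/(sigma2_fixed + sum(sigma2_random))
round(c(R2_marginal, R2_conditional) * 100, 1)  # should be close to the orchaRd output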


Funnel Plot with Residuals

Now that we have estimated the full model, we can extract its residuals and re-make our funnel plot using them.

# Extract residuals based on the fixed effect predictions (marginal) and the
# fixed and random effect (conditional)
resid_marg <- arnold_data_endo$Zr - predict(metareg)[[1]]  # Calculates the residuals based on fixed effects only
vi_residm <- arnold_data_endo$Zr_v + predict(metareg)[[2]]^2  # Adds prediction errors

# Conditional requires random effect blups for study (Ref) and effect size obs
blups <- ranef(metareg)  # extracts blups for each random effect. Stored as a list.

study_ef <- blups$Ref[[1]][match(arnold_data_endo$Ref, row.names(blups$Ref))]  # This code will expand each random effect for study out to match the study ID across the entire dataset
study_vi <- (blups$Ref[[2]][match(arnold_data_endo$Ref, row.names(blups$Ref))])^2

effect_effect <- blups$obs[[1]]  # BLUP for each observation-level (effect size) random effect
vi_effect <- blups$obs[[2]]^2  # Squared standard error of the observation-level BLUPs

# Now we have everything we need to calculate residuals
resid_cond <- arnold_data_endo$Zr - (predict(metareg)[[1]] + study_ef + effect_effect)
vi_resid_cond <- vi_residm + study_vi + vi_effect
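
# Cross-check (a sketch, not from the original analysis): metafor's residuals()
# returns raw residuals from the fixed (marginal) part of the model, so it should
# closely match resid_marg computed above
all.equal(as.vector(residuals(metareg)), as.vector(resid_marg))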

# Let's make funnel plots of the residuals in relation to precision (the
# inverse sampling standard error)
par(mfrow = c(1, 2))
metafor::funnel(x = resid_marg, vi = vi_residm, yaxis = "seinv", digits = 2, level = c(0.1,
    0.05, 0.01), shade = c("white", "gray55", "gray75"), las = 1, xlab = "Meta-analytic Residuals (marginal)",
    legend = FALSE)
metafor::funnel(x = resid_cond, vi = vi_resid_cond, yaxis = "seinv", digits = 2,
    level = c(0.1, 0.05, 0.01), shade = c("white", "gray55", "gray75"), las = 1,
    xlab = "Meta-analytic Residuals (conditional)", legend = FALSE)

Figure 2: Funnel plots of meta-analytic residuals against precision (1 / SE). The dotted lines are the theoretical 95% sampling variance intervals - the interval within which we expect effect size estimates to fall if only sampling variance drives differences among effects. Shaded regions represent the p-values of studies: the white region indicates studies where the p-value is between 0.1 and 1; the dark gray region where the p-value is between 0.05 and 0.1; and the lighter gray region where the p-value is significant.

How do our conclusions about the possibility of publication bias change by plotting residuals?

Task!

After looking at the new funnel plots (Fig. 2), which account for varying amounts of heterogeneity in the model, would you say our conclusions about whether publication bias is present change at all? If not, why not?


Answer!

Maybe. Even after accounting for factors that explain (or estimate) heterogeneity in the data, we still see areas of the funnel plot that appear to be missing effect size estimates; notably, for the marginal residuals we still see missing effects on the left side. However, when looking at the conditional residuals (which take into account both fixed and random effects), there is still a gap at low precision, yet some effect size estimates with very low precision are still being published, which is not really what we would expect under publication bias.


Conclusions

While funnel plots can be useful, they do have clear limitations. Nakagawa et al. (2022) overview the many assumptions being made, particularly when using residuals, and how non-independence of sampling errors can cause problems. In the next section, we’ll overview some more sophisticated, less subjective ways to assess and correct for publication bias.

References

Arnold, P. A., Delean, S., Cassey, P. and White, C. R. (2021). Meta‐analysis reveals that resting metabolic rate is not consistently related to fitness and performance in animals. Journal of Comparative Physiology B 191, 1097–1110.
Jennions, M. D., Lortie, C. J., Rosenberg, M. S. and Rothstein, H. R. (2013). Publication and related biases. In: Handbook of Meta-Analysis in Ecology and Evolution (eds J. Koricheva, J. Gurevitch & K. Mengersen). Princeton University Press, Princeton and Oxford. pp 207–236.
Nakagawa, S., Lagisz, M., O’Dea, R. E., Rutkowska, J., Yang, Y., Noble, D. W. A. and Senior, A. M. (2021). The orchard plot: Cultivating forest plots for use in ecology, evolution and beyond. Research Synthesis Methods 12, 4–12.
Nakagawa, S., Lagisz, M., Jennions, M. D., Koricheva, J., Noble, D. W. A., Parker, T. H., Sánchez-Tójar, A., Yang, Y. and O’Dea, R. E. (2022). Methods for testing publication bias in ecological and evolutionary meta-analyses. Methods in Ecology and Evolution 13, 4–21.
O’Dea, R. E., Lagisz, M., Jennions, M. D., Koricheva, J., Noble, D. W. A., Parker, T. H., Gurevitch, J., Page, M. J., Stewart, G., Moher, D., et al. (2021). Preferred reporting items for systematic reviews and meta-analyses in ecology and evolutionary biology: A PRISMA extension. Biological Reviews, doi: 10.1111/brv.12721.
Rothstein, H. R., Sutton, A. J. and Borenstein, M. (2005). Publication bias in meta-analysis: Prevention, assessment and adjustments. Wiley, Chichester.
Sánchez-Tójar, A., Nakagawa, S., Sánchez-Fortún, M., Martin, D. A., Ramani, S., Girndt, A., Bókony, V., Kempenaers, B., Liker, A., Westneat, D. F., et al. (2018). Meta-analysis challenges a textbook example of status signalling and demonstrates publication bias. eLife 7, e37385. DOI: https://doi.org/10.7554/eLife.37385.
Senior, A. M., Grueber, C. E., Kamiya, T., Lagisz, M., O’Dwyer, K., Santos, E. S. A. and Nakagawa, S. (2016). Heterogeneity in ecological and evolutionary meta‐analyses: Its magnitude and implications. Ecology 97, 3293–3299.


Session Information

R version 4.0.3 (2020-10-10)

Platform: x86_64-apple-darwin17.0 (64-bit)

attached base packages: stats, graphics, grDevices, utils, datasets, methods and base

other attached packages: magick(v.2.7.2), vembedr(v.0.1.5), equatags(v.0.1.0), mathjaxr(v.1.6-0), pander(v.0.6.4), orchaRd(v.2.0), forcats(v.0.5.1), stringr(v.1.4.0), dplyr(v.1.0.9), purrr(v.0.3.4), readr(v.2.1.2), tidyr(v.1.2.0), tibble(v.3.1.7), ggplot2(v.3.3.6), tidyverse(v.1.3.1), flextable(v.0.6.6), metafor(v.3.4-0), metadat(v.1.3-0) and Matrix(v.1.3-4)

loaded via a namespace (and not attached): nlme(v.3.1-152), fs(v.1.5.2), lubridate(v.1.8.0), httr(v.1.4.3), tools(v.4.0.3), backports(v.1.4.1), bslib(v.0.3.1), utf8(v.1.2.2), R6(v.2.5.1), DBI(v.1.1.3), colorspace(v.2.0-3), withr(v.2.5.0), tidyselect(v.1.1.2), curl(v.4.3.2), compiler(v.4.0.3), cli(v.3.3.0), rvest(v.1.0.2), formatR(v.1.11), pacman(v.0.5.1), xml2(v.1.3.3), officer(v.0.3.18), bookdown(v.0.22), sass(v.0.4.1), scales(v.1.2.0), rappdirs(v.0.3.3), systemfonts(v.1.0.2), digest(v.0.6.29), rmarkdown(v.2.14), base64enc(v.0.1-3), pkgconfig(v.2.0.3), htmltools(v.0.5.2), highr(v.0.9), dbplyr(v.2.2.1), fastmap(v.1.1.0), rlang(v.1.0.3), readxl(v.1.4.0), rstudioapi(v.0.13), jquerylib(v.0.1.4), generics(v.0.1.2), jsonlite(v.1.8.0), zip(v.2.2.0), magrittr(v.2.0.3), Rcpp(v.1.0.8.3), munsell(v.0.5.0), fansi(v.1.0.3), gdtools(v.0.2.3), lifecycle(v.1.0.1), stringi(v.1.7.6), yaml(v.2.3.5), grid(v.4.0.3), crayon(v.1.5.1), lattice(v.0.20-44), haven(v.2.5.0), hms(v.1.1.1), knitr(v.1.39), klippy(v.0.0.0.9500), pillar(v.1.7.0), uuid(v.1.1-0), reprex(v.2.0.1), xslt(v.1.4.3), glue(v.1.6.2), evaluate(v.0.15), data.table(v.1.14.2), remotes(v.2.4.2), modelr(v.0.1.8), vctrs(v.0.4.1), tzdb(v.0.3.0), cellranger(v.1.1.0), gtable(v.0.3.0), assertthat(v.0.2.1), locatexec(v.0.1.1), xfun(v.0.31), broom(v.1.0.0) and ellipsis(v.0.3.2)