Meta-analysts have worked hard to develop tools for understanding different forms of publication practices and biases within the scientific literature. Such biases can occur when studies reporting non-significant results, or results opposite to what is predicted, never make it into systematic searches (i.e., the ‘file-drawer’ problem; Jennions et al., 2013). Alternatively, biases can result from selective reporting or ‘p-hacking’.
Visual and quantitative tools have been developed to identify and ‘correct’ for such biases in meta-analytic results (Jennions et al., 2013; Nakagawa et al., 2022; Rothstein et al., 2005). Having said that, aside from working hard to incorporate ‘gray literature’ (unpublished theses, government reports, etc.) and to include work published in languages other than English, there is little one can truly do to counteract publication bias beyond a few simple tools (all of which have limitations of their own). In many cases we cannot know for certain what has gone unpublished, or how a sample of existing work on a topic might be biased. Nonetheless, exploring the possibility of publication bias and its possible effects on conclusions is a core component of meta-analysis (O’Dea et al., 2021).
In this tutorial, we’ll overview some visual tools we can use to explore whether publication bias might be present. In the next tutorial, we will cover some analytical approaches that can be used as sensitivity analyses to explicitly test for publication bias and to estimate what the effect size might have been had the bias not existed. Of course, we often cannot know for certain whether such biases exist, and high heterogeneity can produce apparent publication bias where none exists. The goal here is to run a formal thought experiment: if publication bias were present, what form would we expect it to take, and how would our conclusions change if we had access to all studies regardless of their significance or power?
We’re going to have a look at a meta-analysis by Arnold et al. (2021) that explores the relationship between resting metabolic rate and fitness in animals. Publication bias is slightly subtle in this particular meta-analysis, but it does appear to be present in some form, both visually and analytically. We’ll start this tutorial by visually exploring for evidence of publication bias and discussing what it might look like and why.
# Packages
pacman::p_load(tidyverse, metafor, orchaRd)
# Download the data from the paper's GitHub repository
arnold_data <- read.csv("https://raw.githubusercontent.com/pieterarnold/fitness-rmr-meta/main/MR_Fitness_Data_revised.csv")
# Exclude some NA's in sample size and r
arnold_data <- arnold_data[complete.cases(arnold_data$n.rep) & complete.cases(arnold_data$r), ]
# Calculate the effect size, ZCOR (Fisher's z-transformed correlation, Zr) and its sampling variance
arnold_data <- metafor::escalc(measure = "ZCOR", ri = r, ni = n.rep, data = arnold_data,
var.names = c("Zr", "Zr_v"))
# Let's subset to endotherms for demonstration purposes
arnold_data_endo <- arnold_data %>%
    mutate(endos = ifelse(Class %in% c("Mammalia", "Aves"), "endo", "ecto")) %>%
    filter(endos == "endo" & Zr <= 3)  # One extreme outlier (Zr > 3) was also removed in the paper
# Add an observation-level (residual) random effect identifier
arnold_data_endo$obs <- 1:nrow(arnold_data_endo)
Funnel plots are by far the most common visual tool for assessing the possibility of publication bias (Nakagawa et al., 2022). Like any exploratory analysis, they are just visual tools. A funnel plot plots the effect size (x-axis) against some measure of uncertainty around that effect size, such as its sampling variance or precision (y-axis). While many other types of plots exist for exploring the possibility of publication bias (Jennions et al., 2013; Nakagawa et al., 2022; Rothstein et al., 2005), we will only cover this most common type. Let’s have a look at a funnel plot of the data.
If no publication bias exists, then we would expect the plot to look fairly symmetrical and funnel shaped (hence why it’s called a funnel plot!). The funnel shape arises because the sampling variance of an effect size is expected to decrease (or its precision to increase) as the sample size, and thus the power, of a study increases. These ‘high-powered’ studies sit at the top of the funnel, in the narrow-necked region so to speak, because we expect their effect size estimates to fluctuate very little due to sampling alone. In contrast, as the power of studies decreases, and therefore their sampling variance increases, we expect the spread of effect sizes to increase, simply because small sample sizes produce effects that are more variable, and larger in magnitude, by chance alone.
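To build intuition for what an unbiased funnel should look like, here is a purely illustrative simulation (hypothetical studies, not the Arnold et al. data) in which every study estimates the same true correlation and only the sample size differs:
# Purely illustrative: simulate studies that all estimate the same true effect
# (Fisher's z of 0.3) and differ only in sample size, i.e. no publication bias
set.seed(42)
n_sim <- sample(10:200, 500, replace = TRUE)  # hypothetical sample sizes
vi_sim <- 1/(n_sim - 3)  # sampling variance of Fisher's z is 1/(n - 3)
Zr_sim <- rnorm(500, mean = 0.3, sd = sqrt(vi_sim))  # true effect plus sampling error
# A symmetric funnel: low-precision studies scatter widely while high-precision
# studies cluster tightly around the true effect
metafor::funnel(x = Zr_sim, vi = vi_sim, yaxis = "seinv", las = 1, xlab = "Simulated Zr (no bias)")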
# Let's make a contour-enhanced funnel plot of the data, plotting precision
# (inverse sampling standard error) against the effect size
metafor::funnel(x = arnold_data_endo$Zr, vi = arnold_data_endo$Zr_v, yaxis = "seinv",
    digits = 2, level = c(0.1, 0.05, 0.01), shade = c("white", "gray55", "gray75"),
    las = 1, xlab = "Correlation Coefficient (r)", atransf = tanh, legend = TRUE)
We can see the typical funnel shape in Fig. 1 above. You will notice that most effects lie in the positive correlation space; in other words, there is a strong positive correlation between BMR and fitness. However, we also find some studies that show the opposite pattern. We expect that based on sampling theory alone, and indeed many of these effects fall close to the dotted sampling-error intervals. Studies in the light grey regions are those where the p-value was significant.
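If you want to put a number on that impression, a quick tally of the sign of the effect sizes confirms that positive correlations dominate (output not shown):
# Quick tally: how many effect sizes are positive vs. negative?
table(arnold_data_endo$Zr > 0)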
Think about what you would expect the funnel plot to look like if publication bias were present, and why.
Under a file-drawer scenario (i.e., where researchers stash studies showing weak or ‘wrong-direction’ effects away in their desk drawers), we might expect that studies with low power (i.e., low precision, wide standard errors, and small sample sizes) and non-significant correlations will go unpublished. This should be particularly true for studies showing the opposite of what theory predicts: specifically, non-significant negative correlations from studies with small sample sizes and low precision. This is one factor that can drive what we call funnel asymmetry, leaving a gap of missing effect sizes in the bottom-left corner of the funnel.
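Continuing the purely illustrative simulation from above (the objects Zr_sim and vi_sim are hypothetical, not from the Arnold et al. dataset), we can mimic this file-drawer process by ‘filing away’ simulated studies that are both negative and non-significant, and then re-drawing the funnel:
# Illustrative 'file drawer': keep a simulated study only if it is significant
# at 0.05 (|Zr|/SE > 1.96) or if the estimated effect is positive
published <- abs(Zr_sim)/sqrt(vi_sim) > 1.96 | Zr_sim > 0
metafor::funnel(x = Zr_sim[published], vi = vi_sim[published], yaxis = "seinv",
    las = 1, xlab = "Simulated Zr after file-drawer censoring")
# The bottom-left corner (small, negative, non-significant effects) is now
# largely empty, producing the classic funnel asymmetry described above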
If we look at Fig. 1 we do see some hint of this scenario. There is a noticeable blank space in the bottom-left corner: negative correlations that are small to moderate in magnitude and based on very small sample sizes appear to be going unpublished. The contour-enhanced funnel plot also tells us that these would have been studies that failed to find a significant correlation. Interestingly, we also see that negative correlations that are large enough in magnitude do get published even with small sample sizes, but for the most part these are significant at 0.05. We can only speculate about why this is, or whether it is even a real signature of publication bias. It might suggest, however, that when researchers estimate a correlation that is large enough, even if it is in the opposite direction to what one might expect, such arguably ‘surprising’ results are more likely to be published than weak correlations in the opposite direction.
Visually identifying publication bias in funnel plots can be extremely challenging. That’s because high heterogeneity, artefacts, data irregularities (mistakes, fraud), and simple chance can just as easily result in apparent funnel asymmetry (Nakagawa et al., 2022). In fact, publication bias can even be present without any asymmetry at all (see Sanchez-Tojar et al., 2018). Given that heterogeneity is very high in ecological and evolutionary meta-analyses (see Senior et al., 2016), we should attempt to correct for it (as best we can) to determine whether patterns in the funnel plot remain. We can achieve this to some extent by plotting a funnel plot of meta-analytic residuals, taken from a full multilevel meta-regression model that explicitly accounts for as much heterogeneity as possible. Nakagawa et al. (2022) provide a thorough supplement that walks through different publication bias methods, how to apply them, and how to plot different types of funnel plots.
# Multilevel meta-regression with fitness classification as a moderator and
# random effects for study (Ref) and observation (obs)
metareg <- rma.mv(yi = Zr, V = Zr_v, mods = ~FitnessClassification, data = arnold_data_endo,
    random = list(~1 | Ref, ~1 | obs))
summary(metareg)
summary(metareg)
##
## Multivariate Meta-Analysis Model (k = 209; method: REML)
##
## logLik Deviance AIC BIC AICc
## -63.9162 127.8324 147.8324 180.8654 148.9903
##
## Variance Components:
##
## estim sqrt nlvls fixed factor
## sigma^2.1 0.0828 0.2878 62 no Ref
## sigma^2.2 0.0260 0.1611 209 no obs
##
## Test for Residual Heterogeneity:
## QE(df = 201) = 846.1346, p-val < .0001
##
## Test of Moderators (coefficients 2:8):
## QM(df = 7) = 44.9882, p-val < .0001
##
## Model Results:
##
## estimate se zval pval
## intrcpt 0.2282 0.0606 3.7692 0.0002
## FitnessClassificationAerobic capacity -0.2366 0.1002 -2.3616 0.0182
## FitnessClassificationBoldness 0.2767 0.1986 1.3933 0.1635
## FitnessClassificationDominance 0.2147 0.1398 1.5361 0.1245
## FitnessClassificationGrowth -0.0326 0.1270 -0.2567 0.7974
## FitnessClassificationMovement/activity -0.3791 0.0652 -5.8172 <.0001
## FitnessClassificationReproduction -0.1157 0.0838 -1.3812 0.1672
## FitnessClassificationSurvival -0.1828 0.0864 -2.1141 0.0345
## ci.lb ci.ub
## intrcpt 0.1096 0.3469 ***
## FitnessClassificationAerobic capacity -0.4330 -0.0402 *
## FitnessClassificationBoldness -0.1125 0.6658
## FitnessClassificationDominance -0.0593 0.4887
## FitnessClassificationGrowth -0.2814 0.2162
## FitnessClassificationMovement/activity -0.5069 -0.2514 ***
## FitnessClassificationReproduction -0.2799 0.0485
## FitnessClassificationSurvival -0.3522 -0.0133 *
##
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We can have a look at how much variation this model explains by calculating marginal (\(R_{marginal}^2\)) and conditional (\(R_{conditional}^2\)) \(R^2\).
# How much variation?
orchaRd::r2_ml(metareg) * 100
## R2_marginal R2_conditional
## 26.8 82.5
Interpret the meaning of both values output from our \(R^2\) calculations
From the output above we can see that the fixed effect (the fitness classification moderator) explains 26.773% of the variation in Zr estimates (as indicated by \(R_{marginal}^2\)), while the fixed and random effects together explain 82.525% of the variation (as indicated by \(R_{conditional}^2\)). Interestingly, we still have a significant amount of residual heterogeneity that we cannot explain (\(QE\) = 846.135, \(p = 5.477 \times 10^{-80}\)).
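For those curious, the \(R^2\) values can be approximated by hand from the fitted model object. This is a rough sketch assuming orchaRd::r2_ml follows the Nakagawa & Schielzeth (2013) approach, with the observation-level variance treated as the residual variance; the exact implementation may differ slightly.
# By-hand sketch of marginal and conditional R2 (assumed Nakagawa & Schielzeth approach)
var_fixed <- var(as.vector(metareg$X %*% metareg$beta))  # variance explained by the moderators
var_random <- sum(metareg$sigma2)  # study (Ref) + observation-level variance components
var_fixed/(var_fixed + var_random)  # R2 marginal
(var_fixed + metareg$sigma2[1])/(var_fixed + var_random)  # R2 conditional (fixed + study-level effects)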
Now that we have a full model estimated, we can extract the residuals from this model and re-make our funnel plot using the residuals.
# Extract residuals based on the fixed effect predictions (marginal) and the
# fixed and random effect (conditional)
resid_marg <- arnold_data_endo$Zr - predict(metareg)[[1]] # Calculates the residuals based on fixed effects only
vi_residm <- arnold_data_endo$Zr_v + predict(metareg)[[2]]^2 # Adds prediction errors
# Conditional requires random effect blups for study (Ref) and effect size obs
blups <- ranef(metareg) # extracts blups for each random effect. Stored as a list.
study_ef <- blups$Ref[[1]][match(arnold_data_endo$Ref, row.names(blups$Ref))]  # Expand the study-level BLUPs out to match each effect size's study ID
study_vi <- (blups$Ref[[2]][match(arnold_data_endo$Ref, row.names(blups$Ref))])^2  # Squared standard errors of the study-level BLUPs
effect_effect <- blups$obs[[1]]  # Observation-level BLUPs (one per effect size, in data order)
vi_effect <- blups$obs[[2]]^2  # Squared standard errors of the observation-level BLUPs
# Now we have everything we need to calculate residuals
resid_cond <- arnold_data_endo$Zr - (predict(metareg)[[1]] + study_ef + effect_effect)
vi_resid_cond <- vi_residm + study_vi + vi_effect
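As a quick sanity check, the marginal residuals computed manually above should match metafor's built-in residuals (assuming residuals() on an rma.mv object returns the observed effect sizes minus the fixed-effect fitted values):
# Sanity check: manual marginal residuals should match metafor's built-in ones
all.equal(as.numeric(resid_marg), as.numeric(residuals(metareg)))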
# Funnel plots of the marginal and conditional meta-analytic residuals,
# plotted against precision (inverse sampling standard error)
par(mfrow = c(1, 2))
metafor::funnel(x = resid_marg, vi = vi_residm, yaxis = "seinv", digits = 2, level = c(0.1,
    0.05, 0.01), shade = c("white", "gray55", "gray75"), las = 1, xlab = "Meta-analytic Residuals (marginal)",
    legend = FALSE)
metafor::funnel(x = resid_cond, vi = vi_resid_cond, yaxis = "seinv", digits = 2,
    level = c(0.1, 0.05, 0.01), shade = c("white", "gray55", "gray75"), las = 1,
    xlab = "Meta-analytic Residuals (conditional)", legend = FALSE)
After looking at the new funnel plots (Fig. 2), which account for varying amounts of heterogeneity, would you say our conclusions about whether publication bias is present change at all? If not, why not?
Maybe. Even after accounting for factors that explain (or estimate) heterogeneity in the data, we still see areas of the funnel plot that appear to be missing effect size estimates; notably, for the marginal residuals we still see missing effects on the left side. However, when looking at the conditional residuals (which take both the fixed and random effects into account), there is still a gap where precision is low, yet some effect size estimates with very low precision are still being published, which is not really what we would expect under publication bias.
While funnel plots can be useful, they do have clear limitations. Nakagawa et al. (2021) overview the many assumptions being made, particularly when using residuals, and how non-independence of sampling errors can cause problems. In the next section, we’ll overview some more sophisticated, less subjective ways to assess and correct for publication bias.
R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
attached base packages: stats, graphics, grDevices, utils, datasets, methods and base
other attached packages: magick(v.2.7.2), vembedr(v.0.1.5), equatags(v.0.1.0), mathjaxr(v.1.6-0), pander(v.0.6.4), orchaRd(v.2.0), forcats(v.0.5.1), stringr(v.1.4.0), dplyr(v.1.0.9), purrr(v.0.3.4), readr(v.2.1.2), tidyr(v.1.2.0), tibble(v.3.1.7), ggplot2(v.3.3.6), tidyverse(v.1.3.1), flextable(v.0.6.6), metafor(v.3.4-0), metadat(v.1.3-0) and Matrix(v.1.3-4)
loaded via a namespace (and not attached): nlme(v.3.1-152), fs(v.1.5.2), lubridate(v.1.8.0), httr(v.1.4.3), tools(v.4.0.3), backports(v.1.4.1), bslib(v.0.3.1), utf8(v.1.2.2), R6(v.2.5.1), DBI(v.1.1.3), colorspace(v.2.0-3), withr(v.2.5.0), tidyselect(v.1.1.2), curl(v.4.3.2), compiler(v.4.0.3), cli(v.3.3.0), rvest(v.1.0.2), formatR(v.1.11), pacman(v.0.5.1), xml2(v.1.3.3), officer(v.0.3.18), bookdown(v.0.22), sass(v.0.4.1), scales(v.1.2.0), rappdirs(v.0.3.3), systemfonts(v.1.0.2), digest(v.0.6.29), rmarkdown(v.2.14), base64enc(v.0.1-3), pkgconfig(v.2.0.3), htmltools(v.0.5.2), highr(v.0.9), dbplyr(v.2.2.1), fastmap(v.1.1.0), rlang(v.1.0.3), readxl(v.1.4.0), rstudioapi(v.0.13), jquerylib(v.0.1.4), generics(v.0.1.2), jsonlite(v.1.8.0), zip(v.2.2.0), magrittr(v.2.0.3), Rcpp(v.1.0.8.3), munsell(v.0.5.0), fansi(v.1.0.3), gdtools(v.0.2.3), lifecycle(v.1.0.1), stringi(v.1.7.6), yaml(v.2.3.5), grid(v.4.0.3), crayon(v.1.5.1), lattice(v.0.20-44), haven(v.2.5.0), hms(v.1.1.1), knitr(v.1.39), klippy(v.0.0.0.9500), pillar(v.1.7.0), uuid(v.1.1-0), reprex(v.2.0.1), xslt(v.1.4.3), glue(v.1.6.2), evaluate(v.0.15), data.table(v.1.14.2), remotes(v.2.4.2), modelr(v.0.1.8), vctrs(v.0.4.1), tzdb(v.0.3.0), cellranger(v.1.1.0), gtable(v.0.3.0), assertthat(v.0.2.1), locatexec(v.0.1.1), xfun(v.0.31), broom(v.1.0.0) and ellipsis(v.0.3.2)