Introduction to BayesGmed

Motivation

This vignette introduces the R package BayesGmed, a package designed for a Bayesian causal mediation analysis in Stan, a probabilistic programming language for Bayesian inference. BayesGmed uses a parametric approach for the specification of the outcome and mediator model, and uses the g-formula approach for estimation. In addition to the estimation of causal mediation effects, the package also allows researchers to conduct sensitivity analysis.

Getting started

You can install from Github via devtools

devtools::install_gitlab(belayb/BayesGmed)

Example

BayesGmed comes with an example data, example_data, which contains a binary outcome \(Y\), a single binary mediator \(M\), a binary exposure \(A\), and two numeric confounding variables \(Z = (Z_1, Z_2)\). The first 6 rows of the data is visualized below.

library(BayesGmed)
head(example_data)
#>   Z1 Z2 A M Y
#> 1  0  1 1 1 0
#> 2  1  0 0 1 0
#> 3  0  0 0 0 0
#> 4  1  1 1 0 0
#> 5  0  1 1 0 0
#> 6  1  0 1 1 0

The aim in this example data is to estimate the direct and indirect effect of \(A\) on \(Y\) adjusting for \(Z\). To do this, we may proced as follow.

\[logit( P(Y_{i} = 1|A_{i},M_{i},\mathbf{Z}_{i}) ) = \alpha_{0} + \mathbf{\alpha}_{Z}^{'}\mathbf{Z}_{i} + \alpha_{A}A_{i} + \alpha_{M}M_{i},\]

\[logit(P(M_{i} = 1| A_{i},\mathbf{Z}_{i} ) ) = \beta_{0} + \mathbf{\beta}_{Z}^{'}\mathbf{Z}_{i} + \beta_{A}A_{i}.\]

To complete the model specification, we assume the following priors for the model parameters.

\[ \begin{aligned} \beta &\sim MVN(location_m, scale_m)\\ \alpha &\sim MVN(location_y, scale_y) \end{aligned} \]

We then need to specify the parameters of the prior distrbutions or assume a hyper-prior. BayesGmed currently only allow fixed values and these values can be passed as part of the model call and if not the following defult values will be used

P <- 3 # number of covariates plus the intercept term 
priors <- list(scale_m = 2.5*diag(P+1), 
               scale_y = 2.5*diag(P+2),
               location_m = rep(0, P+1), 
               location_y = rep(0, P+2))

The model can then be fitted as follow

fit1 <- bayesgmed(outcome = "Y", mediator =  "M", treat = "A", covariates = c("Z1", "Z2"), dist.y = "binary",
dist.m = "binary", link.y = "logit", link.m = "logit", data = example_data)
#> 
#> SAMPLING FOR MODEL 'BY_BM_single' NOW (CHAIN 1).
#> Chain 1: 
#> Chain 1: Gradient evaluation took 0.00014 seconds
#> Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1.4 seconds.
#> Chain 1: Adjust your expectations accordingly!
#> Chain 1: 
#> Chain 1: 
#> Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
#> Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
#> Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
#> Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
#> Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
#> Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
#> Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
#> Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
#> Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
#> Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
#> Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
#> Chain 1: 
#> Chain 1:  Elapsed Time: 0.968 seconds (Warm-up)
#> Chain 1:                0.953 seconds (Sampling)
#> Chain 1:                1.921 seconds (Total)
#> Chain 1: 
#> 
#> SAMPLING FOR MODEL 'BY_BM_single' NOW (CHAIN 2).
#> Chain 2: 
#> Chain 2: Gradient evaluation took 7.2e-05 seconds
#> Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.72 seconds.
#> Chain 2: Adjust your expectations accordingly!
#> Chain 2: 
#> Chain 2: 
#> Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
#> Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
#> Chain 2: Iteration:  600 / 2000 [ 30%]  (Warmup)
#> Chain 2: Iteration:  800 / 2000 [ 40%]  (Warmup)
#> Chain 2: Iteration: 1000 / 2000 [ 50%]  (Warmup)
#> Chain 2: Iteration: 1001 / 2000 [ 50%]  (Sampling)
#> Chain 2: Iteration: 1200 / 2000 [ 60%]  (Sampling)
#> Chain 2: Iteration: 1400 / 2000 [ 70%]  (Sampling)
#> Chain 2: Iteration: 1600 / 2000 [ 80%]  (Sampling)
#> Chain 2: Iteration: 1800 / 2000 [ 90%]  (Sampling)
#> Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
#> Chain 2: 
#> Chain 2:  Elapsed Time: 1.062 seconds (Warm-up)
#> Chain 2:                1.094 seconds (Sampling)
#> Chain 2:                2.156 seconds (Total)
#> Chain 2: 
#> 
#> SAMPLING FOR MODEL 'BY_BM_single' NOW (CHAIN 3).
#> Chain 3: 
#> Chain 3: Gradient evaluation took 5.9e-05 seconds
#> Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.59 seconds.
#> Chain 3: Adjust your expectations accordingly!
#> Chain 3: 
#> Chain 3: 
#> Chain 3: Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 3: Iteration:  200 / 2000 [ 10%]  (Warmup)
#> Chain 3: Iteration:  400 / 2000 [ 20%]  (Warmup)
#> Chain 3: Iteration:  600 / 2000 [ 30%]  (Warmup)
#> Chain 3: Iteration:  800 / 2000 [ 40%]  (Warmup)
#> Chain 3: Iteration: 1000 / 2000 [ 50%]  (Warmup)
#> Chain 3: Iteration: 1001 / 2000 [ 50%]  (Sampling)
#> Chain 3: Iteration: 1200 / 2000 [ 60%]  (Sampling)
#> Chain 3: Iteration: 1400 / 2000 [ 70%]  (Sampling)
#> Chain 3: Iteration: 1600 / 2000 [ 80%]  (Sampling)
#> Chain 3: Iteration: 1800 / 2000 [ 90%]  (Sampling)
#> Chain 3: Iteration: 2000 / 2000 [100%]  (Sampling)
#> Chain 3: 
#> Chain 3:  Elapsed Time: 1.252 seconds (Warm-up)
#> Chain 3:                1.577 seconds (Sampling)
#> Chain 3:                2.829 seconds (Total)
#> Chain 3: 
#> 
#> SAMPLING FOR MODEL 'BY_BM_single' NOW (CHAIN 4).
#> Chain 4: 
#> Chain 4: Gradient evaluation took 5.7e-05 seconds
#> Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.57 seconds.
#> Chain 4: Adjust your expectations accordingly!
#> Chain 4: 
#> Chain 4: 
#> Chain 4: Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 4: Iteration:  200 / 2000 [ 10%]  (Warmup)
#> Chain 4: Iteration:  400 / 2000 [ 20%]  (Warmup)
#> Chain 4: Iteration:  600 / 2000 [ 30%]  (Warmup)
#> Chain 4: Iteration:  800 / 2000 [ 40%]  (Warmup)
#> Chain 4: Iteration: 1000 / 2000 [ 50%]  (Warmup)
#> Chain 4: Iteration: 1001 / 2000 [ 50%]  (Sampling)
#> Chain 4: Iteration: 1200 / 2000 [ 60%]  (Sampling)
#> Chain 4: Iteration: 1400 / 2000 [ 70%]  (Sampling)
#> Chain 4: Iteration: 1600 / 2000 [ 80%]  (Sampling)
#> Chain 4: Iteration: 1800 / 2000 [ 90%]  (Sampling)
#> Chain 4: Iteration: 2000 / 2000 [100%]  (Sampling)
#> Chain 4: 
#> Chain 4:  Elapsed Time: 1.154 seconds (Warm-up)
#> Chain 4:                1.349 seconds (Sampling)
#> Chain 4:                2.503 seconds (Total)
#> Chain 4:

bayesgmed_summary(fit1)
#>     Parameter  Mean    SE Median   2.5% 97.5% n_eff Rhat
#> 1 NDE_control 0.263 0.069  0.263  0.127 0.397  5292    1
#> 2 NDE_treated 0.286 0.069  0.287  0.150 0.420  5012    1
#> 3 NIE_control 0.042 0.036  0.040 -0.027 0.117  3925    1
#> 4 NIE_treated 0.065 0.048  0.063 -0.027 0.163  4434    1
#> 5        ANDE 0.274 0.064  0.273  0.148 0.398  5482    1
#> 6        ANIE 0.053 0.034  0.052 -0.010 0.123  4353    1
#> 7          TE 0.327 0.065  0.327  0.200 0.453  4959    1