Noah Greifer updated the package source to reflect two changes to the CRAN checks that resulted in `bcf` being removed from CRAN in April 2023. Noah’s updates:

- Removed `sprintf()` from the C++ source code, as it is now deprecated, and
- Removed `CXX_STD = CXX11` from `src/Makevars` and `src/Makevars.win`, as C++11 is now a CRAN default.

The prediction method introduced in the previous `bcf` version writes tree samples to text files, which can grow large if many samples are retained. Users concerned about the size of these text files can suppress writing them by specifying `no_output = TRUE` in the call to `bcf()`.

Sampling employs within-chain parallelism through `RcppParallel`, but `bcf` does not, for the time being, run multiple chains in parallel through R’s high-level `doParallel` interface.
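Within-chain parallelism is possible because many quantities a tree sampler needs are sums over observations. Such sums can be split into chunks, computed independently and then combined. The Python sketch below illustrates only that split-and-combine pattern, using a leaf's weighted sufficient statistics as the example. It is not `bcf`'s C++/`RcppParallel` code. We do not claim these are the steps `bcf` parallelizes, and the names `chunk_stats` and `parallel_suff_stats` are hypothetical. CPython threads share the global interpreter lock, so this pure-Python version demonstrates the structure rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_stats(resid, w, lo, hi):
    """Hypothetical helper: (sum of w_i, sum of w_i * r_i) over indices [lo, hi)."""
    sw = 0.0
    swr = 0.0
    for i in range(lo, hi):
        sw += w[i]
        swr += w[i] * resid[i]
    return sw, swr


def parallel_suff_stats(resid, w, n_workers=4):
    """Split the index range into chunks, reduce each chunk independently,
    then combine the partial sums (a parallel-reduce pattern)."""
    n = len(resid)
    step = max(1, -(-n // n_workers))  # ceiling division so every index is covered
    bounds = [(lo, min(lo + step, n)) for lo in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(lambda b: chunk_stats(resid, w, b[0], b[1]), bounds))
    # Combining partial sums gives the same result as a single serial pass
    return sum(p[0] for p in parts), sum(p[1] for p in parts)
```

Because addition is associative, the combined result matches a serial pass up to floating-point rounding. Associativity is what makes this kind of reduction safe to distribute across threads.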

This implementation extends existing `bcf` functionality by:

- allowing for heteroskedastic errors
- automating multi-chain implementations
- providing a suite of convergence diagnostic functions via the `coda` package
- accelerating some underlying computations, resulting in shorter runtimes
- providing a function to predict treatment effects based on an existing model using new data

The original version of `bcf` does not allow for weights, which are often used in practical applications to account for heteroskedasticity. The original BCF model was specified as

$$
y_i \sim \mathcal{N}\big(\mu(x_i) + \tau(x_i)\, z_i,\ \sigma^2\big),
$$

which assumes that all outcomes $y_i$ have the same variance $\sigma^2$. In the extended version we relax this assumption to allow for heteroskedasticity in $y_i$, where $w_i > 0$ is the weight for unit $i$:

$$
y_i \sim \mathcal{N}\big(\mu(x_i) + \tau(x_i)\, z_i,\ \sigma^2 / w_i\big).
$$
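The short Python sketch below makes the role of the weights concrete by evaluating the log-likelihood implied by the weighted model. The functions `mu` and `tau` are hypothetical stand-ins for the fitted forests, and `weighted_loglik` is an illustrative helper, not part of `bcf`. Setting every $w_i = 1$ recovers the original homoskedastic model.

```python
import math


def mu(x):
    # Hypothetical prognostic function mu(x), for illustration only
    return 1.0 + 0.5 * x


def tau(x):
    # Hypothetical treatment-effect function tau(x), for illustration only
    return 2.0


def weighted_loglik(y, x, z, sigma2, w):
    """Sum over units of log N(y_i | mu(x_i) + tau(x_i) * z_i, sigma2 / w_i)."""
    total = 0.0
    for yi, xi, zi, wi in zip(y, x, z, w):
        mean_i = mu(xi) + tau(xi) * zi
        var_i = sigma2 / wi  # larger weight -> smaller variance -> more influence
        total += -0.5 * math.log(2.0 * math.pi * var_i) - (yi - mean_i) ** 2 / (2.0 * var_i)
    return total


# Equal weights reproduce the homoskedastic model; unequal weights do not
y, x, z = [3.2, 1.1], [0.0, 0.4], [1, 0]
homoskedastic = weighted_loglik(y, x, z, sigma2=1.0, w=[1.0, 1.0])
heteroskedastic = weighted_loglik(y, x, z, sigma2=1.0, w=[4.0, 0.5])
```

A unit with $w_i = 4$ has residual standard deviation $\sigma/2$. Its squared residual therefore enters the exponent with four times the weight it would have under $w_i = 1$.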

Incorporating weights affects several parts of the code, including the computation of:

- sufficient statistics
- leaf node means
- the variance of the leaf node means
- the error variance (sigma)
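The sketch below shows how weights typically enter a conjugate BART-style update. It assumes the following:

- leaf values follow $\theta \sim \mathcal{N}(0, s^2)$;
- within a leaf, partial residuals follow $r_i \sim \mathcal{N}(\theta, \sigma^2 / w_i)$;
- $\sigma^2$ has an inverse-gamma prior of the form $\nu\lambda / \sigma^2 \sim \chi^2_\nu$.

These are common BART conventions assumed for illustration. `bcf`'s internal C++ code may parameterize them differently, and all function names here are hypothetical. The point is that each sum over observations becomes a weighted sum.

```python
import random


def leaf_suff_stats(resid, w):
    """Weighted sufficient statistics for one leaf: (sum of w_i, sum of w_i * r_i)."""
    sum_w = sum(w)
    sum_wr = sum(wi * ri for wi, ri in zip(w, resid))
    return sum_w, sum_wr


def leaf_posterior(sum_w, sum_wr, sigma2, leaf_prior_var):
    """Posterior mean and variance of a leaf value theta, assuming
    r_i ~ N(theta, sigma2 / w_i) and theta ~ N(0, leaf_prior_var)."""
    precision = sum_w / sigma2 + 1.0 / leaf_prior_var  # weights add precision
    post_var = 1.0 / precision
    post_mean = post_var * sum_wr / sigma2
    return post_mean, post_var


def draw_sigma2(resid, w, nu, lam, rng):
    """Draw sigma^2 from its inverse-gamma full conditional, assuming the prior
    nu * lam / sigma^2 ~ chi^2_nu; squared residuals are weighted by w_i."""
    n = len(resid)
    shape = (nu + n) / 2.0
    rate = (nu * lam + sum(wi * ei ** 2 for wi, ei in zip(w, resid))) / 2.0
    # random.gammavariate(alpha, beta) treats beta as a scale, so pass 1 / rate
    return 1.0 / rng.gammavariate(shape, 1.0 / rate)
```

For example, residuals `[0.4, 1.1, 0.7]` with weights `[1.0, 2.0, 0.5]` give sufficient statistics `(3.5, 2.95)`. With $\sigma^2 = 1$ and $s^2 = 0.25$, the posterior precision is $3.5 + 4 = 7.5$.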

In Bayesian analysis, it is useful to produce several runs of the same model, each with different starting values, as a way of assessing convergence. If the runs produce drastically different posterior distributions, the model has not fully converged. In this version of `bcf` we have automated multi-chain processing and incorporated key MCMC diagnostics from the `coda` package, including effective sample sizes and the Gelman-Rubin statistic ("R-hat").
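The Gelman-Rubin statistic compares between-chain and within-chain variability. The Python sketch below computes the basic (non-split) version for one scalar quantity, such as the posterior draws of an average treatment effect from each chain. `gelman_rubin` is a hypothetical helper. `coda`'s `gelman.diag()` applies additional corrections, so its values may differ slightly.

```python
import statistics


def gelman_rubin(chains):
    """Basic potential scale reduction factor (R-hat) for a scalar parameter.

    chains: list of m >= 2 equal-length lists of post-burn-in draws."""
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    chain_means = [statistics.fmean(c) for c in chains]
    grand_mean = statistics.fmean(chain_means)
    # Between-chain variance B (scaled by n)
    b = n / (m - 1) * sum((cm - grand_mean) ** 2 for cm in chain_means)
    # Within-chain variance W: average of the per-chain sample variances
    w = statistics.fmean([statistics.variance(c) for c in chains])
    # Pooled estimate of the posterior variance
    var_hat = (n - 1) / n * w + b / n
    return (var_hat / w) ** 0.5
```

Values close to 1 indicate that the chains agree. Values well above 1 indicate that at least one chain is exploring a different region, which is the "drastically different posterior distributions" symptom described above.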

Finally, our implementation conducts some steps of the sampling procedure in parallel to improve computational efficiency. In our testing, these enhancements reduced runtimes by around 50% across various experimental conditions.

It is now possible to predict the treatment effect for a new set of units. Once users have produced a satisfactory `bcf` run using training data, they can use the fitted `bcf` object to predict on a new set of test data. This also works for runs that have multiple chains.