Type: | Package |
Title: | Assessing Complex Heterogeneity in Surrogacy |
Version: | 2.0 |
Date: | 2025-04-10 |
Description: | Provides functions to assess complex heterogeneity in the strength of a surrogate marker with respect to multiple baseline covariates, in either a randomized treatment setting or observational setting. For a randomized treatment setting, the functions assess and test for heterogeneity using both a parametric model and a semiparametric two-step model. More details for the randomized setting are available in: Knowlton, R., Tian, L., & Parast, L. (2025). "A General Framework to Assess Complex Heterogeneity in the Strength of a Surrogate Marker," Statistics in Medicine, 44(5), e70001 <doi:10.1002/sim.70001>. For an observational setting, functions in this package assess complex heterogeneity in the strength of a surrogate marker using meta-learners, with options for different base learners. More details for the observational setting will be available in the future in: Knowlton, R., Parast, L. (2025) "Assessing Surrogate Heterogeneity in Real World Data Using Meta-Learners." A tutorial for this package can be found at https://www.laylaparast.com/cohetsurr. |
License: | GPL-2 | GPL-3 [expanded from: GPL] |
Imports: | stats, matrixStats, mvtnorm, mgcv, grf |
NeedsCompilation: | no |
Packaged: | 2025-04-10 20:56:27 UTC; parastlm |
Author: | Rebecca Knowlton [aut], Layla Parast [aut, cre] |
Maintainer: | Layla Parast <parast@austin.utexas.edu> |
Repository: | CRAN |
Date/Publication: | 2025-04-11 02:10:02 UTC |
Performs bootstrap estimation procedures for the variance of the proportion of treatment effect explained, the omnibus test, and identifying a region above a treshold.
Description
Performs bootstrap estimation procedures for the variance of the proportion of treatment effect explained, the omnibus test, and identifying a region above a treshold in a randomized treatment setting.
Usage
boot.var(data.control, data.treat, W.grid.expand, type, test = FALSE,
data.all = NULL, num.cov = NULL, results.for.test = NULL, threshold = NULL)
Arguments
data.control |
dataframe containing data from the control group, specifically the outcome Y, the surrogate S, and the baseline covariates W |
data.treat |
dataframe containing data from the treamtent group, specifically the outcome Y, the surrogate S, and the baseline covariates W |
W.grid.expand |
expanded version of the W grid of baseline covariates, where each row is a specific combination of the covariates for which the estimates should be provided |
type |
options are "model", "two step", or "both"; specifies the estimation method that should be used for the proportion of treatment effect explained |
test |
TRUE or FALSE, if test for heterogeneity is wanted |
data.all |
dataframe containing data from the control and treamtent groups, specifically the outcome Y, the surrogate S, and the baseline covariates W |
num.cov |
number of baseline covariates in the matrix W |
results.for.test |
the grid of estimates for the proportion of treatment effect explained obtained prior to bootstrapping, needed for the omnibus test |
threshold |
threshold to flag regions where the estimated proportion of the treatment effect explained is at least that high |
Value
A list is returned:
return.grid |
grid of variance estimates for the overall treatment effect, the residual treatment effect, and the proportion of treatment effect explained as a function of the baseline covariates, W. If requested by user, includes regions flagged above the threshold. |
pval |
p-value(s) from the F test and the two step omnibus test for heterogeneity, depending on type argument. |
Estimates the proportion of treatment effect explained by the surrogate marker as a function of multiple baseline covariates in a randomized treatment setting.
Description
Assesses complex heterogeneity in the utility of a surrogate marker by estimating the proportion of treatment effect explained by the surrogate marker as a function of multiple baseline covariates in a randomized treatment setting. Optionally, tests for evidence of heterogeneity overall and flags regions where the proportion of treatment effect explained is above a given threshold.
Usage
complex.heterogeneity(y, s, a, W.mat, type = "model", variance = FALSE,
test = FALSE, W.grid = NULL, grid.size = 4, threshold = NULL)
Arguments
y |
y, the outcome |
s |
s, the surrogate marker |
a |
a, the treatment assignment with 1 indicating the treatment group and 0 indicating the control group, assumed to be randomized |
W.mat |
matrix of baseline covariate observations, where the first column is W1, second columns is W2, etc. |
type |
options are "model", "two step", or "both"; specifies the estimation method that should be used for the proportion of treatment effect explained |
variance |
TRUE or FALSE, if variance/standard error estimates are wanted |
test |
TRUE or FALSE, if test for heterogeneity is wanted |
W.grid |
grid for the baseline covariates W where estimation will be provided |
grid.size |
number of measures for each baseline covariate to include in the estimation grid, if one is not provided by the user directly |
threshold |
threshold to flag regions where the estimated proportion of the treatment effect explained is at least that high |
Value
A list is returned:
return.grid |
grid of estimates for the overall treatment effect, the residual treatment effect, and the proportion of treatment effect explained as a function of the baseline covariates, W. Includes variance estimates and regions flagged above the threshold, if specified by the user. |
pval |
p-value(s) from the F test and the two step omnibus test for heterogeneity, depending on type argument. |
Author(s)
Rebecca Knowlton
References
Knowlton, R., Tian, L., & Parast, L. (2025). A General Framework to Assess Complex Heterogeneity in the Strength of a Surrogate Marker. Statistics in Medicine, 44(5), e70001.
Examples
data(exampledata)
names(exampledata)
complex.heterogeneity(y = exampledata$y,
s = exampledata$s,
a = exampledata$a,
W.mat = matrix(cbind(exampledata$w1, exampledata$w2), ncol = 2),
type = "model",
W.grid = matrix(cbind(exampledata$w1.grid, exampledata$w2.grid),ncol=2))
Example data
Description
Example data
Usage
data("exampledata")
Format
A list with 7 elements representing 1000 observations from a treatment group and 1000 observations from a control group, and a grid of baseline covariate values at which to calculate estimates:
y
the outcome
s
the surrogate marker
a
the randomized treatment assignment, where 1 indicates treatment and 0 indicates control
w1
the first baseline covariate of interest
w2
the second baseline covariate of interest
w1.grid
the grid of first baseline covariate values to provide estimates for
w2.grid
the grid of second baseline covariate values to provide estimates for
Examples
data(exampledata)
names(exampledata)
Calculate bootstrapped variance estimates in an observational setting.
Description
Calculates bootstrapped variance estimates of delta, delta.s, and R.s, and optionally calculates p-values for identifying individuals for whom the surrogate is strong.
Usage
obs.boot.var(df.train, df.test, type, numeric_predictors, categorical_predictors,
threshold, use.actual.control.S, gam.smoothers, tree.tuners)
Arguments
df.train |
A dataframe containing training data. |
df.test |
A dataframe containing testing data. |
type |
Options are "linear", "gam", "trees", or "all"; type of base learners to use. |
numeric_predictors |
The column names in the dataframes that represent numeric baseline covariates. |
categorical_predictors |
The column names in the dataframes that represent categorical baseline covariates. |
threshold |
An optional threshold to test individuals for the null hypothesis that PTE is greater than the threshold. |
use.actual.control.S |
TRUE or FALSE, if user prefers to use the actual observed values for the surrogate in the control group instead of predicting values from the base learners. |
gam.smoothers |
A list of smoothing parameters to use for GAM base learners, so they are not retuned with bootstrapping iterations ("m1sp", "m0sp", "m1ssp", "m0ssp", "s0") |
tree.tuners |
A list of tuning parameters to use for tree base learners, so they are not retuned with bootstrapping iterations ("m1sp", "m0sp", "m1ssp", "m0ssp", "s0") |
Value
A dataframe is returned, which is the df.test argument with new columns appended for the estimated variances of delta, delta.s, and R.s, as well as p-values if a threshold is provided.
Estimate the proportion of the treatment effect explained in an observational setting.
Description
Fits base learners using the specified type of model on the training data, and uses those models to calculate delta, delta.s, and R.s on the testing set.
Usage
obs.estimate.PTE(df.train, df.test, type, numeric_predictors, categorical_predictors,
use.actual.control.S, gam.smoothers, tree.tuners, want.smooth, want.tune)
Arguments
df.train |
A dataframe containing training data. |
df.test |
A dataframe containing testing data. |
type |
Options are "linear", "gam", "trees", or "all"; type of base learners to use. |
numeric_predictors |
The column names in the dataframes that represent numeric baseline covariates. |
categorical_predictors |
The column names in the dataframes that represent categorical baseline covariates. |
use.actual.control.S |
TRUE or FALSE, if user prefers to use the actual observed values for the surrogate in the control group instead of predicting values from the base learners. |
gam.smoothers |
A list of smoothing parameters to use for GAM base learners, so they are not retuned with bootstrapping iterations ("m1sp", "m0sp", "m1ssp", "m0ssp", "s0") |
tree.tuners |
A list of tuning parameters to use for tree base learners, so they are not retuned with bootstrapping iterations ("m1sp", "m0sp", "m1ssp", "m0ssp", "s0") |
want.smooth |
TRUE or FALSE, if smoothing parameters for GAM should be saved |
want.tune |
TRUE or FALSE, if tuning parameters for trees should be saved |
Value
A list is returned:
df.test |
df.test argument with new columns appended for the estimates of delta, delta.s, and R.s |
smooth_params |
A list of smoothing parameters used for GAM base learners ("m1sp", "m0sp", "m1ssp", "m0ssp", "s0") |
tuner_params |
A list of tuning parameters used for tree base learners ("m1sp", "m0sp", "m1ssp", "m0ssp", "s0") |
Estimate the proportion of the treatment effect explained by the surrogate marker as a function of multiple baseline covariates in an observational setting.
Description
Assesses surrogate heterogeneity in real world data by estimating the proportion of the treatment effect explained as a function of baseline covariates. Optionally tests individuals for strong surrogacy based on a threshold.
Usage
obs.het.surr(df.train, df.test, type, var.want = FALSE, threshold = NULL,
use.actual.control.S = FALSE)
Arguments
df.train |
dataframe containing training data; must have columns G (treatment assignment), S (surrogate marker), and Y (primary outcome), in addition to the baseline covariates of interest |
df.test |
dataframe containing testing data; must contain the same baseline covariate columns as the training data |
type |
options are "linear", "gam", "trees", or "all"; type of base learners to use |
var.want |
TRUE or FALSE, if variance estimates are wanted |
threshold |
optional threshold to test individuals for the null hypothesis that PTE is greater than the threshold; must have var.want = TRUE to return p-values |
use.actual.control.S |
TRUE or FALSE, if user prefers to use the actual observed values for the surrogate in the control group instead of predicting values from the base learners |
Value
A dataframe is returned, which is the df.test argument with new columns appended for the estimates and corresponding variances of delta, delta.s, and R.s. If a threshold is specified, returns a p-value for the null hypothesis that PTE > threshold.
Author(s)
Rebecca Knowlton
References
Knowlton, R. and Parast, L. (2025) “Assessing Surrogate Heterogeneity in Real World Data Using Meta-Learners." Under Review.
Examples
data(obs_exampledata_train)
data(obs_exampledata_test)
obs.het.surr(df.train = obs_exampledata_train, df.test = obs_exampledata_test,
type = "linear", var.want = FALSE)
Example testing data for observational setting
Description
Example testing data for observational setting
Usage
data("obs_exampledata_test")
Format
A data frame with 200 observations on the following 9 variables.
X1
a numeric baseline covariate of interest
X2
a numeric baseline covariate of interest
X3
a numeric baseline covariate of interest
X4
a numeric baseline covariate of interest
X5
a numeric baseline covariate of interest
X6
a numeric baseline covariate of interest
G
the non-randomized treatment assignment, where 1 indicates treated and 0 indicates control
S
the surrogate marker
Y
the primary outcome
Examples
data(obs_exampledata_test)
names(obs_exampledata_test)
Example training data for observational setting
Description
Example training data for observational setting
Usage
data("obs_exampledata_train")
Format
A data frame with 1800 observations on the following 9 variables.
X1
a numeric baseline covariate of interest
X2
a numeric baseline covariate of interest
X3
a numeric baseline covariate of interest
X4
a numeric baseline covariate of interest
X5
a numeric baseline covariate of interest
X6
a numeric baseline covariate of interest
G
the non-randomized treatment assignment, where 1 indicates treated and 0 indicates control
S
the surrogate marker
Y
the primary outcome
Examples
data(obs_exampledata_train)
names(obs_exampledata_train)
Estimates the proportion of treatment effect explained as a function of multiple baseline covariates, W, using a parametric model.
Description
Estimates the proportion of treatment effect explained as a function of multiple baseline covariates, W, using a parametric model in a randomized treatment setting.
Usage
parametric.est(data.control, data.treat, W.grid.expand)
Arguments
data.control |
dataframe containing data from the control group, specifically the outcome Y, the surrogate S, and the baseline covariates W |
data.treat |
dataframe containing data from the treamtent group, specifically the outcome Y, the surrogate S, and the baseline covariates W |
W.grid.expand |
expanded version of the W grid of baseline covariates, where each row is a specific combination of the covariates for which the estimates should be provided |
Value
A grid of estimates is returned of the proportion of treatment effect explained, the overall treatment effect, and the residual treatment effect for the given baseline covariate combinations.
Estimates the proportion of treatment effect explained as a function of multiple baseline covariates, W, using a two step, semiparametric model.
Description
Estimates the proportion of treatment effect explained as a function of multiple baseline covariates, W, using a two step, semiparametric model in a randomized treatment setting.
Usage
two.step.est(data.control, data.treat, W.grid.expand.function)
Arguments
data.control |
dataframe containing data from the control group, specifically the outcome Y, the surrogate S, and the baseline covariates W |
data.treat |
dataframe containing data from the treamtent group, specifically the outcome Y, the surrogate S, and the baseline covariates W |
W.grid.expand.function |
expanded version of the W grid of baseline covariates, where each row is a specific combination of the covariates for which the estimates should be provided |
Value
A grid of estimates is returned of the proportion of treatment effect explained, the overall treatment effect, and the residual treatment effect for the given baseline covariate combinations.