Version: | 0.3-5 |
Title: | Randomization Inference Tools |
Description: | Tools for randomization-based inference. Current focus is on the d^2 omnibus test of differences of means following Hansen and Bowers (2008) <doi:10.1214/08-STS254> . This test is useful for assessing balance in matched observational studies or for analysis of outcomes in block-randomized experiments. |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
LazyData: | true |
Depends: | R (≥ 3.5.0), ggplot2, survival |
Imports: | grDevices, abind, xtable, svd, stats, graphics, methods, SparseM, tidyr, tibble, dplyr |
Suggests: | knitr, rmarkdown, testthat, roxygen2, MASS |
Enhances: | optmatch |
URL: | https://cran.r-project.org/package=RItools |
RoxygenNote: | 7.3.2 |
Encoding: | UTF-8 |
NeedsCompilation: | no |
Packaged: | 2025-05-17 13:17:15 UTC; jwbowers |
Author: | Jake Bowers [aut, cre], Mark Fredrickson [aut], Ben Hansen [aut], Josh Errickson [ctb] |
Maintainer: | Jake Bowers <jwbowers@illinois.edu> |
Repository: | CRAN |
Date/Publication: | 2025-05-17 13:30:02 UTC |
CovsAlignedToADesign S4 class
Description
A class for representing covariate matrices after alignment within stratum, for a (single) given stratifying factor. There can also be a clustering variable, assumed to be nested within the stratifying variable.
Details
In contrast to DesignOptions, this class represents the combination of a single Covariates table, realized treatment assignment and treatment assignment scheme, not multiple treatment assignment schemes (designs). These Covariates are assumed to reflect regularization, s.t. missings have been patched with a value and then all of the covariate values have been aligned within each stratum. In lieu of a NotMissing slot there will be Covariates columns, also centered within a stratum, recording non-missingness of the original data.
Ordinarily the StrataWeightRatio slot has an entry for each unit, representing ratio of
specified stratum weight to the product of h_b
(the harmonic mean of n_\{tb\}
and
n_{cb}
, the counts of treatment and control clusters in stratum b
) with bar-w_b
,
(the arithmetic mean of aggregated cluster weights within that stratum). It can also
be the numeric vector 1, without names, meaning the intended weight ratio is always 1.
Slots
Covariates
Numeric matrix, as in ModelMatrixPlus, except: will include NM columns; all columns presumed to have been stratum-centered (aligned)
UnitWeights
vector of weights associated w/ rows of Covariates
Z
Logical indicating treatment assignment
StrataMatrix
A sparse matrix with n rows and s columns, with 1 if the unit is in that stratification
StrataWeightRatio
For each unit, ratio of stratum weight to
h_b
; but see Details.Cluster
Factor indicating who's in the same cluster with who
OriginalVariables
Look up table associating Covariates cols to terms in the calling formula, as in ModelMatrixPlus
DesignOptions S4 class
Description
Extends the ModelMatrixPlus class
Details
If the DesignOptions represents clusters of elements, as when it was created by aggregating another DesignOptions or ModelMatrixPlus object, then its Covariates and NotMissing slots are populated with (weighted) averages, not totals. E.g., NotMissing columns consist of weighted averages of element-wise non-missingness indicators over clusters, with weights given by (the element-level precursor to) the UnitWeights vector. As otherwise, columns of the NotMissing matrix represent terms from a model formula, rather than columns the terms may have expanded to.
If present, the null stratification (all units in same stratum) in indicated
by the corresponding column of the StrataFrame
slot bearing the name
‘--
’.
Slots
Z
Logical indicating treatment assignment
StrataFrame
Factors indicating strata
Cluster
Factor indicating who's in the same cluster with who
Create stratum weights to be associated with a DesignOptions
Description
Apply weighting function to a DesignOptions by stratum, returning results in a format suitable to be associated with an existing design and used in further calcs.
Usage
DesignWeights(design, stratum.weights = harmonic_times_mean_weight)
Arguments
design |
DesignOptions |
stratum.weights |
Stratum weights function. Will be fed a count data.frame with Tx.grp (indicating the treatment group), stratum.code, all other covariates and unit.weights. |
Details
The function expects its DesignOptions argument to represent aggregated data,
i.e. clusters not elements within clusters. Its stratum.weights
argument
is a function that is applied to a data frame representing clusters,
with variables Tx.grp
, stratum.code
, covariates as named in the
design
argument (a DesignOptions object), and unit.weights
(either as
culled or inferred from originating balanceTest
call or as aggregated up
from those unit weights). Returns a
weighting factor to be associated with each stratum, this factor determining the stratum
weight by being multiplied by mean of unit weights over clusters in that stratum.
Specifically, the function's value is a data frame of two variables,
sweights
and wtratio
, with rows representing strata.
The sweights
vector represents internally
calculated or user-provided stratum.weights
, one for each
stratum, scaled so that their sum is 1; in Hansen & Bowers (2008), these
weights are denoted w_{b}
. wtratio
is the ratio of
sweights
to the product of half the harmonic
mean of n_{tb}
and n_{cb}
, the number of treatment and control
clusters in stratum b
, with the mean of the weights associated with
each of these clusters. In the notation of Hansen & Bowers
(2008), this is w_{b}/(h_b \bar{m}_b)
. Despite the name
‘wtratio
’, this ratio's denominator is not a weight
in the sense of summing to 1 across strata. The ratio is expected
downstream in HB08
(in internal calculations
involving ‘wtr
’).
Value
data frame w/ rows for strata, cols sweights
and wtratio
.
Adjusted & combined differences as in Hansen & Bowers (2008)
Description
Adjusted & combined differences as in Hansen & Bowers (2008)
Usage
HB08(alignedcovs)
Arguments
alignedcovs |
A CovsAlignedToADesign object |
Value
list with components:
- z
First item
- p
Second item
- Msq
Squared Mahalanobis distance of combined differences from origin
- DF
degrees of freedom
- adj.diff.of.totals
Vector of sum statistics z't - E(Z't), where t represents cluster totals of the product of the covariate with unit weights. Hansen & Bowers (2008) refer to this as the adjusted difference vector, or d(z,x).
- tcov
Matrix of null covariances of Z'x-tilde vector, as above.
References
Hansen, B.B. and Bowers, J. (2008), “Covariate Balance in Simple, Stratified and Clustered Comparative Studies,” Statistical Science 23.
See Also
balanceTest
, alignDesignsByStrata
Hansen & Bowers (2008) inferentials 2016 [81e3ecf] version
Description
Hansen & Bowers (2008) inferentials 2016 [81e3ecf] version
Usage
HB08_2016(alignedcovs)
Arguments
alignedcovs |
A CovsAlignedToADesign object |
Value
list, as in HB08
ModelMatrixPlus S4 class
Description
If the Covariates matrix has an intercept, it will only be in the first column.
Details
More on NotMissing slot: It's matrix of numbers in [0,1].
First col is entirely TRUE
or 1, like an intercept, unless corresponding
UnitWeight is 0, in which case it may also be 0 (see below). Subsequent cols
present only if there are missing covariate values, in which case these cols are
named for terms (of the original calling formula or data frame) that possess
missing values. Terms with the same missing data pattern are mapped to a single
column of this matrix. If the ModelMatrixPlus is representing elements, each column should
be all 1s and 0s, indicating which elements have non-missing values for the term
represented by that column. If the ModelMatrixPlus as a whole represents clusters,
then there can be fractional values, but that situation should only arise in the
DesignOptions class exension of this class, so it's documented there.
Slots
Covariates
The numeric matrix that 'model.matrix' would have returned.
OriginalVariables
look-up table associating Covariates columns with terms of the originating model formula
TermLabels
labels of terms of the originating model formula
contrasts
Contrasts, a list of contrasts or NULL, as returned by 'model.matrix.default'
NotMissing
Matrix of numbers in [0,1] with as many rows as the Covariates table but only one more col than there are distinct covariate missingness patterns (at least 1, nothing missing). First col is entirely T or 1, like an intercept.
NM.Covariates
integer look-up table mapping Covariates columns to columns of NotMissing. (If nothing missing for that column, this is 0.)
NM.terms
integer look-up table mapping term labels to columns of NotMissing (0 means nothing missing in that column)
UnitWeights
vector of weights associated w/ rows of the ModelMatrixPlus
Helper function to slm_fit_csr
Description
This function performs some checks and takes action to ensure positive definiteness of matrices passed to SparseM functions.
Usage
SparseM_solve(x, y, ...)
Arguments
x |
A slm.fit.csr |
y |
A slm.fit.csr |
... |
A slm.fit.csr |
Value
list containing coefficients (vector or matrix), the Cholesky decomposition (of class matrix.csr.chol), and a vector specifying the indices of which values on the diagonal of x'x are nonzero. These are named "coef", "chol" and "gramian_reduction_index", respectively.
Stratum Weighted DesignOptions
Description
Stratum Weighted DesignOptions
Slots
Sweights
stratum weights
Aggregate DesignOptions
Description
Totals up all the covariates, as well as user-provided unit weights. (What it does to NotMissing entries is described in docs for DesignOptions class.)
Usage
aggregateDesigns(design)
Arguments
design |
DesignOptions |
Details
If design@Cluster
has extraneous (non-represented) levels, they will be dropped.
Value
another DesignOptions representing the clusters
Align DesignOptions by Strata
Description
Align DesignOptions by Strata
Usage
alignDesignsByStrata(a_stratification, design, post.align.transform = NULL)
Arguments
a_stratification |
name of a column of 'design@strataFrame'/ element of 'design@Sweights' |
design |
DesignOptions |
post.align.transform |
A post-align transform (cf |
Value
CovsAlignedToADesign
Standardized Differences for Stratified Comparisons
Description
Covariate balance, with treatment/covariate association tests
Usage
balanceTest(
fmla,
data,
strata = NULL,
unit.weights,
stratum.weights = harmonic_times_mean_weight,
subset,
include.NA.flags = TRUE,
covariate.scales = setNames(numeric(0), character(0)),
post.alignment.transform = NULL,
inferentials.calculator = HB08,
p.adjust.method = "holm"
)
Arguments
fmla |
A formula containing an indicator of treatment assignment on the left hand side and covariates at right. |
data |
A data frame in which |
strata |
A list of right-hand-side-only formulas containing
the factor(s) identifying the strata, with |
unit.weights |
Per-unit weight, or 0 if unit does not meet condition specified by subset argument. If there are clusters, the cluster weight is the sum of unit weights of elements within the cluster. Within each stratum, unit weights will be normalized to sum to the number of clusters in the stratum. |
stratum.weights |
Function returning non-negative weight for each stratum; see details. |
subset |
Optional: condition or vector specifying a subset of observations to be permitted to have positive unit weights. |
include.NA.flags |
Present item missingness comparisons as well as covariates themselves? |
covariate.scales |
covariate dispersion estimates to use
as denominators of |
post.alignment.transform |
Optional transformation applied to covariates just after their stratum means are subtracted off. Should accept a vector of weights as its second argument. |
inferentials.calculator |
Function; calculates ‘inferential’ statistics. (Not currently intended for use by end-users.) |
p.adjust.method |
Method of p-value adjustment for the univariate tests. See the |
Details
Given a grouping variable (treatment assignment, exposure status, etc) and variables on which to compare the groups, compare averages across groups and test hypothesis of no selection into groups on the basis of that variable. The multivariate test is the method of combined differences discussed by Hansen and Bowers (2008, Statist. Sci.), a variant of Hotelling's T-squared test; the univariate tests are presented with multiplicity adjustments, the details of which can be controlled by the user. Clustering, weighting and/or stratification variables can be provided, and are addressed by the tests.
The function assembles various univariate descriptive statistics
for the groups to be compared: (weighted) means of treatment and
control groups; differences of these (adjusted differences); and
adjusted differences as multiples of a pooled s.d. of the variable
in the treatment and control groups (standard differences). Pooled
s.d.s are calculated with weights but without attention to clustering,
and ordinarily without attention to stratification. (If the user does
not request unstratified comparisons, overriding the default setting,
then pooled s.d.s are calculated with weights corresponding to the first
stratification for which comparison is requested. In this case as
in the default setting, the same pooled s.d.s are used for standardization
under each stratification considered. This facilitates comparison of
standard differences across stratification schemes.) Means
are contrasted separately for each provided stratifying factor and, by
default, for the unstratified comparison, in each case with weights
reflecting a standardization appropriate to the designated (post-)
stratification of the sample. In the case without stratification
or clustering, the only weighting used to calculate treatment and
control group means is that provided by the user as
unit.weights
; in the absence of such an argument, these
means are unweighted. When there are strata, within-stratum means
of treatment or of control observations are calculated using
unit.weights
, if provided, and then these are combined
across strata according to a ‘effect of treatment on
treated’-type weighting scheme. (The function's
stratum.weights
argument figures in the function's
inferential calculations but not these descriptive calculations.)
To figure a stratum's effect of treatment on treated weight, the
sum of all unit.weights
associated with treatment or
control group observations within the stratum is multiplied by the
fraction of clusters in that stratum that are associated with the
treatment rather than the control condition. (Unless this
fraction is 0 or 1, in which case the stratum is downweighted to
0.)
The function also calculates univariate and multivariate inferential
statistics, targeting the hypothesis that assignment was random within strata. These
calculations also pool unit.weights
-weighted, within-stratum group means across strata,
but the default weighting of strata differs from that of the descriptive calculations.
With stratum.weights=harmonic_times_mean_weight
(the default), each stratum
is weighted in proportion to the product of the stratum mean of unit.weights
and the harmonic mean 1/[(1/a + 1/b)/2]=2*a*b/(a+b)
of the number of
treated units (a) and control units (b) in the stratum; this weighting is optimal
under certain modeling assumptions (discussed in Kalton 1968 and Hansen and
Bowers 2008, Sections 3.2 and 5). The multivariate assessment is based on a Mahalanobis-type
distance that combines each of the univariate mean differences while accounting
for correlations among them. It's similar to the Hotelling's T-squared statistic,
except standardized using a permutation covariance. See Hansen and Bowers (2008).
In contrast to the earlier function xBalance
that it is intended to replace,
balanceTest
accepts only binary assignment variables (for now).
stratum.weights
must be a function of a single argument,
a data frame containing the variables in data
and
additionally Tx.grp
, stratum.code
, and unit.weights
,
returning a named numeric vector of non-negative weights identified by stratum.
(For an example, enter getFromNamespace("harmonic", "RItools")
.)
the data stratum.weights
function.
If the stratifying factor has NAs, these cases are dropped. On the other hand, if NAs in a covariate are found then those observations are dropped for descriptive calculations and "imputed" to the stratum mean of the variable for inferential calculations. When covariate values are dropped due to missingness, proportions of observations not missing on that variable are recorded and returned. The printed output presents non-missing proportions alongside of the variables themselves, distinguishing the former by placing them at the bottom of the list and enclosing the variable's name in parentheses. If a variable shares a missingness pattern with other another variable, its missingness information may be labeled with the name of the other variable in the output.
Value
An object of class c("balancetest", "xbal", "list")
. Several
methods are inherited from the "xbal" class returned by
xBalance
function.
Note
Evidence pertaining to the hypothesis that a treatment variable is not associated with differences in covariate values is assessed by comparing the differences of means, without standardization, to their distributions under hypothetical shuffles of the treatment variable, a permutation or randomization distribution. For the unstratified comparison, this reference distribution consists of differences as the treatment assignments of clusters are freely permuted. For stratified comparisons, the reference distributions describes re-randomizations of this type performed separately in each stratum. Significance assessments are based on large-sample approximations to these reference distributions.
Author(s)
Ben Hansen and Jake Bowers and Mark Fredrickson
References
Hansen, B.B. and Bowers, J. (2008), “Covariate Balance in Simple, Stratified and Clustered Comparative Studies,” Statistical Science 23.
Kalton, G. (1968), “Standardization: A technique to control for extraneous variables,” Applied Statistics 17, 118–136.
See Also
Examples
data(nuclearplants)
## No strata
balanceTest(pr ~ date + t1 + t2 + cap + ne + ct + bw + cum.n,
data=nuclearplants)
## Stratified
## Note use of the `. - cost` to use all columns except `cost`
balanceTest(pr ~ . - cost + strata(pt),
data=nuclearplants)
##Missing data handling.
testdata <- nuclearplants
testdata$date[testdata$date < 68] <- NA
balanceTest(pr ~ . - cost + strata(pt),
data = testdata)
## Variable-by-variable Wilcoxon rank sum tests, with an omnibus test
## of multivariate differences on rank scale.
balanceTest(pr ~ date + t1 + t2 + cap + ne + ct + bw + cum.n,
data = nuclearplants,
post.alignment.transform = function(x, weights) rank(x))
## (Note that the post alignment transform is expected to be a function
## accepting a second argument, even if the argument is not used.
## The unit weights vector will be provided as this second argument,
## enabling use of e.g. `post.alignment.transform=Hmisc::wtd.rank`
## to furnish a version of the Wilcoxon test even when there are clusters and/or weights.)
## An experiment where clusters of individuals are assigned to treatment within strata
## assessing balance of cluster level treatment on both cluster
## and individual level baseline attributes
data(ym_long)
## Look at balance on teriles of cluster size as well as other variables
teriles <- quantile(ym_long$n_practice, seq(1/3,1,by=1/3))
teriles <- c(0, teriles)
balanceTest(trt ~ cut(n_practice, teriles)+assessed+hypo+lipid+
aspirin+strata(assess_strata)+cluster(practice),
data=ym_long)
balanceTest helper function
Description
Makes strata weights
Usage
balanceTest.make.stratwts(stratum.weights, ss.df, zz, data, normalize.weights)
Arguments
stratum.weights |
Weights |
ss.df |
df. |
zz |
treatment |
data |
data |
normalize.weights |
weights |
Value
list
xBalance helper function
Description
Make engine
Usage
balanceTestEngine(
ss,
zz,
mm,
report,
swt,
s.p,
normalize.weights,
zzname,
post.align.trans,
p.adjust.method
)
Arguments
ss |
ss |
zz |
zz |
mm |
mm |
report |
report |
swt |
swt |
s.p |
s.p |
normalize.weights |
normalize.weights |
zzname |
zzname |
post.align.trans |
post.align.trans |
p.adjust.method |
Method to adjust P. |
Value
List
Create a plot of the balance on variables across different stratifications.
Description
This plotting function summarizes variable by stratification matrices. For
each variable (a row in the x
argument), the values are under each
stratification (the columns of x
) plotted on the same line.
Usage
balanceplot(
x,
ordered = FALSE,
segments = TRUE,
colors = "black",
shapes = c(15, 16, 17, 18, 0, 1, 10, 12, 13, 14),
segments.args = list(col = "grey"),
points.args = list(cex = 1),
xlab = "Balance",
xrange = NULL,
groups = NULL,
tiptext = NULL,
include.legend = TRUE,
legend.title = NULL,
plotfun = .balanceplot,
...
)
Arguments
x |
A matrix of variables (rows) by strata (columns). |
ordered |
Should the variables be ordered from most to least imbalance on the first statistic? |
segments |
Should lines be drawn between points for each variable? |
colors |
Either a vector or a matrix of shape indicators
suitable to use as a |
shapes |
Either a vector or a matrix of shape indicators
suitable to use as a |
segments.args |
A list of arguments to pass to the
|
points.args |
A list of arguments to pass to the |
xlab |
The label of the x-axis of the plot. |
xrange |
The range of x-axis. By default, it is 1.25 times the range of |
groups |
A factor that indicates the group of each row in
|
tiptext |
ignored (legacy argument retained for internal reasons)
<!– If you are using the |
include.legend |
Should a legend be included? |
legend.title |
An optional title to attach to the legend. |
plotfun |
Function to do the plotting; defaults to [RItools:::.balanceplot] |
... |
Additional arguments to pass to |
Details
It is conventional to standardize the differences to common scale
(e.g. z-scores), but this is not required. When ordered
is
set to true, plotting will automatically order the data from
largest imbalance to smallest based on the first column of
x
.
You can fine tune the colors and shapes with the like named
arguments. Any other arguments to the points
function
can be passed in a list as points.args
. Likewise, you can
fine tune the segments between points with segments.args
.
Value
Returns NULL, displays plot
See Also
plot.xbal
, xBalance
,
segments
, points
Examples
set.seed(20121204)
# generate some balance data
nvars <- 10
varnames <- paste("V", letters[1:nvars])
balance_data <- matrix(c(rnorm(n = nvars, mean = 1, sd = 0.5),
rnorm(n = nvars, mean = 0, sd = 0.5)),
ncol = 2)
colnames(balance_data) <- c("Before Adjustment", "After Matching")
rownames(balance_data) <- varnames
balanceplot(balance_data,
colors = c("red", "green"),
xlab = "Balance Before/After Matching")
# base R graphics are allowed
abline(v = colMeans(balance_data), lty = 3, col = "grey")
Generate Descriptives
Description
Use a design object to generate descriptive statistics that ignore clustering. Stratum weights are respected if provided (by passing a design arg of class StratumWeightedDesignOptions). If not provided, stratum weights default to "Effect of Treatment on Treated" weighting. That is, when combining within-stratum averages (which will themselves have been weighted by unit weights), each stratum receives a weight equal to the product of the stratum sum of unit weights with the fraction of clusters within the stratum that were assigned to the treatment condition.
Usage
designToDescriptives(design, covariate.scales = NULL)
Arguments
design |
A DesignOptions object |
covariate.scales |
Scale estimates for covariates, a named numeric vector |
Details
By default, covariates are scaled by their pooled s.d.s, square roots
of half of their treatment group variances plus half of their control
group variances. If weights are provided, these are weighted variances.
If descriptives are requested for an unstratified setup, i.e. a
stratification named ‘--
’, then covariate s.d.s
are calculated against it; otherwise the variances reflect stratification,
and are calculated against the first stratification found. Either way,
if descriptives are calculated for multiple stratifications, only one
set of covariate s.d.s will have been calculated, and these underlie
standard difference calculations for each of the stratifications.
If a named numeric covariate.scales
argument is provided, any
covariates named in the vector will have their pooled s.d.s taken from
it, rather than from the internal calculation.
Value
Descriptives
Number of treatment clusters by stratum
Description
Calculate the number of treatment clusters by stratum – these being proportional to "effect of treatment on treated" weights when assignment probabilities are uniform within each stratum.
Usage
effectOfTreatmentOnTreated(data)
Arguments
data |
Data. |
Details
NB: currently, i.e. as of this inline note's commit, this function is used only in testing.
Value
Cluster count
Flattens xBalance output.
Description
Details...
Usage
flatten.xbalresult(
x,
show.signif.stars = getOption("show.signif.stars"),
show.pvals = !show.signif.stars,
...
)
Arguments
x |
x |
show.signif.stars |
Should signif stars be shown? |
show.pvals |
Should p-vals be shown? |
... |
Ignored |
Value
Structure
Returns formula
attribute of an xbal
object.
Description
Returns formula
attribute of an xbal
object.
Usage
## S3 method for class 'xbal'
formula(x, ...)
Arguments
x |
An |
... |
Ignored. |
Value
The formula corresponding to xbal
.
Helper function to slm_fit_csr
Description
This function generates a matrix that can be used to reduce the dimensions of x'x and xy such that positive definiteness is ensured and more practically, that SparseM::chol will work
Usage
gramian_reduction(zeroes)
Arguments
zeroes |
logical vector indicating which entries of the diagonal of x'x are zeroes. |
Value
SparseM matrix that will reduce the dimension of x'x and xy
Harmonic mean
Description
Calculate harmonic mean
Usage
harmonic(data)
Arguments
data |
Data. |
Value
numeric vector of length nlevels(data$stratum.code)
Harmonic mean times mean of weights
Description
Harmonic mean times mean of weights
Usage
harmonic_times_mean_weight(data)
Arguments
data |
Value
numeric vector of length nlevels(data$stratum.code)
Identify vars recording not-missing (NM) info
Description
ID variables recording NM information, from names and positions in the variable list. Presumption is that the NM cols appear at the end of the list of vars and are encased in ‘()’. If something in the code changes to make this assumption untrue, then this helper is designed to err on the side of not identifying other columns as NM cols.
Usage
identify_NM_vars(vnames)
Arguments
vnames |
character, variable names |
Value
character vector of names of NM vars, possibly of length 0
Author(s)
Hansen
Create a DesignOptions object from a formula and some data
Description
The formula must have a left hand side that can be converted to a logical.
Usage
makeDesigns(fmla, data)
Arguments
fmla |
Formula |
data |
Data |
Details
On the RHS: - It may have at most one cluster() argument. - It may have one or more strata() arguments. - All other variables are considered covariates.
NAs in a cluster() or strata() variable will be dropped. NAs in covariates will be passed through, but without being flagged as NotMissing (as available data items will)
Value
DesignOptions
Get p-value for Z-stats
Description
Get p-value for Z-stats
Usage
makePval(zs)
Arguments
zs |
A Z-statistic. |
Value
A P-value
Model matrices along with compact encodings of data availability/missingness
Description
Grow a model matrix while at the same time compactly encoding missingness patterns in RHS variables of a model frame.
Usage
model_matrix(object, data = environment(object), remove.intercept = TRUE, ...)
Arguments
object |
Model formula or terms object (as in 'model.matrix') |
data |
data.frame, as in 'model.matrix()' but has to have ‘ |
remove.intercept |
logical |
... |
passed to 'model.matrix.default' (and further) |
Value
ModelMatrixPlus, i.e. model matrix enriched with missing data info
Author(s)
Ben B Hansen
Impute NA's
Description
Function used to fill NAs with imputation values, while adding NA flags to the data.
Usage
naImpute(FMLA, DATA, impfn = median, na.rm = TRUE, include.NA.flags = TRUE)
Arguments
FMLA |
Formula |
DATA |
Data |
impfn |
Function for imputing. |
na.rm |
What to do with NA's |
include.NA.flags |
Should NA flags be included |
Value
Structure
Nuclear Power Station Construction Data
Description
The data relate to the construction of 32 light water reactor (LWR) plants constructed in the U.S.A in the late 1960's and early 1970's. The data was collected with the aim of predicting the cost of construction of further LWR plants. 6 of the power plants had partial turnkey guarantees and it is possible that, for these plants, some manufacturers' subsidies may be hidden in the quoted capital costs.
Usage
nuclearplants
Format
A data frame with 32 rows and 11 columns
cost: The capital cost of construction in millions of dollars adjusted to 1976 base.
date: The date on which the construction permit was issued. The data are measured in years since January 1 1990 to the nearest month.
t1: The time between application for and issue of the construction permit.
t2: The time between issue of operating license and construction permit.
cap: The net capacity of the power plant (MWe).
pr: A binary variable where
1
indicates the prior existence of a LWR plant at the same site.ne: A binary variable where
1
indicates that the plant was constructed in the north-east region of the U.S.A.ct: A binary variable where
1
indicates the use of a cooling tower in the plant.bw: A binary variable where
1
indicates that the nuclear steam supply system was manufactured by Babcock-Wilcox.cum.n: The cumulative number of power plants constructed by each architect-engineer.
pt: A binary variable where
1
indicates those plants with partial turnkey guarantees.
Source
The data were obtained from the boot
package, for
which they were in turn taken from Cox and Snell (1981). Although
the data themselves are the same as those in the nuclear
data frame in the boot
package, the row names of the data
frame have been changed. (The new row names were selected to
ease certain demonstrations in optmatch
.)
This documentation page is also adapted from the boot
package, written by Angelo Canty and ported to R by Brian Ripley.
References
Cox, D.R. and Snell, E.J. (1981) Applied Statistics: Principles and Examples. Chapman and Hall.
Formatting suitable for stat expressed in units specific to var
Description
formats a var-stat-strata array by var, with rounding potentially rounding a bit less for an "adj.diff" column
Usage
original_units_var_formatter(arr, digits, var_format = list())
Arguments
arr |
numeric array |
digits |
number of digits for rounding |
var_format |
A list of lists. Each named item of the outer list will be matched to a variable. The inner lists should have two items, 'mean' and 'diff'. The first formats statistics based on averages. The 'diff' item should format statistics that are differences. |
Value
array of same dimension as arr but of type character
Author(s)
Hansen
Plot of balance across multiple strata
Description
The plot allows a quick visual comparison of the effect of different
stratification designs on the comparability of different
variables. This is not a replacement for the omnibus statistical test
reported as part of print.xbal
. This plot does allow the
analyst an easy way to identify variables that might be the primary culprits
of overall imbalances and/or a way to assess whether certain important
covariates might be imbalanced even if the omnibus test reports that
the stratification overall produces balance.
Usage
## S3 method for class 'balancetest'
plot(
x,
xlab = "Standardized Differences",
statistic = "std.diff",
absolute = FALSE,
strata.labels = NULL,
variable.labels = NULL,
groups = NULL,
...
)
Arguments
x |
An object returned by |
xlab |
The label for the x-axis of the plot |
statistic |
The statistic to plot. The default choice of standardized difference is a good choice as it will have roughly the same scale for all plotted variables. |
absolute |
Convert the results to the absolute value of the statistic. |
strata.labels |
A named vector of the from |
variable.labels |
A named vector of the from |
groups |
A vector of group names for each variable in
|
... |
additional arguments to pass to |
Details
By default all variables and all strata are plotted. The scope
of the plot can be reduced by using the subset.xbal
function to
make a smaller xbal
object with only the desired variables or
strata.
balanceTest
can produce several different summary statistics for
each variable, any of which can serve as the data for this plot. By default,
the standardized differences between treated and control units makes a good
choice as all variables are on the same scale. Other statistics can be
selected using the statistic
argument.
The result of this function is a ggplot
object. Most
display of the plot can be manipulated using additional commands appended to
the plot option. For example, the entire theme of the plot can be changed to
black and white using plot(b) + theme_bw()
, where b
is the
result of a call to balanceTest
. The points on the plot are
known as "values", so colors or symbols used for each strata can be updated
using the scale_color_manual
function. For example,
plot(b) + scale_color_manaual(values = c('red', 'green', 'blue'))
for
a balance test of three stratification variables.
Value
A ggplot2
object that can be further manipulated (e.g., to set the colors or text).
See Also
Plot of balance across multiple strata
Description
The plot allows a quick visual comparison of the effect of different
stratification designs on the comparability of different
variables. This is not a replacement for the omnibus statistical test
reported as part of print.xbal
. This plot does allow the
analyst an easy way to identify variables that might be the primary culprits
of overall imbalances and/or a way to assess whether certain important
covariates might be imbalanced even if the omnibus test reports that
the stratification overall produces balance.
Usage
## S3 method for class 'xbal'
plot(
x,
xlab = "Standardized Differences",
statistic = "std.diff",
absolute = FALSE,
strata.labels = NULL,
variable.labels = NULL,
groups = NULL,
ggplot = FALSE,
...
)
Arguments
x |
An object returned by |
xlab |
The label for the x-axis of the plot |
statistic |
The statistic to plot. The default choice of standardized difference is a good choice as it will have roughly the same scale for all plotted variables. |
absolute |
Convert the results to the absolute value of the statistic. |
strata.labels |
A named vector of the from |
variable.labels |
A named vector of the from |
groups |
A vector of group names for each variable in
|
ggplot |
Use ggplot2 to create figure. By default, uses base R graphics. |
... |
additional arguments to pass to |
Details
By default all variables and all strata are plotted. The scope
of the plot can be reduced by using the subset.xbal
function to
make a smaller xbal
object with only the desired variables or
strata.
xBalance
can produce several different summary statistics for
each variable, any of which can serve as the data for this plot. By default,
the standardized differences between treated and control units makes a good
choice as all variables are on the same scale. Other statistics can be
selected using the statistic
argument.
Value
Returns NULL, displays plot
See Also
xBalance
, subset.xbal
, balanceplot
Examples
data(nuclearplants)
xb <- xBalance(pr ~ date + t1 + t2 + cap + ne + ct + bw + cum.n,
strata = list(none = NULL, pt = ~pt),
data = nuclearplants)
# Using the default grouping:
plot(xb, variable.labels = c(date = "Date",
t1 = "Time 1",
t2 = "Time 2",
cap = "Capacity",
ne = "In North East",
ct = "Cooling Tower",
bw = "Babcock-Wilcox",
cum.n = "Total Plants Built"),
strata.labels = c("--" = "Raw Data", "pt" = "Partial Turn-key"),
absolute = TRUE)
# Using user supplied grouping
plot(xb, variable.labels = c(date = "Date",
t1 = "Time 1",
t2 = "Time 2",
cap = "Capacity",
ne = "In North East",
ct = "Cooling Tower",
bw = "Babcock-Wilcox",
cum.n = "Total Plants Built"),
strata.labels = c("--" = "Raw Data", "pt" = "Partial Turn-key"),
absolute = TRUE,
groups = c("Group A", "Group A", "Group A", "Group B",
"Group B", "Group B", "Group A", "Group B"))
Printing xBalance and balanceTest Objects
Description
A print
method for balance test objects produced by xBalance
and balanceTest
.
Usage
## S3 method for class 'xbal'
print(
x,
which.strata = dimnames(x$results)[["strata"]],
which.stats = dimnames(x$results)[["stat"]],
which.vars = dimnames(x$results)[["vars"]],
print.overall = TRUE,
digits = NULL,
printme = TRUE,
show.signif.stars = getOption("show.signif.stars"),
show.pvals = !show.signif.stars,
horizontal = TRUE,
report = NULL,
...
)
Arguments
x |
An object of class "xbal" which is the result of a call
to |
which.strata |
The stratification candidates to include in the printout. Default is all. |
which.stats |
The test statistics to include. Default is all
those requested from the call to |
which.vars |
The variables for which test information should be displayed. Default is all. |
print.overall |
Should the omnibus test be reported? Default
is |
digits |
To how many digits should the results be displayed?
Default is |
printme |
Print the table to the console? Default is
|
show.signif.stars |
Use stars to indicate z-statistics larger
than conventional thresholds. Default is |
show.pvals |
Instead of stars, use p-values to summarize the
information in the z-statistics. Default is |
horizontal |
Display the results for different candidate
stratifications side-by-side (Default, |
report |
What to report. |
... |
Other arguements. Not currently used. |
Value
- vartable
The formatted table of variable-by-variable statistics for each stratification.
- overalltable
If the overall Chi-squared statistic is requested, a formatted version of that table is returned.
See Also
Examples
data(nuclearplants)
xb1 <- balanceTest(pr ~ date + t1 + t2 + cap + ne + ct + bw + cum.n + strata(pt),
data = nuclearplants)
print(xb1)
print(xb1, show.pvals = TRUE)
print(xb1, horizontal = FALSE)
## The following doesn't work yet.
## Not run: print(xb1, which.vars=c("date","t1"),
which.stats=c("adj.means","z.scores","p.values"))
## End(Not run)
## The following example prints the adjusted means
## labeled as "treatmentvar=0" and "treatmentvar=1" using the
## formula provided to xBalance().
# This is erroring with the change to devtools, FIXME
## Not run: print(xb1,
which.vars = c("date", "t1"),
which.stats = c("pr=0", "pr=1", "z", "p"))
## End(Not run)
## Only printing out a specific stratification factor
xb2 <- balanceTest(pr~ date + t1 + t2 + cap + ne + ct + bw + cum.n + strata(pt),
data = nuclearplants)
print(xb2, which.strata = "pt")
Scale DesignOptions
Description
Scale DesignOptions
Usage
## S3 method for class 'DesignOptions'
scale(x, center = TRUE, scale = TRUE)
Arguments
x |
DesignOptions object |
center |
logical, or a function acceptable as |
scale |
logical, whether to scale |
SparseM::slm.fit.csr, made tolerant to faults that recur in RItools
Description
[SparseM's slm.fit.csr()] expects a full-rank x that's not just a column of 1s. This variant somewhat relaxes these expectations.
Usage
slm_fit_csr(x, y, ...)
Arguments
x |
As slm.fit.csr |
y |
As slm.fit.csr |
... |
As slm.fit.csr |
Details
'slm.fit.csr' has a bug for intercept only models (admittedly, these are generally a little silly to be done as a sparse matrix), but in order to avoid duplicate code, if everything is in a single strata, we use the intercept only model.
This function's expectation of x is that either it has full column rank, or the reduced submatrix of x that excludes all-zero columns has full column rank. (When this expectation is not met, it's likely that [SparseM::chol()] will fail, causing this function to error; the error messages won't necessarily suggest this.) The positions of nonzero x-columns (ie columns with nonzero entries) are returns as the value of 'gramian_reduction_index', while 'chol' is the Cholesky decomposition of that submatrix's Gramian.
Value
A list consisting of:
coefficients |
coefficients |
chol |
Cholesky factor of Gramian matrix |
residuals |
residuals |
fitted |
fitted values |
df.residual |
degrees of freedom |
gramian_reduction_index |
Column indices identifying reduction of x matrix of which Gramian is taken; see Details |
Convert Matrix to vector
Description
Convert Matrix to vector
Usage
sparseToVec(s, column = TRUE)
Arguments
s |
Matrix |
column |
Column (TRUE) or row (FALSE)? |
Value
vector
Select variables, strata, and statistics from a xbal
or balancetest
object
Description
If any of the arguments are not specified, all the of relevant items are included.
Usage
## S3 method for class 'xbal'
subset(x, vars = NULL, strata = NULL, stats = NULL, tests = NULL, ...)
Arguments
x |
The |
vars |
The variable names to select. |
strata |
The strata names to select. |
stats |
The names of the variable level statistics to select. |
tests |
The names of the group level tests to select. |
... |
Other arguments (ignored) |
Value
A xbal
object with just the appropriate items
selected.
broom::tidy()
/glance()
methods for balanceTest()
results
Description
Portion out the value of a balanceTest()
call in a manner consistent
with assumptions of the broom package.
Usage
tidy.xbal(
x,
strata = dimnames(x[["results"]])[["strata"]][1],
varnames_crosswalk = c(z = "statistic", p = "p.value"),
format = FALSE,
digits = max(2, getOption("digits") - 4),
...
)
glance.xbal(x, strata = dimnames(x[["results"]])[["strata"]][1], ...)
Arguments
x |
object of class |
strata |
which stratification to return info about? Defaults to last one specified in originating function call (which appears first in the xbal array). |
varnames_crosswalk |
character vector of new names for xbal columns, named by the xbal column |
format |
if true, apply |
digits |
passed to |
... |
Additional arguments passed to |
Details
tidy.xbal()
gives per-variable
statistics whereas glance.xbal()
extracts combined-difference related
calculations. In both cases one has to specify which stratification one wants
statistics about, as xbal objects can store info about several stratifications.
tidy.xbal()
has a parameter varnames_crosswalk
not shared with
glance.xbal()
. It should be a named character vector, the elements
of which give names of columns to be returned and the names of which correspond
to columns of xbal objects' ‘results’ entry. Its ordering dictates the order
of the result. The default value translates between conventional xbal
column names and broom package conventional names.
- vars
variable name
- Control
mean of LHS variable = 0 group
- Treatment
mean of LHS variable = 1 group
- adj.diff
T - C diff w/ direct standardization for strata if applicable
- std.diff
adj.diff/pooled.sd
- pooled.sd
pooled SD
- statistic
z
column from the xbal object- p.value
p
column from the xbal object
Additional parameters beyond those listed here are ignored (at this time).
Value
data frame composed of: for [RItools::tidy()]
, a column of variable labels (vars
) and
additional columns of balance-related stats; for [RItools::glance()]
, scalars describing
a combined differences test, if found, and otherwise NULL
.
Safe way to temporarily override options()
Description
Safe way to temporarily override options()
Usage
withOptions(optionsToChange, fun)
Arguments
optionsToChange |
Which options. |
fun |
Function to run with new options. |
Value
Result of fun
.
Standardized Differences for Stratified Comparisons
Description
Given covariates, a treatment variable, and a stratifying factor, calculates standardized mean differences along each covariate, with and without the stratification and tests for conditional independence of the treatment variable and the covariates within strata.
Usage
xBalance(
fmla,
strata = list(unstrat = NULL),
data,
report = c("std.diffs", "z.scores", "adj.means", "adj.mean.diffs",
"adj.mean.diffs.null.sd", "chisquare.test", "p.values", "all")[1:2],
stratum.weights = harmonic,
na.rm = FALSE,
covariate.scaling = NULL,
normalize.weights = TRUE,
impfn = median,
post.alignment.transform = NULL,
pseudoinversion_tol = .Machine$double.eps
)
Arguments
fmla |
A formula containing an indicator of treatment assignment on the left hand side and covariates at right. |
strata |
A list of right-hand-side-only formulas containing
the factor(s) identifying the strata, with |
data |
A data frame in which |
report |
Character vector listing measures to report for each
stratification; a subset of |
stratum.weights |
Weights to be applied when aggregating
across strata specified by |
na.rm |
Whether to remove rows with NAs on any variables
mentioned on the RHS of |
covariate.scaling |
A scale factor to apply to covariates in
calculating |
normalize.weights |
If |
impfn |
A function to impute missing values when
|
post.alignment.transform |
Optional transformation applied to covariates just after their stratum means are subtracted off. |
pseudoinversion_tol |
The function uses a singular value decomposition to invert a covariance matrix. Singular values less than this tolerance will be treated as zero. |
Details
Note: the newer balanceTest
function provides the same
functionality as xBalance
with additional support for clustered
designs. While there are no plans to deprecate xBalance
, users are
encouraged to use balanceTest
going forward.
In the unstratified case, the standardized difference of covariate
means is the mean in the treatment group minus the mean in the
control group, divided by the S.D. (standard deviation) in the
same variable estimated by pooling treatment and control group
S.D.s on the same variable. In the stratified case, the
denominator of the standardized difference remains the same but
the numerator is a weighted average of within-stratum differences
in means on the covariate. By default, each stratum is weighted
in proportion to the harmonic mean 1/[(1/a +
1/b)/2]=2*a*b/(a+b)
of the number of treated units (a) and
control units (b) in the stratum; this weighting is optimal under
certain modeling assumptions (discussed in Kalton 1968, Hansen and
Bowers 2008). This weighting can be modified using the
stratum.weights
argument; see below.
When the treatment variable, the variable specified by the
left-hand side of fmla
, is not binary, xBalance
calculates the covariates' regressions on the treatment variable,
in the stratified case pooling these regressions across strata
using weights that default to the stratum-wise sum of squared
deviations of the treatment variable from its stratum mean.
(Applied to binary treatment variables, this recipe gives the same
result as the one given above.) In the numerator of the
standardized difference, we get a “pooled S.D.” from separating
units into two groups, one in which the treatment variable is 0 or
less and another in which it is positive. If report
includes "adj.means", covariate means for the former of these
groups are reported, along with the sums of these means and the
covariates' regressions on either the treatment variable, in the
unstratified (“pre”) case, or the treatment variable and the
strata, in the stratified (“post”) case.
stratum.weights
can be either a function or a numeric
vector of weights. If it is a numeric vector, it should be
non-negative and it should have stratum names as its names. (i.e.,
its names should be equal to the levels of the factor specified by
strata
.) If it is a function, it should accept one
argument, a data frame containing the variables in data
and
additionally Tx.grp
and stratum.code
, and return a
vector of non-negative weights with stratum codes as names; for an
example, do getFromNamespace("harmonic", "RItools")
.
If covariate.scaling
is not NULL
, no scaling is
applied. This behavior is likely to change in future versions.
(If you want no scaling, set covariate.scaling=1
, as this
is likely to retain this meaning in the future.)
adj.mean.diffs.null.sd
returns the standard deviation of
the Normal approximated randomization distribution of the
strata-adjusted difference of means under the strict null of no
effect.
Value
An object of class c("xbal", "list")
. There are
plot
, print
, and xtable
methods for class
"xbal"
; the print
method is demonstrated in the
examples.
Note
Evidence pertaining to the hypothesis that a treatment variable is not associated with differences in covariate values is assessed by comparing the differences of means (or regression coefficients), without standardization, to their distributions under hypothetical shuffles of the treatment variable, a permutation or randomization distribution. For the unstratified comparison, this reference distribution consists of differences (more generally, regression coefficients) when the treatment variable is permuted without regard to strata. For the stratified comparison, the reference distribution is determined by randomly permuting the treatment variable within strata, then re-calculating the treatment-control differences (regressions of each covariate on the permuted treatment variable). Significance assessments are based on the large-sample Normal approximation to these reference distributions.
Author(s)
Ben Hansen and Jake Bowers and Mark Fredrickson
References
Hansen, B.B. and Bowers, J. (2008), “Covariate Balance in Simple, Stratified and Clustered Comparative Studies,” Statistical Science 23.
Kalton, G. (1968), “Standardization: A technique to control for extraneous variables,” Applied Statistics 17, 118–136.
See Also
Examples
data(nuclearplants)
##No strata, default output
xBalance(pr~ date + t1 + t2 + cap + ne + ct + bw + cum.n,
data=nuclearplants)
##No strata, all output
xBalance(pr~ date + t1 + t2 + cap + ne + ct + bw + cum.n,
data=nuclearplants,
report=c("all"))
##Stratified, all output
xBalance(pr~.-cost-pt, strata=factor(nuclearplants$pt),
data=nuclearplants,
report=c("adj.means", "adj.mean.diffs",
"adj.mean.diffs.null.sd",
"chisquare.test", "std.diffs",
"z.scores", "p.values"))
##Comparing unstratified to stratified, just adjusted means and
#omnibus test
xBalance(pr~ date + t1 + t2 + cap + ne + ct + bw + cum.n,
strata=list(unstrat=NULL, pt=~pt),
data=nuclearplants,
report=c("adj.means", "chisquare.test"))
##Comparing unstratified to stratified, just adjusted means and
#omnibus test
xBalance(pr~ date + t1 + t2 + cap + ne + ct + bw + cum.n,
strata=data.frame(unstrat=factor('none'),
pt=factor(nuclearplants$pt)),
data=nuclearplants,
report=c("adj.means", "chisquare.test"))
##Missing data handling.
testdata<-nuclearplants
testdata$date[testdata$date<68]<-NA
##na.rm=FALSE by default
xBalance(pr ~ date, data = testdata, report="all")
xBalance(pr ~ date, data = testdata, na.rm = TRUE,report="all")
##To match versions of RItools 0.1-9 and older, impute means
#rather than medians.
##Not run, impfn option is not implemented in the most recent version
## Not run: xBalance(pr ~ date, data = testdata, na.rm = FALSE,
report="all", impfn=mean.default)
## End(Not run)
##Comparing unstratified to stratified, just one-by-one wilcoxon
#rank sum tests and omnibus test of multivariate differences on
#rank scale.
xBalance(pr~ date + t1 + t2 + cap + ne + ct + bw + cum.n,
strata=data.frame(unstrat=factor('none'),
pt=factor(nuclearplants$pt)),
data=nuclearplants,
report=c("adj.means", "chisquare.test"),
post.alignment.transform=rank)
xBalance helper function
Description
Finds good strata
Usage
xBalance.find.goodstrats(ss.df, zz, mm)
Arguments
ss.df |
Degrees of freedom. |
zz |
Treatment |
mm |
mm |
Value
Data.frame
xBalance helper function
Description
Make pooled SD
Usage
xBalance.makepooledsd(zz, mm, pre.n)
Arguments
zz |
Treatment |
mm |
mm |
pre.n |
pre.n |
Value
pooled SD
An xtable
method for xbal
and balancetest
objects
Description
This function uses the xtable
package
framework to display the results of a call to
balanceTest
in LaTeX format. At the moment, it ignores
the omnibus chi-squared test information.
Usage
## S3 method for class 'xbal'
xtable(
x,
caption = NULL,
label = NULL,
align = c("l", rep("r", ncol(xvardf))),
digits = 2,
display = NULL,
auto = FALSE,
col.labels = NULL,
...
)
Arguments
x |
An object resulting from a call to
|
caption |
See |
label |
See |
align |
See |
digits |
See |
display |
See |
auto |
See |
col.labels |
Labels for the columns (the test
statistics). Default are come from the call to
|
... |
Other arguments to |
Details
The resulting LaTeX will present one row for each variable in the
formula originally passed to balanceTest
, using the
variable name used in the original formula. If you wish to have
reader friendly labels instead of the original variables names,
see the code examples below.
To get decimal aligned columns, specify align=c("l",
rep(".", <ncols>))
, where <ncols>
is the number of columns
to be printed, in your call to xtable
. Then use the
dcolumn
package and define ‘'.'’ within LaTeX: add the
lines \usepackage{dcolumn}
and
\newcolumntype{.}{D{.}{.}{2.2}}
to your LaTeX
document's preamble.
Value
This function produces an xtable
object which can
then be printed with the appropriate print
method (see
print.xtable
).
Examples
data(nuclearplants)
require(xtable)
# Test balance on a variety of variables, with the 'pr' factor
# indicating which sites are control and treatment units, with
# stratification by the 'pt' factor to group similar sites
xb1 <- balanceTest(pr ~ date + t1 + t2 + cap + ne + ct + bw + cum.n + strata(pt),
data = nuclearplants)
xb1.xtab <- xtable(xb1) # This table has right aligned columns
# Add user friendly names in the final table
rownames(xb1.xtab) <- c("Date", "Application to Contruction Time",
"License to Construction Time", "Net Capacity", "Northeast Region", "Cooling Tower",
"Babcock-Wilcox Steam", "Cumlative Plants")
print(xb1.xtab,
add.to.row = attr(xb1.xtab, "latex.add.to.row"),
hline.after = c(0, nrow(xb1.xtab)),
sanitize.text.function = function(x){x},
floating = TRUE,
floating.environment = "sidewaystable")
ASSIST Trial Data from Yudkin and Moher 2001
Description
The ASSIST Trial baseline data from Yudkin and Moher 2001 consist of 21 general practices containing 2142 patients used for the design of a randomized trial which assigned to three treatments aiming to compare methods of preventing coronary heart disease. We have expanded the aggregated data from the practice level to the individual level and added a simulated randomized treatment variable.
Usage
ym_long
Format
A data frame with 2142 rows and 9 columns
practice: Identifier for the practice
id: Identifier for patients
n_practice: Number of patients with CHD in the practice
assessed: Whether a patient was coded as "adequately assessed" (the outcome of the study, measured here at baseline).
aspirin: Whether a patient was treated with aspirin at baseline.
hypo: Whether a patient was treated with hypotensives at baseline.
lipid: Whether patient was treated with lipid-lowering drugs at baseline.
assess_strata: Strata of the practice defined by the proportion of people adequately assessed at baseline (three strata, following Yudkin and Moher 2001).
trt: A simulated binary treatment, assigned at random at the practice level within levels of assess_strata.
Source
The data come from Table II on page 345 of Yudkin and Moher 2001, Statistics in Medicine.
References
Yudkin, P. L. and Moher, M. 2001. "Putting theory into practice: a cluster randomized trial with a small number of clusters" Statistics in Medicine, 20:341-349.
ASSIST Trial Data from Yudkin and Moher 2001
Description
The ASSIST Trial baseline data from Yudkin and Moher 2001 consist of 21 general practices containing 2142 patients used for the design of a randomized trial which assigned to three treatments aiming to compare methods of preventing coronary heart disease. This data frame is aggregated to the practice level. We added a simulated randomized treatment variable.
Usage
ym_short
Format
A data frame with 21 rows and 8 columns
practice: Identifier for the practice
n_practice: Number of patients with CHD in the practice
assessed: Whether a patient was coded as "adequately assessed" (the outcome of the study, measured here at baseline).
aspirin: Whether a patient was treated with aspirin at baseline.
hypo: Whether a patient was treated with hypotensives at baseline.
lipid: Whether patient was treated with lipid-lowering drugs at baseline.
assess_strata: Strata of the practice defined by the proportion of people adequately assessed at baseline (three strata, following Yudkin and Moher 2001).
trt: A simulated binary treatment, assigned at random at the practice level within levels of assess_strata.
Source
The data come from Table II on page 345 of Yudkin and Moher 2001, Statistics in Medicine.
References
Yudkin, P. L. and Moher, M. 2001. "Putting theory into practice: a cluster randomized trial with a small number of clusters" Statistics in Medicine, 20:341-349.