Type: | Package |
Title: | Bayesian Probit Choice Modeling |
Version: | 1.1.4 |
Description: | Bayes estimation of probit choice models, both in the cross-sectional and panel setting. The package can analyze binary, multivariate, ordered, and ranked choices, as well as heterogeneity of choice behavior among deciders. The main functionality includes model fitting via Markov chain Monte Carlo m ethods, tools for convergence diagnostic, choice data simulation, in-sample and out-of-sample choice prediction, and model selection using information criteria and Bayes factors. The latent class model extension facilitates preference-based decider classification, where the number of latent classes can be inferred via the Dirichlet process or a weight-based updating heuristic. This allows for flexible modeling of choice behavior without the need to impose structural constraints. For a reference on the method see Oelschlaeger and Bauer (2021) https://trid.trb.org/view/1759753. |
URL: | https://loelschlaeger.de/RprobitB/, https://github.com/loelschlaeger/RprobitB |
BugReports: | https://github.com/loelschlaeger/RprobitB/issues |
License: | GPL-3 |
Encoding: | UTF-8 |
Imports: | checkmate, cli, crayon, doSNOW, foreach, ggplot2, graphics, gridExtra, MASS, mixtools, mvtnorm, oeli (≥ 0.4.1), parallel, plotROC, progress, Rcpp, Rdpack, rlang, stats, utils, viridis |
LinkingTo: | Rcpp, RcppArmadillo |
Suggests: | knitr, mlogit, testthat (≥ 3.0.0) |
Depends: | R (≥ 3.5.0) |
RoxygenNote: | 7.3.1 |
RdMacros: | Rdpack |
VignetteBuilder: | knitr |
LazyData: | true |
LazyDataCompression: | xz |
Config/testthat/edition: | 3 |
NeedsCompilation: | yes |
Packaged: | 2024-02-26 10:52:22 UTC; loelschlaeger@ad.uni-bielefeld.de |
Author: | Lennart Oelschläger
|
Maintainer: | Lennart Oelschläger <oelschlaeger.lennart@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2024-02-26 14:10:06 UTC |
RprobitB: Bayesian Probit Choice Modeling
Description
Bayes estimation of probit choice models, both in the cross-sectional and panel setting. The package can analyze binary, multivariate, ordered, and ranked choices, as well as heterogeneity of choice behavior among deciders. The main functionality includes model fitting via Markov chain Monte Carlo m ethods, tools for convergence diagnostic, choice data simulation, in-sample and out-of-sample choice prediction, and model selection using information criteria and Bayes factors. The latent class model extension facilitates preference-based decider classification, where the number of latent classes can be inferred via the Dirichlet process or a weight-based updating heuristic. This allows for flexible modeling of choice behavior without the need to impose structural constraints. For a reference on the method see Oelschlaeger and Bauer (2021) https://trid.trb.org/view/1759753.
Author(s)
Maintainer: Lennart Oelschläger oelschlaeger.lennart@gmail.com (ORCID)
Authors:
Dietmar Bauer dietmar.bauer@uni-bielefeld.de (ORCID)
Other contributors:
Sebastian Büscher sebastian.buescher@uni-bielefeld.de [contributor]
Manuel Batram manuel.batram@uni-bielefeld.de [contributor]
See Also
Useful links:
Report bugs at https://github.com/loelschlaeger/RprobitB/issues
Compute Gelman-Rubin statistic
Description
This function computes the Gelman-Rubin statistic R_hat
.
Usage
R_hat(samples, parts = 2)
Arguments
samples |
A vector or a matrix of samples from a Markov chain, e.g. Gibbs samples.
If |
parts |
The number of parts to divide each chain into sub-chains. |
Value
A numeric value, the Gelman-Rubin statistic.
References
https://bookdown.org/rdpeng/advstatcomp/monitoring-convergence.html
Examples
no_chains <- 2
length_chains <- 1e3
samples <- matrix(NA_real_, length_chains, no_chains)
samples[1, ] <- 1
Gamma <- matrix(c(0.8, 0.1, 0.2, 0.9), 2, 2)
for (c in 1:no_chains) {
for (t in 2:length_chains) {
samples[t, c] <- sample(1:2, 1, prob = Gamma[samples[t - 1, c], ])
}
}
R_hat(samples)
Create object of class RprobitB_data
Description
This function constructs an object of class RprobitB_data
.
Usage
RprobitB_data(
data,
choice_data,
N,
T,
J,
P_f,
P_r,
alternatives,
ordered,
ranked,
base,
form,
re,
ASC,
effects,
standardize,
simulated,
choice_available,
true_parameter,
res_var_names
)
Arguments
data |
A list with the choice data.
The list has |
choice_data |
A |
N |
The number (greater or equal 1) of decision makers. |
T |
The number (greater or equal 1) of choice occasions or a vector of choice
occasions of length |
J |
The number (greater or equal 2) of choice alternatives. |
P_f |
The number of covariates connected to a fixed coefficient (can be 0). |
P_r |
The number of covariates connected to a random coefficient (can be 0). |
alternatives |
A character vector with the names of the choice alternatives.
If not specified, the choice set is defined by the observed choices.
If |
ordered |
A boolean, |
ranked |
TBA |
base |
A character, the name of the base alternative for covariates that are not
alternative specific (i.e. type 2 covariates and ASCs). Ignored and set to
|
form |
A
Multiple covariates (of one type) are separated by a In the ordered probit model ( |
re |
A character (vector) of covariates of |
ASC |
A boolean, determining whether the model has ASCs. |
effects |
A data frame with the effect names and booleans indicating whether they are connected to random effects. |
standardize |
A character vector of names of covariates that get standardized.
Covariates of type 1 or 3 have to be addressed by
|
simulated |
A boolean, if |
choice_available |
A boolean, if |
true_parameter |
An object of class |
res_var_names |
A names list of reserved variable names in |
Value
An object of class RprobitB_data
with the arguments of this function
as elements.
Create object of class RprobitB_fit
Description
This function creates an object of class RprobitB_fit
.
Usage
RprobitB_fit(
data,
scale,
level,
normalization,
R,
B,
Q,
latent_classes,
prior,
gibbs_samples,
class_sequence,
comp_time
)
## S3 method for class 'RprobitB_fit'
summary(object, FUN = c(mean = mean, sd = stats::sd, `R^` = R_hat), ...)
Arguments
data |
An object of class |
scale |
A character which determines the utility scale. It is of the form
|
normalization |
An object of class |
R |
The number of iterations of the Gibbs sampler. |
B |
The length of the burn-in period, i.e. a non-negative number of samples to be discarded. |
Q |
The thinning factor for the Gibbs samples, i.e. only every |
latent_classes |
Either
|
prior |
A named list of parameters for the prior distributions. See the documentation
of |
gibbs_samples |
An object of class |
class_sequence |
The sequence of class numbers during Gibbs sampling of length |
comp_time |
The time spent for Gibbs sampling. |
... |
Currently not used. |
Value
An object of class RprobitB_fit
.
Create object of class RprobitB_gibbs_samples_statistics
Description
This function creates an object of class
RprobitB_gibbs_samples_statistics
.
Usage
RprobitB_gibbs_samples_statistics(gibbs_samples, FUN = list(mean = mean))
Arguments
gibbs_samples |
An object of class |
FUN |
A (preferably named) list of functions that compute parameter statistics from the Gibbs samples, for example
|
Value
An object of class RprobitB_gibbs_samples_statistics
, which is a list
of statistics from gibbs_samples
obtained by applying the elements of
FUN
.
Create object of class RprobitB_latent_classes
Description
This function creates an object of class RprobitB_latent_classes
which
defines the number of latent classes and their updating scheme.
The RprobitB_latent_classes
object generated
by this function is only of relevance if the model possesses at least one
random coefficient, i.e. if P_r>0
.
Usage
RprobitB_latent_classes(latent_classes = NULL)
## S3 method for class 'RprobitB_latent_classes'
print(x, ...)
Arguments
latent_classes |
Either
|
Details
Why update latent classes?
In order not to have to specify the number of latent classes before estimation.
What options to update latent classes exist?
Currently two updating schemes are implemented, weight-based and via a Dirichlet process, see the vignette on modeling heterogeneity.
What is the default behavior?
One latent class without updates is specified per default. Print an
RprobitB_latent_classes
-object to see a summary of all relevant
(default) parameter settings.
Why is Cmax
required?
The implementation requires an upper bound on the number of latent classes
for saving the Gibbs samples. However, this is not a restriction since the
number of latent classes is bounded by the number of deciders in any case.
A plot method for visualizing the sequence of class numbers after estimation
and can be used to check if Cmax
was reached, see
plot.RprobitB_fit
.
Value
An object of class RprobitB_latent_classes
.
Examples
### default setting
RprobitB:::RprobitB_latent_classes()
### setting for a fixed number of two latent classes
RprobitB:::RprobitB_latent_classes(list(C = 2))
### setting for weight-based on Dirichlet process-based updates
RprobitB:::RprobitB_latent_classes(
list("weight_update" = TRUE, "dp_update" = TRUE)
)
Create object of class RprobitB_normalization
Description
This function creates an object of class RprobitB_normalization
,
which determines the utility scale and level.
Usage
RprobitB_normalization(
level,
scale = "Sigma_1,1 := 1",
form,
re = NULL,
alternatives,
base,
ordered = FALSE
)
## S3 method for class 'RprobitB_normalization'
print(x, ...)
Arguments
level |
The alternative name with respect to which utility differences are computed. Currently, only differences with respect to the last alternative can be computed. |
scale |
A character which determines the utility scale. It is of the form
|
form |
A
Multiple covariates (of one type) are separated by a In the ordered probit model ( |
re |
A character (vector) of covariates of |
alternatives |
A character vector with the names of the choice alternatives.
If not specified, the choice set is defined by the observed choices.
If |
base |
A character, the name of the base alternative for covariates that are not
alternative specific (i.e. type 2 covariates and ASCs). Ignored and set to
|
ordered |
A boolean, |
x |
An object of class |
... |
Currently not used. |
Details
Any choice model has to be normalized with respect to the utility level and scale.
For level normalization,
{RprobitB}
takes utility differences with respect to one alternative. For the ordered model where only one utility is modeled,{RprobitB}
fixes the first utility threshold to 0.For scale normalization,
{RprobitB}
fixes one model parameter. Per default, the first error-term variance is fixed to1
. This is specified viascale = "Sigma_1,1 := 1"
. Alternatively, any error-term variance or any non-random coefficient can be fixed.
Value
An object of class RprobitB_normalization
, which is a list of
-
level
, a list with the elementslevel
(the number of the alternative specified by the inputlevel
) andname
(the name of the alternative, i.e. the inputlevel
), or alternativelyNA
in the ordered probit case, and
scale
, a list with the elementsparameter
(either"s"
for an element ofSigma
or"a"
for an element ofalpha
), the parameterindex
, and the fixedvalue
. Ifparameter = "a"
, also thename
of the fixed effect.
Examples
RprobitB:::RprobitB_normalization(
level = "B",
scale = "price := -1",
form = choice ~ price + time + comfort + change | 1,
re = "time",
alternatives = c("A", "B"),
base = "A"
)
Define probit model parameter
Description
This function creates an object of class RprobitB_parameter
, which
contains the parameters of a probit model.
If sample = TRUE
, missing parameters are sampled. All parameters are
checked against the values of P_f
, P_r
, J
, and N
.
Usage
RprobitB_parameter(
P_f,
P_r,
J,
N,
ordered = FALSE,
alpha = NULL,
C = NULL,
s = NULL,
b = NULL,
Omega = NULL,
Sigma = NULL,
Sigma_full = NULL,
beta = NULL,
z = NULL,
d = NULL,
seed = NULL,
sample = TRUE
)
Arguments
P_f |
The number of covariates connected to a fixed coefficient (can be 0). |
P_r |
The number of covariates connected to a random coefficient (can be 0). |
J |
The number (greater or equal 2) of choice alternatives. |
N |
The number (greater or equal 1) of decision makers. |
ordered |
A boolean, |
alpha |
The fixed coefficient vector of length |
C |
The number (greater or equal 1) of latent classes of decision makers.
Set to |
s |
The vector of class weights of length |
b |
The matrix of class means as columns of dimension |
Omega |
The matrix of class covariance matrices as columns of dimension
|
Sigma |
The differenced error term covariance matrix of dimension
|
Sigma_full |
The error term covariance matrix of dimension |
beta |
The matrix of the decision-maker specific coefficient vectors of dimension
|
z |
The vector of the allocation variables of length |
d |
The numeric vector of the logarithmic increases of the utility thresholds
in the ordered probit case ( |
seed |
Set a seed for the sampling of missing parameters. |
sample |
A boolean, if |
Value
An object of class RprobitB_parameter
, i.e. a named list with the
model parameters alpha
, C
, s
, b
, Omega
,
Sigma
, Sigma_full
, beta
, and z
.
Examples
RprobitB_parameter(P_f = 1, P_r = 2, J = 3, N = 10)
Compute WAIC value
Description
This function computes the WAIC value of an RprobitB_fit
object.
Usage
WAIC(x)
Arguments
x |
An object of class |
Details
WAIC is short for Widely Applicable (or Watanabe-Akaike) Information Criterion. As for AIC and BIC, the smaller the WAIC value the better the model. Its definition is
WAIC = -2 \cdot lppd + 2 \cdot p_{WAIC},
where lppd
stands for log pointwise predictive density and
p_{WAIC}
is a penalty term proportional to the variance in the
posterior distribution that is sometimes called effective number of
parameters.
The lppd
is approximated as follows. Let
p_{is} = \Pr(y_i\mid \theta_s)
be the probability of observation
y_i
given the s
th set \theta_s
of parameter samples from
the posterior. Then
lppd = \sum_i \log S^{-1} \sum_s p_{si}.
The penalty term is computed as the sum over the variances in log-probability for each observation:
p_{WAIC} = \sum_i V_{\theta} \left[ \log p_{si} \right].
Value
A numeric, the WAIC value, with the following attributes:
-
se_waic
, the standard error of the WAIC value, -
lppd
, the log pointwise predictive density, -
p_waic
, the effective number of parameters, -
p_waic_vec
, the vector of summands ofp_waic
, -
p_si
, the output ofcompute_p_si
.
Re-label alternative specific covariates
Description
In {RprobitB}
, alternative specific covariates must be named in the format
"<covariate>_<alternative>"
. This convenience function generates
the format for a given choice_data
set.
Usage
as_cov_names(choice_data, cov, alternatives)
Arguments
choice_data |
A |
cov |
A character vector of the names of alternative specific covariates in
|
alternatives |
A (character or numeric) vector of the alternative names. |
Value
The choice_data
input with updated column names.
Examples
data("Electricity", package = "mlogit")
cov <- c("pf", "cl", "loc", "wk", "tod", "seas")
alternatives <- 1:4
colnames(Electricity)
Electricity <- as_cov_names(Electricity, cov, alternatives)
colnames(Electricity)
Check model formula
Description
This function checks the input form
.
Usage
check_form(form, re = NULL, ordered = FALSE)
Arguments
form |
A
Multiple covariates (of one type) are separated by a In the ordered probit model ( |
re |
A character (vector) of covariates of |
ordered |
A boolean, |
Value
A list that contains the following elements:
The input
form
.The name
choice
of the dependent variable inform
.The input
re
.A list
vars
of three character vectors of covariate names of the three covariate types.A boolean
ASC
, determining whether the model has ASCs.
See Also
overview_effects()
for an overview of the model effects
Check prior parameters
Description
This function checks the compatibility of submitted parameters for the prior distributions and sets missing values to default values.
Usage
check_prior(
P_f,
P_r,
J,
ordered = FALSE,
eta = numeric(P_f),
Psi = diag(P_f),
delta = 1,
xi = numeric(P_r),
D = diag(P_r),
nu = P_r + 2,
Theta = diag(P_r),
kappa = if (ordered) 4 else (J + 1),
E = if (ordered) diag(1) else diag(J - 1),
zeta = numeric(J - 2),
Z = diag(J - 2)
)
Arguments
P_f |
The number of covariates connected to a fixed coefficient (can be 0). |
P_r |
The number of covariates connected to a random coefficient (can be 0). |
J |
The number (greater or equal 2) of choice alternatives. |
ordered |
A boolean, |
eta |
The mean vector of length |
Psi |
The covariance matrix of dimension |
delta |
A numeric for the concentration parameter vector |
xi |
The mean vector of length |
D |
The covariance matrix of dimension |
nu |
The degrees of freedom (a natural number greater than |
Theta |
The scale matrix of dimension |
kappa |
The degrees of freedom (a natural number greater than |
E |
The scale matrix of dimension |
zeta |
The mean vector of length |
Z |
The covariance matrix of dimension |
Details
A priori, we assume that the model parameters follow these distributions:
-
\alpha \sim N(\eta, \Psi)
-
s \sim Dir(\delta)
-
b_c \sim N(\xi, D)
for all classesc
-
\Omega_c \sim IW(\nu,\Theta)
for all classesc
-
\Sigma \sim IW(\kappa,E)
-
d \sim N(\zeta, Z)
where N
denotes the normal, Dir
the Dirichlet, and IW
the Inverted Wishart distribution.
Value
An object of class RprobitB_prior
, which is a list containing all
prior parameters. Parameters that are not relevant for the model
configuration are set to NA
.
Examples
check_prior(P_f = 1, P_r = 2, J = 3, ordered = TRUE)
Compute choice probabilities
Description
This function returns the choice probabilities of an RprobitB_fit
object.
Usage
choice_probabilities(x, data = NULL, par_set = mean)
Arguments
x |
An object of class |
data |
Either |
par_set |
Specifying the parameter set for calculation and either
|
Value
A data frame of choice probabilities with choice situations in rows and
alternatives in columns. The first two columns are the decider identifier
"id"
and the choice situation identifier "idc"
.
Examples
data <- simulate_choices(form = choice ~ covariate, N = 10, T = 10, J = 2)
x <- fit_model(data)
choice_probabilities(x)
Classify deciders preference-based
Description
This function classifies the deciders based on their allocation to the components of the mixing distribution.
Usage
classification(x, add_true = FALSE)
Arguments
x |
An object of class |
add_true |
Set to |
Details
The function can only be used if the model has at least one random effect
(i.e. P_r >= 1
) and at least two latent classes (i.e. C >= 2
).
In that case, let z_1,\dots,z_N
denote the class allocations
of the N
deciders based on their estimated mixed coefficients
\beta = (\beta_1,\dots,\beta_N)
.
Independently for each decider n
, the conditional probability
\Pr(z_n = c \mid s,\beta_n,b,\Omega)
of having \beta_n
allocated to class c
for c=1,\dots,C
depends on the class
allocation vector s
, the class means b=(b_c)_c
and the class
covariance matrices Omega=(Omega_c)_c
and is proportional to
s_c \phi(\beta_n \mid b_c,Omega_c).
This function displays the relative frequencies of which each decider was allocated to the classes during the Gibbs sampling. Only the thinned samples after the burn-in period are considered.
Value
A data frame. The row names are the decider ids. The first C
columns
contain the relative frequencies with which the deciders are allocated to
the C
classes. Next, the column est
contains the estimated
class of the decider based on the highest allocation frequency. If
add_true
, the next column true
contains the true class
memberships.
See Also
update_z()
for the updating function of the class allocation vector.
Extract model effects
Description
This function extracts the estimated model effects.
Usage
## S3 method for class 'RprobitB_fit'
coef(object, ...)
Arguments
object |
An object of class |
... |
Ignored. |
Value
An object of class RprobitB_coef
.
Compute probit choice probabilities
Description
This is a helper function for choice_probabilities
and computes
the probit choice probabilities for a single choice situation with J
alternatives.
Usage
compute_choice_probabilities(X, alternatives, parameter, ordered = FALSE)
Arguments
X |
A matrix of covariates with |
alternatives |
A vector with unique integers from |
parameter |
An object of class |
ordered |
A boolean, |
Value
A probability vector of length length(alternatives)
.
Compute choice probabilities at posterior samples
Description
This function computes the probability for each observed choice at the
(normalized, burned and thinned) samples from the posterior. These
probabilities are required to compute the WAIC
and the
marginal model likelihood mml
.
Usage
compute_p_si(x, ncores = parallel::detectCores() - 1, recompute = FALSE)
Arguments
x |
An object of class |
ncores |
This function is parallelized, set the number of cores here. |
recompute |
Set to |
Value
The object x
, including the object p_si
, which is a matrix of
probabilities, observations in rows and posterior samples in columns.
Extract estimated covariance matrix of mixing distribution
Description
This convenience function returns the estimated covariance matrix of the mixing distribution.
Usage
cov_mix(x, cor = FALSE)
Arguments
x |
An object of class |
cor |
If |
Value
The estimated covariance matrix of the mixing distribution. In case of multiple classes, a list of matrices for each class.
Create labels for Omega
Description
This function creates labels for the model parameter Omega
.
Usage
create_labels_Omega(P_r, C, cov_sym)
Arguments
P_r |
The number of covariates connected to a random coefficient (can be 0). |
cov_sym |
Set to |
Details
The labels are of the form "c.p1,p2"
, where c
is the latent class
number and p1,p2
the indeces of two random coefficients.
Value
A vector of labels for the model parameter Omega
of length
P_r^2 * C
if P_r > 0
and cov_sym = TRUE
or of length P_r*(P_r+1)/2*C
if cov_sym = FALSE
and NULL
otherwise.
Examples
RprobitB:::create_labels_Omega(2, 3, cov_sym = TRUE)
RprobitB:::create_labels_Omega(2, 3, cov_sym = FALSE)
Create labels for Sigma
Description
This function creates labels for the model parameter Sigma
.
Usage
create_labels_Sigma(J, cov_sym, ordered = FALSE)
Arguments
J |
The number (greater or equal 2) of choice alternatives. |
cov_sym |
Set to |
ordered |
A boolean, |
Details
The labels are of the form "j1,j2"
, where j1,j2
are indices
of the two alternatives j1
and j2
.
Value
A vector of labels for the model parameter Sigma
of length
(J-1)^2
if cov_sym = TRUE
or of length J*(J-1)/2
if cov_sym = FALSE
.
If ordered = TRUE
, Sigma
has only one element.
Examples
RprobitB:::create_labels_Sigma(3, cov_sym = TRUE)
RprobitB:::create_labels_Sigma(4, cov_sym = FALSE)
RprobitB:::create_labels_Sigma(4, ordered = TRUE)
Create labels for alpha
Description
This function creates labels for the model parameter alpha
.
Usage
create_labels_alpha(P_f)
Arguments
P_f |
The number of covariates connected to a fixed coefficient (can be 0). |
Value
A vector of labels for the model parameter alpha
of length P_f
if P_f > 0
and NULL
otherwise.
Examples
RprobitB:::create_labels_alpha(P_f = 3)
Create labels for b
Description
This function creates labels for the model parameter b
.
Usage
create_labels_b(P_r, C)
Arguments
P_r |
The number of covariates connected to a random coefficient (can be 0). |
Details
The labels are of the form "c.p"
, where c
is the latent class
number and p
the index of the random coefficient.
Value
A vector of labels for the model parameter b
of length P_r * C
if P_r > 0
and NULL
otherwise.
Examples
RprobitB:::create_labels_b(2, 3)
Create labels for d
Description
This function creates labels for the model parameter d
.
Usage
create_labels_d(J, ordered)
Arguments
J |
The number (greater or equal 2) of choice alternatives. |
ordered |
A boolean, |
Details
Note that J
must be greater or equal 3
in the ordered probit
model.
Value
A vector of labels for the model parameter d
of length J - 2
if
ordered = TRUE
and NULL
otherwise.
Examples
RprobitB:::create_labels_d(5, TRUE)
Create labels for s
Description
This function creates labels for the model parameter s
.
Usage
create_labels_s(P_r, C)
Arguments
P_r |
The number of covariates connected to a random coefficient (can be 0). |
Value
A vector of labels for the model parameter s
of length C
if
P_r > 0
and NULL
otherwise.
Examples
RprobitB:::create_labels_s(1, 3)
Create lagged choice covariates
Description
This function creates lagged choice covariates from the data.frame
choice_data
, which is assumed to be sorted by the choice occasions.
Usage
create_lagged_cov(choice_data, column, k = 1, id = "id")
Arguments
choice_data |
A |
column |
A character, the column name in |
k |
A positive number, the number of lags (in units of observations), see the
details. Can be a vector. The default is |
id |
A character, the name of the column in |
Details
Say that choice_data
contains the column column
. Then, the
function call
create_lagged_cov(choice_data, column, k, id)
returns the input choice_data
which includes a new column named
column.k
. This column contains for each decider (based on id
)
and each choice occasion the covariate faced before k
choice
occasions. If this data point is not available, it is set to
NA
. In particular, the first k
values of column.k
will
be NA
(initial condition problem).
Value
The input choice_data
with the additional columns named
column.k
for each element column
and each number k
containing the lagged covariates.
Transform threshold increments to thresholds
Description
This helper function transforms the threshold increments d
to the
thresholds gamma
.
Usage
d_to_gamma(d)
Arguments
d |
A numeric vector of threshold increments. |
Details
The threshold vector gamma
is computed from the threshold increments
d
as c(-100,0,cumsum(exp(d)),100)
, where the bounds
-100
and 100
exist for numerical reasons and the first
threshold is fixed to 0
for identification.
Value
A numeric vector of the thresholds.
Examples
d_to_gamma(c(0,0,0))
Density of multivariate normal distribution
Description
This function computes the density of a multivariate normal distribution.
Usage
dmvnorm(x, mean, Sigma, log = FALSE)
Arguments
x |
A quantile vector of length |
mean |
The mean vector of length |
Sigma |
The covariance matrix of dimension |
log |
A boolean, if |
Value
The density value.
Examples
x = c(0,0)
mean = c(0,0)
Sigma = diag(2)
dmvnorm(x = x, mean = mean, Sigma = Sigma)
dmvnorm(x = x, mean = mean, Sigma = Sigma, log = TRUE)
Sample from prior distributions
Description
This function returns a sample from each parameter's prior distribution.
Usage
draw_from_prior(prior, C = 1)
Arguments
prior |
An object of class |
C |
The number of latent classes. |
Value
A list of draws for alpha
, s
, b
, Omega
, and
Sigma
(if specified for the model).
Examples
prior <- check_prior(P_f = 1, P_r = 2, J = 3)
RprobitB:::draw_from_prior(prior, C = 2)
Euclidean distance
Description
This function computes the euclidean distance between two vectors.
Usage
euc_dist(a, b)
Arguments
a |
A numeric vector. |
b |
Another numeric vector of the same length as |
Value
The euclidean distance.
Filter Gibbs samples
Description
This is a helper function that filters Gibbs samples.
Usage
filter_gibbs_samples(
x,
P_f,
P_r,
J,
C,
cov_sym,
ordered = FALSE,
keep_par = c("s", "alpha", "b", "Omega", "Sigma", "d"),
drop_par = NULL
)
Arguments
x |
An object of class |
P_f |
The number of covariates connected to a fixed coefficient (can be 0). |
P_r |
The number of covariates connected to a random coefficient (can be 0). |
J |
The number (greater or equal 2) of choice alternatives. |
cov_sym |
Set to |
ordered |
A boolean, |
keep_par |
A vector of parameter names which are kept. |
drop_par |
A vector of parameter names which get dropped. |
Value
An object of class RprobitB_gibbs_samples
filtered by the labels of
parameter_labels
.
Fit probit model to choice data
Description
This function performs Markov chain Monte Carlo simulation for fitting different types of probit models (binary, multivariate, mixed, latent class, ordered, ranked) to discrete choice data.
Usage
fit_model(
data,
scale = "Sigma_1,1 := 1",
R = 1000,
B = R/2,
Q = 1,
print_progress = getOption("RprobitB_progress"),
prior = NULL,
latent_classes = NULL,
seed = NULL,
fixed_parameter = list()
)
Arguments
data |
An object of class |
scale |
A character which determines the utility scale. It is of the form
|
R |
The number of iterations of the Gibbs sampler. |
B |
The length of the burn-in period, i.e. a non-negative number of samples to be discarded. |
Q |
The thinning factor for the Gibbs samples, i.e. only every |
print_progress |
A boolean, determining whether to print the Gibbs sampler progress and the estimated remaining computation time. |
prior |
A named list of parameters for the prior distributions. See the documentation
of |
latent_classes |
Either
|
seed |
Set a seed for the Gibbs sampling. |
fixed_parameter |
Optionally specify a named list with fixed parameter values for |
Details
See the vignette on model fitting for more details.
Value
An object of class RprobitB_fit
.
See Also
-
prepare_data()
andsimulate_choices()
for building anRprobitB_data
object -
update()
for estimating nested models -
transform()
for transforming a fitted model
Examples
data <- simulate_choices(
form = choice ~ var | 0, N = 100, T = 10, J = 3, seed = 1
)
model <- fit_model(data = data, R = 1000, seed = 1)
summary(model)
Extract covariates of choice occasion
Description
This convenience function returns the covariates and the choices of specific choice occasions.
Usage
get_cov(x, id, idc, idc_label)
Arguments
x |
Either an object of class |
id |
A numeric (vector), that specifies the decider(s). |
idc |
A numeric (vector), that specifies the choice occasion(s). |
idc_label |
The name of the column that contains the choice occasion identifier. |
Value
A subset of the choice_data
data frame specified in prepare_data()
.
Markov chain Monte Carlo simulation for the probit model
Description
This function draws from the posterior distribution of the probit model via Markov chain Monte Carlo simulation-
Usage
gibbs_sampling(
sufficient_statistics,
prior,
latent_classes,
fixed_parameter,
init,
R,
B,
print_progress,
ordered,
ranked
)
Arguments
sufficient_statistics |
The output of |
prior |
A named list of parameters for the prior distributions. See the documentation
of |
latent_classes |
Either
|
fixed_parameter |
Optionally specify a named list with fixed parameter values for |
init |
The output of |
R |
The number of iterations of the Gibbs sampler. |
B |
The length of the burn-in period, i.e. a non-negative number of samples to be discarded. |
print_progress |
A boolean, determining whether to print the Gibbs sampler progress and the estimated remaining computation time. |
ordered |
A boolean, |
ranked |
TBA |
Details
This function is not supposed to be called directly, but rather via
fit_model
.
Value
A list of Gibbs samples for
-
Sigma
, -
alpha
(ifP_f>0
), -
s
,z
,b
,Omega
(ifP_r>0
), -
d
(ifordered = TRUE
),
and a vector class_sequence
of length R
, where the r
th
entry is the number of latent classes after iteration r
.
Log-likelihood in the ordered probit model
Description
This function computes the log-likelihood value given the threshold
increments d
.
Usage
ll_ordered(d, y, mu, Tvec)
Arguments
d |
A numeric vector of threshold increments. |
y |
A matrix of the choices. |
mu |
A matrix of the systematic utilities. |
Tvec |
The element |
Value
The log-likelihood value.
Examples
ll_ordered(c(0,0,0), matrix(1), matrix(1), 1)
Handle missing covariates
Description
This function checks for and replaces missing covariate entries in
choice_data
.
Usage
missing_covariates(
choice_data,
impute = "complete_cases",
col_ignore = character()
)
Arguments
choice_data |
A |
impute |
A character that specifies how to handle missing covariate entries in
|
col_ignore |
A character vector of columns that are ignored (none per default). |
Value
The input choice_data
, in which missing covariates are addressed.
Approximate marginal model likelihood
Description
This function approximates the model's marginal likelihood.
Usage
mml(x, S = 0, ncores = parallel::detectCores() - 1, recompute = FALSE)
Arguments
x |
An object of class |
S |
The number of prior samples for the prior arithmetic mean estimate. Per
default, |
ncores |
Computation of the prior arithmetic mean estimate is parallelized, set the number of cores. |
recompute |
Set to |
Details
The model's marginal likelihood p(y\mid M)
for a model M
and data
y
is required for the computation of Bayes factors. In general, the
term has no closed form and must be approximated numerically.
This function uses the posterior Gibbs samples to approximate the model's
marginal likelihood via the posterior harmonic mean estimator.
To check the convergence, call plot(x$mml)
, where x
is the output
of this function. If the estimation does not seem to have
converged, you can improve the approximation by combining the value
with the prior arithmetic mean estimator. The final approximation of the
model's marginal likelihood than is a weighted sum of the posterior harmonic
mean estimate and the prior arithmetic mean estimate,
where the weights are determined by the sample sizes.
Value
The object x
, including the object mml
, which is the model's
approximated marginal likelihood value.
Compare fitted models
Description
This function returns a table with several criteria for model comparison.
Usage
model_selection(
...,
criteria = c("npar", "LL", "AIC", "BIC"),
add_form = FALSE
)
Arguments
... |
One or more objects of class |
criteria |
A vector of one or more of the following characters:
|
add_form |
Set to |
Details
See the vignette on model selection for more details.
Value
A data frame, criteria in columns, models in rows.
Extract number of model parameters
Description
This function extracts the number of model parameters of an
RprobitB_fit
object.
Usage
npar(object, ...)
## S3 method for class 'RprobitB_fit'
npar(object, ...)
Arguments
object |
An object of class |
... |
Optionally more objects of class |
Value
Either a numeric value (if just one object is provided) or a numeric vector.
Print effect overview
Description
This function gives an overview of the effect names, whether the covariate is alternative-specific, whether the coefficient is alternative-specific, and whether it is a random effect.
Usage
overview_effects(
form,
re = NULL,
alternatives,
base = tail(alternatives, 1),
ordered = FALSE
)
Arguments
form |
A
Multiple covariates (of one type) are separated by a In the ordered probit model ( |
re |
A character (vector) of covariates of |
alternatives |
A character vector with the names of the choice alternatives.
If not specified, the choice set is defined by the observed choices.
If |
base |
A character, the name of the base alternative for covariates that are not
alternative specific (i.e. type 2 covariates and ASCs). Ignored and set to
|
ordered |
A boolean, |
Value
A data frame, each row is a effect, columns are the effect name
"effect"
, and booleans whether the covariate is alternative-specific
"as_value"
, whether the coefficient is alternative-specific
"as_coef"
, and whether it is a random effect "random"
.
See Also
check_form()
for checking the model formula specification.
Examples
overview_effects(
form = choice ~ price + time + comfort + change | 1,
re = c("price", "time"),
alternatives = c("A", "B"),
base = "A"
)
Create parameters labels
Description
This function creates model parameter labels.
Usage
parameter_labels(
P_f,
P_r,
J,
C,
cov_sym,
ordered = FALSE,
keep_par = c("s", "alpha", "b", "Omega", "Sigma", "d"),
drop_par = NULL
)
Arguments
P_f |
The number of covariates connected to a fixed coefficient (can be 0). |
P_r |
The number of covariates connected to a random coefficient (can be 0). |
J |
The number (greater or equal 2) of choice alternatives. |
cov_sym |
Set to |
ordered |
A boolean, |
keep_par |
A vector of parameter names which are kept. |
drop_par |
A vector of parameter names which get dropped. |
Value
A list of labels for the selected model parameters.
Visualize choice data
Description
This function is the plot method for an object of class RprobitB_data
.
Usage
## S3 method for class 'RprobitB_data'
plot(x, by_choice = FALSE, alpha = 1, position = "dodge", ...)
Arguments
x |
An object of class |
by_choice |
Set to |
alpha , position |
Passed to |
... |
Ignored. |
Value
No return value. Draws a plot to the current device.
Examples
data <- simulate_choices(
form = choice ~ cost | 0,
N = 100,
T = 10,
J = 2,
alternatives = c("bus", "car"),
true_parameter = list("alpha" = -1)
)
plot(data, by_choice = TRUE)
Visualize fitted probit model
Description
This function is the plot method for an object of class RprobitB_fit
.
Usage
## S3 method for class 'RprobitB_fit'
plot(x, type, ignore = NULL, ...)
Arguments
x |
An object of class |
type |
The type of plot, which can be one of:
See the details section for visualization options. |
ignore |
A character (vector) of covariate or parameter names that do not get visualized. |
... |
Ignored. |
Value
No return value. Draws a plot to the current device.
Autocorrelation plot of Gibbs samples
Description
This function plots the autocorrelation of the Gibbs samples. The plots
include the total Gibbs sample size TSS
and the effective sample size
ESS
, see the details.
Usage
plot_acf(gibbs_samples, par_labels)
Arguments
gibbs_samples |
A matrix of Gibbs samples. |
par_labels |
A character vector with labels for the Gibbs samples, of length equal to the
number of columns of |
Details
The effective sample size is the value
TSS / \sqrt{1 + 2\sum_{k\geq 1} \rho_k}
,
where \rho_k
is the auto correlation between the chain offset by
k
positions. The auto correlations are estimated via
spec.ar
.
Value
No return value. Draws a plot to the current device.
Plot class allocation (for P_r = 2
only)
Description
This function plots the allocation of decision-maker specific coefficient vectors
beta
given the allocation vector z
, the class means b
,
and the class covariance matrices Omega
.
Usage
plot_class_allocation(beta, z, b, Omega, ...)
Arguments
beta |
The matrix of the decision-maker specific coefficient vectors of dimension
|
z |
The vector of the allocation variables of length |
b |
The matrix of class means as columns of dimension |
Omega |
The matrix of class covariance matrices as columns of dimension
|
... |
Optional visualization parameters:
|
Details
Only applicable in the two-dimensional case, i.e. only if P_r = 2
.
Value
No return value. Draws a plot to the current device.
Visualizing the number of classes during Gibbs sampling
Description
This function plots the number of latent Glasses during Gibbs sampling to visualize the class updating.
Usage
plot_class_seq(class_sequence, B)
Arguments
class_sequence |
The sequence of class numbers during Gibbs sampling of length |
B |
The length of the burn-in period, i.e. a non-negative number of samples to be discarded. |
Value
No return value. Draws a plot to the current device.
Plot bivariate contour of mixing distributions
Description
This function plots an estimated bivariate contour mixing distributions.
Usage
plot_mixture_contour(means, covs, weights, names)
Arguments
means |
The class means. |
covs |
The class covariances. |
weights |
The class weights. |
names |
The covariate names. |
Value
An object of class ggplot
.
Plot marginal mixing distributions
Description
This function plots an estimated marginal mixing distributions.
Usage
plot_mixture_marginal(mean, cov, weights, name)
Arguments
mean |
The class means. |
cov |
The class covariances. |
weights |
The class weights. |
name |
The covariate name. |
Value
An object of class ggplot
.
Plot ROC curve
Description
This function draws receiver operating characteristic (ROC) curves.
Usage
plot_roc(..., reference = NULL)
Arguments
... |
One or more |
reference |
The reference alternative. |
Value
No return value. Draws a plot to the current device.
Visualizing the trace of Gibbs samples.
Description
This function plots traces of the Gibbs samples.
Usage
plot_trace(gibbs_samples, par_labels)
Arguments
gibbs_samples |
A matrix of Gibbs samples. |
par_labels |
A character vector of length equal to the number of columns of
|
Value
No return value. Draws a plot to the current device.
Compute point estimates
Description
This function computes the point estimates of an RprobitB_fit
.
Per default, the mean
of the Gibbs samples is used as a point estimate.
However, any statistic that computes a single numeric value out of a vector of
Gibbs samples can be specified for FUN
.
Usage
point_estimates(x, FUN = mean)
Arguments
x |
An object of class |
FUN |
A function that computes a single numeric value out of a vector of numeric values. |
Value
An object of class RprobitB_parameter
.
Examples
data <- simulate_choices(form = choice ~ covariate, N = 10, T = 10, J = 2)
model <- fit_model(data)
point_estimates(model)
point_estimates(model, FUN = median)
Parameter sets from posterior samples
Description
This function builds parameter sets from the normalized, burned and thinned posterior samples.
Usage
posterior_pars(x)
Arguments
x |
An object of class |
Value
A list of RprobitB_parameter
objects.
Compute prediction accuracy
Description
This function computes the prediction accuracy of an RprobitB_fit
object. Prediction accuracy means the share of choices that are correctly
predicted by the model, where prediction is based on the maximum choice
probability.
Usage
pred_acc(x, ...)
Arguments
x |
An object of class |
... |
Optionally specify more |
Value
A numeric.
Predict choices
Description
This function predicts the discrete choice behavior
Usage
## S3 method for class 'RprobitB_fit'
predict(object, data = NULL, overview = TRUE, digits = 2, ...)
Arguments
object |
An object of class |
data |
Either
|
overview |
If |
digits |
The number of digits of the returned choice probabilities. |
... |
Ignored. |
Details
Predictions are made based on the maximum predicted probability for each choice alternative. See the vignette on choice prediction for a demonstration on how to visualize the model's sensitivity and specificity by means of a receiver operating characteristic (ROC) curve.
Value
Either a table if overview = TRUE
or a data frame otherwise.
Examples
data <- simulate_choices(
form = choice ~ cov, N = 10, T = 10, J = 2, seed = 1
)
data <- train_test(data, test_proportion = 0.5)
model <- fit_model(data$train)
predict(model)
predict(model, overview = FALSE)
predict(model, data = data$test)
predict(
model,
data = data.frame("cov_A" = c(1, 1, NA, NA), "cov_B" = c(1, NA, 1, NA)),
overview = FALSE
)
Check for flip in preferences after change in model scale.
Description
This function checks if a change in the model scale flipped the preferences.
Usage
preference_flip(model_old, model_new)
Arguments
model_old |
An object of class |
model_new |
An object of class |
Value
No return value, called for side-effects.
Prepare choice data for estimation
Description
This function prepares choice data for estimation.
Usage
prepare_data(
form,
choice_data,
re = NULL,
alternatives = NULL,
ordered = FALSE,
ranked = FALSE,
base = NULL,
id = "id",
idc = NULL,
standardize = NULL,
impute = "complete_cases"
)
Arguments
form |
A
Multiple covariates (of one type) are separated by a In the ordered probit model ( |
choice_data |
A |
re |
A character (vector) of covariates of |
alternatives |
A character vector with the names of the choice alternatives.
If not specified, the choice set is defined by the observed choices.
If |
ordered |
A boolean, |
ranked |
TBA |
base |
A character, the name of the base alternative for covariates that are not
alternative specific (i.e. type 2 covariates and ASCs). Ignored and set to
|
id |
A character, the name of the column in |
idc |
A character, the name of the column in |
standardize |
A character vector of names of covariates that get standardized.
Covariates of type 1 or 3 have to be addressed by
|
impute |
A character that specifies how to handle missing covariate entries in
|
Details
Requirements for the data.frame
choice_data
:
It must contain a column named
id
which contains unique identifier for each decision maker.It can contain a column named
idc
which contains unique identifier for each choice situation of each decision maker. If this information is missing, these identifier are generated automatically by the appearance of the choices in the data set.It can contain a column named
choice
with the observed choices, wherechoice
must match the name of the dependent variable inform
. Such a column is required for model fitting but not for prediction.It must contain a numeric column named p_j for each alternative specific covariate p in
form
and each choice alternative j inalternatives
.It must contain a numeric column named q for each covariate q in
form
that is constant across alternatives.
In the ordered case (ordered = TRUE
), the column choice
must
contain the full ranking of the alternatives in each choice occasion as a
character, where the alternatives are separated by commas, see the examples.
See the vignette on choice data for more details.
Value
An object of class RprobitB_data
.
See Also
-
check_form()
for checking the model formula -
overview_effects()
for an overview of the model effects -
create_lagged_cov()
for creating lagged covariates -
as_cov_names()
for re-labeling alternative-specific covariates -
simulate_choices()
for simulating choice data -
train_test()
for splitting choice data into a train and test subset
Examples
data <- prepare_data(
form = choice ~ price + time + comfort + change | 0,
choice_data = train_choice,
re = c("price", "time"),
id = "deciderID",
idc = "occasionID",
standardize = c("price", "time")
)
### ranked case
choice_data <- data.frame(
"id" = 1:3, "choice" = c("A,B,C", "A,C,B", "B,C,A"), "cov" = 1
)
data <- prepare_data(
form = choice ~ 0 | cov + 0,
choice_data = choice_data,
ranked = TRUE
)
Draw from Dirichlet distribution
Description
Function to draw from a Dirichlet distribution.
Usage
rdirichlet(delta)
Arguments
delta |
A vector, the concentration parameter. |
Value
A vector, the sample from the Dirichlet distribution of the same length as delta
.
Examples
rdirichlet(delta = 1:3)
Draw from multivariate normal distribution
Description
This function draws from a multivariate normal distribution.
Usage
rmvnorm(mu, Sigma)
Arguments
mu |
The mean vector of length |
Sigma |
The covariance matrix of dimension |
Details
The function builds upon the following fact: If \epsilon = (\epsilon_1,\dots,\epsilon_n)
,
where each \epsilon_i
is drawn independently from a standard normal distribution,
then \mu+L\epsilon
is a draw from the multivariate normal distribution
N(\mu,\Sigma)
, where L
is the lower triangular factor of the
Choleski decomposition of \Sigma
.
Value
A numeric vector of length n
.
Examples
mu <- c(0,0)
Sigma <- diag(2)
rmvnorm(mu = mu, Sigma = Sigma)
Draw from one-sided truncated normal
Description
This function draws from a one-sided truncated univariate normal distribution.
Usage
rtnorm(mu, sig, trunpt, above)
Arguments
mu |
The mean. |
sig |
The standard deviation. |
trunpt |
The truncation point. |
above |
A boolean, if |
Value
A numeric value.
Examples
### samples from a standard normal truncated at 1 from above
R <- 1e4
draws <- replicate(R, rtnorm(0,1,1,TRUE))
plot(density(draws))
Draw from two-sided truncated normal
Description
This function draws from a two-sided truncated univariate normal distribution.
Usage
rttnorm(mu, sig, lower_bound, upper_bound)
Arguments
mu |
The mean. |
sig |
The standard deviation. |
lower_bound |
The lower truncation point. |
upper_bound |
The upper truncation point. |
Value
A numeric value.
Examples
### samples from a standard normal truncated at -2 and 2
R <- 1e4
draws <- replicate(R, rttnorm(0,1,-2,2))
plot(density(draws))
Draw from Wishart distribution
Description
This function draws from a Wishart and inverted Wishart distribution.
Usage
rwishart(nu, V)
Arguments
nu |
A numeric, the degrees of freedom. Must be at least the number of dimensions. |
V |
A matrix, the scale matrix. |
Details
The Wishart distribution is a generalization to multiple dimensions of the
gamma distributions and draws from the space of covariance matrices.
Its expectation is nu*V
and its variance increases both in nu
and in the values of V
.
The Wishart distribution is the conjugate prior to the precision matrix of
a multivariate normal distribution and proper if nu
is greater than
the number of dimensions.
Value
A list, the draws from the Wishart (W
), inverted Wishart (IW
), and
corresponding Choleski decomposition (C
and CI
).
Examples
rwishart(nu = 2, V = diag(2))
Set initial values for the Gibbs sampler
Description
This function sets initial values for the Gibbs sampler.
Usage
set_initial_gibbs_values(
N,
T,
J,
P_f,
P_r,
C,
ordered = FALSE,
ranked = FALSE,
suff_stat = NULL
)
Arguments
N |
The number (greater or equal 1) of decision makers. |
T |
The number (greater or equal 1) of choice occasions or a vector of choice
occasions of length |
J |
The number (greater or equal 2) of choice alternatives. |
P_f |
The number of covariates connected to a fixed coefficient (can be 0). |
P_r |
The number of covariates connected to a random coefficient (can be 0). |
C |
The number (greater or equal 1) of latent classes. |
ordered |
A boolean, |
ranked |
TBA |
suff_stat |
Optionally the output of |
Value
A list of initial values for the Gibbs sampler.
Examples
RprobitB:::set_initial_gibbs_values(
N = 2, T = 3, J = 3, P_f = 1, P_r = 2, C = 2
)
Simulate choice data
Description
This function simulates choice data from a probit model.
Usage
simulate_choices(
form,
N,
T = 1,
J,
re = NULL,
alternatives = NULL,
ordered = FALSE,
ranked = FALSE,
base = NULL,
covariates = NULL,
seed = NULL,
true_parameter = list()
)
Arguments
form |
A
Multiple covariates (of one type) are separated by a In the ordered probit model ( |
N |
The number (greater or equal 1) of decision makers. |
T |
The number (greater or equal 1) of choice occasions or a vector of choice
occasions of length |
J |
The number (greater or equal 2) of choice alternatives. |
re |
A character (vector) of covariates of |
alternatives |
A character vector with the names of the choice alternatives.
If not specified, the choice set is defined by the observed choices.
If |
ordered |
A boolean, |
ranked |
TBA |
base |
A character, the name of the base alternative for covariates that are not
alternative specific (i.e. type 2 covariates and ASCs). Ignored and set to
|
covariates |
A named list of covariate values. Each element must be a vector of length equal to the number of choice occasions and named according to a covariate. Covariates for which no values are supplied are drawn from a standard normal distribution. |
seed |
Set a seed for the simulation. |
true_parameter |
Optionally specify a named list with true parameter values for |
Details
See the vignette on choice data for more details.
Value
An object of class RprobitB_data
.
See Also
-
check_form()
for checking the model formula -
overview_effects()
for an overview of the model effects -
create_lagged_cov()
for creating lagged covariates -
as_cov_names()
for re-labeling alternative-specific covariates -
prepare_data()
for preparing empirical choice data -
train_test()
for splitting choice data into a train and test subset
Examples
### simulate data from a binary probit model with two latent classes
data <- simulate_choices(
form = choice ~ cost | income | time,
N = 100,
T = 10,
J = 2,
re = c("cost", "time"),
alternatives = c("car", "bus"),
seed = 1,
true_parameter = list(
"alpha" = c(-1, 1),
"b" = matrix(c(-1, -1, -0.5, -1.5, 0, -1), ncol = 2),
"C" = 2
)
)
### simulate data from an ordered probit model
data <- simulate_choices(
form = opinion ~ age + gender,
N = 10,
T = 1:10,
J = 5,
alternatives = c("very bad", "bad", "indifferent", "good", "very good"),
ordered = TRUE,
covariates = list(
"gender" = rep(sample(c(0, 1), 10, replace = TRUE), times = 1:10)
),
seed = 1
)
### simulate data from a ranked probit model
data <- simulate_choices(
form = product ~ price,
N = 10,
T = 1:10,
J = 3,
alternatives = c("A", "B", "C"),
ranked = TRUE,
seed = 1
)
Compute sufficient statistics
Description
This function computes sufficient statistics from an RprobitB_data
object for the Gibbs sampler to save computation time.
Usage
sufficient_statistics(data, normalization)
Arguments
data |
An object of class |
normalization |
An object of class |
Value
A list of sufficient statistics on the data for Gibbs sampling, containing
the elements
N
,T
,J
,P_f
andP_r
fromdata
,-
Tvec
, the vector of choice occasions for each decider of lengthN
, -
csTvec
, a vector of lengthN
with the cumulated sums ofTvec
starting from0
, -
W
, a list of design matrices differenced with respect to alternative numbernormalization$level$level
for each decider in each choice occasion with covariates that are linked to a fixed coefficient (orNA
ifP_f = 0
), -
X
, a list of design matrices differenced with respect to alternative numbernormalization$level$level
for each decider in each choice occasion with covariates that are linked to a random coefficient (orNA
ifP_r = 0
), -
y
, a matrix of dimensionN
xmax(Tvec)
with the observed choices of deciders in rows and choice occasions in columns, decoded to numeric values with respect to their appearance indata$alternatives
, where rows are filled withNA
in case of an unbalanced panel, -
WkW
, a matrix of dimensionP_f^2
x(J-1)^2
, the sum over Kronecker products of each transposed element inW
with itself, -
XkX
, a list of lengthN
, each element is constructed in the same way asWkW
but with the elements inX
and separately for each decider, -
rdiff
(for the ranked case only), a list of matrices that reverse the base differencing and instead difference in such a way that the resulting utility vector is negative.
Stated Preferences for Train Traveling
Description
Data set of 2929 stated choices by 235 Dutch individuals deciding between
two virtual train trip options "A"
and "B"
based on the price,
the travel time, the number of rail-to-rail transfers (changes), and the
level of comfort.
The data were obtained in 1987 by Hague Consulting Group for the National Dutch Railways. Prices were recorded in Dutch guilder and in this data set transformed to Euro at an exchange rate of 2.20371 guilders = 1 Euro.
Usage
train_choice
Format
A data.frame
with 2929 rows and 11 columns:
- deciderID
integer
identifier for the decider- occasionID
integer
identifier for the choice occasion- choice
character
for the chosen alternative (either"A"
or"B"
)- price_A
numeric
price for alternative"A"
in Euro- time_A
numeric
travel time for alternative"A"
in hours- change_A
integer
number of changes for alternative"A"
- comfort_A
integer
comfort level (in decreasing comfort order) for alternative"A"
- price_B
numeric
price for alternative"B"
in Euro- time_B
numeric
travel time for alternative"B"
in hours- change_B
integer
number of changes for alternative"B"
- comfort_B
integer
comfort level (in decreasing comfort order) for alternative"B"
References
Ben-Akiva M, Bolduc D, Bradley M (1993). “Estimation of travel choice models with randomly distributed values of time.” Transportation Research Record, 1413, 88–97.
Meijer E, Rouwendal J (2006). “Measuring welfare effects in models with random coefficients.” Journal of Applied Econometrics, 21(2), 227–244.
Split choice data in train and test subset
Description
This function splits choice data into a train and a test part.
Usage
train_test(
x,
test_proportion = NULL,
test_number = NULL,
by = "N",
random = FALSE,
seed = NULL
)
Arguments
x |
An object of class |
test_proportion |
A number between 0 and 1, the proportion of the test subsample. |
test_number |
A positive integer, the number of observations in the test subsample. |
by |
One of |
random |
If |
seed |
Set a seed for building the subsamples randomly. |
Details
See the vignette on choice data for more details.
Value
A list with two objects of class RprobitB_data
, named "train"
and "test"
.
Examples
### simulate choices for demonstration
x <- simulate_choices(form = choice ~ covariate, N = 10, T = 10, J = 2)
### 70% of deciders in the train subsample,
### 30% of deciders in the test subsample
train_test(x, test_proportion = 0.3, by = "N")
### 2 randomly chosen choice occasions per decider in the test subsample,
### the rest in the train subsample
train_test(x, test_number = 2, by = "T", random = TRUE, seed = 1)
Transform fitted probit model
Description
Given an object of class RprobitB_fit
, this function can:
change the length
B
of the burn-in period,change the the thinning factor
Q
of the Gibbs samples,change the utility
scale
.
Usage
## S3 method for class 'RprobitB_fit'
transform(
`_data`,
B = NULL,
Q = NULL,
scale = NULL,
check_preference_flip = TRUE,
...
)
Arguments
_data |
An object of class |
B |
The length of the burn-in period, i.e. a non-negative number of samples to be discarded. |
Q |
The thinning factor for the Gibbs samples, i.e. only every |
scale |
A character which determines the utility scale. It is of the form
|
check_preference_flip |
Set to |
... |
Ignored. |
Details
See the vignette "Model fitting" for more details:
vignette("v03_model_fitting", package = "RprobitB")
.
Value
The transformed RprobitB_fit
object.
Transformation of Gibbs samples
Description
This function normalizes, burns and thins the Gibbs samples.
Usage
transform_gibbs_samples(gibbs_samples, R, B, Q, normalization)
Arguments
gibbs_samples |
The output of
|
R |
The number of iterations of the Gibbs sampler. |
B |
The length of the burn-in period, i.e. a non-negative number of samples to be discarded. |
Q |
The thinning factor for the Gibbs samples, i.e. only every |
normalization |
An object of class |
Value
A list, the first element gibbs_sampes_raw
is the input
gibbs_samples
, the second element is the normalized, burned, and
thinned version of gibbs_samples
called gibbs_samples_nbt
.
The list gets the class RprobitB_gibbs_samples
.
Transformation of parameter values
Description
This function transforms parameter values based on normalization
.
Usage
transform_parameter(parameter, normalization, ordered = FALSE)
Arguments
parameter |
An object of class |
normalization |
An object of class |
ordered |
A boolean, |
Value
An object of class RprobitB_parameter
.
Transform differenced to non-differenced error term covariance matrix
Description
This function transforms the differenced error term covariance matrix
Sigma
back to a non-differenced error term covariance matrix.
Usage
undiff_Sigma(Sigma, i, checks = TRUE, pos = TRUE, labels = TRUE)
Arguments
Sigma |
An error term covariance matrix of dimension |
i |
An integer, the alternative number with respect to which |
checks |
If |
pos |
If |
labels |
If |
Value
A covariance matrix of dimension J
x J
. If this covariance
matrix gets differenced with respect to alternative i
, the results is
again Sigma
.
Update and re-fit probit model
Description
This function estimates a nested probit model based on a given
RprobitB_fit
object.
Usage
## S3 method for class 'RprobitB_fit'
update(
object,
form,
re,
alternatives,
id,
idc,
standardize,
impute,
scale,
R,
B,
Q,
print_progress,
prior,
latent_classes,
seed,
...
)
Arguments
object |
An object of class |
form |
A
Multiple covariates (of one type) are separated by a In the ordered probit model ( |
re |
A character (vector) of covariates of |
alternatives |
A character vector with the names of the choice alternatives.
If not specified, the choice set is defined by the observed choices.
If |
id |
A character, the name of the column in |
idc |
A character, the name of the column in |
standardize |
A character vector of names of covariates that get standardized.
Covariates of type 1 or 3 have to be addressed by
|
impute |
A character that specifies how to handle missing covariate entries in
|
scale |
A character which determines the utility scale. It is of the form
|
R |
The number of iterations of the Gibbs sampler. |
B |
The length of the burn-in period, i.e. a non-negative number of samples to be discarded. |
Q |
The thinning factor for the Gibbs samples, i.e. only every |
print_progress |
A boolean, determining whether to print the Gibbs sampler progress and the estimated remaining computation time. |
prior |
A named list of parameters for the prior distributions. See the documentation
of |
latent_classes |
Either
|
seed |
Set a seed for the Gibbs sampling. |
... |
Ignored. |
Details
All parameters (except for object
) are optional and if not specified
retrieved from the specification for object
.
Value
An object of class RprobitB_fit
.
Update class covariances
Description
This function updates the class covariances (independent from the other classes).
Usage
update_Omega(beta, b, z, m, nu, Theta)
Arguments
beta |
The matrix of the decision-maker specific coefficient vectors of dimension
|
b |
The matrix of class means as columns of dimension |
z |
The vector of the allocation variables of length |
m |
The vector of class sizes of length |
nu |
The degrees of freedom (a natural number greater than |
Theta |
The scale matrix of dimension |
Details
The following holds independently for each class c
.
Let \Omega_c
be the covariance matrix of class number c
.
A priori, we assume that \Omega_c
is inverse Wishart distributed
with \nu
degrees of freedom and scale matrix \Theta
.
Let (\beta_n)_{z_n=c}
be the collection of \beta_n
that are currently allocated to class c
,
m_c
the size of class c
, and b_c
the class mean vector.
Due to the conjugacy of the prior, the posterior \Pr(\Omega_c \mid (\beta_n)_{z_n=c})
follows an inverted Wishart distribution
with \nu + m_c
degrees of freedom and scale matrix \Theta^{-1} + \sum_n (\beta_n - b_c)(\beta_n - b_c)'
, where
the product is over the values n
for which z_n=c
holds.
Value
A matrix of updated covariance matrices for each class in columns.
Examples
### N = 100 decider, P_r = 2 random coefficients, and C = 2 latent classes
N <- 100
b <- cbind(c(0,0),c(1,1))
(Omega_true <- matrix(c(1,0.3,0.3,0.5,1,-0.3,-0.3,0.8), ncol=2))
z <- c(rep(1,N/2),rep(2,N/2))
m <- as.numeric(table(z))
beta <- sapply(z, function(z) rmvnorm(b[,z], matrix(Omega_true[,z],2,2)))
### degrees of freedom and scale matrix for the Wishart prior
nu <- 1
Theta <- diag(2)
### updated class covariance matrices (in columns)
update_Omega(beta = beta, b = b, z = z, m = m, nu = nu, Theta = Theta)
Update error term covariance matrix of multiple linear regression
Description
This function updates the error term covariance matrix of a multiple linear regression.
Usage
update_Sigma(kappa, E, N, S)
Arguments
kappa |
The degrees of freedom (a natural number greater than |
E |
The scale matrix of dimension |
N |
The draw size. |
S |
A matrix, the sum over the outer products of the residuals |
Details
This function draws from the posterior distribution of the covariance matrix \Sigma
in the linear utility
equation
U_n = X_n\beta + \epsilon_n,
where U_n
is the
(latent, but here assumed to be known) utility vector of decider n = 1,\dots,N
, X_n
is the design matrix build from the choice characteristics faced by n
,
\beta
is the coefficient vector, and \epsilon_n
is the error term assumed to be
normally distributed with mean 0
and unknown covariance matrix \Sigma
.
A priori we assume the (conjugate) Inverse Wishart distribution
\Sigma \sim W(\kappa,E)
with \kappa
degrees of freedom and scale matrix E
.
The posterior for \Sigma
is the Inverted Wishart distribution with \kappa + N
degrees of freedom
and scale matrix E^{-1}+S
, where S = \sum_{n=1}^{N} \epsilon_n \epsilon_n'
is the sum over
the outer products of the residuals (\epsilon_n = U_n - X_n\beta)_n
.
Value
A matrix, a draw from the Inverse Wishart posterior distribution of the error term covariance matrix in a multiple linear regression.
Examples
### true error term covariance matrix
(Sigma_true <- matrix(c(1,0.5,0.2,0.5,1,0.2,0.2,0.2,2), ncol=3))
### coefficient vector
beta <- matrix(c(-1,1), ncol=1)
### draw data
N <- 100
X <- replicate(N, matrix(rnorm(6), ncol=2), simplify = FALSE)
eps <- replicate(N, rmvnorm(mu = c(0,0,0), Sigma = Sigma_true), simplify = FALSE)
U <- mapply(function(X, eps) X %*% beta + eps, X, eps, SIMPLIFY = FALSE)
### prior parameters for covariance matrix
kappa <- 4
E <- diag(3)
### draw from posterior of coefficient vector
outer_prod <- function(X, U) (U - X %*% beta) %*% t(U - X %*% beta)
S <- Reduce(`+`, mapply(outer_prod, X, U, SIMPLIFY = FALSE))
Sigma_draws <- replicate(100, update_Sigma(kappa, E, N, S))
apply(Sigma_draws, 1:2, mean)
apply(Sigma_draws, 1:2, stats::sd)
Update latent utility vector
Description
This function updates the latent utility vector, where (independent across deciders and choice occasions) the utility for each alternative is updated conditional on the other utilities.
Usage
update_U(U, y, sys, Sigmainv)
Arguments
U |
The current utility vector of length |
y |
An integer from |
sys |
A vector of length |
Sigmainv |
The inverted error term covariance matrix of dimension |
Details
The key ingredient to Gibbs sampling for probit models is considering the latent utilities
as parameters themselves which can be updated (data augmentation).
Independently for all deciders n=1,\dots,N
and choice occasions t=1,...,T_n
,
the utility vectors (U_{nt})_{n,t}
in the linear utility equation U_{nt} = X_{nt} \beta + \epsilon_{nt}
follow a J-1
-dimensional truncated normal distribution, where J
is the number of alternatives,
X_{nt} \beta
the systematic (i.e. non-random) part of the utility and \epsilon_{nt} \sim N(0,\Sigma)
the error term.
The truncation points are determined by the choices y_{nt}
. To draw from a truncated multivariate
normal distribution, this function implemented the approach of Geweke (1998) to conditionally draw each component
separately from a univariate truncated normal distribution. See Oelschläger (2020) for the concrete formulas.
Value
An updated utility vector of length J-1
.
References
See Geweke (1998) Efficient Simulation from the Multivariate Normal and Student-t Distributions Subject to Linear Constraints and the Evaluation of Constraint Probabilities for Gibbs sampling from a truncated multivariate normal distribution. See Oelschläger and Bauer (2020) Bayes Estimation of Latent Class Mixed Multinomial Probit Models for its application to probit utilities.
Examples
U <- c(0,0,0)
y <- 3
sys <- c(0,0,0)
Sigmainv <- solve(diag(3))
update_U(U, y, sys, Sigmainv)
Update latent utility vector in the ranked probit case
Description
This function updates the latent utility vector in the ranked probit case.
Usage
update_U_ranked(U, sys, Sigmainv)
Arguments
U |
The current utility vector of length |
sys |
A vector of length |
Sigmainv |
The inverted error term covariance matrix of dimension
|
Details
This update is basically the same as in the non-ranked case, despite that the truncation point is zero.
Value
An updated utility vector of length J-1
.
Examples
U <- c(0,0)
sys <- c(0,0)
Sigmainv <- diag(2)
update_U_ranked(U, sys, Sigmainv)
Update class means
Description
This function updates the class means (independent from the other classes).
Usage
update_b(beta, Omega, z, m, xi, Dinv)
Arguments
beta |
The matrix of the decision-maker specific coefficient vectors of dimension
|
Omega |
The matrix of class covariance matrices as columns of dimension
|
z |
The vector of the allocation variables of length |
m |
The vector of class sizes of length |
xi |
The mean vector of length |
Dinv |
The precision matrix (i.e. the inverse of the covariance matrix) of dimension |
Details
The following holds independently for each class c
.
Let b_c
be the mean of class number c
. A priori, we assume that b_c
is normally distributed
with mean vector \xi
and covariance matrix D
.
Let (\beta_n)_{z_n=c}
be the collection of \beta_n
that are currently allocated to class c
,
m_c
the class size, and \bar{b}_c
their arithmetic mean.
Assuming independence across draws, (\beta_n)_{z_n=c}
has
a normal likelihood of
\prod_n \phi(\beta_n \mid b_c,\Omega_c),
where the product is over the values n
for which z_n=c
holds.
Due to the conjugacy of the prior, the posterior \Pr(b_c \mid (\beta_n)_{z_n=c})
follows a normal distribution
with mean
(D^{-1} + m_c\Omega_c^{-1})^{-1}(D^{-1}\xi + m_c\Omega_c^{-1}\bar{b}_c)
and covariance matrix
(D^{-1} + m_c \Omega_c^{-1})^{-1}.
Value
A matrix of updated means for each class in columns.
Examples
### N = 100 decider, P_r = 2 random coefficients, and C = 2 latent classes
N <- 100
(b_true <- cbind(c(0,0),c(1,1)))
Omega <- matrix(c(1,0.3,0.3,0.5,1,-0.3,-0.3,0.8), ncol=2)
z <- c(rep(1,N/2),rep(2,N/2))
m <- as.numeric(table(z))
beta <- sapply(z, function(z) rmvnorm(b_true[,z], matrix(Omega[,z],2,2)))
### prior mean vector and precision matrix (inverse of covariance matrix)
xi <- c(0,0)
Dinv <- diag(2)
### updated class means (in columns)
update_b(beta = beta, Omega = Omega, z = z, m = m, xi = xi, Dinv = Dinv)
Dirichlet process-based update of latent classes
Description
This function updates the latent classes based on a Dirichlet process.
Usage
update_classes_dp(
Cmax,
beta,
z,
b,
Omega,
delta,
xi,
D,
nu,
Theta,
s_desc = TRUE
)
Arguments
Cmax |
The maximum number of classes. |
beta |
The matrix of the decision-maker specific coefficient vectors of dimension
|
z |
The vector of the allocation variables of length |
b |
The matrix of class means as columns of dimension |
Omega |
The matrix of class covariance matrices as columns of dimension
|
delta |
A numeric for the concentration parameter vector |
xi |
The mean vector of length |
D |
The covariance matrix of dimension |
nu |
The degrees of freedom (a natural number greater than |
Theta |
The scale matrix of dimension |
s_desc |
If |
Details
To be added.
Value
A list of updated values for z
, b
, Omega
, s
,
and C
.
Examples
set.seed(1)
z <- c(rep(1,20),rep(2,30))
b <- matrix(c(1,1,1,-1), ncol=2)
Omega <- matrix(c(1,0.3,0.3,0.5,1,-0.3,-0.3,0.8), ncol=2)
beta <- sapply(z, function(z) rmvnorm(b[,z], matrix(Omega[,z],2,2)))
delta <- 1
xi <- numeric(2)
D <- diag(2)
nu <- 4
Theta <- diag(2)
RprobitB:::update_classes_dp(
Cmax = 10, beta = beta, z = z, b = b, Omega = Omega,
delta = delta, xi = xi, D = D, nu = nu, Theta = Theta
)
Weight-based update of latent classes
Description
This function updates the latent classes based on their class weights.
Usage
update_classes_wb(Cmax, epsmin, epsmax, distmin, s, b, Omega)
Arguments
Cmax |
The maximum number of classes. |
epsmin |
The threshold weight (between 0 and 1) for removing a class. |
epsmax |
The threshold weight (between 0 and 1) for splitting a class. |
distmin |
The (non-negative) threshold difference in class means for joining two classes. |
s |
The vector of class weights of length |
b |
The matrix of class means as columns of dimension |
Omega |
The matrix of class covariance matrices as columns of dimension
|
Details
The updating scheme bases on the following rules:
We remove class
c
, ifs_c<\epsilon_{min}
, i.e. if the class weights_c
drops below some threshold\epsilon_{min}
. This case indicates that classc
has a negligible impact on the mixing distribution.We split class
c
into two classesc_1
andc_2
, ifs_c>\epsilon_{max}
. This case indicates that classc
has a high influence on the mixing distribution whose approximation can potentially be improved by increasing the resolution in directions of high variance. Therefore, the class meansb_{c_1}
andb_{c_2}
of the new classesc_1
andc_2
are shifted in opposite directions from the class meanb_c
of the old classc
in the direction of the highest variance.We join two classes
c_1
andc_2
to one classc
, if||b_{c_1} - b_{c_2}||<\epsilon_{distmin}
, i.e. if the euclidean distance between the class meansb_{c_1}
andb_{c_2}
drops below some threshold\epsilon_{distmin}
. This case indicates location redundancy which should be repealed. The parameters ofc
are assigned by adding the values ofs
fromc_1
andc_2
and averaging the values forb
and\Omega
. The rules are executed in the above order, but only one rule per iteration and only ifCmax
is not exceeded.
Value
A list of updated values for s
, b
, and Omega
.
Examples
### parameter settings
s <- c(0.8,0.2)
b <- matrix(c(1,1,1,-1), ncol=2)
Omega <- matrix(c(0.5,0.3,0.3,0.5,1,-0.1,-0.1,0.8), ncol=2)
### Remove class 2
RprobitB:::update_classes_wb(Cmax = 10, epsmin = 0.3, epsmax = 0.9, distmin = 1,
s = s, b = b, Omega = Omega)
### Split class 1
RprobitB:::update_classes_wb(Cmax = 10, epsmin = 0.1, epsmax = 0.7, distmin = 1,
s = s, b = b, Omega = Omega)
### Join classes
RprobitB:::update_classes_wb(Cmax = 10, epsmin = 0.1, epsmax = 0.9, distmin = 3,
s = s, b = b, Omega = Omega)
Update utility threshold increments
Description
This function updates the utility threshold increments
Usage
update_d(d, y, mu, ll, zeta, Z, Tvec)
Arguments
d |
The current vector of utility threshold increments. |
y |
A matrix of the choices. |
mu |
A matrix of the systematic utilities. |
ll |
Current log-likelihood value. |
zeta |
The mean vector of the normal prior for |
Z |
The covariance matrix of the normal prior for |
Tvec |
The element |
Value
The updated value for d
.
Update class sizes
Description
This function updates the class size vector.
Usage
update_m(C, z, nozero)
Arguments
C |
The number (greater or equal 1) of latent classes of decision makers.
Set to |
z |
The vector of the allocation variables of length |
nozero |
If |
Value
An updated class size vector.
Examples
update_m(C = 3, z = c(1,1,1,2,2,3), FALSE)
Update coefficient vector of multiple linear regression
Description
This function updates the coefficient vector of a multiple linear regression.
Usage
update_reg(mu0, Tau0, XSigX, XSigU)
Arguments
mu0 |
The mean vector of the normal prior distribution for the coefficient vector. |
Tau0 |
The precision matrix (i.e. inverted covariance matrix) of the normal prior distribution for the coefficient vector. |
XSigX |
The matrix |
XSigU |
The vector |
Details
This function draws from the posterior distribution of \beta
in the linear utility
equation
U_n = X_n\beta + \epsilon_n,
where U_n
is the
(latent, but here assumed to be known) utility vector of decider n = 1,\dots,N
, X_n
is the design matrix build from the choice characteristics faced by n
,
\beta
is the unknown coefficient vector (this can be either the fixed
coefficient vector \alpha
or the decider-specific coefficient vector \beta_n
),
and \epsilon_n
is the error term assumed to be normally distributed with mean 0
and (known) covariance matrix \Sigma
.
A priori we assume the (conjugate) normal prior distribution
\beta \sim N(\mu_0,T_0)
with mean vector \mu_0
and precision matrix (i.e. inverted covariance matrix) T_0
.
The posterior distribution for \beta
is normal with
covariance matrix
\Sigma_1 = (T_0 + \sum_{n=1}^N X_n'\Sigma^{-1}X_n)^{-1}
and mean vector
\mu_1 = \Sigma_1(T_0\mu_0 + \sum_{n=1}^N X_n'\Sigma^{-1}U_n)
.
Note the analogy of \mu_1
to the generalized least squares estimator
\hat{\beta}_{GLS} = (\sum_{n=1}^N X_n'\Sigma^{-1}X_n)^{-1} \sum_{n=1}^N X_n'\Sigma^{-1}U_n
which
becomes weighted by the prior parameters \mu_0
and T_0
.
Value
A vector, a draw from the normal posterior distribution of the coefficient vector in a multiple linear regression.
Examples
### true coefficient vector
beta_true <- matrix(c(-1,1), ncol=1)
### error term covariance matrix
Sigma <- matrix(c(1,0.5,0.2,0.5,1,0.2,0.2,0.2,2), ncol=3)
### draw data
N <- 100
X <- replicate(N, matrix(rnorm(6), ncol=2), simplify = FALSE)
eps <- replicate(N, rmvnorm(mu = c(0,0,0), Sigma = Sigma), simplify = FALSE)
U <- mapply(function(X, eps) X %*% beta_true + eps, X, eps, SIMPLIFY = FALSE)
### prior parameters for coefficient vector
mu0 <- c(0,0)
Tau0 <- diag(2)
### draw from posterior of coefficient vector
XSigX <- Reduce(`+`, lapply(X, function(X) t(X) %*% solve(Sigma) %*% X))
XSigU <- Reduce(`+`, mapply(function(X, U) t(X) %*% solve(Sigma) %*% U, X, U, SIMPLIFY = FALSE))
beta_draws <- replicate(100, update_reg(mu0, Tau0, XSigX, XSigU), simplify = TRUE)
rowMeans(beta_draws)
Update class weight vector
Description
This function updates the class weight vector by drawing from its posterior distribution.
Usage
update_s(delta, m)
Arguments
delta |
A numeric for the concentration parameter vector |
m |
The vector of current class frequencies. |
Details
Let m=(m_1,\dots,m_C)
be the frequencies of C
classes.
Given the class weight (probability) vector s=(s_1,\dots,s_C)
, the distribution
of m
is multinomial and its likelihood is
L(m\mid s) \propto \prod_{i=1}^C s_i^{m_i}.
The conjugate prior p(s)
for s
is a Dirichlet distribution, which has a density function
proportional to
\prod_{i=1}^C s_i^{\delta_i-1},
where \delta = (\delta_1,\dots,\delta_C)
is the concentration parameter vector.
Note that in {RprobitB}
, \delta_1=\dots=\delta_C
. This restriction is necessary because the class number C
can change.
The posterior distribution of s
is proportional to
p(s) L(m\mid s) \propto \prod_{i=1}^C s_i^{\delta_i + m_i - 1},
which in turn is proportional to a Dirichlet distribution with parameters \delta+m
.
Value
A vector, a draw from the Dirichlet posterior distribution for s
.
Examples
### number of classes
C <- 4
### current class sizes
m <- sample.int(C)
### concentration parameter for Dirichlet prior (single-valued)
delta <- 1
### updated class weight vector
update_s(delta = 1, m = m)
Update class allocation vector
Description
This function updates the class allocation vector (independently for all observations) by drawing from its conditional distribution.
Usage
update_z(s, beta, b, Omega)
Arguments
s |
The vector of class weights of length |
beta |
The matrix of the decision-maker specific coefficient vectors of dimension
|
b |
The matrix of class means as columns of dimension |
Omega |
The matrix of class covariance matrices as columns of dimension
|
Details
Let z = (z_1,\dots,z_N)
denote the class allocation vector of the observations (mixed coefficients) \beta = (\beta_1,\dots,\beta_N)
.
Independently for each n
, the conditional probability \Pr(z_n = c \mid s,\beta_n,b,\Omega)
of having \beta_n
allocated to class c
for c=1,\dots,C
depends on the class allocation vector s
, the class means b=(b_c)_c
and the class covariance
matrices Omega=(Omega_c)_c
and is proportional to
s_c \phi(\beta_n \mid b_c,Omega_c).
Value
An updated class allocation vector.
Examples
### class weights for C = 2 classes
s <- rdirichlet(c(1,1))
### coefficient vector for N = 1 decider and P_r = 2 random coefficients
beta <- matrix(c(1,1), ncol = 1)
### class means and covariances
b <- cbind(c(0,0),c(1,1))
Omega <- cbind(c(1,0,0,1),c(1,0,0,1))
### updated class allocation vector
update_z(s = s, beta = beta, b = b, Omega = Omega)