Introduction to DUToolkit

During public health crises such as the COVID-19 pandemic, decision-makers rely on models to predict and estimate the impact of various policy alternatives on health outcomes. Often, there is a high degree of uncertainty in the evidence base underpinning these models. As uncertainty increases, so does the risk of selecting a policy option that does not align with the intended policy objective; we term this decision risk. Even when models adequately capture uncertainty, the tools used to communicate model outcomes, the underlying uncertainty, and the associated decision risk matter, because clear communication helps decision-makers avoid adopting sub-optimal policies and/or critical health technologies.

The DUToolkit package provides a suite of tools and visualizations for the characterization, estimation, and communication of parameter uncertainty and decision risk. The package is designed to evaluate the impact of policy alternatives on outcomes compared to a pre-defined baseline scenario. The baseline scenario is typically defined as maintaining the status quo or as a scenario where no mitigation policies are implemented (i.e., a ‘do nothing’ or ‘existing policy’ scenario). DUToolkit uses model outputs from uncertainty analysis techniques, such as probabilistic sensitivity analysis, general uncertainty analysis, or Bayesian inference, to support decision-making.

Getting started

The DUToolkit functions fall into five main categories:

Synthetic data

The DUToolkit package includes pre-loaded synthetic model outputs stored in the R object psa_data, which serve as an example dataset. This dataset represents a hypothetical scenario where a decision-maker is selecting between two policies related to COVID-19 in 2020: (i) Baseline – do nothing/current state and (ii) Intervention 1 – close schools. Each policy is expected to impact the number of individuals in the hospital. Hospital capacity has a maximum upper bound, which is the decision threshold.

Data format

The DUToolkit functions require model outputs from multiple simulation runs using different parameter sets (e.g., probabilistic sensitivity analysis, general uncertainty analysis, or Bayesian inference). These outputs must follow a standardized format, as follows:

  1. A list of data.frames (Required)
    • The list must contain one data.frame for each policy alternative.

    • Each data.frame must have:

      • A first column representing model time, either as numeric values (e.g., 1, 2, 3, …) or as dates of class “Date” (e.g., 2021-01-01, 2021-01-02, …).

      • Subsequent columns containing predicted outputs for each simulation run at the corresponding time points (e.g., if there are 100 simulations, there will be 101 columns in the data.frame).

    • To ensure a consistent basis for comparison, the model time in the first column should be identical across all policy alternatives (i.e., the first column in every data.frame should contain the same values).

library(DUToolkit)

# example data.frame with date in first column
head(psa_data$Baseline[, 1:5])
#>          date        1        2        3        4
#> 2  2021-01-01 37.23075 36.13261 36.62189 36.85947
#> 4  2021-01-02 30.84229 27.20223 28.65276 29.45438
#> 6  2021-01-03 27.77702 21.21132 23.64767 25.18233
#> 8  2021-01-04 29.58525 17.42107 20.96771 23.48235
#> 10 2021-01-05 40.12541 15.36282 20.08010 25.14026
#> 12 2021-01-06 60.30618 14.74307 21.08585 32.24770
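
For users preparing their own model outputs, the format rules above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not part of the DUToolkit API: the helper names make_policy() and check_outputs(), the list name Intervention_1, and the simulated values are all hypothetical.

```r
# Hypothetical example: build model outputs in the required list-of-data.frames
# format using base R only, then check them against the format rules.
set.seed(1)
n_sims <- 100
time <- seq(as.Date("2021-01-01"), by = "day", length.out = 30)

# Hypothetical helper (not a DUToolkit function): simulate one policy's outputs,
# with model time in the first column and one column per simulation run.
make_policy <- function(mean_level) {
  sims <- matrix(rnorm(length(time) * n_sims, mean = mean_level, sd = 5),
                 nrow = length(time), ncol = n_sims)
  sims <- as.data.frame(sims)
  names(sims) <- as.character(seq_len(n_sims))
  cbind(data.frame(date = time), sims)
}

# One data.frame per policy alternative
my_outputs <- list(
  Baseline = make_policy(40),
  Intervention_1 = make_policy(30)
)

# Hypothetical helper (not a DUToolkit function): stop with an error if the
# list breaks any of the format rules described above.
check_outputs <- function(x) {
  stopifnot(is.list(x), length(x) >= 1)
  stopifnot(all(vapply(x, is.data.frame, logical(1))))
  # First column must be numeric or of class "Date"
  time_ok <- vapply(x, function(df) {
    is.numeric(df[[1]]) || inherits(df[[1]], "Date")
  }, logical(1))
  stopifnot(all(time_ok))
  # Model time must be identical across all policy alternatives
  same_time <- vapply(x, function(df) identical(df[[1]], x[[1]][[1]]), logical(1))
  stopifnot(all(same_time))
  invisible(TRUE)
}

check_outputs(my_outputs)
ncol(my_outputs$Baseline)  # 1 time column + 100 simulation columns
```

Running a check like this before passing data to the package functions may help catch mismatched time columns early.
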
  2. A list of vectors containing weights (Optional)

    • Some simulation runs may be more or less likely than others. Various methods can account for this, such as calculating a log-likelihood for each simulation run and converting it into a weight. Users must choose the most appropriate method for their specific scenario.

    • Each vector in the list corresponds to a specific policy alternative and contains the weights assigned to each simulation run.

    • Each weight vector must:

      • Have the same number of elements as the number of simulation run columns in the corresponding output data.frame (i.e., all columns except the first column).

      • List the weights in the same order as the simulation run columns in the corresponding data.frame.
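
As one possible illustration of the weighting approach mentioned above, the sketch below converts per-run log-likelihoods into normalized weights and checks them against the output dimensions. It assumes a simple normalization (exponentiate, then divide by the sum); the log-likelihood values, the helper loglik_to_weights(), and the stand-in outputs list are hypothetical, and other weighting schemes may suit a given model better.

```r
# Hypothetical log-likelihoods: one value per simulation run, per policy
set.seed(42)
n_sims <- 100
loglik <- list(
  Baseline = rnorm(n_sims, mean = -50, sd = 2),
  Intervention_1 = rnorm(n_sims, mean = -50, sd = 2)
)

# Hypothetical helper (not a DUToolkit function): turn log-likelihoods into
# weights that sum to 1. Subtracting the maximum before exponentiating avoids
# numerical underflow without changing the normalized result.
loglik_to_weights <- function(ll) {
  w <- exp(ll - max(ll))
  w / sum(w)
}

# One weight vector per policy alternative, in simulation-run order
my_weights <- lapply(loglik, loglik_to_weights)

# Stand-in for model outputs: a time column plus n_sims simulation columns
outputs <- list(
  Baseline = data.frame(time = 1:3, matrix(0, nrow = 3, ncol = n_sims)),
  Intervention_1 = data.frame(time = 1:3, matrix(0, nrow = 3, ncol = n_sims))
)

# Each weight vector must have one element per simulation run column
lengths_ok <- mapply(function(w, df) length(w) == ncol(df) - 1,
                     my_weights, outputs[names(my_weights)])
stopifnot(all(lengths_ok))
```

Because the weights are matched to simulation columns by position, any reordering or filtering of runs should be applied to the weights and the data.frame columns together.
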