mlr3mbo

Flexible Bayesian Optimization in R

Intro

mlr3mbo makes Bayesian Optimization (BO) available within the mlr3 ecosystem. BO can be used for optimizing any black box function, and is very suitable for hyperparameter optimization of machine learning models. mlr3mbo allows for building custom BO algorithms relying on building blocks in a modular fashion, but also provides a variety of standard single- and multi-objective BO algorithms that can be used in a straightforward manner.

We assume that the reader is somewhat familiar with black box optimization and bbotk, hyperparameter optimization and mlr3tuning and knows the basics of BO. Background material is, for example, given by Garnett (2022), Bischl, Binder, et al. (2023), and Bischl, Sonabend, et al. (2023).

Building Blocks

BO is an iterative optimization algorithm that makes use of a so-called surrogate to model the unknown black box function. After having observed an initial design of observations, the surrogate model is trained on all data points observed so far and an acquisition function is used to determine which points of the search space are promising candidates that should be evaluated next. The acquisition function relies on the prediction of the surrogate model and requires no evaluation of the true black box function and therefore is comparably cheap to optimize. After having evaluated the next candidate, the process repeats itself until a given termination criteria is met.

Most BO flavors therefore follow a simple loop:

  1. Fit the surrogate on all observations made so far.
  2. Optimize the acquisition function to find the next candidate that should be evaluated.
  3. Evaluate the next candidate.

In the following, the basic building blocks of BO and their implementation in mlr3mbo are introduced in more detail.

Loop Function

The loop_function determines the behavior of the BO algorithm on a global level, i.e., how the subroutine should look like that is performed at each iteration.

To get an overview of readily available loop_functions, the following dictionary can be inspected:

library(mlr3mbo)
library(data.table)
as.data.table(mlr_loop_functions)
#> Key: <key>
#>                key                         label    instance
#>             <char>                        <char>      <char>
#> 1:    bayesopt_ego Efficient Global Optimization single-crit
#> 2:    bayesopt_emo           Multi-Objective EGO  multi-crit
#> 3:   bayesopt_mpcl      Multipoint Constant Liar single-crit
#> 4: bayesopt_parego                        ParEGO  multi-crit
#> 5: bayesopt_smsego                       SMS-EGO  multi-crit
#>                                   man
#>                                <char>
#> 1:    mlr3mbo::mlr_loop_functions_ego
#> 2:    mlr3mbo::mlr_loop_functions_emo
#> 3:   mlr3mbo::mlr_loop_functions_mpcl
#> 4: mlr3mbo::mlr_loop_functions_parego
#> 5: mlr3mbo::mlr_loop_functions_smsego

The dictionary shows the key, i.e., ID of the loop_function, a brief description, for which optimization instance the resulting BO flavor can be used, as well as how documentation can be accessed.

Technically, all loop_functions are members of the S3 class loop_function, and are simply decorated functions (because using an R6Class class would be over the top here - but this may change in the future).

To write an own loop_function, users can get inspiration from the readily available ones, e.g., bayesopt_ego which performs sequential single-objective optimization:

After having made some assertions and safety checks, and having evaluated the initial design, bayesopt_ego essentially only performs the following steps:

  1. acq_function$surrogate$update() # update the surrogate model
  2. acq_function$update() # update the acquisition function (e.g., update the best value observed so far)
  3. acq_optimizer$optimize() # optimize the acquisition function to yield a new candidate

Surrogate

A surrogate encapsulates a regression learner that models the unknown black box function based on observed data. In mlr3mbo, SurrogateLearner and SurrogateLearnerCollection are the higher-level R6 classes which should be used to construct a surrogate, inheriting from the base Surrogate class.

As a learner, any LearnerRegr from mlr3 can be used, however, most acquisition functions require both a mean and a variance prediction (therefore not all learners are suitable for all scenarios). Typical choices include:

A SurrogateLearner can be constructed via:

library(mlr3learners)
#> Loading required package: mlr3
surrogate = SurrogateLearner$new(lrn("regr.km"))

or using syntactic sugar:

surrogate = srlrn(lrn("regr.km"))

Note that default_gp and default_rf allow for the construction of a Gaussian Process and random forest with sensible hyperparamter values already specified.

The encapsulated learner can be accessed via the $learner field:

surrogate$learner
#> <LearnerRegrKM:regr.km>: Kriging
#> * Model: -
#> * Parameters: list()
#> * Packages: mlr3, mlr3learners, DiceKriging
#> * Predict Types:  response, [se]
#> * Feature Types: logical, integer, numeric
#> * Properties: -

The surrogate itself has the following hyperparameters:

surrogate$param_set
#> <ParamSet(4)>
#> Key: <id>
#>                      id    class lower upper nlevels        default
#>                  <char>   <char> <num> <num>   <num>         <list>
#> 1: assert_insample_perf ParamLgl    NA    NA       2 <NoDefault[0]>
#> 2:         catch_errors ParamLgl    NA    NA       2 <NoDefault[0]>
#> 3:         perf_measure ParamUty    NA    NA     Inf <NoDefault[0]>
#> 4:       perf_threshold ParamDbl  -Inf   Inf     Inf <NoDefault[0]>
#>                 parents  value
#>                  <list> <list>
#> 1:                       FALSE
#> 2:                        TRUE
#> 3: assert_insample_perf       
#> 4: assert_insample_perf

assert_insample_perf = TRUE results in the insample performance of the learner being calculated and asserted against a performance threshold after each $update(). This requires the specification of a perf_measure (any regression measure, e.g., R squared) and a perf_threshold. If the threshold is not met, an error is thrown (that is caught within the optimization loop - unless catch_errors = FALSE and results in, e.g., proposing the next candidate uniformly at random). For more details on error handling, see the Safety Nets section.

Note that this insample performance assertion is not always meaningful, e.g., in the case of using a Gaussian Process with no nugget, the insample performance will always be perfect.

Internally, the learner is fitted on a regression task constructed from the Archive of the OptimInstance that is to be optimized and features and the target variable are determined automatically but can also be specified via the $cols_x and $cols_y active bindings. Ideally, the archive is already passed during construction, however, lazy initialization is also possible (i.e., the $archive field will be automatically populated within the optimization routine of an OptimizerMbo).

Important methods are $update() and $predict() with the former one typically being used within the loop_function and the latter one being used within the implementation of an acquisition function.

Depending on the choice of the loop_function, multiple targets must be modelled by (currently independent) surrogates, in which case a SurrogateLearnerCollection should be used. Construction and hyperparameters are analogous to the single target scenario described above.

To get an overview of all available regression learners within the mlr3 ecosystem, use:

library(mlr3)
library(mlr3learners)
# there are plenty of more in mlr3extralearners
# library(mlr3extralearners)
learners = as.data.table(mlr_learners)
learners[task_type == "regr"]

To use a custom learner not included in mlr3, mlr3learners, or mlr3extralearners, you can inherit from LearnerRegr and use this custom learner within the surrogate.

Acquisition Function

Based on a surrogate, an acquisition function quantifies the attractiveness of each point of the search space if it were to be evaluated in the next iteration.

A popular example is given by the Expected Improvement (Jones, Schonlau, and Welch 1998):

\[ \mathbb{E} \left[ \max \left( f_{\mathrm{min}} - Y(\mathbf{x}), 0 \right) \right] \] Here, \(Y(\mathbf{x})\) is the surrogate prediction (a random variable) for a given point \(\mathbf{x}\) (which when using a Gaussian Process follows a normal distribution) and \(f_{\mathrm{min}}\) is the currently best function value observed so far (when assuming minimization).

To get an overview of available acquisition functions, the following dictionary can be inspected:

as.data.table(mlr_acqfunctions)
#> Key: <key>
#>        key                                              label
#>     <char>                                             <char>
#>  1:    aei                     Augmented Expected Improvement
#>  2:     cb                     Lower / Upper Confidence Bound
#>  3:   ehvi                   Expected Hypervolume Improvement
#>  4: ehvigh Expected Hypervolume Improvement via GH Quadrature
#>  5:     ei                               Expected Improvement
#>  6:   eips                    Expected Improvement Per Second
#>  7:   mean                                     Posterior Mean
#>  8:     pi                         Probability Of Improvement
#>  9:     sd                       Posterior Standard Deviation
#> 10: smsego                                            SMS-EGO
#>                                  man
#>                               <char>
#>  1:    mlr3mbo::mlr_acqfunctions_aei
#>  2:     mlr3mbo::mlr_acqfunctions_cb
#>  3:   mlr3mbo::mlr_acqfunctions_ehvi
#>  4: mlr3mbo::mlr_acqfunctions_ehvigh
#>  5:     mlr3mbo::mlr_acqfunctions_ei
#>  6:   mlr3mbo::mlr_acqfunctions_eips
#>  7:   mlr3mbo::mlr_acqfunctions_mean
#>  8:     mlr3mbo::mlr_acqfunctions_pi
#>  9:     mlr3mbo::mlr_acqfunctions_sd
#> 10: mlr3mbo::mlr_acqfunctions_smsego

The dictionary shows the key, i.e., ID of the acquisition function, a brief description, and how the documentation can be accessed.

Technically, all acquisition functions inherit from the R6 class AcqFunction which itself simply inherits from the base Objective class.

Construction is straightforward via:

acq_function = AcqFunctionEI$new()

or using syntactic sugar:

acq_function = acqf("ei")

Internally, the domain and codomain are constructed based on the Archive referenced by the Surrogate and therefore the surrogate should be passed as an argument already during construction.

However, lazy initialization is also possible.

In the case of the acquisition function itself being parameterized, hyperparameters should be passed as constants, e.g.:

acqf("cb") # lower / upper confidence bound with lambda hyperparameter
#> <AcqFunctionCB:acq_cb>
#> Domain:
#> <ParamSet(0)>
#> Empty.
#> Codomain:
#> <Codomain(0)>
#> Empty.
#> Constants:
#> <ParamSet(1)>
#>        id    class lower upper nlevels default  value
#>    <char>   <char> <num> <num>   <num>  <list> <list>
#> 1: lambda ParamDbl     0   Inf     Inf       2      2

To use a custom acquisition function you should implement a new R6 class inheriting from AcqFunction.

Acquisition Function Optimizer

To find the most promising candidate for evaluation, the acquisition function itself must be optimized. Internally, an OptimInstance is constructed using the acquisition function as an Objective.

An acquisition function optimizer is then used to solve this optimization problem. Technically, this optimizer is a member of the AcqOptimizer R6 class.

Construction requires specifying an Optimizer as well as a Terminator:

library(bbotk)
#> Loading required package: paradox
acq_optimizer = AcqOptimizer$new(opt("random_search"), terminator = trm("evals"))

Syntactic sugar:

acq_optimizer = acqo(opt("random_search"), terminator = trm("evals"))

The optimizer and terminator can be accessed via the $optimizer and $terminator fields:

acq_optimizer$optimizer
#> <OptimizerBatchRandomSearch>: Random Search
#> * Parameters: batch_size=1
#> * Parameter classes: ParamLgl, ParamInt, ParamDbl, ParamFct
#> * Properties: dependencies, single-crit, multi-crit
#> * Packages: bbotk
acq_optimizer$terminator
#> <TerminatorEvals>: Number of Evaluation
#> * Parameters: n_evals=100, k=0

Internally, the acquisition function optimizer also requires the acquisition function and therefore the acq_function argument should be specified during construction.

However, lazy initialization is also possible.

An AcqOptimizer has the following hyperparameters:

acq_optimizer$param_set
#> <ParamSet(6)>
#> Key: <id>
#>                        id    class lower upper nlevels        default   parents
#>                    <char>   <char> <num> <num>   <num>         <list>    <list>
#> 1:           catch_errors ParamLgl    NA    NA       2           TRUE          
#> 2:          logging_level ParamFct    NA    NA       6           warn          
#> 3:           n_candidates ParamInt     1   Inf     Inf              1          
#> 4: skip_already_evaluated ParamLgl    NA    NA       2           TRUE          
#> 5:              warmstart ParamLgl    NA    NA       2          FALSE          
#> 6:         warmstart_size ParamInt     1   Inf     Inf <NoDefault[0]> warmstart
#>     value
#>    <list>
#> 1:   TRUE
#> 2:   warn
#> 3:      1
#> 4:   TRUE
#> 5:  FALSE
#> 6:

catch_errors = TRUE results in catching any errors that can happen during the acquisition function optimization which allows for, e.g., proposing the next candidate uniformly at random within the loop_function. For more details on this mechanism, see the Safety Nets section. logging_level specifies the logging level during acquisition function optimization. Often it is useful to only log the progress of the BO loop and therefore logging_level is set to "warn" by default. For debugging purposes, this should be set to "info". skip_already_evaluated = TRUE will result in not proposing candidates for evaluation that were already evaluated in previous iterations. warmstart = TRUE results in the best warmstart_size points present in the archive of the OptimInstance to also be evaluated on the acquisition function OptimInstance prior to running the actual acquisition function optimization. This is especially useful in the context of using evolutionary algorithms or variants of local search as the acquisition function optimizer (as the current best points should usually be part of the initial population to further optimize local optima).

To get an overview of all Optimizers implemented in bbotk you can use:

as.data.table(mlr_optimizers)

And similarly for Terminators:

as.data.table(mlr_terminators)

Again, you can also use custom Optimizers or Terminators by implementing new R6 classes inheriting from Optimizer and Terminator respectively.

Putting it Together

Having introduced all building blocks we are now ready to put everything together in the form of an OptimizerMbo or TunerMbo.

OptimizerMbo inherits from Optimizer and requires a loop_function, surrogate, acq_function and acq_optimizer.

Construction is performed via:

optimizer = OptimizerMbo$new(bayesopt_ego,
  surrogate = surrogate,
  acq_function = acq_function,
  acq_optimizer = acq_optimizer)

or using syntactic sugar:

optimizer = opt("mbo",
  loop_function = bayesopt_ego,
  surrogate = surrogate,
  acq_function = acq_function,
  acq_optimizer = acq_optimizer)
optimizer
#> <OptimizerMbo>: Model Based Optimization
#> * Parameter classes: ParamLgl, ParamInt, ParamDbl
#> * Properties: single-crit
#> * Packages: mlr3mbo, mlr3, mlr3learners, DiceKriging, bbotk
#> * Loop function: bayesopt_ego
#> * Surrogate: LearnerRegrKM
#> * Acquisition Function: AcqFunctionEI
#> * Acquisition Function Optimizer: (OptimizerBatchRandomSearch |
#>   TerminatorEvals)

Additional arguments, i.e., arguments of the loop_function can be passed via the args argument during construction or the $args active binding. Finally, the mechanism how the final result is obtained after the optimization process (i.e., the best point in the case of single-objective and the Pareto set in the case of multi-objective optimization) can be changed via the result_assigner argument during construction or the $result_assigner active binding. As an example, ResultAssignerSurrogate will choose the final solution based on the prediction of the surrogate instead of the evaluations logged in the archive which is sensible in the case of noisy objective functions. The default, however, is to use ResultAssignerArchive which will directly choose the final solution based on the evaluations logged in the archive. To get an overview of available result assigners, the following dictionary can be inspected:

as.data.table(mlr_result_assigners)
#> Key: <key>
#>          key                     label                                     man
#>       <char>                    <char>                                  <char>
#> 1:   archive                   Archive   mlr3mbo::mlr_result_assigners_archive
#> 2: surrogate Mean Surrogate Prediction mlr3mbo::mlr_result_assigners_surrogate

The dictionary shows the key, i.e., ID of the result assigner, a brief description, and how the documentation can be accessed.

Construction of result assigners is straightforward:

result_assigner = ResultAssignerArchive$new()

Syntactic sugar:

result_assigner = ras("archive")

Note that important fields of an OptimizerMbo such as $param_classes, $packages, $properties are automatically determined based on the choice of the loop_function, surrogate, acq_function, acq_optimizer, and result_assigner.

If arguments such as the surrogate, acq_function, acq_optimizer and result_assigner were not fully initialized during construction, e.g., the surrogate missing the archive, or the acq_function missing the surrogate, lazy initialization is completed prior to the optimizer being used for optimization.

An object of class OptimizerMbo can be used to optimize an object of class OptimInstanceBatchSingleCrit or OptimInstanceBatchMultiCrit.

For hyperparameter optimization, TunerMbo should be used (which simply relies on an OptimizerMbo that is constructed internally):

tuner = TunerMbo$new(bayesopt_ego,
  surrogate = surrogate,
  acq_function = acq_function,
  acq_optimizer = acq_optimizer)

mlr3misc::get_private(tuner)[[".optimizer"]]
#> <OptimizerMbo>: Model Based Optimization
#> * Parameter classes: ParamLgl, ParamInt, ParamDbl
#> * Properties: single-crit
#> * Packages: mlr3mbo, mlr3, mlr3learners, DiceKriging, bbotk
#> * Loop function: bayesopt_ego
#> * Surrogate: LearnerRegrKM
#> * Acquisition Function: AcqFunctionEI
#> * Acquisition Function Optimizer: (OptimizerBatchRandomSearch |
#>   TerminatorEvals)

The Initial Design

mlr3mbo offers two different ways for specifying an initial design:

  1. One can simply evaluate points on the OptimInstance that is to be optimized prior to using an OptimizerMbo. In this case, the loop_function should skip the construction and evaluation of an initial design.
  2. If no points were already evaluated on the OptimInstance, the loop_function should construct an initial design itself and evaluate it, e.g., bayesopt_ego then constructs an initial design of size \(4D\) where \(D\) is the dimensionality of the search space by sampling points uniformly at random.

Functions for creating different initial designs are part of the paradox package, e.g.:

  1. generate_design_random: uniformly at random
  2. generate_design_grid: uniform sized grid
  3. generate_design_lhs: Latin hypercube sampling
  4. generate_design_sobol: Sobol sequence

Defaults

mlr3mbo tries to use intelligent defaults for the loop_function, surrogate, acq_function, and acq_optimizer within OptimizerMbo and TunerMbo.

For details, see mbo_defaults.

Safety Nets

mlr3mbo is quite stable in the sense that - if desired - all kinds of errors can be caught and handled appropriately within the loop_function.

As an example, let’s have a look at the inner workings of bayesopt_ego:

repeat {
  xdt = tryCatch({
    .
    .
    .
    acq_function$surrogate$update()
    acq_function$update()
    acq_optimizer$optimize()
  }, mbo_error = function(mbo_error_condition) {
    lg$info(paste0(class(mbo_error_condition), collapse = " / "))
    lg$info("Proposing a randomly sampled point")
    SamplerUnif$new(domain)$sample(1L)$data
  })
  .
  .
  .
}

In each iteration, a new candidate is chosen based on updating the surrogate and acquisition function, and optimizing the acquisition function. If any error happens during any of these steps, errors are upgraded to errors of class "mbo_error" (and "surrogate_update_error" for surrogate related errors as well as "acq_optimizer_error" for acquisition function optimization related errors). These errors are then caught and a fallback is triggered: Evaluating the next candidate chosen uniformly at random. Note that the same mechanism is actually also used to handle random interleaving.

To illustrate this error handling mechanism, consider the following scenario: We try to minimize \(f: [-1, 1] \rightarrow \mathbb{R}, x \mapsto x^2\), however, our Gaussian Process fails due to data points being too close to each other:

set.seed(2906)
domain = ps(x = p_dbl(lower = -1, upper = 1))

codomain = ps(y = p_dbl(tags = "minimize"))

objective_function = function(xs) {
  list(y = xs$x ^ 2)
}

objective = ObjectiveRFun$new(
  fun = objective_function,
  domain = domain,
  codomain = codomain)

instance = OptimInstanceBatchSingleCrit$new(
  objective = objective,
  terminator = trm("evals", n_evals = 10))

initial_design = data.table(x = rep(1, 4))
instance$eval_batch(initial_design)
#> INFO  [09:59:00.374] [bbotk] Evaluating 4 configuration(s)
#> INFO  [09:59:00.453] [bbotk] Result of batch 1:
#> INFO  [09:59:00.460] [bbotk]  x y
#> INFO  [09:59:00.460] [bbotk]  1 1
#> INFO  [09:59:00.460] [bbotk]  1 1
#> INFO  [09:59:00.460] [bbotk]  1 1
#> INFO  [09:59:00.460] [bbotk]  1 1

surrogate = srlrn(default_gp())
acq_function = acqf("ei")
acq_optimizer = acqo(opt("random_search", batch_size = 1000),
  terminator = trm("evals", n_evals = 1000))
optimizer = opt("mbo",
  loop_function = bayesopt_ego,
  surrogate = surrogate,
  acq_function = acq_function,
  acq_optimizer = acq_optimizer)

optimizer$optimize(instance)
#> INFO  [09:59:00.709] [bbotk] Starting to optimize 1 parameter(s) with '<OptimizerMbo>' and '<TerminatorEvals> [n_evals=10, k=0]'
#> WARN  [09:59:00.905] [bbotk] missing value where TRUE/FALSE needed
#> INFO  [09:59:00.907] [bbotk] surrogate_update_error / mbo_error / error / condition
#> INFO  [09:59:00.909] [bbotk] Proposing a randomly sampled point
#> INFO  [09:59:00.952] [bbotk] Evaluating 1 configuration(s)
#> INFO  [09:59:00.989] [bbotk] Result of batch 2:
#> INFO  [09:59:00.992] [bbotk]          x         y
#> INFO  [09:59:00.992] [bbotk]  0.6742299 0.4545859
#> INFO  [09:59:01.390] [bbotk] Evaluating 1 configuration(s)
#> INFO  [09:59:01.425] [bbotk] Result of batch 3:
#> INFO  [09:59:01.428] [bbotk]          x  x_domain     acq_ei .already_evaluated         y
#> INFO  [09:59:01.428] [bbotk]  0.6686506 <list[1]> 0.04358139              FALSE 0.4470936
#> INFO  [09:59:01.689] [bbotk] Evaluating 1 configuration(s)
#> INFO  [09:59:01.705] [bbotk] Result of batch 4:
#> INFO  [09:59:01.707] [bbotk]          x  x_domain    acq_ei .already_evaluated         y
#> INFO  [09:59:01.707] [bbotk]  0.3883431 <list[1]> 0.1264431              FALSE 0.1508104
#> INFO  [09:59:01.878] [bbotk] Evaluating 1 configuration(s)
#> INFO  [09:59:01.892] [bbotk] Result of batch 5:
#> INFO  [09:59:01.894] [bbotk]           x  x_domain    acq_ei .already_evaluated         y
#> INFO  [09:59:01.894] [bbotk]  -0.5325387 <list[1]> 0.2231668              FALSE 0.2835975
#> INFO  [09:59:02.051] [bbotk] Evaluating 1 configuration(s)
#> INFO  [09:59:02.065] [bbotk] Result of batch 6:
#> INFO  [09:59:02.066] [bbotk]           x  x_domain    acq_ei .already_evaluated           y
#> INFO  [09:59:02.066] [bbotk]  0.04818857 <list[1]> 0.1256545              FALSE 0.002322138
#> INFO  [09:59:02.217] [bbotk] Evaluating 1 configuration(s)
#> INFO  [09:59:02.230] [bbotk] Result of batch 7:
#> INFO  [09:59:02.232] [bbotk]            x  x_domain      acq_ei .already_evaluated            y
#> INFO  [09:59:02.232] [bbotk]  -0.01951836 <list[1]> 0.002040261              FALSE 0.0003809664
#> INFO  [09:59:02.290] [bbotk] Finished optimizing after 10 evaluation(s)
#> INFO  [09:59:02.290] [bbotk] Result:
#> INFO  [09:59:02.292] [bbotk]            x  x_domain            y
#> INFO  [09:59:02.292] [bbotk]        <num>    <list>        <num>
#> INFO  [09:59:02.292] [bbotk]  -0.01951836 <list[1]> 0.0003809664
#>              x  x_domain            y
#>          <num>    <list>        <num>
#> 1: -0.01951836 <list[1]> 0.0003809664

The log tells us that an error happened and was caught: "surrogate_update_error / mbo_error / error / condition". We also see in the archive that the first candidate after the initial design (the fifth point) was not proposed based on optimizing the acquisition function (because the "acq_ei" column is NA here):

instance$archive$data
#>               x            y  x_domain           timestamp batch_nr      acq_ei
#>           <num>        <num>    <list>              <POSc>    <int>       <num>
#>  1:  1.00000000 1.0000000000 <list[1]> 2024-07-06 09:59:00        1          NA
#>  2:  1.00000000 1.0000000000 <list[1]> 2024-07-06 09:59:00        1          NA
#>  3:  1.00000000 1.0000000000 <list[1]> 2024-07-06 09:59:00        1          NA
#>  4:  1.00000000 1.0000000000 <list[1]> 2024-07-06 09:59:00        1          NA
#>  5:  0.67422986 0.4545859064 <list[1]> 2024-07-06 09:59:00        2          NA
#>  6:  0.66865060 0.4470936288 <list[1]> 2024-07-06 09:59:01        3 0.043581388
#>  7:  0.38834314 0.1508103944 <list[1]> 2024-07-06 09:59:01        4 0.126443136
#>  8: -0.53253872 0.2835974922 <list[1]> 2024-07-06 09:59:01        5 0.223166767
#>  9:  0.04818857 0.0023221379 <list[1]> 2024-07-06 09:59:02        6 0.125654544
#> 10: -0.01951836 0.0003809664 <list[1]> 2024-07-06 09:59:02        7 0.002040261
#>     .already_evaluated
#>                 <lgcl>
#>  1:                 NA
#>  2:                 NA
#>  3:                 NA
#>  4:                 NA
#>  5:                 NA
#>  6:              FALSE
#>  7:              FALSE
#>  8:              FALSE
#>  9:              FALSE
#> 10:              FALSE

Nevertheless, due to the safety net, the BO loop eventually worked just fine and did not simply throw an error.

If we set catch_errors = FALSE within the surrogate, we see that the error was indeed caused by the surrogate:

instance$archive$clear()
instance$eval_batch(initial_design)
#> INFO  [09:59:02.325] [bbotk] Evaluating 4 configuration(s)
#> INFO  [09:59:02.363] [bbotk] Result of batch 1:
#> INFO  [09:59:02.365] [bbotk]  x y
#> INFO  [09:59:02.365] [bbotk]  1 1
#> INFO  [09:59:02.365] [bbotk]  1 1
#> INFO  [09:59:02.365] [bbotk]  1 1
#> INFO  [09:59:02.365] [bbotk]  1 1
optimizer$surrogate$param_set$values$catch_errors = FALSE
optimizer$optimize(instance)
#> INFO  [09:59:02.375] [bbotk] Starting to optimize 1 parameter(s) with '<OptimizerMbo>' and '<TerminatorEvals> [n_evals=10, k=0]'
#> Error in if (varinit.vario <= 1e-20) varinit.vario <- 1/2 * varinit.vario.total: missing value where TRUE/FALSE needed

In case of the error belonging to the acq_optimizer_error class, it is helpful to increase the logging level of the acquisition function optimizer (e.g., acq_optimizer$param_set$values$logging_level = "info") and also set acq_optimizer$param_set$values$catch_errors = FALSE. This allows for straightforward debugging.

To make sure that your BO loop behaved as expected, always inspect the log of the optimization process and inspect the archive and check whether the acquisition function column is populated as expected.

Writing Your Own Loop Function

Writing a custom loop_function is straightforward.

Any loop_function must be an object of the S3 class loop_function (simply a standard R function with some requirements regarding its arguments and attributes). Arguments of the function must include instance, surrogate, acq_function, and acq_optimizer and attributes must include id (id of the loop function), label (brief description), instance (“single-crit” and or “multi_crit”), and man (link to the manual page).

Technically, any loop_function therefore looks like the following:

bayesopt_custom = function(instance, surrogate, acq_function, acq_optimizer) {
  # typically some assertions

  # initial design handling

  # actual loop function
}

class(bayesopt_custom) = "loop_function"
attr(bayesopt_custom, "id") = "bayesopt_custom"
attr(bayesopt_custom, "label") = "My custom BO loop"
attr(bayesopt_custom, "instance") = "single-crit"
attr(bayesopt_custom, "man") = ""  # no man page
# if you want to add it to the dictionary: mlr_loop_functions$add("bayesopt_custom", bayesopt_custom)
bayesopt_custom
#> Loop function: bayesopt_custom
#> * Description: My custom BO loop
#> * Supported Instance: single-crit

For some inspiration on how to write actual meaningful loop_functions, see, e.g., bayesopt_ego.

Examples

In this final section, some standard examples are provided.

Single-Objective: 2D Schwefel Function

objective_function = function(xs) {
  list(y = 418.9829 * 2 - (sum(unlist(xs) * sin(sqrt(abs(unlist(xs)))))))
}
domain = ps(x1 = p_dbl(lower = -500, upper = 500),
  x2 = p_dbl(lower = -500, upper = 500))
codomain = ps(y = p_dbl(tags = "minimize"))

objective = ObjectiveRFun$new(
  fun = objective_function,
  domain = domain,
  codomain = codomain)

instance = OptimInstanceBatchSingleCrit$new(
  objective = objective,
  search_space = domain,
  terminator = trm("evals", n_evals = 60))

# Gaussian Process, EI, DIRECT
surrogate = srlrn(default_gp())
acq_function = acqf("ei")
acq_optimizer = acqo(opt("nloptr", algorithm = "NLOPT_GN_DIRECT_L"),
  terminator = trm("stagnation", threshold = 1e-8))
optimizer = opt("mbo",
  loop_function = bayesopt_ego,
  surrogate = surrogate,
  acq_function = acq_function,
  acq_optimizer = acq_optimizer)

set.seed(2906)
optimizer$optimize(instance)
library(ggplot2)

ggplot(aes(x = batch_nr, y = cummin(y)), data = instance$archive$data) +
  geom_point() +
  geom_step() +
  labs(x = "Batch Nr.", y = "Best y") +
  theme_minimal()
xdt = generate_design_grid(instance$search_space, resolution = 101)$data
ydt = objective$eval_dt(xdt)
ggplot(aes(x = x1, y = x2, z = y), data = cbind(xdt, ydt)) +
  geom_contour_filled() +
  geom_point(aes(color = batch_nr), size = 2, data = instance$archive$data) +
  scale_color_gradient(low = "lightgrey", high = "red") +
  theme_minimal()

Multi-Objective: Schaffer Function N. 1

ParEGO

objective_function = function(xs) {
  list(y1 = xs$x^2, y2 = (xs$x - 2)^2)
}
domain = ps(x = p_dbl(lower = -10, upper = 10))
codomain = ps(y1 = p_dbl(tags = "minimize"), y2 = p_dbl(tags = "minimize"))

objective = ObjectiveRFun$new(
  fun = objective_function,
  domain = domain,
  codomain = codomain)

instance = OptimInstanceBatchMultiCrit$new(
  objective = objective,
  search_space = domain,
  terminator = trm("evals", n_evals = 30))

# Gaussian Process, EI, DIRECT
surrogate = srlrn(default_gp())
acq_function = acqf("ei")
acq_optimizer = acqo(opt("nloptr", algorithm = "NLOPT_GN_DIRECT_L"),
  terminator = trm("stagnation", threshold = 1e-8))
optimizer = opt("mbo",
  loop_function = bayesopt_parego,
  surrogate = surrogate,
  acq_function = acq_function,
  acq_optimizer = acq_optimizer)

set.seed(2906)
optimizer$optimize(instance)
ggplot(aes(x = y1, y = y2), data = instance$archive$best()) +
  geom_point() +
  theme_minimal()
library(emoa)
library(mlr3misc)
library(data.table)
anytime_hypervolume = map_dtr(unique(instance$archive$data$batch_nr), function(bnr) {
  pareto = instance$archive$best(batch = 1:bnr)[, instance$archive$cols_y, with = FALSE]
  dhv = dominated_hypervolume(t(pareto), ref = t(t(c(100, 144))))
  data.table(batch_nr = bnr, dhv = dhv)
})

ggplot(aes(x = batch_nr, y = dhv), data = anytime_hypervolume[batch_nr > 1]) +
  geom_point() +
  geom_step(direction = "vh") +
  labs(x = "Batch Nr.", y = "Dominated Hypervolume") +
  theme_minimal()

SMS-EGO

objective_function = function(xs) {
  list(y1 = xs$x^2, y2 = (xs$x - 2)^2)
}
domain = ps(x = p_dbl(lower = -10, upper = 10))
codomain = ps(y1 = p_dbl(tags = "minimize"), y2 = p_dbl(tags = "minimize"))

objective = ObjectiveRFun$new(
  fun = objective_function,
  domain = domain,
  codomain = codomain)

instance = OptimInstanceBatchMultiCrit$new(
  objective = objective,
  search_space = domain,
  terminator = trm("evals", n_evals = 30))

# Gaussian Processes, SMS-EGO, DIRECT
learner_y1 = default_gp()
learner_y2 = learner_y1$clone(deep = TRUE)
surrogate = srlrn(list(learner_y1, learner_y2))
acq_function = acqf("smsego")
acq_optimizer = acqo(opt("nloptr", algorithm = "NLOPT_GN_DIRECT_L"),
  terminator = trm("stagnation", threshold = 1e-8))
optimizer = opt("mbo",
  loop_function = bayesopt_smsego,
  surrogate = surrogate,
  acq_function = acq_function,
  acq_optimizer = acq_optimizer)

set.seed(2906)
optimizer$optimize(instance)
ggplot(aes(x = y1, y = y2), data = instance$archive$best()) +
  geom_point() +
  theme_minimal()
anytime_hypervolume = map_dtr(unique(instance$archive$data$batch_nr), function(bnr) {
  pareto = instance$archive$best(batch = 1:bnr)[, instance$archive$cols_y, with = FALSE]
  dhv = dominated_hypervolume(t(pareto), ref = t(t(c(100, 144))))
  data.table(batch_nr = bnr, dhv = dhv)
})

ggplot(aes(x = batch_nr, y = dhv), data = anytime_hypervolume[batch_nr > 1]) +
  geom_point() +
  geom_step(direction = "vh") +
  labs(x = "Batch Nr.", y = "Dominated Hypervolume") +
  theme_minimal()

Single-Objective HPO

library(mlr3)
task = tsk("wine")
learner = lrn("classif.rpart",
  cp = to_tune(lower = 1e-4, upper = 1, logscale = TRUE),
  maxdepth = to_tune(lower = 1, upper = 10),
  minbucket = to_tune(lower = 1, upper = 10),
  minsplit = to_tune(lower = 1, upper = 10))
resampling = rsmp("cv", folds = 3)
measure = msr("classif.acc")

instance = TuningInstanceBatchSingleCrit$new(
  task = task,
  learner = learner,
  resampling = resampling,
  measure = measure,
  terminator = trm("evals", n_evals = 30))

# Gaussian Process, EI, FocusSearch
surrogate = srlrn(default_gp(noisy = TRUE))
acq_function = acqf("ei")
acq_optimizer = acqo(opt("focus_search", n_points = 100L, maxit = 9),
  terminator = trm("evals", n_evals = 3000))
tuner = tnr("mbo",
  loop_function = bayesopt_ego,
  surrogate = surrogate,
  acq_function = acq_function,
  acq_optimizer = acq_optimizer)

set.seed(2906)
tuner$optimize(instance)
instance$result

Multi-Objective HPO

task = tsk("wine")
learner = lrn("classif.rpart",
  cp = to_tune(lower = 1e-4, upper = 1, logscale = TRUE),
  maxdepth = to_tune(lower = 1, upper = 10),
  minbucket = to_tune(lower = 1, upper = 10),
  minsplit = to_tune(lower = 1, upper = 10))
resampling = rsmp("cv", folds = 3)
measures = msrs(c("classif.acc", "selected_features"))

instance = TuningInstanceBatchMultiCrit$new(
  task = task,
  learner = learner,
  resampling = resampling,
  measures = measures,
  terminator = trm("evals", n_evals = 30),
  store_models = TRUE) # required due to selected features

# Gaussian Process, EI, FocusSearch
surrogate = srlrn(default_gp(noisy = TRUE))
acq_function = acqf("ei")
acq_optimizer = acqo(opt("focus_search", n_points = 100L, maxit = 9),
  terminator = trm("evals", n_evals = 3000))
tuner = tnr("mbo",
  loop_function = bayesopt_parego,
  surrogate = surrogate,
  acq_function = acq_function,
  acq_optimizer = acq_optimizer)

set.seed(2906)
tuner$optimize(instance)
instance$result
Bischl, Bernd, Martin Binder, Michel Lang, Tobias Pielok, Jakob Richter, Stefan Coors, Janek Thomas, et al. 2023. “Hyperparameter Optimization: Foundations, Algorithms, Best Practices, and Open Challenges.” Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, e1484.
Bischl, Bernd, Raphael Sonabend, Lars Kotthoff, and Michel Lang, eds. 2023. Flexible and Robust Machine Learning Using mlr3 in R. https://mlr3book.mlr-org.com.
Garnett, Roman. 2022. Bayesian Optimization. Cambridge University Press. https://bayesoptbook.com/.
Jones, Donald R., Matthias Schonlau, and William J. Welch. 1998. “Efficient Global Optimization of Expensive Black-Box Functions.” Journal of Global Optimization 13 (4): 455–92. https://doi.org/10.1023/A:1008306431147.