mlr3mbo makes Bayesian Optimization (BO) available within the mlr3 ecosystem. BO can be used for optimizing any black box function, and is very suitable for hyperparameter optimization of machine learning models. mlr3mbo allows for building custom BO algorithms relying on building blocks in a modular fashion, but also provides a variety of standard single- and multi-objective BO algorithms that can be used in a straightforward manner.

We assume that the reader is somewhat familiar with black box optimization and bbotk, hyperparameter optimization and mlr3tuning and knows the basics of BO. Background material is, for example, given by Garnett (2022), Bischl, Binder, et al. (2023), and Bischl, Sonabend, et al. (2023).

BO is an iterative optimization algorithm that makes use of a so-called surrogate to model the unknown black box function. After having observed an initial design of observations, the surrogate model is trained on all data points observed so far and an acquisition function is used to determine which points of the search space are promising candidates that should be evaluated next. The acquisition function relies on the prediction of the surrogate model and requires no evaluation of the true black box function and therefore is comparably cheap to optimize. After having evaluated the next candidate, the process repeats itself until a given termination criteria is met.

Most BO flavors therefore follow a simple loop:

- Fit the surrogate on all observations made so far.
- Optimize the acquisition function to find the next candidate that should be evaluated.
- Evaluate the next candidate.

In the following, the basic building blocks of BO and their implementation in mlr3mbo are introduced in more detail.

The `loop_function`

determines the behavior of the BO algorithm on a global level, i.e., how
the subroutine should look like that is performed at each iteration.

To get an overview of readily available `loop_function`

s,
the following dictionary can be inspected:

```
library(mlr3mbo)
library(data.table)
as.data.table(mlr_loop_functions)
#> Key: <key>
#> key label instance
#> <char> <char> <char>
#> 1: bayesopt_ego Efficient Global Optimization single-crit
#> 2: bayesopt_emo Multi-Objective EGO multi-crit
#> 3: bayesopt_mpcl Multipoint Constant Liar single-crit
#> 4: bayesopt_parego ParEGO multi-crit
#> 5: bayesopt_smsego SMS-EGO multi-crit
#> man
#> <char>
#> 1: mlr3mbo::mlr_loop_functions_ego
#> 2: mlr3mbo::mlr_loop_functions_emo
#> 3: mlr3mbo::mlr_loop_functions_mpcl
#> 4: mlr3mbo::mlr_loop_functions_parego
#> 5: mlr3mbo::mlr_loop_functions_smsego
```

The dictionary shows the `key`

, i.e., ID of the
`loop_function`

, a brief description, for which optimization
instance the resulting BO flavor can be used, as well as how
documentation can be accessed.

Technically, all `loop_function`

s are members of the
`S3`

class `loop_function`

,
and are simply decorated `functions`

(because using an `R6Class`

class would be over the top here - but this may change in the
future).

To write an own `loop_function`

, users can get inspiration
from the readily available ones, e.g., `bayesopt_ego`

which performs sequential single-objective optimization:

After having made some assertions and safety checks, and having
evaluated the initial design, `bayesopt_ego`

essentially only
performs the following steps:

`acq_function$surrogate$update()`

# update the surrogate model`acq_function$update()`

# update the acquisition function (e.g., update the best value observed so far)`acq_optimizer$optimize()`

# optimize the acquisition function to yield a new candidate

A surrogate encapsulates a regression learner that models the unknown
black box function based on observed data. In mlr3mbo, `SurrogateLearner`

and `SurrogateLearnerCollection`

are the higher-level `R6`

classes which should be used to
construct a surrogate, inheriting from the base `Surrogate`

class.

As a learner, any `LearnerRegr`

from mlr3 can be used, however,
most acquisition functions require both a mean and a variance prediction
(therefore not all learners are suitable for all scenarios). Typical
choices include:

- A
`Gaussian Process`

for low dimensional numeric search spaces - A
`Random Forest`

for higher dimensional mixed (and / or hierarchical) search spaces

A `SurrogateLearner`

can be constructed via:

```
library(mlr3learners)
#> Loading required package: mlr3
```

`= SurrogateLearner$new(lrn("regr.km")) surrogate `

or using syntactic sugar:

`= srlrn(lrn("regr.km")) surrogate `

Note that `default_gp`

and `default_rf`

allow for the construction of a Gaussian Process and random forest with
sensible hyperparamter values already specified.

The encapsulated learner can be accessed via the
`$learner`

field:

```
$learner
surrogate#> <LearnerRegrKM:regr.km>: Kriging
#> * Model: -
#> * Parameters: list()
#> * Packages: mlr3, mlr3learners, DiceKriging
#> * Predict Types: response, [se]
#> * Feature Types: logical, integer, numeric
#> * Properties: -
```

The surrogate itself has the following hyperparameters:

```
$param_set
surrogate#> <ParamSet(4)>
#> Key: <id>
#> id class lower upper nlevels default
#> <char> <char> <num> <num> <num> <list>
#> 1: assert_insample_perf ParamLgl NA NA 2 <NoDefault[0]>
#> 2: catch_errors ParamLgl NA NA 2 <NoDefault[0]>
#> 3: perf_measure ParamUty NA NA Inf <NoDefault[0]>
#> 4: perf_threshold ParamDbl -Inf Inf Inf <NoDefault[0]>
#> parents value
#> <list> <list>
#> 1: FALSE
#> 2: TRUE
#> 3: assert_insample_perf
#> 4: assert_insample_perf
```

`assert_insample_perf = TRUE`

results in the insample
performance of the learner being calculated and asserted against a
performance threshold after each `$update()`

. This requires
the specification of a `perf_measure`

(any regression
measure, e.g., `R squared`

)
and a `perf_threshold`

. If the threshold is not met, an error
is thrown (that is caught within the optimization loop - unless
`catch_errors = FALSE`

and results in, e.g., proposing the
next candidate uniformly at random). For more details on error handling,
see the Safety Nets section.

Note that this insample performance assertion is not always meaningful, e.g., in the case of using a Gaussian Process with no nugget, the insample performance will always be perfect.

Internally, the learner is fitted on a `regression task`

constructed from the `Archive`

of the `OptimInstance`

that is to be optimized and features and the target variable are
determined automatically but can also be specified via the
`$cols_x`

and `$cols_y`

active bindings. Ideally,
the `archive`

is already passed during construction, however,
lazy initialization is also possible (i.e., the `$archive`

field will be automatically populated within the optimization routine of
an `OptimizerMbo`

).

Important methods are `$update()`

and
`$predict()`

with the former one typically being used within
the `loop_function`

and the latter one being used within the
implementation of an acquisition function.

Depending on the choice of the `loop_function`

, multiple
targets must be modelled by (currently independent) surrogates, in which
case a `SurrogateLearnerCollection`

should be used.
Construction and hyperparameters are analogous to the single target
scenario described above.

To get an overview of all available regression learners within the mlr3 ecosystem, use:

```
library(mlr3)
library(mlr3learners)
# there are plenty of more in mlr3extralearners
# library(mlr3extralearners)
= as.data.table(mlr_learners)
learners == "regr"] learners[task_type
```

To use a custom learner not included in mlr3, mlr3learners, or mlr3extralearners, you
can inherit from `LearnerRegr`

and use this custom learner within the surrogate.

Based on a surrogate, an acquisition function quantifies the attractiveness of each point of the search space if it were to be evaluated in the next iteration.

A popular example is given by the Expected Improvement (Jones, Schonlau, and Welch 1998):

\[ \mathbb{E} \left[ \max \left( f_{\mathrm{min}} - Y(\mathbf{x}), 0 \right) \right] \] Here, \(Y(\mathbf{x})\) is the surrogate prediction (a random variable) for a given point \(\mathbf{x}\) (which when using a Gaussian Process follows a normal distribution) and \(f_{\mathrm{min}}\) is the currently best function value observed so far (when assuming minimization).

To get an overview of available acquisition functions, the following dictionary can be inspected:

```
as.data.table(mlr_acqfunctions)
#> Key: <key>
#> key label
#> <char> <char>
#> 1: aei Augmented Expected Improvement
#> 2: cb Lower / Upper Confidence Bound
#> 3: ehvi Expected Hypervolume Improvement
#> 4: ehvigh Expected Hypervolume Improvement via GH Quadrature
#> 5: ei Expected Improvement
#> 6: eips Expected Improvement Per Second
#> 7: mean Posterior Mean
#> 8: pi Probability Of Improvement
#> 9: sd Posterior Standard Deviation
#> 10: smsego SMS-EGO
#> man
#> <char>
#> 1: mlr3mbo::mlr_acqfunctions_aei
#> 2: mlr3mbo::mlr_acqfunctions_cb
#> 3: mlr3mbo::mlr_acqfunctions_ehvi
#> 4: mlr3mbo::mlr_acqfunctions_ehvigh
#> 5: mlr3mbo::mlr_acqfunctions_ei
#> 6: mlr3mbo::mlr_acqfunctions_eips
#> 7: mlr3mbo::mlr_acqfunctions_mean
#> 8: mlr3mbo::mlr_acqfunctions_pi
#> 9: mlr3mbo::mlr_acqfunctions_sd
#> 10: mlr3mbo::mlr_acqfunctions_smsego
```

The dictionary shows the `key`

, i.e., ID of the
acquisition function, a brief description, and how the documentation can
be accessed.

Technically, all acquisition functions inherit from the
`R6`

class `AcqFunction`

which itself simply inherits from the base `Objective`

class.

Construction is straightforward via:

`= AcqFunctionEI$new() acq_function `

or using syntactic sugar:

`= acqf("ei") acq_function `

Internally, the `domain`

and `codomain`

are
constructed based on the `Archive`

referenced by the `Surrogate`

and therefore the `surrogate`

should be passed as an argument
already during construction.

However, lazy initialization is also possible.

In the case of the acquisition function itself being parameterized, hyperparameters should be passed as constants, e.g.:

```
acqf("cb") # lower / upper confidence bound with lambda hyperparameter
#> <AcqFunctionCB:acq_cb>
#> Domain:
#> <ParamSet(0)>
#> Empty.
#> Codomain:
#> <Codomain(0)>
#> Empty.
#> Constants:
#> <ParamSet(1)>
#> id class lower upper nlevels default value
#> <char> <char> <num> <num> <num> <list> <list>
#> 1: lambda ParamDbl 0 Inf Inf 2 2
```

To use a custom acquisition function you should implement a new
`R6`

class inheriting from `AcqFunction`

.

To find the most promising candidate for evaluation, the acquisition
function itself must be optimized. Internally, an `OptimInstance`

is constructed using the acquisition function as an `Objective`

.

An acquisition function optimizer is then used to solve this
optimization problem. Technically, this optimizer is a member of the `AcqOptimizer`

`R6`

class.

Construction requires specifying an `Optimizer`

as well as a `Terminator`

:

```
library(bbotk)
#> Loading required package: paradox
```

`= AcqOptimizer$new(opt("random_search"), terminator = trm("evals")) acq_optimizer `

Syntactic sugar:

`= acqo(opt("random_search"), terminator = trm("evals")) acq_optimizer `

The optimizer and terminator can be accessed via the
`$optimizer`

and `$terminator`

fields:

```
$optimizer
acq_optimizer#> <OptimizerBatchRandomSearch>: Random Search
#> * Parameters: batch_size=1
#> * Parameter classes: ParamLgl, ParamInt, ParamDbl, ParamFct
#> * Properties: dependencies, single-crit, multi-crit
#> * Packages: bbotk
```

```
$terminator
acq_optimizer#> <TerminatorEvals>: Number of Evaluation
#> * Parameters: n_evals=100, k=0
```

Internally, the acquisition function optimizer also requires the
acquisition function and therefore the `acq_function`

argument should be specified during construction.

However, lazy initialization is also possible.

An `AcqOptimizer`

has the following hyperparameters:

```
$param_set
acq_optimizer#> <ParamSet(6)>
#> Key: <id>
#> id class lower upper nlevels default parents
#> <char> <char> <num> <num> <num> <list> <list>
#> 1: catch_errors ParamLgl NA NA 2 TRUE
#> 2: logging_level ParamFct NA NA 6 warn
#> 3: n_candidates ParamInt 1 Inf Inf 1
#> 4: skip_already_evaluated ParamLgl NA NA 2 TRUE
#> 5: warmstart ParamLgl NA NA 2 FALSE
#> 6: warmstart_size ParamInt 1 Inf Inf <NoDefault[0]> warmstart
#> value
#> <list>
#> 1: TRUE
#> 2: warn
#> 3: 1
#> 4: TRUE
#> 5: FALSE
#> 6:
```

`catch_errors = TRUE`

results in catching any errors that
can happen during the acquisition function optimization which allows
for, e.g., proposing the next candidate uniformly at random within the
`loop_function`

. For more details on this mechanism, see the
Safety Nets section.
`logging_level`

specifies the logging level during
acquisition function optimization. Often it is useful to only log the
progress of the BO loop and therefore `logging_level`

is set
to `"warn"`

by default. For debugging purposes, this should
be set to `"info"`

.
`skip_already_evaluated = TRUE`

will result in not proposing
candidates for evaluation that were already evaluated in previous
iterations. `warmstart = TRUE`

results in the best
`warmstart_size`

points present in the `archive`

of the `OptimInstance`

to also be evaluated on the
acquisition function `OptimInstance`

prior to running the
actual acquisition function optimization. This is especially useful in
the context of using evolutionary algorithms or variants of local search
as the acquisition function optimizer (as the current best points should
usually be part of the initial population to further optimize local
optima).

To get an overview of all `Optimizer`

s implemented in bbotk you can use:

`as.data.table(mlr_optimizers)`

And similarly for `Terminator`

s:

`as.data.table(mlr_terminators)`

Again, you can also use custom `Optimizer`

s or
`Terminator`

s by implementing new `R6`

classes
inheriting from `Optimizer`

and `Terminator`

respectively.

Having introduced all building blocks we are now ready to put
everything together in the form of an `OptimizerMbo`

or `TunerMbo`

.

`OptimizerMbo`

inherits from `Optimizer`

and requires a `loop_function`

, `surrogate`

,
`acq_function`

and `acq_optimizer`

.

Construction is performed via:

```
= OptimizerMbo$new(bayesopt_ego,
optimizer surrogate = surrogate,
acq_function = acq_function,
acq_optimizer = acq_optimizer)
```

or using syntactic sugar:

```
= opt("mbo",
optimizer loop_function = bayesopt_ego,
surrogate = surrogate,
acq_function = acq_function,
acq_optimizer = acq_optimizer)
```

```
optimizer#> <OptimizerMbo>: Model Based Optimization
#> * Parameter classes: ParamLgl, ParamInt, ParamDbl
#> * Properties: single-crit
#> * Packages: mlr3mbo, mlr3, mlr3learners, DiceKriging, bbotk
#> * Loop function: bayesopt_ego
#> * Surrogate: LearnerRegrKM
#> * Acquisition Function: AcqFunctionEI
#> * Acquisition Function Optimizer: (OptimizerBatchRandomSearch |
#> TerminatorEvals)
```

Additional arguments, i.e., arguments of the
`loop_function`

can be passed via the `args`

argument during construction or the `$args`

active binding.
Finally, the mechanism how the final result is obtained after the
optimization process (i.e., the best point in the case of
single-objective and the Pareto set in the case of multi-objective
optimization) can be changed via the `result_assigner`

argument during construction or the `$result_assigner`

active
binding. As an example, `ResultAssignerSurrogate`

will choose the final solution based on the prediction of the surrogate
instead of the evaluations logged in the `archive`

which is
sensible in the case of noisy objective functions. The default, however,
is to use `ResultAssignerArchive`

which will directly choose the final solution based on the evaluations
logged in the `archive`

. To get an overview of available
result assigners, the following dictionary can be inspected:

```
as.data.table(mlr_result_assigners)
#> Key: <key>
#> key label man
#> <char> <char> <char>
#> 1: archive Archive mlr3mbo::mlr_result_assigners_archive
#> 2: surrogate Mean Surrogate Prediction mlr3mbo::mlr_result_assigners_surrogate
```

The dictionary shows the `key`

, i.e., ID of the result
assigner, a brief description, and how the documentation can be
accessed.

Construction of result assigners is straightforward:

`= ResultAssignerArchive$new() result_assigner `

Syntactic sugar:

`= ras("archive") result_assigner `

Note that important fields of an `OptimizerMbo`

such as
`$param_classes`

, `$packages`

,
`$properties`

are automatically determined based on the
choice of the `loop_function`

, `surrogate`

,
`acq_function`

, `acq_optimizer`

, and
`result_assigner`

.

If arguments such as the `surrogate`

,
`acq_function`

, `acq_optimizer`

and
`result_assigner`

were not fully initialized during
construction, e.g., the `surrogate`

missing the
`archive`

, or the `acq_function`

missing the
`surrogate`

, lazy initialization is completed prior to the
optimizer being used for optimization.

An object of class `OptimizerMbo`

can be used to optimize
an object of class `OptimInstanceBatchSingleCrit`

or
`OptimInstanceBatchMultiCrit`

.

For hyperparameter optimization, `TunerMbo`

should be used (which simply relies on an `OptimizerMbo`

that
is constructed internally):

```
= TunerMbo$new(bayesopt_ego,
tuner surrogate = surrogate,
acq_function = acq_function,
acq_optimizer = acq_optimizer)
::get_private(tuner)[[".optimizer"]]
mlr3misc#> <OptimizerMbo>: Model Based Optimization
#> * Parameter classes: ParamLgl, ParamInt, ParamDbl
#> * Properties: single-crit
#> * Packages: mlr3mbo, mlr3, mlr3learners, DiceKriging, bbotk
#> * Loop function: bayesopt_ego
#> * Surrogate: LearnerRegrKM
#> * Acquisition Function: AcqFunctionEI
#> * Acquisition Function Optimizer: (OptimizerBatchRandomSearch |
#> TerminatorEvals)
```

mlr3mbo offers two different ways for specifying an initial design:

- One can simply evaluate points on the
`OptimInstance`

that is to be optimized prior to using an`OptimizerMbo`

. In this case, the`loop_function`

should skip the construction and evaluation of an initial design. - If no points were already evaluated on the
`OptimInstance`

, the`loop_function`

should construct an initial design itself and evaluate it, e.g.,`bayesopt_ego`

then constructs an initial design of size \(4D\) where \(D\) is the dimensionality of the search space by sampling points uniformly at random.

Functions for creating different initial designs are part of the paradox package, e.g.:

`generate_design_random`

: uniformly at random`generate_design_grid`

: uniform sized grid`generate_design_lhs`

: Latin hypercube sampling`generate_design_sobol`

: Sobol sequence

mlr3mbo tries to use
intelligent defaults for the `loop_function`

,
`surrogate`

, `acq_function`

, and
`acq_optimizer`

within `OptimizerMbo`

and
`TunerMbo`

.

For details, see `mbo_defaults`

.

mlr3mbo is quite stable in
the sense that - if desired - all kinds of errors can be caught and
handled appropriately within the `loop_function`

.

As an example, let’s have a look at the inner workings of `bayesopt_ego`

:

```
repeat {
= tryCatch({
xdt
.
.
.$surrogate$update()
acq_function$update()
acq_function$optimize()
acq_optimizermbo_error = function(mbo_error_condition) {
}, $info(paste0(class(mbo_error_condition), collapse = " / "))
lg$info("Proposing a randomly sampled point")
lg$new(domain)$sample(1L)$data
SamplerUnif
})
.
.
. }
```

In each iteration, a new candidate is chosen based on updating the
surrogate and acquisition function, and optimizing the acquisition
function. If any error happens during any of these steps, errors are
upgraded to errors of class `"mbo_error"`

(and
`"surrogate_update_error"`

for surrogate related errors as
well as `"acq_optimizer_error"`

for acquisition function
optimization related errors). These errors are then caught and a
fallback is triggered: Evaluating the next candidate chosen uniformly at
random. Note that the same mechanism is actually also used to handle
random interleaving.

To illustrate this error handling mechanism, consider the following scenario: We try to minimize \(f: [-1, 1] \rightarrow \mathbb{R}, x \mapsto x^2\), however, our Gaussian Process fails due to data points being too close to each other:

```
set.seed(2906)
= ps(x = p_dbl(lower = -1, upper = 1))
domain
= ps(y = p_dbl(tags = "minimize"))
codomain
= function(xs) {
objective_function list(y = xs$x ^ 2)
}
= ObjectiveRFun$new(
objective fun = objective_function,
domain = domain,
codomain = codomain)
= OptimInstanceBatchSingleCrit$new(
instance objective = objective,
terminator = trm("evals", n_evals = 10))
= data.table(x = rep(1, 4))
initial_design $eval_batch(initial_design)
instance#> INFO [09:59:00.374] [bbotk] Evaluating 4 configuration(s)
#> INFO [09:59:00.453] [bbotk] Result of batch 1:
#> INFO [09:59:00.460] [bbotk] x y
#> INFO [09:59:00.460] [bbotk] 1 1
#> INFO [09:59:00.460] [bbotk] 1 1
#> INFO [09:59:00.460] [bbotk] 1 1
#> INFO [09:59:00.460] [bbotk] 1 1
```

```
= srlrn(default_gp())
surrogate = acqf("ei")
acq_function = acqo(opt("random_search", batch_size = 1000),
acq_optimizer terminator = trm("evals", n_evals = 1000))
= opt("mbo",
optimizer loop_function = bayesopt_ego,
surrogate = surrogate,
acq_function = acq_function,
acq_optimizer = acq_optimizer)
$optimize(instance)
optimizer#> INFO [09:59:00.709] [bbotk] Starting to optimize 1 parameter(s) with '<OptimizerMbo>' and '<TerminatorEvals> [n_evals=10, k=0]'
#> WARN [09:59:00.905] [bbotk] missing value where TRUE/FALSE needed
#> INFO [09:59:00.907] [bbotk] surrogate_update_error / mbo_error / error / condition
#> INFO [09:59:00.909] [bbotk] Proposing a randomly sampled point
#> INFO [09:59:00.952] [bbotk] Evaluating 1 configuration(s)
#> INFO [09:59:00.989] [bbotk] Result of batch 2:
#> INFO [09:59:00.992] [bbotk] x y
#> INFO [09:59:00.992] [bbotk] 0.6742299 0.4545859
#> INFO [09:59:01.390] [bbotk] Evaluating 1 configuration(s)
#> INFO [09:59:01.425] [bbotk] Result of batch 3:
#> INFO [09:59:01.428] [bbotk] x x_domain acq_ei .already_evaluated y
#> INFO [09:59:01.428] [bbotk] 0.6686506 <list[1]> 0.04358139 FALSE 0.4470936
#> INFO [09:59:01.689] [bbotk] Evaluating 1 configuration(s)
#> INFO [09:59:01.705] [bbotk] Result of batch 4:
#> INFO [09:59:01.707] [bbotk] x x_domain acq_ei .already_evaluated y
#> INFO [09:59:01.707] [bbotk] 0.3883431 <list[1]> 0.1264431 FALSE 0.1508104
#> INFO [09:59:01.878] [bbotk] Evaluating 1 configuration(s)
#> INFO [09:59:01.892] [bbotk] Result of batch 5:
#> INFO [09:59:01.894] [bbotk] x x_domain acq_ei .already_evaluated y
#> INFO [09:59:01.894] [bbotk] -0.5325387 <list[1]> 0.2231668 FALSE 0.2835975
#> INFO [09:59:02.051] [bbotk] Evaluating 1 configuration(s)
#> INFO [09:59:02.065] [bbotk] Result of batch 6:
#> INFO [09:59:02.066] [bbotk] x x_domain acq_ei .already_evaluated y
#> INFO [09:59:02.066] [bbotk] 0.04818857 <list[1]> 0.1256545 FALSE 0.002322138
#> INFO [09:59:02.217] [bbotk] Evaluating 1 configuration(s)
#> INFO [09:59:02.230] [bbotk] Result of batch 7:
#> INFO [09:59:02.232] [bbotk] x x_domain acq_ei .already_evaluated y
#> INFO [09:59:02.232] [bbotk] -0.01951836 <list[1]> 0.002040261 FALSE 0.0003809664
#> INFO [09:59:02.290] [bbotk] Finished optimizing after 10 evaluation(s)
#> INFO [09:59:02.290] [bbotk] Result:
#> INFO [09:59:02.292] [bbotk] x x_domain y
#> INFO [09:59:02.292] [bbotk] <num> <list> <num>
#> INFO [09:59:02.292] [bbotk] -0.01951836 <list[1]> 0.0003809664
#> x x_domain y
#> <num> <list> <num>
#> 1: -0.01951836 <list[1]> 0.0003809664
```

The log tells us that an error happened and was caught:
`"surrogate_update_error / mbo_error / error / condition"`

.
We also see in the `archive`

that the first candidate after
the initial design (the fifth point) was not proposed based on
optimizing the acquisition function (because the `"acq_ei"`

column is NA here):

```
$archive$data
instance#> x y x_domain timestamp batch_nr acq_ei
#> <num> <num> <list> <POSc> <int> <num>
#> 1: 1.00000000 1.0000000000 <list[1]> 2024-07-06 09:59:00 1 NA
#> 2: 1.00000000 1.0000000000 <list[1]> 2024-07-06 09:59:00 1 NA
#> 3: 1.00000000 1.0000000000 <list[1]> 2024-07-06 09:59:00 1 NA
#> 4: 1.00000000 1.0000000000 <list[1]> 2024-07-06 09:59:00 1 NA
#> 5: 0.67422986 0.4545859064 <list[1]> 2024-07-06 09:59:00 2 NA
#> 6: 0.66865060 0.4470936288 <list[1]> 2024-07-06 09:59:01 3 0.043581388
#> 7: 0.38834314 0.1508103944 <list[1]> 2024-07-06 09:59:01 4 0.126443136
#> 8: -0.53253872 0.2835974922 <list[1]> 2024-07-06 09:59:01 5 0.223166767
#> 9: 0.04818857 0.0023221379 <list[1]> 2024-07-06 09:59:02 6 0.125654544
#> 10: -0.01951836 0.0003809664 <list[1]> 2024-07-06 09:59:02 7 0.002040261
#> .already_evaluated
#> <lgcl>
#> 1: NA
#> 2: NA
#> 3: NA
#> 4: NA
#> 5: NA
#> 6: FALSE
#> 7: FALSE
#> 8: FALSE
#> 9: FALSE
#> 10: FALSE
```

Nevertheless, due to the safety net, the BO loop eventually worked just fine and did not simply throw an error.

If we set `catch_errors = FALSE`

within the surrogate, we
see that the error was indeed caused by the surrogate:

```
$archive$clear()
instance$eval_batch(initial_design)
instance#> INFO [09:59:02.325] [bbotk] Evaluating 4 configuration(s)
#> INFO [09:59:02.363] [bbotk] Result of batch 1:
#> INFO [09:59:02.365] [bbotk] x y
#> INFO [09:59:02.365] [bbotk] 1 1
#> INFO [09:59:02.365] [bbotk] 1 1
#> INFO [09:59:02.365] [bbotk] 1 1
#> INFO [09:59:02.365] [bbotk] 1 1
```

```
$surrogate$param_set$values$catch_errors = FALSE
optimizer$optimize(instance)
optimizer#> INFO [09:59:02.375] [bbotk] Starting to optimize 1 parameter(s) with '<OptimizerMbo>' and '<TerminatorEvals> [n_evals=10, k=0]'
#> Error in if (varinit.vario <= 1e-20) varinit.vario <- 1/2 * varinit.vario.total: missing value where TRUE/FALSE needed
```

In case of the error belonging to the
`acq_optimizer_error`

class, it is helpful to increase the
logging level of the acquisition function optimizer (e.g.,
`acq_optimizer$param_set$values$logging_level = "info"`

) and
also set
`acq_optimizer$param_set$values$catch_errors = FALSE`

. This
allows for straightforward debugging.

To make sure that your BO loop behaved as expected, always inspect
the log of the optimization process and inspect the `archive`

and check whether the acquisition function column is populated as
expected.

Writing a custom `loop_function`

is straightforward.

Any `loop_function`

must be an object of the S3 class `loop_function`

(simply a standard R function with some requirements regarding its
arguments and attributes). Arguments of the function must include
`instance`

, `surrogate`

,
`acq_function`

, and `acq_optimizer`

and attributes
must include `id`

(id of the loop function),
`label`

(brief description), `instance`

(“single-crit” and or “multi_crit”), and `man`

(link to the
manual page).

Technically, any `loop_function`

therefore looks like the
following:

```
= function(instance, surrogate, acq_function, acq_optimizer) {
bayesopt_custom # typically some assertions
# initial design handling
# actual loop function
}
class(bayesopt_custom) = "loop_function"
attr(bayesopt_custom, "id") = "bayesopt_custom"
attr(bayesopt_custom, "label") = "My custom BO loop"
attr(bayesopt_custom, "instance") = "single-crit"
attr(bayesopt_custom, "man") = "" # no man page
# if you want to add it to the dictionary: mlr_loop_functions$add("bayesopt_custom", bayesopt_custom)
```

```
bayesopt_custom#> Loop function: bayesopt_custom
#> * Description: My custom BO loop
#> * Supported Instance: single-crit
```

For some inspiration on how to write actual meaningful
`loop_function`

s, see, e.g., `bayesopt_ego`

.

In this final section, some standard examples are provided.

```
= function(xs) {
objective_function list(y = 418.9829 * 2 - (sum(unlist(xs) * sin(sqrt(abs(unlist(xs)))))))
}= ps(x1 = p_dbl(lower = -500, upper = 500),
domain x2 = p_dbl(lower = -500, upper = 500))
= ps(y = p_dbl(tags = "minimize"))
codomain
= ObjectiveRFun$new(
objective fun = objective_function,
domain = domain,
codomain = codomain)
= OptimInstanceBatchSingleCrit$new(
instance objective = objective,
search_space = domain,
terminator = trm("evals", n_evals = 60))
# Gaussian Process, EI, DIRECT
= srlrn(default_gp())
surrogate = acqf("ei")
acq_function = acqo(opt("nloptr", algorithm = "NLOPT_GN_DIRECT_L"),
acq_optimizer terminator = trm("stagnation", threshold = 1e-8))
= opt("mbo",
optimizer loop_function = bayesopt_ego,
surrogate = surrogate,
acq_function = acq_function,
acq_optimizer = acq_optimizer)
set.seed(2906)
$optimize(instance) optimizer
```

```
library(ggplot2)
ggplot(aes(x = batch_nr, y = cummin(y)), data = instance$archive$data) +
geom_point() +
geom_step() +
labs(x = "Batch Nr.", y = "Best y") +
theme_minimal()
```

```
= generate_design_grid(instance$search_space, resolution = 101)$data
xdt = objective$eval_dt(xdt)
ydt ggplot(aes(x = x1, y = x2, z = y), data = cbind(xdt, ydt)) +
geom_contour_filled() +
geom_point(aes(color = batch_nr), size = 2, data = instance$archive$data) +
scale_color_gradient(low = "lightgrey", high = "red") +
theme_minimal()
```

```
= function(xs) {
objective_function list(y1 = xs$x^2, y2 = (xs$x - 2)^2)
}= ps(x = p_dbl(lower = -10, upper = 10))
domain = ps(y1 = p_dbl(tags = "minimize"), y2 = p_dbl(tags = "minimize"))
codomain
= ObjectiveRFun$new(
objective fun = objective_function,
domain = domain,
codomain = codomain)
= OptimInstanceBatchMultiCrit$new(
instance objective = objective,
search_space = domain,
terminator = trm("evals", n_evals = 30))
# Gaussian Process, EI, DIRECT
= srlrn(default_gp())
surrogate = acqf("ei")
acq_function = acqo(opt("nloptr", algorithm = "NLOPT_GN_DIRECT_L"),
acq_optimizer terminator = trm("stagnation", threshold = 1e-8))
= opt("mbo",
optimizer loop_function = bayesopt_parego,
surrogate = surrogate,
acq_function = acq_function,
acq_optimizer = acq_optimizer)
set.seed(2906)
$optimize(instance) optimizer
```

```
ggplot(aes(x = y1, y = y2), data = instance$archive$best()) +
geom_point() +
theme_minimal()
```

```
library(emoa)
library(mlr3misc)
library(data.table)
= map_dtr(unique(instance$archive$data$batch_nr), function(bnr) {
anytime_hypervolume = instance$archive$best(batch = 1:bnr)[, instance$archive$cols_y, with = FALSE]
pareto = dominated_hypervolume(t(pareto), ref = t(t(c(100, 144))))
dhv data.table(batch_nr = bnr, dhv = dhv)
})
ggplot(aes(x = batch_nr, y = dhv), data = anytime_hypervolume[batch_nr > 1]) +
geom_point() +
geom_step(direction = "vh") +
labs(x = "Batch Nr.", y = "Dominated Hypervolume") +
theme_minimal()
```

```
= function(xs) {
objective_function list(y1 = xs$x^2, y2 = (xs$x - 2)^2)
}= ps(x = p_dbl(lower = -10, upper = 10))
domain = ps(y1 = p_dbl(tags = "minimize"), y2 = p_dbl(tags = "minimize"))
codomain
= ObjectiveRFun$new(
objective fun = objective_function,
domain = domain,
codomain = codomain)
= OptimInstanceBatchMultiCrit$new(
instance objective = objective,
search_space = domain,
terminator = trm("evals", n_evals = 30))
# Gaussian Processes, SMS-EGO, DIRECT
= default_gp()
learner_y1 = learner_y1$clone(deep = TRUE)
learner_y2 = srlrn(list(learner_y1, learner_y2))
surrogate = acqf("smsego")
acq_function = acqo(opt("nloptr", algorithm = "NLOPT_GN_DIRECT_L"),
acq_optimizer terminator = trm("stagnation", threshold = 1e-8))
= opt("mbo",
optimizer loop_function = bayesopt_smsego,
surrogate = surrogate,
acq_function = acq_function,
acq_optimizer = acq_optimizer)
set.seed(2906)
$optimize(instance) optimizer
```

```
ggplot(aes(x = y1, y = y2), data = instance$archive$best()) +
geom_point() +
theme_minimal()
```

```
= map_dtr(unique(instance$archive$data$batch_nr), function(bnr) {
anytime_hypervolume = instance$archive$best(batch = 1:bnr)[, instance$archive$cols_y, with = FALSE]
pareto = dominated_hypervolume(t(pareto), ref = t(t(c(100, 144))))
dhv data.table(batch_nr = bnr, dhv = dhv)
})
ggplot(aes(x = batch_nr, y = dhv), data = anytime_hypervolume[batch_nr > 1]) +
geom_point() +
geom_step(direction = "vh") +
labs(x = "Batch Nr.", y = "Dominated Hypervolume") +
theme_minimal()
```

```
library(mlr3)
= tsk("wine")
task = lrn("classif.rpart",
learner cp = to_tune(lower = 1e-4, upper = 1, logscale = TRUE),
maxdepth = to_tune(lower = 1, upper = 10),
minbucket = to_tune(lower = 1, upper = 10),
minsplit = to_tune(lower = 1, upper = 10))
= rsmp("cv", folds = 3)
resampling = msr("classif.acc")
measure
= TuningInstanceBatchSingleCrit$new(
instance task = task,
learner = learner,
resampling = resampling,
measure = measure,
terminator = trm("evals", n_evals = 30))
# Gaussian Process, EI, FocusSearch
= srlrn(default_gp(noisy = TRUE))
surrogate = acqf("ei")
acq_function = acqo(opt("focus_search", n_points = 100L, maxit = 9),
acq_optimizer terminator = trm("evals", n_evals = 3000))
= tnr("mbo",
tuner loop_function = bayesopt_ego,
surrogate = surrogate,
acq_function = acq_function,
acq_optimizer = acq_optimizer)
set.seed(2906)
$optimize(instance)
tuner$result instance
```

```
= tsk("wine")
task = lrn("classif.rpart",
learner cp = to_tune(lower = 1e-4, upper = 1, logscale = TRUE),
maxdepth = to_tune(lower = 1, upper = 10),
minbucket = to_tune(lower = 1, upper = 10),
minsplit = to_tune(lower = 1, upper = 10))
= rsmp("cv", folds = 3)
resampling = msrs(c("classif.acc", "selected_features"))
measures
= TuningInstanceBatchMultiCrit$new(
instance task = task,
learner = learner,
resampling = resampling,
measures = measures,
terminator = trm("evals", n_evals = 30),
store_models = TRUE) # required due to selected features
# Gaussian Process, EI, FocusSearch
= srlrn(default_gp(noisy = TRUE))
surrogate = acqf("ei")
acq_function = acqo(opt("focus_search", n_points = 100L, maxit = 9),
acq_optimizer terminator = trm("evals", n_evals = 3000))
= tnr("mbo",
tuner loop_function = bayesopt_parego,
surrogate = surrogate,
acq_function = acq_function,
acq_optimizer = acq_optimizer)
set.seed(2906)
$optimize(instance)
tuner$result instance
```

Bischl, Bernd, Martin Binder, Michel Lang, Tobias Pielok, Jakob Richter,
Stefan Coors, Janek Thomas, et al. 2023. “Hyperparameter
Optimization: Foundations, Algorithms, Best Practices, and Open
Challenges.” *Wiley Interdisciplinary Reviews: Data Mining and
Knowledge Discovery*, e1484.

Bischl, Bernd, Raphael Sonabend, Lars Kotthoff, and Michel Lang, eds.
2023. *Flexible and Robust Machine Learning
Using mlr3 in R*. https://mlr3book.mlr-org.com.

Garnett, Roman. 2022. *Bayesian Optimization*.
Cambridge University Press. https://bayesoptbook.com/.

Jones, Donald R., Matthias Schonlau, and William J. Welch. 1998.
“Efficient Global Optimization of Expensive Black-Box
Functions.” *Journal of Global Optimization* 13 (4):
455–92. https://doi.org/10.1023/A:1008306431147.