mlr3mbo makes Bayesian Optimization (BO) available within the mlr3 ecosystem. BO can be used for optimizing any black box function, and is very suitable for hyperparameter optimization of machine learning models. mlr3mbo allows for building custom BO algorithms relying on building blocks in a modular fashion, but also provides a variety of standard single- and multi-objective BO algorithms that can be used in a straightforward manner.
We assume that the reader is somewhat familiar with black box optimization and bbotk, hyperparameter optimization and mlr3tuning and knows the basics of BO. Background material is, for example, given by Garnett (2022), Bischl, Binder, et al. (2023), and Bischl, Sonabend, et al. (2023).
BO is an iterative optimization algorithm that makes use of a so-called surrogate to model the unknown black box function. After having observed an initial design of observations, the surrogate model is trained on all data points observed so far and an acquisition function is used to determine which points of the search space are promising candidates that should be evaluated next. The acquisition function relies on the prediction of the surrogate model and requires no evaluation of the true black box function and therefore is comparably cheap to optimize. After having evaluated the next candidate, the process repeats itself until a given termination criteria is met.
Most BO flavors therefore follow a simple loop:
In the following, the basic building blocks of BO and their implementation in mlr3mbo are introduced in more detail.
The loop_function
determines the behavior of the BO algorithm on a global level, i.e., how
the subroutine should look like that is performed at each iteration.
To get an overview of readily available loop_function
s,
the following dictionary can be inspected:
library(mlr3mbo)
library(data.table)
as.data.table(mlr_loop_functions)
#> Key: <key>
#> key label instance
#> <char> <char> <char>
#> 1: bayesopt_ego Efficient Global Optimization single-crit
#> 2: bayesopt_emo Multi-Objective EGO multi-crit
#> 3: bayesopt_mpcl Multipoint Constant Liar single-crit
#> 4: bayesopt_parego ParEGO multi-crit
#> 5: bayesopt_smsego SMS-EGO multi-crit
#> man
#> <char>
#> 1: mlr3mbo::mlr_loop_functions_ego
#> 2: mlr3mbo::mlr_loop_functions_emo
#> 3: mlr3mbo::mlr_loop_functions_mpcl
#> 4: mlr3mbo::mlr_loop_functions_parego
#> 5: mlr3mbo::mlr_loop_functions_smsego
The dictionary shows the key
, i.e., ID of the
loop_function
, a brief description, for which optimization
instance the resulting BO flavor can be used, as well as how
documentation can be accessed.
Technically, all loop_function
s are members of the
S3
class loop_function
,
and are simply decorated functions
(because using an R6Class
class would be over the top here - but this may change in the
future).
To write an own loop_function
, users can get inspiration
from the readily available ones, e.g., bayesopt_ego
which performs sequential single-objective optimization:
After having made some assertions and safety checks, and having
evaluated the initial design, bayesopt_ego
essentially only
performs the following steps:
acq_function$surrogate$update()
# update the surrogate
modelacq_function$update()
# update the acquisition function
(e.g., update the best value observed so far)acq_optimizer$optimize()
# optimize the acquisition
function to yield a new candidateA surrogate encapsulates a regression learner that models the unknown
black box function based on observed data. In mlr3mbo, SurrogateLearner
and SurrogateLearnerCollection
are the higher-level R6
classes which should be used to
construct a surrogate, inheriting from the base Surrogate
class.
As a learner, any LearnerRegr
from mlr3 can be used, however,
most acquisition functions require both a mean and a variance prediction
(therefore not all learners are suitable for all scenarios). Typical
choices include:
Gaussian Process
for low dimensional numeric search spacesRandom Forest
for higher dimensional mixed (and / or hierarchical) search spacesA SurrogateLearner
can be constructed via:
library(mlr3learners)
#> Loading required package: mlr3
= SurrogateLearner$new(lrn("regr.km")) surrogate
or using syntactic sugar:
= srlrn(lrn("regr.km")) surrogate
Note that default_gp
and default_rf
allow for the construction of a Gaussian Process and random forest with
sensible hyperparamter values already specified.
The encapsulated learner can be accessed via the
$learner
field:
$learner
surrogate#> <LearnerRegrKM:regr.km>: Kriging
#> * Model: -
#> * Parameters: list()
#> * Packages: mlr3, mlr3learners, DiceKriging
#> * Predict Types: response, [se]
#> * Feature Types: logical, integer, numeric
#> * Properties: -
The surrogate itself has the following hyperparameters:
$param_set
surrogate#> <ParamSet(4)>
#> Key: <id>
#> id class lower upper nlevels default
#> <char> <char> <num> <num> <num> <list>
#> 1: assert_insample_perf ParamLgl NA NA 2 <NoDefault[0]>
#> 2: catch_errors ParamLgl NA NA 2 <NoDefault[0]>
#> 3: perf_measure ParamUty NA NA Inf <NoDefault[0]>
#> 4: perf_threshold ParamDbl -Inf Inf Inf <NoDefault[0]>
#> parents value
#> <list> <list>
#> 1: FALSE
#> 2: TRUE
#> 3: assert_insample_perf
#> 4: assert_insample_perf
assert_insample_perf = TRUE
results in the insample
performance of the learner being calculated and asserted against a
performance threshold after each $update()
. This requires
the specification of a perf_measure
(any regression
measure, e.g., R squared
)
and a perf_threshold
. If the threshold is not met, an error
is thrown (that is caught within the optimization loop - unless
catch_errors = FALSE
and results in, e.g., proposing the
next candidate uniformly at random). For more details on error handling,
see the Safety Nets section.
Note that this insample performance assertion is not always meaningful, e.g., in the case of using a Gaussian Process with no nugget, the insample performance will always be perfect.
Internally, the learner is fitted on a regression task
constructed from the Archive
of the OptimInstance
that is to be optimized and features and the target variable are
determined automatically but can also be specified via the
$cols_x
and $cols_y
active bindings. Ideally,
the archive
is already passed during construction, however,
lazy initialization is also possible (i.e., the $archive
field will be automatically populated within the optimization routine of
an OptimizerMbo
).
Important methods are $update()
and
$predict()
with the former one typically being used within
the loop_function
and the latter one being used within the
implementation of an acquisition function.
Depending on the choice of the loop_function
, multiple
targets must be modelled by (currently independent) surrogates, in which
case a SurrogateLearnerCollection
should be used.
Construction and hyperparameters are analogous to the single target
scenario described above.
To get an overview of all available regression learners within the mlr3 ecosystem, use:
library(mlr3)
library(mlr3learners)
# there are plenty of more in mlr3extralearners
# library(mlr3extralearners)
= as.data.table(mlr_learners)
learners == "regr"] learners[task_type
To use a custom learner not included in mlr3, mlr3learners, or mlr3extralearners, you
can inherit from LearnerRegr
and use this custom learner within the surrogate.
Based on a surrogate, an acquisition function quantifies the attractiveness of each point of the search space if it were to be evaluated in the next iteration.
A popular example is given by the Expected Improvement (Jones, Schonlau, and Welch 1998):
\[ \mathbb{E} \left[ \max \left( f_{\mathrm{min}} - Y(\mathbf{x}), 0 \right) \right] \] Here, \(Y(\mathbf{x})\) is the surrogate prediction (a random variable) for a given point \(\mathbf{x}\) (which when using a Gaussian Process follows a normal distribution) and \(f_{\mathrm{min}}\) is the currently best function value observed so far (when assuming minimization).
To get an overview of available acquisition functions, the following dictionary can be inspected:
as.data.table(mlr_acqfunctions)
#> Key: <key>
#> key label
#> <char> <char>
#> 1: aei Augmented Expected Improvement
#> 2: cb Lower / Upper Confidence Bound
#> 3: ehvi Expected Hypervolume Improvement
#> 4: ehvigh Expected Hypervolume Improvement via GH Quadrature
#> 5: ei Expected Improvement
#> 6: eips Expected Improvement Per Second
#> 7: mean Posterior Mean
#> 8: pi Probability Of Improvement
#> 9: sd Posterior Standard Deviation
#> 10: smsego SMS-EGO
#> man
#> <char>
#> 1: mlr3mbo::mlr_acqfunctions_aei
#> 2: mlr3mbo::mlr_acqfunctions_cb
#> 3: mlr3mbo::mlr_acqfunctions_ehvi
#> 4: mlr3mbo::mlr_acqfunctions_ehvigh
#> 5: mlr3mbo::mlr_acqfunctions_ei
#> 6: mlr3mbo::mlr_acqfunctions_eips
#> 7: mlr3mbo::mlr_acqfunctions_mean
#> 8: mlr3mbo::mlr_acqfunctions_pi
#> 9: mlr3mbo::mlr_acqfunctions_sd
#> 10: mlr3mbo::mlr_acqfunctions_smsego
The dictionary shows the key
, i.e., ID of the
acquisition function, a brief description, and how the documentation can
be accessed.
Technically, all acquisition functions inherit from the
R6
class AcqFunction
which itself simply inherits from the base Objective
class.
Construction is straightforward via:
= AcqFunctionEI$new() acq_function
or using syntactic sugar:
= acqf("ei") acq_function
Internally, the domain
and codomain
are
constructed based on the Archive
referenced by the Surrogate
and therefore the surrogate
should be passed as an argument
already during construction.
However, lazy initialization is also possible.
In the case of the acquisition function itself being parameterized, hyperparameters should be passed as constants, e.g.:
acqf("cb") # lower / upper confidence bound with lambda hyperparameter
#> <AcqFunctionCB:acq_cb>
#> Domain:
#> <ParamSet(0)>
#> Empty.
#> Codomain:
#> <Codomain(0)>
#> Empty.
#> Constants:
#> <ParamSet(1)>
#> id class lower upper nlevels default value
#> <char> <char> <num> <num> <num> <list> <list>
#> 1: lambda ParamDbl 0 Inf Inf 2 2
To use a custom acquisition function you should implement a new
R6
class inheriting from AcqFunction
.
To find the most promising candidate for evaluation, the acquisition
function itself must be optimized. Internally, an OptimInstance
is constructed using the acquisition function as an Objective
.
An acquisition function optimizer is then used to solve this
optimization problem. Technically, this optimizer is a member of the AcqOptimizer
R6
class.
Construction requires specifying an Optimizer
as well as a Terminator
:
library(bbotk)
#> Loading required package: paradox
= AcqOptimizer$new(opt("random_search"), terminator = trm("evals")) acq_optimizer
Syntactic sugar:
= acqo(opt("random_search"), terminator = trm("evals")) acq_optimizer
The optimizer and terminator can be accessed via the
$optimizer
and $terminator
fields:
$optimizer
acq_optimizer#> <OptimizerBatchRandomSearch>: Random Search
#> * Parameters: batch_size=1
#> * Parameter classes: ParamLgl, ParamInt, ParamDbl, ParamFct
#> * Properties: dependencies, single-crit, multi-crit
#> * Packages: bbotk
$terminator
acq_optimizer#> <TerminatorEvals>: Number of Evaluation
#> * Parameters: n_evals=100, k=0
Internally, the acquisition function optimizer also requires the
acquisition function and therefore the acq_function
argument should be specified during construction.
However, lazy initialization is also possible.
An AcqOptimizer
has the following hyperparameters:
$param_set
acq_optimizer#> <ParamSet(6)>
#> Key: <id>
#> id class lower upper nlevels default parents
#> <char> <char> <num> <num> <num> <list> <list>
#> 1: catch_errors ParamLgl NA NA 2 TRUE
#> 2: logging_level ParamFct NA NA 6 warn
#> 3: n_candidates ParamInt 1 Inf Inf 1
#> 4: skip_already_evaluated ParamLgl NA NA 2 TRUE
#> 5: warmstart ParamLgl NA NA 2 FALSE
#> 6: warmstart_size ParamInt 1 Inf Inf <NoDefault[0]> warmstart
#> value
#> <list>
#> 1: TRUE
#> 2: warn
#> 3: 1
#> 4: TRUE
#> 5: FALSE
#> 6:
catch_errors = TRUE
results in catching any errors that
can happen during the acquisition function optimization which allows
for, e.g., proposing the next candidate uniformly at random within the
loop_function
. For more details on this mechanism, see the
Safety Nets section.
logging_level
specifies the logging level during
acquisition function optimization. Often it is useful to only log the
progress of the BO loop and therefore logging_level
is set
to "warn"
by default. For debugging purposes, this should
be set to "info"
.
skip_already_evaluated = TRUE
will result in not proposing
candidates for evaluation that were already evaluated in previous
iterations. warmstart = TRUE
results in the best
warmstart_size
points present in the archive
of the OptimInstance
to also be evaluated on the
acquisition function OptimInstance
prior to running the
actual acquisition function optimization. This is especially useful in
the context of using evolutionary algorithms or variants of local search
as the acquisition function optimizer (as the current best points should
usually be part of the initial population to further optimize local
optima).
To get an overview of all Optimizer
s implemented in bbotk you can use:
as.data.table(mlr_optimizers)
And similarly for Terminator
s:
as.data.table(mlr_terminators)
Again, you can also use custom Optimizer
s or
Terminator
s by implementing new R6
classes
inheriting from Optimizer
and Terminator
respectively.
Having introduced all building blocks we are now ready to put
everything together in the form of an OptimizerMbo
or TunerMbo
.
OptimizerMbo
inherits from Optimizer
and requires a loop_function
, surrogate
,
acq_function
and acq_optimizer
.
Construction is performed via:
= OptimizerMbo$new(bayesopt_ego,
optimizer surrogate = surrogate,
acq_function = acq_function,
acq_optimizer = acq_optimizer)
or using syntactic sugar:
= opt("mbo",
optimizer loop_function = bayesopt_ego,
surrogate = surrogate,
acq_function = acq_function,
acq_optimizer = acq_optimizer)
optimizer#> <OptimizerMbo>: Model Based Optimization
#> * Parameter classes: ParamLgl, ParamInt, ParamDbl
#> * Properties: single-crit
#> * Packages: mlr3mbo, mlr3, mlr3learners, DiceKriging, bbotk
#> * Loop function: bayesopt_ego
#> * Surrogate: LearnerRegrKM
#> * Acquisition Function: AcqFunctionEI
#> * Acquisition Function Optimizer: (OptimizerBatchRandomSearch |
#> TerminatorEvals)
Additional arguments, i.e., arguments of the
loop_function
can be passed via the args
argument during construction or the $args
active binding.
Finally, the mechanism how the final result is obtained after the
optimization process (i.e., the best point in the case of
single-objective and the Pareto set in the case of multi-objective
optimization) can be changed via the result_assigner
argument during construction or the $result_assigner
active
binding. As an example, ResultAssignerSurrogate
will choose the final solution based on the prediction of the surrogate
instead of the evaluations logged in the archive
which is
sensible in the case of noisy objective functions. The default, however,
is to use ResultAssignerArchive
which will directly choose the final solution based on the evaluations
logged in the archive
. To get an overview of available
result assigners, the following dictionary can be inspected:
as.data.table(mlr_result_assigners)
#> Key: <key>
#> key label man
#> <char> <char> <char>
#> 1: archive Archive mlr3mbo::mlr_result_assigners_archive
#> 2: surrogate Mean Surrogate Prediction mlr3mbo::mlr_result_assigners_surrogate
The dictionary shows the key
, i.e., ID of the result
assigner, a brief description, and how the documentation can be
accessed.
Construction of result assigners is straightforward:
= ResultAssignerArchive$new() result_assigner
Syntactic sugar:
= ras("archive") result_assigner
Note that important fields of an OptimizerMbo
such as
$param_classes
, $packages
,
$properties
are automatically determined based on the
choice of the loop_function
, surrogate
,
acq_function
, acq_optimizer
, and
result_assigner
.
If arguments such as the surrogate
,
acq_function
, acq_optimizer
and
result_assigner
were not fully initialized during
construction, e.g., the surrogate
missing the
archive
, or the acq_function
missing the
surrogate
, lazy initialization is completed prior to the
optimizer being used for optimization.
An object of class OptimizerMbo
can be used to optimize
an object of class OptimInstanceBatchSingleCrit
or
OptimInstanceBatchMultiCrit
.
For hyperparameter optimization, TunerMbo
should be used (which simply relies on an OptimizerMbo
that
is constructed internally):
= TunerMbo$new(bayesopt_ego,
tuner surrogate = surrogate,
acq_function = acq_function,
acq_optimizer = acq_optimizer)
::get_private(tuner)[[".optimizer"]]
mlr3misc#> <OptimizerMbo>: Model Based Optimization
#> * Parameter classes: ParamLgl, ParamInt, ParamDbl
#> * Properties: single-crit
#> * Packages: mlr3mbo, mlr3, mlr3learners, DiceKriging, bbotk
#> * Loop function: bayesopt_ego
#> * Surrogate: LearnerRegrKM
#> * Acquisition Function: AcqFunctionEI
#> * Acquisition Function Optimizer: (OptimizerBatchRandomSearch |
#> TerminatorEvals)
mlr3mbo offers two different ways for specifying an initial design:
OptimInstance
that is to be optimized prior to using an OptimizerMbo
. In
this case, the loop_function
should skip the construction
and evaluation of an initial design.OptimInstance
, the loop_function
should
construct an initial design itself and evaluate it, e.g.,
bayesopt_ego
then constructs an initial design of size
\(4D\) where \(D\) is the dimensionality of the search
space by sampling points uniformly at random.Functions for creating different initial designs are part of the paradox package, e.g.:
generate_design_random
:
uniformly at randomgenerate_design_grid
:
uniform sized gridgenerate_design_lhs
:
Latin hypercube samplinggenerate_design_sobol
:
Sobol sequencemlr3mbo tries to use
intelligent defaults for the loop_function
,
surrogate
, acq_function
, and
acq_optimizer
within OptimizerMbo
and
TunerMbo
.
For details, see mbo_defaults
.
mlr3mbo is quite stable in
the sense that - if desired - all kinds of errors can be caught and
handled appropriately within the loop_function
.
As an example, let’s have a look at the inner workings of bayesopt_ego
:
repeat {
= tryCatch({
xdt
.
.
.$surrogate$update()
acq_function$update()
acq_function$optimize()
acq_optimizermbo_error = function(mbo_error_condition) {
}, $info(paste0(class(mbo_error_condition), collapse = " / "))
lg$info("Proposing a randomly sampled point")
lg$new(domain)$sample(1L)$data
SamplerUnif
})
.
.
. }
In each iteration, a new candidate is chosen based on updating the
surrogate and acquisition function, and optimizing the acquisition
function. If any error happens during any of these steps, errors are
upgraded to errors of class "mbo_error"
(and
"surrogate_update_error"
for surrogate related errors as
well as "acq_optimizer_error"
for acquisition function
optimization related errors). These errors are then caught and a
fallback is triggered: Evaluating the next candidate chosen uniformly at
random. Note that the same mechanism is actually also used to handle
random interleaving.
To illustrate this error handling mechanism, consider the following scenario: We try to minimize \(f: [-1, 1] \rightarrow \mathbb{R}, x \mapsto x^2\), however, our Gaussian Process fails due to data points being too close to each other:
set.seed(2906)
= ps(x = p_dbl(lower = -1, upper = 1))
domain
= ps(y = p_dbl(tags = "minimize"))
codomain
= function(xs) {
objective_function list(y = xs$x ^ 2)
}
= ObjectiveRFun$new(
objective fun = objective_function,
domain = domain,
codomain = codomain)
= OptimInstanceBatchSingleCrit$new(
instance objective = objective,
terminator = trm("evals", n_evals = 10))
= data.table(x = rep(1, 4))
initial_design $eval_batch(initial_design)
instance#> INFO [09:59:00.374] [bbotk] Evaluating 4 configuration(s)
#> INFO [09:59:00.453] [bbotk] Result of batch 1:
#> INFO [09:59:00.460] [bbotk] x y
#> INFO [09:59:00.460] [bbotk] 1 1
#> INFO [09:59:00.460] [bbotk] 1 1
#> INFO [09:59:00.460] [bbotk] 1 1
#> INFO [09:59:00.460] [bbotk] 1 1
= srlrn(default_gp())
surrogate = acqf("ei")
acq_function = acqo(opt("random_search", batch_size = 1000),
acq_optimizer terminator = trm("evals", n_evals = 1000))
= opt("mbo",
optimizer loop_function = bayesopt_ego,
surrogate = surrogate,
acq_function = acq_function,
acq_optimizer = acq_optimizer)
$optimize(instance)
optimizer#> INFO [09:59:00.709] [bbotk] Starting to optimize 1 parameter(s) with '<OptimizerMbo>' and '<TerminatorEvals> [n_evals=10, k=0]'
#> WARN [09:59:00.905] [bbotk] missing value where TRUE/FALSE needed
#> INFO [09:59:00.907] [bbotk] surrogate_update_error / mbo_error / error / condition
#> INFO [09:59:00.909] [bbotk] Proposing a randomly sampled point
#> INFO [09:59:00.952] [bbotk] Evaluating 1 configuration(s)
#> INFO [09:59:00.989] [bbotk] Result of batch 2:
#> INFO [09:59:00.992] [bbotk] x y
#> INFO [09:59:00.992] [bbotk] 0.6742299 0.4545859
#> INFO [09:59:01.390] [bbotk] Evaluating 1 configuration(s)
#> INFO [09:59:01.425] [bbotk] Result of batch 3:
#> INFO [09:59:01.428] [bbotk] x x_domain acq_ei .already_evaluated y
#> INFO [09:59:01.428] [bbotk] 0.6686506 <list[1]> 0.04358139 FALSE 0.4470936
#> INFO [09:59:01.689] [bbotk] Evaluating 1 configuration(s)
#> INFO [09:59:01.705] [bbotk] Result of batch 4:
#> INFO [09:59:01.707] [bbotk] x x_domain acq_ei .already_evaluated y
#> INFO [09:59:01.707] [bbotk] 0.3883431 <list[1]> 0.1264431 FALSE 0.1508104
#> INFO [09:59:01.878] [bbotk] Evaluating 1 configuration(s)
#> INFO [09:59:01.892] [bbotk] Result of batch 5:
#> INFO [09:59:01.894] [bbotk] x x_domain acq_ei .already_evaluated y
#> INFO [09:59:01.894] [bbotk] -0.5325387 <list[1]> 0.2231668 FALSE 0.2835975
#> INFO [09:59:02.051] [bbotk] Evaluating 1 configuration(s)
#> INFO [09:59:02.065] [bbotk] Result of batch 6:
#> INFO [09:59:02.066] [bbotk] x x_domain acq_ei .already_evaluated y
#> INFO [09:59:02.066] [bbotk] 0.04818857 <list[1]> 0.1256545 FALSE 0.002322138
#> INFO [09:59:02.217] [bbotk] Evaluating 1 configuration(s)
#> INFO [09:59:02.230] [bbotk] Result of batch 7:
#> INFO [09:59:02.232] [bbotk] x x_domain acq_ei .already_evaluated y
#> INFO [09:59:02.232] [bbotk] -0.01951836 <list[1]> 0.002040261 FALSE 0.0003809664
#> INFO [09:59:02.290] [bbotk] Finished optimizing after 10 evaluation(s)
#> INFO [09:59:02.290] [bbotk] Result:
#> INFO [09:59:02.292] [bbotk] x x_domain y
#> INFO [09:59:02.292] [bbotk] <num> <list> <num>
#> INFO [09:59:02.292] [bbotk] -0.01951836 <list[1]> 0.0003809664
#> x x_domain y
#> <num> <list> <num>
#> 1: -0.01951836 <list[1]> 0.0003809664
The log tells us that an error happened and was caught:
"surrogate_update_error / mbo_error / error / condition"
.
We also see in the archive
that the first candidate after
the initial design (the fifth point) was not proposed based on
optimizing the acquisition function (because the "acq_ei"
column is NA here):
$archive$data
instance#> x y x_domain timestamp batch_nr acq_ei
#> <num> <num> <list> <POSc> <int> <num>
#> 1: 1.00000000 1.0000000000 <list[1]> 2024-07-06 09:59:00 1 NA
#> 2: 1.00000000 1.0000000000 <list[1]> 2024-07-06 09:59:00 1 NA
#> 3: 1.00000000 1.0000000000 <list[1]> 2024-07-06 09:59:00 1 NA
#> 4: 1.00000000 1.0000000000 <list[1]> 2024-07-06 09:59:00 1 NA
#> 5: 0.67422986 0.4545859064 <list[1]> 2024-07-06 09:59:00 2 NA
#> 6: 0.66865060 0.4470936288 <list[1]> 2024-07-06 09:59:01 3 0.043581388
#> 7: 0.38834314 0.1508103944 <list[1]> 2024-07-06 09:59:01 4 0.126443136
#> 8: -0.53253872 0.2835974922 <list[1]> 2024-07-06 09:59:01 5 0.223166767
#> 9: 0.04818857 0.0023221379 <list[1]> 2024-07-06 09:59:02 6 0.125654544
#> 10: -0.01951836 0.0003809664 <list[1]> 2024-07-06 09:59:02 7 0.002040261
#> .already_evaluated
#> <lgcl>
#> 1: NA
#> 2: NA
#> 3: NA
#> 4: NA
#> 5: NA
#> 6: FALSE
#> 7: FALSE
#> 8: FALSE
#> 9: FALSE
#> 10: FALSE
Nevertheless, due to the safety net, the BO loop eventually worked just fine and did not simply throw an error.
If we set catch_errors = FALSE
within the surrogate, we
see that the error was indeed caused by the surrogate:
$archive$clear()
instance$eval_batch(initial_design)
instance#> INFO [09:59:02.325] [bbotk] Evaluating 4 configuration(s)
#> INFO [09:59:02.363] [bbotk] Result of batch 1:
#> INFO [09:59:02.365] [bbotk] x y
#> INFO [09:59:02.365] [bbotk] 1 1
#> INFO [09:59:02.365] [bbotk] 1 1
#> INFO [09:59:02.365] [bbotk] 1 1
#> INFO [09:59:02.365] [bbotk] 1 1
$surrogate$param_set$values$catch_errors = FALSE
optimizer$optimize(instance)
optimizer#> INFO [09:59:02.375] [bbotk] Starting to optimize 1 parameter(s) with '<OptimizerMbo>' and '<TerminatorEvals> [n_evals=10, k=0]'
#> Error in if (varinit.vario <= 1e-20) varinit.vario <- 1/2 * varinit.vario.total: missing value where TRUE/FALSE needed
In case of the error belonging to the
acq_optimizer_error
class, it is helpful to increase the
logging level of the acquisition function optimizer (e.g.,
acq_optimizer$param_set$values$logging_level = "info"
) and
also set
acq_optimizer$param_set$values$catch_errors = FALSE
. This
allows for straightforward debugging.
To make sure that your BO loop behaved as expected, always inspect
the log of the optimization process and inspect the archive
and check whether the acquisition function column is populated as
expected.
Writing a custom loop_function
is straightforward.
Any loop_function
must be an object of the S3 class loop_function
(simply a standard R function with some requirements regarding its
arguments and attributes). Arguments of the function must include
instance
, surrogate
,
acq_function
, and acq_optimizer
and attributes
must include id
(id of the loop function),
label
(brief description), instance
(“single-crit” and or “multi_crit”), and man
(link to the
manual page).
Technically, any loop_function
therefore looks like the
following:
= function(instance, surrogate, acq_function, acq_optimizer) {
bayesopt_custom # typically some assertions
# initial design handling
# actual loop function
}
class(bayesopt_custom) = "loop_function"
attr(bayesopt_custom, "id") = "bayesopt_custom"
attr(bayesopt_custom, "label") = "My custom BO loop"
attr(bayesopt_custom, "instance") = "single-crit"
attr(bayesopt_custom, "man") = "" # no man page
# if you want to add it to the dictionary: mlr_loop_functions$add("bayesopt_custom", bayesopt_custom)
bayesopt_custom#> Loop function: bayesopt_custom
#> * Description: My custom BO loop
#> * Supported Instance: single-crit
For some inspiration on how to write actual meaningful
loop_function
s, see, e.g., bayesopt_ego
.
In this final section, some standard examples are provided.
= function(xs) {
objective_function list(y = 418.9829 * 2 - (sum(unlist(xs) * sin(sqrt(abs(unlist(xs)))))))
}= ps(x1 = p_dbl(lower = -500, upper = 500),
domain x2 = p_dbl(lower = -500, upper = 500))
= ps(y = p_dbl(tags = "minimize"))
codomain
= ObjectiveRFun$new(
objective fun = objective_function,
domain = domain,
codomain = codomain)
= OptimInstanceBatchSingleCrit$new(
instance objective = objective,
search_space = domain,
terminator = trm("evals", n_evals = 60))
# Gaussian Process, EI, DIRECT
= srlrn(default_gp())
surrogate = acqf("ei")
acq_function = acqo(opt("nloptr", algorithm = "NLOPT_GN_DIRECT_L"),
acq_optimizer terminator = trm("stagnation", threshold = 1e-8))
= opt("mbo",
optimizer loop_function = bayesopt_ego,
surrogate = surrogate,
acq_function = acq_function,
acq_optimizer = acq_optimizer)
set.seed(2906)
$optimize(instance) optimizer
library(ggplot2)
ggplot(aes(x = batch_nr, y = cummin(y)), data = instance$archive$data) +
geom_point() +
geom_step() +
labs(x = "Batch Nr.", y = "Best y") +
theme_minimal()
= generate_design_grid(instance$search_space, resolution = 101)$data
xdt = objective$eval_dt(xdt)
ydt ggplot(aes(x = x1, y = x2, z = y), data = cbind(xdt, ydt)) +
geom_contour_filled() +
geom_point(aes(color = batch_nr), size = 2, data = instance$archive$data) +
scale_color_gradient(low = "lightgrey", high = "red") +
theme_minimal()
= function(xs) {
objective_function list(y1 = xs$x^2, y2 = (xs$x - 2)^2)
}= ps(x = p_dbl(lower = -10, upper = 10))
domain = ps(y1 = p_dbl(tags = "minimize"), y2 = p_dbl(tags = "minimize"))
codomain
= ObjectiveRFun$new(
objective fun = objective_function,
domain = domain,
codomain = codomain)
= OptimInstanceBatchMultiCrit$new(
instance objective = objective,
search_space = domain,
terminator = trm("evals", n_evals = 30))
# Gaussian Process, EI, DIRECT
= srlrn(default_gp())
surrogate = acqf("ei")
acq_function = acqo(opt("nloptr", algorithm = "NLOPT_GN_DIRECT_L"),
acq_optimizer terminator = trm("stagnation", threshold = 1e-8))
= opt("mbo",
optimizer loop_function = bayesopt_parego,
surrogate = surrogate,
acq_function = acq_function,
acq_optimizer = acq_optimizer)
set.seed(2906)
$optimize(instance) optimizer
ggplot(aes(x = y1, y = y2), data = instance$archive$best()) +
geom_point() +
theme_minimal()
library(emoa)
library(mlr3misc)
library(data.table)
= map_dtr(unique(instance$archive$data$batch_nr), function(bnr) {
anytime_hypervolume = instance$archive$best(batch = 1:bnr)[, instance$archive$cols_y, with = FALSE]
pareto = dominated_hypervolume(t(pareto), ref = t(t(c(100, 144))))
dhv data.table(batch_nr = bnr, dhv = dhv)
})
ggplot(aes(x = batch_nr, y = dhv), data = anytime_hypervolume[batch_nr > 1]) +
geom_point() +
geom_step(direction = "vh") +
labs(x = "Batch Nr.", y = "Dominated Hypervolume") +
theme_minimal()
= function(xs) {
objective_function list(y1 = xs$x^2, y2 = (xs$x - 2)^2)
}= ps(x = p_dbl(lower = -10, upper = 10))
domain = ps(y1 = p_dbl(tags = "minimize"), y2 = p_dbl(tags = "minimize"))
codomain
= ObjectiveRFun$new(
objective fun = objective_function,
domain = domain,
codomain = codomain)
= OptimInstanceBatchMultiCrit$new(
instance objective = objective,
search_space = domain,
terminator = trm("evals", n_evals = 30))
# Gaussian Processes, SMS-EGO, DIRECT
= default_gp()
learner_y1 = learner_y1$clone(deep = TRUE)
learner_y2 = srlrn(list(learner_y1, learner_y2))
surrogate = acqf("smsego")
acq_function = acqo(opt("nloptr", algorithm = "NLOPT_GN_DIRECT_L"),
acq_optimizer terminator = trm("stagnation", threshold = 1e-8))
= opt("mbo",
optimizer loop_function = bayesopt_smsego,
surrogate = surrogate,
acq_function = acq_function,
acq_optimizer = acq_optimizer)
set.seed(2906)
$optimize(instance) optimizer
ggplot(aes(x = y1, y = y2), data = instance$archive$best()) +
geom_point() +
theme_minimal()
= map_dtr(unique(instance$archive$data$batch_nr), function(bnr) {
anytime_hypervolume = instance$archive$best(batch = 1:bnr)[, instance$archive$cols_y, with = FALSE]
pareto = dominated_hypervolume(t(pareto), ref = t(t(c(100, 144))))
dhv data.table(batch_nr = bnr, dhv = dhv)
})
ggplot(aes(x = batch_nr, y = dhv), data = anytime_hypervolume[batch_nr > 1]) +
geom_point() +
geom_step(direction = "vh") +
labs(x = "Batch Nr.", y = "Dominated Hypervolume") +
theme_minimal()
library(mlr3)
= tsk("wine")
task = lrn("classif.rpart",
learner cp = to_tune(lower = 1e-4, upper = 1, logscale = TRUE),
maxdepth = to_tune(lower = 1, upper = 10),
minbucket = to_tune(lower = 1, upper = 10),
minsplit = to_tune(lower = 1, upper = 10))
= rsmp("cv", folds = 3)
resampling = msr("classif.acc")
measure
= TuningInstanceBatchSingleCrit$new(
instance task = task,
learner = learner,
resampling = resampling,
measure = measure,
terminator = trm("evals", n_evals = 30))
# Gaussian Process, EI, FocusSearch
= srlrn(default_gp(noisy = TRUE))
surrogate = acqf("ei")
acq_function = acqo(opt("focus_search", n_points = 100L, maxit = 9),
acq_optimizer terminator = trm("evals", n_evals = 3000))
= tnr("mbo",
tuner loop_function = bayesopt_ego,
surrogate = surrogate,
acq_function = acq_function,
acq_optimizer = acq_optimizer)
set.seed(2906)
$optimize(instance)
tuner$result instance
= tsk("wine")
task = lrn("classif.rpart",
learner cp = to_tune(lower = 1e-4, upper = 1, logscale = TRUE),
maxdepth = to_tune(lower = 1, upper = 10),
minbucket = to_tune(lower = 1, upper = 10),
minsplit = to_tune(lower = 1, upper = 10))
= rsmp("cv", folds = 3)
resampling = msrs(c("classif.acc", "selected_features"))
measures
= TuningInstanceBatchMultiCrit$new(
instance task = task,
learner = learner,
resampling = resampling,
measures = measures,
terminator = trm("evals", n_evals = 30),
store_models = TRUE) # required due to selected features
# Gaussian Process, EI, FocusSearch
= srlrn(default_gp(noisy = TRUE))
surrogate = acqf("ei")
acq_function = acqo(opt("focus_search", n_points = 100L, maxit = 9),
acq_optimizer terminator = trm("evals", n_evals = 3000))
= tnr("mbo",
tuner loop_function = bayesopt_parego,
surrogate = surrogate,
acq_function = acq_function,
acq_optimizer = acq_optimizer)
set.seed(2906)
$optimize(instance)
tuner$result instance