The package includes several weighting schemes that can be parameterized, as well as custom configuration options. Users can also decide whether applying weights to the confusion matrix should raise or lower the accuracy score. “wconf” integrates well with the “caret” package, but it can also work standalone when provided with data in matrix form.

Applying a weighting scheme to the confusion matrix can be useful in applications such as performance evaluation, where characteristics such as “underperforming”, “acceptable”, “overperforming” and “worker of the year” may represent gradations that are far apart and unevenly spaced. Similarly, where the objective is to classify geographic regions and proximity of the prediction to the actual region constitutes an advantage in terms of the model’s performance, applying a weighting scheme facilitates the model selection process.

Functions are included to calculate accuracy metrics for imbalanced data. Specifically, the package allows users to compute the Starovoitov-Golub sine-accuracy function, as well as the balanced accuracy function and the standard accuracy indicator.

**wconf** consists of the following functions:

The **weightmatrix** function allows users to choose from different weighting schemes and experiment with parametrizations and custom configurations.

**weightmatrix( n, weight.type,
weight.penalty, standard.deviation,
geometric.multiplier, interval.high,
interval.low, custom.weights,
plot.weights)**

*n* – the number of classes contained in the confusion
matrix.

*weight.type* – the weighting scheme to be used. Can be one
of: “arithmetic” - a decreasing arithmetic progression weighting scheme,
“geometric” - a decreasing geometric progression weighting scheme,
“normal” - weights drawn from the right tail of a normal distribution,
“interval” - weights contained on a user-defined interval, “custom” -
custom weight vector defined by the user.

*weight.penalty* – determines whether the weights associated
with non-diagonal elements generated by the “normal”, “arithmetic” and
“geometric” weight types are positive or negative values. By default,
the value is set to FALSE, which means that generated weights will be
positive values.

*standard.deviation* – standard deviation of the normal
distribution, if the normal distribution weighting scheme is used.

*geometric.multiplier* – the multiplier used to construct the
geometric progression series, if the geometric progression weighting
scheme is used.

*interval.high* – the upper bound of the weight interval, if
the interval weighting scheme is used.

*interval.low* – the lower bound of the weight interval, if
the interval weighting scheme is used.

*custom.weights* – the vector of custom weights to be applied,
if the custom weighting scheme is selected. The length of the vector should
equal “n”, but it can be larger, with excess values being ignored.

*plot.weights* – optional setting to enable plotting of the weight
vector, which corresponds to the first column of the weight matrix.
The **wconfusionmatrix** function calculates the weighted confusion matrix by multiplying, element-by-element, a weight matrix with a supplied confusion matrix object.

**wconfusionmatrix( m, weight.type,
weight.penalty, standard.deviation,
geometric.multiplier, interval.high,
interval.low, custom.weights,
print.weighted.accuracy)**

*m* – the caret confusion matrix object or simple matrix.

*weight.type* – the weighting scheme to be used. Can be one
of: “arithmetic” - a decreasing arithmetic progression weighting scheme,
“geometric” - a decreasing geometric progression weighting scheme,
“normal” - weights drawn from the right tail of a normal distribution,
“interval” - weights contained on a user-defined interval, “custom” -
custom weight vector defined by the user.

*weight.penalty* – determines whether the weights associated
with non-diagonal elements generated by the “normal”, “arithmetic” and
“geometric” weight types are positive or negative values. By default,
the value is set to FALSE, which means that generated weights will be
positive values.

*standard.deviation* – standard deviation of the normal
distribution, if the normal distribution weighting scheme is used.

*geometric.multiplier* – the multiplier used to construct the
geometric progression series, if the geometric progression weighting
scheme is used.

*interval.high* – the upper bound of the weight interval, if
the interval weighting scheme is used.

*interval.low* – the lower bound of the weight interval, if
the interval weighting scheme is used.

*custom.weights* – the vector of custom weights to be applied,
if the custom weighting scheme is selected. The length of the vector should
equal “n”, the number of classes, but it can be larger, with excess values being ignored.

*print.weighted.accuracy* – optional setting to print the
weighted accuracy metric, which represents the sum of all weighted
confusion matrix cells divided by the total number of observations.
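
The weighted accuracy described above can be reproduced by hand in base R. The sketch below uses a made-up 3-class confusion matrix and a made-up weight matrix (both are illustrative assumptions, not package output) and computes the element-by-element product, followed by the sum of weighted cells divided by the total number of observations.

```r
# Illustrative 3-class confusion matrix (made-up counts)
m <- matrix(c(50,  5,  0,
              10, 40,  5,
               0,  5, 35), nrow = 3, byrow = TRUE)

# Illustrative weight matrix: full credit on the diagonal,
# half credit one class away, no credit two classes away
W <- matrix(c(1.0, 0.5, 0.0,
              0.5, 1.0, 0.5,
              0.0, 0.5, 1.0), nrow = 3, byrow = TRUE)

weighted_m <- W * m                             # element-by-element product
weighted_accuracy <- sum(weighted_m) / sum(m)   # weighted cells / observations
standard_accuracy <- sum(diag(m)) / sum(m)      # unweighted, for comparison

print(weighted_m)
print(c(standard = standard_accuracy, weighted = weighted_accuracy))
```

With positive off-diagonal weights, near misses earn partial credit, so the weighted accuracy in this example is higher than the standard accuracy.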

The **rconfusionmatrix** function calculates the redistributed confusion matrix by reallocating observations classified in the vicinity of the true category to the confusion matrix diagonal, according to a user-specified weighting scheme that determines the proportion of observations to reassign.

**rconfusionmatrix( m, custom.weights,
print.weighted.accuracy)**

*m* – the caret confusion matrix object or simple matrix.

*custom.weights* – the vector of custom weights to be applied.
The length of the vector should equal “n”, the number of classes, but it can be
larger, with the first value and all excess values being ignored.

*print.weighted.accuracy* – optional setting to print the
standard redistributed accuracy metric, which represents the sum of all
observations on the diagonal divided by the total number of
observations.
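
To make the idea of redistribution concrete, here is a minimal base R sketch. It assumes that columns hold the true classes (the layout used by caret confusion matrix tables) and that the weight at distance d from the diagonal is the share of a misclassified cell that is moved to the diagonal cell of its true class. The helper `sketch_redistribute` is hypothetical and only approximates the behavior described above; it is not the package's implementation.

```r
# Hypothetical helper, NOT part of wconf.
# m: square confusion matrix, columns assumed to hold the true classes.
# p: p[d + 1] is the share of a cell at distance d that is moved to the
#    diagonal; p[1] (distance 0) is never used.
sketch_redistribute <- function(m, p) {
  n <- nrow(m)
  r <- m
  for (j in seq_len(n)) {      # j: true class (column)
    for (i in seq_len(n)) {    # i: predicted class (row)
      if (i != j) {
        moved <- m[i, j] * p[abs(i - j) + 1]
        r[i, j] <- r[i, j] - moved
        r[j, j] <- r[j, j] + moved
      }
    }
  }
  r
}

m <- matrix(c(50,  5,  0,
              10, 40,  5,
               0,  5, 35), nrow = 3, byrow = TRUE)
p <- c(0, 0.5, 0)  # move half of the adjacent-class errors, none further away

r <- sketch_redistribute(m, p)
redistributed_accuracy <- sum(diag(r)) / sum(r)
print(r)
print(redistributed_accuracy)
```

The total number of observations is unchanged by the reallocation; only their position in the matrix moves.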

The **balancedaccuracy** function calculates classification accuracy scores using the sine-based formulas proposed by Starovoitov and Golub (2020). Compared with the standard balanced accuracy function, the sine-based method takes into account the class distribution of errors, which makes it useful when working with imbalanced data.

**balancedaccuracy( m,
print.scores)**

The function takes as input:

*m* – the caret confusion matrix object or simple matrix.

*print.scores* – used to display the accuracy scores when set
to TRUE.
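
For reference, the two conventional scores mentioned in this README, standard accuracy and balanced accuracy, can be computed in base R as sketched below. The example assumes that columns hold the true classes and uses made-up counts for an imbalanced two-class problem. The sine-based formulas of Starovoitov and Golub are not reproduced here.

```r
# Illustrative imbalanced confusion matrix (made-up counts);
# columns are assumed to hold the true classes.
m <- matrix(c(90, 5,
              10, 5), nrow = 2, byrow = TRUE)

# Standard accuracy: correctly classified observations / all observations
standard_accuracy <- sum(diag(m)) / sum(m)

# Balanced accuracy: average of the per-class recall values
recall <- diag(m) / colSums(m)
balanced_accuracy <- mean(recall)

print(c(standard = standard_accuracy, balanced = balanced_accuracy))
```

Here the minority class is recognized only half of the time, which the balanced score reflects far more strongly than the standard score.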

For custom specifications, the weights are not bound to any given interval, so, depending on the user configuration, it is possible to obtain negative accuracy scores.
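
A small base R sketch shows how a negative score can arise. The counts and the penalizing weights below are made up for illustration, and the calculation is done by hand rather than through the package.

```r
# Made-up confusion matrix in which most observations are misclassified
m <- matrix(c(2, 8,
              9, 1), nrow = 2, byrow = TRUE)

# Made-up custom weights: 1 on the diagonal, -2 for every error
W <- matrix(c( 1, -2,
              -2,  1), nrow = 2, byrow = TRUE)

# Weighted accuracy: sum of weighted cells / number of observations
weighted_accuracy <- sum(W * m) / sum(m)
print(weighted_accuracy)
```

Because the penalties on the errors outweigh the credit earned on the diagonal, the resulting score falls below zero.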

You can download **wconf** directly from GitHub. To do
so, you need to have the **devtools** package installed and
loaded. Once you are in **R**, run the following
commands:

install.packages("devtools")

library("devtools")

install_github("alexandrumonahov/wconf")

You may encounter download errors from GitHub if you are behind a firewall or if HTTPS downloads are restricted. To work around this, you can try running one of the following commands before installing (the “wininet” method is available on Windows only):

options(download.file.method = "libcurl")

options(download.file.method = "wininet")

Once the package is installed, you can load it using the
**library(wconf)** command.

Alexandru Monahov, 2024