Using DImodelsVis with regression models not fit using the DImodels package

DImodelsVis is very convenient to use with regression models fit using the DImodels or DImodelsMulti R packages. However, there may be situations when users need to work with highly customised and complex models that are not possible to fit using either DImodels or DImodelsMulti. Additionally, users may also desire finer control over the visualisation pipeline and seek to modify both, the plot appearance as well as the underlying data. To accommodate these needs, each visualisation function in DImodelsVis is paired with a separate data preparation (suffixed by _data) and plotting (suffixed by _plot) function. This separation provides great flexibility, as the data preparation functions allow users to create, modify, or extend the prepared data before visualising it, while the plotting functions enable customisation of the plot aesthetics independently. This document provides examples demonstrating how users can leverage this separation to create visualisations in a modular manner and derive insights from any regression model with compositional predictors using DImodelsVis. The examples shown here are created with a regression model fit using glm but the exact same principles would apply for creating visualisations using any other regression modelling tool kit.

Visualisations

The model object stored in cust_model will be used for creating all the visualisations shown in this section. We will follow the “Modular plotting” workflow shown on the right side in Figure 3 from the main manuscript where the data needed for each visualisation will be created first and modified as per our needs and then passed on to the respective plotting function to create the visualisation.

We first define a common set of variables which will be used across the board for ease of use

# Names of the compositional predictors (i.e., species) in the model
species <- c("G1", "G2", "L1", "L2", "H1", "H2")

# Functional groupings of the species
FG <- c("G", "G", "L", "L", "H", "H")

# Colours to be used for the six compostional variables (species) in plots
var_cols <- get_colours(vars = species, FG = FG)

Model diagnostics plot

Let’s first examine the model diagnostics plots for our fitted model to ensure there are no severe problems with the model assumptions.

The data needed for creating the diagnostics plot is prepared using the model_diagnostics_data() function. The primary difference here (and in all other functions) compared to using the “Direct plotting” functions (workflow on left site of Figure 3 in main manuscript) is that the names of the compositional predictors should to be specified in the prop parameter.

diag_data <- model_diagnostics_data(
  model = cust_model,  # Regression model object
  prop = species)      # Names of the compositional predictors

The model_diagnostics_data() function returns the original data fitted to the model with additional columns containing diagnostic information for each observation. The returned data looks as follows.

head(diag_data, 8)
#>   response G1 G2 L1 L2 H1 H2 bfg_GL bfg_GH bfg_LH wfg_G wfg_L wfg_H
#> 1   26.325  1  0  0  0  0  0      0      0      0     0     0     0
#> 2   29.083  1  0  0  0  0  0      0      0      0     0     0     0
#> 3   27.581  1  0  0  0  0  0      0      0      0     0     0     0
#> 4   17.391  0  1  0  0  0  0      0      0      0     0     0     0
#> 5   15.678  0  1  0  0  0  0      0      0      0     0     0     0
#> 6   14.283  0  1  0  0  0  0      0      0      0     0     0     0
#> 7   16.317  0  0  1  0  0  0      0      0      0     0     0     0
#> 8   19.553  0  0  1  0  0  0      0      0      0     0     0     0
#>   treatment150 treatment250      .hat   .sigma      .cooksd  .fitted     .resid
#> 1            0            0 0.1058936 2.374202 7.426182e-04 26.98775 -0.6627540
#> 2            0            0 0.1058936 2.366805 7.422170e-03 26.98775  2.0952460
#> 3            0            0 0.1058936 2.374365 5.950185e-04 26.98775  0.5932460
#> 4            0            0 0.1058936 2.374675 3.142566e-04 16.95987  0.4311335
#> 5            0            0 0.1058936 2.371950 2.778091e-03 16.95987 -1.2818665
#> 6            0            0 0.1058936 2.361594 1.211475e-02 16.95987 -2.6768665
#> 7            0            0 0.1058936 2.357422 1.586474e-02 19.38028 -3.0632750
#> 8            0            0 0.1058936 2.374967 5.043954e-05 19.38028  0.1727250
#>     .stdresid Obs Label        .qq weights
#> 1 -0.29628289   1     1 -0.4016478       1
#> 2  0.93667566   2     2  1.0564117       1
#> 3  0.26520947   3     3  0.3633920       1
#> 4  0.19273739   4     4  0.2699041       1
#> 5 -0.57305593   5     5 -0.5614833       1
#> 6 -1.19668798   6     6 -1.2655513       1
#> 7 -1.36943116   7     7 -1.5523965       1
#> 8  0.07721636   8     8  0.1606864       1

The data-preparation functions also add additional attributes to the prepared data which are accessed by the plotting functions for creating the visualisations. In this case, the primary attribute added to the data is the names of the compositional predictors with the key "prop" and can be accessed as follows.

attr(diag_data, "prop")
#> [1] "G1" "G2" "L1" "L2" "H1" "H2"

The prepared data-frame can be updated if the user desires to add any additional columns or modify any existing ones and then be passed to the model_diagnostics_plot() function for creating the visualisation as follows.

model_diagnostics_plot(
    data = diag_data, # Output data-frame returned by model_diagnostics_data()
    which = c(1, 2),  # Only print first 2 plots to save space
    pie_colours = var_cols # Colours for the pie-slices
  )
#> ✔ Created all plots.

Output from model_diagnostics() function

It might be desirable to reduce over-plotting by showing only the extreme observations as pie-glyphs and rendering the remaining as points. The only_extremes in conjunction with the npoints argument can be used.

model_diagnostics_plot(
    data = diag_data, 
    which = c(1, 2),  
    pie_colours = var_cols,
    only_extremes = TRUE, # Show only extreme points as pie-glyphs
    npoints = 5) # Show the five most extreme points as pie-glyphs (default is 3)
#> ✔ Created all plots.

Output from model_diagnostics() function

The “Residual vs Fitted” plot in the above figure indicates that the residuals are centered around the y = 0 line and have a relatively constant variance across the board. The “QQ-Plot” indicates that the normality assumption doesn’t seem to violated. The observations highlighted with pie-glyphs are primarily containing 'H2' which indicates that the model may not be capturing information about 'H2' very well and we might have to investigate it further.

The figure can also be customised further if needed. For example, suppose the user wishes to overlay the “Residual vs Fitted” plot with pie-glyphs but only for those observations receiving the 250 kg nitrogen treatment. This can be achieved with the following code with the output shown below:

# Extract data for containing observations recieving 250 N treatment
data_250 <- diag_data %>% 
  filter(treatment250 == 1)

# We first create the diagnostic plot
diag_plot <- model_diagnostics_plot(
    data = diag_data, 
    which = c(1),   # Only show residual vs fitted plot
    pie_colours = var_cols,
    only_extremes = TRUE, # Show only extreme points as pie-glyphs
    npoints = 0, # Show the no extreme points as pie-glyphs
  )
#> ✔ Created all plots.

# Now we can use traditional ggplot machinery to modify the plot
diag_plot + 
  # Manually add pie-glyphs using the subsetted data from above
  geom_pie_glyph(data = data_250, slices = species, colour = "black")

Output from model_diagnostics() function

Prediction contributions

The prediction_contributions_data() function can be called to create the template data needed for visualising the contributions of the various terms in the model to the predicted response.

We first create a subset of the data for which the predictions contributions should be visualised. For this example we will visualise the six monocultures, 50-50 mixtures of the two grasses, legumes and herb along with the six-species centroid mixture receiving the 250kg nitrogen treatment. These communities can be subsetted from the data as follows

subset_data <- model_data %>% 
  # Filter only those observations recieving 250 N treatment
  filter(treatment250 == 1) %>% 
  distinct(G1, G2, L1, L2, H1, H2, .keep_all = TRUE) %>% 
  # Filter specific communities
  filter(
    # Six monocultures
    (G1 == 1 | G2 == 1 | L1 == 1 | L2 == 1 | H1 == 1 | H2 == 1) | 
    # Two species mixtures
    ((G1 == 0.5 & G2 == 0.5) | (L1 == 0.5 & L2 == 0.5) | (H1 == 0.5 & H2 == 0.5)) |
    # Six species centroid
    (near(G1, 0.16667, tol = 0.01)))
  
head(subset_data)
#>   richness treatment G1 G2 L1 L2 H1 H2 G L H wfg_G wfg_L wfg_H bfg_GL bfg_GH
#> 1        1       250  1  0  0  0  0  0 1 0 0     0     0     0      0      0
#> 2        1       250  0  1  0  0  0  0 1 0 0     0     0     0      0      0
#> 3        1       250  0  0  1  0  0  0 0 1 0     0     0     0      0      0
#> 4        1       250  0  0  0  1  0  0 0 1 0     0     0     0      0      0
#> 5        1       250  0  0  0  0  1  0 0 0 1     0     0     0      0      0
#> 6        1       250  0  0  0  0  0  1 0 0 1     0     0     0      0      0
#>   bfg_LH treatment50 treatment150 treatment250 response
#> 1      0           0            0            1   32.538
#> 2      0           0            0            1   23.644
#> 3      0           0            0            1   25.242
#> 4      0           0            0            1   24.641
#> 5      0           0            0            1   16.019
#> 6      0           0            0            1   15.449

This data containing the specific commmunities of interest can be passed to the prediction_contributions_data() function to create the template necessary for visualising the contributions.

Note: All predictors variables from the model should be present in the data or the function would throw an error.

pred_cont_data <- prediction_contributions_data(
    model = cust_model, # Model used for making predicitons 
    data = subset_data,  # Data containing the observation for which to make prediction
    # X-axis labels to be used for the bars
    bar_labs = c("G1", "G2", "L1", "L2", "H1", "H2", "G-G", "L-L", "H-H", "Cent.")
  ) 
#> ✔ Finished data preparation.
head(pred_cont_data)
#> # A tibble: 6 × 14
#>   .x_labs .Community  richness treatment     G     L     H treatment50 response
#>   <chr>   <fct>          <int>     <dbl> <dbl> <dbl> <dbl>       <dbl>    <dbl>
#> 1 G1      Community 1        1       250     1     0     0           0     32.5
#> 2 G1      Community 1        1       250     1     0     0           0     32.5
#> 3 G1      Community 1        1       250     1     0     0           0     32.5
#> 4 G1      Community 1        1       250     1     0     0           0     32.5
#> 5 G1      Community 1        1       250     1     0     0           0     32.5
#> 6 G1      Community 1        1       250     1     0     0           0     32.5
#> # ℹ 5 more variables: .Pred <dbl>, .Lower <dbl>, .Upper <dbl>,
#> #   .Contributions <chr>, .Value <dbl>

The prepared data now contains additional columns (.Contributions and .Value) which describe the contributions of each model term to the response for each community. The predicted response (.Pred) along with the uncertainty (95% by default) around it is also present (.Upper and .Lower). Finally, there is additional information about the grouping and labelling for each community (.Community and .x_labs).

Note: If the data passed to this function had \(n\) rows and the model had \(p\) coefficients, the resultant data will have \(n*p\) rows.

If the model was not fit in R, or for some reason if the user doesn’t have access to the fitted model object, it is also possible to obtain the same data by using the model coefficients.

Suppose we only had access to the following coefficients.

(mod_coeffs <- coef(cust_model))
#>           G1           G2           L1           L2           H1           H2 
#>    26.987754    16.959867    19.380275    21.118863    10.447271    12.517346 
#>       bfg_GL       bfg_GH       bfg_LH        wfg_G        wfg_L        wfg_H 
#>    16.174287    21.134709    26.054987    40.878915    15.672343    16.210285 
#> treatment150 treatment250 
#>     3.298532     5.853617

The same data as before could be generated as follows. We specify an additional argument coeff_cols which gives the names of columns corresponding to the coefficients.

pred_cont_data <- prediction_contributions_data(
    coefficients = mod_coeffs, # Model coeffecients 
    coeff_cols = names(mod_coeffs), # Column names corresponding to the coefficients
    data = subset_data,  # Data containing the observation for which to make prediction
    # X-axis labels to be used for the bars
    bar_labs = c("G1", "G2", "L1", "L2", "H1", "H2", "G-G", "L-L", "H-H", "Cent.")
  ) 
#> ✔ Finished data preparation.
head(pred_cont_data)
#> # A tibble: 6 × 14
#>   .x_labs .Community  richness treatment     G     L     H treatment50 response
#>   <chr>   <fct>          <int>     <dbl> <dbl> <dbl> <dbl>       <dbl>    <dbl>
#> 1 G1      Community 1        1       250     1     0     0           0     32.5
#> 2 G1      Community 1        1       250     1     0     0           0     32.5
#> 3 G1      Community 1        1       250     1     0     0           0     32.5
#> 4 G1      Community 1        1       250     1     0     0           0     32.5
#> 5 G1      Community 1        1       250     1     0     0           0     32.5
#> 6 G1      Community 1        1       250     1     0     0           0     32.5
#> # ℹ 5 more variables: .Pred <dbl>, .Lower <dbl>, .Upper <dbl>,
#> #   .Contributions <chr>, .Value <dbl>

Note: When using model coefficients for generating the data, the uncertainty information around the prediction won’t be available by default. However, this information can be obtained as well if the user passes the variance-covariance matrix of the coefficients in the vcov argument.

Using the prepared data, the prediction-contributions plot can be generated using the prediction_contributions_plot() function. By default, only the data argument is needed, however, it often helps to manually specify the colours for each contribution via the 'colours' argument to make the plot more informative.

prediction_contributions_plot(
  data = pred_cont_data, # Output data from prediction_contributions_data()
  colours = c(# Colours for the six species contributions
              var_cols, 
              # Colours for between FG interactions
              get_shades("#DDCC77", shades = 3)[[1]], 
              # Colours for within FG interactions
              get_shades("#D55E00", shades = 3)[[1]],
              # Colours for nitrogen treatment
              get_shades("#505050", shades = 2)[[1]]))
#> ✔ Created plot.

Output from prediction_contributions_plot() function

The above figure shows the G1 monoculture performs best among monocultures and has comparable performance to the six-species centroid mixture. Additionally, the within grass interaction (“wfg_G”) is stronger than the within legume (“wfg_L”) and within herb (“wfg_H”) interactions. However, the between FG interactions (“bfg_*“) contribute more towards the predicted response compared to within FG interactions.

For situations when there are too many predictors in the model, it is also possible to group similar terms so that their total contribution is shown using a single segment on the bar by using the groups argument in the prediction_contributions_data() function. The output visualisation with grouped FG interaction terms is shown below.

pred_cont_data_grouped <- prediction_contributions_data(
    model = cust_model, 
    data = subset_data,
    # We group the three between and within FG interactions into single terms
    groups = list("Between FG ints." = c("bfg_GL", "bfg_GH", "bfg_LH"),
                  "Within FG ints." = c("wfg_G", "wfg_L", "wfg_H")),
    # X-axis labels to be used for the bars
    bar_labs = c("G1", "G2", "L1", "L2", "H1", "H2", "G-G", "L-L", "H-H", "Cent.")
  ) 
#> ✔ Finished data preparation.
# Notice the .Contributions column
head(pred_cont_data_grouped, 10)
#> # A tibble: 10 × 20
#>    .x_labs .Community  richness treatment     G     L     H wfg_G wfg_L wfg_H
#>    <chr>   <fct>          <int>     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 G1      Community 1        1       250     1     0     0     0     0     0
#>  2 G1      Community 1        1       250     1     0     0     0     0     0
#>  3 G1      Community 1        1       250     1     0     0     0     0     0
#>  4 G1      Community 1        1       250     1     0     0     0     0     0
#>  5 G1      Community 1        1       250     1     0     0     0     0     0
#>  6 G1      Community 1        1       250     1     0     0     0     0     0
#>  7 G1      Community 1        1       250     1     0     0     0     0     0
#>  8 G1      Community 1        1       250     1     0     0     0     0     0
#>  9 G1      Community 1        1       250     1     0     0     0     0     0
#> 10 G1      Community 1        1       250     1     0     0     0     0     0
#> # ℹ 10 more variables: bfg_GL <dbl>, bfg_GH <dbl>, bfg_LH <dbl>,
#> #   treatment50 <dbl>, response <dbl>, .Pred <dbl>, .Lower <dbl>, .Upper <dbl>,
#> #   .Contributions <chr>, .Value <dbl>

Notice how the .Contributions column now has the combined between and within FG interactions as opposed to those at the individual functional group level.

grouped_cont_plot <- prediction_contributions_plot(
  data = pred_cont_data_grouped, # Output data from prediction_contributions_data()
  colours = c(# Colours for the six species contributions
              var_cols, 
              # Colours for nitrogen treatment
              get_shades("#505050", shades = 2)[[1]],
              # Colours for between FG interactions
              "#DDCC77", 
              # Colours for within FG interactions
              "#0072B2"))
#> ✔ Created plot.
grouped_cont_plot

Output from prediction_contributions_plot() function

Finally, the output visualisation can also be customised with additional annotations if desired. For example, if the user wishes to overlay a reference line on the plot for comparisons purposes and separate by bars by species richness, they can do so using as follows. The output visualisation with these aesthetic changes is shown below

grouped_cont_plot +
  # Add dashed reference line
  geom_hline(yintercept = 25, linewidth = 1, 
             colour = "black", linetype = "dashed") + 
  facet_grid(~richness, labeller = label_both,
             space = "free_x", scale = "free_x")

Plot output

Conditional ternary diagrams

The conditional_ternary_data() creates function creates a template data.frame which can be passed to conditional_ternary_plot() for creating conditional ternary diagrams. For simple models, calling conditional_ternary_data() would also add the predicted response across the ternary surface. However, for complex models with additional non-compositional predictors, by default the function would try to make predictions for the most reasonable values for these non-compositional predictors (base level for categorical and median value for continuous) but that won’t always be desirable. So users have the option to generate a template data frame without predictions by setting prediction = FALSE. This template can then be modified to include any additional variables required for predictions before being passed to the appropriate plotting function for visualisation.

The following code shows an example of creating such a template where we fix G2 and L2 at specific values and allow G1, L1 and H1 to vary between 0 and \(G2 + L2\).

Note: Since H2 is not specified here, it is assumed to be zero.

cond_tern_template <- conditional_ternary_data(
  prop = species, # Names of the compositional predictors
  tern_vars = c("G1", "L1", "H1"), # Names for the three predictors to show in the ternary
  # The values at which to fix the remaining predictors
  # Any compositional predictors not specified here (e.g. H2) would be assumed to be 0 
  conditional = data.frame("G2" = c(0.2, 0.1), 
                           "L2" = c(0.1, 0.3)),
  # Do not add predictions
  prediction = FALSE)
#> ✔ Finished data preparation.

head(cond_tern_template)
#>   G1        L1          H1          .x .y  G2  L2 H2    .Sp   .Value
#> 1  0 0.7000000 0.000000000 0.000000000  0 0.2 0.1  0 G2, L2 0.2, 0.1
#> 2  0 0.6988314 0.001168614 0.001669449  0 0.2 0.1  0 G2, L2 0.2, 0.1
#> 3  0 0.6976628 0.002337229 0.003338898  0 0.2 0.1  0 G2, L2 0.2, 0.1
#> 4  0 0.6964942 0.003505843 0.005008347  0 0.2 0.1  0 G2, L2 0.2, 0.1
#> 5  0 0.6953255 0.004674457 0.006677796  0 0.2 0.1  0 G2, L2 0.2, 0.1
#> 6  0 0.6941569 0.005843072 0.008347245  0 0.2 0.1  0 G2, L2 0.2, 0.1
#>               .Facet
#> 1 G2 = 0.2; L2 = 0.1
#> 2 G2 = 0.2; L2 = 0.1
#> 3 G2 = 0.2; L2 = 0.1
#> 4 G2 = 0.2; L2 = 0.1
#> 5 G2 = 0.2; L2 = 0.1
#> 6 G2 = 0.2; L2 = 0.1

The template has all the necessary columns along with additional attributes needed for creating conditional ternary diagrams.

attributes(cond_tern_template)[-2]
#> $names
#>  [1] "G1"     "L1"     "H1"     ".x"     ".y"     "G2"     "L2"     "H2"    
#>  [9] ".Sp"    ".Value" ".Facet"
#> 
#> $class
#> [1] "data.frame"
#> 
#> $prop
#> [1] "G1" "G2" "L1" "L2" "H1" "H2"
#> 
#> $tern_vars
#> [1] "G1" "L1" "H1"
#> 
#> $x_proj
#> [1] ".x"
#> 
#> $y_proj
#> [1] ".y"

The necessary additional columns needed for making the predictions can be added using a dplyr pipeline or base R.

Note: If any specific transformation were made to any columns in the raw data before model fitting, the same transformation should be also be added to this template to ensure predictions are accurate.

We create datasets for making two ternaries, one for communities at 150 kg nitrogen treatment and other at 250 kg nitrogen treatment. The between- and within- functional group interaction terms are also added to the data.

Predicting response for communities at 150 kg nitrogen treatment

cond_tern_template150 <- cond_tern_template %>% 
  # Add values for the interaction columns to data
  mutate(wfg_G = G1*G2, wfg_L = L1*L2, wfg_H = H1*H2,
         bfg_GL = (G1 + G2) * (L1 + L2),
         bfg_GH = (G1 + G2) * (H1 + H2),
         bfg_LH = (L1 + L2) * (H1 + H2)) %>%
  # We first want ternary for treatment150
  mutate(treatment150 = 1, treatment250 = 0)

head(cond_tern_template150)
#>   G1        L1          H1          .x .y  G2  L2 H2    .Sp   .Value
#> 1  0 0.7000000 0.000000000 0.000000000  0 0.2 0.1  0 G2, L2 0.2, 0.1
#> 2  0 0.6988314 0.001168614 0.001669449  0 0.2 0.1  0 G2, L2 0.2, 0.1
#> 3  0 0.6976628 0.002337229 0.003338898  0 0.2 0.1  0 G2, L2 0.2, 0.1
#> 4  0 0.6964942 0.003505843 0.005008347  0 0.2 0.1  0 G2, L2 0.2, 0.1
#> 5  0 0.6953255 0.004674457 0.006677796  0 0.2 0.1  0 G2, L2 0.2, 0.1
#> 6  0 0.6941569 0.005843072 0.008347245  0 0.2 0.1  0 G2, L2 0.2, 0.1
#>               .Facet wfg_G      wfg_L wfg_H    bfg_GL       bfg_GH       bfg_LH
#> 1 G2 = 0.2; L2 = 0.1     0 0.07000000     0 0.1600000 0.0000000000 0.0000000000
#> 2 G2 = 0.2; L2 = 0.1     0 0.06988314     0 0.1597663 0.0002337229 0.0009335258
#> 3 G2 = 0.2; L2 = 0.1     0 0.06976628     0 0.1595326 0.0004674457 0.0018643203
#> 4 G2 = 0.2; L2 = 0.1     0 0.06964942     0 0.1592988 0.0007011686 0.0027923835
#> 5 G2 = 0.2; L2 = 0.1     0 0.06953255     0 0.1590651 0.0009348915 0.0037177154
#> 6 G2 = 0.2; L2 = 0.1     0 0.06941569     0 0.1588314 0.0011686144 0.0046403159
#>   treatment150 treatment250
#> 1            1            0
#> 2            1            0
#> 3            1            0
#> 4            1            0
#> 5            1            0
#> 6            1            0

Predicting response for communities at 150 kg nitrogen treatment

cond_tern_template250 <- cond_tern_template %>% 
  # Add additional interaction columns to data
  mutate(wfg_G = G1*G2, wfg_L = L1*L2, wfg_H = H1*H2,
         bfg_GL = (G1 + G2) * (L1 + L2),
         bfg_GH = (G1 + G2) * (H1 + H2),
         bfg_LH = (L1 + L2) * (H1 + H2)) %>%
  # This will create ternary for treatment250
  mutate(treatment150 = 0, treatment250 = 1)

head(cond_tern_template250)
#>   G1        L1          H1          .x .y  G2  L2 H2    .Sp   .Value
#> 1  0 0.7000000 0.000000000 0.000000000  0 0.2 0.1  0 G2, L2 0.2, 0.1
#> 2  0 0.6988314 0.001168614 0.001669449  0 0.2 0.1  0 G2, L2 0.2, 0.1
#> 3  0 0.6976628 0.002337229 0.003338898  0 0.2 0.1  0 G2, L2 0.2, 0.1
#> 4  0 0.6964942 0.003505843 0.005008347  0 0.2 0.1  0 G2, L2 0.2, 0.1
#> 5  0 0.6953255 0.004674457 0.006677796  0 0.2 0.1  0 G2, L2 0.2, 0.1
#> 6  0 0.6941569 0.005843072 0.008347245  0 0.2 0.1  0 G2, L2 0.2, 0.1
#>               .Facet wfg_G      wfg_L wfg_H    bfg_GL       bfg_GH       bfg_LH
#> 1 G2 = 0.2; L2 = 0.1     0 0.07000000     0 0.1600000 0.0000000000 0.0000000000
#> 2 G2 = 0.2; L2 = 0.1     0 0.06988314     0 0.1597663 0.0002337229 0.0009335258
#> 3 G2 = 0.2; L2 = 0.1     0 0.06976628     0 0.1595326 0.0004674457 0.0018643203
#> 4 G2 = 0.2; L2 = 0.1     0 0.06964942     0 0.1592988 0.0007011686 0.0027923835
#> 5 G2 = 0.2; L2 = 0.1     0 0.06953255     0 0.1590651 0.0009348915 0.0037177154
#> 6 G2 = 0.2; L2 = 0.1     0 0.06941569     0 0.1588314 0.0011686144 0.0046403159
#>   treatment150 treatment250
#> 1            0            1
#> 2            0            1
#> 3            0            1
#> 4            0            1
#> 5            0            1
#> 6            0            1

Once we have all the necessary columns needed for making predictions, they can be added either using the base R predict() function or the add_prediction() function from DImodelsVis. Both examples are shown here. Additionally, we also show an example of how to generate the predictions using model coefficients and their variance-covariance matrix instead of the model object.

Note: If using the base R predict() function, please ensure that the predictions are stored in a column called “.Pred” as this is the column that the visualisation function accepts the predictions to be in.

# Add predictions using helper function from DImodelsVis
cond_tern_template150 <- cond_tern_template150 %>% 
  add_prediction(model = cust_model)

# Add predictions using model coefficients
mod_coeffs <- coefficients(cust_model) # Coefficients
vcov_mat <- vcov(cust_model)           # vcov matrix of coefficients

cond_tern_template150 <- cond_tern_template150 %>% 
  add_prediction(coefficients = mod_coeffs,
                 vcov = vcov_mat)

# Add predictions using base R predict function
cond_tern_template250$.Pred <- predict(cust_model,
                                       newdata = cond_tern_template250)

head(cond_tern_template150)
#>   G1        L1          H1          .x .y  G2  L2 H2    .Sp   .Value
#> 1  0 0.7000000 0.000000000 0.000000000  0 0.2 0.1  0 G2, L2 0.2, 0.1
#> 2  0 0.6988314 0.001168614 0.001669449  0 0.2 0.1  0 G2, L2 0.2, 0.1
#> 3  0 0.6976628 0.002337229 0.003338898  0 0.2 0.1  0 G2, L2 0.2, 0.1
#> 4  0 0.6964942 0.003505843 0.005008347  0 0.2 0.1  0 G2, L2 0.2, 0.1
#> 5  0 0.6953255 0.004674457 0.006677796  0 0.2 0.1  0 G2, L2 0.2, 0.1
#> 6  0 0.6941569 0.005843072 0.008347245  0 0.2 0.1  0 G2, L2 0.2, 0.1
#>               .Facet wfg_G      wfg_L wfg_H    bfg_GL       bfg_GH       bfg_LH
#> 1 G2 = 0.2; L2 = 0.1     0 0.07000000     0 0.1600000 0.0000000000 0.0000000000
#> 2 G2 = 0.2; L2 = 0.1     0 0.06988314     0 0.1597663 0.0002337229 0.0009335258
#> 3 G2 = 0.2; L2 = 0.1     0 0.06976628     0 0.1595326 0.0004674457 0.0018643203
#> 4 G2 = 0.2; L2 = 0.1     0 0.06964942     0 0.1592988 0.0007011686 0.0027923835
#> 5 G2 = 0.2; L2 = 0.1     0 0.06953255     0 0.1590651 0.0009348915 0.0037177154
#> 6 G2 = 0.2; L2 = 0.1     0 0.06941569     0 0.1588314 0.0011686144 0.0046403159
#>   treatment150 treatment250    .Pred
#> 1            1            0 26.05353
#> 2            1            0 26.06675
#> 3            1            0 26.07989
#> 4            1            0 26.09296
#> 5            1            0 26.10595
#> 6            1            0 26.11888
head(cond_tern_template250)
#>   G1        L1          H1          .x .y  G2  L2 H2    .Sp   .Value
#> 1  0 0.7000000 0.000000000 0.000000000  0 0.2 0.1  0 G2, L2 0.2, 0.1
#> 2  0 0.6988314 0.001168614 0.001669449  0 0.2 0.1  0 G2, L2 0.2, 0.1
#> 3  0 0.6976628 0.002337229 0.003338898  0 0.2 0.1  0 G2, L2 0.2, 0.1
#> 4  0 0.6964942 0.003505843 0.005008347  0 0.2 0.1  0 G2, L2 0.2, 0.1
#> 5  0 0.6953255 0.004674457 0.006677796  0 0.2 0.1  0 G2, L2 0.2, 0.1
#> 6  0 0.6941569 0.005843072 0.008347245  0 0.2 0.1  0 G2, L2 0.2, 0.1
#>               .Facet wfg_G      wfg_L wfg_H    bfg_GL       bfg_GH       bfg_LH
#> 1 G2 = 0.2; L2 = 0.1     0 0.07000000     0 0.1600000 0.0000000000 0.0000000000
#> 2 G2 = 0.2; L2 = 0.1     0 0.06988314     0 0.1597663 0.0002337229 0.0009335258
#> 3 G2 = 0.2; L2 = 0.1     0 0.06976628     0 0.1595326 0.0004674457 0.0018643203
#> 4 G2 = 0.2; L2 = 0.1     0 0.06964942     0 0.1592988 0.0007011686 0.0027923835
#> 5 G2 = 0.2; L2 = 0.1     0 0.06953255     0 0.1590651 0.0009348915 0.0037177154
#> 6 G2 = 0.2; L2 = 0.1     0 0.06941569     0 0.1588314 0.0011686144 0.0046403159
#>   treatment150 treatment250    .Pred
#> 1            0            1 28.60862
#> 2            0            1 28.62183
#> 3            0            1 28.63497
#> 4            0            1 28.64804
#> 5            0            1 28.66104
#> 6            0            1 28.67397

These datasets with predictions can then be passed to the conditional_ternary_plot() function to create the conditional ternary diagrams. The output visualisation for 150 kg and 250 kg nitrogen treatment are shown below.

To ensure the contours across the two plots are comparable we add the same upper and lower limits across the two using the lower_lim and upper_lim arguments.

lwr_lim <- floor(min(cond_tern_template150$.Pred, cond_tern_template250$.Pred))
upr_lim <- ceiling(max(cond_tern_template150$.Pred, cond_tern_template250$.Pred))

conditional_ternary_plot(cond_tern_template150,
                         contour_text = FALSE,
                         lower_lim = lwr_lim,
                         upper_lim = upr_lim)
#> ✔ Created plot.

Output from conditional_ternary_plot() function

conditional_ternary_plot(cond_tern_template250,
                         contour_text = FALSE,
                         lower_lim = lwr_lim,
                         upper_lim = upr_lim)
#> ✔ Created plot.

Output from conditional_ternary_plot() function

The contours appear to have the same structure across the two treatment levels, but the predicted yields are higher for the communities receiving 250 kg of N/ha/annum. Within a given level of the treatment, the predicted yield is maximised by increasing the proportion of G1.

Grouped ternary diagrams

The grouped_ternary_data() creates function creates a template data.frame which can be passed to grouped_ternary_plot() for creating grouped ternary diagrams. All functionality of this function is similar to that of the data-preparation and plotting functions for conditional ternary diagrams.

As an example, we first call the grouped_ternary_data() function to create a template data.frame for grouped ternary diagram to which we will add the predictions later. Additionally, we use the add_var argument to specify that we want to fix treatment to be at 50 and 250 kg/annum. These values can still be added later to the data as we did for the example in the conditional ternary section. However, the benefit for adding them using add_var is that the function adds a special column to the data which will notify the plotting function to show the ternaries as separate panels within the same plot which allows us to avoid having to create separate plots for the two treatment levels.

Note: The add_var argument is available for all data-preparation functions in the package.}

group_tern_data <- grouped_ternary_data(
  prediction = FALSE,                         # Do not make predictions now
  prop = species,                             # Compositional predictors
  FG = FG,                                    # Functional grouping
  # Relative split within each FG. Equal split in this example
  values = c(0.5, 0.5, 0.5, 0.5, 0.5, 0.5),    
  # Values for additional non-compsitional predictors
  add_var = list(treatment150 = c(0, 0),      
                 treatment250 = c(0, 1))
)
#> ✔ Finished data preparation.
head(group_tern_data)
#>   G         L           H          .x .y G1 G2        L1        L2           H1
#> 1 0 1.0000000 0.000000000 0.000000000  0  0  0 0.5000000 0.5000000 0.0000000000
#> 2 0 0.9983306 0.001669449 0.001669449  0  0  0 0.4991653 0.4991653 0.0008347245
#> 3 0 0.9966611 0.003338898 0.003338898  0  0  0 0.4983306 0.4983306 0.0016694491
#> 4 0 0.9949917 0.005008347 0.005008347  0  0  0 0.4974958 0.4974958 0.0025041736
#> 5 0 0.9933222 0.006677796 0.006677796  0  0  0 0.4966611 0.4966611 0.0033388982
#> 6 0 0.9916528 0.008347245 0.008347245  0  0  0 0.4958264 0.4958264 0.0041736227
#>             H2 treatment150 treatment250                        .add_str_ID
#> 1 0.0000000000            0            0 treatment150: 0; \ttreatment250: 0
#> 2 0.0008347245            0            0 treatment150: 0; \ttreatment250: 0
#> 3 0.0016694491            0            0 treatment150: 0; \ttreatment250: 0
#> 4 0.0025041736            0            0 treatment150: 0; \ttreatment250: 0
#> 5 0.0033388982            0            0 treatment150: 0; \ttreatment250: 0
#> 6 0.0041736227            0            0 treatment150: 0; \ttreatment250: 0

The .add_str_ID column in the data notifies the plotting function that we wish to have separate ternaries for each unique value in this column. Optionally, we can edit the values in this column to get more informative labels.

group_tern_data <- group_tern_data %>% 
  mutate(.add_str_ID = ifelse(.add_str_ID == "treatment150: 0; \ttreatment250: 0",
                              "Treatment: 50", "Treatment: 250"))

We can now add any interaction columns needed for making predictions and generate the predictions using any of the previously highlighted methods. The prepared data with the predictions added can then be passed to the grouped_ternary_plot() function to generate the visualisation. Additionally, we also show how the plot object can be further updated to include the best-performing community in each ternary using the geom_point() function from ggplot2.

# Add interaction columns and the prediction
group_tern_data <- group_tern_data %>% 
  # Add additional interaction columns to data
  mutate(wfg_G = G1*G2, wfg_L = L1*L2, wfg_H = H1*H2,
         bfg_GL = (G1 + G2) * (L1 + L2),
         bfg_GH = (G1 + G2) * (H1 + H2),
         bfg_LH = (L1 + L2) * (H1 + H2)) %>%
  # Add predictions
  add_prediction(model = cust_model)

# Create visualisation
group_tern_plot <- grouped_ternary_plot(
  data = group_tern_data,
  lower_lim = 15,
  upper_lim = 40,
  nlevels = 5)
#> ✔ Created all plots.

# Update plot to highlight the best-performing community in each ternary
group_tern_plot +
  # Add point highlighting the best performing community in each ternary
  geom_point(data = function(x) x %>% filter(.Pred == max(.Pred)),
             colour = "#505050", shape = 18, size = 4.5) +
  # Reduce size of legend so they fit
  theme(legend.key.width = unit(0.07, "npc"))

Output from grouped_ternary_plot() function

The contours appear to have the same structure across the two treatment levels, but the predicted yields are higher for the communities receiving 250 kg of N/annum. Within a given level of the treatment, the predicted yield is maximised by increasing the proportion of the grasses (G). The best performing community at each treatment level contains 100% grasses with no other functional group. Since the FG proportion is split equally between the component species in this example, the best performing community contains 50% each of G1 and G2.

Simplex path plot

For this example, the simplex-path plots would be useful to visualise the effect of increasing the functional group proportion on the predicted yield. We will visualise the change in the predicted yield between the six-species centroid mixture and the binary mixtures containing the two species from each functional group.

We prepare the data for the visualisation using the simplex_path_data() function. As before, first the template without predictions is prepared and then predictions are added in manually after adding in any additional columns needed for the predictions.

starts <- tribble(~G1, ~G2, ~L1, ~L2, ~H1, ~H2,
                  1/6, 1/6, 1/6, 1/6, 1/6, 1/6)
ends <- tribble(~G1, ~G2, ~L1, ~L2, ~H1, ~H2,
                1/2, 1/2,   0,   0,   0,  0,
                  0,   0, 1/2, 1/2,   0,  0,
                  0,   0,   0,   0, 1/2, 1/2)

path_data <- simplex_path_data(
  starts = starts,               # Starting compositions
  ends = ends,                   # Ending compositions
  prop = species,                # Names of compositional predictors
  prediction = FALSE,            # Don't make predictions now
  # We will visualise the paths at 150 and 250 level of treatment
  add_var = list(treatment = c("150", "250"))
)
#> ✔ Finished data preparation.

Let’s now add the interaction terms and convert the treatments into dummy variables.

path_data <- path_data %>% 
  # Add additional interaction columns to data
  mutate(wfg_G = G1*G2, wfg_L = L1*L2, wfg_H = H1*H2,
         bfg_GL = (G1 + G2) * (L1 + L2),
         bfg_GH = (G1 + G2) * (H1 + H2),
         bfg_LH = (L1 + L2) * (H1 + H2)) %>% 
  # Add dummy variables for treatment
  mutate(treatment50 = ifelse(treatment == 50, 1, 0),
         treatment150 = ifelse(treatment == 150, 1, 0),
         treatment250 = ifelse(treatment == 250, 1, 0)) %>% 
  # Add predictions and uncertainty around it
  add_prediction(model = cust_model, interval = "confidence")

head(path_data)
#>          G1        G2        L1        L2        H1        H2 .InterpConst
#> 1 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667         0.00
#> 2 0.1700000 0.1700000 0.1650000 0.1650000 0.1650000 0.1650000         0.01
#> 3 0.1733333 0.1733333 0.1633333 0.1633333 0.1633333 0.1633333         0.02
#> 4 0.1766667 0.1766667 0.1616667 0.1616667 0.1616667 0.1616667         0.03
#> 5 0.1800000 0.1800000 0.1600000 0.1600000 0.1600000 0.1600000         0.04
#> 6 0.1833333 0.1833333 0.1583333 0.1583333 0.1583333 0.1583333         0.05
#>   .Group treatment    .add_str_ID      wfg_G      wfg_L      wfg_H    bfg_GL
#> 1      1       150 treatment: 150 0.02777778 0.02777778 0.02777778 0.1111111
#> 2      1       150 treatment: 150 0.02890000 0.02722500 0.02722500 0.1122000
#> 3      1       150 treatment: 150 0.03004444 0.02667778 0.02667778 0.1132444
#> 4      1       150 treatment: 150 0.03121111 0.02613611 0.02613611 0.1142444
#> 5      1       150 treatment: 150 0.03240000 0.02560000 0.02560000 0.1152000
#> 6      1       150 treatment: 150 0.03361111 0.02506944 0.02506944 0.1161111
#>      bfg_GH    bfg_LH treatment50 treatment150 treatment250    .Pred   .Lower
#> 1 0.1111111 0.1111111           0            1            0 30.26202 29.35551
#> 2 0.1122000 0.1089000           0            1            0 30.31401 29.40732
#> 3 0.1132444 0.1067111           0            1            0 30.36600 29.45878
#> 4 0.1142444 0.1045444           0            1            0 30.41800 29.50990
#> 5 0.1152000 0.1024000           0            1            0 30.47000 29.56069
#> 6 0.1161111 0.1002778           0            1            0 30.52201 29.61114
#>     .Upper
#> 1 31.16854
#> 2 31.22070
#> 3 31.27322
#> 4 31.32609
#> 5 31.37932
#> 6 31.43289

The data is expanded to include the following columns * .InterpConst: The interpolation constant between the start and end points (will be between 0 and 1). * .Group: An identifier for creating the effects curve. * .Pred: The predicted response for a community. * .Lower: The lower interval (at \(\alpha = 0.05\) level) for the predicted response. * .Upper: The upper interval (at \(\alpha = 0.05\) level) for the predicted response.

Note: As mentioned earlier, any columns added to the data before model-fitting should also be added to the template generated from the data-preparation functions to ensure we get accurate predictions. One could also add these columns directly to the starts and ends data.frames as well. However, it is recommended to not do so for two reasons. First, using the add_var argument will notify the corresponding plotting function to show the plots in separate panels for values specified in it. Secondly, the data-preparation functions might not always calculate the interaction terms properly if they are added before generating the template.}

The prepared data can then be passed to the simplex_path_plot() function to create the visualisation. We also demonstrate how certain plot aesthetics can be modified using the arguments from the simplex_path_plot() function as well as the labs() function from ggplot2.

simplex_path_plot(
  data = path_data,       # Data
  pie_colours = var_cols, # Colours for the pie-glyphs
  pie_radius = 0.35,      # Radius of pie-glyphs
  se = TRUE,              # Show confidence band around predictions
  # Show pie-glyphs at 0%, 25%, 50%, 75% and 100% along each path
  pie_positions = c(0, 0.25, 0.5, 0.75, 1), 
  nrow = 1, ncol = 2 # Arrange all plots in a 1 row and 2 columns
) +
  labs(y = "Predicted yield", fill = "Species") +
  ylim(min(path_data$.Lower), max(path_data$.Upper))
#> ✔ Created all plots.
#> Scale for y is already present.
#> Adding another scale for y, which will replace the existing scale.
#> Scale for y is already present.
#> Adding another scale for y, which will replace the existing scale.

Output from simplex_path_plot() function

The predicted yield for every community is higher at the 250 treatment level compared to the 150 level. At each level of treatment, compared to the centroid community the predicted yield increases as the sown proportion of grasses increases and decreases as the sown proportion of legumes and herbs increases.

Effect plots

For generating effects plot, just like the previous examples, we first prepare the data for the visualisation using the visualise_effects_data() function. We use the same subset-data that used earlier with the prediction_contributions_data() function and included the six monocultures, 50-50 mixtures of the two grasses, legumes and herb along with the six-species centroid mixture receiving the 250kg nitrogen treatment.

effects_data <- visualise_effects_data(
  data = subset_data,            # Data containing the communities to use for plot
  prop = species,                # Names of compositional predictors
  var_interest = species,        # Generate effects plot for all species
  prediction = FALSE             # Don't make predictions now
)
#> ✔ Finished data preparation.

Let’s add the interaction terms and generate the predictions.

effects_data <- effects_data %>% 
  # Add additional interaction columns to data
  mutate(wfg_G = G1*G2, wfg_L = L1*L2, wfg_H = H1*H2,
         bfg_GL = (G1 + G2) * (L1 + L2),
         bfg_GH = (G1 + G2) * (H1 + H2),
         bfg_LH = (L1 + L2) * (H1 + H2)) %>% 
  # Add predictions
  add_prediction(model = cust_model)
head(effects_data)
#>     G1   G2 L1 L2 H1 H2 richness treatment G L H  wfg_G wfg_L wfg_H bfg_GL
#> 1 0.00 1.00  0  0  0  0        1       250 1 0 0 0.0000     0     0      0
#> 2 0.01 0.99  0  0  0  0        1       250 1 0 0 0.0099     0     0      0
#> 3 0.02 0.98  0  0  0  0        1       250 1 0 0 0.0196     0     0      0
#> 4 0.03 0.97  0  0  0  0        1       250 1 0 0 0.0291     0     0      0
#> 5 0.04 0.96  0  0  0  0        1       250 1 0 0 0.0384     0     0      0
#> 6 0.05 0.95  0  0  0  0        1       250 1 0 0 0.0475     0     0      0
#>   bfg_GH bfg_LH treatment50 treatment150 treatment250 response .Sp .Proportion
#> 1      0      0           0            0            1   23.644  G1        0.00
#> 2      0      0           0            0            1   23.644  G1        0.01
#> 3      0      0           0            0            1   23.644  G1        0.02
#> 4      0      0           0            0            1   23.644  G1        0.03
#> 5      0      0           0            0            1   23.644  G1        0.04
#> 6      0      0           0            0            1   23.644  G1        0.05
#>   .Group  .Effect    .Pred
#> 1      1 increase 22.81348
#> 2      1 increase 23.31846
#> 3      1 increase 23.81527
#> 4      1 increase 24.30390
#> 5      1 increase 24.78435
#> 6      1 increase 25.25663

The data is expanded to include the following columns

.Sp: The variable of interest who’s proportion is changed.
.Proportion: The proportion of the variable of interest in a community.
.Group: An identifier for creating the effects curve.
.Effect: A string describing whether the proportion of the variable of interest is increased or decreased.
.Pred: The predicted response for a community.
.Lower: The lower interval (at \(% confidence level = 0.95\) level) for the predicted response.
.Upper: The upper interval (at \(% confidence level = 0.95\) level) for the predicted response.

The prepared data with the predicted yields can then be passed to the visualise_effects_plot() to generate the effects plot for each compositional predictor.

visualise_effects_plot(
  data = effects_data,    # Data
  pie_colours = var_cols, # Colours for the pie-glyphs
  pie_radius = 0.35       # Radius of pie-glyphs
) +
  labs(y = "Predicted yield", fill = "Species")
#> ✔ Created plot.

Output from visualise_effects_plot() function

The average effect of increasing the proportion of each species follows a quadratic curve (solid black line). The average effect of increasing the sown proportion of G1 is stronger compared to the all other species. The other species have similar performance as their sown proportion is increased. Moreover, increasing the sown proportion of any species besides G1 over 50% results in a sharp decline in the predicted yield.

Using DImodelsVis with regression models not fit using the DImodels package

Load libraries

Load data

Model fitting

Fitting the model

Visualisations

Model diagnostics plot

Prediction contributions

Conditional ternary diagrams

Grouped ternary diagrams

Simplex path plot

Effect plots