This vignette illustrates the standard use of the `PLNLDA`

function and the methods accompanying the R6 Classes `PLNLDA`

and `PLNLDAfit`

.

The packages required for the analysis are **PLNmodels**
plus some others for data manipulation and representation:

We illustrate our point with the trichoptera data set, a full description of which can be found in the corresponding vignette. Data preparation is also detailed in the specific vignette.

The `trichoptera`

data frame stores a matrix of counts
(`trichoptera$Abundance`

), a matrix of offsets
(`trichoptera$Offset`

) and some vectors of covariates
(`trichoptera$Wind`

, `trichoptera$Temperature`

,
etc.) In the following, we’re particularly interested in the
`trichoptera$Group`

**discrete** covariate which
corresponds to disjoint time spans during which the catching took place.
The correspondence between group label and time spans is:

Label | Number.of.Consecutive.Nights | Date |
---|---|---|

1 | 12 | June 59 |

2 | 5 | June 59 |

3 | 5 | June 59 |

4 | 4 | June 59 |

5 | 4 | July 59 |

6 | 1 | June 59 |

7 | 3 | June 60 |

8 | 4 | June 60 |

9 | 5 | June 60 |

10 | 4 | June 60 |

11 | 1 | June 60 |

12 | 1 | July 60 |

In the vein of Fisher (1936) and Rao (1948), we introduce a multi-class LDA model for multivariate count data which is a variant of the Poisson Lognormal model of Aitchison and Ho (1989) (see the PLN vignette as a reminder). Indeed, it can viewed as a PLN model with a discrete group structure in the latent Gaussian space.

This PLN-LDA model can be written in a hierarchical framework where a sample of \(p\)-dimensional observation vectors \(\mathbf{Y}_i\) is related to some \(p\)-dimensional vectors of latent variables \(\mathbf{Z}_i\) and a discrete structure with \(K\) groups in the following way: \[\begin{equation} \begin{array}{rcl} \text{group structure } & \mathbf{\mu}_i = \mu_{g_i} & g_i \in \{1, \dots, K\} \\ \text{latent space } & \mathbf{Z}_i \quad \text{indep.} & \mathbf{Z}_i \sim \mathcal{N}({\boldsymbol\mu}_i, \boldsymbol{\Sigma}) & \\ \text{observation space } & Y_{ij} | Z_{ij} \quad \text{indep.} & Y_{ij} | Z_{ij} \sim \mathcal{P}\left(\exp\{Z_{ij}\}\right) \end{array} \end{equation}\] where \(g_i\) denotes the group sample \(i\) belongs to.

The different parameters \({\boldsymbol\mu}_k \in\mathbb{R}^p\) corresponds to the group-specific main effects and the variance matrix \(\boldsymbol{\Sigma}\) is shared among groups. An equivalent way of writing this model is the following: \[\begin{equation} \begin{array}{rcl} \text{latent space } & \mathbf{Z}_i \sim \mathcal{N}({\boldsymbol\mu}_i,\boldsymbol\Sigma) & \boldsymbol{\mu}_i = \mathbf{g}_i^\top \mathbf{M} \\ \text{observation space } & Y_{ij} | Z_{ij} \quad \text{indep.} & Y_{ij} | Z_{ij} \sim \mathcal{P}\left(\exp\{Z_{ij}\}\right), \end{array} \end{equation}\] where, with a slight abuse of notation, \(\mathbf{g}_i\) is a group-indicator vector of length \(K\) (\(g_{ik} = 1 \Leftrightarrow g_i = k\)) and \(\mathbf{M} = [\boldsymbol{\mu}_1^\top, \dots, \boldsymbol{\mu}_K^\top]^\top\) is a \(K \times p\) matrix collecting the group-specific main effects.

Just like PLN, PLN-LDA generalizes to a formulation close to a multivariate generalized linear model where the main effect is due to a linear combination of the discrete group structure, \(d\) covariates \(\mathbf{x}_i\) and a vector \(\mathbf{o}_i\) of \(p\) offsets in sample \(i\). The latent layer then reads \[\begin{equation} \mathbf{Z}_i \sim \mathcal{N}({\mathbf{o}_i + \mathbf{g}_i^\top \mathbf{M} + \mathbf{x}_i^\top\mathbf{B}},\boldsymbol\Sigma) \end{equation}\] where \(\mathbf{B}\) is a \(d\times p\) matrix of regression parameters.

Given:

- a new observation \(\mathbf{Y}\) with associated offset \(\mathbf{o}\) and covariates \(\mathbf{x}\)
- a model with estimated parameters \(\hat{\boldsymbol{\Sigma}}\), \(\hat{\mathbf{B}}\), \(\hat{\mathbf{M}}\) and group counts \((n_1, \dots, n_K)\)

We can predict the observation’s group using Bayes rule as follows: for \(k \in {1, \dots, K}\), compute \[\begin{equation} \begin{aligned} f_k(\mathbf{Y}) & = p(\mathbf{Y} | \mathbf{g} = k, \mathbf{o}, \mathbf{x}, \hat{\mathbf{B}}, \hat{\boldsymbol{\Sigma}}) \\ & = \boldsymbol{\Phi}_{PLN}(\mathbf{Y}; \mathbf{o} + \boldsymbol{\mu}_k + \mathbf{x}^\top \hat{\mathbf{B}}, \hat{\boldsymbol{\Sigma}}) \\ p_k & = \frac{n_k}{\sum_{k' = 1}^K n_{k'}} \end{aligned} \end{equation}\] where \(\boldsymbol{\Phi}_{PLN}(\bullet; \boldsymbol{\mu}, \boldsymbol{\Sigma})\) is the density function of a PLN distribution with parameters \((\boldsymbol{\mu}, \boldsymbol{\Sigma})\). \(f_k(\mathbf{Y})\) and \(p_k\) are respectively plug-in estimates of (i) the probability of observing counts \(\mathbf{Y}\) in a sample from group \(k\) and (ii) the probability that a sample originates from group \(k\).

The posterior probability \(\hat{\pi}_k(\mathbf{Y})\) that observation \(\mathbf{Y}\) belongs to group \(k\) and most likely group \(\hat{k}(\mathbf{Y})\) can thus be defined as \[\begin{equation} \begin{aligned} \hat{\pi}_k(\mathbf{Y}) & = \frac{p_k f_k(\mathbf{Y})}{\sum_{k' = 1}^K p_{k'} f_{k'}(\mathbf{Y})} \\ \hat{k}(\mathbf{Y}) & = \underset{k \in \{1, \dots, K\}}{\arg\max} \hat{\pi}_k(\mathbf{Y}) \end{aligned} \end{equation}\]

Classification and prediction are the main objectives in (PLN-)LDA.
To reach this goal, we first need to estimate the model parameters.
Inference in PLN-LDA focuses on the group-specific main effects \(\mathbf{M}\), the regression parameters
\(\mathbf{B}\) and the covariance
matrix \(\boldsymbol\Sigma\).
Technically speaking, we can treat \(\mathbf{g}_i\) as a discrete covariate and
estimate \([\mathbf{M}, \mathbf{B}]\)
using the same strategy as for the standard **PLN** model.
Briefly, we adopt a variational strategy to approximate the
log-likelihood function and optimize the consecutive variational
surrogate of the log-likelihood with a gradient-ascent-based approach.
To this end, we rely on the CCSA algorithm of Svanberg (2002)
implemented in the C++ library (Johnson 2011), which we link to the
package.

In the package, the PLN-LDA model is adjusted with the function
`PLNLDA`

, which we review in this section. This function
adjusts the model and stores it in a object of class
`PLNLDAfit`

which inherits from the class
`PLNfit`

, so we strongly recommend the reader to be somehow
comfortable with `PLN`

and `PLNfit`

before using
`PLNLDA`

(see the PLN vignette).

We start by adjusting the above model to the Trichoptera data set. We
use `Group`

, the catching time spans, as a discrete structure
and use log as an offset to capture differences in sampling luck.

The model can be fitted with the function `PLNLDA`

as
follows:

```
##
## Performing discriminant Analysis...
## DONE!
```

Note that `PLNLDA`

uses the standard `formula`

interface, like every other model in the **PLNmodels**
package.

`PLNLDAfit`

The `myLDA_nocov`

variable is an `R6`

object
with class `PLNLDAfit`

, which comes with a couple of methods.
The most basic is the `show/print`

method, which sends a
brief summary of the estimation process and available methods:

```
## Linear Discriminant Analysis for Poisson Lognormal distribution
## ==================================================================
## nb_param loglik BIC ICL
## 357 -799.824 -1494.514 -1082.492
## ==================================================================
## * Useful fields
## $model_par, $latent, $latent_pos, $var_par, $optim_par
## $loglik, $BIC, $ICL, $loglik_vec, $nb_param, $criteria
## * Useful S3 methods
## print(), coef(), sigma(), vcov(), fitted()
## predict(), predict_cond(), standard_error()
## * Additional fields for LDA
## $percent_var, $corr_map, $scores, $group_means
## * Additional S3 methods for LDA
## plot.PLNLDAfit(), predict.PLNLDAfit()
```

Comprehensive information about `PLNLDAfit`

is available
via `?PLNLDAfit`

.

The user can easily access several fields of the
`PLNLDAfit`

object using `S3`

methods, the most
interesting ones are

- the \(p \times p\) covariance matrix \(\hat{\boldsymbol{\Sigma}}\):

- the regression coefficient matrix \(\hat{\mathbf{B}}\) (in this case
`NULL`

as there are no covariates)

`## NULL`

- the \(p \times K\) matrix of group means \(\mathbf{M}\)

grouping1 | grouping2 | grouping3 | grouping4 | grouping5 | grouping6 | grouping7 | grouping8 | grouping9 | grouping10 | grouping11 | grouping12 |
---|---|---|---|---|---|---|---|---|---|---|---|

-26.99 | -6.82 | -26.56 | -25.38 | -28.34 | -28.92 | -27.23 | -26.37 | -26.45 | -5.64 | -3.46 | -25.37 |

-26.98 | -27.72 | -5.67 | -25.37 | -7.44 | -28.90 | -27.22 | -26.36 | -26.43 | -26.50 | -24.34 | -4.48 |

-2.38 | -3.96 | -2.36 | -2.92 | -6.51 | -5.54 | -2.70 | -2.03 | -2.71 | -2.65 | -24.44 | -3.85 |

-26.99 | -27.74 | -5.69 | -4.52 | -28.28 | -28.96 | -27.25 | -5.51 | -5.59 | -4.91 | -24.39 | -25.41 |

-0.28 | -0.26 | -0.62 | -1.09 | -0.61 | -0.11 | -0.66 | -0.80 | -0.39 | -0.45 | -0.61 | -1.16 |

-4.00 | -3.32 | -2.93 | -4.47 | -6.72 | -8.00 | -2.85 | -3.06 | -5.53 | -3.99 | -24.32 | -25.34 |

The `PLNLDAfit`

class also benefits from two important
methods: `plot`

and `predict`

.

`plot`

methodThe `plot`

methods provides easy to interpret graphics
which reveals here that the groups are well separated:

By default, `plot`

shows the first 3 axis of the LDA when
there are 4 or more groups and uses special representations for the edge
cases of 3 or less groups.

`ggplot2`

-savvy users who want to make their own
representations can extracts the \(n \times
(K-1)\) matrix of sample scores from the `PLNLDAfit`

object …

LD1 | LD2 | LD3 | LD4 | LD5 | LD6 | LD7 | LD8 | LD9 | LD10 | LD11 |
---|---|---|---|---|---|---|---|---|---|---|

4323755 | -258930.5 | -86710.46 | -95694.11 | -28601.92 | 15856.26 | 2050.06 | 2377.53 | 160.19 | 51.09 | -165.24 |

4323566 | -263460.8 | -86587.85 | -96223.82 | -28408.38 | 15864.20 | 2072.05 | 2435.32 | 153.66 | 53.12 | -166.51 |

4323404 | -305794.6 | -85065.32 | -99745.12 | -26724.08 | 15650.09 | 1846.05 | 2387.27 | 183.75 | 57.62 | -159.26 |

4322958 | -353775.7 | -83386.55 | -103532.26 | -24844.54 | 15344.36 | 1476.14 | 2140.68 | 230.84 | 60.06 | -149.95 |

4323121 | -326011.6 | -84396.06 | -101269.07 | -25953.61 | 15490.85 | 1644.05 | 2197.54 | 208.88 | 58.04 | -154.87 |

4323379 | -290829.9 | -85633.04 | -98443.34 | -27331.31 | 15700.71 | 1882.34 | 2328.87 | 178.17 | 55.01 | -161.42 |

…or the \(p \times (K-1)\) matrix of correlations between scores and (latent) variables

LD1 | LD2 | LD3 | LD4 | LD5 | LD6 | LD7 | LD8 | LD9 | LD10 | LD11 | |
---|---|---|---|---|---|---|---|---|---|---|---|

Che | -0.28 | 0.23 | 0.60 | 0.22 | 0.59 | -0.40 | 0.15 | 0.39 | 0.84 | -0.50 | 0.02 |

Hyc | -0.08 | -0.44 | 0.02 | 0.39 | 0.03 | -0.40 | -0.65 | -0.94 | -0.05 | -0.17 | 0.57 |

Hym | 0.07 | -0.19 | -0.21 | -0.76 | -0.33 | -0.15 | 0.34 | 0.01 | -0.32 | 0.03 | -0.60 |

Hys | -0.51 | 0.19 | -0.49 | -0.05 | -0.30 | -0.49 | 0.33 | -0.01 | -0.30 | -0.07 | 0.25 |

Psy | 0.28 | 0.23 | 0.23 | -0.28 | 0.40 | 0.20 | 0.23 | 0.08 | 0.49 | -0.58 | -0.37 |

Aga | 0.10 | -0.15 | -0.24 | -0.89 | -0.05 | -0.21 | 0.15 | 0.15 | -0.12 | -0.20 | -0.52 |

`predict`

methodThe `predict`

method has a slightly different behavior
than its siblings in other models of the **PLNmodels**. The
goal of `predict`

is to predict the discrete class based on
observed *species counts* (rather than predicting counts from
known covariates).

By default, the `predict`

use the argument
`type = "posterior"`

to output the matrix of log-posterior
probabilities \(\log(\hat{\pi})_k\)

```
predicted.class <- predict(myLDA_nocov, newdata = trichoptera)
## equivalent to
## predicted.class <- predict(myLDA_nocov, newdata = trichoptera, type = "posterior")
predicted.class %>% head() %>% knitr::kable(digits = 2)
```

1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
---|---|---|---|---|---|---|---|---|---|---|---|

-17.45 | -46.50 | -43.54 | -25.48 | -54.71 | -32.73 | -25.92 | -87.53 | -84.59 | -20.60 | -207.84 | -74.88 |

-9.67 | -17.04 | -59.96 | -21.33 | -72.21 | -27.29 | -14.70 | -15.58 | -15.21 | -14.70 | -121.96 | -66.44 |

-12.94 | -22.57 | -115.18 | -39.94 | -120.03 | -30.78 | -22.54 | -50.11 | -44.51 | -22.94 | -149.15 | -120.62 |

-18.43 | -31.03 | -122.03 | -115.41 | -144.04 | -54.03 | -41.92 | -114.68 | -103.31 | -44.66 | -313.94 | -227.41 |

-13.84 | -20.01 | -52.73 | -60.79 | -72.69 | -38.18 | -26.89 | -51.84 | -45.29 | -25.88 | -196.75 | -115.64 |

-9.25 | -13.49 | -37.24 | -24.40 | -48.20 | -23.73 | -14.62 | -15.91 | -15.30 | -14.32 | -101.14 | -67.77 |

You can also show them in the standard (and human-friendly) \([0, 1]\) scale with
`scale = "prob"`

to get the matrix \(\hat{\pi}_k\)

```
predicted.class <- predict(myLDA_nocov, newdata = trichoptera, scale = "prob")
predicted.class %>% head() %>% knitr::kable(digits = 3)
```

1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
---|---|---|---|---|---|---|---|---|---|---|---|

0.959 | 0.000 | 0 | 0 | 0 | 0 | 0.000 | 0.000 | 0.000 | 0.041 | 0 | 0 |

0.980 | 0.001 | 0 | 0 | 0 | 0 | 0.006 | 0.003 | 0.004 | 0.006 | 0 | 0 |

1.000 | 0.000 | 0 | 0 | 0 | 0 | 0.000 | 0.000 | 0.000 | 0.000 | 0 | 0 |

1.000 | 0.000 | 0 | 0 | 0 | 0 | 0.000 | 0.000 | 0.000 | 0.000 | 0 | 0 |

0.998 | 0.002 | 0 | 0 | 0 | 0 | 0.000 | 0.000 | 0.000 | 0.000 | 0 | 0 |

0.972 | 0.014 | 0 | 0 | 0 | 0 | 0.005 | 0.001 | 0.002 | 0.006 | 0 | 0 |

Setting `type = "response"`

, we can predict the most
likely group \(\hat{k}\) instead:

```
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
## 1 1 1 1 1 1 9 2 1 1 1 1 2 3 2 2 2 3 3 3 9 3 4 1 4 4
## 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49
## 12 5 4 5 6 7 7 7 8 8 8 8 8 1 9 9 9 10 10 10 10 11 12
## Levels: 1 2 3 4 5 6 7 8 9 10 11 12
```

We can assess that the predictions are quite similar to the real
group (*this is not a proper validation of the method as we used data
set for both model fitting and prediction and are thus at risk of
overfitting*).

```
## true
## predicted 1 2 3 4 5 6 7 8 9 10 11 12
## 1 10 0 0 1 0 0 0 0 1 0 0 0
## 2 1 4 0 0 0 0 0 0 0 0 0 0
## 3 0 1 4 0 0 0 0 0 0 0 0 0
## 4 0 0 0 3 1 0 0 0 0 0 0 0
## 5 0 0 0 0 2 0 0 0 0 0 0 0
## 6 0 0 0 0 0 1 0 0 0 0 0 0
## 7 0 0 0 0 0 0 3 0 0 0 0 0
## 8 0 0 0 0 0 0 0 4 1 0 0 0
## 9 1 0 1 0 0 0 0 0 3 0 0 0
## 10 0 0 0 0 0 0 0 0 0 4 0 0
## 11 0 0 0 0 0 0 0 0 0 0 1 0
## 12 0 0 0 0 1 0 0 0 0 0 0 1
```

Finally, we can get the coordinates of the new data on the same graph
at the original ones with `type = "scores"`

. This is done by
averaging the latent positions \(\hat{\mathbf{Z}}_i + \boldsymbol{\mu}_k\)
(found when the sample is assumed to come from group \(k\)) and weighting them with the \(\hat{\pi}_k\). Some samples, have
compositions that put them very far from their group mean.

```
library(ggplot2)
predicted.scores <- predict(myLDA_nocov, newdata = trichoptera, type = "scores")
colnames(predicted.scores) <- paste0("Axis.", 1:ncol(predicted.scores))
predicted.scores <- as.data.frame(predicted.scores)
predicted.scores$group <- trichoptera$Group
plot(myLDA_nocov, map = "individual", nb_axes = 2, plot = FALSE) +
geom_point(data = predicted.scores,
aes(x = Axis.1, y = Axis.2, color = group, label = NULL))
```

It is possible to correct for other covariates before finding the LDA
axes that best separate well the groups. In our case ,we’re going to use
`Wind`

as a covariate and illustrate the main differences
with before :

```
myLDA_cov <- PLNLDA(Abundance ~ Wind + 0 + offset(log(Offset)),
grouping = Group,
data = trichoptera)
```

```
##
## Performing discriminant Analysis...
## DONE!
```

All fields of our new `PLNLDA`

fit can be accessed as
before with similar results. The only important difference is the result
of `coef`

: since we included a covariate in the model,
`coef`

now returns a 1-column matrix for \(\hat{\mathbf{B}}\) instead of
`NULL`

Wind | |
---|---|

Che | -0.3301236 |

Hyc | 1.4197887 |

Hym | -0.1927953 |

Hys | -0.4544197 |

Psy | 0.0365670 |

Aga | -0.0611589 |

The group-specific main effects can still be accessed with
`$group_means`

grouping1 | grouping2 | grouping3 | grouping4 | grouping5 | grouping6 | grouping7 | grouping8 | grouping9 | grouping10 | grouping11 | grouping12 |
---|---|---|---|---|---|---|---|---|---|---|---|

-20.37 | -10.84 | -24.28 | -19.17 | -25.82 | -43.78 | -29.97 | -28.34 | -19.42 | 4.58 | 11.73 | -24.00 |

-23.96 | -43.19 | -1.38 | -16.70 | -10.50 | -48.18 | -33.36 | -30.16 | -15.34 | -17.79 | -6.64 | 1.88 |

-1.07 | -4.47 | -1.70 | -1.68 | -5.75 | -6.99 | -2.65 | -1.82 | -1.45 | -1.44 | -25.75 | -3.26 |

-24.76 | -33.10 | -2.80 | 0.42 | -27.47 | -38.28 | -30.20 | -5.07 | -0.26 | 0.33 | -19.55 | -26.16 |

-0.05 | -0.42 | -0.47 | -0.85 | -0.44 | -0.43 | -0.59 | -0.71 | -0.07 | -0.15 | -0.19 | -1.03 |

-1.73 | -4.14 | -1.67 | -2.32 | -5.45 | -10.63 | -2.71 | -2.73 | -3.21 | -1.66 | -24.12 | -27.53 |

`plot`

methodOnce again, the `plot`

method is very useful to get a
quick look at the results.

`predict`

methodWe can again predict the most likely group for each sample :

and check that we recover the correct class in most cases (again, we used the same data set for model fitting and group prediction only for ease of exposition):

```
## true
## predicted 1 2 3 4 5 6 7 8 9 10 11 12
## 1 11 0 0 1 0 0 0 0 1 0 0 0
## 2 1 4 0 0 0 0 0 0 0 0 0 0
## 3 0 1 4 0 0 0 0 0 0 0 0 0
## 4 0 0 0 3 1 0 0 0 0 0 0 0
## 5 0 0 0 0 2 0 0 0 0 0 0 0
## 6 0 0 0 0 0 1 0 0 0 0 0 0
## 7 0 0 0 0 0 0 3 0 0 0 0 0
## 8 0 0 0 0 0 0 0 4 1 0 0 0
## 9 0 0 1 0 0 0 0 0 3 0 0 0
## 10 0 0 0 0 0 0 0 0 0 4 0 0
## 11 0 0 0 0 0 0 0 0 0 0 1 0
## 12 0 0 0 0 1 0 0 0 0 0 0 1
```

Aitchison, J., and C. H. Ho. 1989. “The Multivariate Poisson-Log
Normal Distribution.” *Biometrika* 76 (4): 643–53.

Fisher, R. A. 1936. “The Use of Multiple Measurements in Taxonomic
Problems.” *Annals of Eugenics* 7 (2): 179–88.

Johnson, Steven G. 2011. *The NLopt Nonlinear-Optimization
Package*. https://nlopt.readthedocs.io/en/latest/.

Rao, C. Radhakrishna. 1948. “The Utilization of Multiple
Measurements in Problems of Biological Classification.”
*Journal of the Royal Statistical Society. Series B
(Methodological)* 10 (2): 159–203.

Svanberg, Krister. 2002. “A Class of Globally Convergent
Optimization Methods Based on Conservative Convex Separable
Approximations.” *SIAM Journal on Optimization* 12 (2):
555–73.