This notebook provides a detailed overview of the `plasso` package and its two main functions `plasso` and `cv.plasso`, which were developed in the course of Knaus (2022). The package is strongly oriented around the `glmnet` package and, at its core, rests on the standard `glmnet` function. Related theory and algorithms are described in Friedman, Hastie, and Tibshirani (2010).

The very latest version of the package can be installed from its GitHub page. For the installation you will need the `devtools` package. The latest 'official' version can be installed from CRAN using `install.packages()`. We recommend the latter.

General dependencies are: `glmnet`, `Matrix`, `methods`, `parallel`, `doParallel`, `foreach` and `iterators`.

```
# Development version from GitHub (requires devtools)
library(devtools)
devtools::install_github("stefan-1997/plasso")

# Or the CRAN release (recommended)
install.packages("plasso")
```

Load `plasso` using `library()`.

`library(plasso)`

The package generally provides two functions, `plasso` and `cv.plasso`, which are both built on top of the `glmnet` functionality. Specifically, a `glmnet` object lives within both functions and also in their outputs (list item `lasso_full`).

The term `plasso` refers to a Post-Lasso model which estimates a least squares algorithm only for the active (i.e. non-zero) coefficients of a previously estimated Lasso model. This follows the idea that we want to do selection but without shrinkage.
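To make this idea concrete, the following is a minimal, self-contained sketch of the Post-Lasso principle using `glmnet` and `lm` directly. The toy data and the penalty level `s = 0.1` are illustrative assumptions, not the package's internals:

```
library(glmnet)

# Toy data: 5 covariates, only the first two have non-zero effects
set.seed(1)
x_toy = matrix(rnorm(100 * 5), 100, 5)
y_toy = x_toy[, 1] - 0.5 * x_toy[, 2] + rnorm(100)

# Step 1: Lasso for selection at an (arbitrary) penalty level
lasso_fit = glmnet(x_toy, y_toy)
slopes = as.vector(coef(lasso_fit, s = 0.1))[-1]  # drop intercept
active = which(slopes != 0)                       # active (non-zero) set

# Step 2: unpenalized OLS refit on the active set only
# (selection from the Lasso, but no shrinkage in the final fit)
post_lasso = lm(y_toy ~ x_toy[, active])
```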

The package comes with some simulated data representing the following DGP:

The covariate matrix \(X\) consists of 10 variables whose effect sizes on the target \(Y\) are defined by the vector \(\boldsymbol{\pi} = [1, -0.83, 0.67, -0.5, 0.33, -0.17, 0, ..., 0]'\), where the first six effect sizes decrease continuously in absolute terms from 1 to 0 and alternate in sign. The true causal effect of all other covariates is 0. The variables in \(X\) follow a normal distribution with mean zero while the covariance matrix is a Toeplitz matrix, which is characterized by constant diagonals:

\[ \boldsymbol{\Sigma} = \begin{bmatrix} 1 & 0.7 & 0.7^2 & ... & 0.7^{9} \\ 0.7 & 1 & 0.7 & ... & 0.7^{8} \\ 0.7^2 & 0.7 & 1 & ... & 0.7^{7} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0.7^{9} & 0.7^{8} & 0.7^{7} & ... & 1 \end{bmatrix} \]

The target \(\boldsymbol{y}\) is then a linear transformation of \(\boldsymbol{X}\) plus a normally distributed error term. Each element of \(\boldsymbol{y}\) is given by:

\[ y_i = \boldsymbol{X}_i \boldsymbol{\pi} + \varepsilon_i \]

where \(\varepsilon_i \sim \mathcal{N}(0,4)\).
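For illustration, this DGP could be simulated along the following lines. This is a hedged sketch using `MASS::mvrnorm`; the package already ships the resulting data as `toeplitz`, and the sample size `n` is an assumption:

```
library(MASS)

set.seed(1)
n = 1000                                # assumed sample size
p = 10
Sigma = 0.7^abs(outer(1:p, 1:p, "-"))   # Toeplitz covariance: 0.7^|i-j|
X_sim = mvrnorm(n, mu = rep(0, p), Sigma = Sigma)
pi_vec = c(1, -0.83, 0.67, -0.5, 0.33, -0.17, rep(0, 4))
y_sim = as.vector(X_sim %*% pi_vec + rnorm(n, mean = 0, sd = 2))  # Var = 4
```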

```
# Load the simulated data shipped with the package
data(toeplitz)
y = as.matrix(toeplitz[,1])   # first column: target
X = toeplitz[,-1]             # remaining columns: covariates
```

`plasso` returns least squares estimates for all lambda values of a standard `glmnet` object for both a simple Lasso and a Post-Lasso model.

`p = plasso::plasso(X,y)`

You can plot the coefficient paths for both the Post-Lasso model and the underlying 'original' Lasso model. This nicely illustrates the difference between the two: the Post-Lasso model is characterized by jumps in its coefficient paths every time a new variable enters the active set.

`plot(p, lasso=FALSE, xvar="lambda")`

`plot(p, lasso=TRUE, xvar="lambda")`

We can also have a look at which coefficients are active for a chosen lambda value. Here, the difference between Post-Lasso and Lasso becomes clearly visible. For the Lasso model, there is not only feature selection but also shrinkage, which results in the active coefficients being smaller in absolute terms than for the Post-Lasso model:

```
coef_p = coef(p, s=0.01)
as.vector(coef_p$plasso)
```

```
## [1] 0.1438137 1.0187628 -0.6214926 0.4673645 -0.2300834 -0.3575276
## [7] 0.2180390 0.1180676 -0.2138268 0.1975462 -0.1047983
```

`as.vector(coef_p$lasso)`

```
## [1] 0.14498611 0.98729386 -0.56374511 0.40656768 -0.20023679 -0.33156564
## [7] 0.18985685 0.08930237 -0.16087044 0.13798825 -0.06639638
```
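To see the shrinkage directly, the two coefficient vectors can be put side by side, building on the `coef_p` object from above:

```
# Post-Lasso vs. Lasso coefficients at s = 0.01: the Lasso slopes
# are shrunken toward zero relative to the unpenalized refit
cbind(plasso = as.vector(coef_p$plasso),
      lasso  = as.vector(coef_p$lasso))
```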

The `cv.plasso` function uses cross-validation to determine the performance of different values for the `lambda` penalty term for both models (Post-Lasso and Lasso). The returned output of class `cv.plasso` includes the mean squared errors.

When applying the `summary` method and setting the `default` parameter to `FALSE`, you get some informative output concerning the optimal choice of lambda.

```
p.cv = plasso::cv.plasso(X,y,kf=5)
summary(p.cv, default=FALSE)
```

```
##
## Call:
## plasso::cv.plasso(x = X, y = y, kf = 5)
##
## Lasso:
## Minimum CV MSE Lasso: 15.22
## Lambda at minimum: 0.01858
## Active variables at minimum: (Intercept) X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
## Post-Lasso:
## Minimum CV MSE Post-Lasso: 15.2
## Lambda at minimum: 0.2087
## Active variables at minimum: (Intercept) X1 X5
```
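The exact names of the list items holding the cross-validation results are best checked on the returned object itself rather than assumed; base R's `names()` and `str()` do this safely:

```
# Inspect which components (e.g. the CV MSE vectors) the object carries
names(p.cv)
str(p.cv, max.level = 1)
```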

The `plot` method extends the basic `glmnet` visualization by the cross-validated MSEs for the Post-Lasso model.

`plot(p.cv, legend_pos="left", legend_size=0.5)`

We can use the following code to get the optimal lambda value (for the Post-Lasso model here) and the associated coefficients at that value of \(\lambda\).

`p.cv$lambda_min_pl`

`## [1] 0.2087288`

```
coef_pcv = coef(p.cv, S="optimal")
as.vector(coef_pcv$plasso)
```

```
## [1] 0.1410181 0.7663423 0.0000000 0.0000000 0.0000000 -0.3000942
## [7] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
```
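If one wants fitted values at this \(\lambda\) without relying on any particular `predict` method signature, they can be computed by hand from the extracted coefficient vector (a sketch; the intercept is its first element):

```
# In-sample fitted values from the CV-optimal Post-Lasso coefficients
beta_opt = as.vector(coef_pcv$plasso)
y_hat = cbind(1, as.matrix(X)) %*% beta_opt   # column of ones for intercept
head(y_hat)
```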

Friedman, Jerome, Trevor Hastie, and Robert Tibshirani. 2010.
“Regularization Paths for Generalized Linear Models via Coordinate
Descent.” *Journal of Statistical Software* 33 (1): 1–22.
https://doi.org/10.18637/jss.v033.i01.

Knaus, Michael C. 2022. “Double machine
learning-based programme evaluation under
unconfoundedness.” *The Econometrics Journal* 25
(3): 602–27. https://doi.org/10.1093/ectj/utac015.