The package \textsf{NBtsVarSel} provides functions for performing variable selection approach in sparse negative binomial GLARMA models, which are pervasive for modeling discrete-valued time series with overdispersion. The method consists in estimating the autoregressive moving average (ARMA) coefficients of GLARMA models and the overdispersion parameter with performing variable selection in regression coefficients of Generalized Linear Models (GLM) with regularised methods. For further details on the methodology we refer the reader to [1].

We describe the negative binomial GLARMA model for a single time series with additional covariates. Given the past history \(\mathcal{F}_{t-1} = \sigma(Y_s, s\leq t-1)\), we assume that
\begin{equation}Y*t | \mathcal{F}*{t-1} \sim \text{NB}\left(\mu*t ^{\star,} \alpha^{\star\right),}
\label{eq1}
\end{equation}
where \(\text{NB}(\mu, \alpha)\) denotes the negative binomial distribution with mean \(\mu\) and overdispersion parameter \(\alpha\). In (\ref{eq1}),
\begin{equation}\label{eq2}
\mu_t^{\star=\exp(W_t\star)} \textrm{ with } W_t^{\star=\sum}*{i=0}

We load the dataset of observations \verb|Y| with size \(n=50\) provided within the package.

```
data(Y)
```

The number of regressor variables \(p\) is equal to 30. Data \verb|Y| is generated with \(\pmb{\gamma}^{\star} = (0.5)\), \(\alpha=2\), and \(\pmb{\beta^{\star}}\), such that all the \(\beta_i^{\star}=0\) except for five of them: \(\beta_1^{\star}=1.73\), \(\beta_3^{\star}=0.38\), \(\beta_{17}^{\star}=0.29\), \(\beta_{21}^{\star}=-0.64\), and \(\beta_{23}^{\star}=-0.13\). The design matrix \(X\) is built by taking the covariates in a Fourier basis.

We initialize \(\pmb{\gamma^{0}} = (0)\) and \(\pmb{\beta^{0}}\) to be the coefficients estimated by \textsf{glm} function:

```
gamma0 = c(0)
glm_nb = glm.nb(Y~t(X)[,2:(p+1)])
beta0<-as.numeric(glm_nb$coefficients)
alpha0 = glm_nb$theta
```

We can estimate \(\pmb{\gamma^{\star}}\) with the Newton-Raphson method. The output is the vector of estimation of \(\pmb{\gamma^{\star}}\). The default number of iterations \verb|n_iter| of the Newton-Raphson algorithm is 100.

```
gamma_est_nr = NR_gamma(Y, X, beta0, gamma0, alpha0, n.iter=100)
gamma_est_nr
```

```
## [1] 0.2407588
```

This estimation is obtained by taking initial values \(\pmb{\gamma^{0}}\) and \(\pmb{\beta^{0}}\), which will improve once we substitute the initial values by \(\pmb{\hat{\gamma}}\) and \(\pmb{\hat{\beta}}\) obtained by \verb|variable_selection| function.

We perform variable selection and obtain the coefficients which are estimated to be active and the estimates of \(\pmb{\gamma^{\star}}\) and \(\pmb{\beta^{\star}}\). We take the number of iterations of the algorithm \verb|k_max| equal to 2. We take \verb|min| method, which corresponds to the stability selection method with minimal \(\lambda\), where \verb|threshold| is equal to 0.7 and the number of replications \verb|nb_rep_ss| \(=1000\). For more details about stability selection and the choice of parameters we refer the reader to [1]. The function supports parallel computation. To make it work, users should download the package \textsf{doMC}, which is not supported on Windows platforms.

```
result = variable_selectionresult = variable_selection(Y, X,
gamma.init = gamma0, alpha.init = NULL, k.max = 1, method = "cv",
t = 0.3, n.iter = 100, n.rep = 1000)
```

```
## Warning: executing %dopar% sequentially: no parallel backend registered
```

```
beta_est = result$beta_est
Estim_active = result$estim_active
gamma_est = result$gamma_est
alpha_est = result$alpha_est
```

```
## Estimated active coefficients: 1 3 17 21
```

```
## Estimated gamma: 0.2407588
```

```
## Estimated alphaa: 2.636391
```

We display a plot that illustrates which elements of \(\pmb{\beta^{\star}}\) are selected to be active and how close the estimated value \(\hat{\beta_i}\) is to the actual values \(\beta^{\star}_i\). True values of \(\pmb{\beta^{\star}}\) are plotted in crosses and estimated values are plotted in dots.

```
# First, we make a dataset of estimated betas
beta_data = data.frame(beta_est)
colnames(beta_data)[1] <- "beta"
beta_data$Variable = seq(1, (p + 1), 1)
beta_data$y = 0
beta_data = beta_data[beta_data$beta != 0, ]
# Next, we make a dataset of true betas
beta_t_data = data.frame(beta)
colnames(beta_t_data)[1] <- "beta"
beta_t_data$Variable = seq(1, (p + 1), 1)
beta_t_data$y = 0
beta_t_data = beta_t_data[beta_t_data$beta != 0, ]
# Finally, we plot the result
plot = ggplot() + geom_point(data = beta_data, aes(x = Variable,
y = y, color = beta), pch = 16, size = 5, stroke = 2) +
geom_point(data = beta_t_data, aes(x = Variable, y = y,
color = beta), pch = 4, size = 6, stroke = 2) +
scale_color_gradient2(name = expression(hat(beta)),
midpoint = 0, low = "steelblue", mid = "white",
high = "red") + scale_x_continuous(breaks = c(1,
seq(10, (p + 1), 10)), limits = c(0, (p + 1))) + scale_y_continuous(breaks = c(),
limits = c(-1, 1)) + theme(legend.title = element_text(color = "black",
size = 12, face = "bold"), legend.text = element_text(color = "black",
size = 10)) + theme(axis.text.x = element_text(angle = 90),
axis.text = element_text(size = 10), axis.title = element_text(size = 10,
face = "bold"), axis.title.y = element_blank())
plot
```

As we can see from the plot, all the zero coefficients are estimated to be zero and four out of five non-zero coefficients are estimated to be non-zero. Moreover, the estimates also correctly preserved the sign of the coefficients.

**References**

[1] M. Gomtsyan. “Variable selection in a specific regression time series of counts”, arXiv:arXiv:2307.00929