---
title: "Using FLORAL for survival models with longitudinal microbiome data"
output:
  rmarkdown::html_vignette:
    md_extensions: [
      "-autolink_bare_uris"
    ]
vignette: >
  %\VignetteIndexEntry{Using FLORAL for survival models with longitudinal microbiome data}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  chunk_output_type: console
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
out.width = "100%"
)
```
```{r setup, warning=FALSE, message=FALSE}
library(FLORAL)
library(dplyr)
library(patchwork)
library(survival)
set.seed(8192024)
```

In this vignette, we illustrate how to apply `FLORAL` to fit a Cox model with longitudinal microbiome data. Because few public data sets include survival information, we use simulated data for illustration.

## Data simulation

We will use the built-in simulation function `simu()` to generate longitudinal compositional features and the corresponding time-to-event outcomes. The simulation is based on a piecewise exponential distribution, as described by [Hendry 2014](https://doi.org/10.1002/sim.5945).

By default, the first 10 of the 500 features simulated below are associated with the time-to-event outcome.

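
To make the piecewise exponential idea concrete, the following is a minimal sketch of drawing an event time when the hazard is constant between sample collection times and changes with a time-dependent feature. It is an illustration under stated assumptions, not a reproduction of what `simu()` does internally, and it omits any refinements from the cited paper. The helper `rpwexp_one()`, the variables `visit_times` and `abundance`, the baseline hazard of 0.05, and the coefficient of 0.8 are all hypothetical. The chunk is not evaluated, so it does not affect the results in this vignette.

```{r pwexp-sketch, eval=FALSE}
# Hypothetical helper, not part of FLORAL: draw one event time from a
# piecewise-constant hazard by inverting the cumulative hazard.
#   breaks: interval start times, beginning at 0 and increasing
#   rates:  hazard on each interval (the last interval is open-ended)
#   target: cumulative hazard to reach; a unit-exponential draw by default
rpwexp_one <- function(breaks, rates, target = rexp(1)) {
  stopifnot(length(breaks) == length(rates),
            breaks[1] == 0,
            !is.unsorted(breaks),
            all(rates > 0))
  widths <- diff(breaks)                                   # lengths of the closed intervals
  cumhaz <- c(0, cumsum(rates[-length(rates)] * widths))   # cumulative hazard at each break
  k <- findInterval(target, cumhaz)                        # interval where the target is reached
  breaks[k] + (target - cumhaz[k]) / rates[k]              # solve H(t) = target within interval k
}

# Illustrative use: the hazard changes at each visit with the feature value.
visit_times <- c(0, 5, 10)           # assumed sample collection times
abundance   <- c(0.2, 1.0, 1.5)      # assumed feature values at those times
rates <- 0.05 * exp(0.8 * abundance) # Cox-type hazard on each interval
event_times <- replicate(1000, rpwexp_one(visit_times, rates))
summary(event_times)
```

Passing `target` explicitly makes the inversion easy to verify by hand: with `breaks = c(0, 1, 2)` and `rates = c(0.1, 0.5, 1)`, a target cumulative hazard of 0.3 is reached in the second interval.
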
```{r simulation}
simdat <- simu(n = 200,                      # sample size
               p = 500,                      # number of features
               model = "timedep",            # survival outcome with time-dependent features
               pct.sparsity = 0.8,           # proportion of zeros
               rho = 0,                      # feature-wise correlation
               longitudinal_stability = TRUE # simulate longitudinal features with stable trajectories
               )
```

With the simulated data, the log-ratio lasso Cox model with time-dependent features can be fitted by calling `FLORAL()` with the arguments described here:

* Set `longitudinal = TRUE` so that the algorithm uses the appropriate method for longitudinal data.
* Pass the count matrix as `x`, with rows corresponding to samples and columns corresponding to features.
* Pass the vector of subject/patient IDs corresponding to the rows of `x` as `id`.
* Pass the vector of sample collection times corresponding to the rows of `x` as `tobs`.
* Pass the `Surv` object (`Surv(time, status)`) of **unique patients** as `y`. Note that the survival data must be sorted by the IDs specified in `id`.

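
Before fitting, it can help to confirm that the inputs line up as the list above requires. The following is a minimal sketch using a hypothetical helper, `check_longitudinal_inputs()`, which is not part of `FLORAL`; the toy objects `x`, `id`, `tobs`, and `y` are invented for illustration. The check on sorted IDs assumes that samples are grouped by patient in ascending ID order.

```{r input-check-sketch, eval=FALSE}
library(survival)

# Hypothetical helper, not part of FLORAL: verify that the longitudinal
# inputs have matching dimensions and a consistent ordering.
check_longitudinal_inputs <- function(x, y, id, tobs) {
  stopifnot(
    is.matrix(x),
    inherits(y, "Surv"),
    length(id) == nrow(x),          # one ID per sample (row of x)
    length(tobs) == nrow(x),        # one collection time per sample
    nrow(y) == length(unique(id)),  # one survival record per unique patient
    !is.unsorted(id)                # samples grouped by patient, IDs ascending
  )
  invisible(TRUE)
}

# Toy inputs: two patients with two samples each and three features.
x <- matrix(c(10, 0, 3,
               4, 2, 0,
               0, 7, 1,
               5, 5, 5),
            nrow = 4, byrow = TRUE,
            dimnames = list(NULL, c("taxon1", "taxon2", "taxon3")))
id   <- c(1, 1, 2, 2)                           # patient ID for each row of x
tobs <- c(0, 7, 0, 10)                          # collection time for each row of x
y    <- Surv(time = c(30, 45), event = c(1, 0)) # one record per unique patient

check_longitudinal_inputs(x, y, id, tobs)
```

The helper stops with an error at the first violated condition, for example when `id` is shorter than the number of rows in `x`.
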
```{r FLORAL, warning=FALSE, message=FALSE}
fit <- FLORAL(x = simdat$xcount,
              y = Surv(simdat$data_unique$t, simdat$data_unique$d),
              family = "cox",
              longitudinal = TRUE,
              id = simdat$data$id,
              tobs = simdat$data$t0,
              progress = FALSE,
              plot = TRUE)
fit$selected
```

The list of selected features is saved in `fit$selected`, as shown above.

To prepare real data for this analysis, we recommend the following:

* Start with patient metadata that includes survival data (time and status), and sort the metadata by patient ID. Extract the time and status variables to build the `Surv` object for input as `y`.
* Curate the microbiome feature data matrix, sorted by patient ID and then by time of sample collection. Save the patient ID and sample collection time vectors for input as `id` and `tobs`, and save the feature table for input as `x`.
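
The two recommendations above can be sketched in base R as follows. The data frames `meta` and `samples`, their column names, and all values are hypothetical and serve only to show the sorting and extraction steps; real data will need its own curation. The chunk is not evaluated.

```{r data-prep-sketch, eval=FALSE}
library(survival)

# Hypothetical patient-level metadata (one row per patient), deliberately unsorted.
meta <- data.frame(
  patient_id = c("P03", "P01", "P02"),
  time       = c(120, 45, 200),   # follow-up time
  status     = c(0, 1, 1),        # 1 = event, 0 = censored
  stringsAsFactors = FALSE
)

# Hypothetical sample-level table: patient ID, collection day, and taxon counts.
samples <- data.frame(
  patient_id = c("P02", "P01", "P03", "P01", "P02"),
  day        = c(14, 0, 0, 21, 0),
  taxonA     = c(12, 0, 5, 3, 40),
  taxonB     = c(0, 8, 1, 0, 2),
  taxonC     = c(7, 7, 0, 9, 0),
  stringsAsFactors = FALSE
)

# 1. Sort the metadata by patient ID and build the Surv object for `y`.
meta <- meta[order(meta$patient_id), ]
y <- Surv(meta$time, meta$status)

# 2. Sort the samples by patient ID, then by collection time.
samples <- samples[order(samples$patient_id, samples$day), ]

# 3. Extract `id`, `tobs`, and the count matrix `x` from the sorted table.
id   <- samples$patient_id
tobs <- samples$day
x    <- as.matrix(samples[, c("taxonA", "taxonB", "taxonC")])

# Sanity check: patients appear in the same order in both inputs.
stopifnot(identical(unique(id), meta$patient_id))
```

Objects prepared this way match the argument layout of the `FLORAL()` call shown earlier (`x`, `y`, `id`, and `tobs`, together with `family = "cox"` and `longitudinal = TRUE`). The toy tables here are far too small for an actual fit.
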