10. Make Dataset Documentation with write_man()

Introduction

R packages that contain datasets need documentation for those datasets. The Roxygen2 package helps write R manual pages, but there are many details to get right. The write_man() function examines a dataset and writes an R file containing Roxygen2 code that documents the dataset. The documentation uses Markdown to organize details, including the number of rows and columns in the dataset and the name, label (if any), and type of each variable. The levels of categorical variables are also included.

Getting Started

Load the dataset you want to document into the global environment using tools like tidyREDCap::import_instruments() or the_data <- readr::read_csv("an_csv_file.csv") or the_data <- readxl::read_excel("an_excel_file.xlsx"). Then call write_man("the_data"), passing the name of the dataset as a quoted string. For example:

library(tidyverse)
library(conflicted)
library(labelled)  # for set_variable_labels()
demographics <- 
  readxl::read_excel("demographics.xlsx") |> 
  mutate(sex2 = as_factor(sex), .keep = "unused") |>
  labelled::set_variable_labels(
    age = "Age in Years",
    sex2 = "Sex assigned at Birth"
  )

rUM::write_man("demographics")

This produces an R file in the R folder, named after the dataset, with contents like:

#' demographics dataset
#'
#' @description Description of the demographics dataset goes here
#'
#' @format A tibble with 3 rows and 2 variables:
#' \describe{
#'   \item{age}{
#'
#' | *Type:*        | numeric       |
#' | -------------- | ------------- |
#' |                |               |
#' | *Description:* | Age in Years |
#'
#'   }
#'   \item{sex2}{
#'
#' | *Type:*        | factor (First/Reference level = `Male`) |
#' | -------------- | ---------------------------------------------------- |
#' |                |                                                      |
#' | *Description:* | Sex assigned at Birth |
#' |                |                                                      |
#' | *Levels:*      | `Male, Female`           |
#'
#'   }
#' }
#' @source Where the data came from
"demographics"
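
The fields in the generated file correspond to properties you can inspect yourself before running write_man(). The following is a minimal sketch, not part of rUM: it builds a small stand-in data frame (the values are hypothetical and only mimic what demographics.xlsx might hold) and prints the dimensions, labels, types, and factor levels that the documentation reports.

```r
library(labelled)

# Hypothetical stand-in for the contents of demographics.xlsx
demographics <- data.frame(
  age = c(34, 51, 28),
  sex2 = factor(c("Male", "Female", "Male"), levels = c("Male", "Female"))
)

# Attach the same variable labels used in the example above
demographics <- set_variable_labels(
  demographics,
  age = "Age in Years",
  sex2 = "Sex assigned at Birth"
)

dim(demographics)            # number of rows and columns
var_label(demographics)      # named list of variable labels
sapply(demographics, class)  # type of each variable
levels(demographics$sex2)    # factor levels; the first is the reference level
```

Checking these values first makes it easier to spot a missing label or an unexpected reference level before the documentation file is written.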

If your dataset does not have labels, you will need to modify each
#' | *Description:* | Description for *thingy* goes here |
line to give a useful description of that variable.
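
To see which variables will need a hand-written description, you can list the ones without a label. This is a sketch under assumptions: the_data and its columns are hypothetical, and it relies on labelled::var_label() returning NULL for unlabelled variables in a data frame.

```r
library(labelled)

# Hypothetical dataset in which only one variable has a label
the_data <- data.frame(id = 1:3, score = c(90, 85, 77))
the_data <- set_variable_labels(the_data, score = "Exam score (percent)")

# var_label() on a data frame gives a named list; unlabelled entries are NULL
labels <- var_label(the_data)
unlabelled <- names(labels)[vapply(labels, is.null, logical(1))]

unlabelled  # variables whose Description line needs editing
```

Alternatively, add the missing labels with labelled::set_variable_labels() before calling write_man(), as in the demographics example.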

You will also want to modify the
@source Where the data came from
line to properly cite the source of the data.

Conclusion

If you follow this workflow for each of your datasets, you will have user-friendly documentation that is ready for CRAN.