This vignette provides a guide to the ggraptR package. It describes the situations where the package can be applied and provides context in terms of package design and application. It also provides examples of the package in action.
ggraptR is an open source R package providing a graphical user interface for data exploration and visualisation. It is based on principles of visualization analysis by Tamara Munzner, and also acts as a wrapper for functionality implemented in the grammar of graphics for R, ggplot2.
ggplot2 offers a wide array of marks and channels that constitute the building blocks for developing visual encodings, as well as a large selection of built-in visual idioms. However, users unfamiliar with the package may not be aware of the options available to them, or may find it challenging to express their envisioned design in their desired visual idiom.
ggraptR is designed to handle a spectrum of data visualisation needs, ranging from visualising the raw values of individual variables through to fully aggregated, pivot-table style visualisations. Data volume and visual complexity are handled by two approaches: faceting into multiple views, and reduction of items and attributes.
The standard workflow in ggraptR consists of
By default, the visualisation refreshes whenever there is a change to the dataset or any input control.
However, when dealing with large datasets that take a while to render, it may be advantageous to generate the visualisation only once several controls have been adjusted. In this case, uncheck the 'Enable reactivity' checkbox, make the required changes, and then click the 'Submit' button.
There are two options for importing data. The preferred way is to load a dataset into a data frame within the R environment. All loaded data frames will be available to ggraptR when ggraptR is launched.
To check which data frames are available, click on the dropdown underneath 'Choose a dataset'. A number of pre-loaded datasets along with any data frames within the R environment should be shown. RStudio users can see the data frames in their environment under the Environment tab (next to History and Build).
If your data is present in the environment but does not appear in ggraptR, confirm that it is a data frame using the command:
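The data frames in an environment can also be listed from the console before launching ggraptR. The following is a minimal sketch: `list_data_frames` is a hypothetical helper and not part of ggraptR, and it assumes that ggraptR looks for data frames in the global environment.

```r
# Hypothetical helper (not part of ggraptR):
# returns the names of all data frames found in an environment.
list_data_frames <- function(env = globalenv()) {
  object_names <- ls(envir = env)
  is_df <- vapply(
    object_names,
    function(nm) is.data.frame(get(nm, envir = env)),
    logical(1)
  )
  object_names[is_df]
}

# Example: a scratch environment holding one data frame and one vector.
scratch <- new.env()
assign("cars_df", head(mtcars), envir = scratch)
assign("some_vector", 1:10, envir = scratch)
list_data_frames(scratch)
```

Calling `list_data_frames()` with no arguments inspects the global environment instead of the scratch environment used here.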
class(my_data_frame)
where 'my_data_frame' is the name of your data.
If it is not of type “data.frame” then you can coerce it using
data_frame_name <- as.data.frame(my_data_frame)
where 'my_data_frame' is the name of your data and 'data_frame_name' is the name of your data when saved into a data frame format.
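As a short worked example of the check and the coercion, the sketch below uses a small matrix as a stand-in for your own data (the object names are illustrative only). Because `class()` can return more than one class name, `is.data.frame()` is used here to get a single TRUE or FALSE.

```r
# A matrix is not a data frame.
my_data <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("a", "b", "c")))
is.data.frame(my_data)

# Coerce it; the column names are carried over from the matrix.
my_data_df <- as.data.frame(my_data)
is.data.frame(my_data_df)
names(my_data_df)
```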
To load a CSV file into a new data frame, use the read.csv command:
data_frame_name <- read.csv("data_to_be_imported.csv")
where 'data_frame_name' is your preferred name for the dataset to be loaded and 'data_to_be_imported' is the name of the csv file containing the dataset. The csv file should be located in the working directory.
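If the file is not in the working directory (which `getwd()` reports), a full path can be passed to `read.csv` instead. The sketch below writes a small example file to a temporary directory purely as a stand-in for your own CSV file, then reads it back.

```r
# Write a small example file to a temporary directory
# (a stand-in for your own CSV file).
csv_path <- file.path(tempdir(), "data_to_be_imported.csv")
write.csv(head(iris), csv_path, row.names = FALSE)

# Confirm the path is valid before reading.
file.exists(csv_path)

# Read the file into a data frame and inspect its structure.
data_frame_name <- read.csv(csv_path)
str(data_frame_name)
```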
Alternatively, you may import a dataset by clicking on the import tab as shown:
Click 'choose file', and select the required file.
If the first row of the file contains header names, make sure the checkbox next to 'Header' is ticked.
Choose the appropriate Quote and Separator buttons depending on how quotes and delimiters are used in the flat file.
Note that the options default to a header row, double quotes and comma-separated values (CSV).
Once the file is uploaded, an 'Upload complete' bar will be shown. The dataset will immediately become available in the 'Choose a dataset' dropdown.
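The Header, Quote and Separator options resemble the `header`, `quote` and `sep` arguments of base R's `read.csv`. The sketch below assumes that correspondence, which this vignette does not confirm, and shows the console equivalent for a file that uses semicolons and single quotes. The file name and contents are made up for illustration.

```r
# Example file: header row, semicolon separator, single-quoted strings.
path <- file.path(tempdir(), "semicolon_example.csv")
writeLines(c("name;score", "'Ann';1.5", "'Bob';2.0"), path)

# Console equivalent of ticking 'Header' and choosing
# a semicolon separator and single quotes (assumed mapping).
imported <- read.csv(path, header = TRUE, sep = ";", quote = "'")
imported
```

If the wrong separator is chosen, the whole row is typically read into a single column, which is a quick way to spot a mismatch.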
The toolbar has four sections:
To choose a dataset for visualisation, select it from the 'Choose a dataset' dropdown. You should see a number of pre-loaded R datasets along with any active data frames in your environment and the most recent file you uploaded.
To plot raw data, simply:
Please refer to Appendix A further down this page for recommendations for the use of individual visual idioms.
Click on the 'Show aesthetics' box to explore a number of options to improve plot aesthetics, or the 'Show themes' box to customise plot label aesthetics.
'Show aesthetics' reveals the following controls:
Note: The 'jitter' effect is often used to improve interpretability when dealing with larger datasets that have many overlapping data points. In this case, jittering reveals the relative density of points around each overlapping data point.
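To see what jittering does outside the GUI, the effect can be reproduced directly in ggplot2. This is a sketch only: it assumes the 'jitter' control corresponds to ggplot2's jitter position, which the vignette does not state, and the width, height and alpha values are arbitrary choices.

```r
library(ggplot2)

# mpg ships with ggplot2; cty and hwy are whole numbers,
# so many points sit exactly on top of each other.
p_overlap <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_point()

# Same data with a small random offset added to each point.
# The seed makes the random offsets repeatable between renders.
p_jitter <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_point(
    position = position_jitter(width = 0.3, height = 0.3, seed = 1),
    alpha = 0.6
  )
```

Printing `p_overlap` and `p_jitter` side by side shows how dense clusters that look like a single point become visible once jittered.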
If data appears cluttered or overly spaced out in the plot, check the 'Flip X and Y coordinates' box to explore swapping axes.
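In plain ggplot2, swapping the axes of an existing plot can be done with `coord_flip()`. The sketch below assumes the checkbox has a comparable effect. It uses the `mpg` dataset from ggplot2, where the manufacturer names overlap on a horizontal axis but read cleanly on a vertical one.

```r
library(ggplot2)

# Many category labels on the x-axis tend to overlap.
p <- ggplot(mpg, aes(x = manufacturer, y = hwy)) +
  geom_boxplot()

# Flipping the coordinates puts the categories on the vertical axis.
p_flipped <- p + coord_flip()
```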
'Show themes' reveals the following controls:
Note that the 'Enable reactivity' checkbox needs to be unchecked in order to edit the axis labels and plot title.
Click on the 'Show facets' check box to display controls for presenting a multi-plot data visualisation.
There are two faceting alternatives available to users:
The differences are shown below:
Optionally, the facet scale functionality allows the user to re-scale the X-axis and/or Y-axis within each facet grid.
This is useful to improve visual clarity if the relative values of data attributes do not make good use of the visualisation space. However, caution should be taken as this may lead to misleading interpretations for the casual observer who misses the different scales used.
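The trade-off between shared and independent scales can be illustrated in ggplot2 itself. The sketch below assumes the facet scale control maps to the `scales` argument of ggplot2's faceting functions, which is an assumption rather than something stated in this vignette.

```r
library(ggplot2)

base_plot <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# Default: every facet shares the same x and y scales,
# which keeps the facets directly comparable.
p_fixed <- base_plot + facet_grid(drv ~ .)

# Free y-scale: each row of facets gets its own y-axis range,
# which uses the space better but hides differences in magnitude.
p_free_y <- base_plot + facet_grid(drv ~ ., scales = "free_y")
```

When free scales are used, it helps to say so in the plot title or caption so that readers do not compare facets as if they shared an axis.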
Click on the 'Show filtering' check box to display controls for zooming in on a subset of the plot.
The two sliders can be dragged to restrict the range of the data shown. The visualisation area will scale so that the subsetted range will be shown over the full area.
An example of this control in action is shown below:
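The effect of the sliders can be approximated in base R by subsetting the data before plotting, so that the axes are then fitted to the remaining range. This is a sketch under that assumption: the vignette does not say how ggraptR applies the filter internally, and `filter_range` is a hypothetical helper.

```r
# Hypothetical helper: keep rows whose value in `column`
# lies within the closed interval [lower, upper].
filter_range <- function(data, column, lower, upper) {
  values <- data[[column]]
  keep <- !is.na(values) & values >= lower & values <= upper
  data[keep, , drop = FALSE]
}

# Restrict mtcars to cars with mpg between 20 and 30.
zoomed <- filter_range(mtcars, "mpg", 20, 30)

# The range of the subset is narrower than that of the full data.
range(mtcars$mpg)
range(zoomed$mpg)
```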
Aggregation controls are used to simulate pivot chart functionality.
To plot aggregated measures of data, check the 'Show dataset type and aggregation method' box.
The 'Dataset Type' dropdown can be ignored.
Select the desired aggregation method from the dropdown menu.
The aggregation control applies a function to transform raw data into count data or calculated values.
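The same kind of transformation can be sketched in base R with `aggregate()`. This illustrates the idea of turning raw rows into counts or calculated values per group. It is not a description of ggraptR's internal implementation, and the choice of `mtcars`, `cyl` and `mpg` is arbitrary.

```r
# Count data: number of rows in each group.
counts <- aggregate(mpg ~ cyl, data = mtcars, FUN = length)
names(counts)[2] <- "n"
counts

# Calculated value: mean of a measure within each group.
means <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
means
```

Each result has one row per group, which is the shape a pivot-style chart needs.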