gcxgclab: GCxGC Preprocessing and Analysis

The goal of gcxgclab is to provide a comprehensive program for preprocessing and analysis of two-dimensional gas chromatography (GCxGC) data. It provides functions for baseline correction, smoothing, peak identification, peak alignment, identification of EICs and mass spectra, targeted analysis, compound identification with NIST, and non-targeted analysis, as well as plotting and data visualization.

Installation

You can install gcxgclab from CRAN with:

install.packages("gcxgclab")

Citation

If using gcxgclab in your work, please cite us.

Gamble, Stephanie & Granger, Caroline & Mannion, Joseph. gcxgclab: An R-package for two dimensional gas chromatography preprocessing and analysis. in preparation. 2024.

Example

The following example walks through the workflow to fully process and analyze a GCxGC sample.

First, load the package.

library(gcxgclab)

Load the first file. This should be a cdf file containing the processed GCxGC sample data.

Extract data from the file and create data frames for the total ion chromatogram (TIC) data and full mass and intensity data.

filename <- system.file('extdata','Sample1.cdf',package='gcxgclab')
rt_frame <- extract_data(filename)

To visualize the data, graph a contour map of the original, raw TIC data. The title may be changed to any string. The data can be plotted in either 1 or 2 time dimensions; 2 is shown here.

plot_chr(rt_frame, title='Raw Data', scale='linear', dim=2)

For the scale of this data, it is useful to visualize with a contour plot in log base 10 of the TIC data. This is done by changing the scale input from linear to log (log is the default).

plot_chr(rt_frame, title='Log Intensity', scale="log")
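The effect of the log scale can be illustrated with base R alone. The sketch below uses made-up intensities (not values from the sample file) and a hypothetical helper, `log_intensity`; it is not a description of how plot_chr is implemented.

```r
# Toy TIC intensities spanning several orders of magnitude (made-up values).
tic <- c(120, 950, 15000, 480000, 2300000)

# Hypothetical helper: log10 transform with a floor, so that values <= 0
# (possible after baseline correction) do not produce -Inf or NaN.
log_intensity <- function(x, floor = 1) {
  log10(pmax(x, floor))
}

log_tic <- log_intensity(tic)
print(round(log_tic, 2))
```

On the linear scale the largest value dominates the colour range, while on the log scale the smaller values remain distinguishable.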

If desired, the phase may be shifted up or down. Here the phase is shifted in the negative direction (down) by 2 seconds.

shifted <- phase_shift(rt_frame, shift=-2)
plot_chr(shifted, title='Shifted')
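One way to picture a phase shift is as a wrap-around of the second-dimension retention time within the modulation period. The sketch below is base R only; the helper `wrap_shift`, the modulation period of 8 seconds, and the retention times are all assumptions for illustration, and this document does not state whether phase_shift handles wrap-around in exactly this way.

```r
# Hypothetical helper: shift second-dimension retention times by `shift`
# seconds, wrapping around the modulation period `mod_period` (assumed value).
wrap_shift <- function(rt2, shift, mod_period) {
  (rt2 + shift) %% mod_period
}

rt2 <- c(0.5, 1.5, 3.0, 7.5)   # made-up RT2 values in seconds
shifted_rt2 <- wrap_shift(rt2, shift = -2, mod_period = 8)
print(shifted_rt2)
```

Points that would fall below zero after the shift reappear at the top of the modulation window, which is why a peak split across the boundary can be brought back together.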

Next, perform Whittaker smoothing. Lambda=20 is used as the smoothing coefficient (this is also the default value). Larger values of lambda give more smoothing; lambda must be between 0 and 1000. The result is plotted in log scale.

sm_frame <- smooth(rt_frame, lambda=20)
plot_chr(sm_frame, title = 'Smoothed')
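To show how lambda controls the amount of smoothing, here is a minimal one-dimensional Whittaker smoother in base R. It is a sketch under assumptions: the helper name `whittaker_smooth`, the second-order difference penalty, and the synthetic signal are illustrative choices, and the package's smooth function may use a different penalty order or implementation.

```r
# Minimal Whittaker smoother: minimise |y - z|^2 + lambda * |D z|^2,
# where D is the second-order difference matrix (an assumed penalty order).
whittaker_smooth <- function(y, lambda = 20) {
  n <- length(y)
  D <- diff(diag(n), differences = 2)   # (n - 2) x n difference matrix
  as.vector(solve(diag(n) + lambda * crossprod(D), y))
}

# Synthetic noisy peak (made-up data).
set.seed(1)
x <- seq(0, 1, length.out = 100)
noisy <- exp(-((x - 0.5) / 0.1)^2) + rnorm(100, sd = 0.05)
smoothed <- whittaker_smooth(noisy, lambda = 20)

# Roughness = sum of squared second differences; smoothing should reduce it.
roughness <- function(v) sum(diff(v, differences = 2)^2)
print(c(noisy = roughness(noisy), smoothed = roughness(smoothed)))
```

With lambda = 0 the input is returned unchanged, and increasing lambda trades fidelity to the data for a smoother curve.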

Next, perform baseline correction. Gamma=0.5 is used (this is also the default value), which treats a moderate amount of noise as baseline. Gamma must be between 0 and 1: a value of 0 results in almost no data being subtracted as baseline, and a value of 1 results in a large amount of the data being subtracted as baseline. The result is plotted in log scale.

frame <- bl_corr(sm_frame, gamma=0.5)
plot_chr(frame, title = 'Baseline Corrected')

It is helpful to identify peaks in the samples. There are two functions to locate the peaks. One finds the top N peaks, and the second finds all peaks above a given threshold.

Find the top 20 peaks in the sample, then find all peaks above an intensity of 100,000.

peaks <- top_peaks(frame$TIC_df, N=20)
thrpeaks <- thr_peaks(frame$TIC_df, THR=100000)
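The difference between the two selection strategies can be sketched on a toy intensity grid in base R. Everything here is an assumption for illustration: the helper `find_local_maxima`, the 8-neighbour definition of a peak, and the made-up grid. It is not a description of how top_peaks or thr_peaks work internally.

```r
# Toy intensity grid (rows = RT2, columns = RT1); values are made up.
set.seed(42)
grid <- matrix(runif(60, 0, 1000), nrow = 6)
grid[3, 4] <- 250000
grid[5, 8] <- 120000
grid[2, 9] <- 40000

# Hypothetical helper: a cell is a peak if it is strictly greater than all
# of its existing neighbours (up to 8).
find_local_maxima <- function(m) {
  out <- list()
  for (i in seq_len(nrow(m))) {
    for (j in seq_len(ncol(m))) {
      rows <- max(1, i - 1):min(nrow(m), i + 1)
      cols <- max(1, j - 1):min(ncol(m), j + 1)
      # The window includes the cell itself, so a peak is the unique maximum.
      if (sum(m[rows, cols] >= m[i, j]) == 1) {
        out[[length(out) + 1]] <- data.frame(row = i, col = j,
                                             intensity = m[i, j])
      }
    }
  }
  do.call(rbind, out)
}

maxima <- find_local_maxima(grid)
maxima <- maxima[order(-maxima$intensity), ]

top3 <- head(maxima, 3)                           # top-N selection
above_thr <- maxima[maxima$intensity > 100000, ]  # threshold selection
print(top3)
print(above_thr)
```

Top-N selection always returns a fixed number of peaks regardless of their height, whereas threshold selection returns however many peaks exceed the cutoff, which may be none.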

Now visualize these peaks. Plot red circles over the highest 20 peaks on the chromatogram image.

plot_peak(peaks, frame, title="Top 20 Peaks")

Plot circles at the highest 20 peaks, size indicating the intensity.

plot_peakonly(peaks, title="Top 20 Peaks")

Plot red circles over all peaks above the 100,000 threshold on the chromatogram image.

plot_peak(thrpeaks, frame, title="Peaks Above 100,000")

Plot circles at all peaks above the 100,000 threshold, size indicating the intensity.

plot_peakonly(thrpeaks, title="Peaks Above 100,000")

Load a second cdf file and perform smoothing and baseline correction so that it can be aligned to the first sample.

filename2 <- system.file('extdata','Sample2.cdf',package='gcxgclab')
rt_frame2 <- extract_data(filename2)
sm_frame2 <- smooth(rt_frame2, lambda=20)
frame2 <- bl_corr(sm_frame2, gamma=0.5)

peaks2 <- top_peaks(frame2$TIC_df, N=20)
plot_peak(peaks2, frame2, title="Top 20 Peaks, Sample 2")

Load a third cdf file and perform smoothing and baseline correction so that it can be aligned to the first sample.

filename3 <- system.file('extdata','Sample3.cdf',package='gcxgclab')
rt_frame3 <- extract_data(filename3)
sm_frame3 <- smooth(rt_frame3, lambda=20)
frame3 <- bl_corr(sm_frame3, gamma=0.5)

peaks3 <- top_peaks(frame3$TIC_df, N=20)
plot_peak(peaks3, frame3, title="Top 20 Peaks, Sample 3")
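Since all three samples go through the same extract, smooth, and baseline-correct steps, the repetition could be folded into a small wrapper. This is only a convenience sketch: `preprocess_sample` is a hypothetical helper, not part of gcxgclab, and it calls only the functions already used in this example with the same arguments.

```r
# Hypothetical convenience wrapper around the preprocessing steps shown above.
preprocess_sample <- function(file, lambda = 20, gamma = 0.5) {
  raw <- gcxgclab::extract_data(file)
  smoothed <- gcxgclab::smooth(raw, lambda = lambda)
  gcxgclab::bl_corr(smoothed, gamma = gamma)
}

# Only run the example when the package is installed.
if (requireNamespace("gcxgclab", quietly = TRUE)) {
  files <- system.file("extdata", paste0("Sample", 1:3, ".cdf"),
                       package = "gcxgclab")
  frames <- lapply(files, preprocess_sample)
}
```

Keeping lambda and gamma as arguments makes it easy to apply identical settings to every sample, which matters when the samples are later compared or aligned.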

Perform batch alignment to a reference sample. “Sample 1” is given as the reference sample. The function keeps all peaks from the reference sample and aligns the corresponding peaks from each of the other samples to them. It returns a list of data frames for each sample; peak heights are unchanged, and only the coordinates are shifted to align with the reference peaks.

aligned_list <- align(list(frame,frame2,frame3))
for (i in 1:3) {
  print(plot_peak(aligned_list$peaks[[i]], aligned_list$data[[i]], title=paste("Aligned Sample", i)))
}
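The idea of shifting coordinates onto reference peaks while leaving peak heights untouched can be sketched with nearest-neighbour matching in base R. This is an illustration under assumptions, not the algorithm used by align: the helper `align_to_reference`, the column names, the tolerances, and the peak tables are all made up, and unlike the behaviour described above, this simplified version drops reference peaks that have no counterpart in the sample.

```r
# Made-up peak tables; column names are hypothetical.
ref_peaks <- data.frame(rt1 = c(10.0, 20.0, 30.0), rt2 = c(1.0, 2.0, 3.0),
                        intensity = c(500, 800, 300))
sample_peaks <- data.frame(rt1 = c(10.2, 19.7, 45.0), rt2 = c(1.1, 2.1, 0.5),
                           intensity = c(450, 900, 200))

# Hypothetical helper: for each reference peak, find the nearest sample peak
# within the tolerances and move it onto the reference coordinates.
align_to_reference <- function(ref, smp, tol1 = 0.5, tol2 = 0.2) {
  rows <- lapply(seq_len(nrow(ref)), function(i) {
    d1 <- abs(smp$rt1 - ref$rt1[i])
    d2 <- abs(smp$rt2 - ref$rt2[i])
    ok <- which(d1 <= tol1 & d2 <= tol2)
    if (length(ok) == 0) return(NULL)            # no counterpart in this sample
    best <- ok[which.min(d1[ok] / tol1 + d2[ok] / tol2)]
    data.frame(rt1 = ref$rt1[i], rt2 = ref$rt2[i],
               intensity = smp$intensity[best])  # peak height is kept as-is
  })
  do.call(rbind, Filter(Negate(is.null), rows))
}

aligned <- align_to_reference(ref_peaks, sample_peaks)
print(aligned)
```

After alignment, matched peaks share the reference coordinates, so intensities from different samples can be compared position by position.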