Introduction

phoenics is an R package designed to perform a differential analysis for metabolomics datasets, while accounting for metabolomic pathway information. More precisely, phoenics performs a test at pathway level based on metabolite quantifications and information on pathway metabolite composition. The automatic query of metabolic pathways is implemented in the package.

library("phoenics")

MTBLS422 dataset

The dataset MTBLS422 contains data obtained from the study Choo et al., 2017. Raw data have been made available on MetaboLights (with the id MTBLS422 https://www.ebi.ac.uk/metabolights/editor/MTBLS422). Metabolite quantifications were obtained based on the raw signal using ASICS package and are provided in the object quantif included in the dataset.

data("MTBLS422")
ls()
## [1] "design"   "pathways" "quantif"
head(quantif)
##                              1071        1151 1231        1281        1371
## Glycerol              0.000000000 0.000000000    0 0.000000000 0.000000000
## D-Glucose-6-Phosphate 0.002339491 0.001251202    0 0.001958586 0.002652924
## D-Galactose           0.001628629 0.003448431    0 0.001535844 0.002693243
## D-Sorbitol            0.000000000 0.000000000    0 0.000000000 0.000000000
## Myo-Inositol          0.001190331 0.001929542    0 0.001325843 0.001760645
## D-GlucuronicAcid      0.001495646 0.003459446    0 0.001377114 0.001855756
##                              1531         1681         1751        2081
## Glycerol              0.001403664 0.0000000000 0.0008294523 0.000000000
## D-Glucose-6-Phosphate 0.002877861 0.0035782998 0.0025688724 0.002273193
## D-Galactose           0.001952493 0.0037286043 0.0000000000 0.001240772
## D-Sorbitol            0.001066357 0.0009889747 0.0012056168 0.000000000
## Myo-Inositol          0.001430719 0.0028295101 0.0011248417 0.002085898
## D-GlucuronicAcid      0.001791299 0.0007645118 0.0000000000 0.001816359
##                              2161        2311
## Glycerol              0.000000000 0.000000000
## D-Glucose-6-Phosphate 0.001106493 0.002437845
## D-Galactose           0.002072136 0.001392737
## D-Sorbitol            0.000000000 0.000000000
## Myo-Inositol          0.001382456 0.001893727
## D-GlucuronicAcid      0.002410063 0.001502470

They can be transformed into a data format suitable for phoenics with:

quantif_f <- from_ASICS_to_PHOENICS(quantif)
head(quantif_f)
##           C00116      C00092      C00984      C00794      C00137      C00191
## 1071 0.000000000 0.002339491 0.001628629 0.000000000 0.001190331 0.001495646
## 1151 0.000000000 0.001251202 0.003448431 0.000000000 0.001929542 0.003459446
## 1231 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000
## 1281 0.000000000 0.001958586 0.001535844 0.000000000 0.001325843 0.001377114
## 1371 0.000000000 0.002652924 0.002693243 0.000000000 0.001760645 0.001855756
## 1531 0.001403664 0.002877861 0.001952493 0.001066357 0.001430719 0.001791299
##            C00864       C01697       C00029       C00003
## 1071 0.0005098139 0.0000000000 0.0005008620 0.0000000000
## 1151 0.0007075218 0.0005573846 0.0000000000 0.0000000000
## 1231 0.0000000000 0.0000000000 0.0000000000 0.0009024399
## 1281 0.0004129427 0.0005796955 0.0004431261 0.0000000000
## 1371 0.0007317685 0.0005080792 0.0000000000 0.0000000000
## 1531 0.0002865527 0.0000000000 0.0006293917 0.0000000000

In addition, pathway information was obtained using KEGGREST package and is included in pathways:

head(pathways)
##    metabolite_code   metabolite_name pathway_code         pathway_name
## 24          C00029       UDP-glucose     mmu00052 Galactose metabolism
## 25          C00116          Glycerol     mmu00052 Galactose metabolism
## 26          C00137      myo-Inositol     mmu00052 Galactose metabolism
## 27          C00794        D-Sorbitol     mmu00052 Galactose metabolism
## 28          C00984 alpha-D-Galactose     mmu00052 Galactose metabolism
## 29          C01697        Galactitol     mmu00052 Galactose metabolism

The user can also query this information using the function pathway_search:

pathways <- pathway_search(metab = colnames(quantif), organism = "mmu")

phoenics test

Perform test

The test implemented in phoenics is a mixed model with fixed and random effects. The current study design is provided in design:

head(design)
##           Id Age Treatment Mouse
## 1071 T1_G1R1 5.5   Control  G1R1
## 1151 T1_G1R2 5.5   Control  G1R2
## 1231 T2_G1R2 7.5   Control  G1R2
## 1281 T2_G1R1 7.5   Control  G1R1
## 1371 T1_G3R2 5.5 Vanco-Imi  G3R2
## 1531 T3_G1R1 9.0   Control  G1R1

The Age and Treatment are thus used as fixed effects and the Mouse is used as a random effect:

out_test <- test_pathway(quantif_f, design, pathways, 
                         fixed = c("Age", "Treatment"), random = "Mouse", 
                         npc = 2, model = "blmer")
out_test
## phoenics method based on PCA
## A list of pathways which contains for each pathway: 
## $pathway_name: Pathway name 
## $pathway_code: Pathway code 
## $metabolites: Metabolites in the pathway 
## $PCA: Factor analysis results 
## $model: Model results 
## $test_pathway: Pathway test results

Access test results

Results are organized as a list where each element of the list corresponds to a tested pathway:

names(out_test)
## [1] "Galactose metabolism"             "Inositol phosphate metabolism"   
## [3] "Vitamin digestion and absorption"

The function extract can be used to extract results on a given pathway:

res_1 <- extract(out_test, "Galactose metabolism")

Fixed effect \(p\)-values and full model are provided, respectively, in entries test_pathway and model of this object.

res_1[[1]]$test_pathway
##   Fixed_effect      pval
## 1          Age 0.2145888
## 2    Treatment 0.3831807
res_1[[1]]$model
## $mmu00052_PC1
## Cov prior  : Mouse ~ wishart(df = 3.5, scale = Inf, posterior.scale = cov, common.scale = TRUE)
## Prior dev  : -0.6945
## 
## Linear mixed model fit by REML ['blmerMod']
## Formula: mmu00052_PC1 ~ Age + Treatment + (1 | Mouse)
##    Data: score_path
## REML criterion at convergence: 40.441
## Random effects:
##  Groups   Name        Std.Dev.
##  Mouse    (Intercept) 2.054   
##  Residual             1.629   
## Number of obs: 11, groups:  Mouse, 4
## Fixed Effects:
##        (Intercept)                 Age  TreatmentVanco-Imi  
##             2.0008             -0.3463              1.1450  
## 
## $mmu00052_PC2
## Cov prior  : Mouse ~ wishart(df = 3.5, scale = Inf, posterior.scale = cov, common.scale = TRUE)
## Prior dev  : -1.996
## 
## Linear mixed model fit by REML ['blmerMod']
## Formula: mmu00052_PC2 ~ Age + Treatment + (1 | Mouse)
##    Data: score_path
## REML criterion at convergence: 36.1712
## Random effects:
##  Groups   Name        Std.Dev.
##  Mouse    (Intercept) 2.209   
##  Residual             1.136   
## Number of obs: 11, groups:  Mouse, 4
## Fixed Effects:
##        (Intercept)                 Age  TreatmentVanco-Imi  
##            -3.1797              0.3949              0.5002

In addition, adjusted \(p\)-values (BH method) across pathways can be directly obtained with the function adjust_pval:

adjust_pval(out_test)
##                       pathway_name pathway_code Fixed_effect        pval
## 1             Galactose metabolism     mmu00052          Age 0.214588838
## 3    Inositol phosphate metabolism     mmu00562          Age 0.037233449
## 5 Vitamin digestion and absorption     mmu04977          Age 0.009257926
## 2             Galactose metabolism     mmu00052    Treatment 0.383180685
## 4    Inositol phosphate metabolism     mmu00562    Treatment 0.485524008
## 6 Vitamin digestion and absorption     mmu04977    Treatment 0.419680043
##   adjusted_pval
## 1    0.21458884
## 3    0.07446690
## 5    0.02777378
## 2    1.00000000
## 4    1.00000000
## 6    1.00000000

Explore PCA results

PCA results can be explored using the entry PCA:

res_1[[1]]$PCA
## **Results for the Principal Component Analysis (PCA)**
## The analysis was performed on 11 individuals, described by 10 variables
## *The results are available in the following objects:
## 
##    name                description                                          
## 1  "$eig"              "eigenvalues"                                        
## 2  "$var"              "results for the variables"                          
## 3  "$var$coord"        "coord. for the variables"                           
## 4  "$var$cor"          "correlations variables - dimensions"                
## 5  "$var$cos2"         "cos2 for the variables"                             
## 6  "$var$contrib"      "contributions of the variables"                     
## 7  "$ind"              "results for the individuals"                        
## 8  "$ind$coord"        "coord. for the individuals"                         
## 9  "$ind$cos2"         "cos2 for the individuals"                           
## 10 "$ind$contrib"      "contributions of the individuals"                   
## 11 "$quali.sup"        "results for the supplementary categorical variables"
## 12 "$quali.sup$coord"  "coord. for the supplementary categories"            
## 13 "$quali.sup$v.test" "v-test of the supplementary categories"             
## 14 "$call"             "summary statistics"                                 
## 15 "$call$centre"      "mean of the variables"                              
## 16 "$call$ecart.type"  "standard error of the variables"                    
## 17 "$call$row.w"       "weights for the individuals"                        
## 18 "$call$col.w"       "weights for the variables"

In particular, phoenics implements some plots that can be directly obtained using

plot(out_test, pathway_id = "Galactose metabolism", plot = "var")

plot(out_test, pathway_id = "Galactose metabolism", plot = "ind", 
     habillage = "Age")

plot(out_test, pathway_id = "Galactose metabolism", plot = "eig")

plot(out_test, pathway_id = "Galactose metabolism", plot = "var")

plot(out_test, pathway_id = "Galactose metabolism", plot = "group")

Options of phoenics main test function

Switching from PCA to MFA

A MFA step, more suited to can be used in place of the PCA. In this case, the levels first fixed effect are used to split the effects into different tables which are matched by the levels of the first random effect (the typical case is when the first fixed effect contains information on time points in a repeated measurement study where samples are provided by the first random effect).

out_test <- test_pathway(quantif_f, design, pathways, 
                         fixed = c("Age", "Treatment"), random = "Mouse", 
                         npc = 2, model = "blmer", analysis = "MFA")
## refitting model(s) with ML (instead of REML)
## refitting model(s) with ML (instead of REML)
## refitting model(s) with ML (instead of REML)
## refitting model(s) with ML (instead of REML)
## refitting model(s) with ML (instead of REML)
## refitting model(s) with ML (instead of REML)
## refitting model(s) with ML (instead of REML)
## refitting model(s) with ML (instead of REML)
## refitting model(s) with ML (instead of REML)
## refitting model(s) with ML (instead of REML)
## refitting model(s) with ML (instead of REML)
## refitting model(s) with ML (instead of REML)
out_test
## phoenics method based on MFA
## A list of pathways which contains for each pathway: 
## $pathway_name: Pathway name 
## $pathway_code: Pathway code 
## $metabolites: Metabolites in the pathway 
## $PCA: Factor analysis results 
## $model: Model results 
## $test_pathway: Pathway test results

Wrapper to automatically query pathways

The option pathways = "auto" can also be used to automatically query KEGGREST instead of performing the two step analysis relying on test_pathway:

out_test3 <- test_pathway(quantif, design, pathways = "auto", 
                          fixed = c("Age", "Treatment"), random = "Mouse", 
                          npc = 2, model = "blmer", organism = "mmu")
out_test3

References

[1] Choo J. M., Kanno T., Zain N. M. M., Leong L. E. X., Abell G. C. J., Keeble J. E., Bruce K. D., Mason A. J., Rogers G. B. (2017). Divergent relationships between fecal microbiota and metabolome following distinct antibiotic-induced disruptions. mSphere, 2(1). DOI: 10.1128/msphere.00005-17.

[2] Benjamini Y, Hochberg Y (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B, 57(1): 289–300. DOI: 10.1111/j.2517-6161.1995.tb02031.x

Session information

sessionInfo()
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=fr_FR.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Europe/Paris
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] phoenics_0.5
## 
## loaded via a namespace (and not attached):
##  [1] gtable_0.3.6         xfun_0.49            bslib_0.8.0         
##  [4] ggplot2_3.5.1        htmlwidgets_1.6.4    ggrepel_0.9.6       
##  [7] rstatix_0.7.2        lattice_0.22-6       vctrs_0.6.5         
## [10] tools_4.4.2          generics_0.1.3       sandwich_3.1-1      
## [13] tibble_3.2.1         cluster_2.1.6        pkgconfig_2.0.3     
## [16] Matrix_1.7-1         scatterplot3d_0.3-44 lifecycle_1.0.4     
## [19] factoextra_1.0.7     farver_2.1.2         compiler_4.4.2      
## [22] munsell_0.5.1        leaps_3.2            codetools_0.2-20    
## [25] carData_3.0-5        htmltools_0.5.8.1    sass_0.4.9          
## [28] yaml_2.3.10          car_3.1-2            pillar_1.10.0       
## [31] ggpubr_0.6.0         nloptr_2.1.1         FactoMineR_2.11     
## [34] jquerylib_0.1.4      tidyr_1.3.1          MASS_7.3-61         
## [37] flashClust_1.01-2    DT_0.33              cachem_1.1.0        
## [40] abind_1.4-8          boot_1.3-31          multcomp_1.4-26     
## [43] nlme_3.1-166         tidyselect_1.2.1     digest_0.6.37       
## [46] mvtnorm_1.3-2        dplyr_1.1.4          purrr_1.0.2         
## [49] labeling_0.4.3       splines_4.4.2        fastmap_1.2.0       
## [52] grid_4.4.2           colorspace_2.1-1     cli_3.6.3           
## [55] magrittr_2.0.3       survival_3.7-0       broom_1.0.7         
## [58] TH.data_1.1-2        withr_3.0.2          backports_1.5.0     
## [61] scales_1.3.0         estimability_1.5.1   rmarkdown_2.29      
## [64] emmeans_1.10.6       lme4_1.1-35.5        blme_1.0-6          
## [67] ggsignif_0.6.4       zoo_1.8-12           coda_0.19-4.1       
## [70] evaluate_1.0.1       knitr_1.49           rlang_1.1.4         
## [73] Rcpp_1.0.13-1        xtable_1.8-4         glue_1.8.0          
## [76] minqa_1.2.8          jsonlite_1.8.9       R6_2.5.1            
## [79] multcompView_0.1-10