The `statgenQTLxT`

package performs multi-trait and
multi-environment Genome Wide Association Studies (GWAS), following the
approach of (Zhou and
Stephens 2014). It builds on the `statgenGWAS`

package (for single trait GWAS) which is available from CRAN. The package
uses data structures and plots defined in the `statgenGWAS`

package. It is recommended to read the vignette of this package,
accessible in R via `vignette(package = "statgenGWAS")`

or
online at https://biometris.github.io/statgenGWAS/articles/GWAS.html
to get a general idea of those.

Multi-trait GWAS in the `statgenQTLxT`

package estimates
and tests the effect of a SNP in different trials or on different
traits, one SNP at a time. Genetic and residual covariances are fitted
only once, for a model without SNPs. Given balanced data on \(n\) genotypes and \(p\) traits (or trials) we fit a mixed model
of the form

\(Y = \left(\begin{array}{c} Y_1 \\ \vdots \\ Y_p\end{array}\right) = \left(\begin{array}{c} X\gamma_1 \\ \vdots \\ X\gamma_p\end{array}\right) + \left(\begin{array}{c} x\beta_1 \\ \vdots \\ x\beta_p\end{array}\right) + \left(\begin{array}{c} G_1 \\ \vdots \\ G_p\end{array}\right) + \left(\begin{array}{c} E_1 \\ \vdots \\ E_p\end{array}\right)\)

where \(Y_1\) to \(Y_p\) are \(n × 1\) vectors with the phenotypic values for traits or trials \(1, \ldots, p\). \(x\) is the \(n × 1\) vector of scores for the marker under consideration, and \(X\) the \(n × q\) design matrix for the other covariates. By default only a trait (environment) specific intercept is included. The vector containing the genetic background effects \(G_1, \ldots, G_p\) is Gaussian with zero mean and covariance \(V_g \otimes K\), where \(V_g\) is a \(p × p\) matrix of genetic (co)variances, and \(K\) an \(n × n\) kinship matrix. Similarly, the residual errors (\(E_1, \ldots, E_p\)) have covariance \(V_e \otimes I_n\), for a \(p × p\) matrix \(V_e\) of residual (co)variances.

For each SNP, the null-hypothesis \(\beta_1
= \dots = \beta_p = 0\) is tested, using the likelihood ratio
test (LRT) described in (Zhou and Stephens 2014). If
`estCom = TRUE`

, additional tests for a common effect and for
QTL-by-trait (QTL × T) or QTL-by-environment (QTL × E) are performed,
using the parameterization \(\beta_j = \alpha
+ \alpha_j (1 \leq j \leq p)\). As in (Korte et al. 2012), we use likelihood
ratio tests, but not restricted to the bivariate case. For the common
effect, we fit the reduced model \(\beta_j =
\alpha\), and test if \(\alpha =
0\). For the interactions, we test if \(\alpha_1 = \dots = \alpha_p = 0\).

\(V_g\) and \(V_e\) can be provided by the user
(`fitVarComp = FALSE`

); otherwise one of the following models
is used, depending on `covModel`

. If
`covModel = "unst"`

, an unstructured model is assumed, as in
(Zhou and Stephens
2014): \(V_g\) and \(V_e\) can be any positive-definite matrix,
requiring a total of \(p(p+1)/2\)
parameters per matrix. If `covModel = "fa"`

, a
factor-analytic model is fitted using an EM-algorithm, as in (Millet et al.
2016). \(V_g\) and \(V_e\) are assumed to be of the form \(W W^t + D\), where \(W\) is a \(p ×
m\) matrix of factor loadings and \(D\) a diagonal matrix with trait or
environment specific values. \(m\) is
the order of the model, and the arguments `mG`

and
`mE`

specify the order used for respectively \(V_g\) and \(V_e\). `maxIter`

sets the
maximum number of iterations used in the EM-algorithm. Finally, if
`covModel = "pw"`

, \(V_g\)
and \(V_e\) are estimated ‘pairwise’,
as in (Furlotte and
Eskin 2015). Looping over pairs of traits or trials \(1 \leq j < k \leq p\), \(V_g[j,k] = V_g[k,j]\) and \(V_e[j,k] = V_e[k,j]\) are estimated
assuming a bivariate mixed model. The diagonals of \(V_g\) and \(V_e\) are fitted assuming univariate mixed
models. If the resulting \(V_g\) or
\(V_e\) is not positive-definite, they
are replaced by the nearest positive-definite matrix. In case
`covModel = "unst"`

or `"pw"`

it is possible to
assume that \(V_e\) is diagonal
(`VeDiag = TRUE`

).

`gData`

Data for analysis on genomic data comes from different sources and is
stored in one data object of class **gData**
(**g**enomic **Data**) for convenience. A
`gData`

object will contain all data needed for performing
analyses, so the first thing to do when using the
`statgenQTLxT`

package is creating a `gData`

object; see the `statgenGWAS`

package for more details. A
`gData`

object contains the following components:

- Marker map, a data.frame describing the physical positions of the markers on the chromosomes,
- Marker matrix, a numerical matrix containing the genotyping,
- Phenotypic data, either a single data.frame or a list of data.frames containing the phenotypic data,
- Kinship matrix, a matrix describing the genetic relatedness between the different genotypes,
- Further covariates that can be used in the analyses.

In our examples below we will show how a `gData`

object is
created.

As examples of the functionality of the package two worked examples
are provided using maize data from the European Union project DROPS. The
data is available from https://doi.org/10.15454/IASSTN (Millet et al.
2019) and the relevant data sets are included as data.frames
in the `statgenGWAS`

package. We will first show how to load
the data and create a `gData`

object. Users already familiar
with the `statgenGwas`

packages might want to skip this part
and go straight to **Running Multi-trait
GWAS** section.

`dropsMarkers`

contains the coded marker information for
41,722 SNPs and 246 genotypes. `dropsMap`

contains
information about the positions of those SNPs on the B73 reference
genome V2. `dropsPheno`

contains data for the genotypic means
(Best Linear Unbiased Estimators, BLUEs) for a subset of ten
experiments, with one value per experiment per genotype, for eight
traits. For a more detailed description of the contents of the data see
`help(dropsData, package = statgenGWAS)`

.

The first step is to create a ** gData**
object from the raw data that can be used for the GWAS analysis. For
this the raw data has to be converted to a suitable format for a

`gData`

`help(createGData, package = statgenGWAS)`

and the
`statgenGWAS`

vignette for more details.When running a multi-trait or multi-environment GWAS, all traits used
in the analysis should be in the same `data.frame`

, with
genotype as first column and the phenotypic data in subsequent columns.
In case of a multi-trait analysis the phenotypic columns contain
different traits, measured in one environment, while for a
multi-environment the columns correspond to the different environments
(same trait). In both cases the `data.frame`

may only contain
phenotypic data. Additional covariates need to be stored in
`covar`

.

Below are some examples of what these `data.frames`

should
look like.

genotype | Trait1 | Trait2 | Trait3 |
---|---|---|---|

G1 | 0.3 | 17 | 277 |

G2 | 0.4 | 19 | 408 |

G3 | 0.5 | 17 | 206 |

G4 | 0.7 | 13 | 359 |

genotype | Trait1-Trial1 | Trait1-Trial2 | Trait1-Trial3 |
---|---|---|---|

G1 | 0.3 | 0.7 | 0.5 |

G2 | 0.4 | 0.9 | 0.1 |

G3 | 0.5 | 0.8 | 0.2 |

G4 | 0.7 | 0.5 | 0.4 |

In our first example, we want to perform multi-trait GWAS for one of
the DROPS environments. `dropsPheno`

contains genotypic means
for six traits in ten trials. To run a multi-trait GWAS analysis for
each of the ten trials, the data has to be added as a `list`

of ten `data.frames`

. Recall that these
`data.frames`

should have “genotype” as their first column
and may only contain traits after that. Other columns need to be
dropped.

The code below creates this `list`

of
`data.frames`

from `dropsPheno`

.

```
data(dropsPheno, package = "statgenGWAS")
## Convert phenotypic data to a list.
colnames(dropsPheno)[1] <- "genotype"
dropsPheno <- dropsPheno[c("Experiment", "genotype", "grain.yield", "grain.number",
"anthesis", "silking", "plant.height", "ear.height")]
## Split data by experiment.
dropsPhenoList <- split(x = dropsPheno, f = dropsPheno[["Experiment"]])
## Remove Experiment column.
## phenotypic data should consist only of genotype and traits.
dropsPhenoList <- lapply(X = dropsPhenoList, FUN = `[`, -1)
```

If the phenotypic data consists of only one trial/experiment, it can
be added as a single `data.frame`

without first converting it
to a `list`

. In that case `createGData`

will
convert the input to a `list`

with one item.

Now a `gData`

object containing map, marker information,
and phenotypes can be created. Kinship matrix and covariates may be
added later on.

```
## Load marker data.
data("dropsMarkers", package = "statgenGWAS")
## Add genotypes as row names of dropsMarkers and drop Ind column.
rownames(dropsMarkers) <- dropsMarkers[["Ind"]]
dropsMarkers <- dropsMarkers[, -1]
## Load genetic map.
data("dropsMap", package = "statgenGWAS")
## Add genotypes as row names of dropsMap.
rownames(dropsMap) <- dropsMap[["SNP.names"]]
## Rename Chromosome and Position columns.
colnames(dropsMap)[2:3] <- c("chr", "pos")
## Create a gData object containing map, marker and phenotypic information.
gDataDrops <- statgenGWAS::createGData(geno = dropsMarkers,
map = dropsMap,
pheno = dropsPhenoList)
```

To get an idea of the contents of the data a summary of the
`gData`

object can be made. This will give an overview of the
content of the map and markers and also print a summary per trait per
trial. Since there are ten trials and six traits in
`gDataDrops`

we restrict the output to one trial, using the
`trials`

argument of the summary function.

```
## Summarize gDataDrops.
summary(gDataDrops, trials = "Mur13W")
#> map
#> Number of markers: 41722
#> Number of chromosomes: 10
#>
#> markers
#> Number of markers: 41722
#> Number of genotypes: 246
#> Content:
#> 0 1 2 <NA>
#> 0.28 0.01 0.71 0.00
#>
#> pheno
#> Number of trials: 1
#>
#> Mur13W:
#> Number of traits: 6
#> Number of genotypes: 246
#>
#> grain.yield grain.number anthesis silking plant.height ear.height
#> Min. 3.3 1348 56 59 222 102
#> 1st Qu. 6.3 2641 61 64 251 125
#> Median 7.5 2965 63 66 258 133
#> Mean 7.4 2986 63 66 259 133
#> 3rd Qu. 8.4 3359 66 68 266 141
#> Max. 11.4 4510 71 74 294 172
#> NA's 0.0 0 0 0 0 0
```

Marker data has to be numerical and without missing values in order
to do GWAS analysis. This can be achieved using the
`codeMarkers()`

function, which can also perform imputation
of missing markers. The marker data available for the DROPS project has
already been converted from A/T/C/G to 0/1/2. We still use the
`codeMarkers()`

function to further clean the markers, in
this case by removing the duplicate SNPs.

```
## Set seed.
set.seed(1234)
## Remove duplicate SNPs from gDataDrops.
gDataDropsDedup <- statgenGWAS::codeMarkers(gDataDrops,
impute = FALSE,
verbose = TRUE)
#> Input contains 41722 SNPs for 246 genotypes.
#> 0 genotypes removed because proportion of missing values larger than or equal to 1.
#> 0 SNPs removed because proportion of missing values larger than or equal to 1.
#> 5098 duplicate SNPs removed.
#> Output contains 36624 SNPs for 246 genotypes.
```

Note that duplicate SNPs are removed at random. To get reproducible results make sure to set a seed.

To demonstrate the options of the `codeMarkers()`

function, see `help(codeMarkers, package = statgenGWAS)`

and
the `statgenGWAS`

vignette for more details.

The cleaned `gData`

object can be used for performing
multi-trait GWAS analysis. In this example the trial *Mur13W* is
used to demonstrate the options of the `runMultiTraitGwas()`

function, for a subset of five traits. As in (**millet2016?**) we
choose a factor analytic model for the genetic and residual
covariance.

```
## Run multi-trait GWAS for 5 traits in trial Mur13W.
GWASDrops <- runMultiTraitGwas(gData = gDataDropsDedup,
traits = c("grain.yield","grain.number",
"anthesis", "silking" ,"plant.height"),
trials = "Mur13W",
covModel = "fa")
```

The output of the `runMultiTraitGwas()`

function is an
object of class `GWAS`

. This is a list consisting of five
elements described below.

**GWAResult**: a list of`data.tables`

, one for each trial for which the analysis was run. Each`data.table`

has the following columns:

snp | SNP name |

trait | trait name |

chr | chromosome on which the SNP is located |

pos | position of the SNP on the chromosome |

pValue | P-value for the SNP |

LOD | LOD score for the SNP, defined as \(-\log_{10}(pValue)\) |

effect | effect of the SNP on the trait value |

effectSe | standard error of the effect of the SNP on the trait value |

allFreq | allele frequency of the SNP |

```
head(GWASDrops$GWAResult$Mur13W)
#> snp trait chr pos pValue LOD effect effectSe allFreq
#> 1: SYN83 anthesis 1 3498 0.928 0.033 -0.14 0.19 0.60
#> 2: PZE-101000060 anthesis 1 157104 0.187 0.729 0.49 0.21 0.72
#> 3: PZE-101000088 anthesis 1 238347 0.079 1.100 -0.65 0.25 0.84
#> 4: PZE-101000083 anthesis 1 239225 0.506 0.296 -0.16 0.18 0.58
#> 5: PZE-101000108 anthesis 1 255850 0.656 0.183 0.17 0.34 0.90
#> 6: PZE-101000111 anthesis 1 263938 0.466 0.332 -0.41 0.24 0.83
head(GWASDrops$GWAResult$Mur13W[GWASDrops$GWAResult$Mur13W$trait == "grain.yield", ])
#> snp trait chr pos pValue LOD effect effectSe allFreq
#> 1: SYN83 grain.yield 1 3498 0.928 0.033 -0.058 0.096 0.60
#> 2: PZE-101000060 grain.yield 1 157104 0.187 0.729 0.037 0.102 0.72
#> 3: PZE-101000088 grain.yield 1 238347 0.079 1.100 0.018 0.124 0.84
#> 4: PZE-101000083 grain.yield 1 239225 0.506 0.296 -0.108 0.091 0.58
#> 5: PZE-101000108 grain.yield 1 255850 0.656 0.183 -0.091 0.164 0.90
#> 6: PZE-101000111 grain.yield 1 263938 0.466 0.332 -0.028 0.119 0.83
```

The `data.tables`

above contain the results for traits
anthesis and grain.yield respectively. While the column effect is trait
specific, the p-Value is for the global null-hypothesis (\(\beta_1 = \dots = \beta_p = 0\)) described
above; these P-values are repeated for each trait.

**signSnp**: a list of`data.tables`

, one for each trial for which the analysis was run, containing the significant SNPs. Optionally also the SNPs close to the significant SNPs are included in the`data.table`

. See**Significance thresholds**for more information on how to do this. The data.tables in signSnp consist of the same columns as those in GWAResult described above. Two extra columns are added:

snpStatus | either “significant SNP” or “within … of a significant SNP” |

propSnpVar | proportion of the variance explained by the SNP, computed as \(\beta_{\textrm{SNP}}^2 * var(\textrm{SNP}) / var(\textrm{pheno})\) |

In this case there are no significant SNPs:

**kinship**: the kinship matrix (or matrices) used in the GWAS analysis. This can either be the user provided kinship matrix or the kinship matrix computed when running the`runMultiTraitGwas()`

function.**thr**: a list of thresholds, one for each trial for which the analysis was run, used for determining significant SNPs.**GWASInfo**: additional information on the analysis, e.g. the call and the type of threshold used.

For a quick overview of the results, e.g. the number of significant SNPs, use the summary function.

```
## Create summary of GWASDrops for the trait grain number.
summary(GWASDrops, traits = "grain.number")
#> Mur13W:
#> Traits analysed: anthesis, grain.number, grain.yield, plant.height, silking
#>
#> Data are available for 36624 SNPs.
#> 0 of them were not analyzed because their minor allele frequency is below 0.01
#>
#> GLSMethod: single
#>
#> Trait: grain.number
#>
#> LOD-threshold: 5.9
#> No significant SNPs found.
#>
#> No genomic control correction was applied
#> Genomic control inflation-factor: 1.7
```

The `plot`

function can be used to visualize the results
in GWASDrops, with a QQ-plot, Manhattan plot or QTL-plot. More details
for each `plotType`

are available in the
`statgenGWAS`

vignette.

A QQ-plot of the observed against the expected \(-\log_{10}(p)\) values can be made by
setting `plotType = "qq"`

. Most of the SNPs are expected to
have no effect, resulting in P-values uniformly distributed on \([0,1]\), and leading to the identity
function (\(y=x\)) on the \(-\log_{10}(p)\) scale. As in the plot
below, deviations from this line should only occur on the right side of
the plot, for a small number of SNPs with an effect on the phenotype
(and possibly SNPs in LD). There is **inflation** if the
observed \(-\log_{10}(p)\) values are
always above the line \(y=x\), and
(less common) **deflation** if they are always below this
line. A QQ-plot therefore gives a first impression of the quality of the
GWAS model: if for example \(-\log_{10}(p)\) values are consistently too
large (inflation), the correction for genetic relatedness may not be
adequate. In this case it may be of interest to correct the P-values for
inflation, using the `genomicControl`

argument in the
`runMultiTraitGwas()`

.

A Manhattan plot is made by setting
`plotType = "manhattan"`

. Significant SNPs are marked in
red.

More options linked with `plotType = "manhattan"`

are
described in the `statgenGWAS`

vignette.

A qtl plot can be made by setting `plotType = "qtl"`

. In
this plot the significant SNPs are marked by circles at their genomic
positions, with diameter proportional to the estimated effect size; for
an example see Millet et al. (2016). Typically, this is done for
multiple traits or environments, with the genomic position on the
x-axis, which are displayed horizontally above each other and can thus
be compared.

Since the traits are measured on a different scale, the effect
estimates cannot be compared directly. For better comparison, one can
set `normalize = TRUE`

, which divides the estimates by the
standard deviation of the phenotype.

```
## Plot a qtl plot of GWAS Drops for Mur13W.
## Set significance threshold to 5 and normalize effect estimates.
plot(GWASDrops, plotType = "qtl", yThr = 5, normalize = TRUE)
```

Other arguments can be used to plot a subset of the chromosomes
(`chr`

) and directly export the plot to .pptx
(`exportPptx = TRUE`

and specify `pptxName`

). Note
that the officer
package is required for this. A full list of arguments can be found by
running `help(plot.GWAS)`

.

The `runMultiTraitGwas()`

function has an argument
`kinshipMethod`

, which defines the kinship matrix used for
association mapping. Kinship matrices can be computed directly using the
`kinship`

function or within the
`runMultiTraitGwas function`

. There are five options: (1)
using the covariance between the scaled SNP-scores
(`kinshipMethod = "astle"`

, the default; see e.g. equation
(2.2) in Astle and Balding (2009)) (2) Identity by State
(`kinshipMethod = "IBS"`

; see e.g. equation (2.3) in Astle and Balding (2009)) (3) using the formula by VanRaden (2008)
(`kinshipMethod = "vanRaden"`

) (4) using an identity matrix
(`kinshipMethod = "identity"`

), corresponding to no kinship
correction (5) User-defined, in which case the argument `kin`

needs to be specified.

By default, the same kinship matrix is used for testing all SNPs
(`GLSMethod = "single"`

). When
`GLSMethod = "multi"`

, the kinship matrix is
chromosome-specific. In this case, the function fits variance components
and computes effect-estimates and P-values for each chromosome in turn,
using the kinship matrix for that chromosome (i.e. using all SNPs that
are **not** on this chromosome). Each chromosome-specific
kinship matrix is computed using the method specified by the argument
`kinshipMethod`

. As shown by Rincent
et al. (2014), this can give a
considerable improvement in power.

In this example, we will focus on one trait, grain yield, in all
trials. `dropsPheno`

contains genotypic means for 10 trials.
To be able to run a GWAS analysis with one trait measured in all trials,
the data has to be reshaped and added as a single data.frame with
“genotype” as first column and traits after that.

```
## Reshape phenotypic data to data.frame in wide format containing only grain.yield.
phenoDat <- reshape(dropsPheno[, c("Experiment", "genotype", "grain.yield")],
timevar = "Experiment",
idvar = "genotype",
direction = "wide",
v.names = "grain.yield")
## Rename columns to trial name only.
colnames(phenoDat)[2:ncol(phenoDat)] <-
gsub(pattern = "grain.yield.", replacement = "",
x = colnames(phenoDat)[2:ncol(phenoDat)])
```

Now we create a `gData`

object containing map marker
information and phenotypes.

```
## Create a gData object containing map, marker and phenotypic information.
gDataDropsxE <- statgenGWAS::createGData(geno = dropsMarkers,
map = dropsMap,
pheno = phenoDat)
summary(gDataDropsxE)
#> map
#> Number of markers: 41722
#> Number of chromosomes: 10
#>
#> markers
#> Number of markers: 41722
#> Number of genotypes: 246
#> Content:
#> 0 1 2 <NA>
#> 0.28 0.01 0.71 0.00
#>
#> pheno
#> Number of trials: 1
#>
#> phenoDat:
#> Number of traits: 10
#> Number of genotypes: 246
#>
#> Kar12W Gai12W Kar13W Ner12R Mar13R Mur13R Cra12R Cam12R Kar13R Mur13W
#> Min. 5.4 7.5 4.1 2.1 3.8 2.1 0.088 0.38 5.6 3.3
#> 1st Qu. 8.8 10.4 7.0 4.0 7.0 5.8 0.891 1.33 8.7 6.3
#> Median 9.9 11.2 7.9 4.7 7.7 6.9 1.381 1.87 9.8 7.5
#> Mean 9.7 11.2 8.0 4.7 7.8 6.9 1.492 1.98 9.9 7.4
#> 3rd Qu. 10.7 12.0 9.1 5.3 8.6 7.8 1.971 2.59 11.0 8.4
#> Max. 13.1 14.3 12.7 7.1 11.5 10.6 4.979 4.90 13.8 11.4
#> NA's 0.0 0.0 0.0 0.0 0.0 0.0 0.000 0.00 0.0 0.0
```

```
## Remove duplicate SNPs from gDataDrops.
gDataDropsDedupxE <- statgenGWAS::codeMarkers(gDataDropsxE,
impute = FALSE,
verbose = TRUE)
#> Input contains 41722 SNPs for 246 genotypes.
#> 0 genotypes removed because proportion of missing values larger than or equal to 1.
#> 0 SNPs removed because proportion of missing values larger than or equal to 1.
#> 5098 duplicate SNPs removed.
#> Output contains 36624 SNPs for 246 genotypes.
```

Similar to the first example we run a multi-trial GWAs using a factor analytic model.

```
## Run multi-trial GWAS for one trait in all trials.
GWASDropsxE <- runMultiTraitGwas(gData = gDataDropsDedupxE,
covModel = "fa")
```

Among the significant SNPs we find the large QTL on chromosome 6 reported in Millet et al. (2016).

```
head(GWASDropsxE$signSnp$pheno, row.names = FALSE)
#> snp trait chr pos pValue LOD effect effectSe allFreq snpStatus propSnpVar
#> 1: SYN25281 Kar12W 6 18646369 1.6e-07 6.8 0.23 0.100 0.79 significant SNP 0.017
#> 2: PZE-106021363 Kar12W 6 18846283 7.6e-11 10.1 0.28 0.091 0.70 significant SNP 0.034
#> 3: PZE-106021410 Kar12W 6 18990291 2.5e-11 10.6 0.30 0.091 0.70 significant SNP 0.037
#> 4: PZE-106021419 Kar12W 6 18991091 1.5e-12 11.8 0.25 0.092 0.74 significant SNP 0.026
#> 5: PZE-106021420 Kar12W 6 18991117 2.3e-10 9.6 0.27 0.091 0.70 significant SNP 0.032
#> 6: PZE-106021424 Kar12W 6 18991481 4.3e-12 11.4 0.24 0.092 0.74 significant SNP 0.022
```

For a quick overview of the results, e.g. the number of significant
SNPs, we again use the summary function. We restrict the output to two
trials using the `traits`

argument.

```
summary(GWASDropsxE, traits = c("Mur13W", "Kar12W"))
#> phenoDat:
#> Traits analysed: Cam12R, Cra12R, Gai12W, Kar12W, Kar13R, Kar13W, Mar13R, Mur13R, Mur13W, Ner12R
#>
#> Data are available for 36624 SNPs.
#> 0 of them were not analyzed because their minor allele frequency is below 0.01
#>
#> GLSMethod: single
#>
#> Trait: Mur13W
#>
#> LOD-threshold: 5.9
#> Number of significant SNPs: 8
#> Smallest p-value among the significant SNPs: 1.5e-12
#> Largest p-value among the significant SNPs: 4e-07 (LOD-score: 6.4)
#>
#> No genomic control correction was applied
#> Genomic control inflation-factor: 0.94
#>
#> Trait: Kar12W
#>
#> LOD-threshold: 5.9
#> Number of significant SNPs: 8
#> Smallest p-value among the significant SNPs: 1.5e-12
#> Largest p-value among the significant SNPs: 4e-07 (LOD-score: 6.4)
#>
#> No genomic control correction was applied
#> Genomic control inflation-factor: 0.94
```

As in the first example we use the `plot.GWAS()`

function
to visualize the results in `GWASDropsxE`

, with a QQ-plot,
Manhattan plot or QTL-plot.

The `runMultiTraitGwas()`

function has many more arguments
that can be specified. In this section similar arguments are grouped and
explained with examples on how to use them.

The threshold for selecting significant SNPs in a GWAS analysis is
computed by default using Bonferroni correction, with an alpha of 0.05.
The alpha can be modified setting the option *alpha* when calling
`runMultiTraitGwas()`

. Two other threshold types can be used:
a fixed threshold (`thrType = "fixed"`

) specifying the \(-\log_{10}(p)\) (`LODThr`

) value
of the threshold, or a threshold that defines the n SNPs with the
highest \(-\log_{10}(p)\) scores as
significant SNPs. Set `thrType = "small"`

together with
`nSnpLOD = n`

to do this. In the following example, we define
all SNPs with \(p < 10^{-4}\) as
significant SNPs.

```
## Run multi-trait GWAS for Mur13W.
## Use a fixed significance threshold of 4.
GWASDropsFixThr <- runMultiTraitGwas(gData = gDataDropsDedup,
trials = "Mur13W",
covModel = "fa",
thrType = "fixed",
LODThr = 4)
```

**Controlling false discovery rate**

A final option for selecting significant SNPs is by setting
`thrType = "fdr"`

. When doing so the significant SNPs won’t
be selected by computing a genome wide threshold, but by trying to
control the rate of false discoveries as in Brzyski et al. (2016).

First, a list is defined containing all SNPs with a P-value below
`pThr`

, default 0.05. Then clusters of SNPs are created using
a two step iterative process in which SNPs with the lowest P-values are
selected as cluster representatives. This SNP and all SNPs that have a
correlation with this SNP of \(\rho\)
or higher (specified by the function argument `rho`

, default
0.4) will form a cluster. The selected SNPs are removed from the list
and the procedure is repeated until no SNPs are left. At the end of this
step, one has a list of clusters, with corresponding vector of P-values
of the cluster representatives. Finally, to determine the number of
significant clusters, the first cluster is determined for which the
P-value of the cluster representative is larger than \(cluster_{number} * \alpha / m\), where
\(m\) is the number of SNPs and \(\alpha\) can be specified by the
corresponding function argument. All previous clusters are selected as
significant.

There are three ways to compute the variance components used in the
GWAS analysis. These can be specified by setting the argument
`covModel`

. See **Models for the genetic
and residual covariance** for a description of the
options.

**Note that covModel = unst can only be used for
less than 10 traits or trials. It is not recommended to use it for six
to nine trials for computational reasons.**

```
## Run multi-trait GWAS for for Mur13W.
## Use a factor analytic model for computing the variance components.
GWASDropsFA <- runMultiTraitGwas(gData = gDataDropsDedup,
trials = "Mur13W",
covModel = "fa")
## Rerun the analysis, using the variance components computed in the
## previous model as inputs.
GWASDropsFA2 <- runMultiTraitGwas(gData = gDataDropsDedup,
trials = "Mur13W",
fitVarComp = FALSE,
Vg = GWASDropsFA$GWASInfo$varComp$Vg,
Ve = GWASDropsFA$GWASInfo$varComp$Ve)
```

**Parallel computing**

To improve performance when using a pairwise variance component
model, it is possible to use parallel computing. To do this, a parallel
back-end has to be specified by the user, e.g. by using
`registerDoParallel`

from the `doParallel`

package
(see the example below). In addition, in the
`runMultiTraitGwas()`

function the argument
`parallel`

has to be set to `TRUE`

. With these
settings the computations are done in parallel per pair of traits.

Covariates can be included as extra fixed effects in the GWAS model.
The `runMultiTraitGwas()`

function distinguishes between
‘usual’ covariates and SNP-covariates. The former could be design
factors such as block, or other traits one wants to condition on. In the
latter case, the covariate(s) are one or more of the markers contained
in the genotypic data. SNP-covariates can be set with the argument
`snpCov`

, which should be a vector of marker names.
Similarly, other covariates should be specified using the argument
`covar`

, containing a vector of covariate names. The
`gData`

object should contain these covariates in
`gData$covar`

.

In case SNP-covariates are used, GWAS for all the other SNPs is performed with the the SNP-covariates as extra fixed effect; also the null model used to estimate the variance components includes these effects. For each SNP in SNP-covariates, a P-value is obtained using the same F-test and null model to estimate the variance components, but with only all other SNPs (if any) in SNP-covariates as fixed effects.

It is recommended to remove SNPs with a low minor allele frequency
(MAF) from the data before starting a GWAS analysis. However it is also
possible to do so in the analysis itself. The difference between these
approaches is that `codeMarkers()`

removes the SNPs, whereas
`runMultiTraitGwas()`

excludes them from the analysis but
leaves them in the output (with results set to `NA`

). In the
latter case it will still be possible to see the allele frequency of the
SNP.

By default all SNPs with a MAF lower than 0.01 are excluded from the
analysis. This can be controlled by the argument `MAF`

.
Setting MAF to 0 will still exclude duplicate SNPs since duplicates
cause problems when fitting the underlying models.

Besides a normal SNP-effect model, it is possible to fit a common
SNP-effect model as well (see **Hypotheses for
the SNP-effects**). When doing so, in addition to the
SNP-effect, also the common SNP-effect and the QTL×E effect and
corresponding standard errors and P-values are returned. These are
included as extra columns in the GWAResult `data.table`

in
the output of the function.

```
## Run multi-trait GWAS for Mur13W.
## Fit an additional common sNP effect model.
GWASDropsCommon <- runMultiTraitGwas(gData = gDataDropsDedup,
trials = "Mur13W",
covModel = "fa",
estCom = TRUE)
head(GWASDropsCommon$GWAResult$Mur13W)
#> snp trait chr pos pValue LOD effect effectSe allFreq pValCom effsCom effsComSe pValQtlE
#> 1: SYN83 anthesis 1 3498 0.954 0.02 -0.13 0.20 0.60 0.283 -0.048 0.044 0.99
#> 2: PZE-101000060 anthesis 1 157104 0.173 0.76 0.51 0.21 0.72 0.023 0.107 0.047 0.57
#> 3: PZE-101000088 anthesis 1 238347 0.091 1.04 -0.68 0.25 0.84 0.055 -0.111 0.058 0.20
#> 4: PZE-101000083 anthesis 1 239225 0.597 0.22 -0.15 0.19 0.58 0.111 -0.067 0.042 0.84
#> 5: PZE-101000108 anthesis 1 255850 0.510 0.29 0.18 0.34 0.90 0.641 -0.037 0.079 0.41
#> 6: PZE-101000111 anthesis 1 263938 0.622 0.21 -0.41 0.24 0.83 0.104 -0.089 0.055 0.88
```

The SNPs with a significant QTLxE effect come from the same regions as the SNPs in supplementary table 6 in Millet et al. (2016), who however used a different SNP set. Another difference is that here we explicitly test for QTLxE within the GWAS, whereas Millet et al. (2016) regressed SNP effects on environmental covariates.

Astle, William, and David J. Balding. 2009. “Population Structure
and Cryptic Relatedness in Genetic Association Studies.”
*Statistical Science* 24 (4): 451–71. https://doi.org/10.1214/09-sts307.

Brzyski, Damian, Christine B. Peterson, Piotr Sobczyk, Emmanuel J.
Candès, Malgorzata Bogdan, and Chiara Sabatti. 2016. “Controlling
the Rate of GWAS False Discoveries.”
*Genetics* 205 (1): 61–75. https://doi.org/10.1534/genetics.116.193987.

Furlotte, N. A., and E. Eskin. 2015. “Efficient Multiple-Trait
Association and Estimation of Genetic Correlation Using the
Matrix-Variate Linear Mixed Model.” *Genetics* 200 (1):
59–68. https://doi.org/10.1534/genetics.114.171447.

Korte, Arthur, Bjarni J Vilhjálmsson, Vincent Segura, Alexander Platt,
Quan Long, and Magnus Nordborg. 2012. “A Mixed-Model Approach for
Genome-Wide Association Studies of Correlated Traits in Structured
Populations.” *Nature Genetics* 44 (9): 1066–71. https://doi.org/10.1038/ng.2376.

Millet, Emilie J., Cyril Pommier, Mélanie Buy, Axel Nagel, Willem
Kruijer, Therese Welz-Bolduan, Jeremy Lopez, et al. 2019. “A
Multi-Site Experiment in a Network of European Fields for Assessing the
Maize Yield Response to Environmental Scenarios.” Portail Data
Inra. https://doi.org/10.15454/IASSTN.

Millet, Emilie J., Claude Welcker, Willem Kruijer, Sandra Negro,
Stephane Nicolas, Sebastien Praud, Nicolas Ranc, et al. 2016.
“Genome-Wide Analysis of Yield in Europe: Allelic Effects as
Functions of Drought and Heat Scenarios.” *Plant
Physiology*, July, pp.00621.2016. https://doi.org/10.1104/pp.16.00621.

Rincent, Renaud, Laurence Moreau, Hervé Monod, Estelle Kuhn, Albrecht E.
Melchinger, Rosa A. Malvar, Jesus Moreno-Gonzalez, et al. 2014.
“Recovering Power in Association Mapping Panels with Variable
Levels of Linkage Disequilibrium.” *Genetics* 197 (1):
375–87. https://doi.org/10.1534/genetics.113.159731.

VanRaden, P. M. 2008. “Efficient Methods to Compute Genomic
Predictions.” *Journal of Dairy Science* 91 (11): 4414–23.
https://doi.org/10.3168/jds.2007-0980.

Zhou, Xiang, and Matthew Stephens. 2014. “Efficient Multivariate
Linear Mixed Model Algorithms for Genome-Wide Association
Studies.” *Nature Methods* 11 (4): 407–9. https://doi.org/10.1038/nmeth.2848.