*epinetr* is a forward-time genetic simulation package that includes the ability to construct epistatic networks of arbitrary complexity. Applications for *epinetr* include the testing of methods for the detection and selection of epistasis, as well as assessing the impact of epistasis on prediction and the extent to which epistasis is captured by additive models, all under varying degrees of epistatic complexity.

There are four broad steps in the workflow:

- Construct the initial population with necessary parameters
- Attach additive effects to the population
- Attach an epistatic network to the population and visualise the network
- Run a forward-time simulation of the population and plot the simulation run

In Section 1, we look at the various ways of constructing a population in *epinetr*. In Section 2, we briefly discuss how to attach additive effects to a population before diving into the options for constructing epistatic networks. Section 3 shows how *epinetr* calculates the components of each individual’s phenotype, and how to recreate that calculation. Finally, in section 5, we demonstrate how to run the simulation.

As a preview, here’s an example workflow:

```
library(epinetr)
# Build a population of size 1000, with 50 QTL, broad-sense heritability of 0.4,
# narrow-sense heritability of 0.2 and overall trait variance of 40.
<- Population(popSize = 1000, map = map100snp, alleleFrequencies = runif(100),
pop QTL = 50, broadH2 = 0.4, narrowh2 = 0.2, traitVar = 40)
# Attach additive effects
<- addEffects(pop)
pop
# Attach an epistatic network
<- attachEpiNet(pop)
pop
# Plot the network
plot(getEpiNet(pop))
```

```
# Inspect initial phenotypic components
head(getComponents(pop))
```

```
## ID Sire Dam Additive Epistatic Environmental Phenotype EBV Sex
## 1 1 0 0 2.4857822 -1.679496 9.4310751 10.2373608 -9.180127 M
## 2 2 0 0 -4.3325681 -1.950742 -4.6059436 -10.8892540 -13.690509 M
## 3 3 0 0 5.5183899 -3.772924 9.4587617 11.2042274 -4.803730 F
## 4 4 0 0 1.4432450 -1.592687 0.5037069 0.3542645 -10.272931 M
## 5 5 0 0 -0.6374964 -5.772831 3.0216742 -3.3886534 -12.193311 F
## 6 6 0 0 -1.6760836 -1.161679 -5.8049378 -8.6427006 -9.762001 M
```

```
# Run a simulation across 250 generations
<- runSim(pop, generations = 250, truncSire = 0.1, truncDam = 0.5)
pop
# Plot the simulation run
plot(pop)
```

```
# Get the allele frequencies
<- getAlleleFreqRun(pop)
af
# Get the phased genotypes of the resulting population
<- getPhased(pop)
geno
# Get a subset of the resulting population
<- getComponents(pop)$ID
ID <- sample(ID, 50)
ID <- getSubPop(pop, ID) pop2
```

Constructing the initial population is done via the *Population* function. The only data you will absolutely need is a *map*, either via a variant call format (VCF) file or directly via a data frame.

If you are directly supplying a map data frame, the first column should list the single nucleotide polymorphism (SNP) IDs, the second column should list the chromosome IDs for each SNP and the third column should list the position of each SNP on its chromosome in base pairs.

For example:

`head(map100snp)`

```
## V1 V2 V3
## 1 1 1 31433512
## 2 2 1 61930663
## 3 3 1 95513586
## 4 4 1 130263477
## 5 5 1 166021598
## 6 6 1 201083562
```

`nrow(map100snp)`

`## [1] 100`

`length(unique(map100snp[, 2]))`

`## [1] 22`

There are 100 SNPs across 22 chromosomes in this map data frame.

Given this map, we can now construct a population. We’ll generate 200 individuals using allele frequencies selected from a uniform distribution, we’ll select 20 QTL at random and we’ll give the phenotypic trait under examination a variance of 40, a broad-sense heritability of 0.9 and a narrow-sense heritability of 0.6.

```
<- Population(popSize = 200, map = map100snp, QTL = 20,
pop alleleFrequencies = runif(100),
broadH2 = 0.9, narrowh2 = 0.6, traitVar = 40)
pop
```

```
## Population of size 200
## Specified initial trait variance: 40
## Initial broad-sense heritability: 0.9
## Initial narrow-sense heritability: 0.6
## Using 100 SNPs with 20 QTL:
## 2 4 9 11 16 28 29 30 32 37 45 57 59 62 64 66 88 90 95 98
```

We can fall back on the built-in defaults for *broadH2*, *narrowh2* and *traitVar* of 0.5, 0.3 and 1, respectively:

```
<- Population(popSize = 200, map = map100snp, QTL = 20,
pop alleleFrequencies = runif(100))
pop
```

```
## Population of size 200
## Specified initial trait variance: 1
## Initial broad-sense heritability: 0.5
## Initial narrow-sense heritability: 0.3
## Using 100 SNPs with 20 QTL:
## 19 25 28 30 33 35 43 46 55 58 60 75 77 78 81 83 88 91 95 98
```

*epinetr* estimates a breeding value for each individual in addition to supplying a true genetic value (TGV). The estimated breeding value (EBV) is calculated by first estimating the heritability using genomic relationship matrix (GRM)^{1} restricted maximum likelihood (GREML),^{2} then using the heritability estimate to in turn estimate additive SNP effects via gBLUP.^{3,4} The EBVs are thus a window into how the model generated by *epinetr* appears under an assumption of additivity.

We can bypass GREML by supplying our own heritability estimate using *h2est*:

```
<- Population(popSize = 200, map = map100snp, QTL = 20,
pop alleleFrequencies = runif(100), h2est = 0.6)
```

We can also specify the QTL by listing their SNP IDs:

```
<- Population(popSize = 200, map = map100snp,
pop QTL = c(62, 55, 92, 74, 11, 38),
alleleFrequencies = runif(100),
broadH2 = 0.9, narrowh2 = 0.6, traitVar = 40)
pop
```

```
## Population of size 200
## Specified initial trait variance: 40
## Initial broad-sense heritability: 0.9
## Initial narrow-sense heritability: 0.6
## Using 100 SNPs with 6 QTL:
## 11 38 55 62 74 92
```

(Note that the QTL will be displayed only if there are no more than 100 QTL, so as not to flood the screen.)

A full list of QTL can also be displayed like so:

`getQTL(pop)`

```
## ID Index
## 1 11 11
## 2 38 38
## 3 55 55
## 4 62 62
## 5 74 74
## 6 92 92
```

Note that any map supplied to the constructor will be sorted, first by chromosome and then by base pair position.

For greater control, we can supply a matrix of phased biallelic genotypes to the constructor, either directly or via a VCF file using the *vcf* parameter. (Using a VCF file will also supply a map.)

If given directly, the matrix should be in individual-major format, with each allele coded with either a 0 or a 1 and no unknown values. For example, `geno100snp`

is a genotype matrix of 100 SNPs across 500 individuals. It’s already in the necessary format, which uses one individual per row and two columns per SNP.

`dim(geno100snp)`

`## [1] 500 200`

Examining the first 5 SNPs for the first individual we find the following:

`1, 1:10] geno100snp[`

```
## V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
## 1 0 1 1 1 1 1 1 0 0
```

That is, SNP 1 is heterozygous with genotype 1|0, SNPs 2-4 are homozygous with genotype 1|1 and SNP 5 is homozygous with genotype 0|0.

We can supply a phased genotype matrix to the constructor like so:

```
<- Population(popSize = nrow(geno100snp), map = map100snp, QTL = 20,
pop genotypes = geno100snp,
broadH2 = 0.9, narrowh2 = 0.6, traitVar = 40)
```

Supplying a phased genotype matrix allows us to directly specify the initial genotypes in the population. (The assumption is that the first allele for each SNP is inherited from the sire and the second allele for each SNP is inherited from the dam.) If we supply a population size to the constructor that does not match the number of rows in the genotype matrix, the genotypes will be used only to suggest allele frequencies for newly generated genotypes.

If we wish to use the genotypes to suggest allele frequencies while still maintaining the population size, we can set the *literal* flag to `FALSE`

.

```
<- Population(popSize = nrow(geno100snp), map = map100snp, QTL = 20,
pop genotypes = geno100snp, literal = FALSE,
broadH2 = 0.9, narrowh2 = 0.6, traitVar = 40)
```

The *Population* constructor can also be used to modify an existing population. We can, for example, adjust the heritability:

```
<- Population(pop, broadH2 = 0.7, traitVar = 30)
pop pop
```

```
## Population of size 500
## Specified initial trait variance: 30
## Initial broad-sense heritability: 0.7
## Initial narrow-sense heritability: 0.6
## Using 100 SNPs with 20 QTL:
## 8 14 17 27 31 38 44 46 47 48 50 52 54 55 65 67 77 88 93 100
```

We can also adjust the population size; this will necessarily generate a new set of genotypes based on the same allele frequencies:

```
<- Population(pop, popSize = 800)
pop pop
```

```
## Population of size 800
## Specified initial trait variance: 30
## Initial broad-sense heritability: 0.7
## Initial narrow-sense heritability: 0.6
## Using 100 SNPs with 20 QTL:
## 8 14 17 27 31 38 44 46 47 48 50 52 54 55 65 67 77 88 93 100
```

Similarly, we can adjust the allele frequencies, which will necessarily also generate a new set of genotypes:

```
<- Population(pop, alleleFrequencies = runif(100))
pop pop
```

```
## Population of size 800
## Specified initial trait variance: 30
## Initial broad-sense heritability: 0.7
## Initial narrow-sense heritability: 0.6
## Using 100 SNPs with 20 QTL:
## 8 14 17 27 31 38 44 46 47 48 50 52 54 55 65 67 77 88 93 100
```

Where possible, the population features are preserved while adjusting only the parameters specified.

Because we have specified a non-zero narrow-sense heritability for our population, we now need to attach additive effects. This is done using the *addEffects* function.

```
<- addEffects(pop)
pop pop
```

```
## Population of size 800
## Specified initial trait variance: 30
## Initial broad-sense heritability: 0.7
## Initial narrow-sense heritability: 0.6
## Additive variance in population: 18
## Using 100 SNPs with 20 QTL:
## 8 14 17 27 31 38 44 46 47 48 50 52 54 55 65 67 77 88 93 100
```

As expected, 60% of the phenotypic variance is attributable to additive effects.

By default, effects are selected from a normal distribution; we can, however, supply a different distribution function.

`<- addEffects(pop, distrib = runif) pop `

Alternatively, we can supply our own additive effects for the QTL.

```
<- c( 1.2, 1.5, -0.3, -1.4, 0.8,
effects 2.4, 0.2, -0.8, -0.4, 0.8,
-0.2, -1.4, 1.4, 0.2, -0.9,
0.4, -0.8, 0.0, -1.1, -1.3)
<- addEffects(pop, effects = effects)
pop getAddCoefs(pop)
```

```
## [1] 1.7140099 2.1425124 -0.4285025 -1.9996783 1.1426733 3.4280199
## [7] 0.2856683 -1.1426733 -0.5713366 1.1426733 -0.2856683 -1.9996783
## [13] 1.9996783 0.2856683 -1.2855075 0.5713366 -1.1426733 0.0000000
## [19] -1.5711758 -1.8568441
```

Note that the additive effects are scaled so as to guarantee the initial narrow-sense heritability. This is evident by adjusting the narrow-sense heritability within the population:

```
<- Population(pop, narrowh2 = 0.4)
pop getAddCoefs(pop)
```

```
## [1] 1.3994833 1.7493541 -0.3498708 -1.6327305 0.9329888 2.7989665
## [7] 0.2332472 -0.9329888 -0.4664944 0.9329888 -0.2332472 -1.6327305
## [13] 1.6327305 0.2332472 -1.0496124 0.4664944 -0.9329888 0.0000000
## [19] -1.2828597 -1.5161069
```

If broad-sense heritability is higher than narrow-sense heritability in the population, you will need to attach epistatic effects. The simplest way to do this is to use the *attachEpiNet* function with the default arguments, supplying only the population. This will generate a random epistatic network with the QTL as nodes.

```
<- attachEpiNet(pop)
pop pop
```

```
## Population of size 800
## Specified initial trait variance: 30
## Initial broad-sense heritability: 0.7
## Initial narrow-sense heritability: 0.4
## Additive variance in population: 12
## Epistatic variance in population: 9
## Using 100 SNPs with 20 QTL:
## 8 14 17 27 31 38 44 46 47 48 50 52 54 55 65 67 77 88 93 100
```

Note that the epistatic variance is as expected and the additive variance has been preserved.

We can visualise the network that was generated by using the *getEpiNet* function to retrieve the network before plotting it.

```
<- getEpiNet(pop)
epinet plot(epinet)
```

We can use the *scaleFree* flag to generate a network using the Barabasi-Albert model.^{5}

```
<- attachEpiNet(pop, scaleFree = TRUE)
pop plot(getEpiNet(pop))
```

The Barabasi-Albert model for constructing networks assumes a connected graph that then adds a node at a time, with the probability that an existing node is connected to the new node given by \(\frac{d_i}{\sum_{j} d_j}\), where \(d_i\) is the degree of node \(i\). In *epinetr*, nodes are added in a random order (by shuffling the initial list) so as to ensure that no bias is introduced due to the initial ordering of nodes.

The “random” network model in *epinetr* acts as a control case: it uses the same algorithm as the Barabasi-Albert model but gives a uniform probability of connection to all existing nodes. This ensures that there are the same number of interactions in both cases.

If we want a number of QTL to only have additive effects applied, we can use the *additive* argument, giving the number of QTL not to be included in the epistatic network.

```
<- attachEpiNet(pop, scaleFree = TRUE, additive = 7)
pop plot(getEpiNet(pop))
```

The minimum number of interactions per QTL can be given with the *m* argument.

```
<- attachEpiNet(pop, scaleFree = TRUE, additive = 7, m = 2)
pop plot(getEpiNet(pop))
```

There are three points to note with the *m* argument. The first point is that *m* specifically refers to the number of interactions and not the order of the interactions (which can be set with the *k* parameter; see below). The second point is that it has no impact on QTL designated as additive-only. The third point is that a minimal connected graph is initially constructed prior to *m* being strictly applied. (This means that you may still see some QTL with fewer than *m* interactions due to the value of *k*.)

We can also include higher-order interactions using the *k* argument, which accepts a vector specifying the orders of interaction to include:

```
<- attachEpiNet(pop, scaleFree = TRUE, additive = 7, m = 2, k=2:7)
pop plot(getEpiNet(pop))
```

In cases where \(m > 1\), any initial minimal connected graph is created by selecting \(n\) nodes such that \(n\) is the smallest integer satisfying the equation \(\binom{n - 1}{k - 1} \ge m\). (In the default case of pairwise interactions, this reduces to \(n \ge m + 1\).)

Where \(k > 2\) we have extended the Barabasi-Albert model such that the *attachEpiNet* function adds interactions in order of increasing complexity: first, the function adds all 2-way interactions, then all 3-way interactions, *etcetera*. The important point to note is that, for networks generated using the Barabasi-Albert model, probabilities of connectedness are based on all previous interactions; for example, 3-way interactions are based on all previously established 2- and 3-way interactions. In this way, networks generated using the Barabasi-Albert model with multiple orders of interaction “layer up”, such that the degrees from lower-order interactions contribute to the preferential attachment for higher-order interactions.

All auto-generated networks are highly connected. More diffuse networks can, however, be manually generated, as detailed in the next section.

Internally, the network is stored as an incidence matrix, where the *i*th row corresponds to the *i*th QTL and the *j*th column corresponds to the *j*th interaction.

We can inspect the complete set of interactions using the *getIncMatrix* function.

```
<- getIncMatrix(pop)
inc dim(inc)
```

`## [1] 20 102`

There are currently 102 interactions in the population. Let’s examine the first five.

`1:5] inc[, `

```
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0 0 1 1 1
## [2,] 0 0 0 0 1
## [3,] 0 0 0 0 0
## [4,] 0 0 0 0 0
## [5,] 0 0 0 0 0
## [6,] 0 0 0 0 0
## [7,] 0 0 0 0 0
## [8,] 0 0 0 0 0
## [9,] 0 1 0 1 0
## [10,] 1 1 0 0 0
## [11,] 0 0 0 0 0
## [12,] 0 0 0 0 0
## [13,] 0 0 0 0 0
## [14,] 0 0 0 0 0
## [15,] 0 0 0 0 0
## [16,] 0 0 0 0 0
## [17,] 1 0 1 0 0
## [18,] 0 0 0 0 0
## [19,] 0 0 0 0 0
## [20,] 0 0 0 0 0
```

Here we can see that the first interaction is between QTL 10 and QTL 17, and that QTL 1 is included in 3 of the first 5 interactions.

We can define our own network in the same way. `rincmat100snp`

is an example of a user-defined incidence matrix, giving 19 interactions across 20 QTL:

` rincmat100snp`

```
## V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19
## [1,] 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
## [2,] 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
## [3,] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [4,] 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [5,] 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0
## [6,] 0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0 1 1 0
## [7,] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [8,] 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
## [9,] 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [10,] 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 1 0 0 1
## [11,] 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
## [12,] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [13,] 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1
## [14,] 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [15,] 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [16,] 0 0 0 0 0 1 1 1 0 0 0 1 0 0 0 0 0 0 0
## [17,] 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0
## [18,] 0 0 1 1 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0
## [19,] 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
## [20,] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
```

We can attach an epistatic network based on this incidence matrix to our population using the *incmat* argument:

`<- attachEpiNet(pop, incmat = rincmat100snp) pop `

Let’s visualise the subsequent network:

`plot(getEpiNet(pop))`

As per the interaction matrix, 4 of the 20 QTL are not part of any interactions. (Rows 3, 7, 12 and 20 only contain 0s.)

We can create a 3-way interaction by modifying the matrix such that QTL 20 is included in the first interaction:

```
# Include the 20th QTL in the first interaction
<- rincmat100snp
mm 20, 1] <- 1
mm[<- attachEpiNet(pop, incmat = mm) pop
```

Again, let’s visualise the subsequent network.

`plot(getEpiNet(pop))`

The network now includes a single 3-way interaction.

As already shown, it is possible to create purely additive QTL; similarly, it is also possible to create purely epistatic QTL. This involves specifying the additive coefficients explicitly, where at least one coefficient is 0; for example:

```
# 20 coefficients, 3 of which are 0
<- sample(c(rep(0, 3), rnorm(17)), 20)
coefs <- addEffects(pop, effects = coefs)
pop getAddCoefs(pop)
```

```
## [1] -1.1866044 0.0000000 -0.7987737 2.1071437 -1.9582108 0.0000000
## [7] 0.5373050 -2.0857346 0.6914182 0.5718450 1.3797111 -0.1835077
## [13] -2.1876711 0.4220297 2.3485040 1.1208452 -1.6565688 -0.1025604
## [19] 0.0000000 -1.2441011
```

We can see here that QTL 2, 6 and 19 all have additive coefficients of 0, making them purely epistatic (since the population includes epistatic interactions).

If we wish to have both purely additive and purely epistatic QTL in our model, we can explicitly give the SNP IDs of the QTL we want to be purely additive to *attachEpiNet*.

Suppose we have 15 QTL overall. For the sake of simplicity, we’ll make the 15 QTL the first 15 SNPs in the map:

`<- Population(pop, QTL = 1:15) pop `

Let’s make the first 5 QTL additive-only by giving their SNP IDs to the additive parameter of *attachEpiNet*:

`<- attachEpiNet(pop, additive = 1:5) pop `

As we can see in the incidence matrix, the first 5 of the 15 QTL have no epistatic effects:

`getIncMatrix(pop)`

```
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## [1,] 0 0 0 0 0 0 0 0 0
## [2,] 0 0 0 0 0 0 0 0 0
## [3,] 0 0 0 0 0 0 0 0 0
## [4,] 0 0 0 0 0 0 0 0 0
## [5,] 0 0 0 0 0 0 0 0 0
## [6,] 0 0 0 0 1 0 0 0 0
## [7,] 0 0 0 1 0 0 0 1 0
## [8,] 0 0 0 0 0 0 1 0 0
## [9,] 1 0 0 0 0 0 0 0 0
## [10,] 0 0 0 0 0 0 0 0 1
## [11,] 0 0 0 0 0 1 0 0 0
## [12,] 0 1 1 0 1 0 0 0 0
## [13,] 0 0 1 0 0 1 1 0 1
## [14,] 1 1 0 1 0 0 0 0 0
## [15,] 0 0 0 0 0 0 0 1 0
```

Now, we can plot the network in order to visualise the incidence matrix:

`plot(getEpiNet(pop))`

Let’s make QTL 6-10 epistatic-only by explicitly giving their coefficients as 0 to *addEffects*:

```
<- rnorm(15)
coefs 6:10] <- 0
coefs[<- addEffects(pop, effects = coefs)
pop getAddCoefs(pop)
```

```
## [1] -0.3363628 -1.6236018 2.2309085 0.8569522 -2.2750932 0.0000000
## [7] 0.0000000 0.0000000 0.0000000 0.0000000 -0.6328774 1.1852075
## [13] 2.7895676 -1.2451551 2.3815707
```

We now have a population where 5 of the QTL are purely additive, 5 are purely epistatic and 5 are both additive and epistatic.

Now that we’ve seen how *epinetr* builds epistatic networks, it’s time to turn our attention to the effects generated by these networks.

Each QTL can, of course, only have one of three genotypes per individual: the homozygous genotype coded 0|0, the heterozygous genotype and the homozygous genotype coded 1|1. For an interaction consisting of \(k\) QTL, this corresponds to \(3^k\) possible genotypes within the interaction overall, and for this reason, *epinetr* assigns a set of \(3^k\) potential epistatic effects drawn from a normal distribution to each interaction.

Let’s create a new population with 20 QTL, additive effects and an epistatic network consisting solely of 3-way interactions:

```
<- Population(popSize = 200, map = map100snp, QTL = 20,
pop alleleFrequencies = runif(100),
broadH2 = 0.9, narrowh2 = 0.6, traitVar = 40)
<- addEffects(pop)
pop <- attachEpiNet(pop, k = 3) pop
```

We can plot the network in order to visualise these 3-way interactions:

`plot(getEpiNet(pop))`

Let’s determine which QTL are part of the first interaction:

```
<- which(getIncMatrix(pop)[, 1] > 0)
qtls qtls
```

`## [1] 4 8 13`

The possible effects for this interaction can be found using *getInteraction*:

```
<- getInteraction(pop, 1) # Return first interaction array
interaction1 interaction1
```

```
## , , 1
##
## [,1] [,2] [,3]
## [1,] -1.2072446 -0.6793566 -1.0239678
## [2,] -1.3789572 -0.7572964 -0.1307223
## [3,] 0.2981165 -0.4515898 0.1725427
##
## , , 2
##
## [,1] [,2] [,3]
## [1,] 0.81805416 0.1933691 0.2100762
## [2,] -0.07391847 -0.8650812 -0.8782884
## [3,] 1.43371732 0.4321997 -0.1391735
##
## , , 3
##
## [,1] [,2] [,3]
## [1,] 0.1807991 -0.8564893 -0.5510985
## [2,] 0.4716519 -1.5234126 -0.1814069
## [3,] 0.8556439 0.9479921 -1.5996994
```

As expected, this is a \(3^3\) array of possible effects: the first dimension maps to QTL 4, the second dimension maps to QTL 8 and the third dimension maps to QTL 13. Along each dimension, the first index maps to the homozygous genotype coded 0|0, the second index maps to the heterozygous genotype and the third index maps to the homozygous genotype coded 1|1.

Suppose for a particular individual QTL 4 has the heterozygous genotype. The possible effects for the interaction are thus given by the following:

`2, , ] interaction1[`

```
## [,1] [,2] [,3]
## [1,] -1.3789572 -0.07391847 0.4716519
## [2,] -0.7572964 -0.86508120 -1.5234126
## [3,] -0.1307223 -0.87828836 -0.1814069
```

Furthermore, suppose that QTL 8 has the homozygous genotype coded 0|0. The possible effects for the interaction are now further constrained to the following:

`2, 1, ] interaction1[`

`## [1] -1.37895721 -0.07391847 0.47165186`

Finally, suppose that QTL 13 has the homozygous genotype coded 1|1. We now have all we need in order to know the effect of this interaction on the phenotype:

`2, 1, 3] interaction1[`

`## [1] 0.4716519`

Thus the contribution of this particular interaction to this individual’s overall phenotype is 0.4716519. (An offset, however, has yet to be applied: see the next section for details.)

Once any necessary additive and epistatic effects are attached to our population, we can inspect the phenotypic components of each individual, using the *getComponents* function:

```
<- getComponents(pop)
components head(components)
```

```
## ID Sire Dam Additive Epistatic Environmental Phenotype EBV Sex
## 1 1 0 0 -4.992970 0.2690891 -2.4580563 -7.181938 -8.5135585 F
## 2 2 0 0 2.676540 -0.4178922 0.4479760 2.706624 -0.6102625 M
## 3 3 0 0 4.020194 6.6477155 3.8542549 14.522164 -4.2400437 M
## 4 4 0 0 1.124665 8.2536251 -0.5979690 8.780321 -3.5219286 F
## 5 5 0 0 -3.912983 -4.2158419 3.5527972 -4.576027 -0.7012027 M
## 6 6 0 0 5.091122 0.1184330 0.6049673 5.814522 -2.6283665 F
```

As we can see, GBLUP-based EBVs have also been calculated for each individual.

Inspecting the additive component, we can see its mean and variance are as expected:

`mean(components$Additive)`

`## [1] -4.285895e-16`

`var(components$Additive)`

`## [1] 24`

Similarly for the epistatic and environmental components:

`mean(components$Epistatic)`

`## [1] -2.994106e-17`

`var(components$Epistatic)`

`## [1] 12`

`mean(components$Environmental)`

`## [1] -8.938163e-18`

`var(components$Environmental)`

`## [1] 4`

The means are (effectively) 0 because the components are zero-centred using fixed offsets applied to the additive and epistatic components that are preserved across generations. (Only the initial generation’s environmental component is zero-centred.) We can retrieve these offsets with the *getAddOffset* and *getEpiOffset* functions, respectively:

`getAddOffset(pop)`

`## [1] 4.562087`

`getEpiOffset(pop)`

`## [1] -1.68342`

For the overall phenotypic value, we find the following:

`mean(components$Phenotype)`

`## [1] -4.467694e-16`

`var(components$Phenotype)`

`## [1] 39.99499`

This approximation of the specified variance is due to a small amount of co-variance between components:

`cov(components$Additive, components$Epistatic)`

`## [1] 0.0009446808`

`cov(components$Additive, components$Environmental)`

`## [1] 0.9330524`

`cov(components$Environmental, components$Epistatic)`

`## [1] -0.9365032`

However:

`cor(components$Additive, components$Epistatic)`

`## [1] 5.566585e-05`

*epinetr* attempts to minimise these co-variances by selecting from within the random distributions for the environmental and epistatic components such that co-variances are minimal, given computational constraints. In particular, the epistatic component is optimised using a genetic algorithm.

We are now in a position to derive the additive component for the population. First, we’ll retrieve the population’s unphased genotypes using the *getGeno* function:

`<- getGeno(pop) geno `

Alternatively, we could use the *getHaplo* function, which returns a list of the two haplotype matrices within the population; we would then need to sum the two haplotypes together.

Next, we’ll select only the QTL within the genotypes:

`<- geno[, getQTL(pop)$Index] geno `

Finally, we’ll multiply the genotypes by the additive coefficients and apply the offset:

```
<- geno %*% getAddCoefs(pop) + getAddOffset(pop)
additive 1:5] additive[
```

`## [1] -4.992970 2.676540 4.020194 1.124665 -3.912983`

Compare with the additive component for the first five individuals given by *getComponents*:

`getComponents(pop)$Additive[1:5]`

`## [1] -4.992970 2.676540 4.020194 1.124665 -3.912983`

The function *getEpistasis* returns a matrix:

`head(getEpistasis(pop))`

```
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 0.4321997 0.03355358 1.1170048 -0.2463428 -1.1557373 -0.3430831
## [2,] 0.1725427 -0.26700607 0.8759898 -0.9041645 -0.1861366 1.2035432
## [3,] 0.9479921 1.78280165 1.1170048 1.3558287 -0.2343259 0.6207839
## [4,] 0.9479921 1.78280165 0.5806533 1.3558287 -0.8433322 1.1487916
## [5,] -0.8650812 0.21955212 -0.4380132 -1.1667513 0.0960735 2.3181067
## [6,] -0.4515898 -0.01210541 -0.4380132 -0.2463428 -0.2687650 1.2035432
## [,7] [,8] [,9] [,10] [,11] [,12]
## [1,] 1.27561625 -0.2176913 -0.05497562 0.2563134 -0.2414309 1.167833327
## [2,] 0.40326226 -0.2343021 0.90066350 0.2529619 -0.3650824 -0.184411957
## [3,] -0.02139568 0.5325095 -0.25805079 0.2420587 0.3588936 -0.007341176
## [4,] -0.02139568 0.7274425 -0.05139366 0.2420587 0.3588936 -0.007341176
## [5,] -1.52222251 -0.4841283 0.11019784 0.6658429 -0.9251595 -0.304829929
## [6,] 0.78720461 1.1480047 0.90066350 0.2563134 -0.5680568 -0.184411957
## [,13] [,14] [,15] [,16] [,17] [,18]
## [1,] -0.27423302 -0.9065720 -0.49372796 0.3246601 2.003025 -0.7239022942
## [2,] -0.04230161 -1.7812907 0.04305405 0.5443013 1.331692 -0.4977871598
## [3,] -0.04230161 1.0911978 -0.49372796 -0.6646650 2.003025 0.0008485864
## [4,] -0.03842866 1.0911978 -0.15550355 -0.9031100 1.289131 2.4327597748
## [5,] 0.22452460 -0.1535754 -1.12129228 -0.6646650 1.331692 0.1473070864
## [6,] -0.04230161 0.4534234 -1.12129228 -1.7647522 2.003025 0.1473070864
```

The rows in this matrix are the individuals; the columns are the contributions of each interaction to the overall epistatic component. By summing the rows and applying the epistatic offset to each value, we can derive the contribution of epistasis to each individual’s phenotype:

```
<- rowSums(getEpistasis(pop)) + getEpiOffset(pop)
epistatic 1:5] epistatic[
```

`## [1] 0.2690891 -0.4178922 6.6477155 8.2536251 -4.2158419`

We can similarly compare this result with the epistatic component for the first five individuals given by *getComponents*:

`getComponents(pop)$Epistatic[1:5]`

`## [1] 0.2690891 -0.4178922 6.6477155 8.2536251 -4.2158419`

We can thus easily derive both the additive and epistatic components for each individual. This can be further replicated for individuals not in the population.

Suppose you have a matrix of genotypes (`geno2`

) for five individuals not in the population:

```
<- geno2[, getQTL(pop)$Index]
geno2 geno2
```

```
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
## [1,] 1 0 1 0 1 1 1 1 2 1 2 0 0 2
## [2,] 1 1 2 0 1 2 1 0 1 2 0 1 1 1
## [3,] 1 0 1 2 2 1 2 1 2 1 2 2 2 1
## [4,] 1 1 1 0 1 0 2 1 2 0 2 1 0 0
## [5,] 2 1 0 0 2 1 2 2 1 2 1 0 1 1
## [,15] [,16] [,17] [,18] [,19] [,20]
## [1,] 1 0 0 1 1 0
## [2,] 2 1 2 1 1 1
## [3,] 1 1 2 1 1 0
## [4,] 2 1 0 1 1 1
## [5,] 1 2 1 0 2 1
```

We can find their additive components using the following code:

```
<- geno2 %*% getAddCoefs(pop) + getAddOffset(pop)
additive2 1:5] additive2[
```

`## [1] 12.893860 2.783537 -4.146981 5.408116 7.460516`

Similarly, we can find their epistatic components using the following code:

```
<- rowSums(getEpistasis(pop, geno = geno2)) + getEpiOffset(pop)
epistatic2 epistatic2
```

`## [1] -5.221683 -10.206804 2.527887 -4.556824 -4.719111`

Note that this is achieved via the optional *geno* argument in the *getEpistasis* function.

In order to run the simulation, we need to use the *runSim* function.

`<- runSim(pop, generations = 150) popRun `

The above command will iterate through 150 generations, with generation 1 being the initial generation supplied.

There are several optional arguments that can be supplied to the simulator to alter selection, recombination and mutation across generations:

*selection*determines whether random selection (the default) is employed or linear ranking selection^{6}is used (by supplying the string “ranking”);*fitness*determines whether selection occurs based on phenotypic value (the default), true genetic value (by supplying the string “TGV”) or estimated breeding value (by supplying the string “EBV”);*truncSire*and*truncDam*give the proportion of sires and dams, respectively, to include in selection, when sorted by descending phenotype (defaulting to 1);*burnIn*determines how many initial generations will use random selection with no truncation (defaulting to 0);*roundsSire*and*roundsDam*give the maximum number of generations for sires and dams, respectively, to survive within the population, assuming there are enough offspring generated to fill the population (defaulting to 1);*litterDist*is a vector of probabilities for the size of each litter, starting with a litter size of 0 (defaulting to`c(0, 0, 1)`

, i.e. each litter will always contain two offspring);*breedSire*is the maximum number of times a sire can breed within a single generation (defaulting to 10);*mutation*is the rate of mutation for each SNP;*recombination*is a vector of probabilities specifying the rate of recombination between consecutive SNPs in the*map*(obviously excepting consecutive SNPs on different chromosomes);*allGenoFileName*is a string giving the file name to optionally output the genotypes from all generations.

Each mating pair produces a number of full-sibling offspring by sampling once from the litter-size probability mass function given by *litterDist*. The vector can be of arbitrary length, such that in order to specify the possibility of up to \(n\) full-sibling offspring per mating pair, a vector of length \(n + 1\) must be supplied.

Linear ranking selection is a form of weighted stochastic selection: if the individuals in a population of size \(n\) are each given a rank \(r\) based on descending order of fitness (i.e. the individual with the highest fitness is given the rank \(r_1 = 1\) while the individual with the lowest fitness is given the rank \(r_n = n\)), the probability of an individual \(i\) being selected for mating is given by:

\[P(i \textrm{ is selected}) = \frac{2(n - r_i + 1)}{n(n + 1)}\]

For a population of 10, we have the following probabilities of selection for the highest-to-lowest ranked individuals:

```
<- 10
n <- 2 * (n - 1:n + 1) / (n * (n + 1))
pmf pmf
```

```
## [1] 0.18181818 0.16363636 0.14545455 0.12727273 0.10909091 0.09090909
## [7] 0.07272727 0.05454545 0.03636364 0.01818182
```

Fitness itself can be based on the phenotypic value, the true genetic value or the estimated breeding value, with the true genetic value simply being the sum of any additive and epistatic components of the phenotypic value; as stated previously, the estimated breeding value is calculated using gBLUP.

*epinetr* performs selection by first splitting the population into male and female sub-populations. Next, if the round is outside any initial burn-in period, each sub-population is truncated to a proportion of its original size per the values of *truncSire* and *truncDam*, respectively.

When linear ranking selection is used, females are then exhaustively sampled, without replacement, for each mating pair using their linear ranking probabilities, as given above; males are sampled for each mating pair using their linear ranking probabilities but with replacement, where they are each only replaced a maximum number of times as specified by *breedSire*. Random selection occurs in the same manner, but all probabilities are uniform. During any initial burn-in period, random selection is enforced.

As stated above, selection occurs based on fitness: this can be based on the phenotypic value, the true genetic value or the estimated breeding value

Finally, each mating pair produces a number of full-sibling offspring by sampling once from the litter-size probability mass function given by *litterDist* (with the default guaranteeing two full-sibling offspring per mating pair). Half-siblings occur when sires can mate more than once per round (as given by *breedSire*) or sires or dams survive beyond one round (as given by *roundsSire* and *roundsDam*, respectively). Note that in order to maintain the population size, the youngest sires and dams earmarked as reaching the end of their breeding lifespan may be kept in the population to the next round if there is a shortfall of offspring: this can also result in the appearance of half-siblings.

We can perform different simulation runs on the same population as follows:

```
<- runSim(pop, generations = 150, selection = "ranking")
popRunRank <- runSim(pop, generations = 150, burnIn = 50,
popRunBurnIn truncSire = 0.1, truncDam = 0.5,
roundsSire = 5, roundsDam = 5,
litterDist = c(0.1, 0.3, 0.4, 0.2),
breedSire = 7)
<- runSim(pop, generations = 150,
popRunTGV truncSire = 0.1, truncDam = 0.5,
fitness = "TGV")
<- runSim(pop, generations = 150,
popRunEBV truncSire = 0.1, truncDam = 0.5,
fitness = "EBV")
```

To visually compare these runs, we can plot them:

`plot(popRun)`

`plot(popRunRank)`

`plot(popRunBurnIn)`

`plot(popRunTGV)`

`plot(popRunEBV)`

Using the *allGenoFileName* argument in *runSim* allows the simulator to write a serialised file containing all the genotypes generated during the run. To retrieve such a file, we use the *loadGeno* function:

```
<- runSim(pop, generations = 150, allGenoFileName = "geno.epi")
popRun <- loadGeno("geno.epi") geno
```

This object is a matrix in the same phased format as the genotype matrices supplied to the constructor.

`1:5, 1:8] geno[`

```
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,] 0 0 0 1 0 0 0 0
## [2,] 0 1 0 0 0 0 0 0
## [3,] 0 0 0 1 0 0 0 0
## [4,] 0 0 0 0 0 0 0 0
## [5,] 0 0 0 1 0 0 0 0
```

To begin retrieving data from a simulation run, we can use the *getPedigree* function to return the pedigree data frame for the population across the entire run. For example:

```
<- getPedigree(popRun)
ped 512:517, ] ped[
```

```
## ID Sire Dam Additive Epistatic Environmental Phenotype EBV
## 512 512 370 368 5.49227630 -0.14132151 0.4684311 5.8193859 3.00544812
## 513 513 305 390 0.70992929 -0.17735042 0.2974583 0.8300372 0.05407006
## 514 514 305 390 0.22261479 4.99607912 -1.3551571 3.8635368 1.77408432
## 515 515 210 388 2.68490168 2.17325966 3.4823950 8.3405564 -1.65336985
## 516 516 210 388 -7.35501516 0.09688397 -0.3280092 -7.5861404 -6.80026577
## 517 517 309 375 0.07909907 -0.99515305 0.8024472 -0.1136068 0.13332138
## Sex Round
## 512 F 3
## 513 F 3
## 514 F 3
## 515 M 3
## 516 M 3
## 517 M 3
```

Note that the originating round is included in the pedigree of each individual.

The *getAlleleFreqRun* function returns the allele frequencies for each SNP per generation. For example:

```
<- getQTL(popRun)$Index
qtl <- getAlleleFreqRun(popRun)
af 1]] af[, qtl[
```

```
## [1] 0.8050 0.8000 0.8225 0.8425 0.8825 0.8825 0.9000 0.9075 0.9150 0.9125
## [11] 0.9125 0.9000 0.9025 0.9025 0.9075 0.9150 0.9075 0.8925 0.8950 0.8825
## [21] 0.8650 0.8650 0.8550 0.8625 0.8750 0.8800 0.8825 0.8600 0.8750 0.8575
## [31] 0.8625 0.8425 0.8325 0.8225 0.8075 0.7750 0.7875 0.8300 0.8075 0.7725
## [41] 0.7975 0.7825 0.7725 0.7975 0.8075 0.7950 0.7775 0.7525 0.7925 0.7600
## [51] 0.7475 0.7375 0.7600 0.7175 0.7125 0.7100 0.7000 0.6975 0.7050 0.6725
## [61] 0.7000 0.7275 0.7325 0.7675 0.7625 0.7675 0.7650 0.7575 0.7100 0.7250
## [71] 0.7175 0.7250 0.7375 0.7225 0.7225 0.7550 0.7350 0.7025 0.6900 0.6900
## [81] 0.7000 0.7300 0.7175 0.7200 0.7400 0.7500 0.7625 0.7975 0.8200 0.8200
## [91] 0.7700 0.7775 0.7475 0.7725 0.7950 0.8000 0.8000 0.7900 0.8000 0.8050
## [101] 0.8225 0.8500 0.8250 0.8125 0.8225 0.8400 0.8125 0.7925 0.7825 0.8350
## [111] 0.8400 0.8375 0.8775 0.8700 0.8425 0.8350 0.8050 0.8125 0.7875 0.7750
## [121] 0.8075 0.8000 0.7800 0.7825 0.7700 0.7575 0.7850 0.7225 0.7625 0.7800
## [131] 0.7575 0.7425 0.7525 0.7450 0.7825 0.7900 0.7600 0.7325 0.7275 0.7500
## [141] 0.7675 0.7450 0.7650 0.7625 0.7825 0.7775 0.7500 0.7475 0.7725 0.7950
```

The *getPhased* function returns the phased genotype matrix for the current population:

```
<- getPhased(popRun)
geno 1:6, 1:10] geno[
```

```
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 1 1 0 0 1 1 1 1 0 1
## [2,] 0 1 1 0 0 1 1 1 0 1
## [3,] 1 1 0 0 1 1 1 1 1 0
## [4,] 0 1 0 0 1 1 1 1 1 0
## [5,] 1 1 0 0 1 1 1 1 1 0
## [6,] 0 1 1 0 1 1 1 1 0 1
```

Alternatively, the *getGeno* function returns the unphased genotype matrix for the current population:

```
<- getGeno(popRun)
geno 1:6, 1:5] geno[
```

```
## [,1] [,2] [,3] [,4] [,5]
## [1,] 2 0 2 2 1
## [2,] 1 1 1 2 1
## [3,] 2 0 2 2 1
## [4,] 1 0 2 2 1
## [5,] 2 0 2 2 1
## [6,] 1 1 2 2 1
```

Finally, we can create a new subpopulation based on the current population by specifying the IDs to use:

```
<- getComponents(popRun)$ID
ID <- sample(ID, 50)
ID <- getSubPop(popRun, ID) popRun2
```

Finally, we can use a pedigree data frame to determine selection, with the data frame giving IDs for each individual as well as its sire and dam IDs. Such a data frame can be retrieved from a previous simulation run using the *getPedigree* function on the resulting population, or we can instead use a new data frame like so:

`201:210, ] pedData[`

```
## ID Sire Dam
## 201 201 54 178
## 202 202 54 178
## 203 203 72 19
## 204 204 72 19
## 205 205 2 114
## 206 206 2 114
## 207 207 84 134
## 208 208 84 134
## 209 209 24 172
## 210 210 24 172
```

To use this “pedigree dropper”, we call *runSim* with the *pedigree* argument:

`<- runSim(pop, pedigree = pedData) popRunPed `

The pedigree dropper first sorts the pedigree into the implicit number of generations, then runs the simulation using selection according to the given data frame. (Note that if you’re using pedigree data from a previous run, the number of generations reported by the pedigree dropper may be different from the number of iterations of the simulator that produced the pedigree.)

As usual, we can plot the resulting run:

`plot(popRunPed)`

The *epinetr* package is a flexible suite of functions designed to allow for the analysis of epistasis under a multitude of conditions, with complex interactions being a core component of the simulation. This vignette is intended as an overview of the package, with as much detail as possible included. That said, the help pages for each function will provide further detail.

1.

VanRaden, P. M. Efficient methods to compute genomic predictions. *Journal of dairy science* **91**, 4414–4423 (2008).

2.

Yang, J., Zeng, J., Goddard, M. E., Wray, N. R. & Visscher, P. M. Concepts, estimation and interpretation of SNP-based heritability. *Nature genetics* **49**, 1304–1310 (2017).

3.

Clark, S. A. & Werf, J. van der. Genomic best linear unbiased prediction (gBLUP) for the estimation of genomic breeding values. in *Genome-wide association studies and genomic prediction* 321–330 (Springer, 2013).

4.

Habier, D., Fernando, R. L. & Garrick, D. J. Genomic BLUP decoded: A look into the black box of genomic prediction. *Genetics* **194**, 597–607 (2013).

5.

Barabási, A.-L. & Albert, R. Emergence of scaling in random networks. *science* **286**, 509–512 (1999).

6.

Goldberg, D. E. & Deb, K. A comparative analysis of selection schemes used in genetic algorithms. in *Foundations of genetic algorithms* vol. 1 69–93 (Elsevier, 1991).