CoPheScan: Input data

Ichcha Manipur

2024-03-11

The input dataset for a trait (querytrait) should contain the summary data for SNPs in a genomic region around the query variant (querysnpid) and should have the following fields:

For a case-control dataset

beta: \(\beta\) or effect size

varbeta: variance of \(\beta\) or square of the standard error of \(\beta\)

snp: SNP identifier, which may be an rsid, CHR_BP_REF_ALT or CHR_BP

type: ‘cc’

N: sample size
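As a minimal sketch, the fields above can be assembled into a named list in base R. The SNP identifiers, effect sizes, standard errors and sample size below are made-up values for illustration only, and naming the beta and varbeta vectors by SNP simply mirrors the structure of the package's example dataset.

```r
# Hypothetical summary statistics for three SNPs (all values are made up)
snp_ids <- c("chr19-11173352", "chr19-11173626", "chr19-11173716")
beta <- c(-0.014, 0.017, 0.091)
se <- c(0.023, 0.020, 0.056)

# Assemble the case-control dataset as a named list
cc_dat <- list(
  beta = setNames(beta, snp_ids),
  varbeta = setNames(se^2, snp_ids), # variance = squared standard error
  snp = snp_ids,
  type = "cc",
  N = 20000
)
str(cc_dat)
```

If only standard errors are reported, squaring them as above gives varbeta.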

For a quantitative dataset

When beta and varbeta are not available, see the additional fields listed under “Additional fields in case of missing beta/varbeta or sdY”.

beta: \(\beta\) or effect size

varbeta: variance of \(\beta\) or square of the standard error of \(\beta\)

snp: SNP identifier, which may be an rsid, CHR_BP_REF_ALT or CHR_BP

type: ‘quant’

N: sample size

sdY: for a quantitative trait, the population standard deviation of the trait.
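Summary statistics often arrive as a table, so the sketch below converts a small data frame into the list format for a quantitative trait. The column names (SNP, BETA, SE), the values, the sample size and sdY = 1 are assumptions chosen for illustration; they are not taken from the package.

```r
# Hypothetical summary-statistics table; column names and values are assumptions
sumstats <- data.frame(
  SNP = c("rs1", "rs2", "rs3"),
  BETA = c(0.12, -0.05, 0.30),
  SE = c(0.04, 0.03, 0.10),
  stringsAsFactors = FALSE
)

# Convert the table into the list format for a quantitative trait
quant_dat <- list(
  beta = setNames(sumstats$BETA, sumstats$SNP),
  varbeta = setNames(sumstats$SE^2, sumstats$SNP), # squared standard error
  snp = sumstats$SNP,
  type = "quant",
  N = 5000,
  sdY = 1 # assumed here: trait standardised to unit variance
)
str(quant_dat)
```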

Additional fields in case of missing beta/varbeta or sdY

MAF: Minor allele frequency (only required when either beta/varbeta or sdY are unavailable)

pvalues: only required when beta/varbeta are unavailable

s: fraction of samples that are cases (only for a case-control trait when beta/varbeta are unavailable)
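When effect sizes are missing, a case-control dataset might instead be built from p-values, MAF and the case fraction, as in this sketch. All values are made up for illustration.

```r
# Hypothetical case-control dataset without beta/varbeta (all values are made up)
snp_ids <- c("chr19-11173352", "chr19-11173626", "chr19-11173716")

pval_dat <- list(
  pvalues = setNames(c(0.55, 0.40, 0.011), snp_ids),
  MAF = setNames(c(0.26, 0.49, 0.03), snp_ids),
  snp = snp_ids,
  type = "cc",
  N = 20000,
  s = 0.5 # fraction of samples that are cases
)

# Basic sanity checks on the ranges of the supplied values
stopifnot(all(pval_dat$pvalues > 0 & pval_dat$pvalues <= 1))
stopifnot(all(pval_dat$MAF > 0 & pval_dat$MAF <= 0.5))
stopifnot(pval_dat$s > 0, pval_dat$s < 1)
```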

library(cophescan)

Explore the data structure of the example dataset available in the cophescan package

data("cophe_multi_trait_data")
trait_dat = cophe_multi_trait_data$summ_stat$Trait_1
str(trait_dat)
#> List of 8
#>  $ beta   : Named num [1:1000] -0.01369 0.01666 0.09057 -0.00571 -0.05606 ...
#>   ..- attr(*, "names")= chr [1:1000] "chr19-11173352" "chr19-11173626" "chr19-11173716" "chr19-11173807" ...
#>  $ varbeta: Named num [1:1000] 0.000516 0.000399 0.003124 0.000419 0.000473 ...
#>   ..- attr(*, "names")= chr [1:1000] "chr19-11173352" "chr19-11173626" "chr19-11173716" "chr19-11173807" ...
#>  $ z      : Named num [1:1000] -0.603 0.834 1.62 -0.279 -2.578 ...
#>   ..- attr(*, "names")= chr [1:1000] "chr19-11173352" "chr19-11173626" "chr19-11173716" "chr19-11173807" ...
#>  $ snp    : chr [1:1000] "chr19-11173352" "chr19-11173626" "chr19-11173716" "chr19-11173807" ...
#>  $ MAF    : Named num [1:1000] 0.2614 0.4871 0.0318 0.4046 0.3042 ...
#>   ..- attr(*, "names")= chr [1:1000] "chr19-11173352" "chr19-11173626" "chr19-11173716" "chr19-11173807" ...
#>  $ type   : chr "cc"
#>  $ N      : num 20000
#>  $ s      : num 0.5

Additional field for cophe.susie

LD: Linkage Disequilibrium matrix with row and column names being the same as the snp field.

trait_dat$LD = cophe_multi_trait_data$LD
str(trait_dat$LD[1:10, 1:10])
#>  num [1:10, 1:10] 1 0.0267 -0.1078 -0.0627 0.1033 ...
#>  - attr(*, "dimnames")=List of 2
#>   ..$ : chr [1:10] "chr19-11173352" "chr19-11173626" "chr19-11173716" "chr19-11173807" ...
#>   ..$ : chr [1:10] "chr19-11173352" "chr19-11173626" "chr19-11173716" "chr19-11173807" ...
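The requirement that the LD matrix row and column names match the snp field can be checked with base R before running the analysis. The sketch below uses a small made-up 3 x 3 matrix rather than the package data; `toy_dat` is a hypothetical name.

```r
# Toy dataset with a made-up 3 x 3 LD matrix (values are illustrative only)
snp_ids <- c("chr19-11173352", "chr19-11173626", "chr19-11173716")
LD <- matrix(
  c(1, 0.03, -0.11,
    0.03, 1, 0.05,
    -0.11, 0.05, 1),
  nrow = 3,
  dimnames = list(snp_ids, snp_ids)
)
toy_dat <- list(snp = snp_ids, LD = LD)

# Row and column names must equal the snp field, in the same order
names_match <- identical(rownames(toy_dat$LD), toy_dat$snp) &&
  identical(colnames(toy_dat$LD), toy_dat$snp)

# An LD matrix should also be symmetric
is_symmetric <- isSymmetric(toy_dat$LD)

stopifnot(names_match, is_symmetric)
```

If the matrix holds the same SNPs in a different order, indexing it as `LD[snp_ids, snp_ids]` reorders rows and columns to match the snp field.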

It is important to check that the alleles for which beta is reported are aligned with the alleles used to compute the LD matrix. This can be verified either with coloc::check_alignment or by running the diagnostic check described in the susieR package documentation: https://stephenslab.github.io/susieR/articles/susierss_diagnostic.html.

Note