Skip to contents

Perform heritability enrichment analysis of trait based on GWAS summary statistics.

Usage

sHDL(
  D,
  gwas.df,
  LD.path,
  output.file = NULL,
  mc.cores = 1,
  stepwise = TRUE,
  fill.missing.N = c("none", "min", "max", "median", "mean"),
  lim = exp(-18),
  lam.cut = NULL,
  Dr.path = "./Dr",
  overwrite = FALSE,
  verbose = FALSE,
  fix.h2 = NULL,
  fix.intercept = NULL,
  maxit = 1000,
  pgtol = 0.001,
  start.v = c(0.1, 1, 1),
  lwr = NULL,
  upr = NULL,
  mode = c("disk", "memory"),
  pattern = ".*chr(\\d{1,2})\\.(\\d{1,2})[_\\.].*",
  norm.method = c("minmax", "scaled", "none"),
  par.h2 = FALSE,
  nthreads = 1
)

Arguments

D

A single-column matrix of annotation weights with rownames of SNP IDs and colname specifying the annotation name. Must have exactly one column.

gwas.df

A data frame including GWAS summary statistics of genetic variants for a trait. The input data frame should include following columns: SNP, SNP ID; A1, effect allele; A2, reference allele; N, sample size; Z, z-score; If Z is not given, alternatively, you may provide: b, estimate of marginal effect in GWAS; se, standard error of the estimates of marginal effects in GWAS.

LD.path

Path to the directory where linkage disequilibrium (LD) information is stored, compatible with HDL LD reference panels.

output.file

Where the log and results should be written. If you do not specify a file, the log will be printed on the console.

mc.cores

Number of cores to use for parallelization, default mc.cores = 1.

stepwise

Whether to estimate enrichment fold by estimating heritability and intercept first, default stepwise = FALSE. If fix.h2 and fix.intercept are not NULL, stepwise will be overridden.

fill.missing.N

If the sample size is missing in the GWAS summary statistics, fill.missing.N is used to fill the missing values.

lim

Tolerance limitation to ensure the positive-definiteness of covariance matrices, default lim = exp(-18).

lam.cut

Eigenvalue cutoff for LD matrices, default lam.cut = NULL, which means no cutoff. For analyses with a limited number of traits and annotations, a lower cutoff (such as 0.1, or even not using a cutoff at all) is recommended. For large-scale analyses, a higher cutoff (such as 1) is recommended, to yield fast computation.

Dr.path

Path to the directory where the Dr matrices are stored, default Dr.path = "./Dr".

overwrite

Whether to overwrite the existing Dr matrices, default overwrite = FALSE.

verbose

Whether to print the log on the console, default verbose = FALSE.

fix.h2

Whether to fix the heritability to fix.h2 or estimate the heritability, default fix.h2 = NULL, which means estimate the heritability.

fix.intercept

Whether to fix the intercept to fix.intercept or estimate the intercept, default fix.intercept = NULL, which means estimate the intercept.

maxit

Maximum number of iterations, default maxit = 1000.

pgtol

Tolerance for convergence, default pgtol = 1e-3.

start.v

A vector of starting values c(h2, intercept, fold) for optimization.

lwr

Lower bounds for c(h2, intercept, fold). Default is NULL, which means c(0, 0.1, 0).

upr

Upper bounds for c(h2, intercept, fold). Default is NULL, which means c(1, 5, M / Md), where Md is the sum of annotation weights and M is the total number of SNPs.

mode

If mode = "disk", Dr is stored to disk (path returned only). If mode = "memory", Dr is loaded to memory (matrix returned). Default is mode = "disk".

pattern

Chromosome and picece pattern of LD files, default is ".*chr(\d{1,2})\.(\d{1,2})[_\.].*".

norm.method

The normalization method, either "minmax" (default), "scaled" or "none". If "minmax", the annotation weight vector D is normalized to [0, 1]. If "scaled", the sum of normalized vector D is scaled to the number of annotated SNPs. If "none", the annotation weight vector D is not normalized.

par.h2

Whether to estimate the partitioned heritability, default par.h2 = FALSE.

nthreads

Number of threads to use for matrix operations, default nthreads = 1. The default value is suitable for most cases, do not change it unless you are sure about the performance.

Value

A list is returned with:

  • time The time used for optimization.

  • fold The estimated heritability enrichment fold.

  • h2 The estimated SNP-based heritability.

  • intercept The estimated intercept.

  • fold.se The standard error of the estimated heritability enrichment fold.

  • h2.se The standard error of the estimated heritability.

  • intercept.se The standard error of the estimated intercept.

  • fold.p P-value based on Wald test for the estimated heritability enrichment fold.

  • h2.p P-value based on Wald test for the estimated heritability.

  • intercept.p P-value based on Wald test for the estimated intercept.

  • stepwise Whether the optimization is done in a stepwise manner.

  • converged Whether the optimization converges.

  • message The message returned by optim.

If par.h2 = TRUE, the following additional items are returned:

  • par.h2 The estimated partitioned heritability.

  • par.h2.se The standard error of the estimated partitioned heritability.

  • par.h2.p P-value based on Wald test for the estimated partitioned heritability.

Note

Users can download the precomputed eigenvalues and eigenvectors of LD correlation matrices for European ancestry population. The download link can be found at https://github.com/zhenin/HDL/wiki/Reference-panels These are the LD matrices and their eigen-decomposition from 335,265 genomic British UK Biobank individuals. Three sets of reference panel are provided: 1) 1,029,876 QCed UK Biobank imputed HapMap3 SNPs. The size is about 33 GB after unzipping. Although it takes more time, using the imputed panel provides more accurate estimates of genetic correlations. Therefore if the GWAS includes most of the HapMap3 SNPs, then we recommend using the imputed reference panel. 2) 769,306 QCed UK Biobank imputed HapMap2 SNPs. The size is about 18 GB after unzipping.If one of your GWAS includes most of the HapMap 2 SNPs, but many SNPs (more than 1 then this HapMap2 panel is more proper to be used for HDL. 3) 307,519 QCed UK Biobank Axiom Array SNPs. The size is about 7.5 GB after unzipping.

References

Lan A and Shen X (2024). Modeling the Genetic Architecture of Complex Traits via Stratified High-Definition Likelihood.

Author

Ao Lan, Xia Shen

Examples

if (FALSE) {
## The GWAS summary statistics for birth weight loaded from HDL package.
data(gwas1.example, package='HDL')
M <- nrow(gwas1.example)
set.seed(1234)
D <- matrix(rbinom(M, 1, 0.01), ncol=1) # random D vector
row.names(D) <- gwas1.example$SNP
colnames(D) <- 'testAnno'

## The path to the directory where linkage disequilibrium (LD) information is stored.
LD.path <- "path/to/UKB_array_SVD_eigen90_extraction"

## To speed up the test, we set a large lam.cut value.
res.sHDL <- sHDL(D, gwas1.example, LD.path, mc.cores=4, stepwise=TRUE, lam.cut=10, Dr.path=NULL, mode="memory")
print(as.data.frame(res.sHDL))
}