Skip to contents

Perform heritability enrichment analysis of trait based on GWAS summary statistics.

Usage

sHDL(
  D,
  gwas.df,
  LD.path,
  output.file = NULL,
  nthreads = 1,
  stepwise = TRUE,
  fill.missing.N = c("none", "min", "max", "median", "mean"),
  lim = exp(-18),
  lam.cut = NULL,
  Dr.path = "./Dr",
  overwrite = FALSE,
  verbose = FALSE,
  fix.h2 = NULL,
  fix.intercept = NULL,
  maxit = 1000,
  pgtol = 0.001,
  start.v = c(1, 0.1, 1),
  lwr = NULL,
  upr = NULL,
  mode = c("disk", "memory"),
  pattern = ".*chr(\\d{1,2})\\.(\\d{1,2})[_\\.].*",
  norm.method = c("minmax", "scaled", "none")
)

Arguments

D

A vector of genomic annotations with vector names of SNP IDs.

gwas.df

A data frame including GWAS summary statistics of genetic variants for a trait. The input data frame should include following columns: SNP, SNP ID; A1, effect allele; A2, reference allele; N, sample size; Z, z-score; If Z is not given, alternatively, you may provide: b, estimate of marginal effect in GWAS; se, standard error of the estimates of marginal effects in GWAS.

LD.path

Path to the directory where linkage disequilibrium (LD) information is stored, compatible with HDL LD reference panels.

output.file

Where the log and results should be written. If you do not specify a file, the log will be printed on the console.

nthreads

Number of threads to use, default nthreads = 1.

stepwise

Whether to estimate enrichment fold by estimating heritability and intercept first, default stepwise = FALSE. If fix.h2 and fix.intercept are not NULL, stepwise will be overridden.

fill.missing.N

If the sample size is missing in the GWAS summary statistics, fill.missing.N is used to fill the missing values.

lim

Tolerance limitation to ensure the positive-definiteness of covariance matrices, default lim = exp(-18).

lam.cut

Eigenvalue cutoff for LD matrices, default lam.cut = NULL, which means no cutoff.

Dr.path

Path to the directory where the Dr matrices are stored, default Dr.path = "./Dr".

overwrite

Whether to overwrite the existing Dr matrices, default overwrite = FALSE.

verbose

Whether to print the log on the console, default verbose = FALSE.

fix.h2

Whether to fix the heritability to fix.h2 or estimate the heritability, default fix.h2 = NULL, which means estimate the heritability.

fix.intercept

Whether to fix the intercept to fix.intercept or estimate the intercept, default fix.intercept = NULL, which means estimate the intercept.

maxit

Maximum number of iterations, default maxit = 1000.

pgtol

Tolerance for convergence, default pgtol = 1e-3.

start.v

A vector of starting values c(fold, h2, intercept) for optimization.

lwr

Lower bounds for c(fold, h2, intercept). Default is NULL, which means c(0, 0, 0.1).

upr

Upper bounds for c(fold, h2, intercept). Default is NULL, which means c(M / Md, 1, 5), where Md is the sum of annotation weights and M is the total number of SNPs.

mode

Whether to store Dr to disk or memory, default mode = "disk". If mode = "disk", Dr is stored to disk (path returned only) and lam are not returned. If mode = "memory", Dr and lam are returned.

pattern

Chromosome and picece pattern of LD files, default is ".*chr(\d{1,2})\.(\d{1,2})[_\.].*".

norm.method

The normalization method, either "minmax" (default), "scaled" or "none". If "minmax", the annotation weight vector D is normalized to [0, 1]. If "scaled", the sum of normalized vector D is scaled to the number of annotated SNPs. If "none", the annotation weight vector D is not normalized.

Value

A list is returned with:

  • time The time used for optimization.

  • fold The estimated heritability enrichment fold.

  • h2 The estimated SNP-based heritability.

  • intercept The estimated intercept.

  • fold.se The standard error of the estimated heritability enrichment fold.

  • h2.se The standard error of the estimated heritability.

  • intercept.se The standard error of the estimated intercept.

  • fold.p P-value based on Wald test for the estimated heritability enrichment fold.

  • h2.p P-value based on Wald test for the estimated heritability.

  • intercept.p P-value based on Wald test for the estimated intercept.

  • stepwise Whether the optimization is done in a stepwise manner.

  • converged Whether the optimization converges.

  • message The message returned by optim.

Note

Users can download the precomputed eigenvalues and eigenvectors of LD correlation matrices for European ancestry population. The download link can be found at https://github.com/zhenin/HDL/wiki/Reference-panels These are the LD matrices and their eigen-decomposition from 335,265 genomic British UK Biobank individuals. Three sets of reference panel are provided: 1) 1,029,876 QCed UK Biobank imputed HapMap3 SNPs. The size is about 33 GB after unzipping. Although it takes more time, using the imputed panel provides more accurate estimates of genetic correlations. Therefore if the GWAS includes most of the HapMap3 SNPs, then we recommend using the imputed reference panel. 2) 769,306 QCed UK Biobank imputed HapMap2 SNPs. The size is about 18 GB after unzipping.If one of your GWAS includes most of the HapMap 2 SNPs, but many SNPs (more than 1 then this HapMap2 panel is more proper to be used for HDL. 3) 307,519 QCed UK Biobank Axiom Array SNPs. The size is about 7.5 GB after unzipping.

References

Lan A and Shen X (2024). Modeling the Genetic Architecture of Complex Traits via Stratified High-Definition Likelihood.

Author

Ao Lan, Xia Shen

Examples

if (FALSE) {
## The GWAS summary statistics for birth weight loaded from HDL package.
data(gwas1.example, package='HDL')
M <- nrow(gwas1.example)
set.seed(1234)
D <- rbinom(M, 1, 0.01) # random D vector
names(D) <- gwas1.example$SNP

## The path to the directory where linkage disequilibrium (LD) information is stored.
LD.path <- "path/to/UKB_array_SVD_eigen90_extraction"

## To speed up the test, we set a large lam.cut value.
res.sHDL <- sHDL(D, gwas1.example, LD.path, nthreads=4, stepwise=TRUE, lam.cut=10, Dr.path=NULL, mode="memory")
print(as.data.frame(res.sHDL))
}