Perform heritability enrichment analysis of trait based on GWAS summary statistics.

Usage

sHDL(
  D,
  gwas.df,
  LD.path,
  output.file = NULL,
  mc.cores = 1,
  stepwise = TRUE,
  fill.missing.N = c("none", "min", "max", "median", "mean"),
  lim = exp(-18),
  lam.cut = NULL,
  Dr.path = "./Dr",
  overwrite = FALSE,
  verbose = FALSE,
  fix.h2 = NULL,
  fix.intercept = NULL,
  maxit = 1000,
  pgtol = 0.001,
  start.v = c(0.1, 1, 1),
  lwr = NULL,
  upr = NULL,
  mode = c("disk", "memory"),
  pattern = ".*chr(\\d{1,2})\\.(\\d{1,2})[_\\.].*",
  norm.method = c("minmax", "scaled", "none"),
  par.h2 = FALSE,
  nthreads = 1
)

Arguments

D: A single-column matrix of annotation weights with rownames of SNP IDs and colname specifying the annotation name. Must have exactly one column.
gwas.df: A data frame including GWAS summary statistics of genetic variants for a trait. The input data frame should include following columns: SNP, SNP ID; A1, effect allele; A2, reference allele; N, sample size; Z, z-score; If Z is not given, alternatively, you may provide: b, estimate of marginal effect in GWAS; se, standard error of the estimates of marginal effects in GWAS.
LD.path: Path to the directory where linkage disequilibrium (LD) information is stored, compatible with HDL LD reference panels.
output.file: Where the log and results should be written. If you do not specify a file, the log will be printed on the console.
mc.cores: Number of cores to use for parallelization, default mc.cores = 1.
stepwise: Whether to estimate enrichment fold by estimating heritability and intercept first, default stepwise = FALSE. If fix.h2 and fix.intercept are not NULL, stepwise will be overridden.
fill.missing.N: If the sample size is missing in the GWAS summary statistics, fill.missing.N is used to fill the missing values.
lim: Tolerance limitation to ensure the positive-definiteness of covariance matrices, default lim = exp(-18).
lam.cut: Eigenvalue cutoff for LD matrices, default lam.cut = NULL, which means no cutoff. For analyses with a limited number of traits and annotations, a lower cutoff (such as 0.1, or even not using a cutoff at all) is recommended. For large-scale analyses, a higher cutoff (such as 1) is recommended, to yield fast computation.
Dr.path: Path to the directory where the Dr matrices are stored, default Dr.path = "./Dr".
overwrite: Whether to overwrite the existing Dr matrices, default overwrite = FALSE.
verbose: Whether to print the log on the console, default verbose = FALSE.
fix.h2: Whether to fix the heritability to fix.h2 or estimate the heritability, default fix.h2 = NULL, which means estimate the heritability.
fix.intercept: Whether to fix the intercept to fix.intercept or estimate the intercept, default fix.intercept = NULL, which means estimate the intercept.
maxit: Maximum number of iterations, default maxit = 1000.
pgtol: Tolerance for convergence, default pgtol = 1e-3.
start.v: A vector of starting values c(h2, intercept, fold) for optimization.
lwr: Lower bounds for c(h2, intercept, fold). Default is NULL, which means c(0, 0.1, 0).
upr: Upper bounds for c(h2, intercept, fold). Default is NULL, which means c(1, 5, M / Md), where Md is the sum of annotation weights and M is the total number of SNPs.
mode: If mode = "disk", Dr is stored to disk (path returned only). If mode = "memory", Dr is loaded to memory (matrix returned). Default is mode = "disk".
pattern: Chromosome and picece pattern of LD files, default is ".*chr(\d{1,2})\.(\d{1,2})[_\.].*".
norm.method: The normalization method, either "minmax" (default), "scaled" or "none". If "minmax", the annotation weight vector D is normalized to [0, 1]. If "scaled", the sum of normalized vector D is scaled to the number of annotated SNPs. If "none", the annotation weight vector D is not normalized.
par.h2: Whether to estimate the partitioned heritability, default par.h2 = FALSE.
nthreads: Number of threads to use for matrix operations, default nthreads = 1. The default value is suitable for most cases, do not change it unless you are sure about the performance.

Value

A data.frame is returned with:

item The name of the parameter.
estimation The estimated value of the parameter.
se The standard error of the parameter.
p The p-value of the parameter.
note The note of the parameter.

Note

Users can download the precomputed eigenvalues and eigenvectors of LD correlation matrices for European ancestry population. The download link can be found at https://github.com/zhenin/HDL/wiki/Reference-panels These are the LD matrices and their eigen-decomposition from 335,265 genomic British UK Biobank individuals. Three sets of reference panel are provided: 1) 1,029,876 QCed UK Biobank imputed HapMap3 SNPs. The size is about 33 GB after unzipping. Although it takes more time, using the imputed panel provides more accurate estimates of genetic correlations. Therefore if the GWAS includes most of the HapMap3 SNPs, then we recommend using the imputed reference panel. 2) 769,306 QCed UK Biobank imputed HapMap2 SNPs. The size is about 18 GB after unzipping.If one of your GWAS includes most of the HapMap 2 SNPs, but many SNPs (more than 1 then this HapMap2 panel is more proper to be used for HDL. 3) 307,519 QCed UK Biobank Axiom Array SNPs. The size is about 7.5 GB after unzipping.

References

Lan A and Shen X (2024). Modeling the Genetic Architecture of Complex Traits via Stratified High-Definition Likelihood.

Author

Ao Lan, Xia Shen

Examples

if (FALSE) {
## The GWAS summary statistics for birth weight loaded from HDL package.
data(gwas1.example, package='HDL')
M <- nrow(gwas1.example)
set.seed(1234)
D <- matrix(rbinom(M, 1, 0.01), ncol=1) # random D vector
row.names(D) <- gwas1.example$SNP
colnames(D) <- 'testAnno'

## The path to the directory where linkage disequilibrium (LD) information is stored.
LD.path <- "path/to/UKB_array_SVD_eigen90_extraction"

## To speed up the test, we set a large lam.cut value.
res.sHDL <- sHDL(D, gwas1.example, LD.path, mc.cores=4, stepwise=TRUE, lam.cut=10, Dr.path=NULL, mode="memory")
print(as.data.frame(res.sHDL))
}