Perform heritability enrichment analysis of trait based on GWAS summary statistics.
Source:R/sHDL.R
sHDL.Rd
Perform heritability enrichment analysis of trait based on GWAS summary statistics.
Usage
sHDL(
D,
gwas.df,
LD.path,
output.file = NULL,
nthreads = 1,
stepwise = TRUE,
fill.missing.N = c("none", "min", "max", "median", "mean"),
lim = exp(-18),
lam.cut = NULL,
Dr.path = "./Dr",
overwrite = FALSE,
verbose = FALSE,
fix.h2 = NULL,
fix.intercept = NULL,
maxit = 1000,
pgtol = 0.001,
start.v = c(1, 0.1, 1),
lwr = NULL,
upr = NULL,
mode = c("disk", "memory"),
pattern = ".*chr(\\d{1,2})\\.(\\d{1,2})[_\\.].*",
norm.method = c("minmax", "scaled", "none")
)
Arguments
- D
A vector of genomic annotations with vector names of SNP IDs.
- gwas.df
A data frame including GWAS summary statistics of genetic variants for a trait. The input data frame should include following columns: SNP, SNP ID; A1, effect allele; A2, reference allele; N, sample size; Z, z-score; If Z is not given, alternatively, you may provide: b, estimate of marginal effect in GWAS; se, standard error of the estimates of marginal effects in GWAS.
- LD.path
Path to the directory where linkage disequilibrium (LD) information is stored, compatible with HDL LD reference panels.
- output.file
Where the log and results should be written. If you do not specify a file, the log will be printed on the console.
- nthreads
Number of threads to use, default
nthreads = 1
.- stepwise
Whether to estimate enrichment fold by estimating heritability and intercept first, default
stepwise = FALSE
. Iffix.h2
andfix.intercept
are notNULL
,stepwise
will be overridden.- fill.missing.N
If the sample size is missing in the GWAS summary statistics,
fill.missing.N
is used to fill the missing values.- lim
Tolerance limitation to ensure the positive-definiteness of covariance matrices, default
lim = exp(-18)
.- lam.cut
Eigenvalue cutoff for LD matrices, default
lam.cut = NULL
, which means no cutoff.- Dr.path
Path to the directory where the Dr matrices are stored, default
Dr.path = "./Dr"
.- overwrite
Whether to overwrite the existing Dr matrices, default
overwrite = FALSE
.- verbose
Whether to print the log on the console, default
verbose = FALSE
.- fix.h2
Whether to fix the heritability to
fix.h2
or estimate the heritability, defaultfix.h2 = NULL
, which means estimate the heritability.- fix.intercept
Whether to fix the intercept to
fix.intercept
or estimate the intercept, defaultfix.intercept = NULL
, which means estimate the intercept.- maxit
Maximum number of iterations, default
maxit = 1000
.- pgtol
Tolerance for convergence, default
pgtol = 1e-3
.- start.v
A vector of starting values
c(fold, h2, intercept)
for optimization.- lwr
Lower bounds for
c(fold, h2, intercept)
. Default isNULL
, which meansc(0, 0, 0.1)
.- upr
Upper bounds for
c(fold, h2, intercept)
. Default isNULL
, which meansc(M / Md, 1, 5)
, whereMd
is the sum of annotation weights andM
is the total number of SNPs.- mode
Whether to store Dr to disk or memory, default
mode = "disk"
. Ifmode = "disk"
,Dr
is stored to disk (path returned only) and lam are not returned. Ifmode = "memory"
,Dr
andlam
are returned.- pattern
Chromosome and picece pattern of LD files, default is
".*chr(\d{1,2})\.(\d{1,2})[_\.].*"
.- norm.method
The normalization method, either
"minmax"
(default),"scaled"
or"none"
. If"minmax"
, the annotation weight vectorD
is normalized to [0, 1]. If"scaled"
, the sum of normalized vectorD
is scaled to the number of annotated SNPs. If"none"
, the annotation weight vectorD
is not normalized.
Value
A list is returned with:
time The time used for optimization.
fold The estimated heritability enrichment fold.
h2 The estimated SNP-based heritability.
intercept The estimated intercept.
fold.se The standard error of the estimated heritability enrichment fold.
h2.se The standard error of the estimated heritability.
intercept.se The standard error of the estimated intercept.
fold.p P-value based on Wald test for the estimated heritability enrichment fold.
h2.p P-value based on Wald test for the estimated heritability.
intercept.p P-value based on Wald test for the estimated intercept.
stepwise Whether the optimization is done in a stepwise manner.
converged Whether the optimization converges.
message The message returned by
optim
.
Note
Users can download the precomputed eigenvalues and eigenvectors of LD correlation matrices for European ancestry population. The download link can be found at https://github.com/zhenin/HDL/wiki/Reference-panels These are the LD matrices and their eigen-decomposition from 335,265 genomic British UK Biobank individuals. Three sets of reference panel are provided: 1) 1,029,876 QCed UK Biobank imputed HapMap3 SNPs. The size is about 33 GB after unzipping. Although it takes more time, using the imputed panel provides more accurate estimates of genetic correlations. Therefore if the GWAS includes most of the HapMap3 SNPs, then we recommend using the imputed reference panel. 2) 769,306 QCed UK Biobank imputed HapMap2 SNPs. The size is about 18 GB after unzipping.If one of your GWAS includes most of the HapMap 2 SNPs, but many SNPs (more than 1 then this HapMap2 panel is more proper to be used for HDL. 3) 307,519 QCed UK Biobank Axiom Array SNPs. The size is about 7.5 GB after unzipping.
References
Lan A and Shen X (2024). Modeling the Genetic Architecture of Complex Traits via Stratified High-Definition Likelihood.
Examples
if (FALSE) {
## The GWAS summary statistics for birth weight loaded from HDL package.
data(gwas1.example, package='HDL')
M <- nrow(gwas1.example)
set.seed(1234)
D <- rbinom(M, 1, 0.01) # random D vector
names(D) <- gwas1.example$SNP
## The path to the directory where linkage disequilibrium (LD) information is stored.
LD.path <- "path/to/UKB_array_SVD_eigen90_extraction"
## To speed up the test, we set a large lam.cut value.
res.sHDL <- sHDL(D, gwas1.example, LD.path, nthreads=4, stepwise=TRUE, lam.cut=10, Dr.path=NULL, mode="memory")
print(as.data.frame(res.sHDL))
}