
Perform heritability enrichment analysis of trait based on GWAS summary statistics.
Source:R/sHDL.R
sHDL.Rd
Perform heritability enrichment analysis of trait based on GWAS summary statistics.
Usage
sHDL(
D,
gwas.df,
LD.path,
output.file = NULL,
mc.cores = 1,
stepwise = TRUE,
fill.missing.N = c("none", "min", "max", "median", "mean"),
lim = exp(-18),
lam.cut = NULL,
Dr.path = "./Dr",
overwrite = FALSE,
verbose = FALSE,
fix.h2 = NULL,
fix.intercept = NULL,
maxit = 1000,
pgtol = 0.001,
start.v = c(0.1, 1, 1),
lwr = NULL,
upr = NULL,
mode = c("disk", "memory"),
pattern = ".*chr(\\d{1,2})\\.(\\d{1,2})[_\\.].*",
norm.method = c("minmax", "scaled", "none"),
par.h2 = FALSE,
nthreads = 1
)
Arguments
- D
A single-column matrix of annotation weights with rownames of SNP IDs and colname specifying the annotation name. Must have exactly one column.
- gwas.df
A data frame including GWAS summary statistics of genetic variants for a trait. The input data frame should include following columns: SNP, SNP ID; A1, effect allele; A2, reference allele; N, sample size; Z, z-score; If Z is not given, alternatively, you may provide: b, estimate of marginal effect in GWAS; se, standard error of the estimates of marginal effects in GWAS.
- LD.path
Path to the directory where linkage disequilibrium (LD) information is stored, compatible with HDL LD reference panels.
- output.file
Where the log and results should be written. If you do not specify a file, the log will be printed on the console.
- mc.cores
Number of cores to use for parallelization, default
mc.cores = 1
.- stepwise
Whether to estimate enrichment fold by estimating heritability and intercept first, default
stepwise = FALSE
. Iffix.h2
andfix.intercept
are notNULL
,stepwise
will be overridden.- fill.missing.N
If the sample size is missing in the GWAS summary statistics,
fill.missing.N
is used to fill the missing values.- lim
Tolerance limitation to ensure the positive-definiteness of covariance matrices, default
lim = exp(-18)
.- lam.cut
Eigenvalue cutoff for LD matrices, default
lam.cut = NULL
, which means no cutoff. For analyses with a limited number of traits and annotations, a lower cutoff (such as 0.1, or even not using a cutoff at all) is recommended. For large-scale analyses, a higher cutoff (such as 1) is recommended, to yield fast computation.- Dr.path
Path to the directory where the Dr matrices are stored, default
Dr.path = "./Dr"
.- overwrite
Whether to overwrite the existing Dr matrices, default
overwrite = FALSE
.- verbose
Whether to print the log on the console, default
verbose = FALSE
.- fix.h2
Whether to fix the heritability to
fix.h2
or estimate the heritability, defaultfix.h2 = NULL
, which means estimate the heritability.- fix.intercept
Whether to fix the intercept to
fix.intercept
or estimate the intercept, defaultfix.intercept = NULL
, which means estimate the intercept.- maxit
Maximum number of iterations, default
maxit = 1000
.- pgtol
Tolerance for convergence, default
pgtol = 1e-3
.- start.v
A vector of starting values
c(h2, intercept, fold)
for optimization.- lwr
Lower bounds for
c(h2, intercept, fold)
. Default isNULL
, which meansc(0, 0.1, 0)
.- upr
Upper bounds for
c(h2, intercept, fold)
. Default isNULL
, which meansc(1, 5, M / Md)
, whereMd
is the sum of annotation weights andM
is the total number of SNPs.- mode
If
mode = "disk"
,Dr
is stored to disk (path returned only). Ifmode = "memory"
,Dr
is loaded to memory (matrix returned). Default ismode = "disk"
.- pattern
Chromosome and picece pattern of LD files, default is
".*chr(\d{1,2})\.(\d{1,2})[_\.].*"
.- norm.method
The normalization method, either
"minmax"
(default),"scaled"
or"none"
. If"minmax"
, the annotation weight vectorD
is normalized to [0, 1]. If"scaled"
, the sum of normalized vectorD
is scaled to the number of annotated SNPs. If"none"
, the annotation weight vectorD
is not normalized.- par.h2
Whether to estimate the partitioned heritability, default
par.h2 = FALSE
.- nthreads
Number of threads to use for matrix operations, default
nthreads = 1
. The default value is suitable for most cases, do not change it unless you are sure about the performance.
Value
A list is returned with:
time The time used for optimization.
fold The estimated heritability enrichment fold.
h2 The estimated SNP-based heritability.
intercept The estimated intercept.
fold.se The standard error of the estimated heritability enrichment fold.
h2.se The standard error of the estimated heritability.
intercept.se The standard error of the estimated intercept.
fold.p P-value based on Wald test for the estimated heritability enrichment fold.
h2.p P-value based on Wald test for the estimated heritability.
intercept.p P-value based on Wald test for the estimated intercept.
stepwise Whether the optimization is done in a stepwise manner.
converged Whether the optimization converges.
message The message returned by
optim
.
If par.h2 = TRUE
, the following additional items are returned:
par.h2 The estimated partitioned heritability.
par.h2.se The standard error of the estimated partitioned heritability.
par.h2.p P-value based on Wald test for the estimated partitioned heritability.
Note
Users can download the precomputed eigenvalues and eigenvectors of LD correlation matrices for European ancestry population. The download link can be found at https://github.com/zhenin/HDL/wiki/Reference-panels These are the LD matrices and their eigen-decomposition from 335,265 genomic British UK Biobank individuals. Three sets of reference panel are provided: 1) 1,029,876 QCed UK Biobank imputed HapMap3 SNPs. The size is about 33 GB after unzipping. Although it takes more time, using the imputed panel provides more accurate estimates of genetic correlations. Therefore if the GWAS includes most of the HapMap3 SNPs, then we recommend using the imputed reference panel. 2) 769,306 QCed UK Biobank imputed HapMap2 SNPs. The size is about 18 GB after unzipping.If one of your GWAS includes most of the HapMap 2 SNPs, but many SNPs (more than 1 then this HapMap2 panel is more proper to be used for HDL. 3) 307,519 QCed UK Biobank Axiom Array SNPs. The size is about 7.5 GB after unzipping.
References
Lan A and Shen X (2024). Modeling the Genetic Architecture of Complex Traits via Stratified High-Definition Likelihood.
Examples
if (FALSE) {
## The GWAS summary statistics for birth weight loaded from HDL package.
data(gwas1.example, package='HDL')
M <- nrow(gwas1.example)
set.seed(1234)
D <- matrix(rbinom(M, 1, 0.01), ncol=1) # random D vector
row.names(D) <- gwas1.example$SNP
colnames(D) <- 'testAnno'
## The path to the directory where linkage disequilibrium (LD) information is stored.
LD.path <- "path/to/UKB_array_SVD_eigen90_extraction"
## To speed up the test, we set a large lam.cut value.
res.sHDL <- sHDL(D, gwas1.example, LD.path, mc.cores=4, stepwise=TRUE, lam.cut=10, Dr.path=NULL, mode="memory")
print(as.data.frame(res.sHDL))
}