Department of Genetics, Yale University School of Medicine, New Haven, CT, USA; Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
Department of Genetics, Yale University School of Medicine, New Haven, CT, USA; Department of Biomedical Sciences, Korea University College of Medicine, Seoul, Korea
Cardiovascular Research Institute, University of California, San Francisco, CA, USA; Department of Medicine, University of California, San Francisco, CA, USA; Department of Pediatrics, University of California, San Francisco, CA, USA
Cardiovascular Research Institute, University of California, San Francisco, CA, USA; Department of Medicine, University of California, San Francisco, CA, USA; Department of Dermatology, University of California, San Francisco, CA, USA
Cardiovascular Research Institute, University of California, San Francisco, CA, USA; Department of Medicine, University of California, San Francisco, CA, USA; Department of Biochemistry and Biophysics, University of California, San Francisco, CA, USA
Cardiovascular Research Institute, University of California, San Francisco, CA, USA; Physiological Nursing, University of California, San Francisco, CA, USA
Low levels of high density lipoprotein-cholesterol (HDL-C) are associated with an elevated risk of arteriosclerotic coronary heart disease. Heritability of HDL-C levels is high. In this research discovery study, we used whole-exome sequencing to identify damaging gene variants that may play significant roles in determining HDL-C levels. We studied 204 individuals with a mean HDL-C level of 27.8 ± 6.4 mg/dl (range: 4–36 mg/dl). Data were analyzed by statistical gene burden testing and by filtering against candidate gene lists. We found 120 occurrences of probably damaging variants (116 heterozygous; four homozygous) among 45 of 104 recognized HDL candidate genes. Those with the highest prevalence of damaging variants were ABCA1 (n = 20), STAB1 (n = 9), OSBPL1A (n = 8), CPS1 (n = 8), CD36 (n = 7), LRP1 (n = 6), ABCA8 (n = 6), GOT2 (n = 5), AMPD3 (n = 5), WWOX (n = 4), and IRS1 (n = 4). Binomial analysis for damaging missense or loss-of-function variants identified the ABCA1 and LDLR genes at genome-wide significance. In conclusion, whole-exome sequencing of individuals with low HDL-C showed the burden of damaging rare variants in the ABCA1 and LDLR genes is particularly high and revealed numerous occurrences in HDL candidate genes, including many genes identified in genome-wide association study reports. Many of these genes are involved in cancer biology, which accords with epidemiologic findings of the association of HDL deficiency with increased risk of cancer, thus presenting a new area of interest in HDL genomics.
Clinical arteriosclerotic coronary heart disease (CHD) is a multifactorial disorder. Circulating lipid and lipoprotein levels, notably, triglycerides (TGs), low density lipoproteins (LDLs), and high density lipoproteins (HDLs), are independent risk factors, and other components such as TG-rich remnants and prebeta-1 HDL play important roles too. Here we report the results of a research discovery genomic study designed to understand better the genetic causes of inherited hypoalphalipoproteinemia, low levels of plasma HDL-cholesterol (HDL-C). Over forty years ago, epidemiologic studies established that low levels of HDL-C are independently associated with an elevated risk of CHD and carotid disease (
National Cholesterol Education Program (NCEP) Expert Panel on Detection, Evaluation, and Treatment of High Blood Cholesterol in Adults (Adult Treatment Panel III). Third report of the National Cholesterol Education Program (NCEP) Expert Panel on Detection, Evaluation, and Treatment of High Blood Cholesterol in Adults (Adult Treatment Panel III) final report.
). Those participants in the Framingham Heart Study with HDL-C levels below 35 mg/dl had eight times the prevalence of CHD compared to those with levels above 65 mg/dl (
). Other studies, including the Lipid Research Clinics Prevalence Mortality Follow-up Study, the Coronary Primary Prevention Trial, and the Multiple Risk Factor Intervention Trial (MRFIT) study, supported these findings (
) is likely a major contributor here is inferred by studies that have shown cholesterol efflux capacity to be associated with atherosclerotic cardiovascular disease risk (
). It is now appreciated that the level of HDL-C itself is not an indication of the precise number of HDL particles, their functionality, the distribution of the numerous HDL subspecies, or the rate of RCT (
). Most cells, macrophages especially, efflux cholesterol to the RCT retrieval pathway to maintain cholesterol homeostasis. This process involves the ATP-binding cassette transporters ABCA1 and ABCG1 and the scavenger receptor class B, type 1 (
HDL also has a signaling role that is important to endothelial and platelet function, lymphocyte trafficking, and angiogenesis that involves HDL-bound sphingosine 1-phosphate (
). The extent to which a low level of HDL-C, resulting from known genetic causes, is a direct risk factor for atherosclerotic disease has become controversial recently with some authors doubting the genetic link (
). Thus, it was considered that genetic variation that resulted in a decreased level of HDL-C might be a risk factor for CHD. Since that time, much effort has been undertaken to discover the underlying genetic variations that associate with HDL-C levels. Studies in both humans and animals initially revealed several genes that contribute to the variance in levels of HDL-C. These include the apolipoprotein A-1 gene (APOA1), ABCA1, lecithin cholesterol acyltransferase (LCAT), phospholipid transfer protein (PLTP), cholesteryl ester transfer protein (CETP), and hepatic lipase (LIPC) (
). Genome-wide association studies (GWASs) in large populations have now revealed numerous additional candidates, though it is important to point out that the effect sizes of many individual gene variants are often small (
). Often, mutations in genes such as lipoprotein lipase (LPL), for example, that result in a sizable elevation of TGs are associated also with decreased HDL-C. Much of the lowering of HDL-C here is due to the well-known entropically driven transfer of cholesteryl esters, facilitated by CETP, from the core of HDL particles to TG-rich lipoproteins (
). This transfer manifests as a hyperbolic inverse relationship between levels of TG and HDL-C.
We performed whole-exome sequencing on 204 individuals with low levels of HDL-C. These were selected from 21,639 individuals in the UCSF Genomic Resource in Arteriosclerosis (GRA) (
), a significant number (∼40%) of those with low levels of HDL-C have secondary causes, most notably hypertriglyceridemia, we selected participants taking into account the known inverse hyperbolic relationship between TG and HDL-C (
The sequencing data were analyzed by two separate approaches. Firstly, statistical gene burden testing was used to identify potentially causal rare variants, particularly from novel genes. Secondly, after preliminary filtering to identify rare coding variants, we filtered against two lists of candidate genes. The first list consisted of 594 lipid metabolism related genes, and this included all 23 lipid-related genes sequenced by Geller and colleagues in their study of subjects with low HDL-C (
). The other list was a subset from the first list and comprised 104 genes well established as being involved in HDL metabolism or associated robustly with levels of HDL-C in GWASs. We then used 10 deleteriousness assessment tools and the ClinVar public archive to determine the degree of probability that a particular variant was functionally damaging and therefore likely to be disease causing.
Our aim in these studies was to provide a valuable resource for future cardiovascular research by compiling a list of rare gene variants that have a high likelihood of being functionally damaging and in many cases clearly pathogenic. We believe this study is unique in applying exome sequencing specifically to a cohort with HDL deficiency. Others have used targeted sequencing of limited numbers of candidate genes, and we have compared our findings to these previous studies (
) were included in this study. Because this was a discovery study, no formal power calculation was done. At the time blood samples were collected, none of the individuals analyzed in this study were taking a lipid-altering medication. They were selected based on a plasma level of HDL-C below the 10th percentile of the GRA population, after adjusting for sex, and plasma level of TG based on the known hyperbolic inverse relationship between levels of HDL-C and TG (
). Briefly, using data on a total of 4,140 GRA subjects, we derived a hyperbolic 10th percentile isopleth function that takes the form y = m1 + ((m2∗m3)/(m2 + x)), where y is HDL-C, x is TG, and m1, m2, and m3 are constants. Participants completed a questionnaire to document medical history, clinically important lifestyle factors, and family history. Many of those included in this study attended a tertiary lipid clinic and had other dyslipidemias. Study participants were subdivided into four groups based on subphenotypes of dyslipidemia. We used a cutoff for defining hypertriglyceridemia as recommended by the National Cholesterol Education Program Adult Treatment Panel III (
National Cholesterol Education Program (NCEP) Expert Panel on Detection, Evaluation, and Treatment of High Blood Cholesterol in Adults (Adult Treatment Panel III). Third report of the National Cholesterol Education Program (NCEP) Expert Panel on Detection, Evaluation, and Treatment of High Blood Cholesterol in Adults (Adult Treatment Panel III) final report.
); TG ≥150 mg/dl. Participants were considered to have high LDL-C if the value was ≥160 mg/dl, except in the case of four participants under 18 years old where the 90th percentile sex- and age-adjusted values were used (
Population Studies Data Book: Vol. I, The Prevalence Study. U.S. Department of Health and Human Services, Public Health Service, National Institutes of Health (NIH publication number 80-1527), Washington, DC; 1980
). There were five kinships included among the 204 participants. There were two sib pairs, one mother daughter pair, and two father son pairs. All study participants gave written informed consent prior to their enrollment in the study, which adhered to the World Medical Association Declaration of Helsinki and was approved by the UCSF Institutional Review Board as part of the UCSF Human Research Protection Program. Children were included with parental consent.
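The selection rule and the subphenotype grouping described above can be sketched in a few lines. This is an illustration only: the constants m1, m2, and m3 are not reported in the text, so the values below are hypothetical placeholders, and the function names are ours. The pediatric LDL-C cutoffs (90th percentile, sex- and age-adjusted) are not modeled.

```python
# Hypothetical isopleth constants; the fitted values are not given in the text.
M1, M2, M3 = 20.0, 100.0, 25.0

TG_HIGH = 150.0   # mg/dl, hypertriglyceridemia cutoff (ATP III)
LDL_HIGH = 160.0  # mg/dl, adult high LDL-C cutoff used in the study


def isopleth_hdl(tg, m1=M1, m2=M2, m3=M3):
    """10th percentile HDL-C (mg/dl) at a given TG: y = m1 + (m2*m3)/(m2 + x)."""
    return m1 + (m2 * m3) / (m2 + tg)


def below_isopleth(hdl_c, tg):
    """True if HDL-C falls below the TG-adjusted 10th percentile curve."""
    return hdl_c < isopleth_hdl(tg)


def dyslipidemia_group(ldl_c, tg):
    """Assign one of the four subphenotype groups from LDL-C and TG (mg/dl)."""
    high_ldl = ldl_c >= LDL_HIGH
    high_tg = tg >= TG_HIGH
    if high_ldl and high_tg:
        return 4  # low HDL-C plus combined hyperlipidemia
    if high_tg:
        return 3  # low HDL-C plus high TG
    if high_ldl:
        return 2  # low HDL-C plus high LDL-C
    return 1      # isolated low HDL-C


if __name__ == "__main__":
    print(isopleth_hdl(100.0), below_isopleth(27.8, 100.0))
    print(dyslipidemia_group(ldl_c=170.0, tg=200.0))
```

Because the threshold falls as TG rises, a participant with high TG must have a correspondingly lower HDL-C to qualify, which is the point of adjusting for the hyperbolic relationship.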
DNA preparation and biochemical analyses
Blood samples were collected, after an overnight fast, in tubes containing 0.1% EDTA. Genomic DNA was extracted using the Wizard purification kit (Qiagen, Germantown, MD). Plasma was obtained after centrifugation at 3,000 rpm for 20 min at 4°C. Levels in plasma of total cholesterol (TC), HDL-C, and TG were measured using an automated chemical analyzer (COBAS Chemistry analyzer) as previously described (
) when TG levels were below 400 mg/dl. For some participants, including five with TG concentrations above 400 mg/dl, LDL-C was determined by sequential ultracentrifugation. Levels in plasma of prebeta HDL, measured as apoA1 protein content, were determined as previously described (
Samples sequenced at Yale were exon-captured using the IDT xGen target capture kit followed by 99-base paired-end sequencing on the Illumina platform.
Samples sequenced at UCSF were exon-captured using the Roche NimbleGen SeqCap EZ library probe, and the captured libraries were sequenced on the HiSeq2500. Processing of image files was performed using a standard protocol. Raw image files were analyzed and converted to base calls by real-time analysis using the recommended default settings. Real-time analysis output base call files (*.bcl) were converted to FASTQ files with consensus assessment of sequence and variation using the bcl2fastq pipeline.
Quality control, read alignment, and variant calling
Blue Collar Bioinformatics (Bcbio v1.0.0) was used to run the QC pipeline. We used Burrows-Wheeler Aligner (
) (BWA v0.7.15) paired-end mode to map reads to the human reference genome (hg19). BAM files obtained were used in subsequent steps. Picard (http://broadinstitute.github.io/picard/) (v2.5.0) was used to sort mapped reads into coordinate order and to ensure all mate-pair information was properly updated. Picard was also used to mark duplicate and low-quality reads, defined as those with low mapping quality score, that were unpaired or unmapped, or that failed a platform/vendor quality check. Five callers were then used for variant calling: Genome Analysis Tool Kit (
For post-variant calling analysis, filtered call sets were concatenated to create a nonredundant variant list. Synonymous, intronic, intergenic, and UTR variants were removed. Variants fulfilling the following criteria were kept: a quality score (QUAL) >40; a read depth (DP) ≥20; a genotype quality (GQ) ≥60 if called by GATK, GATK HaplotypeCaller, or FreeBayes; and detection by more than one caller. A random selection of 12 variants detected by only one caller were all shown to be artifacts by Sanger sequencing. Annovar (
). Annovar output was filtered further based on a list of common gene variants provided with the Ingenuity Variant Analysis suite. The top 50% of these were removed from our call set. We then filtered by gnomAD minor allele frequency (MAF), using a cutoff of 0.01 for heterozygous variants and 0.05 for homozygous variants. For platform-specific variants, we removed those found in more than 40% of the samples. The remaining variant set was used for the “candidate gene analysis” described below.
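As a rough illustration of the quality and frequency filters described above, the sketch below applies them to a simplified variant record. The record layout, the caller labels, and the choice of strict inequality for the MAF cutoffs are assumptions made for the example; the actual pipeline operated on multi-caller VCF output and is not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional, Set

# Hypothetical labels for the callers to which the GQ >= 60 rule applies.
GQ_CALLERS = {"gatk", "gatk_haplotypecaller", "freebayes"}


@dataclass
class Call:
    qual: float            # variant quality score (QUAL)
    depth: int             # read depth (DP)
    gq: Optional[float]    # genotype quality (GQ), if reported
    callers: Set[str]      # names of the callers that reported the variant
    gnomad_maf: float      # gnomAD minor allele frequency
    homozygous: bool


def passes_quality(c: Call) -> bool:
    """QUAL > 40, DP >= 20, more than one caller, GQ >= 60 where applicable."""
    if c.qual <= 40 or c.depth < 20:
        return False
    if len(c.callers) <= 1:
        return False
    if c.callers & GQ_CALLERS and (c.gq is None or c.gq < 60):
        return False
    return True


def is_rare(c: Call) -> bool:
    """MAF cutoff of 0.05 for homozygous and 0.01 for heterozygous variants."""
    cutoff = 0.05 if c.homozygous else 0.01
    return c.gnomad_maf < cutoff  # strict '<' is an assumption


if __name__ == "__main__":
    v = Call(qual=55.0, depth=34, gq=80.0,
             callers={"gatk", "freebayes"}, gnomad_maf=0.0004, homozygous=False)
    print(passes_quality(v) and is_rare(v))
```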
For binomial analysis, variants were called using GATK Haplotype Caller and annotated using Annovar. High-quality variants which passed GATK variant quality score calibration with a read depth (DP) ≥8, a genotype quality score (GQ) ≥20, a mapping quality (MQ) ≥40, at least three supporting reads, and not falling in the low-complexity regions were kept (
). Rare variants with MAF ≤1E-05, 1E-04 or 1E-03 in ExAC, 1000 Genome, and NHLBI Exome Sequencing Project databases were examined. Damaging variants including the loss-of-function (LoF) variants and damaging missense (D-Mis) variants were considered for the analysis. LoF variants are defined as stop-gain, stop-loss, frameshift insertions/deletions, splice site disruption, and start-loss. D-Mis variants are nonsynonymous variants predicted as deleterious by MetaSVM (
For the candidate gene analysis approach, we evaluated the NGS data after the post-variant calling analysis to determine the degree to which individual variants were causative, firstly in 594 candidate genes including those in lipid metabolism pathways plus GWAS hits (supplemental Table S1). This is an expanded list of genes that we previously reported (
) or that are empirically related (supplemental Table S2). In both cases, we separately evaluated the data for heterozygous variants and for homozygous variants. Rare variants for those with elevated either LDL-C or TG were separately also filtered against lists of genes associated with levels of LDL or TG as appropriate (
Because this was a research discovery study focused on the overall deleterious gene burden in this cohort with low HDL-C, and was not designed to provide individual clinical assessments, we did not strictly follow the American College of Medical Genetics (ACMG) guidelines (
ACMG Laboratory Quality Assurance Committee. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology.
). Rather, we developed an overall damaging prediction based on 10 separate variant impact prediction tools and used an overall 80th percentile cutoff to retain deleterious variants. Those between the 50th and 80th percentiles were considered to be of undetermined significance and the rest (<50th percentile) likely benign. Of 66 variants with a “pathogenic” entry in ClinVar, by this approach, we classified only six as likely benign. Those variants with a damaging prediction over the 50th percentile and an unequivocal ClinVar pathogenic entry were considered pathogenic. The 10 algorithms we used were SIFT, PolyPhen (HVAR), MutationTaster, MutationAssessor, FATHMM, PROVEAN, METASVM, MetaLR, MCAP, and FATHMM-MKL.
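The tiering rule in this paragraph can be written as a small decision function. The sketch below takes the variant's percentile on the combined damaging prediction as an input; how that percentile is derived from the 10 tools is not specified in the text and is not modeled here. The handling of values exactly at the 50th and 80th percentiles, and the label strings, are assumptions.

```python
def classify_variant(percentile, clinvar_pathogenic=False):
    """Tier a variant from its damaging-prediction percentile and ClinVar status.

    percentile: position (0-100) on the overall damaging prediction scale.
    clinvar_pathogenic: True only for an unequivocal ClinVar pathogenic entry.
    """
    if percentile > 50 and clinvar_pathogenic:
        return "pathogenic"
    if percentile >= 80:
        return "probably damaging"
    if percentile >= 50:
        return "undetermined significance"
    return "likely benign"


if __name__ == "__main__":
    for pct, cv in [(92, False), (65, True), (65, False), (40, True)]:
        print(pct, cv, classify_variant(pct, cv))
```

Note that under this rule a ClinVar pathogenic entry does not rescue a variant below the 50th percentile, which corresponds to the six ClinVar "pathogenic" variants that were classified as likely benign.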
Binomial analysis
A one-tailed binomial test was used to examine the enrichment of an observed number of damaging dominant variants in each gene in comparison with expectation as described before (
). The expected number of variants was calculated based on de novo mutability through the following formula:
where ‘i’ denotes the ‘ith’ gene and ‘N’ denotes the total number of damaging dominant variants. The de novo mutability for each type of variant in each gene was calculated based on trinucleotide contexts as described by Samocha et al. (
) to functionally classify the genes. We used Fisher’s exact test with false discovery rate correction to compare our list with the PANTHER human database.
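For readers who want to see the shape of the one-tailed binomial burden test, here is a minimal standard-library sketch. It assumes that the expected count for gene i is N multiplied by that gene's share of the summed mutability across all genes; this is our reading of the description, not a formula taken from the text. The mutability values in the example are invented for illustration.

```python
import math


def expected_count(mutability_i, total_mutability, n_total):
    """Expected damaging variants in gene i (assumed form: N * m_i / sum(m))."""
    return n_total * mutability_i / total_mutability


def binomial_upper_tail(k, n, p):
    """One-tailed P(X >= k) for X ~ Binomial(n, p), summed in log space."""
    if k <= 0:
        return 1.0
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0 if k <= n else 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for j in range(k, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(j + 1)
                    - math.lgamma(n - j + 1) + j * log_p + (n - j) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)


if __name__ == "__main__":
    # Invented numbers: 12 observed variants in a gene that carries 0.05% of
    # the total mutability, out of 3,000 damaging variants cohort-wide.
    n_total, share, observed = 3000, 0.0005, 12
    print(expected_count(share, 1.0, n_total))
    print(binomial_upper_tail(observed, n_total, share))
```

Summing the upper tail directly, rather than computing one minus the lower tail, keeps very small P values from being lost to rounding.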
Copy number variant analysis
Copy number variations were called using XHMM (eXome-Hidden Markov Model) software (
). GATK DepthOfCoverage was first used to calculate mean read coverage from the aligned sequencing file. The output data were then normalized by removing the variance component with variance >70%, and a z-score was calculated. Afterward, the hidden Markov-based model called copy number variants (CNVs) and calculated the quality scores. Only high-quality CNVs spanning at least three exons and with a quality score ≥90 were kept and visually inspected. The remaining CNVs were further annotated with frequencies in the 1000 Genomes and DECIPHER databases. Only rare CNVs with a frequency ≤1 × 10⁻³ in 1000 Genomes and DECIPHER as well as an in-cohort frequency ≤10% were kept.
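The CNV retention criteria above amount to a conjunction of simple thresholds. The sketch below expresses them over a hypothetical record type; the field names are ours, and the XHMM calling and visual inspection steps are outside its scope.

```python
from dataclasses import dataclass


@dataclass
class CNV:
    n_exons: int           # number of exons spanned
    quality: float         # caller quality score
    freq_1000g: float      # frequency in 1000 Genomes
    freq_decipher: float   # frequency in DECIPHER
    cohort_carriers: int   # carriers within the study cohort


def keep_cnv(cnv: CNV, cohort_size: int = 204) -> bool:
    """Apply the exon-span, quality, database, and in-cohort frequency filters."""
    return (cnv.n_exons >= 3
            and cnv.quality >= 90
            and cnv.freq_1000g <= 1e-3
            and cnv.freq_decipher <= 1e-3
            and cnv.cohort_carriers / cohort_size <= 0.10)


if __name__ == "__main__":
    print(keep_cnv(CNV(5, 95.0, 0.0, 0.0002, 2)))
```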
Other statistical analysis
Lipid, lipoprotein, demographic, and clinical data were analyzed using PASW Statistics for Apple Macintosh (IBM Corp., New York, NY). Continuous variables were checked for normality, and those with skewed distributions mathematically transformed prior to testing. Body mass index (BMI), TC, and TG were log-transformed and LDL-C square root transformed. P-values were calculated using an unpaired t-test for parametric variables and using Fisher’s exact test for categorical variables.
Results
Clinical and other characteristics of the study participants are presented in Table 1. The study subjects were selected because they had plasma HDL-C below the 10th percentile for the GRA population. The mean level of HDL-C was 27.8 ± 6.4 mg/dl (range: 4–36 mg/dl) (Table 1) and is substantially lower than the mean value of 58.9 ± 19.9 mg/dl that we reported recently for a control cohort (
). A considerable number of participants (124; 60.8%) presented with an additional dyslipidemia. Plasma levels of LDL-C and TG were used to allocate each participant to one of four clinically meaningful groups (Table 2) that emphasize these other dyslipidemias: (1) isolated low HDL-C; (2) low HDL-C plus high LDL-C; (3) low HDL-C plus high TG; and (4) low HDL-C plus combined hyperlipidemia. There were 80 participants (39.2%) in group 1, 23 (11.3%) in group 2, 77 (37.7%) in group 3, and 24 (11.8%) in group 4. Compared with group 1, there were significantly more females in group 4, HDL-C levels were lower in group 3, and TC was higher in groups 2, 3, and 4. Also, the mean age in group 2 was significantly lower. Levels of prebeta HDL were significantly higher in the two groups with high TG (Table 2), consistent with our recent study (
BMI, body mass index (kg/m2); CAD, coronary artery disease; MI, myocardial infarction.
Mean values for age, BMI, and lipids are ± SD.
a Self-defined ethnicity.
b P-value between females and males. P-values were calculated by t-test for parametric variables and by Fisher’s exact test for categorical variables. BMI, total cholesterol, and triglycerides were log-transformed, and LDL-cholesterol square root transformed, prior to testing.
TC, total cholesterol; TG, triglyceride: all lipid measurements in mg/dl.
Values for age, lipids, and lipoproteins are ± SD. Prebeta HDL values are mg/dl of apoAI. P values were calculated by t-test with respect to differences with the isolated low HDL-C group. Fisher exact test was used for sex differences. Total cholesterol values were log-transformed prior to testing.
a P = 0.038.
b P = 0.003.
c P < 0.001.
d These variables were not tested as they were used as criteria in group selection.
The participants, as self-assessed, were primarily of white European ancestry (76.5%) but included those of Hispanic, East Asian, South Asian, and African-American descent. The self-assessed ethnicity percentages agree very closely with those determined by principal component analysis (Data not shown). Table 1 shows that there were significant differences in ethnicity between males and females, with more males being European and nearly three times as many females being of Hispanic ancestry. Study participants were generally overweight, and 8.6% had CHD, with more males being affected than females. Few had diabetes though a considerable number suffered from hypertension, and there were few smokers.
Candidate gene analysis
Among the whole cohort of 204 participants, after filtering against 594 lipid metabolism candidate genes (supplemental Table S1), we detected a total of 460 potentially damaging heterozygous variants (40 of which were considered to be pathogenic) and 10 that were homozygous (one of which was known to be pathogenic; LPL p.G215E) (Table 3). There were 591 individual occurrences of these variants distributed among 192 participants. All these variants are listed in supplemental Table S3. The candidate genes with the most frequent occurrences of potentially damaging variants were ABCA1 (n = 20), LDLR (n = 15), LPA (n = 14), CYP1A1 (n = 12), ABCC1 (n = 11), PDIA2 (n = 11), ABCC3 (n = 9), STAB1 (n = 9), ABCC2 (n = 8), ACACB (n = 8), APOBEC1 (n = 8), CPS1 (n = 8), OSBPL1A (n = 8), BRCA2 (n = 7), CD36 (n = 7), PCCB (n = 7), ABCA8 (n = 6), AGPAT2 (n = 6), APOB (n = 6), BCMO1 (n = 6), CREBBP (n = 6), LRP1 (n = 6), and MYL5 (n = 6). A selection of rare variants, 48 in total, were all confirmed to be real by Sanger sequencing.
Table 3. Characterization of variants after post-variant calling analysis
We detected a total of 34 rare ABCA1 variants among 38 (18.6%) of the 204 individuals studied, with 20 of the variants considered to be potentially damaging or pathogenic. These 20 mutations (one person carried two ABCA1 variants) are listed in Table 4, which includes lipid and lipoprotein measurements. Six of the variants reported here are novel. Two of these were frameshifts, and one was a nonsense mutation. There was a ClinVar entry on seven of the others. Four ClinVar entries stated they were “likely benign,” and one was of “uncertain significance,” despite their high damaging prediction rating. Those participants with deleterious ABCA1 variants, when compared to noncarriers, had a lower level of LDL-C (103 ± 32 mg/dl vs. 131 ± 57; P = 0.040). There were no other significant differences in lipid or lipoprotein measurements, including prebeta HDL (4.20 ± 1.36 mg/dl vs. 4.30 ± 1.39; P = 0.802).
Table 4. Twenty rare heterozygous missense, nonsense, and frameshift mutations in ABCA1
Ancestry | Sex | Variant | Amino acid change | Domain | MAF | rsID | ClinVar | Damaging prediction (a) | Prediction | HDL-C | LDL-C | TG | TC | Prebeta HDL
European | M | 9-107665929-T>C | L11P | IH1 | 4.84E-06 | rs777372679 | . | 10 | Probably damaging | 7 | 137 | 159 | 165 | ND
European | M | 9-107646756-C>T | P85L | ECD1 | 1.40E-03 | rs145183203 | Likely benign | 8.5 | Probably damaging | 35 | 103 | 76 | 153 | 4.14
European | F | 9-107602623-A>G | K331E | ECD1 | - | - | . | 7.5 | Probably damaging | 27 | 107 | 209 | 176 | 6.14
Indian | M | 9-107599797-G>A | R369H | ECD1 | 2.44E-05 | rs370223805 | . | 9.5 | Probably damaging | 28 | 112 | 96 | 153 | 4.65
European | F | 9-107599263-C>T | R437W | ECD1 | 4.87E-05 | rs150448790 | . | 9.5 | Probably damaging | 35 | 110 | 91 | 156 | 4.2
European | M | 9-107594878-C>T | R496W | ECD1 | 6.00E-04 | rs147675550 | Likely benign | 7 | Probably damaging | 23 | 60 | 200 | 113 | 4.42
 | | 9-107574868-G>A | G1346E | IH3 | 1.00E-04 | rs762770081 | Likely pathogenic | 9.5 | Pathogenic | | | | |
European | M | 9-107593329-G>A | W590X | ECD1 | - | - | . | - | Probably damaging | 22 | 83 | 187 | 142 | 5.75
European | M | 9-107593272-C>T | T609M | ECD1 | 3.25E-05 | rs755276277 | . | 9 | Probably damaging | 23 | 61 | 44 | 88 | 1.33
East Asian | M | 9-107587972-T>C | V845A | TMD1 | 3.25E-05 | rs541344598 | . | 8 | Probably damaging | 36 | 125 | 77 | 177 | 3.53
European | M | 9-107584879-dupTACC | R909fs | NBD1 | - | - | . | - | Probably damaging | 26 | 134 | 84 | 177 | 4.95
European | M | 9-107583758-C>T | T953I | NBD1 | - | - | . | 9.5 | Probably damaging | 23 | 127 | 177 | 185 | 4.16
European | M | 9-107578515-G>T | G1216V | R1 | 4.47E-05 | rs562403512 | . | 9.5 | Probably damaging | 25 | 97 | 65 | 132 | ND
Hispanic | F | 9-107578437-C>T | T1242M | R1 | 2.03E-05 | rs144923927 | . | 10 | Probably damaging | 15 | 36 | 750 | 214 | ND
European | M | 9-107574881-C>T | R1342W | IH3 | 1.62E-05 | rs760786920 | Uncertain significance | 9.5 | Probably damaging | 4 | 136 | 275 | 200 | ND
Mexican | F | 9-107571799-C>T | L1408F | ECD2 | 4.00E-04 | rs201879964 | Likely benign | 6.5 | Probably damaging | 35 | 126 | 195 | 200 | ND
European | M | 9-107568536-insT | L1484fs | ECD2 | - | - | . | - | Probably damaging | 7 | 121 | 260 | 177 | 5.79
European | M | 9-107566964-C>T | T1501 | ECD2 | - | - | . | 9.5 | Probably damaging | 28 | 73 | 149 | 130 | 2.22
European | M | 9-107560784-G>A | R1680Q | TMD2 | 3.00E-04 | rs150125857 | Likely benign | 9.5 | Probably damaging | 33 | 58 | 45 | 100 | 2.85
European | F | 9-107556776-A>C | N1800H | TMD2 | 3.00E-04 | rs146292819 | Pathogenic | 9 | Pathogenic | 27 | 145 | 119 | 196 | 4.71
ECD1/2, extracellular domains 1 and 2; IH1/3, intracellular helices 1 and 3; NBD1, nucleotide-binding domain 1; R1, regulatory domain 1; TMD1/2, transmembrane domains 1 and 2.
a Damaging prediction: number of damaging predictions out of 10 separate variant impact prediction tools. Lipid values in mg/dl (prebeta HDL measured as apoA1 content).
Participants with deleterious variants among one or more of the ABCC1, ABCC2, or ABCC3 transporter genes (supplemental Table S3), when compared to noncarriers, had no statistically significant differences in lipid or lipoprotein measurements, including prebeta HDL (4.19 ± 1.53 mg/dl vs. 4.31 ± 1.37; P = 0.689).
Among the 104 HDL-C candidate genes (supplemental Table S2), there was a total of 120 probably damaging variants (Fig. 1), 116 of which were heterozygous (eight considered to be pathogenic) and four were homozygous (one pathogenic). There were 110 participants who had at least one potentially damaging HDL candidate gene variant, 31 of whom had 2 and 10 had 3 each. Of these 110 individuals, 46 were among group 1 (Isolated low HDL group; Table 2), eight among group 2 (Low HDL-C plus high LDL-C), 45 among group 3 (Low HDL-C plus high TG), and 11 among group 4 (Low HDL-C plus combined hyperlipidemia). The percentages in these groups were 57.5%, 34.8%, 58.4%, and 45.8%, respectively. Hence, the frequencies were lower in the two groups with elevated levels of LDL-C (groups 2 and 4). This perhaps reflects the impact of the high frequency of damaging LDLR mutations on lowering HDL-C among these two groups.
Fig. 1. Histogram showing the distribution among participants with low HDL cholesterol of 120 probably damaging rare HDL candidate gene variants. These variants were found within 45 of the 104 candidate genes. HDL, high density lipoprotein.
The 11 HDL genes with the highest occurrence of damaging variants were ABCA1 (n = 20), STAB1 (n = 9), OSBPL1A (n = 8), CPS1 (n = 8), CD36 (n = 7), LRP1 (n = 6), ABCA8 (n = 6), GOT2 (n = 5), AMPD3 (n = 5), WWOX (n = 4), and IRS1 (n = 4) (Fig. 1). All the damaging variants among HDL candidate genes are listed in supplemental Table S4. Three probably damaging rare variants were found in the APOA1 gene, which codes for the major protein of HDL, and two in LCAT (Table 5). None have been reported in ClinVar. We noticed also that one person was homozygous for a possibly damaging LCAT SNP (p.Ser232Thr; no ClinVar entry), but the frequency here (gnomAD MAF 0.0176) is above the inclusion cutoff we used for rare heterozygous variants and the damaging prediction score was borderline at the 80th percentile cutoff. However, the frequency was considerably higher in our cohort (MAF 0.0466), with 17 heterozygotes and 1 homozygote. Data relating to this LCAT SNP are included in Table 5 (but excluded from Fig. 1). Two individuals were homozygous for LPL mutations, p.Asp36Asn (TG 295 mg/dl) and p.Gly215Glu (TG 406 mg/dl). The first of these has an equivocal entry in ClinVar, and the second is listed as pathogenic.
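The in-cohort allele frequency quoted for the LCAT p.Ser232Thr SNP follows directly from the genotype counts. The short calculation below reproduces it, assuming all 204 participants were successfully genotyped at this site; the function name is ours.

```python
def allele_frequency(n_het, n_hom, n_individuals):
    """Minor allele frequency from heterozygote and homozygote carrier counts."""
    return (n_het + 2 * n_hom) / (2 * n_individuals)


if __name__ == "__main__":
    cohort_maf = allele_frequency(n_het=17, n_hom=1, n_individuals=204)
    gnomad_maf = 0.0176
    print(round(cohort_maf, 4))               # 19 minor alleles out of 408
    print(round(cohort_maf / gnomad_maf, 2))  # fold difference versus gnomAD
```

With 19 minor alleles among 408 chromosomes, the cohort frequency is about 2.6 times the gnomAD value.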
Two homozygous damaging mutations (frameshift and acceptor splice mutants) were discovered in the LILRA3 gene (leukocyte immunoglobulin–like receptor A3) in a person of Japanese ancestry. These were p.Leu131fs (rs201804218; gnomAD 0.0059) and c.86-1G>C (rs11574607; gnomAD 0.0208).
We found a total of 14 different rare variants in the LDLR gene among 14 participants. One variant was found in two individuals, and one other carried two variants. The subjects with these variants are listed in Fig. 2 along with lipid levels. Twelve LDLR variants are potentially deleterious mutations, with two having borderline damaging prediction evaluations and equivocal ClinVar entries. There were two frameshift, two nonsense, and one deletion mutation. The 12 individuals with LDLR mutations unequivocally classified as either pathogenic or probably damaging each have an LDL-C above 160 mg/dl and are among those with subphenotypes 2 or 4 (Table 2). Those with the 12 deleterious LDLR variants, when compared to noncarriers, had a higher level of LDL-C (254 ± 86 mg/dl vs. 120 ± 42; P < 0.001) and of TC (312 ± 84 mg/dl vs. 185 ± 90; P < 0.001). There were no significant differences in other lipid or lipoprotein measurements, including prebeta HDL (4.54 ± 1.49 mg/dl apoA1 vs. 4.28 ± 1.39; P = 0.543). Among the 47 individuals with elevated LDL (Table 2; groups 2 and 4), we found when filtering against 67 LDL candidate genes (23, 24, 26) a total of 29 potentially damaging variants (all heterozygous) in 21 of these individuals in 15 genes (supplemental Fig. S1). The LDLR mutations were by far the most numerous with 12 occurrences (not including the two with borderline scores, equivocal ClinVar entries and normal levels of LDL-C). There were two mutations in each of ABCG5, APOA1, HPR, and IRF2BP2.
Fig. 2. Individuals with low levels of plasma HDL-C who carry rare, potentially damaging, LDLR variants. Twelve of these 14 variants are pathogenic, or probably so.
For the 101 individuals with elevated levels of TG, we filtered against 61 TG candidate genes (23, 24, 26). We found 32 potentially damaging variants in 21 genes among 31 individuals (supplemental Fig. S2). There were three LPL and three LRP1 mutations and two each in the LMF1, APOA1, GCKR, PPARA, INSR, PEPD, CYP26A1, ATG4C, and CAPN3 genes. Of note, in this respect, no damaging variants were seen in three key genes: APOA5, APOC2, or GPIHBP1.
Notable among the damaging mutations found in the other lipid metabolism candidate genes were those in ABCC1, ABCC2, and ABCC3 (supplemental Table S3). The total numbers of participants with damaging mutations in these three genes were 11, 8, and 9, respectively. In total, there were 26 who carried one or more of these variants. These 26 were evenly distributed across the four dyslipidemia subphenotypes (Table 2). These variants therefore were not linked with the presence of high TG or LDL-C, only with low HDL.
As part of our post-variant-calling analysis, we filtered all genes for homozygous damaging variants with MAF <0.05. Of note, there were two individuals with a damaging homozygous mutation (p.Arg83Gln; rs8140287) in the ISX gene, which is highly and exclusively expressed throughout the intestines. The expected homozygous frequency of this variant is 5-fold lower than that observed in this cohort. ISX downregulates intestinal expression of SR-BI (scavenger receptor class B, type I; SCARB1), an HDL receptor that mediates the selective uptake of HDL-C. Two brothers were heterozygous carriers of a rare missense mutation (c.3019C>G; p.Pro1007Ala) in the Niemann-Pick type C gene (NPC1, an HDL-C candidate gene). The ClinVar record for this variant (rs80358257) describes it as pathogenic and links it to Niemann-Pick type C disease.
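The comparison of observed and expected homozygote frequencies for the ISX variant can be illustrated with a short calculation. This is a minimal sketch that assumes Hardy-Weinberg proportions (expected homozygote rate = q², where q is the allele frequency); the text does not state how the expected frequency was derived, so the back-calculated allele frequency is an inference, not a reported value. The function names are illustrative.

```python
import math

def expected_homozygotes(allele_freq, n_individuals):
    """Expected number of homozygotes under Hardy-Weinberg proportions (q^2 * N)."""
    return allele_freq ** 2 * n_individuals

def implied_allele_freq(observed_hom, n_individuals, fold_excess):
    """Allele frequency q implied when observed homozygotes exceed the
    Hardy-Weinberg expectation by the given fold (assumption: HWE holds)."""
    expected_rate = observed_hom / n_individuals / fold_excess
    return math.sqrt(expected_rate)

# From the text: 2 homozygotes among 204 participants, about 5-fold above expectation.
q = implied_allele_freq(observed_hom=2, n_individuals=204, fold_excess=5.0)
print(f"implied allele frequency: {q:.3f}")
print(f"expected homozygotes in 204 people: {expected_homozygotes(q, 204):.2f}")
```

Under these assumptions the implied allele frequency falls just below the MAF <0.05 filter, which is consistent with the variant passing that filter.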
Binomial and PANTHER analysis results
Binomial analysis for rare (MAF ≤ 0.001) heterozygous D-Mis or LoF variants identified the ABCA1, LDLR, HK3, and CFTR genes with genome-wide statistical significance (Fig. 3). This was based on gene size, de novo mutability, and case-control analysis, and where there was reliable coverage in control data sets. ABCA1 and LDLR are included in our set of lipid metabolism–related candidate genes (supplemental Table S1). ABCA1 is included in our set of HDL-C candidate genes (supplemental Table S2).
Fig. 3. Enrichment for damaging heterozygous gene variants. Q–Q plots comparing observed versus expected P values in participants with low HDL-C compared to controls. Binomial analysis for D-Mis and LoF variants at MAF of ≤0.0001 (A) and ≤0.001 (B). The significance of the difference between the observed and expected number of heterozygous variants was calculated using a one-sided binomial test. ABCA1 and LDLR show genome-wide threshold significance of increased burden of damaging heterozygous variants at both MAFs. In addition, HK3 and CFTR are significant at an MAF of ≤0.001. LoF, loss-of-function.
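The one-sided binomial test named in the Fig. 3 legend can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the total variant count, the gene's expected share of variants, the observed count, and the number of genes used for the Bonferroni-style threshold are all hypothetical placeholders, since the section does not report them.

```python
import math

def log_binom_pmf(k, n, p):
    """Log of the binomial probability of exactly k successes in n trials."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log1p(-p)

def binom_upper_tail(k, n, p):
    """One-sided P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1."""
    if k <= 0:
        return 1.0
    tail = sum(math.exp(log_binom_pmf(i, n, p)) for i in range(k, n + 1))
    return min(1.0, tail)

# Hypothetical inputs, not values from the study.
n_variants = 5000        # total qualifying variants observed in the cohort
gene_share = 0.0008      # gene's expected share (e.g. from size and mutability)
observed = 20            # qualifying variants observed in the gene
n_genes_tested = 20000   # assumed number of genes for a Bonferroni threshold

p_value = binom_upper_tail(observed, n_variants, gene_share)
threshold = 0.05 / n_genes_tested
print(f"expected = {n_variants * gene_share:.1f}, observed = {observed}")
print(f"one-sided P = {p_value:.3g}, significant = {p_value < threshold}")
```

The tail is summed in log space so that large variant counts do not overflow; a statistics library's binomial test would serve equally well.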
Hexokinase 3 (HK3) phosphorylates intracellular glucose to produce glucose-6-phosphate, the first step in most glucose metabolism pathways. Cystic fibrosis transmembrane conductance regulator (CFTR) is also known as an ATP-binding cassette, subfamily C, member 7 (ABCC7). The results for these two genes (HK3 and CFTR) are less convincing than those for ABCA1 and LDLR because of the inflation in the Q–Q plot.
The results showing the functional classification of genes by PANTHER analysis of recessive D-Mis + LoF genotypes at MAF of ≤0.001 are presented in Table 6. Several metabolism-related gene ontology (GO) terms show a large magnitude of enrichment with significant P-values. Among the top 25 terms are five related to lipid metabolism pathways.
Table 6. PANTHER gene ontology analysis for 50 genes with D-Mis + LoF recessive genotypes at MAF ≤0.001
GO Biological Process Complete | # Genes in Pathway | # Genes Hit | Expected | Fold Enrichment | Raw P-Value | FDR | Genes
Striated muscle cell development (GO:0055002) | 133 | 5 | 0.32 | 15.79 | 1.88E-05 | 4.95E-02 | MYO18B, RYR1, MYH11, TTN, MYH3
Cellular response to LDL particle stimulus (GO:0071404)
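The relationship between the Expected and Fold Enrichment columns of Table 6 can be checked with a short calculation. This sketch assumes the usual over-representation convention, in which the expected count is the pathway size scaled by the ratio of the input list size to the reference set size, and fold enrichment is hits divided by expected. The reference set size is not given in this section, so the value recovered below is an inference from the rounded table entries.

```python
def expected_hits(pathway_size, list_size, reference_size):
    """Expected number of list genes in a pathway if genes were drawn at random."""
    return pathway_size * list_size / reference_size

def fold_enrichment(hits, expected):
    """Observed hits divided by the expected number of hits."""
    return hits / expected

# Values from the first row of Table 6 (striated muscle cell development).
pathway_size, hits, reported_expected, reported_fold = 133, 5, 0.32, 15.79
list_size = 50  # genes with recessive D-Mis + LoF genotypes (table title)

# Using the rounded expected value gives a slightly different fold enrichment.
print(f"fold from rounded expected: {fold_enrichment(hits, reported_expected):.2f}")

# Back-calculate the unrounded expected value and the implied reference size.
unrounded_expected = hits / reported_fold
implied_reference = pathway_size * list_size / unrounded_expected
print(f"unrounded expected: {unrounded_expected:.4f}")
print(f"implied reference set size: {implied_reference:.0f} genes")
```

The small gap between the two fold values reflects rounding of the Expected column rather than an inconsistency in the table.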
Rare copy number gains and losses were examined using eXome-Hidden Markov Model (XHMM; https://atgu.mgh.harvard.edu/xhmm/) software. A total of 49 occurrences of duplications (supplemental Table S5) and 20 occurrences of deletions (supplemental Table S6) were found in the cohort. Among them, two duplications and two deletions were recurrent in two individuals. Pathway analysis using the genes that are altered in copy number did not yield any significantly over-represented GO terms (data not shown). No CNVs were found among our list of 104 HDL candidate genes (supplemental Table S2). A large, homozygous, 11,534-bp deletion, which includes the whole of exon 9, in the MTTP gene (microsomal triglyceride transfer protein) is a very likely cause of low HDL-C in one subject. This person has the phenotype of abetalipoproteinemia (TC 28 mg/dl; TG 6 mg/dl; LDL-C 6 mg/dl; HDL-C 22 mg/dl), a disorder caused by defects in this gene.
In this research discovery study, we aimed to gain a better understanding of the genetic basis of low levels of plasma HDL-C. Binomial analysis revealed four genes to carry potentially damaging rare variants (MAF ≤ 0.001) with genome-wide statistical significance among our cohort with low levels of plasma HDL-C. These were ABCA1, LDLR, HK3, and CFTR. PANTHER functional classification analysis revealed significant enrichment of lipid metabolism–related GO pathways. Although no CNVs were found for any genes on our candidate gene lists, a large deletion was found in the MTTP gene in a person with the phenotype of abetalipoproteinemia, and this is certainly the cause of his low HDL-C (22 mg/dl).
Among the potentially damaging variants we discovered in genes on our list of lipid metabolism–related candidates, the most prevalent were in ABCA1 and LDLR. A total of 38 of the 204 participants (18.6%) carried rare ABCA1 variants. This is somewhat lower than the 26.9% found in a recent report, though that cohort (n = 202) had a lower mean level of HDL-C (18 mg/dl) than our present study (27.8 ± 6.4 mg/dl; range 4–36 mg/dl). When we employed stringent cutoffs using 10 variant impact prediction tools, 19 of the 38 individuals with rare ABCA1 variants carried potentially damaging mutations (one carried two), that is, 9.3% of the cohort of 204.
We detected a higher prevalence of damaging, rare LDLR variants (12 of 204 participants) than a recent similar study of subjects with low HDL-C (4 of 202). However, in that study, the mean plasma level of LDL-C was somewhat lower than here (107 mg/dl vs. 128 mg/dl). A substantial proportion (47 of 204; 23%) of our cohort had elevated LDL-C (≥160 mg/dl), and all 12 with damaging LDLR mutations fell within this group. These 12 are characterized as having a diagnosis of familial hypercholesterolemia (FH). Among 18 kindreds with FH and known LDLR gene deleterious mutations in our GRA collection, we have analyzed the baseline levels of LDL-C and HDL-C for a total of 128 genotyped individuals. Among these were 54 wild type, 69 heterozygotes, and five homozygotes. LDL-C values were, respectively, 127 ± 30 mg/dl (range: 63–203), 295 ± 77 mg/dl (range: 168–509), and 817 ± 102 mg/dl (range: 719–925) (ANOVA P < 0.001). Corresponding values for HDL-C were 54.6 ± 16.7 mg/dl (range: 25–107), 47.6 ± 15.4 mg/dl (range: 20–99), and 34.4 ± 8.0 mg/dl (range: 25–45) (ANOVA P = 0.005). The HDL-C values are presented as box plots in supplemental Fig. S3. These values are similar to those reported previously (in: Valle D.L. Antonarakis S. Ballabio A. Beaudet A.L. Mitchell G.A. Metabolic & Molecular Bases of Inherited Disease. McGraw-Hill, New York, 2007: 1-122).
In addition to those in ABCA1, there were numerous other probably damaging or pathogenic variants found among other HDL candidate genes. Notable were the nine occurrences in STAB1, eight in each of OSBPL1A and CPS1, and six in each of LRP1 and ABCA8. STAB1 codes for stabilin 1, a multifunctional scavenger receptor. A nonsense mutation in the OSBPL1A gene has previously been reported to be causal for low HDL-C and shown to decrease cellular cholesterol efflux capacity. Although its function in lipid metabolism is unclear, it is thought that the OSBPL proteins are phospholipid/sterol transporters, with OSBPL1A regulating interactions between the endoplasmic reticulum and the late endocytic compartment. CPS1 is responsible for converting ammonia to carbamyl phosphate in the liver. It is unclear how heterozygous LoF mutations in this gene can affect HDL-C levels. In GWASs, LRP1 has been reported to associate with plasma levels of HDL-C and TG. This gene codes for the large LDL receptor–related protein 1, which has homology to the LDLR. It is also referred to as the α2-macroglobulin receptor. LRP1 is widely expressed, notably in the liver, adipocytes, ovary, mammary gland, fibroblasts, and central nervous system (CNS). It is a multifunctional endocytic receptor thought, pertinently here, to be involved in the clearance of chylomicron remnants by the liver. ABCA8 (ATP-binding cassette, subfamily A, member 8) is widely expressed, notably in the heart, skeletal muscle, and liver and codes for a lipophilic xenobiotic transporter. It has recently been shown in mice to be responsible for the efflux of cholesterol and taurocholate across the hepatic sinusoidal membrane. In addition to these findings, the LCAT SNP rs4986970 (p.S232T) was present in 18 subjects at a frequency (MAF 0.047) higher than in gnomAD (0.017), ExAC (0.018), or TOPMed (0.017). This SNP has been previously reported to associate with HDL levels in two Danish studies. The prediction tools we used did not indicate that this LCAT SNP was especially damaging, though five of the 10 returned a “damaging” score. It is not reported in ClinVar. One participant, of Japanese ancestry, was homozygous for two damaging mutations in the LILRA3 gene. This gene was strongly associated with levels of HDL-C in GWASs. However, results of genetic studies among a Japanese population cast doubt on the extent to which these particular LILRA3 mutations can be classed as causal (Long-term persistence of both functional and non-functional alleles at the leukocyte immunoglobulin-like receptor A3 (LILRA3) locus suggests balancing selection).
One notable finding in these studies was the high prevalence of potentially damaging rare variants among three lipid metabolism candidate genes, all members of the ATP-binding cassette subfamily C. There were 11 occurrences with ABCC1, eight with ABCC2, and nine with ABCC3. In total, there were 26 participants who carried one or more of these variants. These 26 were evenly distributed across the four dyslipidemia subphenotypes, that is, they were not associated with the presence of high TG or LDL-C. These genes code for proteins previously known as multidrug resistance proteins, MRP1, MRP2, and MRP3. ABCC2 and ABCC3 proteins are also referred to as canalicular multispecific organic anion transporters, CMOAT and CMOAT2. As with ABCC1, these genes are expressed mainly in the liver apical canalicular membrane and act to transport conjugated anionic compounds, including conjugates of bile salts into bile. It is of further interest in this respect that there were 13 occurrences of damaging variants in CFTR among our study group with low HDL-C. This gene was not on our list of lipid candidate genes, and the variants were revealed by binomial analysis. However, it is also a member of ATP-binding cassette subfamily C and is also referred to as ABCC7.
Apart from the LDLR gene, the significance of the potentially damaging variants found among other non-HDL candidate genes is unclear. The potentially damaging variants in the LPA, PDIA2, APOBEC1, BRCA2, PCCB, AGPAT2, CREBBP, and MYL5 genes are in each case fairly evenly distributed between the four different dyslipidemia subphenotypes. The numerous CYP1A1 variants are, except for one person with high TG, found among those with isolated low HDL-C. The six participants with APOB mutants are among the isolated low HDL-C group, except for 1 with high LDL-C. The ACACB and BCMO1 mutants are distributed among the high-TG and isolated low-HDL groups.
We tested whether plasma levels of prebeta HDL were associated with variants in the genes most frequently affected, restricting our analysis to ABCA1, LDLR, and individuals carrying variants in one or more of the ABCC1, ABCC2, or ABCC3 transporter genes. In no case was there a difference in the level of prebeta HDL between carriers and noncarriers. Here all the deleterious ABCA1 rare variants were heterozygous. With homozygous cases of ABCA1 deficiency (Tangier disease), almost all the apoA1 in plasma is found in the form of small prebeta HDL particles.
Because of the ubiquity of lipid metabolism in biology and the many roles of HDL beyond lipid transport, per se, it is likely that alterations in genes with roles in HDL metabolism will have impacts broadly in human biology. A large population study revealed significant increases in cancer associated with low levels of HDL, specifically, myeloma, myeloproliferative tumors, breast, lung, and nervous system cancers. A critical structural or functional role for the WWOX gene in the CNS would account for the observation that mutations at that locus are associated with neurodevelopmental and neurodegenerative disorders, including Alzheimer disease. A number of the gene loci identified with low HDL levels in this study are associated with cancer. The IRS1 (insulin receptor substrate 1) protein has a role in DNA repair and has been implicated in medulloblastoma, breast cancer, and osteosarcoma. WWOX has been identified as a tumor suppressor gene. WWOX deficiency has a role in the expression of the estrogen receptor and is associated with triple-negative breast cancer. Clearly, LoF mutations in genes with oncoprotective roles would be expected to increase the risk of malignancy. However, gain of function in tumor promoter genes could also be oncogenic. WWOX, as a fragile gene, is likely to undergo deletions and rearrangements that make it a prominent candidate for diverse cancers. The demonstration of a significant relationship of diminished HDL levels to cancer in a population study suggests many of the functional roles of HDL in cell replication remain to be discovered. Their discovery holds promise of new avenues of treatment for cancer. The prominent roles of lipid metabolism in the CNS indicate that there will be important impact on neurocognitive and neurodegenerative disorders.
In conclusion, we wished to establish a better understanding of the causes of inherited low levels of plasma HDL-C. Our aim was to provide a valuable resource for compiling a catalog of gene variants that are deemed potentially damaging or pathogenic. We detected at least one damaging mutation in an HDL candidate gene in 110 participants, a little over half of the study cohort. Clearly there must be either other genes involved or variation in promoters, enhancers, silencer elements, etc., not detectable in our study, that play a role. A limitation of the candidate gene approach in this study is that we undertook exome sequencing only of individuals with low HDL-C, because our goal was to determine the relative burden of deleterious variants in a panel of HDL candidate genes among such individuals. We were therefore not able to compare the prevalence of mutations with that in a cohort considered to have normal levels. It must also be noted that while the whole-exome sequencing methodology employed here has some power to detect structural variants (CNVs), it often lacks sensitivity. More advanced techniques such as Linked-Read whole-genome sequencing allow for much higher success in structural variant detection, as we have recently found in other studies. A number of the gene loci identified here have significant roles in cell biology and are also associated with cancer and neurodegenerative diseases, providing promising avenues for the molecular understanding of these disorders and possible roles for HDL in their etiology.
Data availability
All relevant data are contained within the manuscript.
The authors declare that they have no conflicts of interest with the contents of this article.
Acknowledgments
We wish to thank all participants for their cooperation and willingness to participate in this study. This study makes use of data generated by the DECIPHER community. A full list of centers who contributed to the generation of the data is available from https://decipher.sanger.ac.uk/about/stats and via email from [email protected]
Author contributions
J. P. K., R. P. L., W. D., Y. L., C. R. P., K. H. Y. W., M. L.-S., and P.-Y. K. conceptualization; W. D., W.-C. H., M. L., B. L., S. C. J., J. C., F. L.-G., D. V., A. P., C. C., R. L., M. B., H. Z., I. M., and Y. L. methodology; W. D., K. H. Y. W., M. L.-S., C. R. P., M. L., S. C. J., H. Z., and J. C. formal analysis; J. P. K., and M. J. M. investigation; R. P. L., J. P. K., and P.-Y. K. resources; W. D., C. R. P., and I. M. data curation; C. R. P., W. D., and K. H. Y. W. writing - original draft; C. R. P., W. D., K. H. Y. W., M. L.-S., J. P. K., R. P. L., and M. J. M. writing - review & editing; C. R. P., I. M., and W. D. supervision; R. P. L., J. P. K. and P.-Y. K. project administration; R. P. L., J. P. K. and P.-Y. K. funding acquisition.
Funding and additional information
Funding for the DECIPHER project was provided by Wellcome, United Kingdom. This work was supported by the Joseph Drown Foundation and the Campini Foundation, United States and by gifts from Peter Read, Harold Dittmer, Susan Boeing, and Donald Yellon. K. H. Y. W was supported by the National Institutes of Health (NIH), United States under award R01 HG005946 to P.-Y. K. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. The Yale Center for Mendelian Genomics is funded by U54 HG006504 granted to R. P. L.
National Cholesterol Education Program (NCEP) Expert Panel on Detection, Evaluation, and Treatment of High Blood Cholesterol in Adults (Adult Treatment Panel III). Third report of the National Cholesterol Education Program (NCEP) Expert Panel on Detection, Evaluation, and Treatment of High Blood Cholesterol in Adults (Adult Treatment Panel III) final report.
Population Studies Data Book: Vol. I, The Prevalence Study. U.S. Department of Health and Human Services, Public Health Service, National Institutes of Health (NIH publication number 80-1527), Washington, DC, 1980.
Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology.
in: Valle D.L. Antonarakis S. Ballabio A. Beaudet A.L. Mitchell G.A. Metabolic & Molecular Bases of Inherited Disease. McGraw-Hill, New York, 2007: 1-122.
Long-term persistence of both functional and non-functional alleles at the leukocyte immunoglobulin-like receptor A3 (LILRA3) locus suggests balancing selection.