Whole-exome sequencing reveals damaging gene variants associated with hypoalphalipoproteinemia

Low levels of high density lipoprotein-cholesterol (HDL-C) are associated with an elevated risk of arteriosclerotic coronary heart disease. Heritability of HDL-C levels is high. In this research discovery study, we used whole-exome sequencing to identify damaging gene variants that may play significant roles in determining HDL-C levels. We studied 204 individuals with a mean HDL-C level of 27.8 ± 6.4 mg/dl (range: 4–36 mg/dl). Data were analyzed by statistical gene burden testing and by filtering against candidate gene lists. We found 120 occurrences of probably damaging variants (116 heterozygous; four homozygous) among 45 of 104 recognized HDL candidate genes. Those with the highest prevalence of damaging variants were ABCA1 (n = 20), STAB1 (n = 9), OSBPL1A (n = 8), CPS1 (n = 8), CD36 (n = 7), LRP1 (n = 6), ABCA8 (n = 6), GOT2 (n = 5), AMPD3 (n = 5), WWOX (n = 4), and IRS1 (n = 4). Binomial analysis for damaging missense or loss-of-function variants identified the ABCA1 and LDLR genes at genome-wide significance. In conclusion, whole-exome sequencing of individuals with low HDL-C showed the burden of damaging rare variants in the ABCA1 and LDLR genes is particularly high and revealed numerous occurrences in HDL candidate genes, including many genes identified in genome-wide association study reports. Many of these genes are involved in cancer biology, which accords with epidemiologic findings of the association of HDL deficiency with increased risk of cancer, thus presenting a new area of interest in HDL genomics.

In conclusion, wholeexome sequencing of individuals with low HDL-C showed the burden of damaging rare variants in the ABCA1 and LDLR genes is particularly high and revealed numerous occurrences in HDL candidate genes, including many genes identified in genome-wide association study reports. Many of these genes are involved in cancer biology, which accords with epidemiologic findings of the association of HDL deficiency with increased risk of cancer, thus presenting a new area of interest in HDL genomics.
Supplementary key words dyslipidemia • genetics • HDL • LDL • lipoproteins • prebeta-1 HDL • reverse cholesterol transport • triglycerides Clinical arteriosclerotic coronary heart disease (CHD) is a multifactorial disorder. Circulating lipid and lipoprotein levels, notably, triglycerides (TGs), low density lipoproteins (LDLs), and high density lipoproteins (HDLs), are independent risk factors, and other components such as TG-rich remnants and prebeta-1 HDL play important roles too. Here we report the results of a research discovery genomic study designed to understand better the genetic causes of inherited hypoalphalipoproteinemia, low levels of plasma HDLcholesterol (HDL-C). Over forty years ago, epidemiologic studies established that low levels of HDL-C are independently associated with an elevated risk of CHD and carotid disease (1)(2)(3)(4). Those participants in the Framingham Heart Study with HDL-C levels below 35 mg/dl had eight times the prevalence of CHD compared to those with levels above 65 mg/dl (4). Other studies, including the Lipid Research Clinics Prevalence Mortality Follow-up Study, the Coronary Primary Prevention Trial, and the Multiple Risk Factor Intervention Trial (MRFIT) study, supported these findings (5). The mechanisms underlying the antiatherogenic properties of HDL-C are not fully understood. That reverse cholesterol transport (RCT) (6) is likely a major contributor here is inferred by studies that have shown cholesterol efflux capacity to be associated with atherosclerotic cardiovascular disease risk (7). It is now appreciated that the level of HDL-C itself is not an indication of the precise number of HDL particles, their functionality, the distribution of the numerous HDL subspecies, or the rate of RCT (8). Most cells, macrophages especially, efflux cholesterol to the RCT retrieval pathway to maintain cholesterol homeostasis. This process involves the ATP-binding cassette transporters ABCA1, ABCG1, and the scavenger receptor scavenger receptor class B, type 1 (9).
HDL also has a signaling role that is important to endothelial and platelet function, lymphocyte trafficking, and angiogenesis that involves HDL-bound sphingosine 1-phosphate (10)(11)(12). Other functions of HDL are emerging, including its protective role in inflammation (13) and its antioxidant properties (14). The extent to which a low level of HDL-C, resulting from known genetic causes, is a direct risk factor for atherosclerotic disease has become controversial recently with some authors doubting the genetic link (15,16), while others have challenged this interpretation (17).
It has been long established that the heritability (H 2 ) of HDL-C concentration in plasma is high, at about 0.67 (18,19). Thus, it was considered that genetic variation that resulted in a decreased level of HDL-C might be a risk factor for CHD. Since that time, much effort has been undertaken to discover the underlying genetic variations that associate with HDL-C levels. Studies in both humans and animals initially revealed several genes that contribute to the variance in levels of HDL-C. These include the apolipoprotein A-1 gene (APOA1), ABCA1, lecithin cholesterol acyltransferase (LCAT), phospholipid transfer protein (PLTP), cholesteryl ester transfer protein (CETP), and hepatic lipase (LIPC) (20). Additional genes more recently found to be associated with HDL-C are procollagen C-endopeptidase enhancer 2 (PCOLCE2) (21) and WW domain-containing oxidoreductase (WWOX) (22). Genome-wide association studies (GWASs) in large populations have now revealed numerous additional candidates, though it is important to point out that the effect sizes of many individual gene variants are often small (23)(24)(25)(26)(27). Often, mutations in genes such as lipoprotein lipase (LPL), for example, that result in a sizable elevation of TGs are associated also with decreased HDL-C. Much of the lowering of HDL-C here is due to the well-known entropically driven transfer of cholesteryl esters, facilitated by CETP, from the core of HDL particles to TGrich lipoproteins (28). This transfer manifests as a hyperbolic inverse relationship between levels of TG and HDL-C.
We performed whole-exome sequencing on 204 individuals with low levels of HDL-C. These were selected from 21,639 individuals in the UCSF Genomic Resource in Arteriosclerosis (GRA) (29,30) for whom lipid panel data, clinical and demographic data, and DNA were available. Because, as it has been pointed out (17), a significant number (∼40%) of those with low levels of HDL-C have secondary causes, most notably hypertriglyceridemia, we selected participants taking into account the known inverse hyperbolic relationship between TG and HDL-C (28).
The sequencing data were analyzed by two separate approaches. Firstly, statistical gene burden testing was used to identify potentially causal rare variants, particularly from novel genes. Secondly, after preliminary filtering to identify rare coding variants, we filtered against two lists of candidate genes. The first list consisted of 594 lipid metabolism related genes, and this included all 23 lipid-related genes sequenced by Geller and colleagues in their study of subjects with low HDL-C (17). Our list also included all the gene loci investigated in a similar recent resequencing study (31). The other list was a subset from the first list and comprised 104 genes well established as being involved in HDL metabolism or associated robustly with levels of HDL-C in GWASs. We then used 10 deleteriousness assessment tools and the ClinVar public archive to determine the degree of probability that a particular variant was functionally damaging and therefore likely to be disease causing.
Our aim in these studies was to provide a valuable resource for future cardiovascular research by compiling a list of rare gene variants that have a high likelihood of being functionally damaging and in many cases clearly pathogenic. We believe this study is unique in applying exome sequencing specifically to a cohort with HDL deficiency. Others have used targeted sequenced of limited numbers of candidate genes, and we have compared our findings to these previous studies (17,31).

Study participants
A total of 204 individuals (70 women and 134 men) who had been recruited into the GRA (29,30) were included in this study. Because this was a discovery study, no formal power calculation was done. At the time blood samples were collected, none of the individuals analyzed in this study were taking a lipid-altering medication. They were selected based on a plasma level of HDL-C below the 10th percentile of the GRA population, after adjusting for sex, and plasma level of TG based on the known hyperbolic inverse relationship between levels of HDL-C and TG (28). Briefly, using data on a total of 4,140 GRA subjects, we derived a hyperbolic 10th percentile isopleth function that takes the form y = m1 + ((m2*m3)/(m2 + x)), where y is HDL-C, x is TG, and m1, m2, and m3 are constants. Participants completed a questionnaire to document medical history, clinically important lifestyle factors, and family history. Many of those included in this study attended a tertiary lipid clinic and had other dyslipidemias. Study participants were subdivided into four groups based on subphenotypes of dyslipidemia. We used a cutoff for defining hypertriglyceridemia as recommended by the National Cholesterol Education Program Adult Treatment Panel III (1); TG ≥150 mg/dl. Participants were considered to have high LDL-C if the value was ≥160 mg/dl, except in the case of four participants under 18 years old where the 90th percentile sex-and age-adjusted values were used (32). There were five kinships included among the 204 participants. There were two sib pairs, one mother daughter pair, and two father son pairs. All study participants gave written informed consent prior to their enrollment in the study, which adhered to the World Medical Association Declaration of Helsinki and was approved by the UCSF Institutional Review Board as part of the UCSF Human Research Protection Program. Children were included with parental consent.

DNA preparation and biochemical analyses
Blood samples were collected, after an overnight fast, in tubes containing 0.1% EDTA. Genomic DNA was extracted using the Wizard purification kit (Qiagen, Germantown, MD). Plasma was obtained after centrifugation at 3,000 rpm for 20 min at 4 • C. Levels in plasma of total cholesterol (TC), HDL-C, and TG were measured using an automated chemical analyzer (COBAS Chemistry analyzer) as previously described (29,33). HDL-C was determined after precipitation of apolipoprotein-B-containing lipoproteins with dextran sulfate and magnesium (34). Levels of LDL-C were calculated using the Friedewald method (35) when TG levels were below 400 mg/dl. For some participants, including five with TG concentrations above 400 mg/dl, LDL-C was determined by sequential ultracentrifugation. Levels in plasma of prebeta HDL, measured as apoA1 protein content, were determined as previously described (36) .

Whole-exome sequencing
Genomic DNA samples were sequenced at the Yale Center for Genome Analysis (n = 192) or at the UCSF Genomics CORE (n = 12) as previously described (37,38).
Samples sequenced at Yale were exon-captured using IDt xGen target capture kit followed by 99 base-paired-end sequencing on the Illumina platform.
Samples sequenced at the UCSF were exon-captured by Roche NimbleGen SeqCap EZ library probe, and the captured libraries were sequenced on the HiSeq2500. Processing of image files was performed using a standard protocol. Raw image files were analyzed and converted to base calls by realtime analysis using the recommended default settings. Realtime analysis output base call files (*.bcl) were converted to FASTQ files with consensus assessment of sequence and variation using bcl2fastq pipeline.
Quality control, read alignment, and variant calling Blue Collar Bioinformatics (Bcbio v1.0.0) was used to run the QC pipeline. We used Burrows-Wheeler Aligner (39) (BWA v0.7.15) paired-end mode to map reads to the human reference genome (hg19). BAM files obtained were used in subsequent steps. Picard (http://broadinstitute.github.io/ picard/) (v2.5.0) was used to sort mapped reads into coordinate order and to ensure all mate-pair information was properly updated. Picard was also used to mark duplicate and low-quality reads, defined as those with low mapping quality score, that were unpaired or unmapped, or that failed a platform/vendor quality check.  (50). Annovar output was filtered further based on a list of common gene variants provided with the Ingenuity Variant Analysis suite. The top 50th percent of these were removed from our call set. We then filtered by gnomAD minor allele frequency (MAF). MAF cutoff threshold of 0.01 was used to identify heterozygous variants with 0.05 for the homozygous variants. For platform-specific variants, we removed those found in more than 40% of the samples. This remaining variant set was used for the "candidate gene analysis" described in the following.
For binomial analysis, variants were called using GATK Haplotype Caller and annotated using Annovar. High-quality variants which passed GATK variant quality score calibration with a read depth (DP) ≥8, a genotype quality score (GQ) ≥20, a mapping quality (MQ) ≥40, at least three supporting reads, and not falling in the low-complexity regions were kept (51). Rare variants with MAF ≤1E-05, 1E-04 or 1E-03 in ExAC, 1000 Genome, and NHLBI Exome Sequencing Project databases were examined. Damaging variants including the loss-of-function (LoF) variants and damaging missense (D-Mis) variants were considered for the analysis. LoF variants are defined as stop-gain, stoploss, frameshift insertions/deletions, splice site disruption, and start-loss. D-Mis variants are nonsynonymous variants predicted as deleterious by MetaSVM (52).

Final variant assessment
Candidate gene analysis. For the candidate gene analysis approach, we evaluated the NGS data after the post-variant calling analysis to determine the degree to which individual variants were causative, firstly in 594 candidate genes including those in lipid metabolism pathways plus GWAS hits (supplemental Table S1). This is an expanded list of genes that we previously reported (53). We also filtered using a subset of 104 of these genes that have been associated with plasma levels of HDL-C (24)(25)(26)54) or that are empirically related (supplemental Table S2). In both cases, we separately evaluated the data for heterozygous variants and for homozygous variants. Rare variants for those with elevated either LDL-C or TG were separately also filtered against lists of genes associated with levels of LDL or TG as appropriate (24,25).
Because this was a research discovery study, and we were more interested to look at the overall deleterious gene burden in this cohort with low HDL-C, and it was not designed to provide individual clinical assessments, we did not strictly follow the American College of Medical Genetics (ACMG) guidelines (55). Rather, we developed an overall damaging prediction based on 10 separate variant impact prediction tools and used an overall 80th percentile cutoff to retain deleterious variants. Those between the 50th and 80th percentiles were considered to be of undetermined significance and the rest (<50th percentile) likely benign. Of 66 variants with a "pathogenic" entry in ClinVar, by this approach, we classified only six as likely benign. Those variants with a damaging prediction over the 50th percentile and an unequivocal ClinVar pathogenic entry were considered pathogenic. The 10 algorithms we used were SIFT, PolyPhen (HVAR), MutationTaster, MutationAssessor, FATHMM, PROVEAN, METASVM, MetaLR, MCAP, and FATHMM-MKL.
Binomial analysis. A one-tailed binomial test was used to examine the enrichment of an observed number of damaging dominant variants in each gene in comparison with expectation as described before (37). The expected number of variants was calculated based on de novo mutability through the following formula: where 'i' denotes the 'ith' gene and 'N' denotes the total number of damaging dominant variants. The de novo mutability for each type of variant in each gene was calculated based on trinucleotide contexts as described by Samocha et al. (56) and was adjusted by base-pair coverage of 204 exome sequencing data in our cohort.
PANTHER analysis. For genes with recessive genotypes at MAF of ≤0.001, we used PANTHER (Protein Analysis Through Evolutionary Relationships) gene ontology analysis (57) to functionally classify the genes. We used Fisher's exact test with false discovery rate correction to compare our list with the PANTHER human database.
Copy number variant analysis. Copy number variations were called using XHMM (eXome-Hidden Markov Model) software (58). GATK DepthOfCoverage was first used to calculate mean read coverage from the aligned sequencing file. The output data were then normalized by removing the variance component with variance >70%, and a z-score was calculated. Afterward, the hidden Markov-based model called copy number variants (CNVs) and calculated the quality scores.
Only high-quality CNVs spanning at least three exons and with a quality score ≥90 were kept and visually inspected. The remaining CNVs were further annotated with frequencies in 1000 Genome and DECIPHER databases. Only rare CNVs with a frequency ≤1 × 10 −3 in 1000 Genome and DECIPHER as well as an in-cohort frequency ≤10% were kept.
Other statistical analysis. Lipid, lipoprotein, demographic, and clinical data were analyzed using PASW Statistics for Apple Macintosh (IBM Corp., New York, NY). Continuous variables were checked for normality, and those with skewed distributions mathematically transformed prior to testing. Body mass index (BMI), TC, and TG were log-transformed and LDL-C square root transformed. P-values were calculated using an unpaired t-test for parametric variables and using Fisher's exact test for categorical variables.

RESULTS
Clinical and other characteristics of the study participants are presented in Table 1. The study subjects were selected because they had plasma HDL-C below the 10th percentile for the GRA population. The mean level of HDL-C was 27.8 ± 6.4 mg/dl (range: 4-36 mg/dl) ( Table 1) and is substantially lower than the mean value of 58.9 ± 19.9 mg/dl that we reported recently for a control cohort (59). There were a considerable number of participants (124; 60.8%) who presented with an additional dyslipidemia. Plasma levels of LDL-C and TG were used to allocate each participant to one of four clinically meaningful groups ( Table 2) that emphasize these other dyslipidemias. These are: 1 Isolated Low HDL-C; 2 Low HDL-C plus high LDL-C; 3 Low HDL-C plus high TG; 4 Low HDL-C plus combined hyperlipidemia. There were 80 participants (39.2%) in group 1, 23 (11.3%) in group 2, 77 (37.7%) in group 3, and 24 (11.8%) in group 4. With respect to differences with group 1, there were significantly more females in group 4. HDL-C levels were lower in group 3, and TC higher in groups 2, 3, and 4. Also, the mean age in group 2 was significantly lower. Levels of prebeta HDL were significantly higher in the two groups with high TG (Table 2), consistent with our recent study (59) in which the cohort was stratified according to the same dyslipidemia criteria.
The participants, as self-assessed, were primarily of white European ancestry (76.5%) but included those of Hispanic, East Asian, South Asian, and African-American descent. The self-assessed ethnicity percentages agree very closely with those determined by principal component analysis (Data not shown). Table 1 shows that there were significant differences in ethnicity between males and females, with more males being European and nearly three times as many females being of Hispanic ancestry. Study participants were generally overweight, and 8.6% had CHD, with BMI, body mass index (kg/m 2 ); CAD, coronary artery disease; MI, myocardial infarction.
Mean values for age, BMI, and lipids are ± SD. a Self-defined ethnicity. b P-value between females and males. P-values were calculated by t-test for parametric variables and by Fisher's exact test for categorical variables. BMI, total cholesterol, and triglycerides were log-transformed, and LDL-cholesterol square root transformed, prior to testing. more males being affected than females. Few had diabetes though a considerable number suffered from hypertension, and there were few smokers.

Candidate gene analysis
Among the whole cohort of 204 participants, after filtering against 594 lipid metabolism candidate genes (supplemental Table S1), we detected a total of 460 potentially damaging heterozygous variants (40 of which were considered to be pathogenic) and 10 that were homozygous (one of which was known to be pathogenic; LPL p.G215E) ( Table 3). There were 591 individual occurrences of these variants distributed among 192 participants. All these variants are listed in supplemental Table S3 We detected a total of 34 rare ABCA1 variants among 38 (18.6%) of the 204 individuals studied, with 20 of the variants considered to be potentially damaging or pathogenic. These 20 mutations (one person carried two ABCA1 variants) are listed in Table 4, which includes lipid and lipoprotein measurements. Six of the variants reported here are novel. Two of these were frameshifts, and one was a nonsense mutation. There was a ClinVar entry on seven of the others. Four ClinVar entries stated they were "likely benign," and one was of "uncertain significance," despite their high damaging prediction rating. Those participants with deleterious ABCA1 variants, when compared to noncarriers, had a lower level of LDL-C (103 ± 32 mg/dl vs. 131 ± 57; P = 0.040). There were no other significant differences in lipid or lipoprotein measurements, including prebeta HDL (4.20 ± 1.36 mg/dl vs. 4.30 ± 1.39; P = 0.802).
Among the 104 HDL-C candidate genes (supplemental Table S2), there was a total of 120 probably damaging variants (Fig. 1), 116 of which were heterozygous (eight considered to be pathogenic) and four were homozygous (one pathogenic). There were 110 participants who had at least one potentially damaging HDL candidate gene variant, 31 of whom had 2 and 10 had 3 each. Of these 110 individuals, 46 were among group 1 (Isolated low HDL group; Table 2), eight among group 2 (Low HDL-C plus high LDL-C), 45 among group 3 (Low HDL-C plus high TG), and 11 among group 4 (Low HDL-C plus combined hyperlipidemia). The percentages in these groups were 57.5%, 34.8%, 58.4%, and 45.8%, respectively. Hence, the   frequencies were lower in the two groups with elevated levels of LDL-C (groups 2 and 4). This perhaps reflects the impact of the high frequency of damaging LDLR mutations on lowering HDL-C among these two groups. The 11 HDL genes with the highest occurrence of damaging variants were ABCA1 (n = 20), STAB1 (n = 9), OSBPL1A (n = 8), CPS1 (n = 8), CD36 (n = 7), LRP1 (n = 6), and ABCA8 (n = 6), GOT2 (n = 5), AMPD3 (n = 5), WWOX (n = 4), and IRS1 (n = 4) (Fig. 1). All the damaging variants among HDL candidate genes are listed in supplemental Table S4. Three probably damaging rare variants were found in the APOA1 gene, which codes for the major protein of HDL, and two in LCAT (Table 5). None have been reported in ClinVar. We noticed also that one person was homozygous for a possibly damaging LCAT SNP (p.Ser232Thr; no ClinVar entry), but the frequency here (gnomAD MAF 0.0176) is above the inclusion cutoff we used for rare heterozygous variants and the damaging prediction score was borderline at the 80th percentile cutoff. However, the frequency was considerably higher in our cohort (MAF 0.0466), with 17 heterozygotes and 1 homozygote. Data relating to this LCAT SNP are included in Table 5 (but excluded from Fig. 1). Two individuals were homozygous for LPL mutations, p.Asp36Asn (TG 295 mg/dl) and p.Gly215Glu (TG 406 mg/dl). The first of these has an equivocal entry in ClinVar, and the second is listed as pathogenic.
We found a total of 14 different rare variants in the LDLR gene among 14 participants. One variant was found in two individuals, and one other carried two variants. The subjects with these variants are listed in Fig. 2 along with lipid levels. Twelve LDLR variants are potentially deleterious mutations, with two having borderline damaging prediction evaluations and equivocal ClinVar entries. There were two frameshift, two nonsense, and one deletion mutation. The 12 individuals with LDLR mutations unequivocally classified as either pathogenic or probably damaging each have an LDL-C above 160 mg/dl and are among those with subphenotypes 2 or 3 ( Table 2). Those with the 12    Fig. S1). The LDLR mutations were by far the most numerous with 12 occurrences (not including the two with borderline scores, equivocal ClinVar entries and normal levels of LDL-C). There were two mutations in each of ABCG5, APOA1, HPR, and IRF2BP2.
Notable among the damaging mutations found in the other lipid metabolism candidate genes were those in ABCC1, ABCC2, and ABCC3 (supplemental Table S3). The total numbers of participants with damaging mutations in these three genes were 11, 8, and 9, respectively. In total, there were 26 who carried one or more of these variants. These 26 were evenly distributed across the four dyslipidemia subphenotypes ( Table 2). These variants therefore were not linked with the presence of high TG or LDL-C, only with low HDL.
As part of our post variant calling analysis, we filtered for all genes with MAF <0.05 for homozygosity of damaging variants. Of note, there were two individuals with a damaging homozygous mutation (p.Arg83Gln; rs8140287) in the ISX gene, which is highly and exclusively expressed throughout the intestines. The homozygous frequency of this variant is expected to be 5-fold less than seen here in this cohort. ISX downregulates intestinal expression of SR-BI (scavenger receptor class B, type I; SCARB1) (60), an HDL receptor that mediates the selective uptake of HDL-C. Two brothers were heterozygous carriers of a rare missense mutation (c.3019C>G; p.Pro1007Ala) in the Niemann-Pick type C gene (NPC1, an HDL-C candidate gene). In the ClinVar record for this variant (rs80358257), it is described as pathogenic and linked to Niemann-Pick type C disease.

Binomial and PANTHER analysis results
Binomial analysis for rare (MAF ≤ 0.001) heterozygous D-Mis or LoF variants identified the ABCA1, LDLR, HK3, and CFTR genes with genome-wide statistical significance (Fig. 3). This was based on gene size, de novo mutability, and case-control analysis, and where there was reliable coverage in control data sets. ABCA1 and LDLR are included in our set of lipid metabolism-related candidate genes (supplemental Table S1). ABCA1 is included in our set of HDL-C candidate genes (supplemental Table S2).
Hexokinase 3 (HK3) phosphorylates intracellular glucose to produce glucose-6-phosphate, the first step in most glucose metabolism pathways. Cystic fibrosis transmembrane conductance regulator (CFTR) is also known as an ATP-binding cassette, subfamily C, member 7 (ABCC7). The results for these two genes (HK3 and CFTR) are less convincing than those for ABCA1 and LDLR because of the inflation in the Q-Q plot.
The results showing the functional classification of genes by PANTHER analysis of recessive D-Mis + LoF genotypes at MAF of ≤0.001 are presented in Table 6. Several metabolism-related gene ontology (GO) terms show a large magnitude of enrichment with significant P-values. Among the top 25 terms are five related to lipid metabolism pathways.

Copy number variants
Rare copy number gains and losses were examined using eXome-Hidden Markov Model (XHMM; https:// atgu.mgh.harvard.edu/xhmm/) software. A total of 49 occurrences of duplications (supplemental Table S5) and 20 occurrences of deletions (supplemental Table S6) were found in the cohort. Among them, two duplications and two deletions were recurrent in two individuals. Pathway analysis using the genes that are altered in copy number did not yield any significantly over-represented GO terms (data not shown). No CNVs were found among our list of 104 HDL candidate genes (supplemental Table S2). A large, homozygous, 11,534-bp deletion, which includes the whole of exon 9, in the MTTP gene (microsomal triglyceride transfer protein) is a very likely cause of low HDL-C in one subject. This person has the phenotype of abetalipoproteinemia (TC 28 mg/dl; TG 6 mg/dl; LDL-C 6 mg/ The significance of the difference between the observed and expected number of heterozygous variants was calculated using a onesided binomial test. ABCA1 and LDLR show genome-wide threshold significance of increased burden of damaging heterozygous variants at both MAFs. In addition, HK3 and CFTR are significant at an MAF of ≤0.001. LoF, loss-of-function. dl; HDL-C 22 mg/dl), a disorder caused by defects in this gene (61).

DISCUSSION
In this research discovery study, we aimed to gain a better understanding of the genetic basis of low levels of plasma HDL-C. Binomial analysis revealed four genes to carry potentially damaging rare variants (MAF ≤ 0.001) with genome-wide statistical significance among our cohort with low levels of plasma HDL-C. These were ABCA1, LDLR, HK3, and CFTR. PANTHER functional classification analysis revealed significant enrichment of lipid metabolism-related GO pathways. Although no CNVs were found for any genes on our candidate gene lists, a large deletion was found in the MTTP gene in a person with the phenotype of abetalipoproteinemia, and this is certainly the cause of his low HDL-C (22 mg/dl).
For the potentially damaging variants we discovered that were on our list of lipid metabolism-related candidate genes, the most prevalent were among ABCA1 and LDLR. A total of 38 among the 204 studied carried rare ABCA1 variants in our study (18.6%). This is somewhat lower than the 26.9% found in a recent report (17), though that cohort (n = 202) had a lower mean level of HDL-C (18 mg/dl) than our present study (27.8 ± 6.4 mg/dl; range 4-36 mg/dl). When we employed stringent cutoffs using 10 variant impact prediction tools, 19 of the 38 individuals with rare ABCA1 variants carried potentially damaging mutations (one carried 2), that is, 9.3% of the cohort of 204.
We detected a higher prevalence of damaging, rare LDLR variants (12 of 204 participants) than a recent similar study of subjects with low HDL-C (4 of 202) (17). However, in that study, the mean plasma level of LDL-C was somewhat lower than here (107 mg/dl vs. 128 mg/ dl). A significant percentage (23%; 47) of our cohort had elevated LDL-C (≥160 mg/dl), and all 12 with damaging LDLR mutations fell within this group. These 12 are characterized as having a diagnosis of familial hypercholesterolemia (FH). Among 18 kindred with FH and known LDLR gene deleterious mutations in our GRA collection, we have analyzed the baseline levels of LDL-C and HDL-C for a total of 128 genotyped individuals. Among these were 54 wild type, 69 heterozygotes, and five homozygotes. LDL-C values were, respectively, 127  though it has been suggested recently that it may be connected to impaired RCT (63,64). In addition to those in ABCA1, there were numerous other probably damaging or pathogenic variants found among other HDL candidate genes. Notable were the nine occurrences in STAB1, eight in each of OSBPL1A and CPS1, and six in each of LRP1 and ABCA8. STAB1 codes for stabilin 1, a multifunctional scavenger receptor (65), and associates with HDL levels in GWASs (24,25). A nonsense mutation in the OSBPL1A gene has previously been reported to be causal for low HDL-C and shown to decrease cellular cholesterol efflux capacity (66). Although its function in lipid metabolism is unclear, it is thought that the OSBPL proteins are phospholipid/sterol transporters with OSBPL1A, regulating interactions between the endoplasmic reticulum and the late endocytic compartment (66). CPS1, which codes for carbamoyl phosphate synthetase I, associates with HDL levels in GWASs (24,25). It is responsible for converting ammonia to carbamyl phosphate in the liver. It is unclear how heterozygous LoF mutations can affect HDL-C levels. In GWASs, LRP1 has been reported to associate with plasma levels of HDL-C and TG (24,26). This gene codes for the large LDL receptor-related protein 1, which has homology to the LDLR. It is also referred to as the α2-macroglobulin receptor. LRP1 is widely expressed, notably in the liver, adipocytes, ovary, mammary gland, fibroblasts, and central nervous system (CNS). It is a multifunctional endocytic receptor thought, pertinently here, to be involved in the clearance of chylomicron remnants by the liver (67). ABCA8 (ATP-binding cassette, subfamily A, member 8) is widely expressed, notably in the heart, skeletal muscle, and liver and codes for a lipophilic xenobiotic transporter. It has recently been shown in mice to be responsible for the efflux of cholesterol and taurocholate across the hepatic sinusoidal membrane (68). Others have reported that ABCA8 colocalizes with ABCA1 and that mutations in ABCA8 are associated with lower levels of plasma HDL-C (69). In addition to these findings was the presence of the LCAT SNP, rs4986970 (p.S232T) in 18 subjects at a frequency (MAF 0.047) here higher than in GnomAD (0.017), ExAC (0.018), or TOPMED (0.017). This SNP has been previously reported to associate with HDL levels in two Danish studies (16). The prediction tools we used did not indicate that this LCAT SNP was especially damaging, though five of the 10 returned a "damaging" score. It is not reported in ClinVar. One participant, of Japanese ancestry, was homozygous for two damaging mutations in the LILRA3 gene. This gene was strongly associated with levels of HDL-C in GWASs (24)(25)(26)54). However, results of genetic studies among a Japanese population cast doubt on the extent to which these particular LILRA3 mutations here can be classed as causal (70).
One notable finding in these studies was the high prevalence of potentially damaging rare variants among three lipid metabolism candidate genes, all members of the ATP-binding cassette subfamily C. There were 11 occurrences with ABCC1, eight with ABCC2, and nine with ABCC3. In total, there were 26 participants who carried one or more of these variants. These 26 were evenly distributed across the four dyslipidemia subphenotypes, that is, they were not associated with the presence of high TG or LDL-C. These genes code for proteins previously known as multidrug resistance proteins, MRP1, MRP2, and MRP3. ABCC2 and ABCC3 proteins are also referred to as canalicular multispecific organic anion transporters, CMOAT and CMOAT2. As with ABCC1, these genes are expressed mainly in the liver apical canalicular membrane and act to transport conjugated anionic compounds, including conjugates of bile salts into bile. It is of further interest in this respect that there were 13 occurrences of damaging variants in CFTR among our study group with low HDL-C. This gene was not on our list of lipid candidate genes, and the variants were revealed by binomial analysis. However, it is also a member of ATPbinding cassette subfamily C and is also referred to as ABCC7.
Apart from the LDLR gene, the significance of the potentially damaging variants found among other non-HDL candidate genes is unclear. The potentially damaging variants in the LPA, PDIA2, APOBEC1, BRCA2, PCCB, AGPAT2, CREBBP, and MYL5 genes are in each case fairly evenly distributed between the four different dyslipidemia subphenotypes. The numerous CYP1A1 variants are, except for one person with high TG, found among those with isolated low HDL-C. The six participants with APOB mutants are among the isolated low HDL-C group, except for 1 with high LDL-C. The ACACB and BCMO1 mutants are distributed among the high-TG and isolated low-HDL groups.
We tested whether the plasma levels of prebeta HDL were associated with genes with the most prevalent numbers of variants, restricting our analysis to ABCA1, LDLR, and those individuals carrying one or more of the ABCC1, ABCC2, or ABCC3 transporters. In no case was there a difference in the level of prebeta HDL between carriers and noncarriers. Here all the deleterious ABCA1 rare variants were heterozygous. With homozygous cases of ABCA1 deficiency (Tangier disease), almost all the apoA1in plasma is found in the form of small prebeta HDL particles (71).
Because of the ubiquity of lipid metabolism in biology and the many roles of HDL beyond lipid transport, per se, it is likely that alterations in genes with roles in HDL metabolism will have impacts broadly in human biology. A large population study revealed significant increases in cancer associated with low levels of HDL, specifically, myeloma, myeloproliferative tumors, breast, lung, and nervous system cancers (72). A critical structural or functional role for the WWOX gene in the CNS would account for the observation that mutations at that locus are associated with neurodevelopmental and neurodegenerative disorders, including Alzheimer disease (73). A number of the gene loci identified with low HDL levels in this study are associated with cancer. The IRS1 (insulin receptor substrate 1) protein has a role in DNA repair and has been implicated in medulloblastoma, breast cancer, and osteosarcoma (74). There are many metabolic activities associated with LRP-1. It is also involved in tumorogenesis and tumor progression (75,76). STAB1 has an association with bladder cancer and acute myelogenous leukemia (77). Because CPS1 has a reputed antioncogenic role, a damaging mutation might promote oncogenesis (78). CD36 is upregulated in many cancer cell types (79,80). ABCA8 inhibits proliferation and metastasis of hepatocellular carcinoma (81). AMPD3 is highly expressed in gastrointestinal stromal tumors (82) and is upregulated in lung cancers (83). ABCA1 mutations are associated with chronic myelogenous leukemia (84). WWOX has been identified as a tumor-suppressive gene. WWOX deficiency has a role in the expression of the estrogen receptor and is associated with triple-negative breast cancer (85). WWOX contains fragile elements that are susceptible to translocations and deletions (86). Clearly, LoF mutations in genes with oncoprotective roles would be expected to increase the risk of malignancy. However, gain of function in tumor promotor genes could also be oncogenic. WWOX, as a fragile gene, is likely to undergo deletions and rearrangements that make it a prominent candidate for diverse cancers. The demonstration of a significant relationship of diminished HDL levels to cancer in a population study suggests many of the functional roles of HDL in cell replication remain to be discovered. Their discovery holds promise of new venues of treatment for cancer. The prominent roles of lipid metabolism in the CNS indicate that there will be important impact on neurocognitive and neurodegenerative disorders.
In conclusion, we wished to establish a better understanding of the causes of inherited low levels of plasma HDL-C. Our aim was to provide a valuable resource for compiling a catalog of gene variants that are deemed potentially damaging or pathogenic. We detected at least one damaging mutation in an HDL candidate gene in 110 participants, a little over half of the study cohort. Clearly there must be either other genes involved or variation in promoters, enhancer, silencer elements, etc., not detectable in our study, that play a role. A limitation of this study with regard to the candidate gene approach is that because our goal was to determine the relative burden of deleterious variants in a panel of HDL candidate genes among individuals with low HDL-C and we only undertook exome sequencing of such individuals, we were not able to compare the prevalence of mutations among a cohort considered to have normal levels. It must also be noted that while the whole-exome sequencing methodology employed here has some power to detect structural variants (CNVs), it often lacks sensitivity. More advanced techniques such as Linked-Read wholegenome sequencing allow for much higher success in structural variant detection as we have recently found in other studies (38,87). A number of the gene loci identified here have significant roles in cell biology and are also associated with cancer and neurodegenerative diseases, providing promising venues for the molecular understanding of these disorders and possible roles for HDL in their etiology.

Data availability
All relevant data are contained within the manuscript.