Genetic meta-analysis of 15,901 African Americans identifies variation in EXOC3L1 is associated with HDL concentration.

Meta-analyses of European populations have successfully identified genetic variants in over 150 loci associated with lipid levels, but results from additional ethnicities remain limited. Previously, we reported two novel lipid loci identified in a sample of 7,657 African Americans using a gene-centric array including 50,000 SNPs in 2,100 candidate genes. Initial discovery and follow-up of signals with P < 10⁻⁵ in additional African American samples confirmed CD36 and ICAM1. Using an additional 8,244 African American female samples from the Women's Health Initiative SNP Health Association Resource genome-wide association study dataset, we further examined the previous meta-analysis results by attempting to replicate 20 additional putative lipid signals with P < 10⁻⁴. Replication confirmed rs868213, located in a splice donor region of exocyst complex component 3-like 1 (EXOC3L1), as a novel signal for HDL (additive allelic effect β = 0.02; P = 1.4 × 10⁻⁸; meta-analysis of discovery and replication).

IBC array genotypes and phenotypes (see Table 1 ). Summary level statistics for SNPs selected for replication were provided from a second independent cohort of African American females genotyped from the WHI SNP Health Association Resource (SHARe) GWAS dataset, termed here WHI-SHARe. There was no sample overlap between the WHI and WHI-SHARe genotyping initiatives. Further details of the populations, genotyping, and quality control analysis have been published elsewhere ( 6,8 ). All participants provided informed written consent. Institutional Review Boards of each Candidate Gene Association Resource (CARe) cohort reviewed and approved the cohort's interaction with CARe. The study described here was approved by the Committee on the Use of Humans as Experimental Subjects of the Massachusetts Institute of Technology.
All participants were greater than 21 years of age. Lipid phenotypes were taken from baseline or first measurements for all fasting individuals, as described in the original reports. All measurements were converted to millimoles per liter, with TC and HDL-C measurements converted from milligrams per deciliter by dividing by 38.67, and TG measurements converted from milligrams per deciliter by dividing by 88.57. TG and HDL values were log-transformed to satisfy normality. LDL-C was calculated according to the Friedewald formula: LDL-C ≈ TC − HDL-C − k × TG, where k is 0.45 for millimoles per liter (or 0.20 if measured in milligrams per deciliter) ( 10 ). If TG values were >4.51 mmol/l (>400 mg/dl), then LDL-C was treated as a missing value.
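The unit conversions and the Friedewald calculation described above can be sketched as follows. This is a minimal illustration, not the code used in the study; the function and constant names are hypothetical, and only the conversion factors, the value of k, and the TG cutoff are taken from the text.

```python
from typing import Optional

# Conversion factors and cutoff as stated in the text.
MGDL_PER_MMOLL_CHOL = 38.67   # TC and HDL-C: mg/dl divided by 38.67 gives mmol/l
MGDL_PER_MMOLL_TG = 88.57     # TG: mg/dl divided by 88.57 gives mmol/l
FRIEDEWALD_K_MMOLL = 0.45     # k when all values are in mmol/l
TG_CUTOFF_MMOLL = 4.51        # above this, LDL-C is treated as missing


def chol_mgdl_to_mmoll(value_mgdl: float) -> float:
    """Convert a TC or HDL-C measurement from mg/dl to mmol/l."""
    return value_mgdl / MGDL_PER_MMOLL_CHOL


def tg_mgdl_to_mmoll(value_mgdl: float) -> float:
    """Convert a TG measurement from mg/dl to mmol/l."""
    return value_mgdl / MGDL_PER_MMOLL_TG


def friedewald_ldl_mmoll(tc: float, hdl: float, tg: float) -> Optional[float]:
    """Estimate LDL-C (mmol/l) as TC - HDL-C - k * TG.

    Returns None (missing) when TG exceeds the cutoff, mirroring the
    rule described in the text.
    """
    if tg > TG_CUTOFF_MMOLL:
        return None
    return tc - hdl - FRIEDEWALD_K_MMOLL * tg
```

For example, `friedewald_ldl_mmoll(5.0, 1.0, 2.0)` evaluates 5.0 − 1.0 − 0.45 × 2.0, while a TG of 5.0 mmol/l yields a missing value.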

Statistical methods
Gender-stratified association analysis was performed in each participating study using an additive genetic model including age, sex, type 2 diabetes diagnosis, body mass index, and smoking history as covariates, as well as adjusting for 10 principal components of ancestry in PLINK (https://www.cog-genomics.org/plink2) ( 11 ). Principal components were calculated using Eigenstrat ( 12 ). Sensitivity analysis including and excluding covariates did not alter our findings. Meta-analyses were performed by two independent analysts using a fixed-effect inverse-variance approach in two different software packages, MANTEL (http://www.broadinstitute.org/~debakker/mantel.html) ( 13 ) and METAL (http://www.sph.umich.edu/csg/abecasis/metal/) ( 14 ). Results were highly concordant, reflecting a robust data analysis pipeline.
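A fixed-effect inverse-variance meta-analysis of per-study effect estimates can be sketched as below. This is a generic textbook formulation under a normal approximation, not the MANTEL or METAL implementation; the function name and the input values in the usage note are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fixed_effect_meta(
    betas: Sequence[float], ses: Sequence[float]
) -> Tuple[float, float, float, float]:
    """Combine per-study effects with inverse-variance weights.

    Each study is weighted by 1 / SE^2. Returns the pooled beta, its
    standard error, the z statistic, and a two-sided normal p-value.
    """
    if len(betas) != len(ses) or len(betas) == 0:
        raise ValueError("betas and ses must be non-empty and of equal length")
    if any(se <= 0 for se in ses):
        raise ValueError("standard errors must be positive")

    weights = [1.0 / (se * se) for se in ses]
    weight_sum = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / weight_sum
    pooled_se = math.sqrt(1.0 / weight_sum)
    z = pooled_beta / pooled_se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p_value
```

With two hypothetical studies of equal precision, `fixed_effect_meta([0.02, 0.04], [0.01, 0.01])` gives a pooled effect midway between the two estimates and a smaller standard error than either study alone.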

African American meta-analysis
Pruning of discovery cohort meta-analysis results for independence from previously reported loci and for 10⁻⁵ < P < 10⁻⁴ yielded three SNPs associated with plasma TC, three SNPs associated with LDL-C, six SNPs associated with HDL-C, and seven SNPs associated with TG (20 total lipid-SNP associations). Each of these SNPs was carried forward for replication in 8,244 WHI-SHARe African American females.
The results for all 20 SNPs in the discovery and WHI-SHARe replication samples are presented in Table 2. In the replication sample, six SNPs were associated with the same lipid trait and with the same direction of effect as the discovery report with P < 0.05 ( Table 2 ). The strongest association observed was rs868213 near exocyst complex component 3-like 1 (EXOC3L1) (P = 4.4 × 10⁻⁴) with increased HDL concentration (β = 0.014). Meta-analysis of discovery and replication studies led to a genome-wide significant
Greater than 150 lipid-associated loci have been discovered using genome-wide association studies (GWASs) mainly based on individuals of European ancestry ( 1,2 ). The number of variants discovered and the power to detect variants of lower allele frequency and effect size are directly related to the number of participants in the study, as demonstrated by the success of progressively larger meta-analyses ( 1 ). Genetic variants generally appear to exert similar effects across ethnicities if tested in a sufficiently powered cohort, but differences in allele frequency and regional linkage disequilibrium (LD) between ethnicities allow for the identification of novel contributors to lipid metabolism ( 3–5 ). While efforts to identify additional variants associated with blood lipids in African American populations have been successful ( 3, 6–8 ), the identification of novel genetic loci in non-European populations has been hampered by inadequate power due to limited sample size and genotyping platform designs made for optimal SNP coverage of European populations ( 3,9 ).
The Institute for Translational Medicine and Therapeutics (ITMAT)-Broad-Candidate Gene Association Resource (IBC) array [also referred to as the CardioChip or HumanCVD Beadchip (Illumina)] was specifically designed to capture ~50,000 SNPs across ~2,100 loci, with pathways known or postulated to have roles in lipid traits captured at higher density than conventional GWAS arrays ( 9 ). SNPs were specifically selected from African HapMap samples to be included on the array to ensure coverage of African populations ( 9 ). Results of genetic associations with total cholesterol (TC), LDL cholesterol (LDL-C), HDL cholesterol (HDL-C), and TG concentrations in European-derived cohorts using the IBC array, as well as meta-analysis of multi-ethnic cohorts, have been published ( 2,6 ). Follow-up replication of signals with P < 10⁻⁵ from the discovery cohort in African American samples confirmed two previously unreported lipid loci (CD36 and ICAM1) ( 6 ), which have since been validated independently ( 8 ).
We sought to improve the power to detect novel signals in African American populations by increasing sample size for meta-analyses using additional independent study samples. A meta-analysis of the original six African American populations (n = 7,657) and the replication population (n = 8,244) yielded a total sample size of 15,901. SNPs selected for replication in additional African Americans had 10⁻⁵ < P < 10⁻⁴ in our previous analysis ( 6 ) and independence from known lipid-related loci, defined by a minor allele frequency-specific r² threshold (r² < 0.3 if minor allele frequency > 1%, r² < 0.6 if minor allele frequency < 1%).
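The frequency-specific independence rule above can be expressed as a small filter. This is a sketch with hypothetical function names; the text does not say how a minor allele frequency of exactly 1% was handled, so treating it as low-frequency here is an assumption.

```python
from typing import Iterable


def r2_threshold(maf: float) -> float:
    """Return the r^2 cutoff for a SNP with the given minor allele frequency.

    Assumption: MAF of exactly 1% falls in the low-frequency group.
    """
    return 0.3 if maf > 0.01 else 0.6


def is_independent(r2_with_known_loci: Iterable[float], maf: float) -> bool:
    """True if the SNP's strongest LD with any known locus is below the cutoff."""
    strongest = max(r2_with_known_loci, default=0.0)
    return strongest < r2_threshold(maf)
```

A common SNP (MAF 5%) with r² = 0.4 to a known lipid locus would be excluded, whereas a rare SNP (MAF 0.5%) with the same r² would be retained.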

Sample collection
The six discovery cohorts [Atherosclerosis Risk in Communities (ARIC), Coronary Artery Risk Development in Young Adults (CARDIA), Cardiovascular Health Study (CHS), Multi-Ethnic Study of Atherosclerosis (MESA), Women's Health Initiative (WHI), and Jackson Heart Study (JHS)] were included from our previous report ( 6 ), and all studies contributed individual-level
large-scale public lipid GWAS datasets. The IBC lipid genetics meta-analysis found a non-statistically significant association of the minor allele of rs868213 with increased HDL in 66,240 Europeans (β = 0.0071; P = 0.10) ( 2 ). The larger Global Lipid Genetics Consortium (http://www.sph.umich.edu/csg/abecasis/public/lipids2013/) found that in 92,817 individuals of European ancestry, the minor allele of rs868213 was associated with an increase of HDL (β = 0.054; P = 1.01 × 10⁻⁷) ( 1 ). Other nearby SNPs were genome-wide significantly associated with HDL (rs2233455, β = 0.056; P = 2.27 × 10⁻¹⁴, 12 kb away), but the locus was not identified as distinct from the associations seen with SNPs in LCAT ( Fig. 1 ), likely due to higher levels of LD
signal at the EXOC3L1 locus (P = 1.4 × 10⁻⁸) for association with HDL. Located on the long arm of chromosome 16, rs868213 is at position 67,220,457 (hg19, build 37). The minor allele of rs868213 is observed in the studied African American cohorts at 43%, while HapMap populations indicate allele frequencies of 2% in JPT (Japanese in Tokyo), 4% in CHB (Han Chinese in Beijing), 6% in CEPH (Utah residents with Northern and Western European ancestry), 8% in MXL (people with Mexican ancestry in Los Angeles, CA), and 52% in YRI (Yoruba in Ibadan, Nigeria).
We then assessed whether the lower minor allele frequency of rs868213 in European populations could account for the lack of HDL-rs868213 association from accessible
loss of residues 429 to 461 ( 17 ), though functional consequences of the deletion remain unknown.

DISCUSSION
Building upon previous genetic association studies, we attempted to replicate nominally associated variants in a larger meta-analysis including 15,901 African Americans. The most interesting finding was the genome-wide significant association of rs868213 with HDL-C. Found in a gene-rich region 800,000 bp upstream of LCAT, SNPs surrounding rs868213 were found to be associated with HDL in meta-analyses of European populations, but were overlooked due to LD with LCAT SNPs. Higher rs868213 allele frequency and lower LD between rs868213 and LCAT SNPs in African Americans increased power to detect the novel
observed in European populations. The lead LCAT SNP, rs255052, is >800 kb upstream and in minimal LD with rs868213 in African Americans (r² = 0.158). An additional SNP, rs3729639, found 5 kb away and in strong LD with rs868213 in African Americans, was found associated with HDL (β = 0.09; P = 1.9 × 10⁻¹¹) in a CARe African American GWAS, but was not identified as independent of LCAT ( 15 ).
To further assess the independence of the rs868213-HDL association from upstream LCAT in an African American population, a secondary analysis was performed with the minor allele dosage of rs255052 included as a covariate in the model. The lead SNP in the LCAT locus, rs255052, is prevalent in both European and African American populations and has been consistently associated with HDL concentrations. The rs868213-HDL association remained significant in the meta-analysis after adjustment for rs255052 (discovery P = 0.048; replication P = 0.086; meta-analysis P = 0.0080).
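The idea of a conditional analysis, in which the dosage of a nearby lead SNP is added as a covariate, can be illustrated with ordinary least squares. The sketch below is a plain-Python toy under stated assumptions, not the PLINK model used in the study: the genotype dosages and trait values are invented and noise-free, no other covariates or p-values are included, and all function names are hypothetical.

```python
from typing import List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix (collinear predictors)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(y: Sequence[float], predictors: Sequence[Sequence[float]]) -> List[float]:
    """Least-squares coefficients [intercept, b1, b2, ...] via normal equations."""
    rows = [[1.0] + [float(p[i]) for p in predictors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear(xtx, xty)


# Hypothetical minor-allele dosages (0, 1, or 2) for eight individuals.
target_snp = [0, 1, 2, 0, 1, 2, 1, 0]   # stands in for the SNP of interest
lead_snp = [0, 0, 1, 1, 2, 2, 1, 0]     # stands in for the nearby lead SNP
# Toy trait built with known effects: 0.02 per target allele, 0.05 per lead allele.
trait = [1.0 + 0.02 * t + 0.05 * l for t, l in zip(target_snp, lead_snp)]

unadjusted_beta = ols(trait, [target_snp])[1]
adjusted_beta = ols(trait, [target_snp, lead_snp])[1]
```

Because the two dosage vectors are correlated, the unadjusted estimate absorbs part of the lead SNP's effect, while the adjusted estimate recovers the effect assigned to the target SNP. An association that persists after such adjustment is consistent with an independent signal.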

In silico analysis of rs868213 and EXOC3L1
EXOC3L1 is highly expressed in developing human umbilical vein endothelial cells, and siRNA studies have shown that the gene plays an important role in tubular network formation ( 16 ). In silico bioinformatic evaluation of rs868213 identified that it alters a putative consensus splice site of EXOC3L1. Significant changes in splicing regulatory protein binding are expected with the nucleotide change produced by rs868213, according to information theory analysis (supplementary Fig. 1). Thus, we reviewed ENCODE RNA-seq data for evidence of alternative splicing and possible exon skipping within EXOC3L1. In four out of thirteen cell lines evaluated, including HeLa (cervical carcinoma), MCF7 (breast tumor), HCT6 (HCT116; colon tumor), and NHLF (lung fibroblasts) cell lines, alternative splicing of the exon immediately following rs868213 was observed (supplementary Fig. 2). Exclusion of the exon would remove residues 429 to 461 of the EXOC3L1 protein. UniProt annotation indicates that amino acids 1-370 are required for interaction with other components of the exocyst complex and should be maintained despite the
ethnicities of additional non-genome-wide significant variants suggests many variants and genes are yet to be discovered ( 1 ). Each associated locus is also likely to include multiple common, uncommon, and rare variants which independently contribute to the variance of the trait. Moreover, multiple different genes within a gene-rich locus may be contributing to trait variation. Collection of the broad swath of variants and incorporation into suitable models will undoubtedly improve our predictive abilities and explain "missing heritability." As technological development reduces the cost and increases the resolution of genetic information acquisition, efforts to obtain and biobank samples will need to focus upon improving the representation of non-European populations.
In conclusion, the current meta-analysis of 15,901 African Americans identified an association between rs868213 in EXOC3L1 and HDL-C.
The Candidate Gene Association Resource (CARe) Consortium wishes to acknowledge the support of the National Heart, Lung, and Blood Institute and the contributions of the research institutions, study investigators, field staff, and study participants. The authors thank the contributors and patients of the WHI-SHARe project, a long-term national health study. The WHI-SHARe GWAS had data cleaning and harmonization performed at the Fred Hutchinson Cancer Research Center in Seattle, WA.
association with HDL in the current study. Maintenance of the rs868213-HDL association after correction for the effect of lead LCAT SNP rs255052 further supports an independent mechanism.
Bioinformatic analysis identified that rs868213 is located within a putative consensus splice site of EXOC3L1, with variable exon skipping observed within RNA sequencing data, supporting a potential functional consequence. Extensive analysis revealed little evidence for an eQTL explaining the observed association. High-throughput genome conformation capture analysis of the LD region extending around rs868213 indicated no evidence for physical interaction between rs868213 and the LCAT locus, further supporting independence of the association from LCAT. EXOC3L1 is required for the exocytosis of insulin granules from pancreatic β cells, and is differentially methylated and expressed in patients with type 2 diabetes ( 20 ). Barkefors et al. ( 21 ) reported that the close homolog EXOC3L2 is also expressed extensively in the endothelium, and siRNA studies demonstrate that EXOC3L2 silencing inhibits VEGF receptor 2 phosphorylation and VEGFA-directed migration of cultured endothelial cells. Although EXOC3L1 and other members of the exocyst complex are unprecedented as drug targets, druggable binding pockets were located in EXOC3L1 using structure-based evaluation via the DoGSiteScorer tool ( 22 ), indicating the potential for investigational pharmacological modification of EXOC3L1 activity.
After evaluation of all data across the rs868213 locus, we found EXOC3L1 to be the most likely candidate on the basis of both functional impact and biological plausibility. EXOC3L1 is a component of the exocyst complex, an evolutionarily conserved multisubunit protein complex implicated in molecular trafficking and tethering secretory vesicles to the plasma membrane ( 23 ). The exocyst complex has an important role in polarized vesicular trafficking of transmembrane proteins, specifically including the LDL receptor ( 24 ) and the HDL scavenger receptor class B member I (SR-BI) ( 25 ). Considering all the available genetic and bioinformatic evidence, we propose that variation in EXOC3L1 function, particularly in the vascular endothelium where this gene is abundantly expressed, could lead to altered exocyst function impacting HDL homeostasis.
The current study raises three important points for interpretation of future genetic association studies in lipid traits: 1) large GWAS meta-analysis results are publicly available and should be used as a reference and for in silico replication when examining putative regions; 2) as power to detect variants of modest effect size has increased, researchers should be exhaustive in their attempts to explain differences observed between studies; and perhaps most importantly 3) due to smaller haplotype blocks and additional areas of recombination, African populations can be very helpful for identifying additional signals within large GWAS signals from Europeans.
Should we make efforts to continue to grow population datasets from additional ethnicities? Despite greater than 150 lipid loci identified in meta-analyses of Europeans, the concordance of direction of effect between cohorts and