Multiple susceptibility loci at chromosome 11q23.3 are associated with plasma triglyceride in East Asians[S]

Genetic studies of plasma TG levels have identified associations with multiple candidate loci on chromosome11q23.3, which harbors a number of genes, including BUD13, ZNF259, and APOA5-A4-C3-A1. This study aimed to examine whether these multiple candidate genes on the 11q23.3 regions exert independent effects on TG levels or whether their effects are confounded by linkage disequilibrium (LD). We performed a genome-wide association study and consequent fine-mapping analyses on TG levels in two Korean population-based cohorts: the Korea Association Resource study (n = 8,223) and the Healthy Twin study (n = 1,735). A total of 301 loci reached genome-wide significance level in pooled analysis, including 10 SNPs with weak LD (r2 < 0.06) clustered on 11q23.3: ApoA5 (rs651821, rs2075291); ZNF259 (rs964184, rs603446); BUD13 (rs11216126); Apoa4 (rs7396851); SIK3 (rs12292858); PCSK7 (rs199890178); PAFAH1B2 (rs12420127), and SIDT2 (rs2269399). When the inter-dependence between alleles was examined using conditional models, five loci on BUD13, ZNF259, and ApoA5 showed possible independent associations. A haplotype analysis using five SNPs revealed both hyper- and hypotriglyceridemic haplotypes, which are relatively common in Koreans (haplotype frequency 0.08–0.22). Our findings suggest the presence of multiple functional loci on 11q23.3, which might exert their effects on plasma TG level independently or through complex interactions between functional loci.

We used the segmented haplotype estimation and imputation tool, SHAPEIT2, which scales linearly with the number of haplotypes to effi ciently construct haplotypes ( 21 ) for determining phase, and then conducted imputation using IMPUTE2 ( 22 ). The 1000 Genomes Project's haplotypes phase I in NCBI build 37 (hg19) of Asian ancestry were used as a reference panel, and the markers with imputation quality score greater than 0.9 were used for fi nemapping studies. The imputation process resulted in 4,166,520 and 4,174,873 markers in discovery (KARE) and replication (HT) samples.

Statistical analysis
The plasma TG level was adjusted for age, sex, and BMI using a linear regression model for association tests. For identical twins, we chose one twin randomly. For familial relationship, we corrected genetic correlation arising from polygenic background by decorrelating kinship coeffi cients calculated from the genetic markers. The genome-wide signifi cant level for adjusting the type I error level was defi ned from original genotyped marker sets (352,228) to be P = 1.41E-07 after Bonferroni correction. All locations on a physical map are referred to build 37 of the human genome reference map ( hg19).

Dissecting the 11q23.3 region
Within the 11q23.3 region, we focused on chromosome 11:116,600,001-117,100,000 where previous GWAS fi ndings were localized ( 16,17,23,24 ). In order to identify the independent effects of candidate loci on TG, we used the following methods. First, we selected markers which showed genome-wide signifi cance level in the discovery set and tested whether they were replicated in validation set. Second, for those selected markers, we examined LD ( r 2 < 0.06) and chose one top hit in each separate LD block (non-LD markers). Third, we performed a series of conditional analyses by adding one non-LD marker into the linear regression model as a covariate and tested all remaining non-LD SNPs for the association with TG level. If a locus showed persistent association with TG with adjustment of other top hit SNPs, we considered it as being a reasonable candidate for independent susceptibility loci regulating TG level. Finally, we conducted a haplotype analysis to further dissect the 11q23.3 region. For the haplotype analysis, we used HAPLOVIEW (version 4.0), which uses an accelerated expectation maximization algorithm to calculate haplotype frequencies (http://www.broad.mit.edu/mpg/ haploview/). The effect of each haplotype on TG level was determined using the haplo.stats package (version 1.4.4) operated in the R language (version 2.14, available at http://www.r-project.org), which implements the expectation-maximization algorithm . In addition, the multiplicative interactions between two SNPs were tested by including both SNPs (assuming a general model) and an interaction term (product of two main effects) in a logistic regression model in PLINK ( 25 ).

RESULTS
The overall study design is provided in supplementary Fig. 1. The characteristics of the study participants are given in Table 1 . The KARE participants were older and had higher BMI than those of the HT. The mean plasma TG level was higher in the KARE population (1.80 mmol/l) than in the HT participants (1.30 mmol/l).
The enrichment of markers for the 11q23.3 region (114,600,001-121,300,000 bp as GRCh38) by imputation increased the number of SNPs or single nucleotide variants previous studies have indicated multiple independent susceptibility loci of hypertriglyceridemia ( 13,14 ), and to examine the possibility of the existence of more than one causal variant in this region .

Subjects
Two population cohorts of Koreans with genome-wide genetic marker information were involved in this study; the Korea Association Resource study (KARE), consisting of 8,842 individuals, and the Healthy Twin study (HT) with 3,079 individuals ( 19,20 ). The KARE served as discovery data for associated loci and validation was performed using the HT data.
The KARE is a prospective cohort study started in 2001 as a part of the Korean Genome Epidemiology Study in the rural Ansung and suburban Ansan areas, which recruited 10,038 healthy participants aged 40-69 years at baseline. Both base areas are located in Gyeonggi Province, close to Seoul, the capital of the Republic of Korea. Information on the health status and healthrelated behaviors of the participants was collected through a standardized questionnaire. Blood samples were drawn from an antecubital vein after more than 8 h of fasting. Serum TG level was measured using standard enzymatic method in a centralized laboratory. Details of the KARE are described in previous reports ( 13,15 ). The genome-wide genetic marker information was available for a total of 8,842 individuals. After excluding individuals who had a diagnosis of diabetes or were using lipid-lowering agents (n = 619), 8,223 individuals (3,858 males and 4,365 females, with 82 related participants) were included for analysis.
The HT of Korea is also a component of the Korean Genome Epidemiology Study, a cohort of adult same-sex twin pairs older than age 30 and their fi rst-degree family members, who have been recruited since 2005. Most protocols are shared between the KARE and the HT, and more detailed protocols are available in previous literature ( 20 ). A total of 3,079 study participants (n = 1,217 men, n = 1,862 women, from 17-81 years of age) were enrolled since 2005; 531 monozygotic twin pairs, 120 dizygotic twin pairs, and 1,777 non-twin family members in 661 families. The genotype data were available for a total of 1,857 individuals. Of these 1,857 individuals, those who reported using lipid-lowering drugs or had a diagnosis of diabetes (n = 122) were excluded, resulting in 1,735 individuals (672 males and 1,063 females) ranging in age from 17 to 81 years (mean = 44.1 years; SD = 13 years). Institutional Review Boards at Seoul National University, Seoul Samsung Hospital, and Busan Inje Baik Hospital approved the conduct of the cohort studies used in this study.

Genotyping, quality control, and imputation
Both studies performed genome-wide dense SNP marker analysis; the KARE using Affymetrix Genome-Wide Human SNP Array GeneChip version 5.0 and the HT using the Affymetrix Genome-Wide Human SNP Array GeneChip version 6.0. For the KARE, markers violating the Hardy-Weinberg equilibrium ( P < 10E-06, 38,364 markers), genotype call rates below 95% (17,926 markers), and minor allele frequency (MAF) < 0.01 (92,050 markers) were deleted, leaving 352,228 markers for the subsequent analysis. For the HT, markers with Hardy-Weinberg equilibrium <0.001, MAF <0.01, and genotype missing rate >0.05 were excluded (n = 292,653); using family relationship, markers violating Mendelian consistency in more than three families (n = 11,456) or multimarker errors resembling close double-recombination were further deleted (n = 47,594); leaving 537,159 markers.
When we performed a two-way conditional analysis for 10 SNPs, markers on the proximal 11q23.3 region ( BUD13 , ZNF259 , and ApoA5 ) did not materially affect or was affected their genome-wide signifi cance levels by mutual adjustment ; while the markers in the distal region ( SIK3 , PCSK7 , PAFAH1B2 , and SIDT2 ) showed a decrease in their signifi cance levels ( Table 2 ). The distal part loci genes showed a marked decrease in their signals below regional signifi cance level (301 markers by Bonferroni correction; P = 0.00016) on mutual adjustment . Interestingly, the adjustment for the effect of rs2075291/ ApoA5 resulted in stronger signals for rs964184/ ZNF259 (from P = 1.3E-29 in its single marker models to P = 9.6E-40 in conditional analysis). Similarly, when adjusted for the effect of rs603446/ ZNF259 , the statistical signifi cance of rs11216126/ BUD13 became stronger (from 8.9E-15 to 4.6E-32).

DISCUSSION
Our fi ndings from conditional analysis suggest that multiple variants in the 11q23.3 genomic region may be independently associated with TG, and one more locus in the distal region might be another candidate . When we fi xed each top hit marker in the series conditional models, most loci showed a decrease in strength of association, but some loci showed even stronger association, particularly in the models between loci in the proximal 11q23.3 region. This paradoxical increase in the signifi cance level from subsequent haplotype analysis corresponds with the negative interaction effects in gene-gene interaction analysis. The haplotype analysis in the proximal region shows two haplotypes that increase TG level. These two hypertriglyceridemic haplotypes, ACCAC and ACGCC , are characterized by the presence of allele C of rs651821/ ApoA5 , but have different changes in ZNF259 (rs964184 C>G) or ApoA5 (rs2075291 C>A). Conventional genetic analysis based on single SNPs might underestimate or even mislead an association when multiple functional variants are present on a haplotype including the locus, particularly when the directions of the functional loci confl ict ( 27 ). The substitutions of allele C>G at the rs603446/ ZNF259 and A>C at the rs11216126/ BUD13 are expected to lower TG based on our fi ndings. Despite weaker LD between the SNPs ( r 2 < 0.06), alleles with different directions cosegregate in the haplotype which might dilute the effect of each SNP; hypotriglyceridemic allele G at rs603446/ ZNF259 cosegregates with hypertriglyceridemic allele A of rs11216126/ BUD13 ; likewise, the TG lowering allele C at rs1121626/ BUD13 coexists with the C allele of rs603446/ ZNF259 which increases TG . Thus, previous reports on these loci might represent diluted effects. We believe that the functional variants captured by the haplotype might have different mutational origin, if we consider the most common ACCCT haplotype as a wild-type. Additionally, two haplotypes were found to be protective of hypertriglyceridemia (CCCCT and ATCCT), and these protective haplotypes might have resulted from two diverging changes from the wild-type haplotypes. In a similar vein, this fi nding also suggests that the variants captured by these protective haplotypes might be independent. The presence of both harmful and protective haplotypes in the 11q23.3 region is supported by the results of interaction analysis. Findings from interaction analyses suggest that two variants, both tagged by China were reported to develop severe hypertriglyceridemia with mean TG levels of 25.89 ± 5.05 mmol/l ( 29 ). The rare allele at SNP site rs964184 ( ZNF259 ) has been reported to be associated with TG in both European and Asian populations ( 8,11,16 ). The MAF of rs964184 in the Korean population is 0.218, while it is less than 1% in Caucasians (supplementary Table 4).
Many of the top-hit SNPs in our study have been reported in previous studies on TG genetics. Two GWASs conducted in Korea, the KARE project and the Health Examinee (HEXA), both showed that the rs603446/ ZNF259 is strongly associated with TG ( 13 ). SNP rs7396851 ( Apoa4) was also associated with plasma TG in an isolated Pacifi c Islander Kosrae population ( 30 ).
LD structure comparisons across populations show that LD blocks in the 11q23.3 region in Koreans and Asians are stronger and larger than other populations (supplementary Fig. 3). The strong LD structure of Koreans confers advantage in reliably constructing haplotypes which tag functional loci. In Europeans, who have weaker LD across the 11q23.3 region, multiple independent associations with TG level are not shown . If rare variants infl uence the TG level near the ApoA5 gene region, poor LD structure in the European ancestry population may not allow SNP markers to tag the association (supplementary Fig. 3). Here, we show that haplotypes "ACCAC" and "ACGCC" were associated with increased TG in our Korean study samples ( Table 3 ). However, these haplotypes are not reconstructed in the 1000 Genomes Pilot I CEU population because the genotyping on rs2075291 was not available.
We reviewed public databases to see whether the markers on the 11q23.3 region have been reported to modify the gene expression level, throughout literature and geneexpression quantitative loci (eQTL) databases. The recent report of Yao et al. ( 31 ) suggested cis -regulation between rs964184 ( ZNF259 ) and PCSK7 , SIDT2 , TAGLN , and BUD13 , and trans -regulation of ZNF259 (rs964184) with TMEM165 , YPEL5 , PPM1B , and OBFC2A , suggesting that the effect of rs964184 on TG is mediated through PCSK7 . Further, this is in line with an analysis of our previous eQTL study in 10 metabolic traits conducted in Korean populations; rs651821 was defi ned as the eQTL-SNP hot spot of TAGLN with signifi cant eQTL excess ( 32 ). It was also reported that rs603446 ( ZNF259 ) has cis -regulatory effects over the expression of adiposity-associated genes in NCBI's eQTL database ( 26 ).
ApoA5 (rs651821 and rs2075291) gene, have strong synergistic effects to increase TG level. Additionally, interaction analysis indicates that the joint effect between ApoA5 (rs2075291) and ZNF259 (rs964184), albeit less signifi cant, plays a role in increasing the TG level. A negative interaction was also found between ApoA5 and BUD13 , which is consistent with the results from conditional models (paradoxical increase in signifi cance levels) or haplotype analysis (both protective and harmful haplotypes). In summary, we believe that the fi ndings from different analyses converge on the presence of multiple functional loci on the 11q23.3 region. Our fi ndings also suggest the proximal part might have a more complex network, with both agonistic and antagonistic regulations, than the distal part.
We did not include SNPs in the distal region of 11q23.3 in haplotype or interaction analysis, but our fi ndings of conditional analysis suggest that another variant in the distal region ( SIK3 , PCSK7 , PAFAH1B2 , and SIDT2 ) affects plasma TG as well. Thus, all four loci in the distal 11q23.3 region show very similar patterns of changes in the conditional analysis; associations with TG are not substantially affected, or even become stronger, by adjusting proximal region variants, while the associations decrease dramatically upon considering another locus within the distal part of 11q23.3. Thus, we consider that the possibility of another candidate variant on the distal region ( SIK3-PCSK7-PAFAH1B2-SIDT2 gene complex) is not suffi cient, at least by this study alone.
It is beyond the scope of our study to pinpoint the functional variants or to fi gure out exactly how many genes are functioning in TG regulation. It would be, however, a logical inference from our fi ndings that at least one harmful and one protective variant in the proximal region are participating in TG regulation. ApoA5 and ZNF259 are the most probable genes hosting these variants. Studies involving people of different ancestries indicate that variants on ApoA5 and ZNF259 (rs651821, rs2075291, and rs964184) are associated with plasma TG levels ( 6,16,24,28 ). Particularly, rs2075291 ( ApoA5 ) is a nonsynonymous mutation in the coding region (Gly185Cys) mainly found in Asian populations. This "Asian-specifi c" phenomenon is in part caused by the difference in MAF across populations (only 1% MAF in Caucasians and 7% in Koreans) (supplementary Table 4). Individuals homozygous for rs2075291 (reported as TT genotype, in fact AA as forward strand ) in The "haplo.glm" function implemented in the "haplo.stats" R package was used to calculate the coeffi cient ␤ and P values for each haplotype compared with the reference haplotype, which was set as the ACCCT haplotype. The same covariates used for genotype analysis were applied in haplotype analysis. Haplotype analysis was performed in 8,223 unrelated individuals in the KARE. Alleles in haplotypes were presented in order of polymorphisms rs11216126/ BUD13 , rs964184/ ZNF259 , rs603446/ ZNF259 , rs2075291/ ApoA5 , and rs651821/ ApoA5 . CI, confi dence interval . Changes of nucleotides are underlined.