Journal of Lipid Research, Vol. 46, 1416-1425, July 2005
Genome-wide linkage analyses and candidate gene fine mapping for HDL3 cholesterol: the Framingham Study
* Department of Biostatistics, Boston University, Boston, MA 02118
Published, JLR Papers in Press, April 1, 2005. DOI 10.1194/jlr.M400382-JLR200
1 To whom correspondence should be addressed. e-mail: qyang{at}bu.edu
High density lipoprotein cholesterol (HDL-C) is inversely associated with coronary heart disease and has a genetic component; however, linkage to HDL-C is not conclusive. Subfractions of HDL, such as HDL3-C, may be better phenotypes for linkage studies. Using HDL3-C levels measured around 1987 on 907 Framingham Heart Study subjects from 330 families, we conducted a genome-wide variance components linkage analysis with 401 microsatellite markers spaced 10 centimorgans (cM) apart. In the region of the highest linkage peak, nine candidate genes were identified and annotated using a bioinformatics approach. Twenty-eight single nucleotide polymorphisms (SNPs) were selected from these candidate genes, and linkage and family-based association fine mapping were conducted using these SNPs. The highest multipoint log-of-the-odds (LOD) score from the initial linkage analysis was 3.7 at 133 cM on chromosome 6. Linkage analyses with the additional SNPs yielded a highest LOD score of 4.0 at 129 cM on chromosome 6. Family-based association analysis revealed that SNP rs2257104 in PLAGL1 at 143 cM was associated with multivariable-adjusted HDL3-C (P = 0.03). Further study of the linkage region and exploration of other variants in PLAGL1 are warranted to define potential functional variants affecting HDL-C metabolism.
Supplementary key words: high density lipoprotein, high density lipoprotein 3, microsatellite marker, single nucleotide polymorphism, PLAGL1
The inverse association between high density lipoprotein cholesterol (HDL-C) concentrations and risk of coronary heart disease was first observed in the 1960s and 1970s and is now well established (1). Several studies have provided significant evidence of strong familial aggregation of HDL-C levels, but results have been inconsistent on whether this aggregation is attributable to a gene of major effect (2-6). HDL is a heterogeneous mixture of particles of different sizes, densities, and compositions (7, 8). Differential chemical precipitation separates HDL particles into two major subfractions: HDL2 and HDL3. As thoroughly reviewed by Wilson (9), a number of studies have found that lower concentrations of HDL2-C and HDL3-C are significantly associated with increased coronary heart disease risk, suggesting an important role for both subfractions in coronary heart disease. However, findings have been inconsistent across populations. To better understand the genetic influences on HDL levels, some studies have examined the more metabolically homogeneous subfractions. A segregation study of families of patients who underwent coronary arteriography found that a major gene explains 34% of the total variation in HDL3-C and 9% of that in HDL-C, whereas the variation in HDL2-C is likely mostly the result of environmental factors (10). Using gradient gel electrophoresis to further differentiate three subclasses within HDL3 (HDL3a, HDL3b, and HDL3c) and two within HDL2 (HDL2a and HDL2b), a genome-wide linkage study of the five subclasses (11) found significant linkage [log-of-the-odds (LOD) > 3] to HDL2a and suggestive linkage (LOD > 2) to HDL3a and HDL2b. These studies suggest that the subfractions may provide better phenotypes than HDL-C levels for genetic studies.
Other investigators (12-18) have examined associations of HDL and HDL subfractions with candidate genes such as hepatic lipase, LPL, cholesteryl ester transfer protein (CETP), LCAT, and apolipoproteins A-I, A-II, A-IV, B, C-III, and E. As with the segregation studies, there is considerable heterogeneity in the significance of these associations across populations. To identify chromosomal regions likely to contain quantitative trait loci (QTLs) affecting HDL3-C concentration, we performed a genome-wide linkage study using 401 microsatellite markers and 330 extended families from the Framingham Heart Study. Because this analysis provided significant evidence of linkage to a region on chromosome 6, we identified a number of candidate genes within the region surrounding the highest LOD score. SNPs in these candidate genes were then selected and genotyped to further map variants that may affect HDL3-C concentration.
Subjects
The study subjects are members of the 330 largest extended families in the Framingham Heart Study. The selection criteria and study design of the Framingham Heart Study have been described in detail previously (19, 20). In brief, the study began in 1948 with the enrollment of 5,209 men and women from Framingham, Massachusetts, referred to as the original cohort, who have undergone biennial examinations. Starting in 1971, 5,124 individuals, the adult children of the original cohort and their spouses, were recruited; they are referred to as the offspring cohort. Members of the offspring cohort were examined every 4 years (except for an 8 year gap between the first and second examinations). Subjects in the Framingham Heart Study were ascertained without regard to any trait values. In the mid to late 1990s, 1,702 subjects from the 330 largest extended pedigrees were genotyped for a set of 401 microsatellite markers spaced 10 centimorgans (cM) apart. The analysis reported here was conducted on a subset of 907 offspring subjects who had HDL3-C levels measured at examination cycle 4 (around 1987) and genotype data for the CETP polymorphism. The sample included 729 sibling pairs, 37 half-sibling pairs, 41 avuncular pairs, and 489 first cousin pairs. CETP was genotyped in study participants who attended examination 4 or 5 between 1987 and 1995 and for whom DNA was available. Participants may have missed both examinations because of death, serious illness, moving out of Massachusetts, or other unknown reasons. Of the 1,089 subjects who attended examination 4 or 5, had DNA, and were members of the families included in the genome scan, 282 (23%) were excluded because their CETP genotypes were not available. All subjects provided informed consent before each clinic visit and were examined under the standard protocol for the Framingham Offspring Study approved by the Institutional Review Board at Boston Medical Center (Boston, MA).
Genotyping of microsatellite markers
Measurement of lipids
Definition of traditional risk factors
Heritability and linkage analyses
To reduce the variability caused by known risk factors, we calculated standardized residuals from sex-specific multiple linear regression models adjusted for the traditional risk factors. The regressions were run separately for men and women to allow the covariate associations to differ by sex and to adjust additionally for oral contraceptive use, estrogen therapy, and menopausal status in women. To avoid inflated false-positive rates attributable to nonnormality, the ranked standardized residuals were then transformed into normal deviate scores, which were used in the heritability and linkage analyses. These analyses were conducted with MERLIN (23), which implements a variance components method. The evidence for linkage was measured by the LOD score: the base-10 logarithm of the ratio of the likelihood of a model that includes a QTL effect at the marker locus under evaluation to the likelihood of a model without that effect.
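The residual-to-normal-score transformation described above can be sketched in Python using only the standard library. This is a minimal sketch, assuming the sex-specific regression residuals have already been computed; the function names are hypothetical, ties receive average ranks, and the rank offset `c` is an assumption because the exact normal-scores formula used in the study is not stated.

```python
from statistics import NormalDist, mean, stdev


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values that starts at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def normal_scores(residuals, sexes, c=0.5):
    """Standardize residuals within each sex, then map pooled ranks to normal deviates.

    Uses Phi^-1((rank - c) / (n - 2c + 1)); c = 0.5 gives (rank - 0.5) / n and
    c = 3/8 gives Blom's offset. The choice of offset is an assumption.
    """
    n = len(residuals)
    z = [0.0] * n
    for sex in set(sexes):
        idx = [i for i, s in enumerate(sexes) if s == sex]
        vals = [residuals[i] for i in idx]
        m, sd = mean(vals), stdev(vals)  # requires at least two subjects per sex
        for i in idx:
            z[i] = (residuals[i] - m) / sd
    nd = NormalDist()
    return [nd.inv_cdf((r - c) / (n - 2 * c + 1)) for r in average_ranks(z)]
```

Standardizing within sex first puts men's and women's residuals on a common scale before they are pooled and ranked; the rank step then removes any remaining skewness or outliers, which is the stated purpose of the transformation.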
Bioinformatic identification of candidate genes in the linkage region
Protein sequences of the encoded genes were searched with BLASTP against a "nonredundant" protein set and with TBLASTN against the mouse genome as described above. To characterize their functions further, the protein sequences were also analyzed for functional domains [Pfam release 8.0 (25), http://pfam.wustl.edu; InterPro release 5.3 (26), http://dip.doe-mbi.ucla.edu/], including putative transmembrane domains [TM-HMM version 2.0 (27), http://www.cbs.dtu.dk/services/TMHMM/]; for putative subcellular localization [TargetP version 1.01 (28), http://www.cbs.dtu.dk/services/TargetP/]; for potential N-linked glycosylation sites (NetNglyc version 1.0; R. Gupta, E. Jung, and S. Brunak, unpublished data, 2002; www.cbs.dtu.dk/services/NetNGlyc/); and for the presence and location of putative signal peptide cleavage sites [SignalP version 2.0.b2 (29), www.cbs.dtu.dk/services/SignalP/]. Together, these analyses were intended to clarify the functions of these proteins, especially as they relate to lipid biochemistry.
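NetNGlyc, cited above, predicts N-linked glycosylation sites by scoring the sequence context of Asn-Xaa-Ser/Thr sequons with neural networks. As a much cruder illustration of the motif it starts from, the hypothetical helper below only scans a protein sequence for candidate sequons (with Xaa not proline). It is a swapped-in regular-expression sketch, not NetNGlyc's method, and it makes no prediction about whether a site is actually glycosylated.

```python
import re

# N-X-S/T sequon, where X is any residue except proline. The zero-width
# lookahead lets finditer report overlapping sequons (for example "NNST").
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein):
    """Return 1-based positions of Asn residues that begin an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]
```

For example, in the made-up sequence "MNASNPTNGT" the asparagines at positions 2 and 8 start sequons, while the one at position 5 does not because it is followed by proline.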
SNP selection and genotyping
Determination of the genetic map for the SNPs and microsatellite markers
Genetic locations (in centimorgans) of the SNPs were interpolated from the existing Marshfield genetic map for microsatellite markers (31) and then merged into the Marshfield map to form a new map for linkage analysis. The interpolation proceeded as follows. First, the ratio of genetic distance (cM) to physical distance was calculated between each pair of adjacent microsatellite markers. Second, for each SNP lying between two adjacent markers, the genetic distance between the SNP and one of the flanking markers was taken as the physical distance between them multiplied by that ratio.
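The two-step interpolation can be written as a short function. This sketch assumes a marker map given as (base-pair position, cM position) pairs sorted by distinct physical positions; the function name and the example positions are hypothetical and are not taken from the Marshfield map.

```python
from bisect import bisect_right


def place_snp(snp_bp, marker_map):
    """Interpolate a SNP's genetic position (cM) from its flanking markers.

    marker_map: list of (physical position in bp, genetic position in cM)
    tuples sorted by physical position, with at least two markers.
    """
    positions = [bp for bp, _ in marker_map]
    if len(positions) < 2 or not positions[0] <= snp_bp <= positions[-1]:
        raise ValueError("SNP must lie between two mapped markers")
    # Index of the right-hand flanking marker, clamped so that a SNP sitting
    # exactly on the last marker still uses the final interval.
    k = min(bisect_right(positions, snp_bp), len(positions) - 1)
    (bp_left, cm_left), (bp_right, cm_right) = marker_map[k - 1], marker_map[k]
    cm_per_bp = (cm_right - cm_left) / (bp_right - bp_left)  # step 1: local ratio
    return cm_left + (snp_bp - bp_left) * cm_per_bp          # step 2: scale distance
```

For example, with hypothetical markers at 1.0 Mb (100 cM) and 2.0 Mb (110 cM), a SNP at 1.25 Mb is placed at 102.5 cM. Measuring from the left or the right flanking marker gives the same position, because the ratio is constant within each interval.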
Linkage disequilibrium structure at the candidate genes
For each pair of SNPs within the same candidate gene or located less than 500 kb apart, the strength of linkage disequilibrium (LD) was measured as D' (32), calculated using Haploview (http://www.broad.mit.edu/personal/jcbarret/haploview/).
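For two biallelic SNPs, D' is the disequilibrium coefficient D scaled by its maximum attainable value given the allele frequencies. The sketch below assumes the four haplotype frequencies (or counts) are already known; Haploview estimates them from unphased genotypes, a step not reproduced here. The function name is hypothetical.

```python
def d_prime(p_AB, p_Ab, p_aB, p_ab):
    """Absolute D' for two biallelic SNPs from haplotype frequencies or counts."""
    total = p_AB + p_Ab + p_aB + p_ab
    if total <= 0:
        raise ValueError("haplotype frequencies must sum to a positive value")
    # Normalize so that counts and frequencies are handled alike.
    p_AB, p_Ab, p_aB, p_ab = (x / total for x in (p_AB, p_Ab, p_aB, p_ab))
    p_A = p_AB + p_Ab  # frequency of allele A at the first SNP
    p_B = p_AB + p_aB  # frequency of allele B at the second SNP
    d = p_AB - p_A * p_B
    # D_max depends on the sign of D and on the allele frequencies.
    if d > 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    if d_max == 0:
        return 0.0  # a monomorphic SNP leaves D' undefined; report 0.0 here
    return abs(d) / d_max
```

With this scaling, D' = 1 means at most three of the four possible haplotypes are present, which is why values above 0.8 are read as strong LD.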
Family-based association analyses with SNPs in the candidate genes
In haplotype analyses using FBAT (36), multiple-marker haplotypes instead of single-marker alleles were used in the score statistic. For markers with ambiguous phase, the test statistic was a weighted average of the score statistics over all compatible phase configurations. Two types of hypothesis tests were performed: a haplotype-specific test comparing one haplotype against all other haplotypes combined, and a global test for a difference among all of the haplotypes, which generalizes the score statistic to a chi-square statistic. Because it is a single test, the global test is protected against the false positives that can arise from multiple testing with the haplotype-specific tests.
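The haplotype tests generalize FBAT's single-marker score statistic, Z = sum of T(X - E[X | parents]) divided by the square root of the sum of T^2 Var(X | parents), where X is an offspring's allele count and T its offset trait value. A minimal sketch under an additive coding follows, assuming both parents are genotyped and offspring are treated as independent given their parents (appropriate under the null hypothesis of no linkage). FBAT's handling of missing parents, its empirical variance option, and the haplotype extensions are not implemented, and all names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Offspring:
    genotype: int  # copies of the tested allele carried (0, 1, or 2)
    trait: float   # trait value minus an offset, e.g. the sample mean


def transmission_moments(father, mother):
    """Mendelian mean and variance of a child's allele count given parental genotypes."""
    p1, p2 = father / 2, mother / 2  # chance each parent transmits the tested allele
    return p1 + p2, p1 * (1 - p1) + p2 * (1 - p2)


def fbat_z(families):
    """Additive single-marker FBAT-style Z statistic.

    families: iterable of (father_genotype, mother_genotype, [Offspring, ...]).
    Offspring within a family are treated as independent given their parents.
    """
    numerator = 0.0  # S - E(S): sum of T * (X - E[X | parents])
    variance = 0.0   # Var(S): sum of T^2 * Var(X | parents)
    for father, mother, children in families:
        mean_x, var_x = transmission_moments(father, mother)
        for child in children:
            numerator += child.trait * (child.genotype - mean_x)
            variance += child.trait ** 2 * var_x
    if variance == 0:
        raise ValueError("no informative (heterozygous-parent) families")
    return numerator / math.sqrt(variance)
```

Families in which both parents are homozygous add nothing to the numerator or the variance, which is why family-based tests draw their information from heterozygous parents and are robust to population stratification.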
Characteristics of the study subjects
Means and standard deviations for relevant anthropometric, behavioral, and biochemical risk factors are presented in Table 2. The mean HDL3-C concentrations were 38.5 and 46.2 mg/dl in men and women, respectively. Men had higher values for BMI, percentage of cigarette smokers, alcohol consumption, physical activity index, percentage on anticholesterol treatment, and percentage using β-blockers.
Heritability and genome-wide linkage analysis with microsatellite markers
The heritability estimate of fully adjusted HDL3-C concentration was 0.43 (standard error 0.08). Maximum multipoint LOD scores greater than 1 from the genome-wide linkage analysis with microsatellite markers are presented in Table 3. The highest multipoint LOD score was 3.7 at 133 cM on chromosome 6 (Fig. 1), nearest to marker GATA23F08. The next highest LOD scores were on chromosome 3 (LOD = 1.8 at 5 cM) and chromosome 8 (LOD = 1.6 at 8 cM).
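A LOD score can be placed on a pointwise p-value scale. Assuming the usual asymptotic null for testing a single variance component on the boundary of its parameter space, a 50:50 mixture of chi-square distributions with 0 and 1 degree of freedom, the likelihood-ratio statistic equals 2 ln(10) times the LOD. The sketch below (hypothetical function names) shows both conversions; it gives pointwise values only and does not account for the genome-wide scan.

```python
import math


def lod_from_loglik(loglik_qtl, loglik_null):
    """LOD = log10 of the likelihood ratio, from natural-log likelihoods."""
    return (loglik_qtl - loglik_null) / math.log(10)


def lod_to_pvalue(lod):
    """Pointwise p-value assuming a 50:50 mixture of chi-square(0) and chi-square(1)."""
    if lod <= 0:
        return 1.0
    lrt = 2 * math.log(10) * lod  # likelihood-ratio chi-square statistic
    # Half the upper tail of chi-square with 1 df, where P(X > x) = erfc(sqrt(x / 2)).
    return 0.5 * math.erfc(math.sqrt(lrt / 2))
```

Under these assumptions a LOD of 3 corresponds to a pointwise p-value of about 0.0001, and larger LOD scores such as 3.7 map to smaller p-values.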
We also conducted linkage analyses for HDL2-C and HDL-C. Linkage signals for HDL-C were found at the same location as for HDL3-C, but no LOD score exceeded 2. For fully adjusted HDL2-C, there was only suggestive evidence of linkage, on chromosome 11 (LOD = 2.2 at 91 cM). We therefore did not proceed with fine mapping for these two traits.
Identification of candidate genes in the linkage region
Determination of the genetic map for the SNPs and microsatellite markers
The physical and genetic locations of both the SNPs and the existing microsatellite markers in the 2-LOD region are presented in Table 4. In Fig. 2, we plotted the multipoint LOD scores and the FBAT P values from association analyses against both physical and genetic locations on the same figure. Locations of the SNPs and microsatellite markers are also marked on the figure.
LD structure at the candidate genes
Twenty-eight SNPs in these candidate genes were selected and genotyped for further analyses (Table 4). All of the SNPs were in, or close to, Hardy-Weinberg equilibrium (HWE; P > 0.02). The LD structure of SNPs in the nine candidate genes is displayed in Fig. 3.
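A Hardy-Weinberg check for a biallelic SNP compares observed genotype counts with those expected from the estimated allele frequency. The sketch below uses the chi-square approximation with 1 degree of freedom; the source does not state which HWE test was applied, and exact tests are often preferred when an allele is rare. The function name is hypothetical.

```python
import math


def hwe_chisq(n_AA, n_Aa, n_aa):
    """Chi-square (1 df) test of Hardy-Weinberg equilibrium for a biallelic SNP."""
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("no genotypes")
    p = (2 * n_AA + n_Aa) / (2 * n)  # estimated frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

For hypothetical counts of 30, 40, and 30, the expected counts are 25, 50, and 25, giving a chi-square of 4.0 and P of about 0.046: a heterozygote deficit of this size would fall below the P > 0.02 range reported above only if it were somewhat larger.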
The LD between SNPs within the same candidate gene was generally strong (D' > 0.8) except for CTGF and AIG-1. For VNN1, VNN3, and VNN2, which were located within 82 kb of each other, strong LD existed only between SNPs in VNN3 and VNN2.
Fine mapping added 28 SNPs in the candidate genes to the existing map
The multipoint LOD scores on chromosome 6 after incorporating the SNPs into the existing map are presented in Fig. 1. The highest LOD score on chromosome 6 was 4.0 at 129 cM, at microsatellite marker GATA23F08. The 2-LOD support interval spans 124 to 144 cM and covers seven of the nine candidate genes: NCOA7, CTGF, VNN1, VNN3, VNN2, AIG-1, and PLAGL1. In family-based association analyses, significant association (P < 0.05) was found with two SNPs, rs15960 (CTGF) and rs2257104 (PLAGL1). In crude analyses without adjustment for traditional risk factors, the A allele of rs15960 was associated with lower HDL3-C levels (Z = 2.95, P = 0.0032); this association was not significant after adjustment for traditional risk factors (Z = 1.3, P = 0.2). In the fully adjusted model, the A allele of rs2257104 showed an additive association with lower HDL3-C levels (Z = 2.2, P = 0.026).
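The P values reported with the FBAT Z statistics correspond to two-sided tests against a standard normal reference. A minimal helper (hypothetical name) for that conversion:

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a statistic that is standard normal under the null."""
    return math.erfc(abs(z) / math.sqrt(2))
```

For the crude rs15960 result, Z = 2.95 gives P of about 0.0032, and the adjusted Z = 1.3 gives P of about 0.19, reported as 0.2.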
However, rs2257104 explained little of the linkage signal: the proportion of total phenotypic variation explained by rs2257104 was 0.007. After adjusting for rs2257104, the difference in the multivariate LOD score at the SNP location is
In haplotype analyses of SNPs within each candidate gene, significant global P values (P < 0.05) were found only for SNPs in PLAGL1 with fully adjusted HDL3-C (Table 5), assuming a dominant effect for each haplotype. Six haplotypes had frequencies > 0.01. Haplotype h6, carrying the C, A, C, and T alleles at rs1884087, rs2257104, rs2076684, and rs2064495, respectively, had a frequency of 0.02 and was associated with lower HDL3-C levels (haplotype-specific P = 0.02). A borderline global P value (P = 0.07) was found for haplotypes in CTGF containing rs15960 and rs928501, again only in the crude model.
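The small variance fraction attributed to rs2257104 is consistent with a standard result: for an additive biallelic locus in Hardy-Weinberg equilibrium with allele frequency p and per-allele effect beta, the variance contributed is 2p(1 - p)beta^2. The sketch below assumes that model; the allele frequency and effect size of rs2257104 are not given here, so the inputs in the example are hypothetical.

```python
def variance_explained(freq, beta, trait_var=1.0):
    """Share of trait variance from an additive biallelic locus in HWE: 2p(1-p)beta^2 / Var."""
    if not 0.0 <= freq <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    return 2 * freq * (1 - freq) * beta ** 2 / trait_var
```

For instance, a hypothetical allele with frequency 0.3 and a per-allele effect of 0.1 trait standard deviations would account for 2 × 0.3 × 0.7 × 0.01 = 0.0042 of the variance, far less than the total heritability, so a single associated SNP can leave most of a linkage signal unexplained.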
We found significant evidence of linkage for HDL3-C concentration at 133 cM on chromosome 6 using a 10 cM microsatellite marker map. This result was further supported by fine mapping with SNPs in candidate genes located within the QTL region. Only two previously published studies have found evidence of linkage to HDL-C or its subclasses on chromosome 6. Coon et al. (37) reported a maximum LOD score of 1.82 at 107 cM on chromosome 6 for HDL-C. Canizales-Quinteros et al. (38) recently identified a region between 73 and 80 cM on chromosome 6 linked to increased HDL-C levels. These two regions lie approximately 30 and 60 cM, respectively, from our HDL3-C linkage peak. The region of chromosome 6 with our highest LOD score is close to the region where a LOD score of 4.64 at 144 cM was found for BMI measured at examination cycle 1 in the Framingham Heart Study (39). Our results were based on HDL3-C measured at examination cycle 4, 16 years after examination cycle 1. For BMI measured at examination cycle 4, the highest LOD score on chromosome 6 was also in the same region, but its value was only 1.43. Because BMI was a covariate in our analysis, our linkage result on chromosome 6 is unlikely to be confounded by BMI. Linkage findings for HDL-C and its subfractions have been highly heterogeneous. Significant linkage (LOD > 3) has been reported on chromosomes 8 and 15 in 477 Mexican Americans of the San Antonio Family Heart Study (11), on chromosome 5 in 1,027 Caucasians of the Family Heart Study (6), on chromosome 12 in 534 sibling pairs of the Quebec Family Study, on chromosome 10q11 in 1,109 individuals from 92 families with low HDL-C or hyperlipidemia, and on chromosome 6 between 73 and 80 cM for increased HDL-C levels (38). The lack of replication across these studies is difficult to explain.
Multiple factors, including phenotype definition, ethnicity, gene-environment interaction, power, and false positives, could contribute to the lack of replication of significant results. Although there is no simple solution to all of these problems, fine mapping within linkage regions may prove an effective way to discover functional variants as the cost of genotyping declines. The linkage data for HDL3-C were used to select a region on chromosome 6 from 6q22.33 to 6q24.3, and a bioinformatics analysis of this region yielded nine candidate genes. Each of these genes encodes a protein that may play a role in lipid homeostasis through physical interaction with lipid moieties, regulation of lipid biosynthesis or catabolism, steroid hormone-dependent regulation of gene expression, or association with manifestations of cardiovascular disease. Including the 28 SNPs in the nine candidate genes in the linkage analyses increased the linkage information content of the marker data and thus provided a stronger basis for evaluating linkage. Furthermore, the increase in the LOD score after incorporating the SNPs provides additional evidence that at least one QTL exists in this region.
Family-based association analyses revealed that two SNPs in two of the candidate genes were associated with HDL3-C. However, only rs2257104 in PLAGL1 remained significant after adjustment for all of the traditional risk factors. PLAGL1, or pleiomorphic adenoma gene-like 1, encodes a putative coactivator of hormone-dependent nuclear receptors with several C2H2-type zinc finger domains. SNP rs2257104 is located at
We noted that the A allele of rs2257104 resided on all three haplotypes (h2, h5, and h6) with negative Z statistics (Table 5), which is consistent with the single-SNP association result for this marker. However, only haplotype h6 was statistically significant compared with all others combined. None of the other SNPs in PLAGL1 had an allele that resided only on haplotypes with positive Z statistics or only on haplotypes with negative Z statistics. Taken together, these results suggest that the significance of the global haplotype test is most likely attributable to rs2257104.
The estrogen receptor initiates transcription of select genes after ligand binding and translocation of the receptor-ligand complex to the nucleus (41). Of importance to lipid physiology, estrogen, acting via the estrogen receptor and its coactivators, has been shown to promote transcription of the ApoE (42) and ApoA1 (43-45) genes. However, for the following reasons, we do not believe that the gene encoding the estrogen receptor (ESR1) is directly responsible for the genetic determinant of HDL3-C levels: ESR1 maps to 154.8 cM (6q25.1) or
The study subjects were restricted to those with the CETP polymorphism genotyped, because we were also interested in testing whether known lipid candidate genes, such as CETP and hepatic lipase, could explain the linkage peaks. Adjusting for CETP or hepatic lipase did not substantially change the linkage signals on chromosome 6, suggesting no evidence for interaction between QTLs on chromosome 6 and these two known lipid candidate genes. However, when we used all subjects with HDL3-C measurements, regardless of whether they had been genotyped for CETP, the maximum LOD score on chromosome 6 decreased to 1.8 in the analyses with the microsatellite markers. Subjects with CETP genotypes were younger on average (48 vs. 50 years) than subjects without, which may have produced a more homogeneous group and thus stronger linkage signals. Otherwise, there were no significant differences between subjects with and without CETP genotypes.
Regulation of HDL metabolism, lipid homeostasis, and the determinants of cardiovascular health is complex and undoubtedly involves the functions and interactions of many different genes. The long arm of chromosome 6 carries several genes that were not considered in this study but whose roles in regulating cardiovascular health via lipid levels make them candidates for affecting lipid metabolism. The LPA gene cluster, including APOARGC (apolipoprotein A-related gene C), LPA [lipoprotein Lp(a)], and the possible pseudogenes LPAL1 and LPAL2, encodes genes in which certain variants are known risk factors for cardiovascular disease (46) but that are not related to the HDL3-C trait presented in this study. In addition, this cluster maps to 6q26-q27,
A number of previous genome-wide linkage analyses of cardiovascular, anthropometric, and lipid phenotypes were based on the same genome scan used in this study. The phenotypes studied included triglycerides (47), BMI (39), waist circumference (48), and blood pressure (49, 50). Our finding of significant linkage on chromosome 6 may therefore be subject to an inflated false-positive rate as a result of multiple testing. Because these anthropometric and lipid phenotypes may not be independent, however, a Bonferroni correction would be too conservative and is thus not suitable here. Because the LOD score increased with fine mapping, the likelihood that our finding is a false positive is reduced (51). Furthermore, the chromosome 6 location of our peak LOD score is close to that of the peak LOD score for BMI (39), which may lie in the lipid metabolism pathway. Considering all of these factors, we believe that our results, although not completely conclusive, can serve as a valuable reference for further research on HDL3 and lipid metabolism in general.
In summary, a genome-wide scan of 401 microsatellite markers in 330 extended families of the Framingham Heart Study revealed promising, statistically significant linkage with HDL3-C concentrations on chromosome 6 (multipoint LOD = 3.7 at 133 cM). None of the classical genes associated with HDL metabolism is located within this region. Bioinformatic analyses of the region between 125 and 150 cM identified nine candidate genes, in which we genotyped 28 SNPs. Incorporating these SNPs into the existing marker map increased the linkage information of the marker data in that region and yielded a multipoint LOD score of 4.0 at 129 cM. Family-based association analyses revealed that SNP rs2257104 in PLAGL1 was associated with fully adjusted HDL3-C. Further study of variants in PLAGL1 and increased SNP density in the linkage region are warranted to define the potential functional variants more clearly.
The authors thank all of the individuals who participated in the Framingham Heart Study. This work was supported in part by Contracts N01-HC-25195 and 1-38038 and Grants HL-54776, P50-HL-63494, and RO1-HL-65230 from the National Heart, Lung, and Blood Institute and by Contracts 53-K06-5-10 and 58-1950-9-001 from the U. S. Department of Agriculture Research Service.
Submitted on October 4, 2004