Journal of Lipid Research, Vol. 46, 1416-1425, July 2005
Copyright © 2005 by American Society for Biochemistry and Molecular Biology








* Department of Biostatistics, Boston University, Boston, MA 02118
Department of Endocrinology, Nutrition, and Diabetes, Boston University, Boston, MA 02118

Department of Mathematics and Statistics, Boston University, Boston, MA 02118
Nutrition and Genomics Laboratory, Jean Mayer-United States Department of Agriculture Human Nutrition Research Center on Aging, Tufts University, Boston, MA 02111
** Center for Cancer Research, Massachusetts Institute of Technology, Cambridge, MA 02139
Published, JLR Papers in Press, April 1, 2005. DOI 10.1194/jlr.M400382-JLR200
1 To whom correspondence should be addressed. e-mail: qyang{at}bu.edu
ABSTRACT
10 centimorgan (cM) apart. Nine candidate genes were identified and annotated using a bioinformatics approach in the region of the highest linkage peak. Twenty-eight single nucleotide polymorphisms (SNPs) were selected from these candidate genes, and linkage and family-based association fine mapping were conducted using these SNPs. The highest multipoint log-of-the-odds (LOD) score from the initial linkage analysis was 3.7 at 133 cM on chromosome 6. Linkage analyses with the additional SNPs yielded a highest LOD score of 4.0 at 129 cM on chromosome 6. Family-based association analysis revealed that SNP rs2257104 in PLAGL1, at approximately 143 cM, was associated with multivariable-adjusted HDL3-C (P = 0.03). Further study of the linkage region and exploration of other variants in PLAGL1 are warranted to define the potential functional variants of HDL-C metabolism.
Supplementary key words: high density lipoprotein, high density lipoprotein 3, microsatellite marker, single nucleotide polymorphism, PLAGL1
INTRODUCTION
HDL is a heterogeneous mixture of particles of different sizes, densities, and compositions (7, 8). Differential chemical precipitation separates HDL particles into two major subfractions: HDL2 and HDL3. As thoroughly reviewed by Wilson (9), a number of studies have found that lower concentrations of HDL2-C and HDL3-C are significantly associated with increased coronary heart disease risk, suggesting an important role for these two subfractions in coronary heart disease. However, findings have been inconsistent across populations. To better understand the genetic influences on HDL levels, some studies have examined the more metabolically homogeneous subfractions. A segregation study using families of patients who underwent coronary arteriography found that a major gene explains 34% of the total variation in HDL3-C and 9% in HDL-C, whereas the variation in HDL2-C is likely mostly the result of environmental factors (10). A genome-wide linkage study (11) used gradient gel electrophoresis to further differentiate three subclasses within HDL3 (HDL3a, HDL3b, and HDL3c) and two within HDL2 (HDL2a and HDL2b). It found significant linkage [log-of-the-odds (LOD) > 3] to HDL2a and suggestive linkage (LOD > 2) to HDL3a and HDL2b. These studies suggest that the subfractions may provide better phenotypes than HDL-C levels for genetic studies.
Other investigators (12-18) have examined the associations of HDL and HDL subfractions with candidate genes such as hepatic lipase, LPL, cholesteryl ester transfer protein (CETP), LCAT, and apolipoproteins A-I, A-II, A-IV, B, C-III, and E. As in the segregation studies, the significance of these associations varies considerably across populations.
To identify chromosome regions likely to contain quantitative trait loci (QTLs) affecting HDL3-C concentration, we performed a genome-wide linkage study using 401 microsatellite markers and 330 extended families from the Framingham Heart Study. Because this analysis provided significant evidence of linkage to a region on chromosome 6, we identified a number of candidate genes within this promising region of the highest LOD score. A number of single nucleotide polymorphisms (SNPs) in these candidate genes were selected and genotyped to further map possible variants that may affect HDL3-C concentration.
METHODS
10 centimorgan (cM) apart. The analysis reported here was conducted on a subset of 907 offspring subjects who had HDL3-C levels measured at examination cycle 4 (around 1987) and genotype data for the CETP polymorphism. The sample included 729 sibling pairs, 37 half-sibling pairs, 41 avuncular pairs, and 489 first-cousin pairs. CETP was genotyped in study participants who attended examination 4 or 5 between 1987 and 1995 and for whom DNA was available. Participants may have missed both examinations because of death, serious illness, moving out of Massachusetts, or other unknown reasons. Of the 1,089 subjects who attended examination 4 or 5, had DNA, and were members of the families included in the genome scan, 282 (23%) were excluded because their CETP genotypes were not available.
All subjects provided informed consent before each clinic visit and were examined under the standard protocol for the Framingham Offspring Study approved by the Institutional Review Board at Boston Medical Center (Boston, MA).
Genotyping of microsatellite markers
DNA specimens were obtained from blood samples routinely collected during examinations of original and offspring subjects between 1987 and 1991 and between 1995 and 2000. DNA was extracted from the buffy coat of whole blood specimens using a Qiagen Blood and Cell Culture DNA Maxi Kit. A genome-wide scan with 401 microsatellite markers over the 22 chromosomes, at an average of 1 marker every 10 cM, was completed by Mammalian Genotyping Service (Marshfield, WI) in 2000. The marker screen set 9 and genotyping protocols are available at the website of the Center for Medical Genetics, Marshfield Medical Research Foundation (http://research.marshfieldclinic.org/genetics/).
Measurement of lipids
Fasting venous blood samples were collected, and plasma was separated from blood cells by centrifugation and immediately used for the measurement of lipids. HDL3-C concentrations were measured as described by Gidez et al. (21).
Definition of traditional risk factors
Body mass index (BMI; in kg/m²) was calculated from measured weight and height. Alcohol consumption was reported as the usual number of drinks (of comparable ethanol content) per week. Cigarette smoking (yes/no) was defined by whether a person had a history of regularly smoking cigarettes. The physical activity index was derived as a weighted average of the number of hours spent in five energy expenditure categories (weights: sleep, 1.0; sedentary, 1.1; slight, 1.5; moderate, 2.4; and heavy, 5.0). Medication treatment variables (yes/no; including anticholesterol therapy, β-blockers, and oral contraceptives) were defined by whether a person had taken such treatment in the past year. Menopause status (yes/no) was defined by whether a woman's menstrual periods had stopped for 1 year. Estrogen therapy (yes/no) was defined by whether a woman was taking estrogen therapy after menopause.
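The physical activity index can be sketched as follows. Only the five category weights come from the text. The dictionary input format, the function name `physical_activity_index`, and the example participant are hypothetical, and the sketch assumes the hours cover one 24-hour day.

```python
# Weights for the five energy-expenditure categories, as listed in the text.
PAI_WEIGHTS = {
    "sleep": 1.0,
    "sedentary": 1.1,
    "slight": 1.5,
    "moderate": 2.4,
    "heavy": 5.0,
}


def physical_activity_index(hours: dict[str, float]) -> float:
    """Weighted average of the hours spent in each activity category.

    Assumption: `hours` covers one 24-hour day. Multiplying the result by the
    total hours (24) gives the equivalent weighted sum.
    """
    missing = PAI_WEIGHTS.keys() - hours.keys()
    if missing:
        raise ValueError(f"missing categories: {sorted(missing)}")
    total_hours = sum(hours[c] for c in PAI_WEIGHTS)
    if total_hours <= 0:
        raise ValueError("total hours must be positive")
    weighted = sum(PAI_WEIGHTS[c] * hours[c] for c in PAI_WEIGHTS)
    return weighted / total_hours


# Hypothetical participant: 8 h sleep, 8 h sedentary, 6 h slight, 2 h moderate.
example = {"sleep": 8, "sedentary": 8, "slight": 6, "moderate": 2, "heavy": 0}
print(round(physical_activity_index(example), 3))
```

For this participant the weighted hours total 30.6, so the per-hour average is 1.275. Whether the index is reported as that average or as the 24-hour weighted sum does not matter for regression adjustment, because the two differ only by a constant factor.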
Heritability and linkage analyses
Linkage information content at each locus was calculated in GENEHUNTER. For a pedigree, the linkage information content at a locus reflects how precisely the exact allele identity-by-descent sharing at that locus can be determined for every relative pair (22). For a sample of pedigrees, it is the sum of the information content at that locus over all pedigrees in the sample. The higher the linkage information content, the greater the power to detect linkage if the locus is close to the QTL.
To reduce the variability caused by known risk factors, we calculated standardized residuals from sex-specific multiple linear regression models adjusted for the traditional risk factors. For the heritability and linkage analyses, we then transformed the ranked standardized residuals into normal deviate scores. This avoids the inflated false-positive rates that nonnormality can cause. The regressions were conducted separately for men and women for two reasons: to allow for different associations in each sex, and to adjust additionally for oral contraceptive use, estrogen therapy, and menopause in women. The heritability and linkage analyses were conducted using MERLIN (23), which is based on a variance components methodology. The LOD score measures the evidence for linkage. It is the base-10 logarithm of the likelihood ratio comparing a model that includes a QTL effect at the marker locus under evaluation with a model that does not.
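The preprocessing and the LOD definition above can be sketched as follows. Several choices here are assumptions rather than the authors' stated method: ordinary least squares via `numpy.linalg.lstsq`, the `(rank - 0.5) / n` offset for mapping ranks to normal quantiles (Blom's offset is another common choice), and all function names. Sex-specific models are obtained by calling the functions separately on men and on women. MERLIN computes the variance components likelihoods itself; `lod_from_loglik` only shows how a LOD score follows from two natural-log likelihoods.

```python
import numpy as np
from statistics import NormalDist


def standardized_residuals(y: np.ndarray, covariates: np.ndarray) -> np.ndarray:
    """OLS residuals of y on covariates (plus an intercept), scaled to unit SD."""
    design = np.column_stack([np.ones(len(y)), covariates])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return (resid - resid.mean()) / resid.std(ddof=1)


def average_ranks(x: np.ndarray) -> np.ndarray:
    """1-based ranks; tied values share the average of their ranks."""
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def normal_scores(x: np.ndarray) -> np.ndarray:
    """Map ranks to standard normal quantiles using (rank - 0.5) / n."""
    n = len(x)
    inv = NormalDist().inv_cdf
    return np.array([inv((r - 0.5) / n) for r in average_ranks(np.asarray(x, float))])


def lod_from_loglik(loglik_qtl: float, loglik_null: float) -> float:
    """log10 likelihood ratio, given natural-log likelihoods of the two models."""
    return (loglik_qtl - loglik_null) / np.log(10)


# Hypothetical data for one sex: the trait depends on age plus noise.
rng = np.random.default_rng(0)
age = rng.uniform(30, 70, size=200)
trait = 50 - 0.1 * age + rng.normal(0, 5, size=200)
resid = standardized_residuals(trait, age)
scores = normal_scores(resid)
```

The rank transform preserves the ordering of the residuals while forcing an approximately normal marginal distribution. This is why it guards against false positives caused by skewed or heavy-tailed traits in variance components linkage.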
Bioinformatic identification of candidate genes in the linkage region
We conducted a bioinformatic search for candidate genes within or slightly beyond the 2-LOD support interval of the linkage peak (125-150 cM, or 6q22.33-6q24.3) on chromosome 6 (Fig. 1), the only chromosome on which significant linkage was found. MapViewer at the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/) was used to obtain predicted genes. Similarity searches, using these predicted genes and the genomic sequence as input, were performed with the BLAST algorithm (24) (http://www.ncbi.nlm.nih.gov/BLAST/).
SNP selection and genotyping
Twenty-eight SNPs in the candidate genes were selected from the public SNP database (http://www.ncbi.nlm.nih.gov/SNP) using the following criteria. First, we sought an even distribution of three, four, or five SNPs across each gene. Second, SNPs were chosen for their potential to be functional: altering protein sequence (particularly in functional domains), affecting mRNA splicing, or altering promoter activity. Third, attention was given to SNPs in protein-coding or noncoding sequence that is conserved in other species, notably the mouse. Fourth, SNPs had to have a minor allele frequency of at least 5%. SNP genotyping was conducted using the ABI Prism SNaPshot multiplex system (Applied Biosystems, Foster City, CA), as described previously (30). The primers and probes used for genotyping are displayed in Table 1.
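The fourth criterion, minor allele frequency of at least 5%, can be sketched as a simple filter. In the study the frequencies came from the public database. This sketch instead assumes genotype counts are available, and the SNP names, counts, and function names are hypothetical.

```python
def minor_allele_frequency(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """Minor allele frequency from genotype counts of a biallelic SNP."""
    n_alleles = 2 * (n_hom_ref + n_het + n_hom_alt)
    if n_alleles == 0:
        raise ValueError("no genotypes")
    p = (2 * n_hom_ref + n_het) / n_alleles  # frequency of the reference allele
    return min(p, 1 - p)


def passes_maf_filter(counts: tuple[int, int, int], threshold: float = 0.05) -> bool:
    """True if the SNP meets the minimum minor allele frequency."""
    return minor_allele_frequency(*counts) >= threshold


# Hypothetical candidate SNPs with genotype counts (AA, Aa, aa).
candidates = {"snp_a": (90, 9, 1), "snp_b": (98, 2, 0), "snp_c": (36, 48, 16)}
kept = [name for name, counts in candidates.items() if passes_maf_filter(counts)]
print(kept)
```

Here `snp_a` (MAF 0.055) and `snp_c` (MAF 0.4) are kept, and `snp_b` (MAF 0.01) is dropped.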
Linkage disequilibrium structure at the candidate genes
Each SNP was evaluated for deviation from Hardy-Weinberg equilibrium (HWE). One subject was randomly selected from each family, and the genotype frequencies of these unrelated subjects were compared with the frequencies expected under HWE using a chi-square statistic with 2 degrees of freedom.
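The HWE check can be sketched as follows: draw one genotype per family, then compute a Pearson chi-square against HWE expectations. The family data layout, genotype coding (count of one allele, 0/1/2), random seed, and function names are hypothetical. The text compares the statistic against 2 degrees of freedom. The more common convention for a biallelic SNP is 1 degree of freedom, because one allele frequency is estimated from the data, so the tail-probability helper accepts either.

```python
import math
import random


def one_per_family(families: dict[str, list[int]], seed: int = 0) -> list[int]:
    """Randomly pick one genotype (allele count 0/1/2) from each family."""
    rng = random.Random(seed)
    return [rng.choice(members) for members in families.values() if members]


def hwe_chi_square(genotypes: list[int]) -> float:
    """Pearson chi-square comparing observed genotype counts with HWE expectations."""
    n = len(genotypes)
    if n == 0:
        raise ValueError("no genotypes")
    observed = [genotypes.count(k) for k in (0, 1, 2)]
    p = (observed[1] + 2 * observed[2]) / (2 * n)  # frequency of the counted allele
    if not 0 < p < 1:
        raise ValueError("monomorphic SNP: HWE test undefined")
    q = 1 - p
    expected = [n * q * q, 2 * n * p * q, n * p * p]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_upper_tail(x: float, df: int) -> float:
    """Upper-tail probability for 1 or 2 degrees of freedom (closed forms)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    raise ValueError("only df = 1 or 2 supported in this sketch")


# Hypothetical unrelated sample: 30 homozygous, 40 heterozygous, 30 homozygous.
stat = hwe_chi_square([0] * 30 + [1] * 40 + [2] * 30)
p_value = chi_square_upper_tail(stat, df=2)
```

With an allele frequency of 0.5, the expected counts are 25, 50, and 25, so this sample gives a statistic of 4.0. Drawing a single member per family keeps the observations independent, which the chi-square test assumes.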
For each pair of SNPs within the same candidate gene or located less than 500 kb apart, a measure of the strength of linkage disequilibrium (LD), D' (32), was calculated using Haploview (http://www.broad.mit.edu/personal/jcbarret/haploview/).
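The pair selection rule and the D' measure can be sketched as follows. Haploview estimates haplotype frequencies from genotype data before computing D'. This sketch assumes the A-B haplotype frequency and both allele frequencies are already known. The SNP tuples, positions, and function names are hypothetical.

```python
from itertools import combinations


def pairs_to_evaluate(snps: list[tuple[str, str, int]], max_distance: int = 500_000):
    """SNP pairs in the same gene or less than max_distance bp apart.

    Each SNP is a hypothetical (name, gene, position_bp) tuple.
    """
    return [
        (n1, n2)
        for (n1, g1, x1), (n2, g2, x2) in combinations(snps, 2)
        if g1 == g2 or abs(x1 - x2) < max_distance
    ]


def d_prime(p_ab: float, p_a: float, p_b: float) -> float:
    """|D'| for two biallelic loci from the A-B haplotype and allele frequencies."""
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return abs(d) / d_max


snps = [("s1", "G1", 100), ("s2", "G1", 900_000), ("s3", "G2", 950_000)]
print(pairs_to_evaluate(snps))
print(round(d_prime(0.25, 0.6, 0.3), 3))
```

Normalizing D by its maximum possible value for the given allele frequencies puts D' on a 0-to-1 scale. This makes D' comparable across SNP pairs with different allele frequencies. In the example, s1 and s2 are paired because they share a gene, and s2 and s3 because they lie 50 kb apart.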
Family-based association analyses with SNPs in the candidate genes
A score statistic capturing the covariance between marker genotype and trait was calculated using FBAT (33, 34) to test for association. The score statistic divided by the square root of its variance is distributed approximately as standard normal (Z). This statistic conditions on parental genotypes to control for bias caused by potential population admixture, which can be characterized as differences in marker allele and trait distributions among unobserved subpopulations. When parental genotypes were incomplete, the distribution of the score statistic was conditioned on the genotypes of the available parents and the whole sibship. Because association was tested with markers in the linkage region, the conditional distribution was computed with an empirical variance of the score statistic that accounts for the dependence among relatives (35).
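A simplified version of this score test can be sketched for the case where both parents are genotyped, using additive allele coding. This is not the FBAT program itself. It omits the sibship-based conditioning that FBAT uses when parents are missing, as well as haplotype and phase handling. The `Family` layout, the default trait offset of 0 (reasonable when the trait is already a centered normal score), and the function names are assumptions. The variance is the empirical form: the sum of squared family contributions, which remains valid when siblings are correlated under linkage.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Family:
    father: int  # copies of the tested allele carried by the father (0, 1 or 2)
    mother: int  # copies carried by the mother
    offspring: list[tuple[int, float]]  # (offspring allele count, trait value)


def expected_count(father: int, mother: int) -> float:
    """E[offspring allele count | parental genotypes] under Mendelian transmission."""
    return father / 2 + mother / 2


def score_z(families: list[Family], offset: float = 0.0) -> float:
    """Additive family score statistic divided by its empirical standard error."""
    s = 0.0
    v = 0.0
    for fam in families:
        e = expected_count(fam.father, fam.mother)
        # Family contribution: sum over offspring of T * (X - E[X | parents]).
        u = sum((trait - offset) * (x - e) for x, trait in fam.offspring)
        s += u
        v += u * u  # empirical variance: sum of squared family contributions
    if v == 0:
        raise ValueError("no informative families")
    return s / sqrt(v)


# Hypothetical families; the last one (both parents homozygous) is uninformative.
families = [
    Family(1, 1, [(2, 1.0), (0, -1.0)]),
    Family(1, 0, [(1, 0.5)]),
    Family(2, 2, [(2, 3.0)]),
]
z = score_z(families)
```

Conditioning on parental genotypes means that only transmissions that deviate from Mendelian expectation contribute to the score. This is what makes the test robust to population admixture. Families in which both parents are homozygous contribute zero.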
In haplotype analyses using FBAT (36), multiple-marker haplotypes were used in the score statistic instead of single-marker alleles. For markers with ambiguous phase, the test statistic was a weighted average of the score statistics over all compatible phase configurations. Two types of hypothesis tests were performed. The haplotype-specific test compared one haplotype against all other haplotypes combined. The global test assessed differences among all haplotypes by generalizing the score statistic to a chi-square statistic. The global test guards against the false positives that could arise from performing multiple haplotype-specific tests.
RESULTS
Identification of candidate genes in the linkage region
Between 125 and 150 cM, we identified nine genes encoding proteins that are candidates for influencing HDL3-C levels. Some may regulate HDL3-C directly; others may act indirectly through a role in some aspect of lipid biochemistry. These nine candidate genes are NCOA7, CTGF, VNN1, VNN3, VNN2, AIG-1, PLAGL1, LOC285746, and FLJ14735 (LRP11). Their locations and annotations are listed in Table 4.
In Fig. 2, we plotted the multipoint LOD scores as well as FBAT P values from association analyses with both physical and genetic locations on the same figure. Locations of the SNPs and microsatellite markers are also marked on the figure.
Fine mapping added 28 SNPs in the candidate genes to the existing map
Compared with the existing microsatellite markers alone, incorporating the 28 SNPs into the existing 10 cM map increased the linkage information content of the genotype data in the region (Fig. 4). The largest increases, each approximately 0.1, occurred at approximately 134 and 140 cM.
Significant association (P < 0.05) was found with two SNPs, rs15960 (CTGF) and rs2257104 (PLAGL1), in family-based association analyses. The A allele of rs15960 was associated with lower levels of HDL3-C (Z = 2.95, P = 0.0032) in crude analyses without adjusting for traditional risk factors. The association was not significant after adjusting for traditional risk factors (Z = 1.3, P = 0.2). The A allele of rs2257104 has an additive effect associated with lower levels of HDL3-C (Z = 2.2, P = 0.026) in a fully adjusted model.
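The reported P values are two-sided tail probabilities of the standard normal Z statistics. A minimal sketch of that conversion uses only the Python standard library; the function name is hypothetical.

```python
from math import erfc, sqrt


def two_sided_p(z: float) -> float:
    """Two-sided P value for a standard normal test statistic: 2 * (1 - Phi(|z|))."""
    return erfc(abs(z) / sqrt(2))


# Z statistics reported in the text for rs15960 (crude and adjusted) and rs2257104.
for z in (2.95, 1.3, 2.2):
    print(z, round(two_sided_p(z), 4))
```

Because the reported Z values are rounded, recomputed P values may differ slightly in the last digit.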
However, rs2257104 explained little of the linkage signal. The proportion of total phenotypic variation explained by rs2257104 was 0.007, and after adjusting for rs2257104, the multivariate LOD score at the SNP location changed by approximately 0.04.
In haplotype analyses of SNPs within a candidate gene, significant global P values (P < 0.05) were found only with SNPs in PLAGL1 for fully adjusted HDL3-C (Table 5), assuming a dominant effect for each haplotype. Six haplotypes had frequencies >0.01. Haplotype h6 had a frequency of 0.02 and was associated with lower HDL3-C levels (haplotype-specific P = 0.02). It carries the C, A, C, and T alleles at rs1884087, rs2257104, rs2076684, and rs2064495, respectively. A borderline significant global P value (P = 0.07) was found with haplotypes in CTGF containing rs15960 and rs928501, again only in the crude model.
DISCUSSION
30 and 60 cM, respectively, from our HDL3-C linkage peak on chromosome 6. The region of chromosome 6 in which we obtained the highest LOD score is close to the region where a LOD score of 4.64 at 144 cM was found for BMI measured at examination cycle 1 in the Framingham Heart Study (39). Our results were based on HDL3-C measured at examination cycle 4, 16 years after examination cycle 1. For BMI measured at examination cycle 4, the highest LOD score on chromosome 6 was also obtained in the same region, but it was only 1.43. Because BMI was used as a covariate in our analysis, our linkage result on chromosome 6 is unlikely to be confounded by BMI.
There has been considerable heterogeneity in the linkage findings for HDL-C and its subfractions. Significant linkage (LOD > 3) has been reported at several locations:

- chromosomes 8 and 15, in 477 Mexican Americans of the San Antonio Family Heart Study (11);
- chromosome 5, in 1,027 Caucasians of the Family Heart Study (6);
- chromosome 12, in 534 sibling pairs of the Quebec Family Study;
- chromosome 10q11, in 1,109 individuals from 92 families with low HDL-C or hyperlipidemia;
- chromosome 6 between 73 and 80 cM, for increased HDL-C levels (38).

The lack of replication across these studies is difficult to explain. Multiple factors could contribute, including phenotype definition, ethnicity, gene-environment interaction, statistical power, and false positives. Although there is no simple solution to all of these problems, fine mapping in linkage regions may prove an effective tool for discovering functional variants as the cost of genotyping declines.
The genetic linkage data for HDL3-C were used to select a region on chromosome 6 from 6q22.33 to 6q24.3, and a bioinformatics analysis identified nine candidate genes in this region. Each of these genes encodes a protein that could plausibly affect lipid homeostasis in one of four ways: by interacting physically with lipid moieties, by regulating lipid biosynthesis or catabolism, by regulating gene expression in a steroid hormone-dependent manner, or through association with manifestations of cardiovascular disease. Including the 28 SNPs in the nine candidate genes in the linkage analyses improved the linkage information content of the marker data, providing stronger information with which to evaluate linkage. Furthermore, the increase in the LOD score after incorporating the SNPs provides additional evidence that at least one QTL exists in this region.
Family-based association analyses revealed that two SNPs in two of the candidate genes were associated with HDL3-C. However, only rs2257104 in PLAGL1 remained significant after adjusting for all of the traditional risk factors. PLAGL1, or pleiomorphic adenoma gene-like 1, encodes a putative coactivator of hormone-dependent nuclear receptors with several C2H2-type zinc finger domains. SNP rs2257104 is located at approximately 143 cM, where the LOD score was 1.9 in the linkage analyses incorporating SNPs into the current map. For the SNPs near the maximum LOD score at 128 cM, we did not find any association with fully adjusted HDL3-C. One explanation is that the SNPs we typed near the maximum LOD score are not in linkage disequilibrium with a QTL at that location. Another is that the QTL does not lie at the location of the maximum LOD score. Simulation studies (40) have shown that, compared with a 10 cM map, further fine mapping by linkage analysis does not substantially reduce the average error in locating a QTL by its maximum LOD score. Only nine candidate genes in the 25 cM interval were investigated, and they were separated by considerable distances. Nevertheless, we believe these nine genes represent the best candidates based on extensive bioinformatic analysis. We may still have missed genes associated with HDL3-C that lie between or outside these candidate genes. Genotyping more SNPs within and outside the nine candidate genes would increase SNP density in the linkage region and the likelihood of detecting a true association.
We noted that the A allele of rs2257104 resided on all three haplotypes (h2, h5, and h6) with negative Z statistics (Table 5), which was consistent with the result from single SNP association analyses of this marker. However, only the h6 haplotype was statistically significant compared with all others combined. None of the other SNPs in PLAGL1 had an allele that resided on haplotypes that corresponded only to positive or only to negative Z statistics. In sum, these results suggested that the significance of the global haplotype test is most likely attributable to rs2257104.
The estrogen receptor initiates transcription of select genes after binding of its ligand and translocation of the receptor-ligand complex to the nucleus (41). Of importance to lipid physiology, estrogen has been shown to promote transcription of the ApoE (42) and ApoA1 genes (43-45) via the estrogen receptor and coactivators. However, we do not believe that the gene encoding the estrogen receptor (ESR1) is directly responsible for the genetic determinant of HDL3-C levels, for three reasons. First, ESR1 maps to 154.8 cM (6q25.1), approximately 7 Mb beyond our 2-LOD interval. Second, the LOD score at ESR1 is 0.9, whereas the maximum LOD score seen here is 4.0 after incorporating the SNPs in the linkage analyses. Third, we reanalyzed the linkage data adjusting for either or both of two common ESR1 variants, PvuII and C1335G. This reanalysis generated a similar maximum LOD score on chromosome 6 (difference in LOD score <0.17).
The study subjects were restricted to those genotyped for the CETP polymorphism because we were also interested in testing whether known lipid candidate genes, such as CETP and hepatic lipase, could explain the linkage peaks. Adjusting for CETP or hepatic lipase did not change the linkage signals on chromosome 6 appreciably, suggesting no evidence of interaction between QTLs on chromosome 6 and these two known lipid candidate genes. However, when we used all subjects with HDL3-C measurements, regardless of CETP genotyping, the maximum LOD score on chromosome 6 decreased to 1.8 in the analyses with the microsatellite markers. Subjects with CETP genotypes were younger on average than those without (48 vs. 50 years). This may have yielded a more homogeneous group with stronger linkage signals. Otherwise, there were no significant differences between subjects with and without CETP genotyping.
Regulation of HDL metabolism, lipid homeostasis, and the determinants of cardiovascular health is complex and undoubtedly involves the functions and interactions of many genes. The long arm of chromosome 6 carries several genes that were not considered in this study. Their roles in cardiovascular health via lipid levels nonetheless make them candidates for affecting lipid metabolism. One is the LPA gene cluster, which includes APOARGC (apolipoprotein A-related gene C), LPA [lipoprotein Lp(a)], and the possible pseudogenes LPAL1 and LPAL2. Certain variants in these genes are known risk factors for cardiovascular disease (46), but they are not related to the HDL3-C trait studied here. In addition, this cluster maps to 6q26-q27, approximately 10 Mb downstream of LETAL. Several IDDM (insulin-dependent diabetes mellitus) loci have also been identified but not narrowed to a specific gene. IDDM5, IDDM8, and IDDM15 map to 6q24-q27, 6q25-q27, and 6q21, respectively. We did not consider them further for two reasons: the gene or genetic element responsible for the observed phenotype has not been discovered, and, unlike type II diabetes, type I diabetes has no established connection to cardiovascular disease via HDL.
A number of previous genome-wide linkage analyses of cardiovascular, anthropometric, and lipid phenotypes were based on the same genome scan used in this study. The phenotypes studied included triglycerides (47), BMI (39), waist circumference (48), and blood pressure (49, 50). Our finding of significant linkage on chromosome 6 may therefore be subject to an inflated false-positive rate as a result of multiple testing. However, these phenotypes may not be independent, so a Bonferroni correction would be too conservative and is not suitable here. Because the LOD score increased with fine mapping, the likelihood that our finding is a false positive is reduced (51). Furthermore, the location of the peak LOD score on chromosome 6 is close to that of the peak LOD score for BMI (39), a trait that may be in the lipid metabolism pathway. Considering all of these factors, we believe that our results, although not completely conclusive, can serve as a valuable reference for further research on HDL3 or lipid metabolism in general.
In summary, a genome-wide scan of 401 microsatellite markers in 330 extended families of the Framingham Heart Study has revealed a promising and statistically significant linkage with HDL3-C concentrations on chromosome 6 (multipoint LOD = 3.7 at 133 cM). None of the classical genes associated with HDL metabolism are located within this region. Bioinformatic analyses of the region between 125 and 150 cM suggested the presence of nine interesting genes in which we further genotyped 28 SNPs. Linkage analyses incorporating those SNPs into the current marker map increased the linkage information of marker data in that region and resulted in a multipoint LOD score of 4.0 at 129 cM. Family-based association analyses revealed that SNP rs2257104 in PLAGL1 was associated with fully adjusted HDL3-C. Further study of variants in PLAGL1 and increased SNP density in the linkage region are warranted to more clearly define the potential functional variants.
ACKNOWLEDGMENTS
Manuscript received October 4, 2004 and in revised form March 2, 2005.
REFERENCES