J. Lipid Res.
Originally published In Press as doi:10.1194/jlr.M600372-JLR200 on November 15, 2006

Papers In Press, published online ahead of print February 1, 2007
J. Lipid Res., doi:10.1194/jlr.M600372-JLR200

Journal of Lipid Research, Vol. 48, 434-443, February 2007
Copyright © 2007 by American Society for Biochemistry and Molecular Biology

High-density genotyping and functional SNP localization in the CETP gene

John F. Thompson1,*, Linda S. Wood*, Eve H. Pickering†, Bryan DeChairo§ and Craig L. Hyde†

* Pharmacogenomics, Pfizer Global Research and Development, Groton, CT 06340
† Statistical Applications, Pfizer Global Research and Development, Groton, CT 06340
§ Molecular Profiling, Pfizer Global Research and Development, Groton, CT 06340

The online version of this article (available at http://www.jlr.org) contains supplemental data in the form of one figure and one table.

Published, JLR Papers in Press, November 15, 2006.

1 To whom correspondence should be addressed. e-mail: john.f.thompson@pfizer.com


    ABSTRACT
The cholesteryl ester transfer protein gene (CETP) has been the subject of hundreds of genetic analyses that typically focus on a small number of polymorphisms within a single ethnic group. Furthermore, the extent of DNA beyond the transcribed sequence from which single nucleotide polymorphisms (SNPs) may influence CETP expression has not been well defined. To better understand the role of natural variation in modulating CETP and high density lipoprotein-cholesterol (HDL-C) levels, dense genotyping of CETP and regions up to 15 kb on either side of the gene was carried out on >2,000 individuals. A complex, nonlinear set of linkage disequilibrium bins was found, with many bins interspersed along the DNA sequence and spread over large regions of the gene. Bins assigned based on large numbers of individuals matched the small subset of SNPs that had been assigned to bins previously with a small number of individuals. Associations of known functional SNPs with HDL-C were found, but there were suggestions that there are additional functional SNPs not characterized previously. Narrowing of the set of likely functional SNPs was accomplished by comparing associations observed in different ethnic groups. The promoter SNP most highly associated with HDL-C that is likely to be functional, position –4,502, alters a consensus transcription factor binding site.

Supplementary key words genetics • high density lipoprotein • single nucleotide polymorphism • cholesteryl ester transfer protein


    INTRODUCTION
The importance of the cholesteryl ester transfer protein gene (CETP) in affecting high density lipoprotein-cholesterol (HDL-C) levels in humans was originally detected when individuals lacking active protein were identified based on high HDL-C levels (1). Since then, individuals lacking CETP as well as those with varying levels of CETP or with variant sequences have been studied extensively (reviewed in Refs. 2, 3). In addition to the complete null mutations, many single nucleotide polymorphisms (SNPs) in CETP have been found to be reproducibly associated with protein mass/activity and/or HDL-C. When sample sizes are large enough, there is a high degree of consistency across studies and populations. Results with the closely linked phenotypes of CETP mass/activity and HDL-C are highly replicated, but associations with other, more complex phenotypes, such as cardiovascular disease, have been less easily replicated. However, when studies are of sufficient size and properly designed, associations with the more complex phenotypes can often be found. For example, a meta-analysis of the TaqIB SNP showed that the allele associated with low CETP was also associated with high HDL-C and lower levels of coronary artery disease (4). When studies examine only one or a small number of SNPs, integration of results with other studies can be challenging.

In addition to the numerous amino acid variants that have been detected in CETP (5), there is also evidence that promoter SNPs are even more significantly associated with HDL-C than those that change a single amino acid. Common promoter polymorphisms at positions –629 and –971 and a variable repeat sequence have all been reproducibly associated with CETP and/or HDL-C levels (6–10). These associations at the 5' end of the gene are stronger than those observed with SNPs causing amino acid changes at the 3' end of the gene. The functionality of the –629 SNP has been linked to changes in an Sp1/Sp3 binding site (7, 11), whereas the SNP at –971 was shown not to affect transcription (8). No results have been reported on the functional role of the variable repeat sequence. Large segments of the promoter have been fused to reporter genes to determine functional regions, but only changes at positions –629 and –38 have been examined as a function of naturally occurring polymorphisms (7, 11).

Linkage disequilibrium (LD) within CETP makes the identification of functional SNPs difficult. When only low-density genotypic information was available, genome structure was generally approximated by linear collections of haploblocks. Within the CETP gene, initial studies showed two haploblocks, one covering the promoter region and the 5' half of the gene and the second covering the 3' half of the gene and including many nonsynonymous SNPs (6, 10, 12). As higher density SNP information has become available, it has become apparent that a linear collection of haploblocks is a poor approximation of genome structure. An improved description consists of multiple LD bins that are interspersed along the linear DNA sequence, with each bin containing a distinct subset of SNPs (13). These bins are described empirically and, for any given region of the genome, the extent and number of LD bins vary across different ethnicities. Typically, there are cutoffs for both minor allele frequency (MAF; 5%) and extent of LD (R² > 0.8) to define SNPs within each bin (13).
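The two cutoffs used to define LD bins can be illustrated with a minimal sketch. It assumes unphased genotypes coded as dosages (0, 1, or 2 copies of one allele) and uses the squared Pearson correlation of dosages as a stand-in for R²; this estimator and the function names are assumptions for illustration, not the procedure of ref. 13.

```python
def minor_allele_frequency(dosages):
    """Return the MAF from genotype dosages (0, 1, or 2 copies of one allele)."""
    freq = sum(dosages) / (2 * len(dosages))
    return min(freq, 1 - freq)


def r_squared(dosages_a, dosages_b):
    """Squared Pearson correlation between two dosage vectors (assumed R² proxy)."""
    n = len(dosages_a)
    mean_a = sum(dosages_a) / n
    mean_b = sum(dosages_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(dosages_a, dosages_b))
    var_a = sum((a - mean_a) ** 2 for a in dosages_a)
    var_b = sum((b - mean_b) ** 2 for b in dosages_b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return cov * cov / (var_a * var_b)


def passes_bin_cutoffs(dosages_a, dosages_b, maf_cutoff=0.05, r2_cutoff=0.8):
    """True when both SNPs are common enough and in strong LD with each other."""
    return (
        minor_allele_frequency(dosages_a) > maf_cutoff
        and minor_allele_frequency(dosages_b) > maf_cutoff
        and r_squared(dosages_a, dosages_b) > r2_cutoff
    )
```

Two SNPs with identical genotypes across all individuals give a value of 1 and would share a bin, provided both exceed the MAF cutoff.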

We have previously published association studies with 20 SNPs in and near the CETP gene with ~2,500 individuals (5, 10, 11, 14, 15). Although this has provided significant insight into the functional nature of CETP SNPs, there are still many unanswered questions about the role of different SNPs, how they are linked to each other, and the detailed genomic structure surrounding CETP. To help answer some of these questions, we have genotyped 63 additional SNPs in >2,000 individuals and integrated that data with other sources of information to generate a highly detailed map of CETP and its association with HDL-C.


    MATERIALS AND METHODS
Samples and genotyping
The Atorvastatin Comparative Cholesterol Efficacy and Safety Study (ACCESS) (16) was designed to determine the safety and efficacy profile of atorvastatin compared with other HMG-CoA reductase inhibitors when used to treat patients with National Cholesterol Education Program LDL-C criteria. Whole blood from participating subjects was obtained with appropriate institutional review and appropriate informed consent documentation that defined the study design and provided an assessment of the risks and benefits associated with study participation. A second European cohort came from a previous Pfizer clinical trial in the cardiovascular area that recruited healthy patients (n = 664). DNA from another African-American cohort (n = 250) was purchased from Genomics Collaborative, Inc. (Cambridge, MA). All laboratory tests were performed at a central laboratory (Medical Research Laboratories, Highland Heights, KY) certified by the National Heart, Lung, and Blood Institute/Centers for Disease Control Part III Program. HDL-C was measured in a fasting sample. No subfraction analysis was done.

Genomic DNA was extracted from whole blood using the PureGene DNA isolation system (Gentra) according to the manufacturer's protocol. Some SNPs discussed here were first reported elsewhere (5, 10, 11, 14, 15) and genotyped as described in those publications. SNPs reported for the first time here were genotyped using either TaqMan or SNPlex technology according to the manufacturer's instructions (Applied Biosystems, Foster City, CA).

Statistical analysis
The goal of the statistical analysis was to test for significant genetic associations between HDL-C levels and CETP SNPs across European (n = 3,129) and African (n = 420) subjects. A small number of Asian subjects (n = 36) were also used for comparison of fitted genotype effects, although this population was considered too small for hypothesis testing. The largest cohort studied was from ACCESS and included individuals with European (2,465), African-American (170), and Asian (36) ancestry, after removing subjects who were outliers (beyond 5 sigma) and/or who had missing data for critical phenotypes. Demographic information for the genotyped individuals is listed in Table 1.


TABLE 1. Population demographics

 
It was determined via standard inspection of qq-plots that a log transformation was appropriate for the HDL-C response. Unfortunately, the variance in log(HDL-C) varied significantly by cohort as well as by gender, although to a lesser degree. However, within the ACCESS cohort, the African females exhibited much higher variance than either the African males or the Europeans of either gender. Hence, variance was allowed to vary by three factors: cohort, ethnicity, and gender.

The model used had log(HDL-C) as the response and genotype (coded as a three-level factor) as the main effect to be tested for. Explanatory covariates, all of which were significant against log(HDL-C), were as follows: age, gender, ethnicity, cohort, and alcohol consumption (coded as a three-level factor on weekly alcohol consumption: no drinks, 1 to <10 drinks, and 10 or more drinks, based on the distribution of values).

A generalized least-squares model was used, allowing for the heterogeneous variance components described above. Genotype significance for individual ethnicities was evaluated by likelihood comparisons of the full model with one with the genotypes of the targeted ethnicity assigned to a single factor, using a Chi-square test. The final model used for overall significance was

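\[ \log(\text{HDL-C}) \sim \text{genotype} + \text{age} + \text{gender} + \text{ethnicity} + \text{cohort} + \text{alcohol consumption} \tag{1} \]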
This model tested the hypothesis for overall genotype effect, and in addition, it was compared against an identical model with an ethnicity x genotype interaction term added, and the significance of this interaction term was evaluated using a Chi-square test comparing the deviances of the two models.
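The likelihood comparison described above can be sketched in a deliberately simplified setting. The sketch below assumes a single homogeneous Gaussian variance and no covariates, which is not the heterogeneous-variance generalized least-squares model used in the study; it only shows how a three-level genotype factor is tested against a null model with a Chi-square test on 2 degrees of freedom. Function names are illustrative.

```python
import math


def rss(values, groups):
    """Residual sum of squares after fitting one mean per group."""
    by_group = {}
    for value, group in zip(values, groups):
        by_group.setdefault(group, []).append(value)
    total = 0.0
    for members in by_group.values():
        mean = sum(members) / len(members)
        total += sum((v - mean) ** 2 for v in members)
    return total


def genotype_lrt(log_hdl, genotypes):
    """Likelihood-ratio test of a three-level genotype factor (2 df).

    Assumes all three genotype classes are present and that the full
    model leaves a nonzero residual.
    """
    n = len(log_hdl)
    rss_null = rss(log_hdl, ["all"] * n)      # model without a genotype term
    rss_full = rss(log_hdl, genotypes)        # one mean per genotype class
    stat = n * math.log(rss_null / rss_full)  # difference in deviance
    p_value = math.exp(-stat / 2)             # Chi-square survival function, 2 df
    return stat, p_value


# Toy data (made up for illustration only).
log_hdl = [3.70, 3.80, 3.75, 3.90, 3.85, 3.95, 4.00, 4.10, 4.05]
genotypes = list("AAABBBCCC")
stat, p_value = genotype_lrt(log_hdl, genotypes)
```

The closed form exp(-x/2) is exact only for 2 degrees of freedom; a test of the ethnicity x genotype interaction would need the survival function for the appropriate number of added parameters.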

The hypothesis tests were validated with a permutation test. Specifically, because many of the SNPs tested were in high LD with each other (and hence were far from independent), multiple-testing adjustment was performed by comparing the rank-ordered hypothesis test results against 5,000 permutations in which sample IDs were permuted within each ethnicity and then remerged to the full set of genotype values, thus preserving both the underlying LD structure and the explanatory covariates as they related to HDL-C while still simulating the null hypothesis for genotype effects. The adjusted P value for each SNP is the maximum (worst) of the false discovery rate so calculated and the point estimate of its P value from the individual permutation test.
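A minimal sketch of a stratified permutation test follows. It shuffles phenotype values only among samples of the same ethnicity, which mirrors the idea of permuting sample IDs within each ethnicity, but it uses a toy statistic and omits the covariates and false-discovery-rate step of the actual analysis. All names and the add-one P value estimate are assumptions.

```python
import random


def permute_within_strata(values, strata, rng):
    """Shuffle values among samples that share the same stratum label."""
    permuted = list(values)
    positions = {}
    for index, label in enumerate(strata):
        positions.setdefault(label, []).append(index)
    for indices in positions.values():
        shuffled = [values[i] for i in indices]
        rng.shuffle(shuffled)
        for index, value in zip(indices, shuffled):
            permuted[index] = value
    return permuted


def abs_mean_difference(values, carrier):
    """Toy statistic: |mean(carriers) - mean(non-carriers)|."""
    yes = [v for v, c in zip(values, carrier) if c]
    no = [v for v, c in zip(values, carrier) if not c]
    return abs(sum(yes) / len(yes) - sum(no) / len(no))


def permutation_p_value(values, carrier, strata, n_perm=5000, seed=0):
    """Empirical P value from n_perm within-stratum permutations."""
    rng = random.Random(seed)
    observed = abs_mean_difference(values, carrier)
    hits = 0
    for _ in range(n_perm):
        shuffled = permute_within_strata(values, strata, rng)
        if abs_mean_difference(shuffled, carrier) >= observed:
            hits += 1
    # Add-one estimate so the reported value is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

With 5,000 permutations, this estimate cannot fall below 1/5,001, which is why very small P values can only be reported as an upper bound.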

In generating effects plots, the Asian population was merged with the African and European populations, then the entire data set was fit to a null model (i.e., one without a genotype term), and the residuals were plotted by ethnicity and genotype using box and whisker plots. Genotypes are denoted by A, B, or C, with A denoting the wild-type homozygotes (as defined by empirical observation of the pooled population), B denoting the heterozygotes, and C denoting the homozygotes in the minor allele, still based on empirical observation over all subjects. These definitions remained fixed across ethnicities, even if the MAF crossed the 50% barrier going from one ethnicity to another.
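The A/B/C coding can be sketched as follows, assuming biallelic SNPs and genotypes supplied as allele pairs; the function name and input format are illustrative only.

```python
from collections import Counter


def code_genotypes(genotypes):
    """Map allele pairs such as ("G", "T") to A/B/C using pooled allele counts."""
    counts = Counter(allele for pair in genotypes for allele in pair)
    major = counts.most_common(1)[0][0]  # most frequent allele in the pooled sample
    coded = []
    for first, second in genotypes:
        copies = (first == major) + (second == major)
        # 2 copies of the common allele -> A, 1 -> B (heterozygote), 0 -> C
        coded.append({2: "A", 1: "B", 0: "C"}[copies])
    return coded


pooled = [("G", "G"), ("G", "T"), ("T", "T"), ("G", "G")]
print(code_genotypes(pooled))  # ['A', 'B', 'C', 'A']
```

Because the common allele is fixed from the pooled sample, genotype C can be the most frequent class within an individual ethnicity when the MAF crosses 50% in that group.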


    RESULTS
Genomic structure and LD blocks
To gain maximal information about the genetic structure surrounding CETP, databases and the literature were searched for all SNPs and other polymorphisms. The region we examined included 15 kb upstream of the CETP gene, the 22-kb gene, and 13 kb downstream of the gene. Segments of this region have been resequenced in 10 to 200 individuals (12, 13, 15, 17–21), with the greatest focus on the promoter and exons. Many laboratories carried out the sequencing in multiple ethnic groups. Within the heavily sequenced regions, all common SNPs have been identified, but some introns and regions outside of the gene have not been as well characterized for variation.

Only two sets of genotype data published to date span the entire 50 kb region of interest: HapMap (http://www.hapmap.org/cgi-perl/gbrowse/hapmap20_B35/) and the SNPs published by Hinds et al. (13) (http://genome.perlegen.com/browser/index.html). Within this 50 kb region, there are 103 HapMap SNPs with nonzero MAF, including a subset of 87 with a frequency of >5% in at least one ethnic group. Similarly, there are 33 SNPs reported by Hinds et al. (13) (all nonzero MAF) in this region. Thirteen of these 33 SNPs are not genotyped in the HapMap set. Using the individual genotype data from the data sets for which they are available, an initial set of LD bins was determined as described (13). The resulting bins were compared across data sets to the extent possible to determine which bins could be combined. Because all of these data sets have a limited number of individuals and none includes complete sequence information over the entire region, many of the LD bins are poorly defined. To better understand the nature of these bins and to collapse them into a smaller set, SNPs were chosen from across the region for analysis in a large, multi-ethnic population. Genotype data from the original databases and the literature were generated using a variety of techniques. Some SNPs are not amenable to genotyping by particular technologies, and not all could be assayed by the SNPlex technology used here, preventing us from obtaining a complete set of genotypes. To the extent possible, some SNPs unique to each data set were included so that comparison across studies could be accomplished.

All SNPs that we genotyped were in Hardy-Weinberg equilibrium with P > 0.05, with the exception of one SNP each in individuals of European and African ancestry. Both of these SNPs had P > 0.03 and were thus within the expected range of normal variation for a study with this number of SNPs tested. Of the 103 published HapMap SNPs, we generated more in-depth data for 40 of them, including 38 with MAF > 5% in at least one population. Of the 33 Hinds et al. (13) SNPs, we generated data for 29 of them. By generating genotypes for thousands of individuals at SNPs drawn from each of these data sets, we are able to establish LD bins representing SNPs spanning all of these data sets. A summary of the allele frequency for each ethnicity for polymorphisms with MAF > 5% for which we have generated data (reported previously or here for the first time) is shown in Fig. 1. For completeness, uncommon SNPs and an additional 101 SNPs for which others have published individual genotype data are also included in supplementary Table I. Some SNPs have multiple dbSNP identifiers, and we have used the one chosen by the National Center for Biotechnology Information and listed alternative numbers. Even though the group size for some populations is relatively small (20 to 50 individuals), the minor allele frequencies are consistent across studies within the same ethnic group (see supplementary Table I).
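The text does not state which test was used to assess Hardy-Weinberg equilibrium. One common choice, shown here purely as an assumed illustration, is a Chi-square goodness-of-fit test with 1 degree of freedom on the three genotype counts.

```python
import math


def hardy_weinberg_p_value(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test (1 df) for Hardy-Weinberg proportions.

    Assumes a polymorphic SNP, so that every expected count is nonzero.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of the first allele
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return math.erfc(math.sqrt(stat / 2))  # Chi-square survival function, 1 df


print(hardy_weinberg_p_value(25, 50, 25))  # 1.0, counts match expectation exactly
```

Genotype counts that deviate strongly from expectation, such as a complete absence of heterozygotes, give a P value near zero.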


Fig. 1. Cholesteryl ester transfer protein (CETP) single nucleotide polymorphisms (SNPs). The dbSNP nomenclature for each SNP examined by us with minor allele frequency (MAF) > 5% in at least one population is listed in the first column. A more complete list of SNPs and additional data for all SNPs are provided in supplementary Table I. Trials [AC = ACCESS (for Atorvastatin Comparative Cholesterol Efficacy and Safety Study); PF = Pfizer Phase II cardiovascular trial; GC = Genomics Collaborative Incorporated (GCI) African-American cohort] in which individuals were genotyped are listed in the second column. Bin numbers for each ethnic group are listed in columns 3–5, with shaded blocks representing tagging SNPs identified using HapMap samples for individuals of European (column 3) and African (column 5) ancestry. P values for association with high density lipoprotein-cholesterol (HDL-C) generated using a generalized least-squares method and adjusting for covariates but not for multiple testing are provided in columns 6, 8, and 10, with the smallest P values listed as <0.00001 even though some are <10⁻¹⁰. False-discovery rates generated from a permutation analysis that better corrects for multiple testing, linkage disequilibrium (LD) structure, and violation of modeling assumptions are provided in columns 7, 9, and 11. Because only 5,000 permutations were done, the smallest P values are listed as <0.001, but these could be substantially smaller if more permutations were attempted. Of note, the permutation test also revealed the conservative nature of the underlying model, as many uncorrected point estimates for P values were more significant in the permutation test than in the original model (particularly among Caucasians); this is the reason why many of the adjusted false-discovery rates are actually lower than the unadjusted P values.
In columns 6–11, <x means the P value is >x/10 but not >x, unless x is one of the minimum listed values cited above, or unless x = 0.05, in which case the value is >0.01 but not >0.05. All SNPs were tested for association. In column 12, the exon/intron positions are provided. In column 13, the position relative to the start and end of transcription and SNPs that are located within the coding sequence is listed. The nucleotide position on chromosome 16 in Build 35 is listed in column 14. Columns 15–17 provide allele frequencies in the ACCESS trial.

 
An LD chart with individual R² values for all of the SNPs we genotyped in 2,458 individuals with European ancestry with MAF > 5% is shown in supplementary Fig. I. This includes 56 SNPs across 50 kb. Using Haploview, seven LD blocks are identified that span 32 kb and include 50 SNPs (Fig. 2). Over the same genomic region, 59 SNPs genotyped in HapMap have MAF > 5% in the 90-person CEU population. Thirty-one of these SNPs are identical to those we genotyped. With the HapMap SNPs, six LD blocks are identified with Haploview that cover 25 kb and include 43 SNPs. Although the blocks in the CEU and ACCESS populations align well for the most part, there are differences in both the number of blocks and their boundaries, primarily in the 5' region of the gene. Because of the much larger ACCESS population, many more SNPs are incorporated into the LD blocks, including three that were genotyped by HapMap but not placed in LD blocks. Thus, the large number of individuals genotyped allows many more SNPs to be placed in LD blocks, but the relevance of these blocks to the detailed genomic structure is also much more apparent, as perhaps best visualized by the "checkerboard" pattern of LD in block 7, 3 to 8 kb downstream of the CETP gene. In addition, most of the weak LD interactions that appear in the HapMap population disappear with the much larger ACCESS population.


Fig. 2. The dbSNP identifiers (column 1) and positions relative to the CETP gene (column 2) are shown for all SNPs with MAF > 5% that we genotyped in ACCESS or that are in HapMap for individuals of European ancestry. LD blocks as determined by Haploview are listed as 1 through 7, skipping 5 in HapMap, and are shown for the SNPs genotyped by us (column 3) or by HapMap (column 4). If 0 is listed, the SNP is not present in a block. If no number is listed, the SNP was not genotyped. Contiguous blocks are shaded light gray, and discordant areas are shaded dark gray.

 
LD bins
Initial determination of LD bins was carried out to select SNPs for genotyping, but these bins were redefined after the complete set of genotypes was obtained. Numbering of LD bins is arbitrary. We have used the same numbers across ethnicities where possible, but it is clear that the boundaries for these bins are not the same across ethnic groups. As noted previously by Hinds et al. (13), the LD bins are highly noncolinear, with significant interdigitation of SNPs in different bins.

When we define LD bins using our data, the SNPs grouped together are very consistent within an ethnic group compared with those generated by Hinds et al. (13) or using the same definitions with HapMap data, even though both sets had far fewer individuals. The only discrepancies found with either data set are with SNPs that are very close to either the MAF or R² cutoff in one population or the other. Otherwise, there is perfect agreement for LD bin composition.

Using the cutoffs of MAF > 5% and R² > 0.8 for LD bin determination and all available data, 102 SNPs can be placed in 49 bins for individuals of European ancestry. The most populated bin contains eight SNPs, and the largest span covered by a single bin is 10,731 bp. For HapMap samples, 40 tagging SNPs representing 59 SNPs are identified. Nearly identical bins are generated with only two HapMap tagging SNPs falling into the same bin as defined here. When the additional 43 SNPs not found in HapMap are added, only nine additional bins are required, confirming that there is a point of diminishing returns for genotyping but that genotyping at a frequency of greater than one SNP per kilobase still has the potential to generate useful data.
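How SNPs end up in interspersed bins can be illustrated with a greedy binning sketch. This is one common approach, assumed here for illustration; the text cites ref. 13 for the actual procedure. The SNP names and R² values below are hypothetical, and the input is assumed to contain only SNPs that already pass the MAF cutoff.

```python
def greedy_ld_bins(snps, r2, threshold=0.8):
    """Greedy binning: repeatedly pick the SNP that tags the most unbinned SNPs."""
    def linked(a, b):
        # Pairwise values may be stored under either key order; missing pairs count as 0.
        return r2.get((a, b), r2.get((b, a), 0.0)) > threshold

    unbinned = list(snps)
    bins = []
    while unbinned:
        # Tag SNP = the one in strong LD with the most remaining SNPs.
        tag = max(unbinned, key=lambda s: sum(linked(s, t) for t in unbinned if t != s))
        members = [tag] + [t for t in unbinned if t != tag and linked(tag, t)]
        bins.append(members)
        unbinned = [s for s in unbinned if s not in members]
    return bins


# Hypothetical SNPs listed in chromosomal order, with made-up pairwise values.
snps = ["s1", "s2", "s3", "s4"]
r2 = {("s1", "s3"): 0.95, ("s1", "s2"): 0.10, ("s2", "s4"): 0.30, ("s3", "s4"): 0.20}
print(greedy_ld_bins(snps, r2))  # [['s1', 's3'], ['s2'], ['s4']]
```

In this toy example, s1 and s3 share a bin even though s2 lies between them, which is the kind of noncolinear arrangement a linear haploblock model cannot represent. Note that bin members here need only be linked to the tag SNP, not to one another.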

For individuals of African ancestry, there are 97 SNPs in 66 bins. The most populated bin contains seven SNPs, and the largest spans 6,621 bp. The larger number of bins and shorter extent of DNA sequence covered by each bin in those of African-American versus European ancestry are similar to what has been observed previously (13). Our bins match well with the African Americans characterized by Hinds et al. (13) but poorly with the Nigerian HapMap data. This highlights the difficulties of comparing across populations and ensuring that ancestry is matched appropriately. We have not attempted a comparison of our Asian ancestry bins with others because of the small number of individuals genotyped.

Associations with HDL-C
In addition to characterizing the genomic structure surrounding CETP, we also wanted to determine how the association with HDL-C was superimposed on that structure. Among individuals of European ancestry, the strongest associations are clearly in the promoter but span a very broad region. We have extended coverage to within 3 kb of the neighboring upstream gene, HERPUD1, 15 kb from the CETP transcriptional start. Our finding that a SNP >10 kb away (rs9989419) from the start site is associated with HDL-C suggests that distal interactions may play a role in regulating CETP levels. However, much of this association appears to arise from LD with nearby SNPs. All of the SNPs most highly associated with HDL-C among individuals of European ancestry are in LD bin 8. LD bins 6 and 10 are interspersed in this part of the promoter region but are orders of magnitude less significantly associated with HDL-C based on point estimates (Fig. 1). Several singleton SNPs in the promoter are also associated with HDL-C, but not as strongly as those in bin 8. One of the most highly associated SNPs, rs183130 in bin 8, has a consistent effect across ethnic groups, as shown in Fig. 3. The sequence conservation for this region among primates and the sequence surrounding two other functional SNPs is shown in Fig. 4. In each region, the chimpanzee sequence is identical to the human sequence, whereas other species have up to several changes. For each of the putative transcription factor binding sites, some of these nonconserved positions would be predicted to affect protein binding.


Fig. 3. The plot depicts the ordered genotype of rs183130 for each ancestry on the horizontal axis [labeled as ancestry genotype, where the genotype is named A, B, or C for homozygotes in the most common allele (among pooled ancestries), heterozygotes, or homozygotes in the less common allele, respectively]. The vertical axis represents the values of log(HDL) after adjusting for nongenotypic covariates in the model. The boxes in the plot are bounded above and below by the 75th and 25th percentiles, respectively (the quartiles), and the error bars/whiskers extend an additional 1.5 times the interquartile range from either boundary of the box. Points outside this extension are plotted as outliers. The horizontal line inside the box denotes the median. The deviation from mean HDL-C for each of the rs183130 genotypes is shown for the three ethnicities examined.
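The box and whisker conventions in this caption can be reproduced with a short sketch. The quartile interpolation method is an assumption (plotting packages differ), and the data are made up.

```python
import statistics


def box_summary(values):
    """Quartiles, 1.5 x IQR whisker limits, and outliers for one box."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_limit = q1 - 1.5 * iqr   # lowest value a whisker may reach
    high_limit = q3 + 1.5 * iqr  # highest value a whisker may reach
    outliers = [v for v in values if v < low_limit or v > high_limit]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "low_limit": low_limit,
        "high_limit": high_limit,
        "outliers": outliers,
    }


summary = box_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(summary["median"], summary["outliers"])  # 5.5 [100]
```

Many plotting tools draw each whisker to the most extreme data point that lies within these limits rather than to the limit itself.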

 

Fig. 4. The sequence for 25 bp on either side of the three SNPs in transcription factor consensus sites is shown for humans and up to five other primates. Dusky titi is not shown for rs183130 because the sequence conservation was too low in that region. Each SNP is shown in reverse color, and putative transcription factor binding sites are shown above each sequence. The chimpanzee sequence was obtained from the University of California, Santa Cruz website (http://www.genome.ucsc.edu/cgi-bin/hgGateway?org=Chimp&db=panTro2), and all other sequences were obtained from the Lawrence Berkeley Laboratory website (http://pga.lbl.gov/cgi-bin/get_gene?id=131).

 
At the other end of the gene, the associations with HDL-C are not as strong as observed in the promoter region. Several nonsynonymous SNPs in this 3' region have been shown to be functional, with effects on CETP activity or stability. There may be functional SNPs in addition to the nonsynonymous SNPs. rs289748, which is >7,000 bp from the end of transcription, is associated with HDL-C in individuals of both European (P < 0.001) and African (P < 0.02) ancestry. This SNP is not tightly linked with other SNPs tested in either group. The functional source of this association is unknown.

All associations were tested for gender effects as well. Two SNPs in individuals of European ancestry and one SNP in individuals of African-American ancestry yielded gender-genotype interactions with P values between 0.03 and 0.05, not significant after correction for multiple testing. Interactions between SNPs were also tested. Because of the number of SNPs and comparisons involved, this was done only for individuals of European ancestry in the ACCESS trial. Four pairs of SNPs show strong, nonadditive interactions, even after correction for multiple testing. The most significant interaction, between rs12920974 and rs4783961, has an uncorrected P value of 1.7 × 10⁻⁸, remaining significant even after correction for 2,616 tests. Both of these SNPs are in the promoter region. The HDL-C means for each combination of genotypes are shown in Fig. 5. The other SNP pairs that remain significant after multiple testing correction all include rs7203286 (in the distal promoter region) in combination with rs820299 (intron 2), rs158477 (intron 9), or rs4783961 (promoter).
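The text does not say which correction was applied to the SNP-SNP interaction tests. As a worked check under the assumption of a Bonferroni correction, which is a conservative choice, the strongest interaction remains well below 0.05:

```python
def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted P value, capped at 1."""
    return min(1.0, p_value * n_tests)


adjusted = bonferroni(1.7e-8, 2616)
print(f"{adjusted:.2e}")  # 4.45e-05
```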


Fig. 5. Mean HDL-C values for each combination of genotypes for rs12920974 and rs4783961. Definitions for the box and whiskers are as described for Fig. 3.

 

    DISCUSSION
Low HDL-C is known to be a major risk factor for cardiovascular disease (reviewed in Ref. 22). HDL-C is affected by a variety of environmental factors such as alcohol intake, estrogen administration (23), and exercise (24) as well as by a host of genetic factors. The effect of CETP variation on HDL-C is robust (2) but can be obscured when small samples or particular SNPs are examined. For example, of the SNPs we examined, there are five within the CETP gene and one in the proximal promoter with minor allele frequencies of >10% that are not associated with HDL-C (P > 0.05) among the >2,400 individuals of European ancestry. If one were to look exclusively at these SNPs, one would mistakenly conclude that CETP is not associated with HDL-C levels.

All common CETP variants have, at most, modest effects on either CETP mass or activity. If the most significantly associated SNP, rs183130, is examined, the mean HDL-C for the common versus rare homozygote varies only from 46.3 to 49.7 mg/dl among individuals of European ancestry, a difference of <10%. In contrast, both gender and alcohol consumption have larger effects on HDL-C, with European females in the ACCESS trial having significantly higher HDL-C (54 mg/dl) than European males (44 mg/dl). Similarly, European males who consume >10 drinks per week have higher HDL-C (50.7 mg/dl) than those who consume none (41.9 mg/dl). Only rare, nonfunctional CETP variants have a large impact on HDL-C, and these also have a protective effect with respect to disease in large, prospective studies (25). Thus, the impact of common CETP SNPs can be readily observed on a population basis, but these are of little value when examining small numbers of individuals.
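The relative sizes of these effects can be compared with a short calculation using the mean HDL-C values quoted above; the helper name is illustrative.

```python
def percent_difference(low, high):
    """Relative increase from low to high, in percent."""
    return 100 * (high - low) / low


effects = {
    "rs183130 homozygotes (46.3 vs 49.7 mg/dl)": percent_difference(46.3, 49.7),
    "gender, European males vs females (44 vs 54 mg/dl)": percent_difference(44.0, 54.0),
    "alcohol, European males (41.9 vs 50.7 mg/dl)": percent_difference(41.9, 50.7),
}
for label, value in effects.items():
    print(f"{label}: {value:.1f}%")
```

The genotype contrast is roughly 7%, whereas the gender and alcohol contrasts are each above 20%.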

It is possible to overinterpret the effect of CETP on HDL-C and attribute any large change in HDL-C to CETP variants of modest functional significance, even when known environmental effects such as exercise are present (26). Even the high HDL-C induced by extreme exercise, such as running marathons (27), is not always protective for cardiovascular disease. Risk of sudden cardiac death is generally attributable to a variety of defects in cardiac structural and channel proteins, with >50% of such deaths attributed to hypertrophic cardiomyopathy (28), independent of CETP genotype.

When data for only a limited number of SNPs were available, it was most convenient to describe the genomic structure of CETP and other genes as a series of large haploblocks. Initial work was consistent in showing that a large number of SNPs in the promoter and 5' region of the gene were in LD with each other, whereas another set of SNPs in the 3' region constituted another haploblock (6, 10, 12). This simplified view of the gene was useful as a rough approximation but is not accurate when comparing SNPs supposedly in the same haploblock but really having little linkage. With much more data now available, it is clear that the more detailed approach of using LD bins or tagging SNPs from nonlinear parts of the genome is necessary for an accurate view. Although there are many approaches and definitions that can be used for defining LD bins and tagging SNPs, similar results are obtained across populations with the same ancestry. The bins defined by Hinds et al. (13) with only 24 individuals were nearly identical to those defined by us with >2,000 individuals.

Horne et al. (21) identified a set of tagging SNPs for CETP that overlap the SNPs examined here. Despite using a very different approach, many SNPs represented by their tagging SNPs fall into separate bins, as defined here. The overlap is not perfect, but the same overall picture of the genomic structure is generated. However, comparison of our data with this set of tagging SNPs also highlights the need to characterize extensive regions of DNA sequence for a given gene. Their most distal SNP examined was only 631 bp from the start of transcription; thus, the LD bins with the most significant associations with HDL-C were not tagged.

The promoter SNP at –629 has been shown to affect Sp1 binding in vitro, reporter activity in cells, and CETP mass levels in humans (7, 10, 11). In contrast, the TaqIB SNP (rs708272) is among the most extensively studied, but there has been no indication that it exerts any functional effect. Although TaqIB is not in the same LD bin as the –629 SNP, the two are in reasonably high LD (R² = 0.72) in individuals of European ancestry. Furthermore, TaqIB is in the same LD bin as other SNPs that span a large region of the gene, including SNPs in intron 2 (R² = 0.85), intron 5 (R² = 0.81), and intron 7 (R² = 0.91) in individuals of both European and Asian ancestry. Any of these SNPs could potentially have some functional effect, or the effect could arise from some other uncharacterized SNP in this 9-kb region. The TaqIB SNP is also associated with HDL-C in African Americans, but it is a singleton in terms of LD bins. Unlike individuals of European and Asian ancestry, in whom the most strongly linked SNPs are 3' to TaqIB, the SNPs most strongly linked to TaqIB among individuals of African ancestry are in the promoter, with rs183130 (R² = 0.68) and the variable number of tandem repeats (R² = 0.64) being in tightest LD. This suggests that the functional SNP(s) assessed when TaqIB was examined in Africans differ from the functional SNP(s) assessed in Asians and Europeans.
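The pairwise LD values quoted above are squared correlations between alleles at two SNPs. As a reminder of how that statistic behaves, the sketch below computes it from haplotype and allele frequencies using the standard definition. The frequencies are hypothetical and do not come from this study, and the sketch assumes phased haplotype frequencies are already known.

```python
def ld_r2(p_ab, p_a, p_b):
    """Squared allelic correlation between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2.
    p_a, p_b: frequencies of allele A and allele B (both strictly between 0 and 1).
    """
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Hypothetical frequencies, for illustration only.
print(ld_r2(0.40, 0.4, 0.4))  # alleles always co-occur: complete LD
print(ld_r2(0.36, 0.4, 0.4))  # alleles usually co-occur: intermediate LD
print(ld_r2(0.16, 0.4, 0.4))  # haplotype at its chance frequency: no LD
```

Values near 1 mean that genotyping one SNP gives nearly the same information as genotyping the other, which is why an association seen at a well-studied marker such as TaqIB may reflect a different, linked SNP.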

Even though the promoter –629 SNP has been shown to have functional effects, it is clear that other SNPs in the promoter are also independently associated with HDL-C. The most highly associated SNPs are those in LD bin 8. However, among individuals of European ancestry, there are seven SNPs spread over 6,500 bp in bin 8, six of which we genotyped. Each could be examined on its own for functionality, but that is a challenging and not always fruitful endeavor. The availability of results from multiple ethnic groups makes it possible to decrease the number of potentially functional SNPs by taking advantage of the different LD structures. All six of the bin 8 SNPs we examined in Europeans are highly associated with HDL-C. Among Asians, all five of the SNPs in the homologous bin are also associated with HDL-C. Among Africans, the seven SNPs from bin 8 in Europeans are split into four separate bins, and only one of them is associated with HDL-C, rs183130 at position –4502. Although it is possible that there are distinct functional SNPs in the different ethnic groups, the consistent results (Fig. 3) with this SNP suggest that rs183130 may be functional. Additional experiments will be necessary to confirm this.

When 11 promoter SNPs that are associated with HDL-C among individuals of European ancestry are scanned for transcription factor binding sites (29), the only one that results in a change is rs183130. When a G is present on the bottom strand (GGGATTCTCC), an 8:10 match to the consensus site for nuclear factor κB (GGGGYNNCCY) described by Ghosh, May, and Kopp (30) and a 10:10 match to the consensus site (GGGRDTYYCC) described by Liu et al. (31) are found. The alteration from G to A creates a mismatch in both consensus sequences. The Liu et al. (31) consensus is particularly interesting in that it also appears to bind members of the Sp1 family of proteins that have been shown to be important in regulating CETP at the proximal promoter SNPs at –629 and –38 (7, 11).
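The match counts given above can be reproduced by comparing the 10-base site against each consensus using the standard IUPAC ambiguity codes (R = A/G, Y = C/T, D = A/G/T, N = any base). The sketch below is not the Signal Scan program (29); it is a minimal position-by-position comparison. Because the text does not state which G in the site is the polymorphic base, the loop simply tries a G-to-A change at each G in turn.

```python
# Standard IUPAC nucleotide codes needed for the two consensus sequences.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "D": "AGT", "N": "ACGT",
}


def match_count(site, consensus):
    """Number of positions at which a site satisfies an IUPAC consensus."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return sum(base in IUPAC[code] for base, code in zip(site, consensus))


SITE_G = "GGGATTCTCC"  # bottom-strand sequence with the G allele (from the text)
GHOSH = "GGGGYNNCCY"   # consensus from Ghosh, May, and Kopp (30)
LIU = "GGGRDTYYCC"     # consensus from Liu et al. (31)

print(match_count(SITE_G, GHOSH), match_count(SITE_G, LIU))

# Try a G-to-A change at every G, since the variant position is not stated.
for i, base in enumerate(SITE_G):
    if base == "G":
        variant = SITE_G[:i] + "A" + SITE_G[i + 1:]
        print(i + 1, variant, match_count(variant, GHOSH), match_count(variant, LIU))
```

The G-allele site gives 8 of 10 and 10 of 10 matches, as stated. All three G bases in this site fall at positions where both consensus sequences require G, so a G-to-A change at any of them costs one match against each consensus.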

Large meta-analyses of some CETP SNPs have been published, and these show a consistent but variable association with HDL-C, CETP mass/activity, and other phenotypes (2, 3). Results in studies with large numbers of individuals with European ancestry are in agreement with our results (supplementary Table I). Promoter SNPs rs12149545, rs4783961, and rs1800775 are significantly associated with HDL-C (6, 8, 32, 33). Similarly, results with large numbers of individuals of Asian ancestry found rs3764261 and the VNTR highly associated with HDL-C (9), as we found. All of these studies have focused on the promoter sequence within 3,300 bp of the transcriptional start site. Most functional characterization of the promoter has been restricted to the proximal 3,000 bp (34), with only limited analysis beyond that region (35).

In addition to the univariate analyses, nonadditive interactions between SNPs may also be important, as seen with other lipid-related genes (36). Our observation that the association of some promoter SNPs (rs4783961 at –971) is nonadditive with other SNPs mirrors previous findings (33) in which that SNP's function in vitro was dependent on other nearby SNPs. The fragment tested in vitro was only 1,707 bp long, so it did not include the SNP we found most significant, rs12920974 at –2,940, but the concordance of results clearly shows the complexity of genetic modulation of transcription.

The data provided here yield a means of comparing results across many studies by determining which SNPs are likely to yield similar information and which will not. The extensive genotyping also allows other investigators to compare their populations with an independent set to test whether differences in allele frequencies found in a case-control study might be attributable to problems with the control population rather than a true association. As stated above, the SNPs tested here are all in Hardy-Weinberg equilibrium, unlike some control populations described elsewhere. By examining the CETP gene in detail and across populations, we are able to predict which SNPs are likely to be functional. This approach is generalizable to other genes in which robust associations are found.
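Investigators who want to check a control population for Hardy-Weinberg equilibrium can do so from genotype counts alone. The sketch below uses a chi-square goodness-of-fit test with one degree of freedom, which is one common choice and not necessarily the test used in this study. The genotype counts are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg proportions for one biallelic SNP.

    Takes counts of the two homozygotes and the heterozygote and returns
    (chi-square statistic, p-value). Assumes both alleles are observed.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical genotype counts, for illustration only.
print(hwe_chi_square(490, 420, 90))   # close to Hardy-Weinberg proportions
print(hwe_chi_square(550, 300, 150))  # deficit of heterozygotes
```

A very small p-value in a control group, as in the second call, is a warning sign of genotyping error or population stratification and is worth checking before interpreting a case-control difference in allele frequencies.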

Manuscript received August 18, 2006 and in revised form November 10, 2006.


    REFERENCES
 
  1. Brown, M. L., A. Inazu, C. B. Hesler, L. B. Agellon, C. Mann, M. E. Whitlock, Y. L. Marcel, R. W. Milne, J. Koizumi, H. Mabuchi, et al. 1989. Molecular basis of lipid transfer deficiency in a family with increased high-density lipoproteins. Nature. 342: 448–451.

  2. Boekholdt, S. M., and J. F. Thompson. 2003. Natural genetic variation as a tool in understanding the role of CETP in lipid levels and disease. J. Lipid Res. 44: 1080–1093.

  3. Boekholdt, S. M., J-A. Kuivenhoven, G. K. Hovingh, J. W. Jukema, J. J. P. Kastelein, and A. van Tol. 2004. CETP gene variation: relation to lipid parameters and cardiovascular risk. Curr. Opin. Lipidol. 15: 393–398.

  4. Boekholdt, S. M., F. M. Sacks, J. W. Jukema, J. Shepherd, D. J. Freeman, A. D. McMahon, F. Cambien, V. Nicaud, G. J. de Grooth, P. J. Talmud, et al. 2005. Cholesteryl ester transfer protein TaqIB variant, high-density lipoprotein cholesterol levels, cardiovascular risk, and efficacy of pravastatin treatment—individual patient meta-analysis of 13,677 subjects. Circulation. 111: 278–287.

  5. Lloyd, D. B., M. E. Lira, L. S. Wood, L. K. Durham, T. B. Freeman, G. Preston, X. Qiu, E. Sugarman, P. Bonnette, A. Lanzetti, et al. 2005. Cholesteryl ester transfer protein variants have differential stability but uniform inhibition by torcetrapib. J. Biol. Chem. 280: 14918–14922.

  6. Corbex, M., O. Poirier, F. Fumeron, D. Betouille, A. Evans, J. B. Ruidavets, D. Arveiler, G. Luc, L. Tiret, and F. Cambien. 2000. Extensive association analysis between the CETP gene and coronary heart disease phenotypes reveals several putative functional polymorphisms and gene-environment interaction. Genet. Epidemiol. 19: 64–80.

  7. Dachet, C., O. Poirier, F. Cambien, M. J. Chapman, and M. Rouis. 2000. New functional promoter polymorphism, CETP/-629, in cholesteryl ester transfer protein (CETP) gene related to CETP mass and high density lipoprotein cholesterol levels: role of Sp1/Sp3 in transcriptional regulation. Arterioscler. Thromb. Vasc. Biol. 20: 507–515.

  8. Le Goff, W., M. Guerin, V. Nicaud, C. Dachet, G. Luc, D. Arveiler, J-B. Ruidavets, A. Evans, F. Kee, C. Morrison, et al. 2002. A novel cholesteryl ester transfer protein promoter polymorphism (–971G/A) associated with plasma high-density lipoprotein cholesterol levels: interaction with the TaqIB and –629C/A polymorphisms. Atherosclerosis. 161: 269–279.

  9. Lu, H., A. Inazu, Y. Moriyama, T. Higashikata, M. Kawashiri, W. Yu, Z. Huang, T. Okamuri, and H. Mabuchi. 2003. Haplotype analyses of cholesteryl ester transfer protein gene promoter: a clue to an unsolved mystery of TaqIB polymorphism. J. Mol. Med. 81: 246–255.

  10. Thompson, J. F., L. K. Durham, M. E. Lira, C. Shear, and P. M. Milos. 2005. CETP polymorphisms associated with HDL cholesterol may differ from those associated with cardiovascular disease. Atherosclerosis. 181: 45–53.

  11. Thompson, J. F., D. B. Lloyd, M. E. Lira, and P. M. Milos. 2004. CETP promoter SNPs in Sp1 binding sites affect transcription and are associated with HDL cholesterol. Clin. Genet. 66: 223–228.

  12. Thompson, J. F., M. E. Lira, L. K. Durham, R. W. Clark, M. J. Bamberger, and P. M. Milos. 2003. Polymorphisms in the CETP gene and association with CETP mass and HDL levels. Atherosclerosis. 167: 195–204.

  13. Hinds, D. A., L. L. Stuve, G. B. Nilsen, E. Halperin, E. Eskin, D. G. Ballinger, K. A. Frazer, and D. R. Cox. 2005. Whole-genome patterns of common DNA variation in three human populations. Science. 307: 1072–1079.

  14. Lira, M. E., D. B. Lloyd, S. Hallowell, P. M. Milos, and J. F. Thompson. 2004. Highly polymorphic repeat region in the CETP promoter induces unusual DNA structure. Biochim. Biophys. Acta. 1684: 38–45.

  15. Lloyd, D. B., J. M. Reynolds, M. T. Cronan, S. P. Williams, M. E. Lira, L. S. Wood, D. R. Knight, and J. F. Thompson. 2005. Novel variants in human and monkey CETP. Biochim. Biophys. Acta. 1737: 69–75.

  16. Ballantyne, C. M., T. C. Andrews, J. A. Hsia, J. H. Kramer, and C. Shear. 2001. Correlation of non-high-density lipoprotein cholesterol with apolipoprotein B: effect of 5 hydroxymethylglutaryl coenzyme A reductase inhibitors on non-high-density lipoprotein cholesterol levels. Am. J. Cardiol. 88: 265–269.

  17. Cargill, M., D. Altshuler, J. Ireland, P. Sklar, K. Ardlie, N. Patil, N. Shaw, C. R. Lane, E. P. Lim, N. Kalyanaraman, et al. 1999. Characterization of single-nucleotide polymorphisms in coding regions of human genes. Nat. Genet. 22: 231–238.

  18. Morabia, A., E. Cayanis, M. C. Costanza, B. M. Ross, M. S. Flaherty, G. B. Alvin, K. Das, and T. C. Gilliam. 2003. Association of extreme blood lipid profile phenotypic variation with 11 reverse cholesterol transport genes and 10 non-genetic cardiovascular disease risk factors. Hum. Mol. Genet. 12: 2733–2743.

  19. Chasman, D. I., D. Posada, L. Subrahmanyan, N. R. Cook, V. P. Stanton, and P. M. Ridker. 2004. Pharmacogenetic study of statin therapy and cholesterol reduction. J. Am. Med. Assoc. 291: 2821–2827.

  20. Stanssons, P., M. Zabeau, G. Meersseman, G. Remes, Y. Gansemans, N. Storm, R. Hartmer, C. Honisch, C. P. Rodi, S. Bocker, et al. 2004. High-throughput MALDI-TOF discovery of genomic sequence polymorphisms. Genome Res. 14: 126–133.

  21. Horne, B. D., J. F. Carlquist, L. A. Cannon-Albright, J. B. Muhlestein, J. T. McKinney, M. J. Kolek, J. L. Clarke, J. L. Anderson, and N. J. Camp. 2006. High-resolution characterization of linkage disequilibrium structure and selection of tagging single nucleotide polymorphisms: application to the cholesteryl ester transfer protein gene. Ann. Hum. Genet. 70: 524–534.

  22. Linsel-Nitschke, P., and A. R. Tall. 2005. HDL as a target in the treatment of atherosclerotic cardiovascular disease. Nat. Rev. Drug Discov. 4: 193–205.

  23. Lamon-Fava, S. 2002. High-density lipoproteins: effects of alcohol, estrogen, and phytoestrogens. Nutr. Rev. 60: 1–7.

  24. Kelley, G. A., K. S. Kelley, and Z. V. Tran. 2004. Aerobic exercise and lipids and lipoproteins in women: a meta-analysis of randomized controlled trials. J. Womens Health. 13: 1148–1164.

  25. Curb, J. D., R. D. Abbott, B. L. Rodriguez, K. Masaki, R. Chen, D. S. Sharp, and A. R. Tall. 2004. A prospective study of HDL-C and cholesteryl ester transfer protein gene mutations and the risk of coronary heart disease in the elderly. J. Lipid Res. 45: 948–953.

  26. Sirtori, C. R., L. Calabresi, D. Baldassarre, G. Franceschini, A. B. Cefalu, and M. Averna. 2006. CETP levels rather than polymorphisms as markers of coronary risk: healthy athlete with high HDL-C and coronary disease—effectiveness of probucol. Atherosclerosis. 186: 225–227.

  27. Hartung, G. H., J. P. Foreyt, R. E. Mitchell, I. Vlasek, and A. M. Gotto, Jr. 1980. Relation of diet to high-density-lipoprotein cholesterol in middle-aged marathon runners, joggers, and inactive men. N. Engl. J. Med. 302: 357–361.

  28. Firoozi, S., S. Sharma, and W. J. McKenna. 2003. Risk of competitive sport in young athletes with heart disease. Heart. 89: 710–714.

  29. Prestridge, D. S., and G. Stormo. 1993. Signal Scan 3.0. New database and program features. Comput. Appl. Biosci. 9: 113–115.

  30. Ghosh, S., M. J. May, and E. B. Kopp. 1998. NF-κB and Rel proteins: evolutionarily conserved mediators of immune responses. Annu. Rev. Immunol. 16: 225–260.

  31. Liu, A., P. W. Hoffman, W. Lu, and G. Bai. 2004. NF-κB site interacts with Sp factors and up-regulates the NR1 promoter during neuronal differentiation. J. Biol. Chem. 279: 17449–17458.

  32. Klerkx, A. H. E. M., M. W. T. Tanck, J. J. P. Kastelein, H. O. F. Molhuizen, J. W. Jukema, A. H. Zwinderman, and J. A. Kuivenhoven. 2003. Haplotype analysis of the CETP gene: not Taq1B, but the closely linked –629C->A polymorphism and a novel promoter variant are independently associated with CETP concentration. Hum. Mol. Genet. 12: 111–123.

  33. Frisdal, E., A. H. E. M. Klerkx, W. Le Goff, M. W. T. Tanck, J-P. Lagarde, J. W. Jukema, J. J. P. Kastelein, M. J. Chapman, and M. Guerin. 2005. Functional interaction between –629C/A, –971G/A, and –1337C/T polymorphisms in the CETP gene is a major determinant of promoter activity and plasma CETP concentration in the REGRESS study. Hum. Mol. Genet. 14: 2607–2618.

  34. LeGoff, W., M. Guerin, L. Petit, M. J. Chapman, and J. Thillet. 2003. Regulation of human CETP expression: role of Sp1 and Sp3 transcription factors at promoter sites –690, –629, and –37. J. Lipid Res. 44: 1322–1331.

  35. Williams, S., L. Hayes, L. Elsenboss, A. Williams, C. Andre, R. Abramson, J. F. Thompson, and P. M. Milos. 1997. Sequencing of the cholesteryl ester transfer protein 5' regulatory region using artificial transposons. Gene. 197: 101–107.

  36. Hamon, S. C., J. H. Stengard, A. G. Clark, V. Salomea, E. Boerwinkle, and C. F. Sing. 2004. Evidence for non-additive influence of single nucleotide polymorphisms within the apolipoprotein E gene. Ann. Hum. Genet. 68: 521–535.

