Novel gene-by-environment interactions: APOB and NPC1L1 variants affect the relationship between dietary and total plasma cholesterol.

Cardiovascular disease (CVD) is the leading cause of death in developed countries. Plasma cholesterol level is a key risk factor in CVD pathogenesis. Genetic and dietary variation both influence plasma cholesterol; however, little is known about dietary interactions with genetic variants influencing the absorption and transport of dietary cholesterol. We sought to determine whether gut-expressed variants predicting plasma cholesterol differentially affected the relationship between dietary and plasma cholesterol levels in 1,128 subjects (772 and 356 in the discovery and replication cohorts, respectively). Four single nucleotide polymorphisms (SNPs) within three genes (APOB, CETP, and NPC1L1) were significantly associated with plasma cholesterol in the discovery cohort. These were subsequently evaluated for gene-by-environment (GxE) interactions with dietary cholesterol for the prediction of plasma cholesterol, with significant findings tested for replication. Novel GxE interactions were identified and replicated for two variants: rs1042034, an APOB Ser4338Asn missense SNP, and rs2072183 (in males only), a synonymous NPC1L1 SNP in linkage disequilibrium with SNPs 5′ of NPC1L1. This study identifies novel GxE and gender interactions, implying that differential gut absorption underlies the variant associations with plasma cholesterol. These GxE interactions may account for part of the “missing heritability” not explained by genetic associations.

… cholesterol, thereby influencing the potential benefit of lifestyle changes in the prevention of CVD. Utilizing the Carotid Lesion Epidemiology and Risk (CLEAR) CAAD case-control cohort, we investigated whether there was any evidence of single nucleotide polymorphism (SNP)-by-dietary cholesterol interactions in the prediction of total cholesterol levels, to better understand the potential influence of genetic variation on the handling of dietary cholesterol and the potential downstream pathogenesis of CVD.

Ethics statement
Institutional Review Boards at the University of Washington, Virginia Mason Medical Center, and Veterans Affairs Puget Sound Health Care approved the study. Written informed consent was obtained from all participants.

Sample
The study population for these analyses consisted of 1,128 subjects (772 in the discovery subset and 356 in the replication subset) from the previously described CLEAR study (8–13) who had both diet and genetic data. The discovery set had Illumina HumanCVD BeadChip (14) genotypes, while the replication subset was genotyped later for SNPs of interest, as outlined in the Genotyping section. This cohort includes 358 CAAD cases, 639 controls, and 131 subjects with other phenotypes, including moderate carotid artery obstruction (15–49% by ultrasound), as well as coronary artery and peripheral artery disease. Only Caucasian subjects were analyzed because minority samples were under-represented in this primarily Seattle Veterans-based cohort. Ancestry was confirmed using STRUCTURE with three ancestral groups (15). Medication use, including statins, oral hypoglycemics, and insulin injection, was ascertained from pharmacy and medical records, as well as subject report. A subject was classified as diabetic if a) they were taking an oral hypoglycemic or insulin, or b) their hemoglobin A1C was ≥6.5%. CLEAR cohort exclusion criteria included familial hypercholesterolemia, total fasting cholesterol >400 mg/dl, a hypocoagulable state and/or the use of Coumadin, post-organ transplant status, or the inability to consent. Descriptive statistics for the cohort are presented in Table 1.

Survey methods
Subjects were asked to complete the standardized Harvard food frequency questionnaire developed by The Health Professionals Follow-Up Study (https://regepi.bwh.harvard.edu/health/nutrition.html) at enrollment. The survey asked about a) the average frequency of intake over the previous year of specified portions of 131 foods, and b) the use of vitamin and mineral supplements, including dose and duration of use. Questions regarding the brand of multivitamins and cereal used were asked to clarify the quantities of specific vitamin supplementation. Surveys were returned to the Harvard School of Public Health and Brigham and Women's Hospital, where they underwent quantitative analyses to return average daily intakes of individual nutrients (e.g., total fat, total cholesterol, etc.) and vitamins calculated from food frequency data. Natural log-transformed dietary cholesterol intake from the food frequency questionnaire was used as the variable for interaction with SNP genotypes in this study. Subjects were excluded from analyses if a) their caloric intake was not between 800 and 4,200 kcal/day, or b) their surveys had ≥70 blank items of a total of 131 questions. This food frequency survey has been validated against two 1-week diet records taken approximately six months apart (16). In addition, the inferred intake of dietary fats has been validated against total lipid levels (17, 18).

Total lipid levels
Lipid measurements were performed on fasting plasma. Standard enzymatic methods were used to determine total cholesterol levels (19, 20). We estimated "pre-therapy" total cholesterol levels in statin users. These estimates were based on 41 individuals in the CLEAR cohort who had repeat lipid measures before and after initiation of statin pharmacotherapy, in whom total cholesterol decreased by 25.1% (10). Using this previously calculated value, we imputed pretreatment total cholesterol levels in statin users (n = 425) within the CLEAR cohort by increasing measured cholesterol levels by 25.1%. A comparison of the pre- and post-statin adjustment total cholesterol levels is presented in Table 1. These values from a subset of the CLEAR cohort are consistent with those found in a recent meta-analysis of 134,537 subjects on statins (21).

Genotyping
SNPs for the 772 subjects comprising the discovery set were genotyped using the Illumina HumanCVD BeadChip (14). Duplicate genotyping of 34 individuals showed 99.7% concordance in calls. The additional 356 subjects comprising the replication set were genotyped by TaqMan for the SNPs in Table 3, with duplicate genotyping of 12 individuals showing 99.9% concordance. SNPs were filtered out if they had a minor allele frequency <1%, a Hardy-Weinberg equilibrium P < 10⁻⁴, or a call rate <97%.

SNP selection
Because of the power considerations related to multiple testing, we limited testing for gene-by-environment (GxE) interactions to SNPs that were a) identified as predictive of total cholesterol by Teslovich et al. (7); b) also predictive of total cholesterol in our cohort; c) expressed in the stomach, small intestine, or pancreas; and d) involved in cholesterol handling (tracked by SNP in supplementary Table I and supplementary Fig. IV). Teslovich et al. (7) identified SNPs at 53 genetic loci that were significant at the genome-wide level for association with total cholesterol levels (hereafter referred to as "lead SNPs"). We identified relevant SNPs in our Illumina HumanCVD BeadChip genotype data by using SNAP Proxy Search to find SNPs in strong linkage disequilibrium (LD) (r² > 0.80) (22) in the 1000 Genomes database with the lead SNPs reported by Teslovich et al. We identified such SNPs in our data for only 19 of their 53 loci. Because of the inclusion of multiple APOB SNPs and the physiologic relevance of APOB to cholesterol absorption and total cholesterol, we also followed up on one APOB SNP that was primarily reported to be associated with variation in high-density lipoprotein (HDL) and triglycerides. Although the effect of this SNP on total cholesterol is not addressed in their paper, Teslovich and Kathiresan (personal communication) provided us with the P value for the total cholesterol effect, which was 1.75 × 10⁻¹⁹. This added APOB SNP (rs1042034) was not in LD with the other identified APOB SNP (r² = 0.094). We next used Tissue-Specific Gene Expression and Regulation (TIGER) (23) to examine the tissue expression of these 22 loci and exclude genes not expressed in the stomach, small intestine, or pancreas, with separate confirmation of nonhepatic gastrointestinal gene expression through the European Molecular Biology Laboratory (EMBL) Nucleotide Sequence Database (24). We excluded genes for transcription factors. In sum, seven SNPs from six genes [ABCG8, APOB, APOE, CETP, Niemann-Pick C1-like 1 (NPC1L1), and PCSK9] were considered (Table 2). We then tested these seven SNPs for their main effects on total cholesterol in the discovery cohort. Four SNPs (two from APOB, one each from CETP and NPC1L1) were significant at the P = 0.05 level and were carried forward for GxE interaction testing. For SNPs with a GxE interaction P ≤ 0.05, we performed sensitivity analyses by gender, CAAD case-control status, and statin use. We then considered other SNPs in LD with these SNPs to determine whether they might be better predictors than the original SNP.

Statistical analyses
Statistical analyses were conducted using the R statistical language (http://www.r-project.org/). Extreme outliers for both dietary and total cholesterol were Winsorized to values three standard deviations from the mean (25, 26). The positively skewed distribution of dietary cholesterol was adjusted using a natural log transformation. Hereafter, "dietary cholesterol" refers to ln(dietary cholesterol).
To adjust for the effects of nonstatin confounding covariates on total cholesterol, we first used linear regression to adjust total cholesterol levels for age, sex, and diabetes status, thereby obtaining residual total cholesterol values. These residuals were then used as the outcome variable in further linear regressions testing for interactions between each of the four SNPs and dietary cholesterol intake. SNPs were considered under an additive genetic model, coded as 0, 1, and 2, and multiplicative interactions were modeled. Because of power considerations, no Bonferroni correction was applied; rather, we accepted a nominal P ≤ 0.05 and relied on replication to guard against false positives. Genotype-dependent relationships between dietary and adjusted total cholesterol levels were plotted using the R statistical language.
The equations in the Fig. 1 legend are the genotype-specific regression equations for predicted residual total cholesterol at a given value of dietary cholesterol. They were obtained by simplifying our interaction equation,
$$\widehat{\text{TC}}_{\text{res}} = \alpha + \beta_1(\text{SNP}) + \beta_2(\text{dietary cholesterol}) + \beta_3(\text{SNP} \times \text{dietary cholesterol}),$$
such that genotype classes 0, 1, and 2 have intercepts α, α + β₁, and α + 2β₁ and regression coefficients β₂, β₂ + β₃, and β₂ + 2β₃, respectively; that is, for genotype g,
$$\widehat{\text{TC}}_{\text{res}} = (\alpha + g\beta_1) + (\beta_2 + g\beta_3)(\text{dietary cholesterol}), \qquad g \in \{0, 1, 2\}.$$
Total cholesterol residuals were relocated to the mean value of total cholesterol by the addition of 207.97 to aid in interpretation of the interaction plots.

RESULTS
Demographic, clinical, dietary, and total cholesterol measures for the cohort are presented in Table 1. The pooled sample comprised 1,128 subjects with a mean age of 66.1 years, of whom 67.4% were male. Subjects with existing Illumina CVD chip genotype data defined the discovery subset; the replication subset consisted of the remaining subjects with food frequency questionnaire data, whom we then genotyped for SNPs that were significant in GxE discovery analyses. The independent discovery (n = 772) and replication (n = 356) subsets differed significantly: the replication subset was 65.7% female, whereas the discovery subset was only 17.4% female (P < 0.001). The overall sample included 358 cases with >50% carotid stenosis, 57 subjects with 15–49% carotid stenosis, and 639 controls with <15% stenosis bilaterally. The discovery subset had a significantly higher proportion of cases than the replication subset (39.9% vs. 14.0%, P < 0.001). The overall rates of diabetes and statin use were 12.4% and 37.7%, respectively, and statin use was significantly more common in the discovery than in the replication subset (42.7% vs. 25.0%, P < 0.001). Dietary and total cholesterol were highly correlated in the pooled cohort (Pearson pair-wise correlation coefficient r = 0.80, P < 0.001). Residual total cholesterol differed by statin use (supplementary Fig. I) but not by CAAD case-control status or sex (supplementary Figs. II and III, respectively).
The SNP selection process is described in supplementary Table I. The lead SNPs spanning the six candidate genes studied [identified by Teslovich et al. (7) and then selected using the criteria outlined in the "SNP selection" section of the Methods] are provided in Table 2. The two lead SNPs within APOB are not in strong LD with each other (r² = 0.094). APOB, the apolipoprotein B structural locus, and NPC1L1 each had multiple proxy SNPs genotyped on the Illumina HumanCVD BeadChip that were in strong LD with the lead SNPs identified by Teslovich et al. [(7) and unpublished data] within the discovery subset of CLEAR (with APOB SNP rs1042034: rs676210 and rs2678379, r² = 0.996; rs673548, r² = 0.992; with NPC1L1 SNP rs2072183: rs17725246, r² = 0.848).
To investigate the association between the seven candidate SNPs and total cholesterol, we used linear regression adjusting for age, sex, and diabetes status. Four lead SNPs spanning three genes (APOB, CETP, and NPC1L1) were significant at the P < 0.05 level for association with adjusted total cholesterol levels (Table 2). Because our hypothesis concerns the GxE interaction rather than the marginal SNP effect, and because these SNPs were selected based on established associations with total cholesterol, we did not apply a Bonferroni correction when selecting these SNPs for interaction testing. Because of power concerns, only SNPs with marginal effects on total cholesterol were carried forward to GxE interaction testing.
We used linear regression to investigate possible GxE interactions, first adjusting the outcome of total cholesterol for age, sex, and diabetes status. A second linear regression then modeled the interaction between the genotypes of the four SNPs and dietary cholesterol intake on the outcome of residual total cholesterol (see Statistical analyses). Results of the SNP interactions with dietary cholesterol on total cholesterol in the discovery subset of CLEAR are presented in Table 2. Of the four lead SNPs tested, only an APOB missense SNP, rs1042034 (P = 0.0174), and an NPC1L1 synonymous SNP, rs2072183 (P = 0.0125), had GxE interactions with dietary cholesterol that were significant at the P = 0.05 level for prediction of adjusted total cholesterol. These two SNPs and their proxy SNPs in high LD were carried forward for sensitivity and replication testing.
Because power to detect higher-order interactions with CAAD case-control status, statin use, or sex was expected to be limited, we instead performed sensitivity analyses stratified by CAAD case-control status, statin use, and sex. These analyses did not show differences in the direction or general magnitude of effects (β coefficients) for the SNP genotype, dietary cholesterol, or the dietary cholesterol-by-SNP interaction for APOB SNP rs1042034. However, in the discovery subset of CLEAR, the magnitudes of the NPC1L1 SNP effects in males and females were strikingly different, and the GxE interaction was not significant in females alone. For rs17725246, the interaction β was 26.06 in males and 0.742 in females, and the GxE interaction test gave P = 0.0009 in males versus P = 0.982 in females (Table 3). We reanalyzed the marginal effect of the SNP on total cholesterol by gender and found that NPC1L1 SNP rs17725246 marginally predicted total cholesterol in males only (SNP effect β = 5.2, P = 0.054) and not in females (β = −0.243, P = 0.941) in this cohort. A formal test of a SNP-by-sex interaction in the prediction of total cholesterol was not significant (β = −5.258, P = 0.24); however, we identified a report of this gender interaction in the literature (27). Although underpowered, we modeled a SNP-by-dietary cholesterol-by-sex interaction in the pooled analyses and found that the rs17725246-by-sex-by-dietary cholesterol interaction term was suggestive given the limited power (β = −25.318, P = 0.096). For rs2072183, which is in high LD with rs17725246 (r² = 0.72 in the pooled CLEAR cohort), the SNP-by-dietary cholesterol-by-sex interaction test gave P = 0.17. Given this indication, the prior report of sex interactions, and the LD between the NPC1L1 SNPs, we decided a priori to perform sex-stratified replication analyses for these NPC1L1 SNPs.
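The two-step model described under Statistical analyses can be sketched in plain Python as below. This is an illustrative sketch, not the authors' R code: the helper names (`ols`, `winsorize`, `residualize`, `gxe_fit`, `gxe_pipeline`) are hypothetical, least squares is solved directly from the normal equations, and winsorizing dietary cholesterol before the log transform is an assumed order of operations. It clips values at three standard deviations, log-transforms dietary cholesterol, removes age, sex, and diabetes effects from total cholesterol, and then fits α + β₁(SNP) + β₂(dietary cholesterol) + β₃(SNP × dietary cholesterol) to the residuals.

```python
# Sketch of the two-step GxE model from "Statistical analyses".
# Assumptions: pure-Python OLS stands in for R's lm(); helper names are
# hypothetical; winsorizing before the log transform is an assumed order.
import math
from statistics import mean, stdev


def ols(X, y):
    """Least-squares coefficients for design matrix X (list of rows)."""
    p = len(X[0])
    # Augmented normal equations [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * v for r, v in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        for r in range(p):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [x - f * z for x, z in zip(a[r], a[c])]
    return [a[i][p] / a[i][i] for i in range(p)]


def winsorize(xs, k=3.0):
    """Clip values to mean +/- k sample standard deviations."""
    m, s = mean(xs), stdev(xs)
    return [min(max(x, m - k * s), m + k * s) for x in xs]


def residualize(y, covariates):
    """Step 1: residuals of y after regressing on an intercept plus covariates."""
    X = [[1.0] + [float(c) for c in row] for row in covariates]
    b = ols(X, y)
    return [v - sum(bj * xj for bj, xj in zip(b, row)) for v, row in zip(y, X)]


def gxe_fit(resid_tc, snp, ln_diet):
    """Step 2: resid = a + b1*SNP + b2*diet + b3*SNP*diet; returns [a, b1, b2, b3]."""
    X = [[1.0, g, d, g * d] for g, d in zip(snp, ln_diet)]
    return ols(X, resid_tc)


def gxe_pipeline(tc, diet_mg, age, sex, diabetes, snp):
    """Winsorize, log-transform diet, residualize TC, then fit the interaction model."""
    tc_w = winsorize(tc)
    ln_diet = [math.log(x) for x in winsorize(diet_mg)]  # assumes diet_mg > 0
    resid = residualize(tc_w, list(zip(age, sex, diabetes)))
    return gxe_fit(resid, snp, ln_diet)
```

For a sex-stratified fit like those above, the sex column would be dropped from the step-1 covariates within each stratum: a constant covariate is collinear with the intercept and makes the normal equations singular.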
Results of the tests of interactions between dietary cholesterol and the two lead SNPs (rs1042034 and rs2072183, and their proxy SNPs in high LD) on adjusted total cholesterol in the discovery, replication, and pooled analyses are presented in Table 3. The APOB rs1042034 GxE effect replicated in the additional independent subset (P = 0.04), as did the GxE interaction of NPC1L1 SNP rs17725246 in males (P = 0.04), whereas NPC1L1 SNP rs2072183 showed only marginal replication (P = 0.08).
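The proxies carried into these analyses were defined by strong LD (r² > 0.80) with a lead SNP. As a rough sketch of how such a threshold could be checked within a cohort's own genotypes, the squared Pearson correlation of 0/1/2 allele dosages is a common approximation to haplotype r² when phase is unknown. This is an assumption for illustration, not a description of SNAP Proxy Search, and the SNP names and dosages are hypothetical.

```python
# Genotype-dosage r^2 as a stand-in for LD r^2 when haplotype phase is unknown.
# Assumption: illustrative only, not SNAP's method; SNP names are hypothetical.


def dosage_r2(g1, g2):
    """Squared Pearson correlation between two SNPs' allele dosages (0, 1, 2)."""
    n = len(g1)
    m1, m2 = sum(g1) / n, sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    v1 = sum((a - m1) ** 2 for a in g1)
    v2 = sum((b - m2) ** 2 for b in g2)
    if v1 == 0 or v2 == 0:
        raise ValueError("monomorphic SNP: r^2 undefined")
    return cov * cov / (v1 * v2)


def find_proxies(lead, candidates, threshold=0.80):
    """Candidate SNPs whose dosage r^2 with the lead SNP exceeds the threshold."""
    hits = {}
    for name, dosages in candidates.items():
        r2 = dosage_r2(lead, dosages)
        if r2 > threshold:
            hits[name] = r2
    return hits
```

Because r² is a squared correlation, a proxy coded on the opposite allele (dosage 2 − g) still scores 1.0, which is why the allele orientation of a proxy has to be tracked separately when its effect direction matters.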
Both lead SNPs also had proxy SNPs in strong LD that were genotyped in our cohort. To investigate whether any of these proxies better predicted interactions with dietary cholesterol on adjusted total cholesterol, we replaced rs1042034 (APOB missense SNP) with rs673548, rs2678379, or rs676210 (Table 3). Likewise, we replaced rs2072183 (NPC1L1 synonymous SNP) with rs17725246 in the GxE interaction model. In pooled analyses of the discovery and replication sets, APOB SNP rs2678379 was only modestly more significant than rs1042034 (GxE P = 2.92 × 10⁻⁵ vs. 9.15 × 10⁻⁵); of the two NPC1L1 SNPs, rs17725246 was also modestly more significant than rs2072183 (GxE P = 2.20 × 10⁻³ vs. 7.70 × 10⁻³). We performed an ANOVA to estimate the percentage of total cholesterol variance attributable to each GxE interaction (Table 3). The top APOB SNP, rs2678379, accounted for approximately 0.654% of the variance in total cholesterol, while the top NPC1L1 SNP, rs17725246, accounted for 0.507% of the variance in total cholesterol in the male-only subset of the cohort.
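One way to express an ANOVA-based percentage of variance for a GxE term is the drop in residual sum of squares when the SNP × dietary cholesterol product is added last, divided by the total sum of squares. The sketch below assumes that definition (sequential, type I sums of squares with the interaction entered last); the exact ANOVA setup is not spelled out here, and the function names are hypothetical.

```python
# Share of total-cholesterol variance attributable to the SNP x diet term.
# Assumption: percentage = sequential (type I) sum of squares of the interaction,
# entered last, divided by the total sum of squares; names are hypothetical.


def ols_rss(X, y):
    """Residual sum of squares from least squares on design matrix X (list of rows)."""
    p = len(X[0])
    a = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * v for r, v in zip(X, y))] for i in range(p)]
    for c in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, p), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        for r in range(p):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [x - f * z for x, z in zip(a[r], a[c])]
    b = [a[i][p] / a[i][i] for i in range(p)]
    return sum((v - sum(bj * xj for bj, xj in zip(b, row))) ** 2
               for v, row in zip(y, X))


def interaction_variance_pct(y, snp, ln_diet):
    """Percent of total SS explained by adding SNP*diet to y ~ 1 + SNP + diet."""
    reduced = [[1.0, g, d] for g, d in zip(snp, ln_diet)]
    full = [[1.0, g, d, g * d] for g, d in zip(snp, ln_diet)]
    m = sum(y) / len(y)
    sst = sum((v - m) ** 2 for v in y)
    return 100.0 * (ols_rss(reduced, y) - ols_rss(full, y)) / sst
```

Because the interaction enters last, this percentage counts only variance not already captured by the SNP and dietary cholesterol main effects; fractions well under 1%, as reported above, are typical for single GxE terms.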
To consider the potential confounding effects of other cardiovascular covariates, we performed additional analyses in which body mass index (BMI), total caloric intake, and CAAD case-control status were added individually and together to the base model (age, sex, and diabetes status) for the SNP and SNP-by-dietary cholesterol effects of APOB SNP rs2678379 and NPC1L1 SNP rs17725246 on total plasma cholesterol. Even with all three additional covariates added to the base model, both SNP-by-dietary cholesterol interactions remained test-wise significant (all-covariates model: NPC1L1 rs17725246, P = 2.44 × 10⁻³; APOB rs2678379, P = 2.44 × 10⁻⁴).
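Fig. 1 plots genotype-specific lines derived from the fitted interaction model. Following the simplification given under Statistical analyses, genotype g ∈ {0, 1, 2} has intercept α + gβ₁ and slope β₂ + gβ₃, so β₃ is the per-allele change in the slope of total cholesterol on ln(dietary cholesterol). A minimal sketch; the function names and any coefficient values passed in are hypothetical, and only the 207.97 relocation constant comes from the text.

```python
# Genotype-specific lines implied by the interaction model (Fig. 1 legend form):
# residual TC = alpha + b1*SNP + b2*diet + b3*SNP*diet, with SNP coded 0, 1, 2.
# Function names are hypothetical; coefficients are supplied by the caller.

MEAN_TC = 207.97  # constant added to residuals for plotting, per Statistical analyses


def genotype_lines(alpha, b1, b2, b3):
    """(intercept, slope) of predicted residual TC vs ln(dietary cholesterol) per genotype."""
    return {g: (alpha + g * b1, b2 + g * b3) for g in (0, 1, 2)}


def predicted_tc(alpha, b1, b2, b3, genotype, ln_diet, relocate=MEAN_TC):
    """Predicted total cholesterol on the relocated scale used for the interaction plots."""
    intercept, slope = genotype_lines(alpha, b1, b2, b3)[genotype]
    return relocate + intercept + slope * ln_diet
```

With a negative β₃ the slope flattens with each copy of the coded allele, and with a positive β₃ it steepens; which allele is coded as 1 therefore determines how the "steeper" and "less steep" descriptions of Fig. 1 map onto the coefficient's sign.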
The effects of the interaction between dietary cholesterol intake and APOB SNP rs2678379 genotype on residual total cholesterol in the pooled data are presented in Fig. 1A. The common allele was associated with a dose-dependent, steeper increase in total cholesterol with increasing dietary cholesterol intake. For NPC1L1 SNP rs17725246, considering only the males in the pooled CLEAR analyses (n = 760), the common allele was associated with a less steep increase in total cholesterol per allele substitution (Fig. 1B). When the interaction term is not considered and only residual total cholesterol is examined by genotype for these two SNPs, we observe a similar pattern: for APOB SNP rs2678379 (supplementary Fig. V), the common allele is associated with a slight increase in total cholesterol, while the converse is true for NPC1L1 SNP rs17725246 (supplementary Fig. VI).

DISCUSSION
Total cholesterol level is perhaps the best established and most modifiable risk factor for CVD (2, 3). A recent, large GWAS identified over 95 loci for blood lipid phenotypes (4, 7). However, together the variants identified for total cholesterol accounted for only 12.4% of its heritability (7), underscoring concerns about the missing heritability of complex traits such as this. In addition to unidentified rare variants (28), genetic interactions, both gene-by-gene interactions between loci and GxE interactions, have been proposed to explain some of this missing heritability (28). Recent examples of genes interacting with dietary intake have been identified for obesity traits (29, 30). Accordingly, in this study we have demonstrated the novel and replicated finding that common variants at APOB in males and females (rs2678379, P = 2.92 × 10⁻⁵) and at NPC1L1 in males only (rs17725246, P = 2.20 × 10⁻³) show GxE interactions with dietary cholesterol intake in the prediction of total cholesterol. NPC1L1 also appears to interact with gender, with dietary cholesterol predicting higher total cholesterol, per genotype, in males. These replicated GxE interactions may be related to allelic differences in the absorption or processing of dietary cholesterol. In short, the extent to which "you are what you eat" depends on your genotype. A recent study reported that overweight adolescents reported eating fewer calories than slimmer peers (31). While the authors hypothesized that this is due to differences in energy expenditure, and misreporting is also possible, a third possibility is that some of the differences are due to genetic factors in food absorption. We are not aware that in vivo genotype-dependent differences in the relationship between dietary and total cholesterol have previously been reported.
ApoB is one of the primary apolipoproteins of chylomicrons and of all liver-produced lipoproteins, including very low-density lipoprotein (VLDL) and LDL. There are two forms of apoB: apoB48, which is synthesized exclusively in the small intestine, and apoB100, which is primarily synthesized in the liver (32) and is a ligand for the LDL receptor; deficiencies in apoB100 or its receptor are etiological in familial hypercholesterolemia (33). ApoB48 is a component of chylomicrons produced in the small intestine to transport lipid to the liver, and is involved in the absorption of lipids (34). ApoB48 is generated through a single-base substitution of a cytosine to uracil at nucleotide 6538 that results in a premature stop codon at protein position 2153 (32). However, VLDLs associated with apoB100 are also found in the small intestine (35). Apobec-1 knockout mice, which contain apoB100 but not apoB48, still form chylomicrons and absorb dietary fat without noticeable deficiencies relative to wild-type mice on normal diets. However, under low-fat conditions, the lack of apoB48 decreases the rate of chylomicron assembly (36). Therefore, one potential pathway whereby the SNPs identified in this study could affect the relationship between dietary and total cholesterol is at the stage of absorption or chylomicron assembly in the small intestine. Further molecular follow-up is necessary to elucidate the specific mechanisms by which these SNPs influence the relationship of dietary to total cholesterol levels.
NPC1L1 is enriched at the luminal surface of the small intestine, specifically in the brush border of enterocytes (37), and is crucially involved in the early steps of cholesterol absorption by binding nonplant sterols with its N-terminal domain (38). Mouse knockout studies of NPC1L1 resulted in a significant reduction in cholesterol absorption that was unaffected by bile acid supplementation (37, 39). In humans, in vitro experiments have demonstrated potential molecular mechanisms through which nonsense variants affect cholesterol absorption: namely, impairment of NPC1L1 recycling, localization, glycosylation, or stability (40). Recently, Miao et al. (27) reported a sex-dependent difference in the association of the NPC1L1 SNP rs2072183 minor allele with total cholesterol levels in a Mulao Chinese cohort of 688 subjects (288 males) (SNP × sex interaction, P = 0.03); while this interaction test was not significant in a Han Chinese cohort of 738 subjects (274 males), the rs2072183 SNP effect in the Han subset was significant in males but not in females. In both the Mulao and Han Chinese subsets, as well as in our data, the rs2072183 minor G allele was associated with an increase in total cholesterol in males. Although Miao et al. (27) did not correct for multiple comparisons, we found a similar effect in our data for NPC1L1 SNP rs17725246, with the SNP effect on total cholesterol (TC) marginally significant in males (P = 0.054) but not in females (P = 0.94), and the β coefficient indicating an increase in cholesterol levels in males only. Thus, we have identified a potential explanation for the reported lack of association of this SNP with total cholesterol in females: differential absorption or processing of dietary cholesterol based on NPC1L1 genotype in males, which is not detectable in females.
Previous genetic investigations have demonstrated other functions for SNPs in high LD with those identified in this study in APOB and NPC1L1. APOB SNP rs676210 (r² = 0.996) was associated with a genotype-dependent response to fenofibrate (41). This is consistent with the GxE interaction we observed, which may reflect differential absorption or processing of dietary cholesterol by an APOB variant in LD with this SNP. In addition, two SNPs further 5′ in the APOB gene, but not in LD with the SNPs of interest here (r² < 0.15), have been associated with an increased risk of biliary tract stones and cancer (42).
With association testing, it is often not possible to distinguish between the effects of SNPs in high LD to determine which is most likely the functional genetic variant driving the observed associations. The APOB SNP rs1042034 is a missense SNP resulting in Ser4338Asn. It is in strong LD with a second missense SNP, rs676210, which results in Ser4338Asn as well, and with intronic SNP rs2678379. Although rs2678379 is intronic, it lies in the last of the 28 introns of APOB, a position that can be regulatory. The latter SNP has a modestly smaller P value in our pooled data; however, any of the tested variants could be the functional SNP. Rs1042034 is also in LD with seven intergenic SNPs in the 1000 Genomes data, which are less likely to be the causative SNP. The NPC1L1 SNP rs2072183 is synonymous and less likely to be functional than the 5′ SNP in LD with it, rs17725246, which was modestly more significant for GxE interaction in both the discovery and replication subsets. However, both are also in LD with an untyped 5′ SNP, rs2074547.
Recent reports have identified a rare NPC1L1 missense mutation that causes a differential response to ezetimibe pharmacotherapy, NPC1L1 V55M (rs119457968, located 653 bases from rs2072183, but with no LD information available in the 1000 Genomes database) (43). These and other rare coding variants near the SNPs identified in both APOB and NPC1L1 may underlie the GxE interactions found in this study.
Another explanation for our findings may be that the studied SNPs lie close to sites that alter gene expression. For example, recent studies of expression quantitative trait loci (eQTL) that influence hepatocyte gene expression found an enrichment of reproducible eQTL SNPs at gene starts and ends (44), including APOB SNP rs3923672, which is located in the last exon of APOB. Despite the proximity of rs3923672 to the studied APOB SNPs, it is not reported to be in strong LD (r² > 0.8) with them in the 1000 Genomes data, and we lack genotypes for this SNP to assess its relationship to the studied SNPs in our cohort. For NPC1L1, recent publications of the Encyclopedia of DNA Elements (ENCODE) Consortium identified a region extremely close to rs17725246 as hypersensitive to DNase I enzymatic activity and as involved in chromatin structure and histone modifications (45–47). In addition, separate reports using chromatin immunoprecipitation sequencing have reported that the 5′ untranslated region around rs17725246 binds the proteins GATA2 (48) and CDX2 (49). These results, in conjunction with the eQTL findings of enrichment at gene starts and ends, may implicate altered gene expression as a possible explanation for our findings of SNP-by-dietary cholesterol interactions for APOB and NPC1L1. Further functional studies will be required to determine which of these nearby variants, coding or regulatory, are functionally responsible for the observed APOB and NPC1L1 GxE interactions.
Strengths of the present study include a large, well-characterized community-based cohort with dietary intake information, genetic data, and total lipid phenotypes. In addition, by separately genotyping the SNPs identified as significant for interactions with dietary cholesterol in the discovery subset, we were able to replicate our findings in an independent subset of the CLEAR study. Limitations include the lack of ethnic diversity, which limits the generalizability of these findings, and the sample size. Because of the power considerations inherent in GxE interaction studies, this investigation considered only seven SNPs in six candidate loci. Larger studies could assess a wider range of variation at these loci for GxE interactions and further explore the sex interaction seen for NPC1L1. In this case, to reduce the number of contrasts, we used a candidate gene approach to narrow our genetic loci to those expressed in the gastrointestinal system (excluding the liver and large intestine), and tested SNPs for GxE interactions only if they were at least marginally significant for association with total cholesterol in these data. It is entirely possible that GxE interactions could occur when no marginal TC effect is identified.
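The closing point, that a GxE interaction can exist without any marginal SNP effect, follows from the interaction model: when dietary intake is independent of genotype, the marginal SNP slope is β₁ + β₃ × mean(dietary cholesterol), which can be zero even when β₃ is not. The constructed example below uses hypothetical numbers, not CLEAR data: a pure interaction (β₃ = 5) with dietary cholesterol centred at zero in every genotype class.

```python
# Illustration with constructed data (not CLEAR data): a SNP x diet interaction
# with no marginal SNP effect, because diet is centred within each genotype class.


def ls_slope(x, y):
    """Simple least-squares slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)


genotypes = [g for g in (0, 1, 2) for _ in range(4)]
diet = [d for _ in (0, 1, 2) for d in (-1.5, -0.5, 0.5, 1.5)]  # centred ln(diet)
tc = [5.0 * g * d for g, d in zip(genotypes, diet)]  # pure interaction, beta3 = 5

# Marginal test: regress TC on genotype alone, ignoring diet.
marginal_snp_slope = ls_slope(genotypes, tc)

# GxE view: slope of TC on diet within each genotype class.
diet_slope_by_genotype = {
    g: ls_slope([d for gg, d in zip(genotypes, diet) if gg == g],
                [t for gg, t in zip(genotypes, tc) if gg == g])
    for g in (0, 1, 2)
}
```

Here the marginal SNP slope is zero, while the within-genotype diet slopes are 0, 5, and 10, differing by β₃ = 5 per allele; a screen that carries forward only SNPs with marginal effects would miss such a variant.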
In conclusion, our study has identified and replicated novel GxE interactions between dietary cholesterol intake and SNPs in APOB in both sexes and in NPC1L1 in males, in the prediction of total cholesterol level. Such replicated interactions may account for part of the missing heritability of complex traits such as total cholesterol levels. Further work is needed to elucidate the mechanism, whether through altered gene expression or through changes in protein function, by which these variants, or variants in LD with them, alter dietary cholesterol absorption or processing and hence the relationship between dietary and total cholesterol.