THOC5: a novel gene involved in HDL-cholesterol metabolism.

Although numerous genes are known to regulate serum lipid traits, identified variants explain only a small proportion of the expected heritability. We intended to identify further genetic variants associated with lipid phenotypes in a self-contained population of Sorbs in Germany. We performed a genome-wide association study (GWAS) on LDL-cholesterol, HDL-cholesterol (HDL-C), and triglyceride (TG) levels in 839 Sorbs. All single-nucleotide polymorphisms with a P value <0.01 were subjected to a meta-analysis, including an independent Swedish cohort (Diabetes Genetics Initiative; n = ∼3,100). Novel association signals with the strongest effects were subjected to replication studies in an additional German cohort (Berlin, n = 2,031). In the initial GWAS in the Sorbs, we identified 14 loci associated with lipid phenotypes reaching P values <10−5 and confirmed significant effects for 18 previously reported loci. The combined meta-analysis of the three study cohorts (n(HDL) = 6,041; n(LDL) = 5,995; n(TG) = 6,087) revealed a novel association for a variant in THOC5 (rs8135828) with serum HDL-C levels (P = 1.78 × 10−7; Z-score = −5.221). Consistently, the variant was also associated with circulating APOA1 levels in Sorbs. The small interfering RNA-mediated mRNA silencing of THOC5 in HepG2 cells resulted in lower mRNA levels of APOA1, SCARB1, and ABCG8 (all P < 0.05). We propose THOC5 to be a novel gene involved in the regulation of serum HDL-C levels.

BioMedical Research. To identify inherited risk factors that predispose to type 2 diabetes (T2D), a genome-wide association study (GWAS) involving approximately 3,000 individuals from Scandinavia was performed using the 500K Affymetrix GeneChip. The initiative studied 1,464 patients with T2D and 1,467 control subjects from Finland and Sweden, each characterized for 18 clinical traits, among them anthropometric measures, lipids and apolipoproteins, and blood pressure (3,171 individuals for HDL-C, 3,125 for LDL-C, and 3,217 for triglycerides). The samples were population based (1,022 T2D cases and 1,075 euglycemic control subjects, matched on gender, age, BMI, and region of origin) and family based (326 sibships discordant for T2D; 442 cases and 392 euglycemic control subjects). Genotyping of 500,568 SNPs was attempted in each sample. The overall call rate for passing SNPs was 99.2%. After filtering rare and monomorphic variants and applying stringent quality-control filters, high-quality genotypes for 386,731 common SNPs were obtained (4).
In DGI, total cholesterol, HDL-C, and TG levels were measured in fasting blood samples drawn at the baseline examination for each study according to standard enzymatic methods (2). LDL-C levels were calculated according to Friedewald's formula, and missing values were assigned to individuals with triglycerides >400 mg/dl (2).
Berlin cohort (Metabolic Syndrome Berlin Potsdam). This replication set was recruited as a cross-sectional study with focus on traits of the metabolic syndrome. Participants were recruited by paper advertisements and by the outpatient clinic of the German Institute of Human Nutrition and the Department of Endocrinology at the Charité-University Clinic, Berlin, Germany. All measurements and analyses were standardized across recruitment centers. Anthropometry included weight, height, waist and hip circumference, and skin fold measurements. Blood pressure was determined after at least 15 min of rest in a sitting position. Lipid parameters included total cholesterol, LDL-C, HDL-C, and triglycerides. The quantitative measurements of TG (ABX Pentra Triglycerides CP), cholesterol (ABX Pentra Cholesterol CP), and HDL-C (ABX Pentra HDL Direct CP) were done by colorimetry using a Pentra 400 Clinical Chemistry Analyzer (Horiba Instruments, Inc., CA). LDL-C concentrations were calculated according to Friedewald's formula. All parameters were measured in fasting blood samples.
A total of 2,172 Caucasian individuals from the MesyBepo (Metabolic Syndrome Berlin Potsdam) study population from the region of Berlin/Potsdam, Germany, were included in the present study. A total of 141 subjects who had been receiving lipid-lowering medication were excluded from further analyses. The remaining 2,031 subjects consisted of 651 men and 1,380 women (mean male age, 52 ± 14 years; mean female age, 51 ± 13 years; mean male BMI, 29.1 ± 5.9 kg/m²; mean female BMI, 29.9 ± 6.9 kg/m²; data are given as arithmetic means ± SD); 312 subjects had type 2 diabetes (based on oral glucose tolerance test according to the WHO guidelines [21]). The study was approved by the ethics committees of the University of Potsdam and the
variants affecting complex polygenic traits. Recently, fine mapping efforts in genes for LDL-C identified by GWAS revealed genetic variants doubling the explained heritability (15). Moreover, focusing on special populations with reduced genetic heterogeneity as well as phenotypic complexity and homogenous environmental background might help to detect new allelic or haplotypic associations (16,17). Therefore, we attempted to identify novel variants associated with lipid phenotypes using a self-contained population of Sorbs from Eastern Germany.
We performed a GWAS on LDL-C, HDL-C, and TG levels in 839 Sorbs. All single-nucleotide polymorphisms (SNPs) with a P value <0.01 were subjected to a meta-analysis, including an independent Swedish cohort [Diabetes Genetics Initiative (DGI); n = ∼3,100]. Novel association signals with the strongest effects were subjected to replication studies in an additional German cohort (Berlin, n = 2,031). The most promising candidate gene, THOC5, was significantly associated with HDL-C in a combined meta-analysis including all three study populations and has been further investigated in in vitro analyses to elucidate its potential functional role in HDL-C metabolism.

Subjects and phenotyping
Sorbs. All subjects are part of a sample from an extensively clinically characterized population from Eastern Germany, the Sorbs (18-20). From the 1,000 Sorbs individuals who are enrolled in the cohort, 839 subjects without lipid-lowering drugs (499 women and 340 men) were included in the present study. The individuals had a mean age of 46 ± 16 years and a mean body mass index (BMI) of 26.6 ± 4.9 kg/m² (data are given as arithmetic means ± SD). Further clinical characteristics of the study subjects are provided in Table 1.
All parameters were measured in fasting blood samples. Total serum cholesterol and TG concentrations were measured by standard enzymatic methods (CHOD-PAP and GPO-PAP; Roche Diagnostics). Serum LDL-C and HDL-C concentrations were determined with commercial homogeneous direct measurement methods (Roche Diagnostics). All assays were performed in an automated clinical chemistry analyzer (Hitachi/Roche Diagnostics) at the Institute of Laboratory Medicine, University Hospital Leipzig.
The study was approved by the ethics committee of the University of Leipzig, Leipzig, Germany. All subjects provided written informed consent.
The Sorbs cohort included 499 women and 340 men. BMI, body mass index; HDL-C, high-density lipoprotein cholesterol; LDL-C, low-density lipoprotein cholesterol; WHR, waist-to-hip ratio.
The data were calculated using the ΔΔCt method with β-actin as a normalization gene and were corrected for individual PCR efficiency.
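A minimal sketch of an efficiency-corrected ΔΔCt calculation is given below. The text does not specify the exact correction the authors applied, so this uses the common Pfaffl-style ratio as an assumption. The function name, argument names, and Ct values are hypothetical and are not data from this study.

```python
def relative_expression(ct_target_control, ct_target_treated,
                        ct_ref_control, ct_ref_treated,
                        eff_target=2.0, eff_ref=2.0):
    """Efficiency-corrected expression ratio (treated vs. control).

    eff_target and eff_ref are amplification factors per PCR cycle
    (2.0 corresponds to 100% efficiency). When both equal 2.0, the
    result reduces to the classical 2 ** (-ddCt).
    """
    # Ct shift of the target gene (e.g., THOC5) between control and treated
    delta_target = ct_target_control - ct_target_treated
    # Ct shift of the reference gene (e.g., beta-actin)
    delta_ref = ct_ref_control - ct_ref_treated
    return (eff_target ** delta_target) / (eff_ref ** delta_ref)


# Hypothetical Ct values: target appears one cycle later after knock-down,
# reference gene unchanged.
ratio = relative_expression(24.0, 25.0, 18.0, 18.0)
print(f"relative expression: {ratio:.2f}")
```

With equal efficiencies of 2.0, a one-cycle delay in the target gene corresponds to half the control expression.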

Genome-wide scan for lipid traits in the Sorbs
A genome-wide scan for association of SNPs with lipid parameters in the Sorbs revealed 14 novel loci reaching nominal P values <10−5 (supplementary Table I) (considering 390,619 SNPs, a P < 1.3 × 10−7 would be required to reach genome-wide significance after Bonferroni correction for multiple testing). Several previously published variants were nominally associated with lipid traits in the Sorbs (supplementary Table II).
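The genome-wide threshold quoted above follows directly from dividing the nominal significance level by the number of tested SNPs. A short sketch of that arithmetic, with a hypothetical helper name:

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold after Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# 390,619 autosomal SNPs were analysed in the Sorbs GWAS.
threshold = bonferroni_threshold(390_619)
print(f"{threshold:.2e}")  # prints 1.28e-07, i.e. about 1.3 x 10^-7
```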

Meta-analysis (Sorbs and DGI)
SNPs with a P value < 0.01 in the Sorbs (4,182 SNPs for HDL-C, 4,437 for LDL-C, and 4,395 for TG) were taken forward into stage 2 for a meta-analysis (herein referred to as second-stage meta-analysis) together with the DGI data
Charité-University (Medical Department), Berlin, Germany. All subjects provided written informed consent.

DNA extraction, genotyping, and quality control in the Sorbs
Genomic DNA was extracted using a QIAamp DNA Blood Midi Kit (Qiagen Inc., Valencia, CA) according to the manufacturer's protocol. Genotyping was performed using the 500K Affymetrix GeneChip and the Affymetrix Genome-Wide Human SNP Array 6.0 (Affymetrix, Inc.) by the Microarray Core Facility of the Interdisciplinary Centre for Clinical Research, University of Leipzig, Germany, and by ATLAS Biolabs GmbH, Berlin, Germany. Genotypes were determined with GeneChip Genotyping Analysis Software (GTYPE) using the BRLMM algorithm for the 500K arrays and the Birdseed algorithm for the Genome-Wide Human SNP Array 6.0 (Affymetrix, Inc.). Data underwent quality control, and only SNPs fulfilling the following criteria were included: missing rate per SNP <5%, missingness per sample <7%, Hardy-Weinberg equilibrium P > 0.0001, minor allele frequency >0.05. The average genotyping rate was 99.0%. In all, 390,619 autosomal markers overlapping between the 500K Affymetrix GeneChip and the Affymetrix Genome-Wide Human SNP Array 6.0 were included in the analyses.
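The per-SNP filters listed above can be illustrated with a small sketch. It is not the authors' pipeline (they used PLINK, described in the next section): the function name and genotype counts are hypothetical, the per-sample missingness filter is not covered, and Hardy-Weinberg equilibrium is tested with a 1-df chi-square approximation as a simplifying assumption.

```python
import math


def snp_passes_qc(n_aa, n_ab, n_bb, n_missing,
                  max_missing=0.05, min_maf=0.05, min_hwe_p=1e-4):
    """Apply per-SNP missingness, MAF, and HWE filters to genotype counts."""
    n_called = n_aa + n_ab + n_bb
    if n_called == 0:
        return False
    missing_rate = n_missing / (n_called + n_missing)

    # Allele frequency of A and the minor allele frequency
    freq_a = (2 * n_aa + n_ab) / (2 * n_called)
    maf = min(freq_a, 1.0 - freq_a)
    if maf == 0.0:
        return False  # monomorphic SNP

    # Chi-square goodness-of-fit test against HWE expectations (1 df)
    expected = (n_called * freq_a ** 2,
                2 * n_called * freq_a * (1.0 - freq_a),
                n_called * (1.0 - freq_a) ** 2)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    hwe_p = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square, 1 df

    return missing_rate < max_missing and maf > min_maf and hwe_p > min_hwe_p


# Hypothetical genotype counts for a single SNP in 839 samples
print(snp_passes_qc(n_aa=400, n_ab=350, n_bb=80, n_missing=9))
```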

Statistical methods and software
The calculation of minor allele frequencies, Hardy-Weinberg equilibrium, and missing rates per SNP was performed with PLINK (22). Genome-wide association with lipid phenotypes was assessed by linear regression in PLINK. All non-normally distributed parameters were transformed to approximate normal distribution. We adjusted the analyses for age, gender, and BMI and corrected for relatedness in the Sorbs by using genomic control (λ = 1.31). Linkage disequilibrium (LD) metrics were calculated in Haploview 4.1 (23). A weighted meta-analysis was performed using METAL (24). Study-specific P values and effect directions were converted to Z statistics and weighted with the sample size of each study. Two-sided P values <0.05 were considered to provide nominal evidence for association and are presented without Bonferroni correction for multiple testing. Only associations that would reach the P values adjusted for multiple testing (Bonferroni correction; P = 0.05 divided by the number of tested SNPs) were considered statistically significant.
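The sample-size-weighted scheme described above can be sketched as follows. This is an illustration of the general approach (signed Z scores weighted by the square root of each study's sample size), not a reimplementation of METAL. The function names are hypothetical, and the example inputs are made up rather than taken from the study cohorts.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, SD 1)


def p_to_z(p_two_sided, effect_sign):
    """Convert a two-sided P value and an effect direction to a signed Z score."""
    magnitude = -STD_NORMAL.inv_cdf(p_two_sided / 2.0)
    return magnitude if effect_sign >= 0 else -magnitude


def z_to_p(z):
    """Two-sided P value for a Z score."""
    return 2.0 * STD_NORMAL.cdf(-abs(z))


def weighted_z_meta(studies):
    """Combine (p_two_sided, effect_sign, n) tuples, weighting by sqrt(n)."""
    weighted_sum = 0.0
    total_n = 0.0
    for p_value, sign, n in studies:
        weighted_sum += sqrt(n) * p_to_z(p_value, sign)
        total_n += n  # equals the sum of squared weights
    z = weighted_sum / sqrt(total_n)
    return z, z_to_p(z)


# Hypothetical inputs for three cohorts (not the study's per-cohort results)
example = [(0.01, -1, 800), (0.002, -1, 3000), (0.02, -1, 2000)]
z_combined, p_combined = weighted_z_meta(example)
print(f"Z = {z_combined:.3f}, P = {p_combined:.2e}")
```

As a consistency check, `z_to_p(-5.221)` gives approximately 1.78 × 10−7, which agrees with the Z score and P value reported for rs8135828 in the combined meta-analysis.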
Statistical analyses in replication studies were performed using SPSS version 18.0.2 (SPSS, Inc., Chicago, IL).

Genotyping for replication in the Berlin cohort
Genotyping of selected SNPs for replication in an independent cohort from Berlin was performed using the TaqMan allelic discrimination assay (Applied Biosystems, Inc.). Oligonucleotide sequences are available upon request. The TaqMan genotyping reaction was performed according to the manufacturer's protocol on an ABI PRISM 7500 sequence detector (Applied Biosystems Inc.).

Selection of THOC5 tagging SNPs and genotyping in the Sorbs
Five tagging SNPs (rs4823045, rs8140060, rs2283860, rs737975, and rs11704899) were selected from the HapMap database (r² > 0.8; minor allele frequency >0.01) to cover all common genetic variants in THOC5. Genotyping in the Sorbs was performed using the TaqMan allelic discrimination assay (Applied Biosystems, Inc.). Oligonucleotide sequences are available upon request.
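The r² criterion used for tag selection is the squared correlation between alleles at two loci. The sketch below computes it from haplotype and allele frequencies using the standard definition. The function name and the frequencies are hypothetical and do not come from HapMap or from this study.

```python
def ld_r_squared(freq_ab, freq_a, freq_b):
    """Pairwise LD r^2 from the A-B haplotype frequency and allele frequencies.

    freq_ab: frequency of the haplotype carrying allele A at locus 1 and
             allele B at locus 2
    freq_a, freq_b: frequencies of allele A and allele B
    """
    d = freq_ab - freq_a * freq_b  # disequilibrium coefficient D
    denominator = freq_a * (1.0 - freq_a) * freq_b * (1.0 - freq_b)
    if denominator == 0.0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denominator


# Hypothetical frequencies: alleles always co-occur (complete LD)
print(round(ld_r_squared(0.30, 0.30, 0.30), 3))
# Hypothetical frequencies: alleles are independent (no LD)
print(round(ld_r_squared(0.09, 0.30, 0.30), 3))
```

Under this definition, a SNP pair with r² > 0.8 is treated as capturing largely the same common variation, so one SNP can serve as a tag for the other.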

Small interfering RNA mediated knock-down of THOC5 in vitro
Briefly, human hepatic carcinoma (HepG2) cells were cultured in DMEM (high glucose: 4.5 g/l) (Invitrogen, Karlsruhe, Germany)
(supplementary Fig. I). In the second-stage meta-analysis, five SNPs at four loci were associated with HDL-C, 13 SNPs at nine loci with LDL-C, and 26 SNPs at seven loci with serum TG concentrations with combined P values < 5 × 10−5 (supplementary Fig. I and supplementary Table III). Considering there were approximately 4,000 SNPs tested, associations of two SNPs with HDL-C, six SNPs with LDL-C, and 17 SNPs with TG would reach statistical significance after Bonferroni correction for multiple testing (P < 1.3 × 10−5). Nevertheless, after exclusion of previously reported variants (two SNPs for HDL-C, three SNPs for LDL-C, and eight SNPs for TG; supplementary Table III), SNPs with strong evidence for association and representative for their LD groups were taken forward to replication in an independent German cohort from Berlin. In summary, rs8135828 (THOC5) and rs12657936 (SPEF2) were selected for replication of associations with HDL-C, rs7251213 (APG4D) with LDL-C, and rs9444205 (TBX18) and rs17422816 (FAM79B) with TGs (supplementary Fig. I and supplementary Table III).

Replication of top association signals in the Berlin cohort and meta-analysis in three cohorts (Sorbs, DGI, Berlin)
HDL-C. The SNP rs8135828, located in an intronic region of THOC5, was significantly associated with HDL-C in the Berlin cohort (Table 2). Minor allele carriers (A-allele) showed lower serum HDL concentrations (P = 0.002 [Table 2]; genotype-specific means are provided in supplementary Table IV). The meta-analysis including Sorbs, the DGI sample, and the Berlin cohort revealed an association reaching P = 1.78 × 10−7 (Table 2).
Despite consistent effect direction, the SPEF2 locus did not show a significant effect in the Berlin sample (P = 0.92) (Table 2). In the meta-analysis including Sorbs, the Berlin cohort, and the DGI sample, the A-allele was significantly associated with lower HDL serum levels (P = 1.71 × 10−3) (Table 2).

LDL-C. Based on the second-stage meta-analysis of the Sorbs and DGI sample, we selected rs7251213 in APG4D for further replication analyses in the Berlin cohort. Rs7251213 (APG4D) did not reach a nominal level of significance in the Berlin sample (Table 2), but in a combined analysis of all cohorts the G-allele was associated with reduced LDL-C concentrations (P = 2.91 × 10−5) (Table 2).
TG. Based on the result of the second-stage meta-analysis, rs9444205 near TBX18 and rs17422816 in an intronic region of FAM79B were selected for replication in the Berlin cohort. No significant effects on serum TG concentrations were detected (P = 0.93 and P = 0.21, respectively) (Table 2). However, meta-analysis of all three study populations showed an association of these loci with TG levels (P = 5.04 × 10−4 and 4.70 × 10−5, respectively) (Table 2).

Effects of additional THOC5 tagging SNPs on lipid traits in the Sorbs
Five tagging SNPs covering the common genetic variation within THOC5 were genotyped for fine mapping of the causal variant responsible for the initially identified association between rs8135828 and HDL-C. As expected, rs11704899, which was in complete LD with rs8135828, showed significant association with HDL-C (supplementary Table V). Rs8140060 showed a nominal effect on HDL-C serum concentrations (P = 0.024, additive mode of inheritance). The A-allele was associated with reduced HDL levels in the Sorbs (supplementary Table V). Finally, rs11704899 showed nominal association with lower serum APOA1 levels (P = 0.028, β = −0.026; SE = 0.012; standardized for the minor allele; additive mode of inheritance). Subjects on lipid-lowering drugs were excluded from statistical analyses because it may be assumed that the effects of individual genetic variants would be masked by the drug effects, whose contribution to the phenotype appears much stronger than that of the genetic polymorphism. Nevertheless, because excluding participants may lead to bias in the regression, we performed additional association analyses of all THOC5 polymorphisms with lipid parameters in the complete Sorbs cohort, which included subjects on lipid-lowering drugs. To correct for the effects of drugs, we estimated the effect of medication on lipid levels in regression analyses and used the obtained effect size to calculate expected proportional changes in the circulating lipid concentrations. Based on these calculations, factors of 1.02 for total cholesterol, 1.08 for HDL-C, 1.03 for LDL-C, and 1.31 for TGs were used to adjust for the expected change in the phenotype of the treated patients. The results remained materially unchanged (supplementary Table V). Specifically, except for rs11704899, none of the THOC5 SNPs provided statistically significant associations with HDL-C concentrations.
However, even in our initial analyses using datasets without subjects on lipid-lowering drugs, the other SNPs showed only nominal associations with HDL-C, which would not withstand Bonferroni correction for multiple testing (P < 0.01 required for statistical significance when considering five tested SNPs).

Effects of THOC5 on mRNA levels of genes related to HDL-C metabolism
Because THOC5 had not been associated with HDL-C metabolism before this study, we performed THOC5 knock-down experiments in the human HepG2 cell line to assess the impact of THOC5 on mRNA levels of selected genes related to HDL-C metabolism (APOA1, SCARB1, and ABCG8) and to support the observed associations with HDL-C. The outcome of the THOC5 knock-down on mRNA levels of APOA1, SCARB1, and ABCG8 is shown in Fig. 1. Silencing of THOC5 by siRNA resulted in significantly reduced relative expression of THOC5, with a maximum reduction of approximately 70% after 48 h (all P ≤ 4.14 × 10−13). This effective knock-down of THOC5 mRNA expression led to diminished APOA1 mRNA levels after 24 h (P = 2.7 × 10−3) as well as diminished SCARB1 (P = 1.3 × 10−2) and ABCG8 (P = 3.4 × 10−3) mRNA levels after 48 h (Fig. 1).

DISCUSSION
Although numerous genetic determinants of serum lipid profiles have been recently identified using GWAS, only a small proportion of the estimated heritability can be explained (3). Focusing on variants with strong effects in genetically homogenous populations might help to identify loci of interest among SNPs not ranked in the top tier of a classical meta-analysis but still of potential physiological significance. Therefore, in the present study we followed a study design in which we preselected SNPs based on the GWAS in the self-contained population of Sorbs and proceeded with these SNPs in subsequent meta-analyses. In our GWAS in the Sorbs, we successfully replicated several of the previously shown associations with plasma lipid traits. This indicates that our cohort had sufficient power to detect signals found in larger meta-analyses and indirectly supports a potential physiological role of these variants in lipid metabolism.
In a meta-analysis including the Sorbs and the DGI sample followed by further replication in an independent cohort from Berlin, we identified a THOC5 variant (rs8135828) associated with reduced HDL-C levels. The association of other independent variants in the THOC5 locus with HDL levels in the Sorbs further minimizes the risk of a false-positive result. However, this locus did not show up in the published GWAS and meta-analyses that can be readily accessed, including imputed datasets in "The database of Genotypes and Phenotypes" (25). THOC5 also did not come up in the recent meta-analysis of the Global Lipids Genetics Consortium (5). Even though the effect direction from the Global Lipids Genetics Consortium was consistent with the results of our study, there was no significant association of rs8135828 with HDL-C concentrations (P = 0.30) (5). It remains to be clarified whether this is due to genetic heterogeneity between various study populations. Such heterogeneity of effect sizes in different populations could also be driven by gene-gene and gene-environment interactions. Another recent GWAS on genetic modulators of carotid atherosclerosis risk identified variants in the THOC5 region (26). Although this might indirectly support the potential role of THOC5 genetic variation in lipid metabolism, the two studies report different SNPs. The SNPs reported by Shrestha et al. (26) map ∼66 kb away from rs8135828 shown in our study. Moreover, according to the HapMap database (27), rs8135828 is not in LD with either of the SNPs (rs13053817 and rs5763254) studied by Shrestha et al. (26) (r² = 0.07 for both SNPs). Therefore, even though fine mapping of the respective chromosomal region would be required to draw further conclusions, it seems unlikely that the two studies point to the same region/variant affecting both the lipid traits and the risk of carotid atherosclerosis.
THOC5 is ubiquitously expressed in most mammalian tissues. THOC5 protein expression is elevated in blood cells, kidney, and liver (28). THOC5 is part of the RNA spliceosome complex and influences C/EBP expression in adipocyte differentiation (29,30). It is also part of the TREX (transcription/export) complex, which is required for correct mRNA processing, in particular transcription elongation and nuclear export of mRNAs (especially heat shock mRNAs) (31,32). Furthermore, the THOC5/FMIP complex is essential for hematopoietic primitive cell survival in vivo (33). Using a THOC5-depleted mouse fibroblast cell line, Guria et al. (31) identified several genes that were transcribed but not exported into the cytoplasm in the absence of THOC5. However, little is known about the potential relationship between THOC5 and HDL-C metabolism. Therefore, we elucidated the effects of a targeted THOC5 mRNA knock-down on genes related to HDL-C metabolism. Our results showed that APOA1 expression was significantly reduced 24 h after THOC5 silencing. In line with this, rs11704899, which is in LD with the HDL-associated rs8135828 initially detected in the GWAS, showed nominal association with reduced circulating APOA1 levels. Considering the fact that APOA1 is the main protein of HDL particle formation, our data suggest that the effects of THOC5 variants on HDL-C concentrations might be mediated via effects on APOA1. Lower APOA1 levels are known to be associated with increased risk of coronary artery disease and related events (34). It has been demonstrated that variants in APOA1 are associated with amyloidosis (35), which may lead to increased carotid intima-media thickness and to endothelial dysfunction (36). Knock-out of APOA1 in apoB-transgenic female mice, LDL receptor-deficient mice, and LDL receptor/apoE-deficient mice results in an increased risk of atherosclerosis (37,38). Surprisingly, THOC5 knock-down performed in vitro also leads to reduced mRNA expression of SCARB1 and ABCG8 in HepG2 cells. These data were obtained in in vitro experiments using a cell culture system that, due to the substantially lower molecular complexity, cannot entirely resemble the in vivo situation. Therefore, we interpret these findings with caution and acknowledge that further functional studies are warranted to elucidate the precise molecular mechanisms by which THOC5 affects plasma HDL-C.
Our study has several limitations. First, the sample size of our initial GWAS might have been too small to reveal highly significant variants that might be truly causative or close to causal variation. Second, we cannot rule out that the heterogeneous genetic background in the different study populations might have affected the outcome of the meta-analyses. We are also aware that both the Sorbs and the Berlin study cohorts seem to have rather high BMIs (26.4 and 29 kg/m², respectively). We therefore adjusted all analyses for BMI to avoid bias driven by BMI effects in the lipid profile. This does not necessarily preclude the possibility that some of the SNP associations with lipid phenotypes are affected by BMI. On the other hand, using the present study design with inclusion of obese subjects provides a broader distribution of the targeted phenotype. Nevertheless, we performed the replication analyses by stratifying the subjects based on BMI but found no differences between lean and obese subjects regarding genetic associations with lipid phenotypes (e.g., rs8135828 did not show association with HDL levels in either the lean [<25 kg/m²] or the obese [>30 kg/m²] groups; P values of 0.19 and 0.31, respectively).
Running GWAS with imputed datasets might have resulted in additional association signals that would be worth being pursued. Nevertheless, even though by not including imputed datasets the study may have restricted the final outcome of the GWAS and may have missed additional association signals, combining genetic association studies with functional in vitro experiments provided clear evidence for the role of THOC5 in the lipid metabolism.
Taken together, our data suggest that THOC5 might be a novel player involved in the genetic control of serum lipid profiles.
The authors thank all those who participated in the studies, John Broxholm from the Bioinformatics Core Unit of the Wellcome Trust Centre for Human Genetics, and Andre Rothe from the Coordination Centre for Clinical Trials, University of Leipzig.
) after 24 h and for SCARB1 (*P = 1.3 × 10−2) and ABCG8 (**P = 3.4 × 10−3) after 48 h.