Genome-wide association study identifies novel recessive genetic variants for high TGs in an Arab population[S]

Abnormal blood lipid levels are influenced by genetic and lifestyle/dietary factors. Although many genetic variants associated with blood lipid traits have been identified in Europeans, similar data in Middle Eastern populations are limited. We performed a genome-wide association study with Arab individuals (discovery cohort: 1,353; replication cohort: 1,176) from Kuwait to identify possible associations of genetic variants with high lipid levels. We used Illumina HumanOmniExpress BeadChip and candidate SNP genotyping in the discovery and replication phases, respectively. For association tests, we used genetic models that were based on additive and recessive modes of inheritance. High triglycerides (TGs) were recessively associated with six risk variants (rs1002487/RPS6KA1, rs11805972/LAD1) rs7761746/Or5v1, rs39745/CTTNBP2-LSM8, rs2934952/PGAP3, and rs9626773/RP11-191L9.4-CERK) at genome-wide significance (P  6.12E-09), and another six variants (rs10873925/ST6GALNAC5, rs4663379/SPP2-ARL4C, rs10033119/NPY1R, rs17709449/LINC00911-FLRT2, rs11654954/CDK12-NEUROD2, and rs9972882/STARD3) were associated at borderline significance (P  5.0E-08). High TG was also additively associated with rs11654954. All of the 12 identified markers are novel and are harbored in runs of homozygosity. Literature evidence supports the involvement of these gene loci in lipid-related processes. This study in an Arab population augments international efforts to identify genetic regulation of lipid traits.

were mostly performed on European populations followed by African-American, East Asian, and South Asian populations (20)(21)(22)(23). Populations from the Arabian Peninsula have not been included in these global studies and there is limited genetic association data for blood lipids in the Arab population. This deficit persists in spite of the observation that cardiovascular disorder is a leading cause of death in Kuwait, representing 46% of all deaths (24), and in spite of the rising prevalence of metabolic trait-related disorders (including dyslipidemia at 70.3%, obesity at 48.2%, and diabetes at 23.3%) in Kuwait and the Middle East.
While most of the lipid-associated genetic variants identified in Europeans are consistent and have the same direction of association in other population groups (20,23,25), some are heterogeneous, population specific, and not validated in all ethnicities (26,27). Studies from the Arab region on traits associated with obesity and diabetes (28)(29)(30)(31)(32) revealed that not all Euro-centric markers show up with evidence of association and that new population-specific markers could be identified. The reasons for heterogeneity and lack of generalization include differences in effect sizes, allele frequencies, and linkage disequilibrium (LD) (25); it is also the case that sample sizes in previous studies conducted in the Arab region are probably underpowered to detect the established associations from other populations.
The native population of Kuwait is unique in many ways and offers a potential to detect novel genetic associations. The Arabian Peninsula is at the nexus of Africa, Europe, and Asia; and has been presumed to be part of early human migration. The population is well-structured into three groups: the KWP group that is largely of West Asian ancestry with European admixture; the KWS group that is predominantly of city-dwelling Saudi Arabian tribe ancestry; and the KWB group that is comprised of tent-dwelling nomadic Bedouins with a characteristic presence of 17% African ancestry (33). The population of Kuwait, like other states in the Peninsula, has been practicing consanguineous marriages (involving first or second cousins or relatives within the large family or the same tribe); the rate of consanguineous marriages can be as high as 54.3% (34). Consanguinity leads to increased endogamy, homozygosity, and accumulation of deleterious recessive alleles in the gene pool. The tradition of consanguinity has made the population vulnerable to a plague of recessive genetic disorders. Consanguinity and inbreeding can also play an important role in the etiology of complex disorders (with a multifactorial mode of inheritance), such as diabetes mellitus, hypertension, mental disorders, and cancer (35); autosomal recessive alleles may play a larger role in the pathogenesis of complex traits in this population (36). The population history, as delineated above, has the potential to result in some variants becoming more common and population specific; as a result, elucidation of novel associations is realistically possible in this population.
The Middle East exhibits a high prevalence of metabolic syndrome (37), diabetes (38,39), and familial hypercholesterolemia (40); these disorders, along with the practice of consanguineous marriages, have resulted in a pattern of dyslipidemia [low HDL cholesterol and high triglycerides (TGs)] that is different from many other regions of the world (41). These lifestyle disorders have reached the proportion of public health crisis; this epidemic is primarily a consequence of recent environmental changes that have triggered the effect of preexisting susceptibility genes via gene-environment interactions (42). The post-oil era has seen rapid lifestyle transitions, such as rapid urbanization, increasingly sedentary lifestyles, and Westernized diet practices. Interactions between genetic background and rapid transitions in diet and lifestyle may accelerate the incidence of disorders such as diabetes (43). A number of SNPs and genes have been implicated in gene-environment interactions that mediate the blood lipid phenotypes (44,45). The most common environmental factors associated with lipid phenotypes are physical activity [with HDL and total cholesterol (TC)], high fat challenge (with TG), and dietary saturated fat (with LDL) (46). Studying the Arab population, which has seen rapid lifestyle transitions, thus has the potential to identify novel preexisting susceptibility gene loci triggered by environmental factors.
As stated above, the population of Kuwait is composed of three distinct genetic groups that fall firmly into declared ancestral/tribal backgrounds; association tests, appropriately controlled for population stratification to avoid false positive findings, can provide opportunities for discovery of novel associations that are not seen in other populations. In the present study, we perform a GWAS on a cohort of native Arab individuals from Kuwait to elucidate novel genetic markers underlying the lipid traits, TG, LDL, HDL, and TC.

MATERIALS AND METHODS
The study was reviewed and approved by the International Scientific Advisory Board and Ethical Review Committee at Dasman Diabetes Institute.

Study participants
A total of 3,145 participants from Kuwait were recruited under protocols in accordance with guidelines laid in place by the institutional Ethical Review Committee. The study cohort included two groups. The first group comprised a random representative sample of adults (age >18 years) of Arab ethnicity from the six governorates of Kuwait. A stratified random sampling technique was used to select participants from the computerized register of the Public Authority of Civil Information, which maintains records of personal information for both Kuwaiti citizens and expatriates from other Arab and non-Arab countries. The second group comprised Arab individuals seeking tertiary medical care for diabetes/prediabetes-related disorders at our clinics, visitors to our nutrition programs and fitness center, visitors to our Open Day Events (that offer various diagnostic services), and visitors to our campaigns at primary health centers and blood banks in each of the six governorates of Kuwait. Such visitors interested to participate in our research programs were invited to the institute at a later date to give samples after fasting overnight. At the time of recruitment, ethnicity was confirmed through detailed questioning on parental lineage up to three generations; data on age, sex, and illness (e.g., diabetes and cardiovascular complications) were recorded. Baseline characteristics and vital signs, such as height, weight, waist circumference (WC), and blood pressure, were recorded. Signed informed consent was obtained from each of the participants. Details on medications taken by the participants for lowering lipid levels, diabetes, and hypertension were collected and used in correction procedures with the association statistics. A participant was regarded as affected by type 2 diabetes if the diagnosis was known to the participant (self-declaration) or, in accordance with the ADA guidelines, if fasting serum glucose was 7 mmol/l (126 mg/dl) or if glycated hemoglobin was 6.5% (48 mmol/mol). When in doubt, the recorded details on antidiabetes medication were used; and for participants recruited through our clinics or our campaigns (constituting the abovementioned second group), the clinician's notes were used.
The discovery cohort was drawn largely, but not exclusively, from the second group and the replication cohort was drawn largely, but not exclusively, from the first group. Recruitment into the two groups was carried out in parallel with one another; genotyping for the discovery phase was also performed in parallel to the recruitment process. As a result, in order to make up the numbers for genotyping batches and runs during the course of the discovery phase, randomly chosen samples from the first group had to be added to the discovery cohort (a random set of 200 samples from second group were part of discovery cohort). A total of 1,913 samples were considered for the discovery phase and 1,176 for the replication phase.

Sample collection and processing
Upon confirming that the participants had fasted overnight, signed consent forms and blood samples were collected. The guidelines of the institutional Ethical Review Committee were followed for the collection of blood samples and measurement of vital signs. A Gentra Puregene® kit (Qiagen, Valencia, CA) was used to extract DNA. Quantification of DNA was performed using a Quant-iT™ PicoGreen® dsDNA assay kit (Life Technologies, Grand Island, NY) and an Epoch microplate spectrophotometer; only samples with a ratio in the range of 1.8-2.1 for absorbance at 260 nm to absorbance at 280 nm were used. DNA stocks were then frozen. Prior to genotyping, frozen DNA was diluted to a working concentration of 50 ng/l, as recommended by Illumina (San Diego, CA).

Power calculation
Two types of power calculations were performed. The first one was to estimate the sample size and its potential to detect variability in quantitative traits with 80% power and a P-value threshold of 5.0E-08; the second one was to determine the number of samples required to achieve 80% power in a two-stage (discovery and replication) design.
For the second calculation, QPowR software (https://msu. edu/~steibelj/JP_files/QpowR.html) was used with the following parameters: total sample size = 2,532; total heritability = 0.05; samples genotyped in the first or second stage = approximately 50% of 2,532; markers typed in the second stage = typically 0.2% of the markers typed in the first stage; and type I error rate = 5.0E-08.

Discovery phase
Whole-genome genotyping was performed on DNA samples from the discovery cohort of 1,913 participants. Illumina Hu-manOmniExpress arrays utilizing Infinium® HD Assay Ultra genotyping assay methods were used to genotype. Assays included whole-genome amplification, fragmentation, hybridization, staining, and imaging of the HumanOmniExpress arrays using the Illumina iSCAN system. Genotyping was performed in batches; Illumina HumanOmniExpress-12v1-Multi_H (730,525 markers) was used to genotype 1,097 participants in 20 batches, HumanOm-niExpress-12v1-1_B (719,665 markers) to genotype 336 participants in five batches, and HumanOmniExpress-24v1-0_a (716,503 markers) to genotype 480 participants in six batches. The number of markers in these three versions of OmniExpress BeadChips differed from one another by at most 10,000; otherwise the markers were common between the three chip versions.

Replication phase
Top markers identified in the discovery phase were selected for replication in a cohort consisting of 1,176 unrelated participants. Candidate SNP genotyping was performed using TaqMan® SNP genotyping assays (Applied Biosystems, Foster City, CA) and ABI 7500 real-time PCR system (Applied Biosystems). Each PCR sample contained 10 ng of genomic DNA, 5× FIREPol® Master Mix (Solis BioDyne, Tartu, Estonia), and 1 l of 20× TaqMan® SNP genotyping assay (Applied Biosystems). The thermal cycling conditions were 60°C for 1 min, 95°C for 15 min, and then 40 cycles of 95°C for 15 s and 60°C for 1 min. Genotypes ascribed by realtime PCR were validated by direct Sanger sequencing of the PCR products for selected cases of homozygotes and heterozygotes. Sequencing was performed using the BigDye™ Terminator v3.1 cycle sequencing FS ready reaction kit (Applied Biosystems), according to the manufacturer's recommendations, on an Applied Biosystems 3730xl DNA analyzer (Applied Biosystems).

Meta-analysis with results from discovery and replication phases
Meta-analysis with the results of association tests (namely, allele frequencies, effective sample size, effect size, standard error, and P-values of SNP associations) from the discovery and replication cohorts was performed using the METAL tool (48).

Quality control and statistical analysis
Raw intensity data were pooled from all samples from all batches, and genotype calling was performed using the GenCall algorithm provided in the GenomeStudio software.
A series of quality metric thresholds was applied to derive a high-quality set of SNPs and samples.
Samples with call rates >95% were selected. SNPs with call rates of <98%, low intensities (AB R mean 0.25), poor cluster separation (Cluster Sep <0.3), heterozygote clusters too close to homozygotes (AB T mean 0.2 or 0.8), excess heterozygotes (Het Excess 0.2), and fewer than expected heterozygotes (Het Excess less than or equal to 0.3) were removed. Sex estimations were performed using GenomeStudio software, and samples with sex mismatches were removed. Duplicated samples were also removed. Strand designations were corrected to the forward strand and REF/ALT designations were corrected using the design files for HumanOmniExpress BeadChip (available from Illumina). In addition, poor markers with respect to missingness per individual (-mind; 0.1), minor allele frequency (-maf; 0.01), missingness per marker (-geno; 0.1), and Hardy-Weinberg equilibrium (HWE; <10 6 ) were removed using PLINK (49). Relatedness among the participants was calculated using the "-genome" feature in PLINK with PI_HAT >0.125 (i.e., third-degree relatives) and one sample per pair of related participants was randomly removed. Our earlier work (33) demonstrated that the Arab population in Kuwait is characterized by an admixture of six ancestral components that arise from the Human Genome Diversity Project populations of Negev Bedouin, Yoruba, Brahui tribe, Druze, Kalash tribe, and French Basque; we estimated the extents of the six ancestry components in the genetic substructure of the Kuwaiti population. Ancestry estimation was performed using ADMIXTURE (50), through which samples with abnormal deviations in the extents of component ancestry elements were identified as samples of ethnicity mismatch, and such samples were removed. These series of quality control (QC) steps reduced the number of markers to 632,375 and the size of the discovery cohort to 1,353 samples.
Principal component analysis (PCA) was performed using EIGENSTRAT (51). The following parameters were set: the number of eigenvectors to output (numoutevec = 10), no outlier removal (numoutlieriter = 0), number of principal components along which to remove outliers during each outlier removal iteration (numoutlierevec = 10), and number of SDs a participant must exceed, along one of the top (numoutlierevec) principal components, in order for that participant to be classified as an outlier (outliersigmathresh = 6.0) and to be removed. All 10 principal components were used as covariates in quantitative trait association tests to correct for population stratification.
LD-pruning was performed using the "-indep-pairwise" option in PLINK to remove markers with an R 2 value of >0.5 with any other SNP within a 50-SNP sliding window (advanced in 5-SNP increments). These pruned markers (340,299) were used for measuring relatedness and admixture, to perform PCA, and to calibrate the genome-wide P-value threshold used to identify significant genotype-phenotype associations.
Independent variables among the four lipid traits were identified by way of performing Pearson correlation analysis between the traits followed by matSpD (52) analysis (http://neurogenetics.qimrberghofer.edu.au/matSpD/); the estimated effective number of independent traits was seen as 3.

Quantitative trait association tests in discovery phase
Association tests were performed with all of the 632,375 SNPs (that passed the QC without LD-pruning) against HDL, LDL, TC, and TG using linear regression methods under additive and recessive genetic models. Association tests were adjusted for age, sex, and first 10 principal components with further adjustment toward medication for lowering lipid levels, toward obesity status, and toward diabetes status.

P-value thresholds for associations in discovery phase
The P-value threshold for genome-wide significance was calibrated for the number of LD-pruned markers, effective number of independent variables among the quantitative traits, number of tested genetic models, and number of models used to adjust the association tests; the "stringent" P-value threshold required to keep the type I error rate at 5% was thus calibrated to 6.12E-09 [= 0.05/(340,299 × 3 × 2 × 4)]. We further defined a "borderline" P-value threshold of (>6.12E-09 and 5.0E-08) in alignment with the standard P-value threshold for genome-wide significance adopted by the GWAS community.

Delineation of runs of homozygosity regions
Identification of runs of homozygosity (ROH) was performed using PLINK 1.9 through two different methods and sets of parameters: the first method comprised pruning the markers for LD (R 2 > 0.9) (leading to 568,670 markers) followed by ROH detection using optimal parameters suggested by Howrigan, Simonson, and Keller (53); the second method utilized the unpruned marker set (of 632,375 markers) to detect ROH using parameters as deployed in Christofidou et al. (54). Groups of overlapping ROH segments were discovered; for each such group, the consensus ROH region and mean ± SD (by taking the midpoint of individual ROH falling in the group) were delineated and were used to identify the overlay of SNP with ROH region. Further, the ROH signatures discovered from global populations, as reported in the Pemberton et al. (55) study, were used to classify the delineated ROH segments as "known" or "novel".

Tests for correlations among the quantitative traits associated with lipids and insulin resistance
Pairwise relationships among the four lipid traits (TG, HDL, LDL, and TC) and the trait of fasting plasma glucose (FPG) linked to insulin resistance were examined by way of performing Pearson correlation analysis. Relationship between two traits was denoted by the product-moment correlation coefficient (r 2 ) and significance of correlation (P-value, the probability of finding the current result if no correlation exists between the two variables, i.e., null hypothesis). A value of 0.05 was considered as the P-value threshold. The analysis was performed independently on the discovery and replication cohorts.

Testing the quantitative lipid traits for contribution to the dichotomous status of being diabetic or nondiabetic
The relationship between a lipid trait and the categorical status of being diabetic or nondiabetic was examined by way of performing logistic regression, a statistical method for analyzing a dataset in which there are one or more independent variables that determine an outcome. The relationship between a trait and the outcome of diabetes status was denoted by the odds ratio (OR) of being diabetic (as calculated using the derived regression coefficient and standard error of coefficient) and significance of the coefficient (P-value). A value of 0.05 was considered as P-value threshold. The analysis was performed independently on the discovery and replication cohorts.

Marker and sample sets
The QC procedures led to a final marker set of 632,375 SNPs, a discovery cohort of 1,353 samples, and a replication cohort of 1,176 samples.

Characteristics of the study participants
The cohort used for the discovery phase largely comprised participants in middle adulthood (mean age, 46.77 ± 13.8 years), with almost equal proportions of males and females ( Table 1). The population was largely categorized as class I obese (mean BMI, 32.42 ± 7.38 kg/m 2 ), with a high mean value for WC (102.21 ± 16.35 cm). Forty-five percent of the participants were diabetic. The mean values for LDL, HDL, TC, and TG (3.07 ± 0.97 mmol/l, 1.13 ± 0.38 mmol/l, 4.92 ± 1.11 mmol/l, and 1.69 ± 1.18 mmol/l, respectively) were normal or near optimal. One hundred and thirty-three of the 1,353 participants that formed the discovery cohort were taking lipid-lowering medications; such medications were of the following types: Zocor (simvastatin), omega-3 fatty acids, Lipitor (atorvastatin), Crestor (rosuvastatin), Mevalotin (pravastatin), Lopid (gemfibrozil), and Lipanthyl.
The characteristics of the cohorts used in the study have been described in our previous reports: a) A subset was used to delineate the three genetic substructures of the Kuwaiti population (33). b) The cohort, when used to delineate genetic risk variants for obesity, led to identification of a TCN2 variant associated with WC under additive mode of inheritance (30). Scatter plots presenting the first three principal components of the final discovery cohort (after all the QC analysis) and representative Human Genome Diversity Project populations are presented in supplemental Fig. S1; the scatter plots depict the three genetic substructures and are in agreement with the published scatter plot in our previous study (33) (the scatter plot from the study is reproduced here in supplemental Fig. S2) for native Kuwaitis of Arab ethnicity confirmed through detailed surname lineage analysis. Scatter plots presenting the first two principal components of all the samples considered for the discovery cohort, prior to QC analysis and representative 1KGP populations (56), are presented in supplemental Fig. S3; the samples that got removed during the QC procedures are color-coded in the illustration.

Variants passing association tests in both the discovery and replication phases
Upon examining the results of association tests in the discovery phase for acceptable  values and at least nominal P-values of <1.0E-04, we short-listed 32 markers (of which 19 were associated with TG, 8 with HDL, 4 with TC, and 1 with TC and LDL) to carry forward to the replication phase. The SNP quality assessment values for these 32 markers in the replication phase are presented in supplemental Table S2. Two markers (namely rs1209347 and rs4731512) failed the assessment for allele frequency consistency (between the discovery and replication phases) and HWE QC (HWE >10 3 ). Twelve markers, associated with TG at either the genome-wide significant or borderline P-values (under recessive model), passed the replication P-value threshold of <0.05 (Tables 2, 3); in addition, three markers from cholesteryl ester transfer protein (CETP) (namely rs3764261, rs1800775, and rs1864163), associated with HDL at nominal P-values (under additive model), passed the replication P-value threshold; and none of the other markers associated with LDL or TC passed the replication P-value threshold.

Variants associated with lipid traits either at genome-wide significance or at borderline P-values
Six of the 12 markers associated with TG [namely rs1002487/ribosomal protein S6 kinase A1 (RPS6KA1), rs11805972/ladinin-1 (LAD1), rs7761746/olfactory receptor family 5 subfamily V member 1 (Or5v1), rs39745/cortactin binding protein 2 (CTTNBP2)-LSM8 homolog    Table 3). The association involving the rs11654954/ CDK12-NEUROD2 marker also appeared under the additive model. In the joint analysis, wherein we combined results from the discovery cohort and the replication cohort by way of meta-analysis, all the 12 markers showed significant P-values. The test statistics from the discovery phase for the markers that are in LD with the identified 12 markers for association with TG are listed in supplemental Data Set 1. Supplemental Fig. S4 presents the intensity maps; the maps display the quality of the three different genotypes called homozygous allele A, heterozygous allele AB, and homozygous allele B at these 12 markers. The quantilequantile (QQ) plots of the expected and observed log 10 (P-values) for the association of the markers with TG are presented in Fig. 1, and for the other three traits in supplemental Fig. S5. The genomic control inflation factors corresponding to TG were  = 1.01 (recessive model) and  = 1.029 (additive model) in tests with regular corrections,  = 1.002 (recessive model) and  = 1.028 (additive model) in tests additionally corrected for lipid lowering medication,  = 0.994 (recessive model) and  = 1.029 (additive model) in tests additionally corrected for obesity, and  = 0.996 (recessive model) and  = 1.028 (additive model) in tests additionally corrected for diabetes status. Similar values were obtained for the other traits as well. The values for  from recessive and additive models with regular corrections were (HDL, 1.013, 1.025), (TC, 1.05, 1.041), and (LDL, 1.047, 1.037). The values differed through a small range of 1.01 through 1.05. Values close to 1.0 were obtained for these  factors and, hence, it was not felt necessary to perform corrections for genomic-control inflation on association statistics. The Manhattan plot of the log 10 (P-values) from the genome-wide association analysis for all the four traits under both the recessive and additive models are presented in supplemental Fig. S6.
The regional association plots covering a region of 500 Kb centered at the SNPs showing association with TG at genome-wide significance are presented in Fig. 2, and those for the markers associated at borderline P-values are presented in supplemental Fig. S7. The regions surrounding the reported risk variants were gene-dense. However, the markers in LD with each reported risk variant were located in the same gene loci; for example: a) the rs10873925 LD markers (rs3113982, r 2 = 0.24; rs1252593, r 2 = 0.25; rs10782639, r 2 = 0.36; and rs199667, r 2 = 0.22) were also located in ST6GALNAC5; b) the rs11805972 LD markers (rs2799686, r 2 = 0.22 and rs6671391, r 2 = 0.35) were also located in LAD1; c) the rs17709449 LD markers (rs11628575, Fig. 1. QQ plot of the expected and observed -log 10 (P-value) for markers associated with TG under recessive or additive models corrected for age, sex, and the top 10 principal components that resulted from the PCA of the genotype data. Circles in the plots represent observed P-values obtained in tests of associations. Expected P-values represent the null hypothesis. Expected values from a theoretical  2 -distribution are represented by the light gray lines between the x axis and the y axis. The observed P-values for each SNP are sorted from largest to smallest and plotted against expected values. Thus, the presented QQ plot is a graphical representation of the deviation of the observed P-values from the null hypothesis. If some observed P-values are clearly more significant than expected under the null hypothesis, points move toward the y axis. The  represents the genomic control inflation factor, which compares observed association statistics against the expected distributions. Plots for the other three traits are presented in supplemental Fig. S5.  Fig. S7). In order to generate a regional association plot for a SNP-trait association, all the SNPs (typed in passing the QC analysis) from a region of around 500 Kb centered on the SNP were tested for association with the trait; the resultant statistics and the SNPs were displayed in the regional association plot. The regionplot tool (https://github.com/pgxcentre/region-plot) was used to produce regional plots. A: Regional association plot showing the risk variant, rs1002487 (in red and labeled), and its LD markers (in purple: rs6668958, r 2 = 0.29) in their respective gene regions and their association with TG. B: Regional association plot showing the rs11805972 (in red and labeled) and its LD markers (in purple: rs6671391, r 2 = 0.22 and rs2799686, r 2 = 0.35) in their respective gene regions and their association with TG. C: Regional association plot showing the rs7761746 (in red and labeled) and its LD markers (in orange: rs9257792, r 2 = 0.68); (in green: rs9501112, r 2 = 0.51; rs6901923, r 2 = 0.50; rs9257696, r 2 = 0.46; rs9257745, r 2 = 0.45; and rs9348827, r 2 = 0.48); and (in purple: rs7763661, r 2 = 0.30; rs11962388, r 2 = 0.28; rs11964645, r 2 = 0.29; rs9257803, r 2 = 0.35; rs11969272, r 2 = 0.25; rs9501676, r 2 = 0.33; rs9501677, r 2 = 0.32; rs9461533, r 2 = 0.28; rs10484548, r 2 = 0.32; rs362541, r 2 = 0.32; rs6915177, r 2 = 0.28; rs2076486, r 2 = 0.28; rs2076484, r 2 = 0.28; and rs64036, r 2 = 0.32) in their respective gene regions and their association with TG. D: Regional association plot showing the rs39745 (in red and labeled) and its LD markers (in orange: rs13226962, r 2 = 0.60 and rs7456706, r 2 = 0.64), (in green: rs2706164, r 2 = 0.54 and rs10487381, r 2 = 0.56), and (in purple: rs2214226, r 2 = 0.37 and rs39746; r 2 = 0.27) in their respective gene regions and their association with TG. E: Regional association plot showing the rs2934952 (in red and labeled) and its LD markers (in red: rs9972882, r 2 = 0. 82 , r 2 = 0.23; and rs17636692, r 2 = 0.23) were also located in LINC00911 and FLRT2; d) the rs9626773 LD marker (rs9627141, r 2 = 0.22) was also located in RP11-191L9 and CERK; and e) the rs1002487 LD marker (rs6668958, r 2 = 0.24) was also located in RPS6KA1.

Examining the NHGRI-EBI GWAS Catalog for published reports on associations between the identified markers and phenotype traits
We examined the NHGRI-EBI GWAS Catalog (https:// www.ebi.ac.uk/gwas/) to determine whether the 12 associations (at the stringent or borderline P-values), or at least the gene loci identified in our study, had been reported in previous GWASs on global populations. Of the identified 12 gene loci, only the PGAP3 and STARD3 were previously associated with lipid traits (20,57).

Examining the transferability of established lipidassociated variants from other populations to the study population
All of the 12 markers identified so far in the study are novel and resulted from using the genetic model of recessive mode of inheritance. In order to evaluate the transferability of the lipid markers established in other populations to the study population, we examined associations (resulting from using the genetic models of additive or recessive mode of inheritance) with nominal values of P < 1.0E-04 and consistent direction. This resulted in identifying 23 associations involving 22 markers (supplemental Table S3).  However, only six of the 22 markers passed the replication phase, and such variants were rs7156508/solute carrier family 10 member 1 (SLC10A1)-SPARC-related modular calcium binding 1 (SMOC1) (P = 5.49E-08 with TG), rs9972882/STARD3 (P = 4.07E-07 with TG), rs2934952/ PGAP3 (P = 8.81E-08 with TG), rs3764261/CETP (P = 1.10E-05 with HDL), rs1864163/CETP (P = 4.64E-06 with HDL), and rs1800775/CETP (P = 4.99E-06 with HDL); of these (except the SLC10A1-SMOC1 marker), all appeared under the additive model. Of the above six markers, the three CETP markers appeared as established markers in the NHGRI-EBI GWAS Catalog for lipid traits: the SNP, rs3764261/CETP, was associated with lipid traits at genomewide significance in European, Japanese, Han Chinese, African, British ancestry, and North Finnish founder populations (57)(58)(59)(60)(61); the rs1864163/CETP marker was similarly associated with lipid traits in East Asian, European, and Sardinian populations (57,62); and rs1800775/CETP was associated with lipid traits in Europeans and Filipinos (63,64). Further, the SNP, rs9972882/STARD3, was in LD (r 2 = 0.55) with an established marker, rs1877031/STARD3, associated with lipid traits in East Asians and Europeans (57).

ROH segments overlaying the identified markers associated with lipid traits
ROH analysis indicated that all of the 12 reported risk variants were harbored in ROH segments ( Table 4) identified in our study cohort. Upon comparing these 12 ROH segments with ROH data from studies on global populations (55), it was seen that the ROH segments for 9 of the 12 markers were known in global populations (see Table 4); the remaining markers that are harbored in novel ROH segments are rs1002487, rs39745, and rs9626773. Upon performing ROH analysis on the 22 markers that showed nominal values of P < 1.0E-04, it was seen that all of the 22 markers were proximal to ROH segments in the study population (supplemental Table S4); comparative analysis with data from the Pemberton et al. (55) study revealed that 20 of such ROH segments were known in global populations and 2 were novel.

Risk variants and allele frequency differences among different populations
Allele frequencies at the reported 12 risk variants in different populations (including the study cohort) and the three population subgroups of the study cohort are listed in supplemental Table S5. Fisher's exact tests failed to reveal any statistically significant difference (P-value threshold of 0.004) in allele frequencies at any of the identified 12 risk variants between our study cohort and 1KGP populations.

Gene expression regulation by the identified risk variants
Examination of Genotype-Tissue Expression (GTEx) (https://www.gtexportal.org) data to assess the involvement of the identified risk variants in gene expression regulation indicated that 6 of the reported 12 markers regulate their own or one or more of the reported 12 gene loci in blood and other tissues (supplemental Table S6): the LD partner (rs2799686) of LAD1 marker regulates LAD1; the CTTNBP2-LSM8 marker regulates LSM8; the PGAP3 marker regulates PGAP3 and STARD3; the LD partner (rs12897409) of the LINC00911-FLRT2 marker regulates FLRT2; the CDK12-NEUROD2 marker regulates PGAP3 and STARD3; and STARD3 regulates STARD3, PGAP3, and RP11-690G19. 3. Supplemental Data Set 2 presents an excel document that lists GTeX data for all the markers that are in LD with the 12 risk variants identified in this study.

Associations among lipid traits, insulin resistance-linked trait of FPG, and diabetes status
It is well-documented that lipid measures, such as TG and HDL, are associated with insulin resistance and the incidence of type 2 diabetes (65, 66); thus, it is interesting to evaluate the associations among the lipid and insulin resistance traits in our study cohort. Upon performing Pearson correlation tests to identify the existence of statistically significant associations between FPG (insulin resistancelinked trait) and the lipid traits, it was seen that FPG exhibited a direct correlation with TG (discovery cohort: r 2 = 0.28; P-value < 2.2E-16; replication cohort: r 2 = 0.31; P-value < 2.2E-16) and an inverse correlation with HDL (discovery cohort: r 2 = 0.11; P-value < 4.9E-05; replication cohort: r 2 = 0.173; P-value < 2.2E-16). Upon performing logistic regression models for each of these two traits for association with the diagnosis of diabetes in the participants, it was seen that high levels of TG and low levels of HDL were risk factors for diabetes; TG was associated with higher odds of being diabetic [OR = 1.37 (95% CI 1.  Table S8).
A literature survey indicated suggestive inferences on the involvement of the 12 identified gene loci in metabolic processes (supplemental Table S8). Presented below are gene loci for which the literature suggested a link to the trait of TG: The RSK1 protein (from RPS6KA1) is an important regulator of insulin signaling and glucose metabolism in the MAPK/ERK pathway (67). This protein can selectively phosphorylate insulin receptor substrate 1 (IRS1) and thereby prevent insulin resistance (68). RSK1deficient mice remain sensitive to insulin due to the loss of the negative feedback mechanism for insulin resistance (69)(70)(71). Thus, RSK1 has the potential to be involved in insulin resistance. The CERK converts ceramide to ceramide 1-phosphate, a sphingolipid metabolite. Ceramides play an active role in glucose homeostasis, insulin signaling, and, ultimately, the diabetes phenotype (72)(73)(74)(75); ceramides in conjunction with diacylglycerols mediate high TG and insulin resistance (76). It is further the case that CERK deficiency suppresses diet-induced obesity and improves glucose intolerance (77). Thus, CERK has impact on processes relating to insulin resistance. The SPP2, NPY1R, and FLRT2 proteins also have the potential to be involved in diabetes/obesity (see supplemental Table S8). In our study, the reported variants from RPS6KA1, CERK, SPP2, NPY1R, and FLRT2 are associated with TG. The TG trait is well-known for association with insulin resistance (65,66) in many populations, including Japanese (78), Asian Indians (79), Chinese (80), and Hispanics (81). A study using the Arab population from Oman has shown that elevation in serum TG level is associated with increasing levels of glucose in blood (82). Prevalence of high plasma TG levels in Arab individuals from Saudi Arabia with diabetes was significantly higher than in those without diabetes (83). Our study also finds that TG levels have positive correlations with the insulin resistance-linked trait of FPG, and HDL levels have inverse correlations with FPG; further, TG is associated with higher odds of diabetes.
Elevated activity of sialyltransferase (from ST6GALNAC5) in blood cells and increased levels of sialic acids are associated with coronary diseases (84); genetic analysis of Iranian  (55) shown, SNPs rs1002487, rs39745, rs1565922, rs9626773, rs11654954, and rs9972882 found to be known in worldwide population, whereas SNPs rs11805972, rs4663379, rs10033119, and rs7761746 representing/overlapping ROH regions in this population not found in (55)   subjects demonstrated that mutations in ST6GALNAC5 act as risk factors for coronary artery disease (85). In our study, the variant from ST6GALNAC5 is associated with TG. A microsatellite-based linkage study on Caucasian families with premature coronary artery disease and myocardial infarction showed that genes in the 1p31-32 region (in which the ST6GALNAC5 is located) influence TG level (86). Epidemiological and genetic evidence exists to support the notion that raised TG is an additional cause (87) and an independent risk factor (88-90) for cardiovascular disease and all-cause mortality; this notion also holds in Arab populations: serum concentrations of TG were significantly higher in the CHD+ compared with the CHD group of Saudi Arabian patients (91). Though global GWASs have identified many risk variants associated with lipid levels, none of these established markers emerged in our study. Even upon examining the associations with nominal values of P < 1.10E-05, only four established associations [involving the three CETP markers (rs3764261, rs1864163, and rs1800775) and a STARD3 marker (rs9972882), which is in LD with an established marker (rs1877031/STARD3)] could be concluded as transferable to the study population. The inability to reproduce the established European gene loci could have resulted from several aspects of the study design, including the study power, the low prevalence of causative European variants in the study population, or gene-environment interactions that masked the effects of the European variants in the study population. This concern may be resolved in our future studies by way of enlarging the cohort, expanding the genome-coverage for genotyping (e.g., by way of imputing), and performing conditional analysis of examining regions nearby the well-established gene loci.
There is a high level of inbreeding in the Arab region due to the practice of consanguineous marriages, often between first cousins. An overwhelming proportion (63%) of the disorders documented in the Catalogue of Transmission Genetics in Arabs (CTGA) (http://cags.org.ae/ctga/) follow a recessive mode of inheritance. A great burden of recessive alleles and homozygosity in the Kuwaiti population has been reported in our publications (33) and in our ongoing studies (S. E. John, D. Antony, M. Eaaswarkhanth, et al.; unpublished observations). Thus, it is not surprising to find that all the reported 12 risk variants appeared when the genetic model based on the recessive mode of inheritance was used and were seen harbored in ROH segments identified in our study cohort; and three of such ROH regions were not observed in global populations of other ancestries. Further, the four established markers (from CETP and STARD3), for which we had an indication of transferability to the study population, were also located in ROH regions; the ROH encompassing the CETP markers was novel. Inbreeding helps with the aggregation of risk alleles in families; three of the reported gene loci in this study were also associated with phenotypes characterizing familial aggregation. PGAP3 is linked to a congenital disorder of glycosylation (Mabry syndrome) (92,93); SMOC1 is associated with microphthalmia with limb anomalies (94); and RPS6KA1 is associated with Sézary syndrome (95).
Allele frequencies at the reported 12 markers did not differ significantly between Arab and global populations; however, it is possible that the impact of these risk alleles on phenotype traits is more pronounced in the Arab population.
In conclusion, by performing a GWAS followed by replication in an independent sample set of Arab subjects from Kuwait, we pinpointed 12 novel risk variants associated (under a genetic model based on recessive mode of inheritance) with plasma TG levels in this population. The study provided an indication of transferability of the established markers from CETP and STARD3 genes to the study population under a genetic model based on the additive mode of inheritance. Studies in culturally and geographically distinct ethnic populations, such as those of the Arabian Peninsula, that have been underrepresented in global genome survey studies will augment international efforts to identify the genetic reasons for variation in lipid traits.
The authors are extremely thankful to the Biostatistics and Epidemiology Department for their efforts and excellent work on recruiting participants, conducting interviews, collecting samples, and managing sample information data. The authors further thank Maisa Mahmoud for providing help with recruiting participants and Daisy Thomas for providing help with recruiting participants and collecting phenotype information. The Tissue Bank Core Facility is acknowledged for sample processing and DNA extraction.