Gene networks and pathways for plasma lipid traits via multitissue multiomics systems analysis

Genome-wide association studies (GWASs) have implicated ∼380 genetic loci for plasma lipid regulation. However, these loci only explain 17–27% of the trait variance, and a comprehensive understanding of the molecular mechanisms has not been achieved. In this study, we utilized an integrative genomics approach leveraging diverse genomic data from human populations to investigate whether genetic variants associated with various plasma lipid traits, namely, total cholesterol, high and low density lipoprotein cholesterol (HDL and LDL), and triglycerides, from GWASs were concentrated on specific parts of tissue-specific gene regulatory networks. In addition to the expected lipid metabolism pathways, gene subnetworks involved in “interferon signaling,” “autoimmune/immune activation,” “visual transduction,” and “protein catabolism” were significantly associated with all lipid traits. In addition, we detected trait-specific subnetworks, including cadherin-associated subnetworks for LDL; glutathione metabolism for HDL; valine, leucine, and isoleucine biosynthesis for total cholesterol; and insulin signaling and complement pathways for triglyceride. Finally, by using gene-gene relations revealed by tissue-specific gene regulatory networks, we detected both known (e.g., APOH, APOA4, and ABCA1) and novel (e.g., F2 in adipose tissue) key regulator genes in these lipid-associated subnetworks. Knockdown of the F2 gene (coagulation factor II, thrombin) in 3T3-L1 and C3H10T1/2 adipocytes altered gene expression of Abcb11, Apoa5, Apof, Fabp1, Lipc, and Cd36; reduced intracellular adipocyte lipid content; and increased extracellular lipid content, supporting a link between adipose thrombin and lipid regulation. Our results shed light on the complex mechanisms underlying lipid metabolism and highlight potential novel targets for lipid regulation and lipid-associated diseases.

Lipid metabolism is vital for organisms as it provides energy as well as essential materials such as membrane components and signaling molecules for basic cellular functions. Lipid dysregulation is closely related to many complex human diseases, such as atherosclerotic cardiovascular disease (CVD) (1), Alzheimer's disease (2,3), type 2 diabetes (T2D) (4), and cancers (5). The notion of targeting lipid metabolism to treat human diseases has been reinforced by the fact that many diseaseassociated genes and drug targets (e.g., HMGCR as the target of statins and PPARA as the target of fibrates) are involved in lipid metabolic pathways (6)(7)(8).
Accumulating evidence supports that plasma lipids are complex phenotypes influenced by both environmental and genetic factors (9,10). Heritability estimates for main plasma lipids are high [e.g., 70% for low density lipoprotein cholesterol (LDL) and 55% for high density lipoprotein cholesterol (HDL)] (11), indicating that DNA sequence variation plays an important role in explaining the interindividual variability in plasma lipid levels. Indeed, genome-wide association studies (GWASs) have pinpointed a total of 386 genetic loci, captured in the form of single nucleotide polymorphisms (SNPs) associated with lipid phenotypes (12)(13)(14)(15)(16). For example, the most recent GWAS on lipid levels identified 118 loci that had not previously been associated with lipid levels in humans, revealing a daunting genetic complexity of blood lipid traits (16).
However, there are several critical issues that cannot be easily addressed by traditional GWAS analysis. First, even very large GWAS may lack statistical power to identify SNPs with small effect sizes and as a result the most significant loci only explain a limited proportion of the genetic heritability, for example, 17.2-27.1% for lipid traits (17). Second, the functional consequences of the genetic variants and the causal genes underlying the significant genetic loci are often unclear and await elucidation. To facilitate functional characterization of the genetic variants, genetics of gene expression studies (18,19) and the ENCODE efforts (20) have documented tissue-or cell-specific expression quantitative trait loci (eQTLs) and functional elements of the human genome. These studies provide the much-needed bridge between genetic polymorphisms and their potential molecular targets. Third, the molecular mechanisms that transmit the genetic perturbations to complex traits or diseases, that is, the cascades of molecular events through which numerous genetic loci exert their effects on a given phenotype, remain elusive. Biological pathways that capture functionally related genes involved in molecular signaling cascades and metabolic reactions and gene regulatory networks formed by regulators and their downstream genes can elucidate the functional organization of an organism and provide mechanistic insights (21). Indeed, various pathway-and network-based approaches to analyzing GWAS datasets have been developed (18,(22)(23)(24) and demonstrated to be powerful to capture both the missing heritability and the molecular mechanisms of many human diseases or quantitative phenotypes (18,23,25,26). For these reasons, integrating genetic signals of blood lipids with multitissue multiomics datasets that carry important functional information may provide a better understanding of the molecular mechanisms responsible for lipid regulation as well as the associated human diseases.
In this study, we apply an integrative genomics framework to identify important regulatory genes, biological pathways, and gene subnetworks in relevant tissues that contribute to the regulation of four critical blood lipid traits, namely, total cholesterol (TC), HDL, LDL, and triglyceride (TG). We combine the GWAS results from the Global Lipids Genetics Consortium (GLGC) with functional genomics data from a number of tissue-specific eQTLs and the ENCODE project, and gene-gene relationship information from biological pathways and data-driven gene network studies. The integrative framework comprises four main parts ( Fig. 1): 1) Marker Set Enrichment Analysis (MSEA) where GWAS, functional genome, and pathways or coregulated genes are integrated to identify lipidrelated functional units of genes, 2) merging and trimming of identified lipid gene sets, 3) key driver analysis (KDA) to pinpoint important regulatory genes by further integrating gene regulatory networks, and 4) validation of key regulators using genetic perturbation experiments and in silico evidence. This integrated systems biology approach enables us to derive a comprehensive view of the complex and novel mechanisms underlying plasma lipid metabolism.

GWAS of lipid traits
The experimental design, genotyping, and association analyses of HDL, LDL, TC, and TG were described previously (12). The dataset used in this study comprises >100,000 individuals of European descent (sample size 100,184 for TC, 95,454 for LDL, 99,900 for HDL, and 96,598 for TG), ascertained in the United States, Europe, or Australia. More than 906,600 SNPs were genotyped using Affymetrix Genome-Wide Human SNP Array 6.0. Imputation was further carried out to obtain information for up to 2.6 million SNPs using the HapMap CEU (Utah residents with ancestry from northern and western Europe) panel. SNPs with minor allele frequency (MAF) <1% were removed. Finally, a total of 2.6 million SNPs tested for association with each of the four lipid traits were used in our study.

Genetic association study of lipid traits using MetaboChip
The experimental design, genotyping, and association analyses of the lipid MetaboChip study were described previously (15). The study examined subjects of European ancestry, including 93,982 individuals from 37 studies genotyped with the MetaboChip array, comprising 196,710 SNPs representing candidate loci for cardiometabolic diseases. There was limited overlap between the individuals involved in GWAS and those in MetaboChip.

Knowledge-based biological pathways
We included canonical pathways from the Reactome (version 45), Biocarta, and the Kyoto Encyclopedia of Genes and Genomes (KEGG) databases (27,28). In addition to the curated pathways, we constructed four positive control pathways based on known lipid-associated loci (P < 5.0 × 10 −8 ) and candidate genes from the GWAS Catalog (29). These gene sets were based on 4, 11, 13, and 13 studies for TC, TG, LDL, and HDL, respectively (full lists of genes in each positive control sets are in supplemental Table S1), and serve as positive controls to validate our computational method.

Data-driven modules of coexpressed genes
Beside the canonical pathways, we used coexpression modules that were derived from a collection of genomics studies (supplemental Table S2) of liver, adipose tissue, human aortic endothelial cells (HAECs), brain, blood, kidney, and muscle (30)(31)(32)(33)(34)(35)(36)(37)(38)(39). A total of 2,706 coexpression modules were used in this study. Although liver and adipose tissue are likely the most important tissues for lipid regulation, we included the other tissue networks to confirm whether known tissue types for lipids could be objectively detected and whether any additional tissue types are also important for lipids.

Mapping SNPs to genes
Three different mapping methods were used in this study to link SNPs to their potential target genes.
Chromosomal distance-based mapping. First, we used a standard distance-based approach where a SNP was mapped to a gene if within 50 kb of the respective gene region. The use of ± 50 kb to define gene boundaries is commonly used in GWAS.
eQTL-based mapping. The expression levels of genes can be seen also as quantitative traits in GWAS. Hence, it is possible to determine eQTLs and the expression SNPs (eSNPs) within the eQTLs that provide a functionally motivated mapping from SNPs to genes. Moreover, the eSNPs within the eQTL are specific to the tissue where the gene expression was measured and can therefore provide mechanistic clues regarding the tissue of action when intersected with lipidassociated SNPs. Results from eQTL studies in human adipose tissue, liver, brain, blood, and HAEC were used in this study (30,(32)(33)(34)(38)(39)(40)(41)(42)(43)(44)(45). We included both cis-eSNPs (within 1 Mb distance from gene region) and trans-eSNPs (beyond 1 Mb from gene region), at a false discovery rate (FDR) <10%.
ENCODE-based mapping. In addition to the eQTLs and distance-based SNP-gene mapping approaches, we integrated functional data sets from the Regulome database (20), which annotates SNPs in regulatory elements in the Homo sapiens genome based on the results from the ENCODE studies (46).
Nine unique combinations of SNP-gene mapping. Using the above three mapping approaches, we derived nine unique sets of SNP-gene mapping. These are: eSNP adipose, eSNP liver, eSNP blood, eSNP brain, eSNP HAEC, eSNP all (i.e., combining all the tissue-specific eSNPs above); Distance (chromosomal distance-based mapping); Regulome (ENCODE-based mapping); and Combined (combining all the above methods).

Removal of SNPs in linkage disequilibrium
We observed a high degree of linkage disequilibrium (LD) in the eQTL, Regulome, and distance-based SNPs, and this LD structure may cause artifacts and biases in the downstream analysis. For this reason, we devised an algorithm to remove SNPs in LD while preferentially keeping those with a strong statistical association with lipid traits. Technical details are available in supplementary methods. We chose an LD cutoff (R 2 < 0.5) to remove redundant SNPs in high LD.

Marker Set Enrichment Analysis
We applied a modified MSEA method (24,47) to find pathways/coexpressed modules associated with lipid traits (supplemental methods). FDRs were estimated with the method by Benjamini and Hochberg (48). Pathways or coexpression modules with a FDR < 10% were considered statistically significant. MSEA was applied to both the GLGC GWAS dataset and the MetaboChip dataset. The combined FDR from these two datasets was expected to be <1% (10% × 10% = 1%).

Comparison between MSEA and other computational methods
To ensure that the pathway results from MSEA are reproducible, we used the improved gene-set-enrichment analysis (iGSEA) approach (49). In the iGSEA analysis, we generated gene sets using the same canonical pathways and coexpression modules in MSEA. The SNPs were mapped to genes using the default settings of iGSEA. For each given gene set, significance proportion-based enrichment score was calculated to estimate the enrichment of genotypephenotype association. Then, iGSEA performed label permutations to calculate nominal P-values to assess the significance of the pathway-based enrichment score and FDR to correct multiple testing, with a FDR < 25% (default setting) regarded as significant pathways. Considering that MSEA and iGSEA were independent, the combined FDR from these two methods of analysis was expected to be <5% (10% × 25% = 2.5%).

Construction of independent supersets and confirmation of lipid association
Because the pathways or coexpression modules were collected from multiple sources, there were overlapping or nested structures among the gene sets. To make the results more meaningful, we constructed relatively independent supersets that captured the core genes from groups of redundant pathways and coexpression modules (supplemental methods). After merging, we annotated each superset based on function enrichment analysis of the known pathways from the Gene Ontology and KEGG databases (P < 0.05 in Fisher's exact test after Bonferroni correction). The supersets were given a second round of MSEA to confirm their significance associated with lipids using P < 0.05 after Bonferroni correction as the cutoff.

Key driver analysis
We adopted a previously developed KDA algorithm (50-52) of gene-gene interaction networks to the lipidassociated supersets in order to identify the key regulatory genes (Fig. 1). In the study, we included Bayesian gene regulatory networks from diverse tissues, including adipose tissue, liver, blood, brain, kidney, and muscle (30)(31)(32)(33)(34)(35)(36)(37)(38). A key driver (KD) was defined as a gene that is directionally connected to a large number of genes from a lipid superset, compared with the expected number for a randomly selected gene within the Bayesian network (details in supplemental methods). The MSEA, merging, and KDA were performed using R.

Enrichment analysis of lipid-associated subnetworks in human complex diseases
We collected disease susceptibility genes from the GWAS Catalog with GWAS P < 10E-5 for four human complex diseases, including CVD ["myocardial infarction," "myocardial infarction (early onset)," "coronary artery calcification," and "coronary heart disease"], Alzheimer's disease, T2D, and cancer ("colon cancer," "breast cancer," "pancreas cancer," "prostate cancer," and "chronic lymphocytic leukemia"). Fisher's exact test was used to explore the enrichment of genes in the lipid-associated subnetworks in the disease gene sets. Bonferroni-corrected P < 0.05 was considered significant.
Validation of F2 in adipocyte functions via F2 siRNA transfection in 3T3-L1 and C3H10T1/2 adipocyte cell lines The mouse preadipocytes 3T3-L1 and C3H10T1/2 cells were obtained from ATCC and maintained and differentiated to adipocytes according to the manufacturer's instruction. For knockdown experiments, three predesigned siRNAs targeting F2 gene (sequences in supplemental Table S3; Gen-ePharma, Paramount, CA) were tested and the most effective one was selected for the experiment (supplemental Fig. S1). We first measured F2 expression during adipocyte differentiation and found increased F2 expression on days 8-10 in 3T3-L1 and days 6-10 in C3H10T1/2 during differentiation, which helped inform on the timing of siRNA transfection in these cell lines. 3T3-L1 adipocytes were transfected with 50 nM of F2 siRNA using Lipofectamin 2000 on day 7 (D7) of differentiation, a day before F2 increase. Followed by 72 h of siRNA treatment, adipocytes were processed for Oil red O staining of lipids and Real-time qPCR for select genes. C3H10T1/2 adipocytes were transfected with 50 nM of F2 siRNA using Lipofectamin 2000 on day 5 (D5) and day 7 (D7), and adipocytes were processed on day 9 (D9) for Oil red O staining of lipids, real-time qPCR for select genes, and quantitative lipid assays. As control, 50 nM of scrambled siRNA (GenePharma) was transfected at the same time points as the F2 siRNA in the two cell lines. To determine changes in lipid accumulation, adipocytes were stained by Oil red O stain solution. After obtaining images, Oil red O was eluted in isopropyl alcohol and absorbance values were measured at 490 nm.

RNA extraction and real-time qPCR
Total RNA was extracted from the adipocytes (Zymo Research, Irvine, CA), and RNA was reverse transcribed using cDNA Reverse Transcription Kit (Thermo Scientific, Madison, WI), real-time qPCR for select network and nonnetwork genes was performed using the primers shown in supplemental Table S3. Each reaction mixture (20 μl) is composed of PowerUp SYBR Green Master Mix (Applied Biosystems), 0.5 μM each primer, and cDNA (150 ng for F2 gene, 20-50 ng for the other genes). Each sample was tested in duplicate under the following amplification conditions: 95 • C for 2 min, and then 40 cycles of 95 • C for 1 s and 60 • C for 30 s in QuantStudio 3 Real-Time PCR System (Applied Biosystems, Foster City, CA). PCR primers were designed using the Primer-BLAST tool available from the NCBI web site (53). Melt curve was checked to confirm the specificity of the amplified product. Relative quantification was calculated using the 2ˆ(−ΔΔ CT) method (54). Beta-actin was used as an endogenous control gene to evaluate the gene expression levels. All data are presented as the mean ± SEM of n = 4/ group. Statistical significance was determined by two-tailed Student's t-test and values were considered statistically significant at P < 0.05.

Extraction and quantification of lipids in cells and media
Lipids were extracted from C3H10T1/2 cells and culture media using the Folch method (55) with minor modifications. Briefly, whole culture medium (1 ml) from each well of a 12well plate was collected in a separate tube. Cells were washed with phosphate buffered saline (PBS) and collected in 1 ml PBS and homogenized. The media or cell homogenate was mixed in 5 ml of chloroform:methanol (2:1, vol/vol) by shaking vigorously several times and centrifuged at 2,500 g for 15 min. The bottom organic layer was transferred to a new glass tube. The remaining aqueous phase and interphase including the soluble protein were mixed with 5 ml chloroform by vigorous shaking, followed by centrifugation at 2,500 g for 15 min. The bottom organic layer was combined with the first collected organic layer. The combined organic phase was evaporated using nitrogen, and then the dried lipids were resuspended in 0.5% Triton X-100 in water. Samples were stored in −80 • C until lipid analysis. TG, TC, unesterified cholesterol (UC), and phospholipid levels in lipid extractions from cells and from culture media were measured separately using a colorimetric assay at the UCLA GTM Mouse Transfer Core (56). Intracellular lipids were normalized to the cellular protein amount measured by BCA protein assay kit (Pierce, Rockford, IL). Extracellular lipids are presented as lipid quantity in 1 ml of collected media.

Identification of pathways and gene coexpression modules associated with lipid traits
To asses biological pathway enrichment for the four lipid traits with GLGC GWAS, we curated a total of 4,532 gene sets including 2,705 tissue-specific coexpression modules (i.e., highly coregulated genes based on tissue gene expression data) and 1,827 canonical pathways from Reactome, Biocarta and KEGG. These gene sets were constructed as data-and knowledgedriven functional units of genes. Four predefined positive control gene sets for HDL, LDL, TC, and TG were also created based on candidate genes curated from the GWAS catalog (57). To map potential functional SNPs to genes in each gene set, tissue-specific eQTLs, ENCODE functional genomics information, and chromosomal distance-based mapping were used (details in Methods). Tissue-specific eQTL sets were obtained from the GTEx database from studies on human adipose tissue, liver, brain, blood, and HAECs, and a total of nine SNP-gene mapping methods were created. The liver and adipose tissues have established roles in lipid regulation, whereas the other tissues are included for comparison.
On integration of the datasets mentioned above using MSEA, we identified 65, 86, 90, and 92 gene sets whose functional genetic polymorphisms showed significant association with HDL, LDL, TC, and TG, respectively, in GLGC GWAS (FDR < 10%; supplemental Table S4). The predefined positive controls for the four lipid traits were among the top signals for their corresponding traits ( Table 1), indicating that our MSEA method is sensitive in detecting true lipid trait-associated processes. Compared with other tissues, more pathways were captured when using liver and adipose eSNPs to map GWAS SNPs to genes (supplemental Table S4). For example, 56 of the 86 LDL-associated pathways were found when liver and adipose eSNPs were used in our analysis. These results confirmed the general notion that liver and adipose Yes Yes a The Traits columns represent in which methods the MSEA of the pathways is significant with FDR < 10%. Numbers 1-9 represent adipose eSNP (1), blood eSNP (2), brain eSNP (3), human aortic endothelial cells (HAEC) eSNP (4), liver eSNP (5), all eSNP (6), distance (7), regulome (8), and combined (9), respectively. The MetaboChip and iGSEA columns tell whether the gene set can also be detected as statistically significant in the analysis. tissue play critical roles in regulating plasma lipids, leading us to focus the bulk of our analysis on these two tissues, with the remaining tissues serving as a supplement.
Among the significant gene sets, 39 were shared across the four lipid traits. These gene sets represented the expected lipid metabolic pathways as well as those that are less known to be associated with lipids, such as "antigen processing and presentation," "cell adhesion molecules," "visual phototransduction," and "IL-5 signaling pathway" (summary in Table 1; details in supplemental Table S4). We broadly classified the common gene sets detected into "positive controls," "lipid metabolism," "interferon signaling," "autoimmune/immune activation," "visual transduction," and "protein catabolism" (Table 1).

Replication of lipid-associated pathways using additional dataset and method
To replicate our results from the analysis of GLGC GWAS datasets, we utilized an additional lipid genetic association dataset based on a MetaboChip lipid association study (15), which involved individuals independent of those included in GLGC. The gene sets detected using this independent dataset highly overlapped with those from the GLGC dataset (Table 1; supplemental Fig. S2; overlapping P values < 10 −20 by Fisher's exact test). We also utilized a different pathway analysis method iGSEA (49) and again many of the gene sets were found to be reproducible (Table 1;

Construction of nonoverlapping gene supersets for lipid traits
As the knowledge-based pathways and data-driven coexpression modules used in our analysis can converge on similar functional gene units, some of the lipid-associated gene sets have redundancies. We therefore merged overlapping pathways to derive independent, nonoverlapping gene sets-associated lipid traits. For the 39 shared pathways/coexpression modules across the four lipid traits described earlier, we merged and functionally categorized them into five independent supersets (Table 1; Table 3). For the significant gene sets for each lipid trait, we merged them into 17, 16, 18, and 14 supersets for HDL, LDL, TC, and TG, respectively (Table 3; supplemental Table S5), and confirmed that the merged supersets still showed significant association with the corresponding lipid traits in a second round of MSEA (P < 0.05 after Bonferroni correction for the number of supersets tested; Table 3).

Identification of central regulatory genes in the lipid-associated supersets
Subsequently, we performed a KDA (Fig. 1) to identify potential regulatory genes or KDs that may regulate genes associated with each lipid trait using Bayesian networks constructed from genetic and gene expression datasets of multiple tissues (detailed in Methods; full KD list in supplemental Table S6). The top adipose and liver KDs for the shared supersets of all four lipid traits and the representative Bayesian subnetworks are shown in Fig. 2.
In adipose tissue ( Fig. 2A), the top KDs for the "lipid metabolism" subnetwork include well-known lipoproteins and ATP-binding cassette (ABC) family members that are responsible for lipid transport, such as APOF, APOA5, and ABCB11. We also found several KDs that are less known to be associated with lipid metabolism, particularly F2 (coagulation factor II or thrombin). For the autoimmune/immune activation subnetwork, CD86, HCK, and HLA-DMB were identified as KDs. PSMB9 was a KD for the protein catabolism subnetwork, whereas NUP210 is central for the interferon signaling subnetwork. Moreover, the SYK gene is a shared KD between lipid metabolism and autoimmune/immune activation.
In the liver (Fig. 2B), the top KDs for the lipid metabolism subnetwork are enzymes involved in lipid and cholesterol biosynthesis and metabolism, such as FADS1 (fatty acid desaturase 1), FDFT1 (farnesyl-diphosphate farnesyltransferase 1), HMGCS1 (3-hydroxy-3-methylglutaryl-CoA synthase 1), and DHCR7 (7-dehydrocholesterol reductase). We also identified more KDs for the interferon signaling subnetwork in the liver compared with the adipose tissue, with MX1, MX2, ISG15, IFI44, and EPSTI1 being central to the subnetwork. Similar to the adipose network, PSMB9 and HLA-DMB were also identified as KDs for protein catabolism and autoimmune/immune activation subnetworks in liver, respectively. We did not detect KD genes for the visual transduction subnetwork in either tissue, possibly because the networks of liver and adipose tissues did not capture gene-gene interactions important for this subnetwork.
In addition to the KDs for the subnetworks shared across lipid traits as discussed above, we identified tissue-specific KDs for individual lipid traits (supplemental Table S6). In adipose, PANK1 and H2B histone family members were specific for the HDL  9 7,8,9 7,8,9 7,8,9a The method column represents in which methods the MSEA of the pathways is significant with Bonferroni-adjusted P < 0.05. Numbers 1-9 represent: adipose eSNP (1), blood eSNP (2), brain eSNP (3), human aortic endothelial cells (HAEC) eSNP (4), liver eSNP (5), all eSNP (6), distance (7), regulome (8), and combined (9), respectively. subnetworks (Fig. 3A); HIPK2 and FAU were top KDs for the LDL subnetworks (Fig. 3B); genes associated with blood coagulation such as KNG1 and FGL1 were KDs for the TC and TG subnetworks (Fig. 3C, D). Of interest, genes related to insulin resistance, PPARG and FASN, were KDs for both the HDL and TG subnetworks. Similarly, trait-specific KDs and subnetworks were also detected in the liver; 37 KDs were identified for the TG subnetwork including ALDH3B1 and ORM2, whereas AHSG, FETUB, ITIH1, HP, and SERPINC1 were KDs found in the LDL subnetwork. We note that most of the KDs are themselves not necessarily GWAS hits but are surrounded by significant GWAS genes. For example, gene F2 is centered by many GWAS hits in the adipose subnetwork (APOA4, APOC3, APOA5, LIPC, etc.; Fig. 2; supplemental Fig. S3). The observation of GWAS hits being peripheral nodes in the network is consistent with previous findings from our group and others (24,(58)(59)(60)(61)(62) and again supports that important regulators may not necessarily harbor common variations owing to evolutionary constraints.

Experimental validation of F2 KD subnetworks in 3T3-L1 and C3H10T1/2 adipocytes
Taking into account that the F2 gene is surrounded by various significant GWAS hits within its subnetwork, we aimed to validate the role of the F2 gene subnetwork in lipid regulation through siRNA-mediated knockdown experiments in two adipocyte cell lines (3T3-L1 and C3H10T1/2) to ensure reproducibility and robustness of our results. We found that F2 gene expression was low in preadipocytes for both cell lines but gradually increased during adipogenesis. In fully differentiated adipocytes between day 8 and day 10, the F2 gene expression level was higher than in preadipocytes by 12fold and sixfold for 3T3-L1 and C3H10T1/2 lines, respectively (Fig. 4A, B). When treated with F2 siRNA, both adipocyte cell lines showed a significant decrease (P < 0.01) in lipid accumulation based on Oil red O staining, as compared with controls treated with scrambled siRNA (Fig. 4C, D). Subsequently, we tested the effect of F2 gene siRNA knockdown on 10 neighbors of the F2 gene in the adipose network (selected from Fig. 2A). With 60% knockdown efficiency of F2 siRNA in the 3T3-L1 adipocytes, seven F2 network neighbors (Abcb11, Apoa5, Apof, Fabp1, Lipc, Gc, and Proc) exhibited significant changes in expression levels (Fig. 4E). With 74% knockdown efficiency of F2 in C3H10T1/2 adipocytes, six F2 network neighbors (Abcb11, Apoa5, Apof, Fabp1, Lipc, and Plg) showed significant changes in expression levels (Fig. 4F). Several of these genes are involved in lipoprotein transport and adipose subnetwork genes and negative control genes after siRNA knockdown. At day 7 of differentiation of 3T3-L1 and day 5 and day 7 of differentiation of C3H10T1/2, adipocytes were transfected with F2 siRNA for the knockdown experiments. Ten F2 neighbors were randomly selected from the first-and second-level neighboring genes of F2 in adipose network. Four negative controls were randomly selected from the genes not directly connected to F2 in the adipose network. G, H: The fold changes of fatty acid uptake. In contrast, none of the four negative controls (random genes not in the F2 network neighborhood) showed significant changes in their expression levels for the 3T3-L1 cell line. However, one negative control gene (Snrpb2) did change in the C3H10T1/2 cell line. These results overall support our computational predictions on the structures of F2 gene subnetworks. Next, we measured the expression levels of genes related to adipogenesis (Pparg, Cepba, Srepb1, Fasn), lipolysis (Lipe), fatty acid transport (Cd36, Fabp4), and other adipokines following F2 siRNA treatment. We found no change in the expression of most of the tested genes, with the exception of Fasn (in C3H10T1/2), important in the formation of long-chain fatty acids, and Cd36 (in both 3T3-L1 and C3H10T1/2), which encodes fatty acid translocase facilitating fatty acid uptake. Cd36 expression was decreased by 15% in 3T3-L1 cells (Fig. 4G) and 35% in C3H10T1/2 cells (Fig. 4H) (P < 0.05), and Fasn expression was decreased by 25% (Fig. 4H) (P < 0.01) in C3H10T1/2 cells compared with control. The decreases in Cd36 and Fasn after F2 knockdown suggest that fatty acid synthesis and uptake by adipocytes are compromised, which could contribute to alterations in circulating lipid levels.
We subsequently measured the lipid contents within the cells and in the media of C3H10T1/2 adipocytes. Following F2 siRNA treatment, we found significant decreases in the total intracellular lipid levels (cTotal Lipid), total cholesterol (cTC), and unesterified cholesterol (cUC), as well as a nonsignificant trend for decreased triglycerides (cTG) (Fig. 4I). By contrast, in the culture media, there were significant increases in the total lipid levels (mTotal Lipid) and triglycerides (mTG) following F2 siRNA treatment (Fig. 4J). These results support that F2 knockdown led to decreased intracellular lipids and increased extracellular lipids, agreeing with the overall decreased expression of F2 network neighbor genes involved in lipid transport and uptake.

The association between the lipid subnetworks and human diseases
Epidemiological studies consistently show that plasma lipids are closely associated with human complex diseases. For example, high TC and LDL levels are associated with an increased risk of CVD. Here, we examined the association between the lipid subnetworks identified in our study and four human complex diseases, namely, Alzheimer's disease, CVD, T2D, and cancer (Materials and Methods). We found that the gene supersets identified for each lipid traits were significantly enriched for GWAS candidate genes reported by GWAS catalog for the four diseases at Bonferroni-corrected P < 0.05 ( Fig. 5; supplemental Table S7). The superset lipid metabolism, which was shared across lipid traits, was associated with Alzheimer's disease and CVD. When trait-specific subnetworks were considered, those associated with TC, LDL, and TG had more supersets associated with CVD compared with those associated with HDL, a finding consistent with recent reports (15,63,64). In addition, supersets of each lipid trait, except HDL, were also found to be significantly associated with cancer, whereas supersets associated with HDL, LDL, and TG, but not TC, were linked to T2D.

DISCUSSION
To gain comprehensive insights into the molecular mechanisms of lipid traits that are important for numerous common complex diseases, we leveraged the large volume of genomic datasets and performed a data-driven multiomics study combining genetic association signals from large lipid GWASs, tissue-specific eQTLs, ENCODE functional data, known biological pathways, and gene regulatory networks. We identified diverse sets of biological processes, guided by their tissue-specific gene-gene interactions, to be associated with individual lipid traits or shared across lipid traits. Many of the lipid-associated gene sets were significantly linked to multiple complex diseases including CVD, T2D, cancer, and Alzheimer's disease. More importantly, we elucidated tissue-specific gene-gene interactions among the gene sets and identified both well-characterized and novel KDs for these lipidassociated processes. We further experimentally validated a novel adipose lipid regulator, F2, in two different adipocyte cell lines. Our results offer new insight into the molecular regulation of lipid metabolism, which would not have been possible without the systematic integration of diverse genetic and genomic datasets.
We identified shared pathways associated with all four lipid traits, including lipid metabolism and autoimmune/immune activation, which have been consistently linked to lipid phenotypes, as well as additional pathways such as interferon signaling, protein catabolism, and visual transduction. Interferon factors have previously been linked to lipid storage attenuation and adipokine/adipogenesis-related genes in 3T3-L1 (G) and in C3H10T1/2 (H). Gene expression levels were determined by RT-qPCR, normalized to beta-actin. The fold changes were relative to scrambled siRNA control. Sample size n = 4/group. I, J: Lipid profiles: total lipid, triglyceride (TG), total cholesterol (TC), unesterified cholesterol (UC), and phospholipid (PL) in C3H10T1/2 cells (I) and in media (J). Total Lipid was estimated using the sum of the four lipids (TG, TC, UC, PL). Intracellular lipids plotted in (I) were normalized to total cellular protein quantity. Extracellular lipids plotted in (J) are presented as lipid quantity in 1 ml of collected media. Sample size n = 6/group. Results represent mean ± SEM. Statistical significance was determined by two-sided Student's t-test (*P < 0.05 and **P < 0.01). differentiation in human adipocytes (65). Protein catabolism has only recently been identified to be important in regulating lipid metabolism through the PSMD9 protein, which had no previously known function but was shown to cause significant alterations in lipid abundance in both a gain of function and loss of function study in mice (66). The visual transduction superset contains retinol-binding proteins, which are carrier proteins involved retinol transport, and play key roles in gene expression regulation and developmental processes (67). Visual transduction also shares lipoprotein genes with lipid metabolism, suggesting that retinol-related signal transduction is intimately linked to lipoprotein transport and hence plasma lipid levels.
Furthermore, our results indicate that the traitspecific supersets are tissue specific. For example, most TG-specific pathways were found to be significant when adipose eSNPs were used, and complement and insulin signaling pathways in the adipose tissue were specific for TG. This is in line with adipose tissue functioning as the major storage site for TG and the regulatory role of immune system and insulin signaling in adipocyte functions and fat storage (68). We also found five HDL-specific pathways, most of which are associated with glucose, lipid, and amino acid metabolism and were signals derived from liver eSNPs. As HDL acts as the major vehicle for transporting cholesterol to the liver for excretion and catabolism, the critical role of the liver as well as the connections between major metabolic pathways in HDL regulation is recapitulated by our analysis. Of interest, the TCspecific pathways can be found only when brain eSNPs are used. The brain accounts for 2% of body weight, whereas it contains 23% of TC in the body (69), and dysregulated cholesterol trafficking appears to be involved in the pathogenesis of neurodegenerative diseases, such as Parkinson's and Alzheimer's disease (70). These tissue-and trait-specific pathways or processes support the unique features of each lipid species and point to tissue-specific targeting strategies to modulate levels of individual lipid traits and the associated diseases.
In addition to detecting trait-and tissue-specific causal pathways for the lipid traits, our study attempted to delineate the interactions between lipid genes and pathways through gene network analysis. Indeed, the tissue-specific gene networks revealed in our study highlight the regulatory connections between lipid genes and pathways and thus put individual genes in a broader context. The identification of KDs in a network is essential for uncovering key regulatory components and for identifying drug targets and biomarkers for complex diseases (24,71). Here, we adopted data-driven Bayesian gene regulatory networks that combine various genomic data (50) to detect the central genes in plasma lipid regulation. The power of this data-driven objective approach has been demonstrated recently (24,51,60,61,72,73) and is again supported in this study by the fact that many KDs detected are known regulators for lipids or have served as effective drug targets based on the DrugBank database (74). For instance, for the shared lipid metabolism subnetwork, four top KDs (ACAT2, ACSS2, DHCR7, and FADS1) are targeted by at least one US Food and Drug Administration-approved anticholesteremic drug. Another KD, HMGCS1, is a rate-limiting enzyme of cholesterol synthesis, and is considered a promising drug target in lipid-associated metabolic disorders (75). These lines of evidence lead us to speculate that the other less-studied KDs are also important for lipid regulation.
Among the top network KDs predicted, several, including F2, KLKB1, and ANXA4, are involved in blood coagulation. A previous study revealed that polymorphisms in the anticoagulation genes modify the efficacy of statins in reducing the risk of cardiovascular events (76), which in itself is not surprising. However, the intimate relationship between a coagulation gene F2 and lipid regulation predicted by our analysis is intriguing (Fig. 4). We found that the partner genes in the adipose F2 subnetwork tend to be differentially expressed after F2 knockdown in both 3T3-L1 and C3H10T1/2 adipocytes, with several of the altered genes (Apoa5, Apof, Abcb11, Fabp1, Fasn, and Cd36) closely associated with cholesterol and fatty acid transport and uptake. We further observed that F2 knockdown affects lipid storage in adipocytes, with a decrease in the intracellular lipid content and an increase in the extracellular lipid content in the media. Of interest, the F2 expression level is low in preadipocytes and only increases during the late phase of adipocyte differentiation. Our findings support a largely untapped role of F2 in lipid transport and storage in adipocytes and provide a novel target in the F2 gene.
In addition to the shared KDs such as F2 for different lipids, it may also be of value to focus on the trait-specific KDs as numerous studies have revealed that these lipid phenotypes play different roles in many human diseases. For example, LDL and TC are important risk factors for CVD (77) and TG has been linked to T2D (78), whereas the role of HDL in CVD has been controversial (79). We detected 37 genes as TG-specific KDs in liver regulatory subnetworks. Among these, CP (ceruloplasmin) and ALDH3B1 (aldehyde dehydrogenase 3 family, member B1) were clinically confirmed to be associated with T2D (80,81) whereas most of the other genes such as DHODH and ANXA4 were less known to be associated with TG and thus may serve as novel targets. In adipose tissue, genes important for insulin resistance and diabetes such as PPARG and FASN were found to be KDs for TG, further supporting the connection between TG and diabetes. In addition, FASN has been implicated as a KD in numerous studies for nonalcoholic fatty liver disease (62,73,82), again highlighting the importance of this gene in common metabolic disorders.
We acknowledge some potential limitations to our study. First, the GWAS datasets utilized are not the most recently conducted and therefore provide the possibility of not capturing the full array of unknown biology. However, despite this, our results are consistent with the biology found more recently including overlapping signals in pathways for chylomicron-mediated lipid transport and lipoprotein metabolism (83) as well as more novel findings such as visual transduction pathways. In addition, one of our KDs KLKB1, which was not found to be a GWAS hit in the dataset we utilized, has since been found to pass the genome-wide significance threshold in more recent larger GWASs and is a hit on apolipoprotein A-IV concentrations, which is a major component of HDL and chylomicron particles important in reverse cholesterol transport (84). This further exemplifies the robustness of our integrative network approach to find key genes important to disease pathogenesis even when smaller GWASs were utilized.
In summary, we used an integrative genomics framework to leverage a multitude of genetic and genomic datasets from human studies to unravel the underlying regulatory processes involved in lipid phenotypes. We not only detected shared processes and gene regulatory networks among different lipid traits but also provide comprehensive insight into traitspecific pathways and networks. The results suggest there are both shared and distinct mechanisms underlying very closely related lipid phenotypes. The tissuespecific networks and KDs identified in our study shed light on the molecular mechanisms involved in lipid homeostasis. If validated in additional population genetic and mechanistic studies, these molecular processes and genes can be used as novel targets for the treatment of lipid-associated disorders such as CVD, T2D, Alzheimer's disease, and cancers.

Data availability
All genomic data utilized in the analysis were previously published and were downloaded from public data repositories. All experimental data were presented in the current manuscript. Mergeomics code is available at R Bioconductor https://doi.org/10.18129/B9.bioc. Mergeomics.