The Hybrid Mouse Diversity Panel: a resource for systems genetics analyses of metabolic and cardiovascular traits

The Hybrid Mouse Diversity Panel (HMDP) is a collection of approximately 100 well-characterized inbred strains of mice that can be used to analyze the genetic and environmental factors underlying complex traits. While not nearly as powerful for mapping genetic loci contributing to the traits as human genome-wide association studies, it has some important advantages. First, environmental factors can be controlled. Second, relevant tissues are accessible for global molecular phenotyping. Finally, because inbred strains are renewable, results from separate studies can be integrated. Thus far, the HMDP has been studied for traits relevant to obesity, diabetes, atherosclerosis, osteoporosis, heart failure, immune regulation, fatty liver disease, and host-gut microbiota interactions. High-throughput technologies have been used to examine the genomes, epigenomes, transcriptomes, proteomes, metabolomes, and microbiomes of the mice under various environmental conditions. All of the published data are available and can be readily used to formulate hypotheses about genes, pathways and interactions.

studied, increasing the accuracy of the data that are collected, and results derived from different studies of the HMDP can be integrated. For example, transcriptomic data obtained in one study (1) were used to interpret proteomic data (7) and metabolic data (8) obtained from a separate set of mice.

High-resolution association mapping
The ability to perform high-resolution association mapping in the HMDP is based on the inclusion of about 30 "classic" inbred strains, which have undergone many generations of recombination since their origins from stocks of pet mice (9). This makes it possible to carry out association analysis much as in a human GWAS. Generally, it is possible to map complex traits to one- to two-megabase regions containing five to 20 genes or fewer using the HMDP, depending on the level of linkage disequilibrium and gene density of the region (1). This resolution is at least an order of magnitude better than that of traditional linkage analysis. For example, Fig. 1 shows the mapping of a cis-expression quantitative trait locus (eQTL) in the HMDP and in an F2 intercross. One important point is that, because the classic inbred strains exhibit very significant population structure, it is essential to correct for this structure to avoid false positive associations. This is conveniently accomplished using mixed model algorithms such as EMMA (10) or FaST-LMM (11). These algorithms essentially perform a t-test for association while correcting for population structure using a kinship matrix based on genotypes. Genome-wide significance is determined using simulation, a Bonferroni correction, or a false discovery rate (1,12).
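As a rough illustration of two ingredients mentioned above, the sketch below builds a genotype-based kinship matrix and computes a Bonferroni threshold. It is a minimal sketch, not the EMMA or FaST-LMM implementation: the function names, the 0/1 allele coding for homozygous inbred strains, and the simple allele-sharing definition of kinship are all assumptions made for illustration.

```python
def kinship_matrix(genotypes):
    """Allele-sharing kinship between strains (illustrative definition).

    genotypes: list of per-strain lists, one 0/1 allele code per SNP
    (inbred strains are homozygous, so one code per SNP suffices).
    Returns a symmetric matrix holding, for each pair of strains, the
    fraction of SNPs at which the two strains carry the same allele.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    kin = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            shared = sum(1 for a, b in zip(genotypes[i], genotypes[j]) if a == b)
            kin[i][j] = kin[j][i] = shared / m
    return kin


def bonferroni_threshold(alpha, n_tests):
    """Per-test P value cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# Hypothetical toy data: three strains typed at four SNPs.
toy_genotypes = [
    [0, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 0, 0],
]
K = kinship_matrix(toy_genotypes)
cutoff = bonferroni_threshold(0.05, 100_000)
```

In a mixed model, a matrix of this kind describes the covariance of a random effect, so that closely related strains are not treated as independent observations. A Bonferroni cutoff is conservative when neighboring SNPs are strongly correlated, which is one reason simulation-based thresholds are also used.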

Mapping power
With only 100 inbred strains in the HMDP, mapping power is considerably limited as compared with large intercrosses between pairs of inbred strains or human GWASs with thousands of samples. Nevertheless, simulation studies suggest that there is reasonable power to map loci that explain 5% or more of the trait variance (1). Because, as in humans, there are likely to be hundreds of loci that contribute to complex clinical traits, the mapping will generally detect only the handful of loci with the strongest effects. Power can be increased by examining additional inbred and RI strains that have been genotyped (5,13), but for practical reasons most studies have been limited to about 100 strains. Power can also be considerably increased while retaining high resolution by performing meta-analysis that incorporates data from traditional crosses (14,15). Molecular phenotypes, such as transcript levels, protein levels, and metabolite levels, are generally determined by a much smaller number of loci than clinical traits, and there is adequate power to map at least the major loci affecting these. For example, using expression arrays to quantitate liver transcript levels, about 2,500 significant cis-eQTLs were detected (1), while about 5,000 cis-eQTLs were detected in cultured macrophages (16).

pathways, examine gene-by-environment, study host-gut microbiome relationships, and prioritize human genome-wide association study (GWAS) candidate genes.
We anticipate that this review will primarily be of interest to cardiometabolic investigators interested in using data from the HMDP to help guide their research. Therefore, at the end of the review, in the Database section, we have discussed the kinds of questions that can be addressed using the data. Also, because many cardiometabolic researchers may not be versed in genetics approaches, we have defined some of the terms and concepts used in this review in Table 2.

THE HMDP
The HMDP was developed as a systems genetics resource similar to recombinant inbred (RI) strain sets (2,3) or chromosome substitution strains (4), but with the added advantage of high-resolution association mapping (1). It consists of a set of 30 classic inbred strains chosen for diversity plus 70 or more RI strains derived primarily from strains C57BL/6J and DBA/2J (the BxD RI set) and A/J and C57BL/6J (the AxB and BxA RI sets). The classic strains provide mapping resolution, while the RI strains provide power. All of the chosen strains are commercially available from the Jackson Laboratory (https://www.jax.org) and all have been either sequenced (www.sanger.ac.uk/science/data/mouse-genomes-project) or densely genotyped (5).

Cumulative data
In common with RI strains (6), the HMDP resource is renewable in the sense that the inbred strains are permanent. This allows multiple mice of the same genotype to be

Strains in which a small region of the genome from one strain has been placed, by repeated crossing, onto the genetic background of a second strain.

Correlation
In statistics, a measure of the strength and direction of a linear relationship between two variables. Usually measured as a correlation coefficient.

eQTL
A genetic locus that controls the levels of a transcript.

GWAS
An examination of common genetic variation across the genome designed to identify associations with traits such as common diseases. Typically, several hundred thousand SNPs are interrogated using microarray technologies.

Haplotypes
Combinations of alleles at genetic loci that are inherited together.

Heritability
An estimate of the proportion of phenotypic variation in a population that is attributable to genetic variation among individuals.

Inbred strains
Strains in which a set of naturally occurring genetic variations has been fixed by many generations of inbreeding.

Linkage analysis
Analysis of the segregation patterns of alleles or loci in families or experimental crosses. Such analysis is commonly used to map genetic traits by testing whether a trait cosegregates with genetic markers whose chromosomal locations are known.

LD
In population genetics, LD is the nonrandom association of alleles. For example, alleles of SNPs that reside near one another on a chromosome often occur in nonrandom combinations owing to infrequent recombination. LD should not be confused with genetic linkage, which occurs when genetic loci or alleles are inherited jointly, usually because they reside on the same chromosome.

LD blocks
Regions of high correlation across genetic markers, which result from their linkage in cis on a chromosome and thus infrequent recombination during meiosis. LD blocks are often demarcated by recombination hot spots.

Modules
In the context of network modeling, groups of components that are tightly connected or correlated across a set of conditions, perturbations, or genetic backgrounds.

Natural genetic variation
Genetic variation that is present in all populations as a result of mutations that occur in the germline; the frequencies of such mutations in populations are affected by selection and by random drift. This is in contrast with experimental variation that is introduced by techniques such as gene targeting and chemical mutagenesis.

QTL
A genetic locus that influences complex and usually continuous traits, such as blood pressure or cholesterol levels.

RI strains
A set of inbred strains that is generally produced by crossing two parental inbred strains and then inbreeding random intercross progeny; they provide a permanent resource for examining the segregation of traits that differ between the parental strains.

Systems genetics
A global analysis of the molecular factors that underlie variability in physiological or clinical phenotypes across individuals in a population. It considers not only the underlying genetic variation but also intermediate phenotypes such as gene expression, protein levels and metabolite levels, in addition to gene-by-gene and gene-by-environment interactions.

Trans-regulatory factors
Factors which regulate the transcription of genes at a distance. Examples are transcription factors and microRNAs.

Fig. 1.
Greatly increased mapping resolution in the HMDP as compared with a traditional cross between two inbred strains. Shown is the mapping of a strong cis-eQTL, for the gene Cyp2c37, by linkage in an F2 cross (blue line) or by association in the HMDP (black dots). The position of the gene is indicated by the red box.
The F2 cross included about 300 mice and global transcript levels were determined using microarrays. The figure is reprinted from (44), with permission.

Genetic diversity
The HMDP panel includes about 4,000,000 common SNPs, roughly similar to the number of common SNPs in human populations (17), and there is substantial variation of most clinical traits that have been examined, as discussed below. In contrast, the Collaborative Cross and the Diversity Outbred (18) include "wild-derived" strains, which increase the diversity by an order of magnitude (17). While there will certainly be greater total variation of most complex traits in the Collaborative Cross, there will also be greater genetic complexity, potentially complicating genetic dissection. Among the HMDP mice, about 40% of genes exhibit significant cis-eQTLs in various tissues, and the vast majority of genes exhibit secondary (trans-regulated) genetic variation.

Relevance to complex human diseases
If the mouse is to serve as a model of common metabolic and cardiovascular traits, it is important that the relevant pathways be conserved in the two species. One measure of such conservation is the degree of overlap between mouse and human GWAS data. Studies in the HMDP for osteoporosis (19,20), obesity (21), blood cell levels (22), and heart failure (23) suggest that the overlap will be substantial. We discuss an example of pathway conservation in the section on fatty liver disease.

SYSTEMS GENETICS
The power of the HMDP for analysis of complex traits derives from the integration of genetics with global

information is apparent at the "hotspot" loci where differences in DNA methylation at a single locus can be seen to influence the levels of multiple transcripts, proteins, and metabolites.
As illustrated below, omics data can be used to identify candidate genes for clinical traits using correlation and causality testing (30-32). Interactions between genes and their relationships to clinical traits can also be examined using enrichment analyses or network modeling (33,34). Finally, subclinical phenotypes can provide an additional useful "bridge" between molecular phenotypes and the more complex clinical traits; for example, Attie and Kebede studied insulin secretion by isolated pancreatic β cells as a subphenotype for diabetes (35). In the sections below, we discuss the various datasets that have been generated and provide examples of the types of analyses that have been performed.
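The correlation step can be sketched as follows: given strain-level values of a clinical trait and transcript levels for several genes, compute the Pearson correlation of each transcript with the trait and rank the genes by its magnitude. This is only an illustrative sketch. The gene labels and numbers are invented, and the function names are hypothetical rather than part of any HMDP software.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rank_genes_by_trait_correlation(expression, trait):
    """Rank genes by |r| between transcript level and trait across strains.

    expression: dict mapping gene label -> list of per-strain transcript
    levels, in the same strain order as `trait`.
    Returns (gene, r) pairs, strongest absolute correlation first.
    """
    scored = [(gene, pearson_r(levels, trait)) for gene, levels in expression.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical strain means for one trait and three transcripts.
trait = [1.0, 2.0, 3.0, 4.0]
expression = {
    "gene_a": [2.0, 4.0, 6.0, 8.0],
    "gene_b": [4.0, 3.0, 2.0, 1.0],
    "gene_c": [1.0, 3.0, 2.0, 4.0],
}
ranked = rank_genes_by_trait_correlation(expression, trait)
```

Correlation alone does not indicate the direction of effect, which is why the text pairs it with causality testing.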

Osteoporosis
Bone mineral density (BMD), a trait relevant to osteoporosis, is highly heritable in mice. Farber and colleagues examined variation of BMD among the HMDP strains and, using association and network modeling, have uncovered several novel genes, some of which also influence BMD in humans (19,20). GWASs in the HMDP for total body, spinal, and femoral BMD revealed four significant associations (chromosomes 7, 11, 12, and 17) harboring between 14 and 112 genes each. This was reduced to 26 functional candidates by identifying those genes that were regulated by local eQTLs in bone or that harbored potentially functional nonsynonymous coding variants. A candidate at the strongest locus (chromosome 12) was a nonsynonymous SNP in the additional sex combs-like 2 (Asxl2) gene. The role of the gene was confirmed by showing that Asxl2 knockout mice exhibit reduced BMD (19), and this has been confirmed in subsequent studies (36). It is noteworthy that the human ASXL2 locus exhibits a suggestive association with BMD.
To model biologic interactions of genes involved in BMD, the investigators used coexpression network analysis, an approach that partitions genes into modules, along with causality modeling (31,37). A graphic representation of one such module enriched in BMD genes is shown in Fig. 3. Such network modeling studies suggested a function for Asxl2 in osteoclast differentiation, and this was validated by showing that knockdown of Asxl2 in bone marrow macrophages impaired their ability to form osteoclasts. Two additional genes involved in osteoblast differentiation, Maged1 and Pard6g, were identified using analyses of a coexpression network module containing many genes that define the osteoblast lineage. Furthermore, the module was shown to be strongly regulated by the Wnt signaling antagonist, Sfrp1 (38). Recently, bone expression data from the HMDP were used to follow up on a BMD locus previously identified in a traditional F2 cross between strains C3H/HeJ and C57BL/6J. These studies revealed

molecular phenotypes using "omics" technologies (Table 1). The natural variations found among the inbred strains of the HMDP directly perturb a substantial fraction of all genes, as judged by the number of genes exhibiting cis-eQTL or allele-specific expression (24,25), and these, in turn, result in thousands of secondary perturbations. When the molecular and clinical traits are monitored together, relationships between them can be observed using mapping, correlation, and modeling [reviewed in (26)]. This is the basis of "systems genetics."

Genetic analysis of molecular phenotypes using high throughput technologies
Omics data can be analyzed using genetics in the same manner as other phenotypic traits. For example, variations in the levels of a transcript in a population can be treated as a quantitative trait and the genetic loci responsible can be mapped to regions of the genome using linkage or association analyses. Loci that reside near the genes whose transcripts are measured are likely to affect enhancer/promoter function and are thus often assumed to act in cis, while loci affecting expression of genes on other chromosomes or many megabases away on the same chromosome presumably act through diffusible factors and are thus assumed to act in trans. Such loci are termed eQTLs. Originally, individual transcript levels were quantitated in populations using hybridization or polymerase chain reaction amplification (27), but with the advent of expression arrays and RNA-Seq, it became possible to map eQTLs globally (1). Such studies have shown that genetic variations in gene expression are very common, affecting levels of thousands of genes in both human and mouse populations [reviewed in (26,28)]. Moreover, it appears that a large fraction (~85%) of the variations for common disease traits result from variations in gene expression rather than from structural (protein coding) variation [for example, (29)]. The levels of proteins and metabolites can also be quantitatively measured using high throughput technologies, and the loci controlling these can be similarly mapped to identify protein QTLs (pQTLs) or metabolite QTLs (7,8).
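The cis/trans distinction described above amounts to a simple positional rule, sketched below. The 2 Mb window is an assumed cutoff chosen for illustration (the text says only "near" versus "many megabases away"), and the function name and coordinates are hypothetical.

```python
def classify_eqtl(snp_chrom, snp_pos, gene_chrom, gene_start, gene_end,
                  window_bp=2_000_000):
    """Label an eQTL as 'cis' or 'trans' from positions alone.

    'cis'  : the SNP is on the gene's chromosome and within window_bp
             of the gene body (window_bp is an assumed cutoff).
    'trans': the SNP is on another chromosome, or on the same
             chromosome but farther away than window_bp.
    """
    if snp_chrom != gene_chrom:
        return "trans"
    if gene_start - window_bp <= snp_pos <= gene_end + window_bp:
        return "cis"
    return "trans"


# Hypothetical coordinates for a gene on chromosome 7.
local = classify_eqtl("7", 10_500_000, "7", 10_000_000, 10_050_000)
distant = classify_eqtl("7", 60_000_000, "7", 10_000_000, 10_050_000)
other_chrom = classify_eqtl("3", 10_020_000, "7", 10_000_000, 10_050_000)
```

A positional label of this kind is an assumption about mechanism rather than a demonstration of it, in keeping with the wording "often assumed to act in cis" above.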

The flow of biologic information: from genes to molecular traits to clinical traits
Whereas common disease traits are complex, influenced by tens or hundreds of loci, molecular traits tend to be much simpler. For example, cis-eQTLs often explain a large fraction of the variance of the transcript levels. A key aspect of the systems genetics approach is that molecular traits can thus constitute a bridge of sorts between DNA variation and clinical traits. An example of the application of such "vertical" omics is shown in Fig. 2. Several million sites of DNA methylation were identified in livers of the HMDP strains, using reduced representation bisulfite sequencing, and 22,000 sites that exhibited substantial genetic variation in methylation levels were selected. These were then tested for significant association with molecular traits, as quantitated by expression arrays, proteomics, and metabolomics, as well as clinical traits. The flow of biologic

In (C) and (D), the proteins or transcripts are plotted on the y axis according to the location of the encoding gene. Each dot is a significant association at the corresponding Bonferroni thresholds across CpGs tested with levels of clinical traits or

number variation associated with altered expression levels, and Degs1, a fatty acid desaturase involved in the metabolism of bioactive sphingolipids. These same mice were examined for global transcript levels in liver, adipose, and muscle, as well as metabolites in plasma. A list of the most strongly correlated genes revealed many known to contribute to obesity, such as Lep, Sfrp5, Mlxipl, Dgat1, and Nnmt (21).
These results have some important implications for the current "epidemic of obesity". The findings support the concept of a genetically determined "setpoint," because almost all of the strains studied reached a plateau level of body fat following the initial weight gain (Fig. 4C). The final plateau level depended on the genetic background and, between strains, was only weakly correlated with food consumption (21), although within a strain there was strong correlation between food intake and the development of obesity. Moreover, cross-fostering studies (in which the microbiomes of different strains are exchanged) showed that gut microbiotas are responsible, in part, for the differences in response to dietary challenge (42). This is consistent with the idea that subtle changes in microbiota composition may have contributed, in part, to the increased prevalence of obesity (43).

Insulin resistance and type 2 diabetes
Insulin resistance (IR) is characterized by the failure of tissues to respond appropriately to insulin. It is strongly associated with obesity and contributes importantly to type 2 diabetes, fatty liver disease, and cardiovascular disease.
Bicc1 as a novel determinant of osteoblastogenesis and BMD in both mice and humans (20).

Obesity and dietary responsiveness
The analysis of obesity in humans is confounded by environmental factors such as the inability to monitor food intake. The HMDP has been particularly useful in examining the response to a high-fat dietary challenge because the same genetic backgrounds can be examined under different conditions. As shown in Fig. 4A, the HMDP strains exhibit substantial variation in body fat percentage on both chow and high-fat diets. The heritabilities for both fat as a percent of body weight and the response to a high-fat diet were in the range of 80%. Genome-wide association analyses of the HMDP identified eight significant/suggestive loci associated with obesity traits, such as body fat percent change in response to the diet (Fig. 4B), several of which overlapped with human GWAS loci for body mass index (21). For example, the chromosome 18 locus contains the endosomal/lysosomal Niemann-Pick C1 (Npc1) gene, a human GWAS hit (39,40). A previous study with heterozygous knockout mice for Npc1 revealed increased responsiveness to a high-fat diet as compared with wild-type mice, whereas there was no effect on a low-fat diet (41). This is precisely the phenotype observed in the HMDP: mice with reduced Npc1 expression due to a cis-eQTL had increased adiposity on the high-fat diet, but not the chow diet. Other strong candidates are the amylase (Amy) genes on chromosome 3, which show copy

levels of metabolites, proteins, or transcripts in liver. E, F: The association of percent methylation of a CpG on chromosome 1 at 173,115,750 base pairs (x axis) versus the levels of plasma HDL cholesterol (E) or apoAII (F). Reproduced from (63), with permission.

to IR. Analysis of the HMDP strains revealed large differences in IR when fed a diet rich in fat and refined carbohydrates along with striking sex differences. More than 15 genome-wide significant loci for traits associated with IR

Fig. 4. Genetic control of response to high-fat (HF) high-sucrose (HS) diet. Mice of the HMDP strains (six to eight male mice per group) were maintained on a low-fat chow diet until 8 weeks of age, when they were placed on a high-fat (32% kcal) and high-sucrose (25% kcal) diet for 8 weeks. The percent body fat on chow or on high-fat diet is shown in (A) and a GWAS of the percent body fat change following feeding of the diet is shown in (B). The red line in (B) indicates the threshold for genome-wide significance and likely candidate genes under each peak are indicated. The increase in percent body fat in response to the diet largely plateaus after about 4 weeks (C), consistent with a genetically controlled "setpoint" model of obesity (21). Reproduced from (21), with permission.
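Heritability estimates such as those quoted for body fat can be illustrated with a panel of inbred strains, where replicate mice within a strain share a genotype. The sketch below estimates broad-sense heritability as the between-strain share of variance from a one-way analysis of variance. It is a minimal sketch under stated assumptions (equal numbers of mice per strain, invented data, a hypothetical function name) and is not necessarily the estimator used in the cited study.

```python
from statistics import mean


def broad_sense_heritability(groups):
    """Between-strain fraction of trait variance from a balanced design.

    groups: list of per-strain lists of trait values, with the same
    number of mice in every strain (an assumption of this sketch).
    Uses one-way ANOVA mean squares to separate the between-strain
    (genetic) variance from the within-strain (residual) variance.
    """
    k = len(groups)        # number of strains
    n = len(groups[0])     # mice per strain
    grand_mean = mean(v for g in groups for v in g)
    ms_between = n * sum((mean(g) - grand_mean) ** 2 for g in groups) / (k - 1)
    ms_within = sum((v - mean(g)) ** 2 for g in groups for v in g) / (k * (n - 1))
    var_between = max((ms_between - ms_within) / n, 0.0)
    return var_between / (var_between + ms_within)


# Hypothetical body fat percentages: three strains, two mice each.
body_fat = [[10.0, 12.0], [20.0, 22.0], [30.0, 32.0]]
h2 = broad_sense_heritability(body_fat)
```

With strains that differ far more from one another than mice within a strain do, as in this toy data, the estimate approaches 1.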
Analysis of IR in humans is confounded by environmental factors, sex differences, age, and disease pathology and, despite large GWASs, there has been limited success in identifying the genetic factors and pathways contributing

strong evidence from human studies for the involvement of six genes in susceptibility to NAFLD (Table 3). In the HMDP, five out of six of these genes exhibited significant correlation, in terms of gene expression in adipose or liver, with hepatic TG levels. Some of these associations (those with cis-eQTLs) may result from direct genetic variation driving the expression of these genes, whereas the others may be secondary.

Heart failure
Heart failure is a very common cause of death, with a lifetime risk of more than one in nine in developed countries. Characterized by loss of cardiac output, heart failure is a highly heterogeneous disorder associated with complex pathological features, including contractile dysfunction, fibrosis, and hypertrophy. It results from many different chronic stressors, most notably hypertension and injury following myocardial infarction. The heterogeneity has complicated human GWASs and only a small number of significant loci have been identified despite meta-analyses of tens of thousands of patients (45,46). To model heart failure in the mouse, Rau et al. (23) administered a β-adrenergic agonist, isoproterenol (ISO), to the HMDP for 3 weeks using an implanted pump. The strains showed considerable variability in the development of hypertrophy, fibrosis, and changes in heart function (based on echocardiography parameters). GWASs revealed 7 significant and 17 suggestive loci, containing an average of 14 genes in linkage disequilibrium with the peak SNP, for cardiac hypertrophy, fibrosis, and surrogate traits relevant to heart failure. A number of loci contained highly promising candidate genes, including genes known to contribute to Mendelian cardiomyopathies in humans or having established roles in cardiac pathology, as well as novel candidates based on systems genetics strategies.
A strong candidate in a chromosome 7 locus for fibrosis was Abcc6, an orphan transporter whose mutation causes the disorder pseudoxanthoma elasticum, characterized by chronic calcification of a number of soft tissues, including heart. Mutations of the gene occur among a number of common mouse strains, such as DBA/2J and C3H/HeJ, where they cause calcification of heart and other tissues in older mice beginning at about 6 months of age (47). To test the role of Abcc6 in ISO-induced fibrosis, gene-targeted mice on a C57BL/6J background were examined following ISO treatment. As compared with the wild-type mice, the level of fibrosis (as measured by collagen content) in the knockout mice was substantially increased (Fig. 5A). Similarly, on a C3H/HeJ background, which carries a naturally occurring Abcc6-null mutation, mice expressing a genomic Abcc6 transgene were rescued from fibrosis (22) (Fig. 5B).

Plasma lipids
As compared with humans, mice have relatively low levels of LDLs and TG-rich lipoproteins and somewhat elevated levels of HDLs (48). Even when fed high-fat diets, the levels of LDL cholesterol and TGs remain relatively low. Higher levels of these, a prerequisite for the development

were identified and a novel IR gene, Agpat5, was validated. Mice in which Agpat5 expression was suppressed, using an antisense oligonucleotide, had reduced plasma insulin levels and increased ability to clear glucose (12). Agpat5 is a mitochondrial lipid acyltransferase involved in the conversion of lysophosphatidic acid to phosphatidic acid (12). Systems genetics analyses involving global transcript levels in liver and adipose tissue, as well as plasma metabolites, implicated a number of additional genes and revealed a significant correlation with plasma arginine levels (12).

Fatty liver disease
Non-alcoholic fatty liver disease (NAFLD) encompasses a wide spectrum of liver abnormalities ranging from benign accumulation of lipids (steatosis) to inflammation and fibrosis (non-alcoholic steatohepatitis) to cirrhosis, and ultimately end-stage liver disease and cancer. As yet, human GWASs have succeeded in identifying only a handful of genes significantly associated with NAFLD and these explain a tiny fraction of disease heritability. NAFLD is strongly associated with obesity, diabetes, and dyslipidemia, and the "epidemic of obesity" has resulted in a high prevalence of NAFLD (20-30% of Western populations).
To identify genetic and environmental factors contributing to NAFLD, liver steatosis and related clinical and molecular traits were studied in the HMDP following feeding of a high-fat high-carbohydrate diet for 8 weeks (34). More than a 30-fold variation in liver TG was observed and, as in human populations, this was strongly associated with both body fat and IR, which together explained more than 40% of the variation in liver TG. GWASs revealed four loci significantly associated with hepatic TG levels, and candidates at each of the loci were screened using gene expression data (cis-eQTL, correlation with trait) and coding sequence variation, available in the Sanger database as discussed above. The Gde1 gene in the chromosome 7 locus, containing a total of 17 genes, was selected on the basis of a strong cis-eQTL and strong correlation of its expression in both liver and adipose with hepatic TG content. Its role in steatosis was confirmed by showing that Gde1 overexpression and shRNA knockdown in liver using adenoviral delivery led to reciprocal effects on liver TG accumulation (44). Gde1 encodes glycerophosphodiester phosphodiesterase 1, a broadly expressed integral membrane protein that catalyzes the degradation of deacylated phospholipids, such as glycerophosphoethanolamine and glycerophosphocholine. Gde1 has no direct role in TG biosynthetic pathways; however, one of the end products of the phosphodiesterase reaction is glycerol 3-phosphate, the precursor for TG biosynthesis. In addition, Gde1 may affect hepatic metabolic homeostasis through altering the availability of bioactive phospholipids and metabolites. How the variation in liver TG in the HMDP strains will correlate with subsequent pathologies is unknown, but liver TG levels were strongly associated with plasma alanine aminotransferase levels, a measure of liver injury. Prolonged feeding studies or stronger stressors will be required to examine the further progression of the disease.
NAFLD nicely illustrates the concordance of human and mouse disease pathways. At the present time, there is

Six genes, listed here, have been associated with NAFLD in human studies. Transcript levels for these genes were determined in livers and gonadal adipose tissue of the HMDP. Five of the six (the exception being Pnpla3) exhibited significant correlation (r) with hepatic TG levels in mice fed a high-fat high-carbohydrate diet in either liver or adipose. Two of the five had strong cis-eQTLs in liver (44).

(14).

Atherosclerosis
The mouse has become the most widely used animal model of atherosclerosis and there have been thousands

atherosclerosis, inferred causality using GWAS results, and, finally, identified what were termed "key driver" genes. The modeling was verified in part by comparing human and mouse networks and performing experiments with cell lines.

Inflammatory responses
Many metabolic and cardiovascular traits have an important inflammatory component. To examine genetic contributions to inflammation, peritoneal macrophages from 92 strains of the HMDP were cultured and studied for genome-wide transcript levels before and after treatment with lipopolysaccharide (LPS) or oxidized lipids (Ox-PAPC) (16). A larger number of cis-eQTLs were identified in this study, as compared with in vivo tissues (5,217 in the control, 4,587 in the LPS, and 4,747 in the Ox-PAPC, as compared with 2,000-4,000 in most tissue studies). Presumably, this reflects reduced environmental effects and a more homogeneous cellular composition. Between 9,000 and 18,000 trans-eQTLs were also identified although, because of the problem of multiple comparisons, many of these are likely to be false positives (51). A number of the trans-acting loci were present as "hotspots," particularly after LPS treatment. The largest such hotspot was on chromosome 9 at 119 Mb and included over 1,000 regulated genes, many of which were inflammatory cytokines or LPS-primary response genes.

assessed. In addition, global gene expression was quantitated using arrays in the aorta and the liver, and levels of lipids, glucose, insulin, numerous cytokines, and a panel of metabolites were quantitated in the plasma. As shown in Fig. 6, despite the fact that all the mice carried a 50% C57BL/6J background, there was well over a 600-fold range of variation in lesion sizes. While males tended to have lesion sizes several-fold smaller than females, the sizes of lesions in males and females were very significantly correlated (r = 0.474, P = 2.6 × 10⁻¹⁵). Because C57BL/6J mice have a roughly intermediate lesion size in both males and females, the very small lesions (less than half the size of those in C57BL/6J) cannot be explained by additive models of inheritance. The relationships between atherosclerosis and various risk factors in mice closely resembled those in humans (49). The data reported in the study provide a rich resource for further studies of atherosclerosis; for example, a number of relevant traits were mapped with high resolution and a number of novel metabolite associations were observed. Furthermore, the expression data can be used to identify novel candidate genes or prioritize genes in human GWAS loci (29,49).
A combination of human and HMDP expression data were used to model cross-tissue regulatory gene networks for atherosclerosis (50). Briefly, the authors constructed coexpression networks, identified modules associated with

The locus contains 12 genes based on linkage disequilibrium, of which only 6 were expressed in macrophages. These were systematically tested using siRNA knockdown and the trans regulation of most of the genes was shown to be due to 2310061C15Rik, a poorly characterized gene with homology to a mitochondrial protein involved in cytochrome C oxidase biogenesis (16). These data provide a rich resource for further studies of inflammatory interactions, including pathogen interactions; for example, periodontal bone loss in response to LPS varies strikingly in the HMDP (52).

Type 1 diabetes and diabetic nephropathy
In some studies, only a fraction of the number of strains required for association mapping of traits have been characterized. One such study involves analysis of kidney disease in the context of type 1 diabetes ( 53 ). The authors bred the DBA/2J. Akita transgenic mouse model of type 1 diabetes to 28 of the HMDP strains and examined histologic and molecular parameters associated with diabetic nephropathy in diabetic mice and nondiabetic littermates. The most striking observed phenotype was the urine albumin-to-creatinine ratio, which increased 2- to 6-fold over euglycemic control values for most strains, but more than 10-fold in six strains, including 50- and 83-fold in two strains, NOD/ShiLtJ and CBA/J, respectively ( 53 ).

Other clinical traits
A variety of nonmetabolic traits are being studied in the HMDP. For example, the HMDP strains differ strikingly in hearing parameters and hearing loss due to noise. A number of loci were identified in association studies ( 15,54 ) and Nox3 was shown to be critical for noise-induced hearing loss ( 55 ).
Conditioned fear phenotypes and global transcript levels for hippocampus and striatum were determined in the HMDP strains ( 17 ). A total of 27 behavioral quantitative trait loci were mapped and these results were integrated with eQTL results. Coexpression networks were constructed for hippocampus and striatum, and modules strongly associated with fear traits were identified. Similarities and differences in modules in the two brain regions were examined ( 17 ).

Gene-by-environment interactions
While human GWASs have identified many loci for metabolic and cardiovascular traits, a major limitation is the inability to examine environmental interactions. When the HMDP mice were challenged with various environmental conditions, such as a high-fat/high-sucrose diet ( 12,21 ), a high-fat/high-cholesterol diet ( 49 ), or isoproterenol treatment ( 23 ), virtually all clinical traits examined and hundreds of molecular traits, such as transcript levels, showed evidence of gene-by-environment (GxE) interactions (for example, see Fig. 7). Most striking were inflammatory responses of peritoneal macrophages to bacterial LPS, where a number of hotspots affecting the responses of hundreds of genes were identified ( 16 ). Because the majority of common genetic variation is regulatory rather than protein coding ( 56 ), it is not surprising that GxE interactions occur so frequently. It is likely that changes in transcription factor binding related to sequence variation will be a major mechanism driving cis-regulated GxE interactions such as those in Fig. 7, although any of the events that are critical for gene expression could be involved, including chromatin interactions, chromatin state, alternative splicing, and posttranslational modifications. Many of the trans-regulated effects could result from genetic differences affecting the metabolism of dietary components or drugs. The gut microbiome, for example, is likely to be an important mediator of environmental responses, as discussed in the section below.
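As a rough illustration of what a GxE interaction means for a single transcript, the sketch below computes the difference in diet response between two genotype groups at a cis-eQTL (a simple difference-in-differences contrast). It is not the analysis used in the cited studies; the function name `gxe_contrast`, the genotype labels "A" and "B", and the example values are hypothetical.

```python
from statistics import mean

def gxe_contrast(records, treated="HF/HS", control="chow"):
    """Return (interaction contrast, per-genotype diet response).

    records: iterable of (genotype, diet, value) tuples, where genotype is
    "A" or "B" (hypothetical allele labels at the peak cis-eQTL) and diet is
    either `treated` or `control`.
    """
    cells = {}
    for genotype, diet, value in records:
        cells.setdefault((genotype, diet), []).append(value)

    # Diet response within each genotype group: mean(treated) - mean(control).
    response = {}
    for genotype in ("A", "B"):
        response[genotype] = (
            mean(cells[(genotype, treated)]) - mean(cells[(genotype, control)])
        )

    # A nonzero contrast means the two genotypes respond differently to diet.
    return response["A"] - response["B"], response
```

A contrast near zero indicates that both genotypes respond to the diet in the same way; opposite-signed responses, as described for sorbitol dehydrogenase in Fig. 7, give a large contrast. Judging significance would require a formal interaction test, which is omitted here.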

Gene-by-gene interactions
The importance of gene-by-gene (GxG) interactions in common disease in humans has been controversial, but studies in mice strongly point to their importance ( 57,58 ). The significance of GxG interactions can be examined globally by comparing "broad sense" heritability (the sum of all genetic influences) with "narrow sense" heritability (the portion due to additive effects and not including GxG interactions). For example, a study of numerous traits in haploid yeast suggested that broad sense was substantially larger than narrow sense heritability for some traits but not others ( 59 ). Whereas such parameters are difficult to estimate in humans, they can be studied more accurately in mice because genetically identical replicates (members of inbred strains) are available and the environment can be controlled. Indeed, using the HMDP, traits such as heart failure and atherosclerosis appear to have considerably greater broad sense than narrow sense heritability ( 49 ).
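Because inbred strains provide genetically identical replicates, broad sense heritability can be approximated from the between-strain and within-strain variance components. The sketch below uses a one-way ANOVA estimator under the assumption of equal replicate numbers per strain; the function name and input layout are hypothetical, and it is not necessarily the estimator used in the cited work. Narrow sense heritability would additionally require a kinship-based mixed model and is not shown.

```python
from statistics import mean

def broad_sense_heritability(strain_values):
    """Estimate H2 = var_between / (var_between + var_within).

    strain_values: dict mapping strain name -> list of trait values measured
    in replicate mice (assumes the same replicate count n >= 2 per strain).
    """
    groups = list(strain_values.values())
    k = len(groups)
    n = len(groups[0])
    if k < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need >= 2 strains with equal replicate counts >= 2")

    grand_mean = mean(v for g in groups for v in g)

    # One-way ANOVA mean squares.
    ms_between = n * sum((mean(g) - grand_mean) ** 2 for g in groups) / (k - 1)
    ms_within = sum((v - mean(g)) ** 2 for g in groups for v in g) / (k * (n - 1))

    # Variance components; a negative between-strain estimate is truncated to 0.
    var_between = max((ms_between - ms_within) / n, 0.0)
    total = var_between + ms_within
    return var_between / total if total > 0 else 0.0
```

Comparing this estimate with a narrow sense estimate from an additive model gives a global indication of how much genetic variance is nonadditive.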

Epigenetics
High-resolution genome-scale epigenetic profiling using next-generation sequencing (ChIP-Seq, DNase-Seq, FAIRE-Seq, bisulfite sequencing, etc.) has enabled analysis of the regulatory variation in which genetic variants are likely to act ( 60,61 ). A variety of epigenetic marks in liver have been examined in a subset of the HMDP ( 62 ) and DNA methylation has been examined in 90 HMDP strains ( 63,64 ). Much of the epigenetic variation was found to be controlled in cis and was strongly associated with the expression levels of nearby genes, which were, in turn, associated with protein, metabolite, and clinical traits (see Fig. 2 for example). Figure 2 shows an example of a DNA methylation that occurs near the Apoa2 gene on chromosome 1. The degree of methylation is strongly associated with the levels of apoA2 protein and HDL cholesterol (apoA2 is the second-most abundant protein in HDL). In addition to cis regulation, some instances of trans regulation were validated. For example, variable methylation of a cytosine-phosphate-guanine (CpG) on chromosome 13 was associated with the degree of methylation at hundreds of sites throughout the genome, as well as the expression of many genes. A strong candidate for mediating the effect was the nearby Mtrr gene, encoding methionine synthase reductase. The enzyme is part of the folate cycle, involved in the generation of methyl donors for DNA and histone methylation. To experimentally validate Mtrr as the causal gene, gene-trapped Mtrr mice with reduced gene expression were studied, and the reduced expression was found to affect a highly overlapping set of methylation sites ( 63 ).
The most striking finding from these studies was the strong association between certain variations in DNA methylation and complex clinical traits, such as HDL levels, IR, obesity, and blood cell levels. For example, Fig. 2E, F shows the association of a methylation site on chromosome 1 with HDL cholesterol levels and expression levels of the nearby apoA-II gene ( Apoa2 ). For many complex traits, the associations with methylation were much stronger than with any nearby SNPs. Whether such strong associations result from effects on the expression of nearby genes or some other mechanism is unclear ( 64 ). Moreover, combinations of multiple methylation sites, identified using linear regression modeling, were capable of predicting complex phenotypes, such as BMD and blood cell traits. Notably, many of the loci containing these methylation sites did not overlap significantly with the SNP-based association ( 64 ).
Fig. 7. Gene-by-environment interactions in response to a high-fat high-sucrose (HF/HS) diet. Shown are adipose transcript levels for two genes, sorbitol dehydrogenase (A) and histone deacetylase 1 (B), in mice fed either the chow diet (black dots) or the HF/HS diet (colored dots). The strains are rank ordered by transcript levels on the chow diet and the transcript levels on the HF/HS diet are colored according to the genotype of the peak cis-eQTL. In the case of sorbitol dehydrogenase, gene expression levels in mice with allele B are repressed by the diet, whereas those with allele A are induced. In the case of histone deacetylase 1, the induction is much larger in mice with genotype A than genotype B.

Host-gut microbiota interactions
There is now overwhelming evidence that gut microbes can contribute to metabolic and cardiovascular disorders ( 66 ). A striking example is the association between levels of TMAO, a substance derived exclusively through the action of gut microbiota, and cardiovascular disease. As yet, however, which microbes contribute to disease traits and what factors determine the composition of gut microbiota are poorly understood. Genetics provides a potentially powerful approach to address such questions, and to that end, Parks, Org, and colleagues ( 21,42 ) profiled gut microbiota using 16S rRNA gene sequencing from over 100 HMDP strains. Remarkably, they observed very high heritability of microbiota composition, in the range of 0.5 for most genera ( 42 ). They also observed a number of relationships between gut microbiota composition and clinical traits. For example, a strong association between levels of Akkermansia muciniphila, a common microbe that resides in and digests the mucin layer of the intestine, and IR was observed ( 21 ). This was then tested experimentally by introducing the microbe into mice using gavage and, indeed, profound effects on IR and other metabolic traits were observed ( 42 ). In other studies, the composition of the gut microbiota was shown to contribute to differences in TMAO levels between inbred strains of mice ( 67,68 ). Finally, cross-fostering studies, in which newborn mice are raised by foster mothers and consequently "inherit" their microbiota, suggested that differences in response to diet in the HMDP strains were due, in part, to the composition of the gut microbiota ( 42 ). Large human population studies of gut microbiota composition have been reported ( 69 ) and others are underway but, given the very large impact of diet and other environmental factors on gut microbiota, it will be challenging to tease out disease associations. The HMDP data constitute a powerful resource for further dissection of mechanistic host-gut microbiota interactions, enabling the formation of hypotheses that can then be examined in human studies.

Genetic control of protein abundance
Mapping protein levels as a quantitative trait (pQTL) is a critical aspect of understanding regulatory variation in the context of common disease. Recent advances in mass spectrometry-based proteomic methods have now enabled quantitation of thousands of proteins. One important question is the relationship between transcript levels and protein levels as a function of genetic variation. Whereas transcript-protein correlations are clearly very strong between different cell types, the perturbations introduced by common genetic variation are much more subtle. This issue was evaluated in liver using the HMDP ( 7 ). Ghazalpour et al. ( 7 ) quantified over 5,000 peptides in the HMDP using a liquid chromatography-mass spectrometry reference-based labeling approach. Based on this, a set of the 485 most reliable proteins was selected and compared with levels of the corresponding transcripts. Although, in some cases, the correspondence was excellent and many highly significant pQTLs were mapped, about half of the protein-transcript pairs exhibited little or no correlation, even among the most heritable variations in transcript levels. A somewhat stronger correspondence was observed in a yeast intercross population using green fluorescent protein tags to quantify single-cell protein abundance ( 59 ). Although technical factors undoubtedly contributed to the lack of correspondence, there are a number of ways in which protein levels might be regulated independently of transcript levels, including regulation of translation, codon constraint, RNA editing, alternative splicing, posttranslational modifications, and protein turnover. One particularly significant mechanism may involve protein complexes; thus, proteins which form complexes with other proteins likely have a specified stoichiometry, and if one protein is produced in excess of the other, it will likely undergo rapid degradation. In the study of Ghazalpour et al. ( 7 ), it is noteworthy that in the case of ribosomal proteins, many of which were detected, there was essentially no correspondence between transcript and protein levels. Presumably, any such proteins produced in excess of the levels that could be incorporated into ribosomes would be rapidly degraded.

Regulation of metabolism
Recent advances in mass spectrometry and nuclear magnetic resonance have made high-throughput analyses of hundreds of metabolites in biologic samples possible, and investigators have begun to utilize the relationships between metabolite levels and disease traits for use as biomarkers or elucidation of disease mechanisms. Human population studies of plasma metabolites have identified a number of disease associations and shown that levels of many metabolites are highly heritable ( 65 ). The HMDP offers an opportunity to integrate metabolite levels with epigenetic, transcriptomic, protein, and clinical data under controlled conditions (see Fig. 2 ) and studies of metabolite levels have been performed for liver and plasma when mice were fed either chow or high-fat diets ( 8,11,49 ). A number of conclusions emerged; for example, trimethylamine-N-oxide (TMAO) levels were found to be a strong predictor of atherosclerosis ( 49 ), as they are in humans. GWAS analyses resulted in the identification of numerous metabolite QTLs (mQTLs), and the causal genes for some of these differences were experimentally validated ( 8 ). In a study of liver metabolites in mice fed a chow diet, 40% of metabolites measured showed evidence for genetic regulation. In total, the 110 measured metabolites were found to map significantly to 240 loci, and 36 metabolites were found to be significantly associated with clinical traits ( 8 ). This work also highlighted the value of using the HMDP to identify and validate candidate genes regulating metabolite levels by integrating the transcript eQTLs with the metabolite QTLs. Following this recipe, the authors were able to identify the causal genes affecting N-acetylglutamate and glycerol-3-phosphate levels in liver.

Stem cells
Genetic factors controlling stem cell number, proliferation, and differentiation are poorly understood. Zhou

Sex differences
Most common diseases, including metabolic and cardiovascular diseases, differ in prevalence between men and women ( 70 ). In mice, such differences can be examined in detail, and previous studies have revealed thousands of differences in gene expression between sexes ( 71 ), most of them resulting from hormonal effects ( 72 ). In the HMDP, most clinical traits exhibited striking differences between males and females. For example, Fig. 8 shows IR, quantitated as homeostatic model assessment (HOMA)-IR. While there is considerable genetic variation, it is clear that in the majority of strains, HOMA-IR is greater in males ( 12 ). While explanations for most of these differences are unknown, systems genetics approaches in the HMDP should be informative. For example, whereas in humans, males are more susceptible to atherosclerosis than females, the reverse is true in mice. Studies of a subset of HMDP mice revealed that levels of TMAO [a strong contributor to atherosclerosis, in humans and mice ( 49 )] were much higher in females than in males, and analysis of hepatic transcript levels showed that this was due largely to greatly decreased levels of the enzyme, FMO3, in male mice due to repression by testosterone ( 67 ). In contrast, in humans, FMO3 expression is similar in males and females.
Fig. 8. Sex differences in IR in the HMDP. HOMA-IR, a measure of IR based on glucose and insulin levels, was determined in the HMDP for males and females. In addition to large differences between strains, females clearly tended to be less insulin resistant than males. Reproduced from ( 12 ), with permission.

Blood cell levels
The levels of the major blood cell groups, red cells, lymphocytes, monocytes, and granulocytes, vary considerably among the HMDP strains ( 22 ). A number of loci for each cell type were identified by GWASs, several of which overlap with loci observed in human studies. For example, five red cell trait loci were identified in the HMDP and four of these correspond to red cell loci reported in a recent human GWAS ( 73 ). A major locus affecting mean corpuscular volume and several other red cell traits mapped to Hbb-b1, a likely causal gene that is part of the β-globin cluster on chromosome 7 ( 22 ).

HMDP DATABASE AND ITS USE FOR CARDIOMETABOLIC RESEARCH
The data discussed above are organized on a server at UCLA and published data are available upon request from the corresponding author. Some of the data are also available through the Jax Phenome Database (phenome.jax.org) as well as the GeneNetwork database (www.genenetwork.org). Also, precomputed data, including trait-genome associations (for clinical and molecular traits), trait correlations, and expression data across tissues, can be easily searched at the Systems Genetics Resource (https://systems.genetics.ucla.edu/) ( 75 ). Below, we briefly outline how the database can be interrogated to address certain questions. The basic operations used are correlation, genetic mapping, and statistical modeling ( 26 ).
Fig. 9. Application of the HMDP database to investigate genes or traits of interest. Hypothetical examples of how information from the HMDP can be utilized to explore relationships between genes (A) and traits (B) of interest and their relationships with multiple layers of information. For each layer, correlation analysis can be used to ask a specific question and interpret results which could elucidate novel functions and/or relationships of genes or traits of interest.
A large body of data has now been collected and is freely available to interested researchers. This includes hundreds of genome-wide significant loci, most containing fewer than a dozen genes, along with expression, proteomic, and metabolomics data to narrow the list of likely candidates. Apart from mapping, the lists of genes correlated with clinical traits contain many of the genes known to contribute to the traits [for example ( 21 )] and are undoubtedly highly enriched for genes yet to be discovered. The resource also presents opportunities to examine fundamental issues such as GxE and GxG interactions, sex differences, and host-gut microbiota interactions.

What information can be gained about your gene of interest?
One informative operation is to obtain the list of clinical traits or molecular traits (other genes, proteins, metabolites) that are correlated with any gene of interest ( Fig. 9A ). There are several possible explanations for the correlation: Your gene of interest (YGI) may influence the other traits (causal, indicated by a red arrow), it may be perturbed by the other traits (reactive), or the correlation may result from the fact that both YGI and the correlated traits are regulated by some other factor, possibly another gene or a technical issue such as a batch effect. Such a list provides candidates for further study and can be broadly examined for pathway enrichment [for example, see ( 44 )], thereby illuminating possible functions of YGI. It is also possible to perform causal modeling to help identify mechanistic interactions ( 30,31 ). For example, if the expression of YGI is regulated by a strong cis-eQTL, one can ask whether other traits map to that same locus.
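The correlation operation described above can be sketched as follows: given per-strain expression of a gene and per-strain values for several traits, rank the traits by the absolute Pearson correlation across the strains they share. The data layout and the names `pearson` and `rank_correlated_traits` are assumptions made for illustration; they do not reflect the interface of the HMDP database.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def rank_correlated_traits(gene_expr, traits, min_strains=3):
    """Rank traits by |r| with a gene's expression across shared strains.

    gene_expr: dict strain -> expression level of the gene of interest.
    traits: dict trait name -> dict strain -> trait value.
    Returns a list of (trait name, r, number of shared strains).
    """
    results = []
    for name, values in traits.items():
        shared = sorted(set(gene_expr) & set(values))
        if len(shared) < min_strains:
            continue  # too few strains in common to estimate a correlation
        r = pearson([gene_expr[s] for s in shared], [values[s] for s in shared])
        results.append((name, r, len(shared)))
    return sorted(results, key=lambda item: abs(item[1]), reverse=True)
```

A high rank in such a list does not distinguish causal, reactive, and confounded relationships; that distinction requires the causal modeling or co-mapping steps mentioned above.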

What can I learn about my complex clinical trait of interest?
Similar to the analysis of YGI above, a useful operation is to examine the genes, proteins, or metabolites correlated with the clinical trait ( Fig. 9B ). The relationships may be causal, reactive, or independent, as discussed above. Also, one can map the major loci contributing to the traits of interest and subsequently prioritize the candidate genes at the loci using gene expression and sequence data. Finally, various kinds of modeling can be applied to identify sets of genes involved in the trait; for example, coexpression modeling can identify gene modules that can be tested for relationship to the trait using principal component analysis ( 76 ).
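To illustrate the last step, the sketch below summarizes a coexpression module by its first principal component across strains (often called a module eigengene) and correlates that summary with a clinical trait. It assumes the module's genes have already been identified, that no gene has constant expression, and that rows and trait values share the same strain order; the function names are hypothetical and the power-iteration PCA is a simplification chosen to avoid external libraries.

```python
from math import sqrt

def standardize(values):
    """Center and scale one gene's expression values to unit variance."""
    n = len(values)
    m = sum(values) / n
    sd = sqrt(sum((v - m) ** 2 for v in values) / (n - 1))
    return [(v - m) / sd for v in values]

def module_eigengene(expr, iters=200):
    """First principal component scores per strain for a gene module.

    expr: list of rows, one per gene, each a list of expression values
    across strains. The sign of a principal component is arbitrary.
    """
    z = [standardize(row) for row in expr]
    n_genes, n_strains = len(z), len(z[0])
    w = [1.0] + [0.0] * (n_genes - 1)  # initial loading vector
    for _ in range(iters):
        # Power iteration on Z Z^T: project strains, then update loadings.
        scores = [sum(w[g] * z[g][s] for g in range(n_genes)) for s in range(n_strains)]
        w = [sum(z[g][s] * scores[s] for s in range(n_strains)) for g in range(n_genes)]
        norm = sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
    return [sum(w[g] * z[g][s] for g in range(n_genes)) for s in range(n_strains)]

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def module_trait_correlation(expr, trait):
    """Correlate the module eigengene with a trait measured in the same strains."""
    return pearson(module_eigengene(expr), trait)
```

Because the sign of the eigengene is arbitrary, the magnitude of the correlation is the quantity of interest when screening modules against a trait.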
There are many other types of questions that can be addressed using the HMDP database. Examples include: What is the relationship between chromatin marks, gene expression, and clinical traits? What is the nature of gene-by-environment interactions? How does the host contribute to gut microbiota composition? What pathways are shared among disease traits? The approaches to these questions are discussed in the works reviewed above.

CONCLUSIONS
The HMDP resource provides a means of formulating hypotheses about the interactions underlying complex metabolic and cardiovascular traits. Whereas QTL mapping using traditional crosses in mice succeeded in identifying numerous highly replicable loci, the poor resolution of linkage analysis, often tens of megabases, made the identification of strong candidates difficult. Consequently, only a modest number of causal genes were identified over the past twenty-five years ( 77 ). In contrast, since its development in 2010, studies by a small number of laboratories using the HMDP have validated well over a dozen novel genes underlying complex traits. Key to this has been the integration of high-resolution association mapping along with systems genetics analysis using high-throughput data.