Thematic Review
Department of Human Genetics and Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, and Department of Veterans Affairs Greater Los Angeles Healthcare System, Los Angeles, CA 90073
Published, JLR Papers in Press, July 11, 2006.
1 To whom correspondence should be addressed. e-mail: reuek{at}ucla.edu
ABSTRACT
Supplementary key words: comparative genomics • positional cloning • functional genomics • gene-trap mutagenesis • ethylnitrosourea mutagenesis
INTRODUCTION
IDENTIFICATION OF LIPID METABOLISM GENES BY GENOME SEQUENCE ANALYSIS
~300 remaining sequence gaps (reduced from 150,000 gaps in the draft sequence released in 2001) (1, 2). Inherent in the nucleotide sequence is information about the chromosomal landscape, including the presence of local and long-range duplications, dispersed and tandem repeat sequence families, and, of greatest interest, an estimated 20,000–25,000 protein-coding genes. However, the genes constitute only a small fraction of the genome, accounting for <2% of the sequence, whereas >45% of the genome is composed of interspersed repeat sequences, many of which are relics of retroviruses and DNA transposons. The high ratio of repetitive sequence to recognizable functional sequence in the genome has made it challenging to identify the complete complement of protein-coding genes.
One approach to better decipher the information contained in the human genome has been to compare it with genome sequences from other organisms (3) (see Table 1 for genome browsers and other pertinent resources). Comparative genomics is based on the premise that functional sequences have been conserved by selection over evolutionary time, whereas random mutations have been tolerated in sequences without specific function (for reviews, see 4–6). The choice of which genomes to compare depends on the type of information being sought. A comparison of genome sequences from human and chimpanzee, which diverged ~8 million years ago and exhibit 95% genome-wide sequence similarity, cannot readily distinguish functional from nonfunctional sequences because the species diverged so recently. On the other hand, the divergence of mouse and human 75–80 million years ago provides a suitable evolutionary distance for the identification of functional sequences without the high background of similarity across the entire genome that occurs between primates. Importantly, functional similarities between human and mouse are easily established, as >99% of human genes have a detectable homolog in the mouse, with high conservation of gene order and exon-intron organization. Although only ~2% of the genome consists of protein-coding sequences, ~5% of the sequence is conserved between human and mouse. This suggests that in addition to protein-coding sequences, regulatory elements and non-protein-coding genes have been conserved as a result of functional constraints (7).
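To make the idea of a conserved region concrete, the sketch below scores two pre-aligned sequences (for example, a human and a mouse segment) by percent identity in a sliding window and merges windows that pass a threshold. It is a simplified illustration in the spirit of VISTA-style conservation plots, not the VISTA implementation: the 100 bp window and 70% identity cutoff are assumptions (commonly used conservation criteria), and the alignment step itself is taken as given.

```python
def identity_profile(seq_a: str, seq_b: str, window: int = 100) -> list[float]:
    """Percent identity for every window of two pre-aligned sequences.

    Assumes both sequences come from an existing alignment (equal length,
    '-' marking gaps); a gap never counts as a match.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    matches = [int(a == b and a != "-") for a, b in zip(seq_a.upper(), seq_b.upper())]
    if len(matches) < window:
        return []
    running = sum(matches[:window])  # matches in the first window
    profile = [100.0 * running / window]
    for start in range(1, len(matches) - window + 1):
        # Slide by one base: drop the base leaving, add the base entering.
        running += matches[start + window - 1] - matches[start - 1]
        profile.append(100.0 * running / window)
    return profile


def conserved_regions(profile: list[float], window: int = 100,
                      min_identity: float = 70.0) -> list[tuple[int, int]]:
    """Merge passing windows into half-open (start, end) alignment intervals."""
    regions: list[tuple[int, int]] = []
    for start, pid in enumerate(profile):
        if pid < min_identity:
            continue
        end = start + window
        if regions and start <= regions[-1][1]:
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((start, end))
    return regions


# Toy alignment: 150 identical bases followed by 100 mismatched bases.
human = "A" * 150 + "C" * 100
mouse = "A" * 150 + "G" * 100
print(conserved_regions(identity_profile(human, mouse)))  # [(0, 180)]
```

The conserved region extends past the identical block because windows starting up to position 80 still contain 70 matching bases; real analyses typically also require the region itself, not just the window, to meet the cutoff.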
Although the example of APOA5 is striking, it raises the question of whether this was a unique case or whether comparative genomics has broader applicability for the identification of lipid metabolism genes. In fact, despite the availability of nearly all of the genome sequence, many genes have not been annotated accurately, or at all, and comparative genomics can be extremely useful in this regard. We previously used the available mouse genome sequence to characterize the mouse aldo-keto reductase (AKR) gene cluster and to identify new AKR family members based on similarities in protein-coding sequences (10). AKRs represent a superfamily of oxidoreductases that catalyze the NADP(H)-dependent reduction of a wide variety of aldehydes and ketones, including steroid hormones, prostaglandins, and many xenobiotic compounds (11). The AKR1C subfamily members function in steroid hormone homeostasis and also act as bile acid binding proteins. To better characterize the members of the AKR1C subfamily, we analyzed the mouse genome sequence using the sequences of four known Akr1c genes and identified four additional members of the AKR gene superfamily residing in a cluster on chromosome 13 (10). The high degree of similarity in predicted amino acid sequence among the eight AKR1C proteins suggested that they arose through repeated gene duplication events but have diverged in substrate specificity and expression patterns. Because of the high sequence similarity among these AKR gene family members, it had been impossible to unambiguously identify the corresponding human homologs by standard sequence comparisons such as BLAST (Basic Local Alignment Search Tool). However, this is possible with the VISTA alignment tool (Table 1), which was used to identify the APOA5 gene, because it integrates both DNA sequence and position within the genome.
Applying the VISTA tool to a region of human chromosome 10, we can now identify not only the six previously described human AKR genes but also three previously unannotated members, which likely correspond to mouse Akr1c21, Akr1c19, and Akr1c14 (Fig. 2, regions A–C). This analysis reveals that the human and mouse AKR gene clusters are conserved in number of genes, exon-intron organization, and orientation of genes within the cluster.
A related, but distinct, comparative genomic approach has been used to identify regulatory elements upstream of the human apolipoprotein [a] (apo[a]) gene. Apo[a] is unusual in that no homolog exists in the mouse or most other non-primates, so comparison of distant mammalian genomes is not feasible (13). Therefore, Rubin and colleagues (14) instead analyzed the apo[a] gene and its upstream sequences in several closely related species using an approach known as "phylogenetic shadowing." Phylogenetic shadowing is based on the premise that, even among closely related species, DNA sequences that are not functionally critical will accumulate variation in at least some lineages (15). Thus, by analyzing a 1.6 kb region near the 5' end of the apo[a] gene in 18 different primates, it was possible to identify a novel regulatory sequence found only in Old World monkeys and hominids. This technique is a powerful tool for identifying primate-specific regulatory motifs, which cannot be detected through human-mouse sequence comparisons.
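The core bookkeeping of phylogenetic shadowing can be sketched as a scan of a multiple alignment for runs of columns that never vary across the species sampled. This is deliberately simplified: published phylogenetic shadowing weights variation by the phylogeny and branch lengths of the species compared, which this unweighted sketch ignores, and the minimum run length is an arbitrary, hypothetical parameter.

```python
def invariant_columns(alignment: list[str]) -> list[bool]:
    """True for each alignment column where every species carries the same base."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must be aligned to the same length")
    flags = []
    for i in range(length):
        column = {seq[i].upper() for seq in alignment}
        # A gap in any species disqualifies the column.
        flags.append(len(column) == 1 and "-" not in column)
    return flags


def invariant_runs(flags: list[bool], min_length: int = 5) -> list[tuple[int, int]]:
    """Half-open (start, end) runs of invariant columns at least min_length long."""
    runs, start = [], None
    for i, flag in enumerate(flags + [False]):  # sentinel closes a trailing run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_length:
                runs.append((start, i))
            start = None
    return runs


# Hypothetical three-species toy alignment; column 8 varies.
toy = ["ACGTACGTAA", "ACGTACGTCA", "ACGTACGTGA"]
print(invariant_runs(invariant_columns(toy)))  # [(0, 8)]
```

With many closely related species, each individually too similar to be informative, the union of their differences is what leaves the invariant runs standing out.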
With additional sequence information that will become available through projects such as ENCODE (Encyclopedia of DNA Elements), which will sequence the regions orthologous to 1% of the human genome in >20 mammalian species, and with new tools developed for the analysis of both evolutionarily close and distant genomes, comparative genomics will become an even more powerful approach for deciphering information not only about protein-coding genes but also about noncoding microRNA genes and cis-regulatory sequences (16, 17).
IDENTIFICATION OF LIPID METABOLISM GENES BY POSITIONAL CLONING
Positional cloning strategy
The positional cloning approach traditionally involves genetic mapping of the disease phenotype, delimiting the physical stretch of DNA sequence spanning the region, identifying all genes residing there, and testing these candidates for involvement in the disease by sequence analysis in affected versus unaffected individuals (Fig. 3A). One of the first successful positional cloning efforts, to isolate the gene underlying Huntington's disease, required almost 10 years of intensive effort by an international consortium to identify the gene, even after the identification of a closely linked genetic marker (18). Fortunately, the availability of genomic resources, including genome sequences, polymorphisms, and gene expression information, has streamlined this process in several respects. As a result, several important lipid metabolism genes, underlying both Mendelian and complex disorders, have recently been identified, as shown in Table 2 and described below.
The availability of whole genome sequences has virtually eliminated the two labor-intensive intermediate steps of the positional cloning process, physical mapping and identification of genes in the region (Fig. 3). As recently as a few years ago, once a candidate region was delimited by genetic markers, it was necessary to define the physical map of the region and the genes contained therein experimentally by isolating large insert DNA clones followed by exon trapping or cDNA selection (25, 26). This tedious process has largely been replaced by simple database queries. The genetic markers that delimit the candidate region can be submitted to a genome browser, and with a few keystrokes, the entire physical region, including sequence and known and predicted genes, is accessible (Table 1). This effectively converts the positional cloning approach into a "positional candidate" approach (Fig. 3B). It should be noted, however, that identification of novel genes by this method still poses challenges, as many uncharacterized genes are still annotated incorrectly, or not predicted at all, in current versions of the genome sequence (e.g., Diet1, described below).
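The database query that turns positional cloning into a positional candidate approach amounts to an interval overlap search. A minimal sketch follows; the gene records and coordinates are hypothetical placeholders, and in practice a genome browser (Table 1) performs this step against the current assembly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def genes_between_markers(genes: list[Gene], chrom: str,
                          marker_a: int, marker_b: int) -> list[str]:
    """Names of genes that overlap the interval bounded by two flanking markers."""
    lo, hi = sorted((marker_a, marker_b))
    return [g.name for g in genes
            if g.chrom == chrom and g.start <= hi and g.end >= lo]


# Hypothetical annotation; positions are invented for illustration only.
annotation = [
    Gene("geneA", "chr1", 1_000, 5_000),
    Gene("geneB", "chr1", 9_000, 12_000),   # straddles the right-hand marker
    Gene("geneC", "chr1", 20_000, 25_000),  # outside the interval
    Gene("geneD", "chr2", 2_000, 3_000),    # different chromosome
]
print(genes_between_markers(annotation, "chr1", 10_000, 500))  # ['geneA', 'geneB']
```

Genes that only partly overlap the interval are kept, since a causative variant could lie in any part of them; as the text cautions, genes that are missing from or mispredicted in the annotation will never appear in such a list.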
Finally, whole genome data are valuable in screening candidate genes within a mapped region, which can be prioritized using a combination of in silico and experimental data. For example, candidate genes can be investigated in the Gene Ontology database (Table 1), in which genes are classified on the basis of all known information with respect to their cellular component (e.g., nucleus, endoplasmic reticulum), biological process (e.g., signal transduction, fat cell differentiation), and molecular function (e.g., catalytic activity, receptor activity) (27). Analysis of the Gene Ontology classification for all candidate genes at a particular locus may indicate high-priority candidates based on attributes that relate to the phenotype. Another criterion for inclusion or exclusion of candidate genes is the tissue expression pattern. For example, a gene that underlies abnormal very low density lipoprotein production may be expected to have high expression levels in liver, whereas a gene that affects adipocyte differentiation is expected to be expressed in adipose tissue. Rather than performing Northern blot or RT-PCR experiments to determine the expression pattern of a set of candidate genes, one can obtain expression information for the vast majority of known and predicted genes from DNA microarray databases (28). For example, the Genomics Institute of the Novartis Research Foundation (Table 1) provides expression data for nearly 80 human tissues, including 21 brain subregions and several classes of B- and T-cells in addition to standard tissue types, as well as for >60 mouse tissues, including seven stages of embryonic development. These in silico tools help prioritize candidate genes for further experimental analysis.
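The in silico prioritization described above can be sketched as a simple scoring function that rewards Gene Ontology annotations relevant to the phenotype and expression in the tissue of interest. The gene names, GO term labels, expression values, equal weighting, and expression cutoff below are all hypothetical; real prioritization would draw on the databases listed in Table 1 and weigh the evidence more carefully.

```python
def prioritize(candidates: list[dict], relevant_terms: set[str],
               tissue: str, min_expression: float = 100.0) -> list[str]:
    """Rank candidate genes by GO-term matches plus a bonus for tissue expression."""
    scored = []
    for gene in candidates:
        go_hits = len(relevant_terms & gene["go_terms"])
        expressed = gene["expression"].get(tissue, 0.0) >= min_expression
        scored.append((go_hits + int(expressed), gene["name"]))
    # Highest score first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored]


# Hypothetical candidates at a locus affecting VLDL production (a liver phenotype).
candidates = [
    {"name": "geneA", "go_terms": {"lipid metabolic process"},
     "expression": {"liver": 850.0, "adipose": 40.0}},
    {"name": "geneB", "go_terms": {"signal transduction"},
     "expression": {"liver": 20.0, "brain": 600.0}},
    {"name": "geneC", "go_terms": {"lipid metabolic process", "lipoprotein metabolic process"},
     "expression": {"liver": 15.0}},
]
terms = {"lipid metabolic process", "lipoprotein metabolic process"}
print(prioritize(candidates, terms, "liver"))  # ['geneA', 'geneC', 'geneB']
```

A ranking like this only orders candidates for testing; it should not be used to discard low-scoring genes outright, since novel genes with no annotation (such as Diet1, discussed below) would score zero.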
Although the isolation of genes via positional cloning typically follows a standard strategy, the details leading to the identification of specific gene mutations are unique to each mutant gene. However, a few generalities have emerged. In the mutations described here, the most fruitful approach to screen a series of candidate genes has been to investigate altered gene expression levels in the mutant. This presupposes that the mutation is of a type that will lead to altered mRNA transcription or stability rather than affecting the function of a protein that is otherwise expressed at normal levels. Interestingly, in the recently cloned lipid metabolism mutations, altered expression levels have been prevalent, suggesting either that such mutations are quite common or that this type of mutation is more easily identified and therefore overrepresented among the mutant genes identified to date. The causative mutations include gene rearrangements, small insertions or deletions, and point mutations that lead to premature stop codons, so that the mRNA transcripts produced from such genes are degraded, presumably by nonsense-mediated mRNA decay (29). The methods for screening gene expression alterations range from gene-by-gene analysis via Northern blot or RT-PCR to characterization of global mRNA expression patterns by DNA microarray hybridization. The use of microarray analysis to compare gene expression patterns in wild-type versus mutant tissues was instrumental in the identification of the Cd36 gene mutation in the spontaneously hypertensive rat and of the ABCG5 and ABCG8 mutations in human sitosterolemia (30, 31). More recently, microarray analysis has been combined with linkage analysis in segregating mouse and rat populations to identify genes underlying complex traits, as described below.
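A minimal sketch of the expression screen: compare each candidate's transcript level in mutant and wild-type tissue and flag genes whose level drops sharply, as expected when a premature stop codon triggers nonsense-mediated decay. The gene names, expression values, 4-fold cutoff (log2 ratio of -2 or lower), and pseudocount are hypothetical, and a real comparison would use biological replicates and a statistical test rather than a single ratio.

```python
import math


def reduced_in_mutant(wild_type: dict[str, float], mutant: dict[str, float],
                      min_log2_drop: float = 2.0,
                      pseudocount: float = 1.0) -> list[tuple[str, float]]:
    """Genes whose mutant/wild-type log2 ratio is at or below -min_log2_drop."""
    hits = []
    for gene, wt_level in wild_type.items():
        mut_level = mutant.get(gene, 0.0)
        # The pseudocount avoids log(0) for transcripts that are undetectable.
        log2_ratio = math.log2((mut_level + pseudocount) / (wt_level + pseudocount))
        if log2_ratio <= -min_log2_drop:
            hits.append((gene, round(log2_ratio, 2)))
    return sorted(hits, key=lambda hit: hit[1])  # strongest reduction first


# Hypothetical expression values for three candidate genes in liver.
wild_type = {"geneA": 99.0, "geneB": 50.0, "geneC": 10.0}
mutant = {"geneA": 3.0, "geneB": 45.0, "geneC": 9.0}
print(reduced_in_mutant(wild_type, mutant))  # [('geneA', -4.64)]
```

This screen only catches mutations that lower transcript abundance; as noted above, a missense mutation in a normally expressed gene would pass through it unflagged.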
Identification of mutations underlying single gene disorders
Coinciding with the availability of the genome resources described above, there has been notable progress in the positional cloning of genes underlying lipid metabolism disorders in mouse and other experimental organisms, as well as in humans (see Table 2 for lipid-related genes identified by positional cloning since 1997). The majority of these have been single gene disorders, but successes in complex disorders have also been reported. Ten years ago, we reviewed about a dozen naturally occurring single gene mouse mutations affecting lipid metabolism, many of which had been genetically mapped but whose genes had not been identified (32). Interestingly, aside from three mutations in extinct mouse strains, all of these genes have now been identified. These include genes that are now quite familiar in the lipid field, such as the Npc1 (Niemann-Pick type C1) gene, identified as the gene mutated in both the lysosomal cholesterol storage disorder (lcsd) and sphingomyelinosis (spm) mouse mutants, and the Soat1 gene encoding sterol O-acyltransferase 1 (more commonly known as ACAT-1), which carries the causative mutation in the ald (adrenocortical lipid depletion) mouse (Table 2 and references therein). Beyond the mouse, positional cloning in other model organisms has identified lipid metabolism genes such as fat-free (ffr), required for normal intestinal lipid absorption in zebrafish, and mutations in two separate cholesterol synthetic genes as a cause of cataracts in the Shumiya rat (33, 34).
In humans, positional cloning and/or positional candidate methods have been used to identify mutations underlying single gene disorders in three separate members of the ABC transporter family: mutations responsible for sitosterolemia were identified in ABCG5 and ABCG8, whereas Tangier disease and familial HDL deficiency, characterized by virtually absent circulating HDL, were attributed to mutations in ABCA1 (31, 35–37). Positional cloning was also used to identify mutations underlying human autosomal recessive hypercholesterolemia in a novel gene encoding a putative LDL receptor adaptor protein (38). Notably, a few genes underlying multigenic lipid disorders have also been isolated via positional cloning very recently in both mouse and human, as described below. As an illustration of one of the strengths of the positional cloning approach, many of the recently identified genes have been completely novel genes without any previous characterization (e.g., Lpin1, Diet1) or have been known genes not previously implicated in lipid metabolism (e.g., Txnip, Angptl3). Below, we describe a few of the recently positionally cloned mouse genes that are particularly relevant to dyslipidemias; for more examples, see Table 2.
The fld mutation
In 2001, positional cloning of the fatty liver dystrophy (fld) mouse mutation identified a novel gene, lipin (Lpin1), which defined a new family of three related genes in mammals (Lpin1, Lpin2, and Lpin3) (39). Interestingly, single lipin homologs also exist in invertebrates and single-celled eukaryotes, suggesting a fundamental cellular role for lipin. The fld mutation leads to a rearrangement of the Lpin1 gene, producing a null mutation. Lipin-deficient mice exhibit fatty liver and hypertriglyceridemia during the neonatal period, followed by peripheral neuropathy beginning at ~3 weeks of age, and complete lack of adipose tissue and secondary insulin resistance throughout their lifetime (reviewed in 40). Lipin is expressed predominantly in adipose tissue, skeletal muscle, and testis, with lower level expression detectable in several other tissues. Studies with lipin-deficient fld mice and lipin transgenic mice have revealed that lipin is required for normal adipocyte development and that it plays a role in energy metabolism in muscle (41–43).
Whereas lipin-deficient mice are lipodystrophic as a result of impaired fat cell differentiation, lipin transgenic mice become obese, suggesting that subtle genetic variations that influence lipin expression levels could influence fat content and glucose homeostasis in humans. Indeed, this appears to be the case, as an association study in which polymorphisms spanning the human LPIN1 gene were typed in Finnish dyslipidemic families and in obese and lean subjects revealed significant associations with body mass index and insulin levels (44). It is of interest to determine the molecular function of lipin and its family members. Two recent studies in the yeast Saccharomyces cerevisiae provide interesting clues. In one study, the yeast lipin homolog was shown to have a role in regulating phospholipid biosynthesis and nuclear growth during the cell cycle (45). Another group studying yeast determined that lipin acts as a magnesium-dependent phosphatidic acid phosphatase, an enzyme with a key role in triacylglycerol synthesis through the conversion of phosphatidic acid to diacylglycerol (46). It remains to be determined whether mammalian lipin has similar properties and what specific roles the individual lipin family members play.
The Hyplip1 mutation
In 2002, the gene underlying the Hyplip1 mouse mutation was identified by positional cloning as the gene for thioredoxin-interacting protein (Txnip), previously known as the tumor suppressor vitamin D3-upregulated protein (VDUP-1) (47). The Hyplip1 mutant mouse exhibits metabolic abnormalities similar to familial combined hyperlipidemia (FCHL), with increased cholesterol, triglyceride, and apoB levels and hyperlipidemia that increases in severity with age (48). Hyplip1 mice carry a nonsense mutation in the Txnip gene, leading to dramatically reduced mRNA levels (47). TXNIP appears to have a role in modulating the cellular redox state, with important metabolic effects. In liver, these include the regulation of biochemical branch points for gluconeogenesis and lipogenesis, such that TXNIP-deficient animals divert substrates such as phosphoenolpyruvate to lipogenesis rather than gluconeogenesis, contributing to hyperlipidemia in these animals (49, 50). Interestingly, TXNIP-deficient mice appear to have thioredoxin activity similar to that of wild-type mice, raising the possibility that TXNIP deficiency exerts its effects in part through other mechanisms (51). Indeed, in pancreatic β-cells, Txnip expression is highly induced by glucose, and Txnip overexpression promotes β-cell apoptosis (52). This negative effect of TXNIP on cell proliferation is consistent with its initially proposed role as a tumor suppressor and with the reduced expression observed in tumor cell lines and human tumors (53–56). In agreement with this role, TXNIP deficiency in the Hyplip1 mouse strain was recently shown to be sufficient to initiate hepatocellular carcinoma (57). The discovery that a gene initially characterized as a tumor suppressor is the underlying cause of hyperlipidemia illustrates the independence of positional cloning from preconceived ideas regarding gene function.
The hypl mutation
The KK/San mouse carries a spontaneous mutation, hypl, that renders these animals hypolipidemic. KK/San mice are derived from the KK mouse strain, a multigenic model of moderate obesity and type 2 diabetes characterized by hyperlipidemia, hyperinsulinemia, and hyperglycemia (58). In a colony of KK mice maintained at a laboratory in Japan, a spontaneous mutation arose that produced animals that are hypolipidemic despite maintaining high body weight, insulin, and glucose levels (59). The most pronounced reductions were in triglycerides, but cholesterol and nonesterified fatty acids were also reduced. Positional cloning identified the mutation in the angiopoietin-like 3 (Angptl3) gene, resulting in a premature stop codon and a dramatic reduction in mRNA levels (59). ANGPTL3 is structurally related to the angiopoietins, a family of vascular growth factors, but it lacks the ability to bind the angiopoietin receptor Tie2, suggesting that, unlike other family members, it is unlikely to have a role in blood vessel formation (60). ANGPTL3 is produced in liver and secreted into the circulation. In contrast to the reduced ANGPTL3 levels observed in KK/San mice, overexpression or administration of ANGPTL3 leads to increased plasma lipids (59). Several studies have provided insight into the mechanisms by which ANGPTL3 levels influence lipid levels. ANGPTL3 was found to inhibit lipoprotein lipase activity; accordingly, ANGPTL3 deficiency is associated with an 85% increase in peripheral disposal of exogenous triglycerides, accounting for the reduced triglyceride levels in the ANGPTL3-deficient KK/San mouse (59, 61). Interestingly, ANGPTL3 appears to have a different role in adipocytes, in which it promotes lipolysis, thus explaining the reduced levels of circulating fatty acids in KK/San mice (62).
Angptl3 genetic variations have also been associated with altered atherosclerosis susceptibility in mice and humans. Transfer of the hypl mutation onto the apoE null mouse strain significantly reduces aortic lesion size (63). Additionally, a difference in ANGPTL3 amino acid sequence has been proposed as the causative variation underlying the Ath8 atherosclerosis susceptibility locus (64). A genetic association between the human ANGPTL3 gene and atherosclerotic lesions has also been reported (64). Because both ANGPTL3 and its close relative ANGPTL4 are secreted proteins that inhibit lipoprotein lipase, there is interest in these factors as targets for the treatment of dyslipidemias and the metabolic syndrome (reviewed in 65, 66). In this vein, it is of interest to determine which factors regulate ANGPTL3 levels. It has been shown that Angptl3 gene expression is induced by the liver X receptor and negatively regulated by thyroid hormone, suggesting a potential mechanism for the hypotriglyceridemic effects of thyroid hormone receptor β agonists (67, 68).
The Diet1 mutation
The widely studied C57BL/6J mouse strain is among the most susceptible to hypercholesterolemia and aortic lesions when fed a diet containing cholesterol, fat, and cholate (69). Therefore, it was highly unexpected when the nearly genetically identical C57BL/6ByJ substrain was found to be resistant to hypercholesterolemia and aortic lesions, despite similar levels of food intake and dietary cholesterol absorption (70). The resistance to increased plasma cholesterol levels in C57BL/6ByJ mice is associated with altered hepatic gene expression leading to enhanced conversion of cholesterol to bile acids and their subsequent excretion in the urine and feces (71, 72). Resistance to hypercholesterolemia cosegregated with high bile acid levels in a cross between C57BL/6ByJ and an unrelated mouse strain, indicating that a single locus controls both traits (71). Via positional cloning, we recently identified the causative mutation in a novel gene, Diet1, which has undergone rearrangement in the C57BL/6ByJ genome (our unpublished data). In this case, the information available in genome databases had incorrectly predicted Diet1 as three separate genes, and no expressed sequence tags were available, calling the expression of this gene into question. Our subsequent analysis revealed that Diet1 is prominently expressed in intestine, which illustrates that some tissues may be underrepresented in the expressed sequence tag databases. Given these issues, it is highly unlikely that Diet1 would have been identified as a candidate gene for hypercholesterolemia resistance through any method that relies on assumptions about function. Analysis of Diet1 function is under way, but it appears to have its primary role in bile acid metabolism in intestine, with secondary effects on the regulation of bile acid synthesis in liver (our unpublished data).
It will be interesting to determine how Diet1 is regulated and whether it provides a potential target for cholesterol lowering through the modulation of bile acid metabolism.
Positional cloning of genes underlying complex traits
In addition to mutations underlying Mendelian disorders, the identification of mutations contributing to complex traits, such as atherosclerosis and diabetes, is now becoming a reality. Genes contributing to complex traits are first mapped through whole genome linkage screens, which often detect several quantitative trait loci (QTLs) that contribute to the phenotype. In the mouse, individual QTLs contributing to a complex phenotype may be isolated on a defined genetic background by the production of congenic strains (73). Congenic strains typically contain a relatively large genome segment carrying the mutation or QTL, which may harbor hundreds of potential candidate genes. Thus, for successful evaluation of candidates, the original congenic region may be subdivided through further breeding to produce subcongenic strains containing a tractable number of candidate genes.
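The logic of narrowing a congenic interval with subcongenic strains can be sketched as interval arithmetic: if the phenotype is caused by a single, fully penetrant variant, that variant must lie within every donor segment carried by strains that show the phenotype and outside the segments of strains that do not. The strain segments and positions below are hypothetical, and the single-variant assumption often fails for QTLs, which may fragment into several linked loci on closer mapping.

```python
Interval = tuple[float, float]  # half-open [start, end) in Mb


def narrow_interval(subcongenics: list[tuple[float, float, bool]]) -> list[Interval]:
    """Candidate region(s) implied by (start, end, shows_phenotype) donor segments."""
    positives = [(s, e) for s, e, shows in subcongenics if shows]
    if not positives:
        return []
    # The causal variant must lie in the overlap of all phenotype-positive segments.
    lo = max(s for s, _ in positives)
    hi = min(e for _, e in positives)
    if lo >= hi:
        return []  # no common overlap: inconsistent with a single causal variant
    intervals: list[Interval] = [(lo, hi)]
    for s, e, shows in subcongenics:
        if shows:
            continue
        remaining: list[Interval] = []
        for a, b in intervals:
            # Keep only the parts of [a, b) outside the negative segment [s, e).
            if e <= a or s >= b:
                remaining.append((a, b))
                continue
            if a < s:
                remaining.append((a, s))
            if e < b:
                remaining.append((e, b))
        intervals = remaining
    return intervals


# Hypothetical subcongenic panel (positions in Mb on one chromosome).
panel = [(10.0, 40.0, True), (20.0, 50.0, True), (20.0, 30.0, False)]
print(narrow_interval(panel))  # [(30.0, 40.0)]
```

Each additional subcongenic strain either shrinks or splits the candidate region, which is why breeding proceeds until the interval holds a tractable number of genes.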
The strategy of genetic mapping, congenic strain production, and positional candidate gene screening has recently been used successfully to identify genes underlying complex traits associated with lipid metabolism in the mouse. These include Alox5 (5-lipoxygenase) and Tnfsf4 (encoding the OX40 ligand) as genes influencing atherosclerosis susceptibility, and Sorcs1 (a member of the sortilin, SorLA, and SorCS1–3 family) as a gene influencing type 2 diabetes (74–76). The functions of the proteins encoded by these genes in atherosclerosis and diabetes remain to be clarified. Nevertheless, what is known about these genes is consistent with a role in disease. For example, the OX40 ligand is thought to enhance T-cell function, and T-lymphocytes have previously been shown to promote atherosclerosis (77). Consistent with this, mice carrying targeted mutations in Tnfsf4 have smaller atherosclerotic lesions, whereas transgenic overexpression results in larger lesions (75). Furthermore, two independent human populations exhibited an association between an SNP within the gene and the risk of myocardial infarction (75). In the case of 5-lipoxygenase, it is well established that this enzyme has a role in acute inflammation through the production of leukotrienes from arachidonic acid (reviewed in 78). In atherogenesis, 5-lipoxygenase may function in processes ranging from monocyte chemotaxis and proliferation to lesion destabilization and rupture. The role of SorCS1 in type 2 diabetes is unclear at present, but it may be related to Sorcs1 expression in pancreatic islets. Mice with altered Sorcs1 expression levels exhibit decreased insulin secretion and disrupted islet morphology (76). One hypothesis is that SorCS1, through the binding of platelet-derived growth factor, has a role in stabilizing the microvasculature in islets, which could influence islet growth, survival, and insulin secretion.
A promising approach for the identification of genes underlying complex traits is known as "genetical genomics," which integrates linkage analysis and gene expression data for progeny in a genetic cross. This technique has only recently become feasible with the advent of affordable whole genome linkage and expression analysis via microarray technology. It has already been embraced by several groups for the analysis of complex traits in mouse and rat and promises to revolutionize the identification of genes and gene networks controlling lipid metabolism and cardiovascular disease (reviewed in 79–82). In this approach, animals in an experimental cross are each typed for a genome-wide panel of genetic markers as well as clinical phenotypes (e.g., lipid levels, fat mass, or atherosclerotic lesion scores) and gene expression levels of thousands of genes using DNA microarrays. The combination of gene expression levels with genetic linkage data allows the mapping of loci that determine transcript abundance, known as "expression QTLs," in the same way as is traditionally used to map a lipid or disease phenotype. The mapping of these loci immediately reveals whether the control point for determining the levels of a transcript occurs at the gene itself (i.e., acting in cis) or is physically located elsewhere in the genome, implying regulation by some other gene(s) (acting in trans). If a locus for cis-regulated gene expression coincides with that for a clinical trait, the corresponding gene serves as a likely candidate for determining the trait and can be interrogated for DNA sequence variations that might be responsible. Thus, the integration of linkage and gene expression data with clinical traits will provide a powerful tool to prioritize candidate genes mapping to QTLs in complex traits. An example is the recent identification of Insig2 as a susceptibility gene for plasma cholesterol levels (83).
Interestingly, this approach also detected Alox5 as a gene that influences susceptibility to obesity and bone density (84). This approach is being applied to identify gene variations that influence complex traits such as obesity, plasma lipid levels, atherosclerosis, and diabetes, and it is expected to yield candidate genes and a better understanding of the gene networks that determine these complex traits (85–88).
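The cis/trans distinction and the colocalization test described above can be sketched in a few lines. Here an expression QTL is called cis when its peak lies on the same chromosome as the gene and within a fixed window of it; the 10 Mb window, the gene names, and all positions are hypothetical choices for illustration, and real studies define cis windows and QTL support intervals statistically.

```python
from typing import NamedTuple


class Locus(NamedTuple):
    chrom: str
    pos_mb: float


def near(a: Locus, b: Locus, window_mb: float) -> bool:
    """True if two loci are on the same chromosome and within window_mb of each other."""
    return a.chrom == b.chrom and abs(a.pos_mb - b.pos_mb) <= window_mb


def classify_eqtl(gene: Locus, eqtl_peak: Locus, cis_window_mb: float = 10.0) -> str:
    """'cis' if the expression QTL maps at the gene itself, otherwise 'trans'."""
    return "cis" if near(gene, eqtl_peak, cis_window_mb) else "trans"


def cis_candidates(genes: dict[str, tuple[Locus, Locus]], clinical_qtl: Locus,
                   window_mb: float = 10.0) -> list[str]:
    """Genes with a cis-eQTL that also lie under the clinical-trait QTL peak."""
    return sorted(
        name for name, (gene, peak) in genes.items()
        if classify_eqtl(gene, peak, window_mb) == "cis"
        and near(gene, clinical_qtl, window_mb)
    )


# Hypothetical genes: (gene position, eQTL peak position).
genes = {
    "geneX": (Locus("chr1", 120.0), Locus("chr1", 123.0)),  # cis, under the QTL
    "geneY": (Locus("chr1", 118.0), Locus("chr7", 40.0)),   # trans-regulated
    "geneZ": (Locus("chr5", 60.0), Locus("chr5", 61.0)),    # cis, but elsewhere
}
print(cis_candidates(genes, clinical_qtl=Locus("chr1", 125.0)))  # ['geneX']
```

Note that geneY sits under the clinical QTL but is excluded because its expression is controlled from another chromosome; in a genetical genomics framework it would instead point toward whatever gene lies at its trans-acting locus.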
In humans, numerous potential susceptibility genes for traits such as hyperlipidemia, obesity, and coronary artery disease have been reported through typing genetic polymorphisms in case-control studies. There are, however, caveats in interpreting results obtained using this approach, as false-positives are common as a result of factors such as population admixture or selection bias. Here, we consider only susceptibility genes for complex lipid-related disorders that have been identified by whole genome linkage and positional cloning. A striking example is the identification of a gene contributing to FCHL, which is characterized by increased serum cholesterol and triglyceride levels and is the most common familial lipid disorder predisposing to premature heart disease (89). Using whole genome scans in a Finnish population, a QTL for FCHL was identified on chromosome 1q21-23 (90). Coincidentally, this region of the human genome corresponds to a region of mouse chromosome 3 that harbors the Txnip mutation causing combined hyperlipidemia, which raised the possibility that the same gene was responsible for this disorder in mouse and human (48, 90). However, subsequent high-resolution mapping and positional candidate gene analysis revealed that the human FCHL susceptibility gene is USF1. USF1 encodes upstream stimulatory factor-1, a transcription factor that regulates several genes involved in lipid and glucose homeostasis, making it an attractive candidate given the phenotype of FCHL (91). Although the USF1 gene variants identified (two synonymous SNPs) were strongly associated with FCHL in Finnish subjects, it is unclear whether these variants are themselves causative. However, recent confirmation of the associations between USF1 and FCHL in other populations provides strong evidence that USF1 is a susceptibility gene for FCHL (92, 93).
A positional cloning/candidate approach was also used to identify an obesity susceptibility gene, SLC6A14 (94). This gene encodes an amino acid transporter that is thought to regulate tryptophan availability for serotonin synthesis and hence may influence appetite control. Susceptibility genes for coronary artery disease have also been identified recently using positional candidate or genome-wide association studies; these include the MEF2A transcription factor, LTA (lymphotoxin-α), ALOX5AP (5-lipoxygenase-activating protein), and PDE4D (phosphodiesterase 4D) (reviewed in 95). These recent successes in identifying genes that underlie not only monogenic but also multigenic lipid disorders suggest that optimism is warranted for an eventual understanding of the key genetic factors in common human lipid disorders.
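The case-control studies mentioned at the start of this section typically compare allele counts between affected and unaffected subjects. As a minimal sketch (a generic Pearson allelic test, not the analysis used in any study cited here; the function name and counts are hypothetical), the 2 × 2 allelic chi-square test can be written as follows. Note that the statistic only detects a difference in allele frequency between the two groups; it cannot tell whether that difference reflects disease or population admixture, which is one reason false-positives are common.

```python
from math import erfc, sqrt


def allelic_chi_square(case_counts, control_counts):
    """Pearson chi-square test (1 df) on a 2 x 2 table of allele counts.

    case_counts, control_counts: (count of allele A, count of allele a).
    Returns (chi2, p_value). No continuity correction is applied.
    """
    table = [list(case_counts), list(control_counts)]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of allele and disease status.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical example: allele A is more frequent in cases (60%) than controls (50%).
chi2, p = allelic_chi_square((300, 200), (250, 250))
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

With several hundred SNPs tested, a nominal P value like this one would still need correction for multiple testing before being taken as evidence of association.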
| FUNCTIONAL GENOMICS FOR THE ELUCIDATION OF GENE FUNCTION IN LIPID METABOLISM |
|---|
Ethylnitrosourea mutagenesis
Although not a new technique, chemical mutagenesis of mice has recently experienced a renaissance as a method for large-scale functional genome analysis (105–108). The chemical of choice is ethylnitrosourea (ENU), an alkylating agent that induces mutations at a frequency 300–1,200 times greater than the spontaneous mutation rate (109). ENU is administered to male mice in a series of injections, which mutagenizes spermatogonial stem cells. The mutagenized males are mated to wild-type females, and the resulting offspring are either screened directly to detect dominant mutations or bred for two additional generations to detect recessive mutations. ENU typically induces point mutations, which may produce missense, nonsense, or splice-site changes. Thus, unlike gene knockouts, which completely ablate gene function, ENU treatment yields a variety of allele types, including loss-of-function alleles, hypomorphic alleles, which reduce but do not abolish gene function, and antimorphic alleles, which antagonize gene function. An advantage of the ENU approach is that analysis of an allelic series, consisting of several distinct mutations within one gene, can provide valuable information about gene and protein function that is not necessarily revealed by knockout mutations. A disadvantage is the need to produce, and perform phenotypic assays on, hundreds or thousands of live mice to reveal phenotypes of interest. Furthermore, once a phenotype is detected, the mutant must be test-bred to demonstrate that the phenotype is heritable, followed by genetic mapping and positional cloning to identify the mutant gene. Positional cloning of mutant loci has been a bottleneck in this process, but the recent development of techniques for "gene-driven" ENU screens may alleviate this problem (discussed briefly below).
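The two breeding routes described above differ mainly in how many mice must be produced: dominant mutations are visible in first-generation (G1) offspring, whereas recessive mutations must first be made homozygous. As a minimal sketch, assuming one common recessive design that the text does not specify (each G2 daughter of a G1 founder male is backcrossed to her father to produce G3 pups), the probability of recovering at least one homozygous G3 pup for a given induced mutation can be computed exactly. The function name and breeding parameters are hypothetical.

```python
from fractions import Fraction


def p_recessive_detected(n_g2_daughters: int, pups_per_daughter: int) -> Fraction:
    """Probability that at least one G3 pup is homozygous for one recessive
    mutation carried heterozygously by a G1 founder male.

    Assumed (hypothetical) scheme: G1 male x wild-type female -> G2 daughters;
    each G2 daughter is backcrossed to her G1 father -> G3 pups.
    """
    p_carrier = Fraction(1, 2)         # a G2 daughter inherits the mutation
    p_homozygous_pup = Fraction(1, 4)  # carrier daughter x carrier G1 father
    # A daughter yields no homozygous pup if she is a non-carrier, or if she is
    # a carrier but none of her pups happens to be homozygous.
    p_miss_one_daughter = (1 - p_carrier) + p_carrier * (1 - p_homozygous_pup) ** pups_per_daughter
    # Daughters are treated as independent draws.
    return 1 - p_miss_one_daughter ** n_g2_daughters


# A single G3 pup has a 1/8 chance of being homozygous for a given mutation.
print(p_recessive_detected(1, 1))
print(float(p_recessive_detected(n_g2_daughters=6, pups_per_daughter=4)))
```

Using exact fractions keeps the Mendelian arithmetic transparent; the 1/8 per-pup figure illustrates why recessive screens require many more animals than dominant ones.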
Among the clinical phenotypes frequently assessed in ENU-mutagenized mice are traits relevant to lipid metabolism, obesity, and diabetes, including blood glucose and lipid levels, heart rate, blood pressure, body weight, and body composition. ENU mutagenesis programs based in the United States, Germany, and England have all identified mutants with alterations in several of these traits (110–113) (Table 1). For example, the Munich ENU Mouse Mutagenesis Project has screened 15,000 mutagenized mice for increased plasma cholesterol levels (113). More than 100 hypercholesterolemic mice were detected, and from these, nine distinct hypercholesterolemic mouse lines have been developed. Likewise, the UK ENU Mutagenesis Program at Harwell reported mutants with low plasma total cholesterol, low HDL cholesterol, and increased triglycerides (112). Several mutagenesis projects make animals with various phenotypes available to the public for further characterization. The Mouse Heart, Lung, Blood, and Sleep Disorders Center at the Jackson Laboratory has identified mice with reduced or increased total cholesterol, HDL cholesterol, triglyceride, and glucose levels, as well as mice with hypertension, abnormal heart rate or electrocardiogram measurements, anemia, coagulation abnormalities, and obesity (see Table 1 for website). Some of these mutations, including those causing high total cholesterol, low HDL cholesterol, and hyperglycemia, have already been mapped to specific chromosomal regions. Mutants characterized in this screen are made available to interested investigators for a nominal fee through a web-based interface for a limited time period.
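Phenotype-driven screens such as these flag individual animals whose measurements fall far outside the normal range before test breeding. As a hypothetical sketch (the 3-SD cutoff, the function name, and the values are illustrative assumptions, not the criteria used by any program named above), a two-sided outlier screen against a reference cohort could look like this:

```python
import statistics


def flag_outliers(reference_values, candidates, n_sd=3.0):
    """Flag candidate mice whose value lies more than n_sd standard deviations
    above or below the mean of a reference cohort.

    reference_values: measurements from reference (e.g., unmutagenized) mice.
    candidates: dict mapping mouse ID -> measurement (e.g., plasma cholesterol).
    Returns (high_ids, low_ids).
    """
    mu = statistics.mean(reference_values)
    sd = statistics.stdev(reference_values)  # sample standard deviation
    upper, lower = mu + n_sd * sd, mu - n_sd * sd
    high = [mouse_id for mouse_id, value in candidates.items() if value > upper]
    low = [mouse_id for mouse_id, value in candidates.items() if value < lower]
    return high, low


# Hypothetical cholesterol values (arbitrary units).
reference = [90, 100, 110]
screened = {"m1": 125, "m2": 131, "m3": 60, "m4": 100}
print(flag_outliers(reference, screened))
```

Flagged animals are only candidates; as noted above, each must still be test-bred to show that the phenotype is heritable.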
Once an interesting phenotype has been identified in a mutagenized mouse, the underlying mutation is identified by positional cloning much as described above. Because most of these mutagenesis projects are relatively recent, few lipid metabolism genes have been isolated from ENU-mutagenized mice to date. One interesting example is the identification of a mutation in Cd36, previously shown to be a physiologic receptor for oxidized LDL, in an ENU-mutagenized mouse with an immunodeficiency phenotype (114). This finding revealed a role for CD36 as a sensor of microbial diacylglycerides that is required for mounting an immune response against numerous bacteria, fungi, and protozoa.
In the future, alternatives to standard phenotypic screening followed by positional cloning may increase the throughput and utility of ENU mutagenesis. With the availability of the mouse genome sequence and high-throughput mutation screening by PCR coupled with high-temperature gradient capillary electrophoresis, it is now possible to screen cryopreserved sperm from ENU-mutagenized mice for mutations in any gene of interest (115–117). Sperm carrying a mutation of interest are then used to generate live mice by in vitro fertilization. Ultimately, this approach may enable high-throughput, gene-driven rather than phenotype-driven ENU screens of archives containing thousands of sperm samples with mutations in a large proportion of genes.
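A gene-driven screen of a sperm archive only succeeds if the archive is large enough that at least one sample carries a mutation in the gene of interest. A minimal sketch, assuming each sample independently carries a mutation in a given gene with probability f (an illustrative parameter; the text gives no value, and the function names are hypothetical):

```python
import math


def p_archive_has_mutant(n_samples: int, f: float) -> float:
    """Probability that at least one of n_samples carries a mutation in the
    target gene, if each sample does so independently with probability f."""
    return 1.0 - (1.0 - f) ** n_samples


def samples_needed(f: float, target: float = 0.95) -> int:
    """Smallest archive size giving at least `target` probability of
    containing one or more mutant samples for the target gene."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - f))


# Illustrative assumption: one mutation in the target gene per 1,000 samples.
n = samples_needed(0.001)
print(n, p_archive_has_mutant(n, 0.001))
```

Under this assumption an archive of roughly 3,000 samples is needed per gene for a 95% chance of a hit, which is consistent in scale with the "thousands of sperm samples" envisaged above.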
Gene-trap mutagenesis
Gene-trap mutagenesis involves mutating mouse embryonic stem cells by random insertion of trapping vectors into genes or gene regulatory sequences, leading to the impairment or ablation of gene function (reviewed in 118). The site of insertion can be determined by PCR and sequencing using the vector as a tag, and the results can be collated into searchable databases to locate mutations in specific genes. This is a true gene-driven approach, as live mice are not generated until a mutation of interest is identified in the database of embryonic stem cell lines. Gene-trap vectors typically contain a reporter gene (e.g., β-galactosidase) and a selectable marker gene (e.g., a neomycin resistance gene) or a combination of both functions (e.g., β-geo). Integration of a gene-trap vector into an intron leads to the production of a fusion mRNA transcript in which upstream exons of the endogenous gene are joined to the reporter gene sequences by virtue of a splice acceptor site incorporated into the vector (Fig. 4A). Translation of the fusion transcript results in expression of the reporter gene driven by the endogenous gene regulatory sequences, in a pattern that mimics the expression of the endogenous gene. Staining for β-galactosidase reporter expression therefore provides a tool to learn about the expression of the gene under study. Insertion of gene-trap vectors has a high likelihood of causing loss-of-function or hypomorphic mutations, depending on the position of insertion within the gene.
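Annotating a trapped cell line amounts to asking which gene, and which intron, an insertion site falls in; that intron determines which upstream exons are spliced to the reporter. A minimal sketch, assuming plus-strand genes with hypothetical exon coordinates (minus-strand genes would need the exon order reversed; the function and gene names are illustrative, not those of any gene-trap database):

```python
from bisect import bisect_right


def fusion_exons(exons, insertion_pos):
    """For a plus-strand gene with exons sorted as (start, end) pairs, return
    the 1-based numbers of the exons expected upstream of the reporter in the
    fusion transcript, or None if the insertion is not within an intron."""
    starts = [start for start, _ in exons]
    k = bisect_right(starts, insertion_pos)  # exons starting at or before the site
    if k == 0 or k == len(exons):
        return None  # upstream of exon 1, or within/after the last exon
    if insertion_pos <= exons[k - 1][1]:
        return None  # inside exon k rather than in intron k
    return list(range(1, k + 1))  # intron k: exons 1..k are spliced to the reporter


def annotate_insertion(genes, insertion_pos):
    """Search a (hypothetical) annotation of gene name -> exon list."""
    hits = {}
    for name, exons in genes.items():
        exon_ids = fusion_exons(exons, insertion_pos)
        if exon_ids is not None:
            hits[name] = exon_ids
    return hits


genes = {"geneA": [(100, 200), (500, 600), (900, 1000)], "geneB": [(2000, 2100), (2500, 2600)]}
print(annotate_insertion(genes, 700))  # insertion in intron 2 of geneA
```

Read the other way, the number of exons retained upstream of the vector gives a rough indication of how much of the endogenous protein could survive in the fusion product, which is one reason the insertion position influences whether a mutation is null or hypomorphic.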
ATP citrate lyase mutation
A gene-trap mutation was used to examine the physiological function of ATP citrate lyase, one of two cytosolic enzymes that synthesize acetyl-CoA, a precursor of both triglycerides and cholesterol. An embryonic stem cell line containing a gene-trap insertion in the ATP citrate lyase gene (Acly) was identified in the gene-trap database and used to generate mice (121). The homozygous Acly mutation was embryonic lethal, but because the insertion carried the β-galactosidase reporter gene, it was possible to characterize the in vivo expression pattern of ATP citrate lyase histologically, in detail, in heterozygotes carrying one copy of the gene-trap allele. These studies revealed ubiquitous expression of ATP citrate lyase, with particularly high levels in tissues undergoing high rates of de novo lipogenesis, including the liver of mice fed a high-carbohydrate diet, and in developing brain. Interestingly, although Acly expression was much lower in adult than in developing brain, it remained particularly pronounced in cholinergic neurons, reflecting a requirement for ATP citrate lyase in acetylcholine synthesis.
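Embryonic lethality of a gene-trap allele typically appears as a deficit of homozygous mutant pups among the offspring of heterozygous intercrosses. The text does not describe how the Acly genotypes were analyzed; as a general, hypothetical sketch, a chi-square goodness-of-fit test against the expected 1:2:1 Mendelian ratio could be computed as follows (the function name and counts are illustrative):

```python
import math


def mendelian_chi_square(n_wt: int, n_het: int, n_hom: int):
    """Chi-square goodness of fit of (+/+, +/-, -/-) offspring counts from a
    heterozygous intercross to the expected 1:2:1 ratio.

    Returns (chi2, p_value) with 2 degrees of freedom.
    """
    observed = (n_wt, n_het, n_hom)
    n = sum(observed)
    expected = (0.25 * n, 0.5 * n, 0.25 * n)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 2 degrees of freedom the chi-square survival function is exp(-x / 2).
    p_value = math.exp(-chi2 / 2)
    return chi2, p_value


# Hypothetical weaning counts with no homozygous mutants recovered.
print(mendelian_chi_square(30, 60, 0))
```

A complete absence of −/− pups with a very small P value, as in this example, is the pattern expected for an embryonic lethal allele; timed embryo dissections would then be needed to establish when death occurs.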
Phosphatidylserine decarboxylase mutation
The role of the phosphatidylserine decarboxylase gene (Psid) in phospholipid biosynthesis was recently elucidated in vivo through the characterization of a gene-trap mutant (122). Phosphatidylethanolamine is synthesized both in the endoplasmic reticulum and in mitochondria, and the mitochondrial pathway is controlled by Psid. Mice homozygous for the Psid disruption died during development, revealing that phosphatidylethanolamine synthesis in the endoplasmic reticulum cannot substitute for the mitochondrial pathway. Furthermore, fibroblasts isolated from Psid−/− mice showed that the reduction in phosphatidylethanolamine levels resulted in abnormally shaped and fragmented mitochondria. Heterozygous mutant mice exhibited upregulation of a key enzyme in the endoplasmic reticulum pathway for phosphatidylethanolamine synthesis and, hence, maintained normal tissue phospholipid content.
Phosphatidylserine synthase 2 mutation
An additional gene with a role in phospholipid synthesis, phosphatidylserine synthase 2 (Pss2), has been investigated in gene-trap mice. Pss1 and Pss2 encode two distinct serine-exchange enzymes involved in phosphatidylserine synthesis. Mice homozygous for gene-trap-induced PSS2 deficiency had normal development and tissue phospholipid content, although 10% of the animals exhibited testicular atrophy (123). Analysis of the reporter gene revealed high Pss2 expression in testis as well as in brown adipose tissue, neurons, and myometrium. Characterization of serine-exchange activity in tissues from Pss2−/− mice revealed a compensatory mechanism that maintains normal phosphatidylserine content, involving both reduced phosphatidylserine degradation and increased PSS1 activity (124).
Acylglycerolphosphate acyltransferase 6 mutation
One way to use the gene-trap database is to search for insertions in genes of interest, as illustrated in the examples above. Another is to examine the sequences of genes of unknown function that have sustained gene-trap insertions. Using the latter approach, we identified an insertion in a gene (Agpat6) with sequence similarity to a family of 1-acylglycerol-3-phosphate O-acyltransferase (AGPAT) enzymes. AGPATs are involved in triacylglycerol synthesis, catalyzing the transfer of acyl groups to lysophosphatidic acid. The best-characterized member of the family, AGPAT2, has been shown to have a regulatory role in adipocyte differentiation, and null mutations in AGPAT2 result in congenital lipodystrophy (125, 126). Mice carrying a gene-trap mutation in Agpat6 revealed important roles for AGPAT6 in triglyceride accumulation in white and brown adipose tissue and in milk (127, 128). Using the β-galactosidase reporter gene, we determined that Agpat6 is expressed in several tissues, with extremely high expression in brown adipose tissue and substantial levels in white adipose tissue, testis, and specific regions of the brain (Fig. 5A, B). Agpat6−/− mice were viable, but offspring of AGPAT6-deficient mothers died within a few days unless transferred to a foster mother (127). This prompted an analysis of Agpat6 expression in mammary gland, where β-galactosidase staining was detected in epithelial cells. Mammary glands from lactating Agpat6−/− mice had underdeveloped alveoli and ducts and reduced triglyceride accumulation. AGPAT6-deficient mice also had 25% lower body weight, attributable to reduced triglyceride storage in several adipose tissue depots (128). Adipose tissue in the subdermal region was dramatically reduced (Fig. 5C).
Agpat6−/− mice were resistant to both diet-induced and genetically induced obesity, which was associated with reduced triglyceride content in adipose tissue and increased energy expenditure. These studies revealed that AGPAT6 plays a unique physiological role that cannot be filled by other members of this gene family.
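For reference, the acyltransferase step that the text attributes to the AGPAT family, and its place in the glycerol-3-phosphate pathway of triacylglycerol synthesis, can be summarized as below. This is a general textbook scheme included only to situate the AGPAT reaction; it is not a statement of the specific substrate preference of AGPAT6.

```latex
\begin{aligned}
\text{glycerol-3-phosphate} + \text{acyl-CoA} &\xrightarrow{\text{GPAT}} \text{lysophosphatidic acid (LPA)} + \text{CoA} \\
\text{LPA} + \text{acyl-CoA} &\xrightarrow{\text{AGPAT}} \text{phosphatidic acid (PA)} + \text{CoA} \\
\text{PA} + \text{H}_2\text{O} &\xrightarrow{\text{PA phosphatase}} \text{diacylglycerol (DAG)} + \text{P}_i \\
\text{DAG} + \text{acyl-CoA} &\xrightarrow{\text{DGAT}} \text{triacylglycerol (TAG)} + \text{CoA}
\end{aligned}
```

Because every route to triacylglycerol in this pathway passes through phosphatidic acid, loss of an acyltransferase at this step would be expected to limit triglyceride storage, consistent with the adipose and milk phenotypes described above.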
Another promising addition to the gene-trapping repertoire is the technique of "targeted trapping." As mentioned above, gene-trap insertion is not fully random, so the technique is likely to reach a plateau without achieving gene-trap insertions in all protein-coding genes (119). To produce mutations in the remaining genes, a high-efficiency method of targeted mutation would be valuable. The targeted trapping strategy, which involves homologous recombination with a promoterless gene-trapping vector, produces targeted mutations in specific genes with >50% efficiency, a substantial improvement over the 1–5% efficiency of standard homologous recombination (129). The primary limitation of this technique is that the target gene must be expressed to some degree in embryonic stem cells, because the selectable marker in the promoterless construct can only be expressed from the endogenous promoter. Recent studies have defined a threshold expression level corresponding to 1% of that of the transferrin receptor (129). Targeted trapping may be applicable to perhaps half of the genes remaining after saturation of standard gene-trapping techniques, and other methods may be required for the rest. The use of different families of trapping vectors coupled with various delivery methods (e.g., electroporation vs. retroviral infection) may also allow gene-trap insertions in additional genes. In addition, techniques for the high-throughput generation of targeting vectors, such as bacterial artificial chromosome recombineering, may increase the efficiency of targeted trapping in the future (130, 131).
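The practical gain from higher targeting efficiency shows up in the number of embryonic stem cell clones that must be screened. A minimal sketch, assuming each clone is correctly targeted independently with a fixed probability (0.5 corresponds to the ">50%" figure above; lower values are illustrative), together with the expression-threshold criterion expressed relative to the transferrin receptor. Function names and expression values are hypothetical.

```python
import math


def clones_to_screen(efficiency: float, confidence: float = 0.95) -> int:
    """Smallest number of clones giving at least `confidence` probability of
    obtaining one or more correctly targeted clones, if each clone is targeted
    independently with probability `efficiency`."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    if efficiency == 1.0:
        return 1
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - efficiency))


def amenable_to_targeted_trapping(gene_expr: float, tfrc_expr: float, threshold: float = 0.01) -> bool:
    """True if expression in ES cells reaches the threshold fraction of
    transferrin receptor expression (1% by default, as cited above)."""
    return gene_expr >= threshold * tfrc_expr


for eff in (0.5, 0.05, 0.01):
    print(eff, clones_to_screen(eff))
print(amenable_to_targeted_trapping(gene_expr=5.0, tfrc_expr=400.0))
```

Under this independence assumption, a 50% efficiency needs only about 5 clones for 95% confidence, compared with several dozen to a few hundred clones at a few percent or less.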
| PROSPECTS |
|---|
| ACKNOWLEDGMENTS |
|---|
Manuscript received July 6, 2006
| REFERENCES |
|---|