Papers In Press, published online ahead of print October 1, 2004
Thematic Review


* Department of Human Genetics, Department of Medicine, and Department of Microbiology, Immunology, and Molecular Genetics, and Molecular Biology Institute, University of California, Los Angeles, CA 90095-1679
Department of Biomathematics, University of California, Los Angeles, CA 90095-1766
Department of Pathology and Laboratory Medicine, University of California, Los Angeles, CA 90095-1732
The online version of this article (available at http://www.jlr.org) contains an additional table.
Published, JLR Papers in Press, August 1, 2004. DOI 10.1194/jlr.R400006-JLR200
1 To whom correspondence should be addressed. e-mail: jlusis{at}mednet.ucla.edu
| ABSTRACT |
|---|
In this review, we discuss how the integration of genetics and technologies such as transcriptomics and proteomics, combined with mathematical modeling, may lead to an understanding of such networks.
Supplementary key words systems biology transgenic mice quantitative trait locus mapping principal components Bayesian networks correlation coefficients genetics genomics proteomics
| INTRODUCTION |
|---|
| HUMAN STUDIES |
|---|
Genetic contributions to common, complex forms of atherosclerosis (and of traits such as lipid metabolism that are relevant to the disease) were first studied by population association with candidate genes based on our biochemical knowledge. One of the first important examples of this was apolipoprotein E (apoE), which was found to exhibit three common alleles in all populations studied (E2, E3, and E4). The E2 form was found to be strongly associated with a relatively uncommon dyslipidemia (type III) and with low cholesterol levels in the population, and the E4 allele was associated with increased cholesterol levels. Since the early 1980s, thousands of association studies with candidate genes have been performed for traits relevant to atherosclerosis. Of these, approximately a dozen have shown rather consistent findings [e.g., hepatic lipase with HDL levels and peroxisome proliferator-activated receptor γ (PPARγ) with type 2 diabetes], but most remain questionable, including many that have been studied in multiple, relatively large populations (2).
Since the mid 1990s, a common paradigm in human studies of complex traits has been to carry out linkage analysis in families to identify the regions of the genome harboring the most significant common genetic factors, followed by either linkage disequilibrium analysis of the region to pinpoint the underlying gene or the testing of "positional candidates" at that locus. The first successful example of this was the identification of calpain 10 in type 2 diabetes in a large set of Mexican-American families (3). Other similar studies have now identified several other loci and genes relevant to atherosclerosis (2). Linkage analysis has very limited power for complex traits and thus will reveal only the strongest and most common variations in the populations being studied (4). With the advent of cheaper methods for the detection of polymorphisms, genome-wide association studies are becoming feasible. For example, Ozaki et al. (5) carried out a study of single nucleotide polymorphisms in thousands of individuals in Japan who had been studied for coronary heart disease and identified several genes exhibiting strong evidence of association. These were then studied in a second set of families, and one gene, lymphotoxin-α, was found to be highly significant in the second set of individuals as well. A detailed map of common polymorphism haplotypes (HapMap) of the genome with a single nucleotide polymorphism (SNP) every kilobase or so should be completed by 2005, and this should greatly aid in the implementation of whole-genome association studies (www.hapmap.org).
Why have efforts to identify genes for the common forms of atherosclerosis been largely unsuccessful? One reason, of course, is that genes for the common forms have mostly modest effects that are difficult to detect in the background of many genetic and environmental perturbations. Another important reason is likely related to epistatic interactions. Thus, the effects of certain variations may influence phenotypes only in particular genetic backgrounds. This may explain why human studies frequently fail to replicate other human studies (different populations) or animal findings (different genetic context) (6). It seems unlikely that the goal of understanding in detail the genetic network involved in atherogenesis can be achieved in the foreseeable future by direct studies of human populations. Given the extensive conservation of gene structure and function among mammals (mice and humans differ by 300 genes), the overall features of this network are likely to be similar between humans and other mammals. Therefore, the most useful approach will be to work out details of the network in animal models and then examine the corresponding features in human populations. It will be particularly important to define gene-gene and gene-environment interactions in animal models, because these will be the most challenging aspects of the problem.
Because atherosclerosis involves many cell types and important systemic influences, tissue culture studies will reveal only a subset of the important interactions. Nevertheless, such studies will importantly complement in vivo studies (7). In particular, expression array analyses of cells in response to genetic, nutritional, or pharmacologic perturbations should help in the formulation or validation of network models. For example, Johnson et al. (8) studied gene expression profiles of vascular smooth muscle cells in response to a polycyclic aromatic hydrocarbon present in tobacco smoke. Studies of cells obtained from individuals with various Mendelian or complex disorders may also be informative when subjected to genomic, proteomic, or metabolomic analyses.
| KNOCKOUTS AND OTHER SINGLE GENE MUTATIONS |
|---|
At present, investigators primarily use transgenic approaches to study candidate genes, but with the completion of the genome sequences of human and various model organisms, including rat and mouse, an important future goal will be to define the functions of all 35,000 or so mammalian genes. For this, classic gene-specific approaches will be too laborious and time-consuming, and gene-trap mutagenesis or RNA interference (RNAi) approaches will be used instead. Already, several large gene-trap libraries of embryonic stem cells have been produced (10). Another approach for identifying genes relevant to specific processes involves the use of spontaneous or chemically induced mutations. Spontaneous mutations in mice, for example, have proven very useful for examining aspects of lipid metabolism (11). Several large-scale chemical mutagenesis screens of mice are being performed at present in the public and private sectors using ethyl nitrosourea, an alkylating agent that introduces point mutations at a high frequency (12).
Transgenic animals are usually characterized only with respect to a few phenotypes, such as the amount of atherosclerosis, the complexity of the lesions, the levels of plasma lipids, or the expression of selected candidate genes. Such results provide only a small fraction of the potential information that can be extracted with respect to networks. For example, genome-wide microarray analyses could be performed on a variety of tissues to provide a picture of the components of the transcriptional network that are perturbed. Such data could help in the formulation and validation of network models. For such studies, it may be preferable to examine animals in which the expression of a gene is altered but not totally ablated, because the latter condition may result in many nonphysiological alterations. Parallel studies in tissue culture cells can be used to complement or guide the animal studies (13).
| DISSECTION OF COMPLEX TRAITS IN ANIMAL MODELS |
|---|
Among inbred strains of mice and rats are variations relevant to most aspects of atherogenesis: plasma lipoprotein levels, blood pressure, diabetes, obesity, inflammation, atherosclerotic lesion development, lesion composition, lesion calcification, lesion-related medial destruction, and dietary responsiveness. Some of these variations are observed only in sensitized genetic backgrounds, such as hypercholesterolemia induced by null mutations for apoE or the LDL receptor. Recent studies have shown that hypercholesterolemic mice also show evidence of lesion rupture, although the occlusive thrombosis that is an important feature of the clinical disease has not been observed. These genetic variations tend to be very complex in rodents as well as in humans (16).
The genetic loci responsible for these variations can be mapped by linkage analysis [quantitative trait locus (QTL) mapping] in crosses between different strains (see, for example, Fig. 2). A recent review from the Complex Trait Consortium provides a clear and concise overview of QTL mapping (17). Generally, a hundred or more backcross or intercross progeny are generated and typed for the traits of interest and for genetic markers spaced at intervals along the genome. A variety of programs are available to perform linkage analysis, with features that permit interval mapping (testing for linkage between markers), calculation of statistical evidence of linkage, and analysis of epistasis and other interactions between loci. Such studies have shown that most of the traits relevant to atherosclerosis are highly complex and frequently exhibit epistasis. In crosses between a handful of strains, dozens of loci for plasma lipoprotein levels, body fat, and lesion size have been mapped [reviewed in ref. (16)].
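The "statistical evidence of linkage" at a marker is usually summarized as a LOD score. The following is a minimal sketch, assuming a backcross in which each animal's genotype at a marker is coded 0 or 1 and the trait is modeled by simple linear regression on genotype; it is not the algorithm of any particular QTL package, and the function name and toy values are hypothetical.

```python
import math

def lod_score(genotypes, trait):
    """Single-marker LOD score from a linear regression of trait on genotype.

    genotypes: numeric genotype codes (e.g. 0/1 in a backcross)
    trait:     trait values for the same animals, in the same order
    """
    n = len(trait)
    mean_y = sum(trait) / n
    mean_g = sum(genotypes) / n
    # Residual sum of squares under the null model (no QTL at this marker)
    rss0 = sum((y - mean_y) ** 2 for y in trait)
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, trait))
    if sxx == 0 or rss0 == 0:
        return 0.0  # marker or trait does not vary; no evidence either way
    # Residual sum of squares after fitting trait ~ genotype
    rss1 = max(rss0 - sxy * sxy / sxx, 1e-12)
    return (n / 2) * math.log10(rss0 / rss1)

# Hypothetical toy backcross: eight animals, one trait, two markers
trait_values = [1.0, 1.2, 0.9, 1.1, 2.0, 2.1, 1.9, 2.2]
linked_marker = [0, 0, 0, 0, 1, 1, 1, 1]      # genotype tracks the trait
unlinked_marker = [0, 1, 0, 1, 0, 1, 0, 1]    # genotype unrelated to the trait

lod_linked = lod_score(linked_marker, trait_values)
lod_unlinked = lod_score(unlinked_marker, trait_values)
```

In a real cross the same calculation would be repeated at every marker (or at positions between markers, for interval mapping), and the significance threshold would have to account for the genome-wide number of tests.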
In cases in which the effect of a QTL is very modest or the coefficient of variation of the trait is very large (as in the size of atherosclerotic lesions), progeny testing or the construction of subcongenic lines is required for fine mapping (18). The goal of fine mapping is to reduce the size of the critical region to 1 or 2 Mb so that a relatively small number of candidate genes remain.
The construction of congenic strains is expensive and time-consuming, even when using a "rapid congenic" approach. Several whole-genome congenic libraries have now been constructed, allowing this step to be bypassed if the QTL alleles differ between the appropriate strains (19–21). Recently, Singer et al. (21) surveyed one set of "chromosome substitution strains" (congenic strains in which entire chromosomes are substituted) between strains A and C57BL/6 for several traits relevant to atherosclerosis, including plasma levels of cholesterol, campesterol, and sitosterol, weight gain in response to two different diets, and plasma levels of various metabolites. Altogether, in a survey of 53 traits, they identified 150 different loci. These included loci for cholesterol on 8 different chromosomes, loci for sitosterol on 14 chromosomes, and loci for weight gain on 17 chromosomes. The authors suggest that direct surveys of such congenic strains provide a more sensitive way of locating QTLs compared with genetic crosses, because the latter exhibit "phenotypic noise" resulting from the simultaneous segregation of multiple QTLs (21).
It has been suggested that in silico SNP haplotype analysis (analysis of haplotypes that are available in databases) across inbred strains of mice might be a useful strategy for mapping complex traits (22). Although the approach is probably of limited utility for highly complex traits (23, 24), it can be very useful in conjunction with analysis of QTLs in multiple crosses to identify which strains are likely to share a common allele (25). Extensive SNP databases for a number of strains are now available and are rapidly expanding.
The identity of the gene underlying a QTL is normally confirmed by examining the effects of a knockout or a transgene on the phenotype. For this, one would normally first search the literature for previously engineered mice, including gene-trap libraries. If none can be identified, it may be possible to examine aspects of the phenotype in cultured cells. We have also used bacterial artificial chromosomes harboring candidate genes for the construction of transgenic mice, reasoning that for most quantitative traits, a 1- or 2-fold perturbation in the level of expression of a gene will influence the final phenotype (although this will not always be the case). The strongest evidence, of course, would be to replace one allele for another using a "knock-in" strategy, although this should not be required as "proof" of the identity of the underlying gene.
Although QTL mapping has great power to detect linkage, the identification of genes underlying the QTL has proven to be very difficult. For example, more than 20 different loci for atherosclerotic lesions have been identified in mice, but of these, only 2 genes, both positional candidates, have been confirmed using transgenic approaches (16). The recent completion of the sequencing of the mouse and rat genomes will considerably aid in the harvest of genes, but the identification of novel genes will still be limited by recombination intervals.
Williams, Haines, and Moore (6) recently proposed the construction of a very large (1,000) set of recombinant inbred (RI) strains to provide a tool for rapid fine mapping of QTLs. RI strains are produced by crossing two or more inbred strains and then inbreeding the progeny to genetically fix particular combinations of alleles from the parental strains. The RI strains would be derived from eight highly diverse inbred strains to incorporate a great deal of naturally occurring variation and would be genotyped at a very high density, allowing resolution of 100,000 unique recombination breakpoints with an average spacing of 25 kb (26). Envisioned as a "collaborative cross" that would be used and maintained by multiple scientists and institutions, the RI set would be used for QTL analysis in three stages. First, a subset of RI strains would be studied to roughly map the QTL. Second, 100–200 strains with breakpoints in the interval of interest would be examined for the phenotype. Third, all mice with relevant breakpoints (including other QTLs for the trait) would be studied.
| GENOMICS, TRANSCRIPTOMICS, PROTEOMICS, AND METABOLOMICS |
|---|
In the case of complex disorders, differences in gene expression may be subtle and thus difficult to detect, and human studies are likely to be complicated by genetic heterogeneity. For example, attempts to identify significant differences in the expression profiles of muscle from type II diabetics compared with normal volunteers have failed to reveal differences in individual genes. Mootha et al. (32) used an ingenious approach to the problem: rather than test for differences in the expression of individual genes, they tested for overall differences in expression patterns of various sets of genes in annotated pathways. In this study, they used 149 metabolic pathways and groups of functionally (or spatially) related genes and computed a score for each pathway/group based on the combined differential expression measure of the genes in each group. The score each pathway received was proportional to the number of genes enriched in the microarray profiling data. Pathways were then ranked based on the score they received, and the statistical significance of the score of top-ranking pathways was determined using a permutation test. The analysis revealed that groups of genes involved in oxidative phosphorylation and mitochondrial functions ranked highest, although the overall changes in gene expression in diabetics compared with controls were relatively modest (32). One of the genes downregulated in diabetic patients was PGC-1α, a primary regulator of metabolism. Overexpression of PGC-1α in a mouse skeletal muscle cell line resulted in increased expression of many of the oxidative phosphorylation genes in the identified pathways. Although this study did not result in the identification of the causal genes in type II diabetes, it did suggest that they act by perturbing oxidative phosphorylation. As discussed below, expression array analysis in combination with genetic or environmental perturbations provides a powerful approach not only for the identification of candidate genes underlying complex traits but also for the elucidation of causal interactions between genes and traits.
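The logic of scoring a gene set and judging it by permutation can be illustrated with a deliberately simplified sketch. It assumes that each gene already has a differential expression score (negative meaning lower expression in patients), uses the mean score of the set as the pathway statistic, and builds the null distribution from random gene sets of the same size. This is not the enrichment statistic or permutation scheme of Mootha et al. (32); all names and numbers below are hypothetical.

```python
import random

def set_score(gene_scores, gene_set):
    """Mean differential expression score of the genes in a set."""
    members = [gene_scores[g] for g in gene_set if g in gene_scores]
    return sum(members) / len(members)

def permutation_p_value(gene_scores, gene_set, n_perm=2000, seed=0):
    """One-sided P value for coordinated down-regulation of a gene set.

    The null distribution comes from random gene sets of the same size.
    """
    rng = random.Random(seed)
    genes = list(gene_scores)
    members = [g for g in gene_set if g in gene_scores]
    observed = set_score(gene_scores, members)
    hits = 0
    for _ in range(n_perm):
        random_set = rng.sample(genes, len(members))
        if set_score(gene_scores, random_set) <= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero
    return (hits + 1) / (n_perm + 1)

# Hypothetical toy data: 100 background genes with scores spread evenly
# around zero, plus 15 pathway genes that are each only modestly reduced.
background = {f"bg{i}": (i - 49.5) / 50 for i in range(100)}
pathway_genes = {f"ox{i}": -0.4 for i in range(15)}
gene_scores = {**background, **pathway_genes}
control_set = [f"bg{i}" for i in range(0, 100, 7)]

p_pathway = permutation_p_value(gene_scores, list(pathway_genes))
p_control = permutation_p_value(gene_scores, control_set)
```

No single pathway gene in the toy data is extreme (about 30 background genes have lower scores), yet the set as a whole stands out, which is the point of testing gene sets rather than individual genes.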
Gene expression, of course, will not capture many important interactions within a cell. Thus, the correlation between transcript levels and protein levels is poor for many proteins, and the activities of many proteins are further regulated by modifications such as phosphorylation or proteolysis. Moreover, structural variations such as missense mutations or alternative splicing are unlikely to be detected by standard expression arrays. Large-scale analysis of proteins has the potential to provide a more comprehensive understanding of complex biological processes, but methods for comprehensive screening for differences in protein levels or structures have not yet been developed. Two-dimensional polyacrylamide gel electrophoresis (2D PAGE) has very limited sensitivity (usually 1,000 proteins). Nevertheless, several studies have used 2D PAGE to identify numerous differences in protein levels that occur during atherogenesis. An extension of 2D PAGE is differential in-gel electrophoresis, in which two pools of proteins are labeled with different fluorescent dyes, allowing detection of quantitative differences between the pools (33, 34). Mass spectrometric methods have great sensitivity but are difficult to apply on a genome-wide level. Protein microarrays have been designed to capture various features of functional proteomics, including protein levels, protein-protein interactions, and activity. These arrays are essentially high-throughput versions of enzyme-linked immunosorbent assays, in which characterized peptides or antibodies are immobilized on the surface of a chip and subsequently probed with the sample of interest.
A number of different methods have been developed to characterize protein-protein interactions, including the yeast two-hybrid system, a genetic assay in which binding is detected upon induction of reporter genes (35, 36). Post-translational modifications, such as phosphorylation, can be characterized with several mass spectrometry-based techniques, including multi-dimensional protein identification technology, isotope-coded affinity tagging, and Fourier transform ion cyclotron resonance. In the section on Biological Networks below, we discuss the results of genome-wide yeast two-hybrid analysis that has provided a comprehensive network of protein-protein interactions in several organisms.
Like the other "omic" technologies, metabolomics seeks to identify all gene products (transcripts, proteins, or metabolites) present in biological samples and to elucidate the quantitative dynamics of these products. The principal tools for metabolomics are gas-liquid chromatography coupled with mass spectrometry. Most progress in the metabolomics field has involved plant biology, but there are now a number of reports relevant to atherosclerosis and diabetes. For example, Watkins et al. (37) carried out a comprehensive metabolic assessment of lipid metabolites to identify the specific effects of the PPARγ agonist rosiglitazone in a mouse model of type 2 diabetes. The authors demonstrated a large number of tissue-specific metabolic effects and proposed that metabolomics has excellent potential for the clinical assessment of responses to drug therapy (37). Metabolomics will be most powerful when coupled with other functional genomics approaches.
| COMBINING GENETICS AND GENE EXPRESSION |
|---|
An example of this approach was the analysis of HDL levels in a cross between two strains differing in the response of HDL to an atherogenic diet. C3H mice maintain high levels of HDL on a high-fat diet, whereas strain C57BL/6 mice show a reduction in response to the diet. To test for the potential involvement of bile acid metabolism in this trait, Machleder et al. (40) quantified mRNA levels of cholesterol-7α-hydroxylase (CYP7A) as well as HDL levels in the cross. They observed three loci that segregated for HDL levels, and at each locus they also observed QTLs for the mRNA levels of CYP7A. Because the structural gene for CYP7A was located outside of any of these regions, it was clear that it was regulated in trans by several unlinked genes. The observation that the CYP7A transcript levels segregated with HDL suggested that it was involved in the HDL trait.
More recently, microarrays have been used to assess genome-wide transcriptional activity in segregating populations, offering a powerful tool to dissect causal relationships between genes and traits. As discussed by Jansen and Nap (38, 39), the analysis of gene expression in segregating populations with multiple genetic perturbations can potentially reveal much information about gene-gene and gene-clinical trait interactions. This approach was first applied to yeast, in which genome-wide analyses of transcript levels in a cross between two divergent strains revealed a large number of loci of both the cis-acting and trans-acting variety (41). Subsequently, two studies were performed in mice involving crosses of strains differing in diabetes-related traits (42, 43).
Lan et al. (42) studied the levels of expression of a number of candidate genes for insulin resistance and lipid metabolism segregating in a mouse cross. Using principal components analysis, they were able to identify groups of transcripts whose levels were explained by principal components (Fig. 3). Such principal components likely correspond to trans-acting factors influencing a set of genes.
One benefit of a "genetics of gene expression" approach is that it provides candidate genes for QTL studies. Figure 2 illustrates the use of cis eQTL underlying a phenotypic QTL to prioritize candidate genes. In this example, a QTL for plasma HDL levels was identified in a cross between DBA/2 and C57BL/6 (44). This region encompasses more than 69 genes in 12 Mb, but of these genes only 3 exhibited significant cis-acting eQTLs (Fig. 2). A genetics of gene expression approach can also be used to subclassify animals in a cross based on their expression profiles, similar to the use of microarrays for the classification of cancers. For example, Schadt et al. (43) identified genes that best distinguished thin from fat mice and showed that these fell into groups relating to different QTLs for body fat. Probably the most important application of the genetics of gene expression will be to construct gene networks for biological traits and identify causal interactions.
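The prioritization step described above, keeping only those genes in a QTL interval that show a significant cis-acting eQTL, amounts to a simple filter. The sketch below assumes that each gene record carries its own map position and the position and LOD score of its strongest eQTL. The record layout, the 5 Mb cis window, the LOD threshold of 4.3, and all gene names and coordinates are illustrative assumptions, not values taken from ref. 44.

```python
from typing import NamedTuple

class GeneRecord(NamedTuple):
    name: str
    chrom: str
    position_mb: float    # location of the gene itself
    eqtl_chrom: str       # chromosome of the strongest eQTL for its transcript
    eqtl_peak_mb: float   # position of that eQTL peak
    eqtl_lod: float       # LOD score of that eQTL

def cis_eqtl_candidates(genes, qtl_chrom, qtl_start_mb, qtl_end_mb,
                        lod_threshold=4.3, cis_window_mb=5.0):
    """Return names of genes in the QTL interval with a significant cis-eQTL."""
    candidates = []
    for gene in genes:
        in_interval = (gene.chrom == qtl_chrom
                       and qtl_start_mb <= gene.position_mb <= qtl_end_mb)
        # "cis" is approximated as an eQTL peak close to the gene itself
        is_cis = (gene.eqtl_chrom == gene.chrom
                  and abs(gene.eqtl_peak_mb - gene.position_mb) <= cis_window_mb)
        if in_interval and is_cis and gene.eqtl_lod >= lod_threshold:
            candidates.append(gene.name)
    return candidates

# Hypothetical genes around a phenotypic QTL spanning 168-180 Mb on chromosome 1
toy_genes = [
    GeneRecord("gene_a", "1", 172.0, "1", 173.5, 8.2),   # cis and significant
    GeneRecord("gene_b", "1", 175.0, "7", 40.0, 9.1),    # regulated in trans
    GeneRecord("gene_c", "1", 178.0, "1", 177.0, 2.0),   # cis but too weak
    GeneRecord("gene_d", "1", 150.0, "1", 150.5, 12.0),  # outside the interval
]

prioritized = cis_eqtl_candidates(toy_genes, "1", 168.0, 180.0)
```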
| STATISTICAL ANALYSIS OF DATA FOR COMPLEX TRAITS |
|---|
The use of large data sets, involving thousands of genes and multiple traits, raises statistical issues such as false discovery rates and difficulties in integrating multidimensional information (42, 45). Dimension reduction techniques can simplify such data sets and avoid the issue of multiple comparisons (46, 47). One such technique is principal component analysis (42, 48). Principal component analysis captures orthogonal linear combinations of correlated variables such as gene expression values, and each combination is called a principal component (PC). PCs are ranked based on their significance in explaining the variance in a data set. Two- or three-dimensional plots can be constructed with the first two or three PCs that capture most of the information in the data. The resulting visual display may elucidate how the variables are grouped into clusters and how important each variable is in each PC. Figure 3 shows an example of principal component analysis in a study conducted by the Attie group (42). In this study, the expression levels of seven genes involved in metabolic pathways were analyzed against several phenotypes, including glucose level, insulin level, and body weight, in an F2-ob/ob cross between C57BL/6J and BTBR. Two PCs were identified, with the first PC encompassing mRNA levels of SCD1, FAS, GPAT, and PEPCK and the second encompassing mRNA levels of PPAR, SREBP, and ACO. The first principal component, mostly driven by the expression levels of SCD1 and FAS, was found to be strongly associated with the insulin trait. In this case, by performing QTL mapping for the two PCs instead of seven individual genes, the dimensions of the analysis were significantly reduced.
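To make the idea of a principal component concrete, the sketch below extracts the first PC from a correlation matrix by power iteration, using only the Python standard library. It is a toy illustration under stated assumptions (three hypothetical transcripts measured in eight animals, two of them strongly correlated), not a reanalysis of the data in ref. 42, and power iteration is simply one convenient way to obtain the leading eigenvector.

```python
import math

def standardize(columns):
    """Center each variable and scale it to unit sample variance."""
    result = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        result.append([(v - mean) / sd for v in col])
    return result

def correlation_matrix(columns):
    z = standardize(columns)
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z]
            for zi in z]

def first_principal_component(matrix, n_iter=500):
    """Leading eigenvector (loadings) and eigenvalue by power iteration.

    Assumes the start vector is not orthogonal to the leading eigenvector.
    """
    p = len(matrix)
    vec = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        new = [sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in new))
        vec = [v / norm for v in new]
    eigenvalue = sum(vec[i] * sum(matrix[i][j] * vec[j] for j in range(p))
                     for i in range(p))
    return vec, eigenvalue

# Hypothetical expression values for three transcripts in eight animals.
# transcript_a and transcript_b rise together; transcript_c is unrelated.
transcript_a = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
transcript_b = [0.5, 0.5, 2.5, 2.5, 4.5, 4.5, 6.5, 6.5]
transcript_c = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0]

corr = correlation_matrix([transcript_a, transcript_b, transcript_c])
loadings, eigenvalue = first_principal_component(corr)
explained = eigenvalue / len(corr)  # fraction of total standardized variance
```

Here the first PC loads equally on the two correlated transcripts and essentially not at all on the third, so a single composite trait could stand in for both transcripts in a subsequent QTL scan.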
In studying complex biological traits, various data-mining tools have gained popularity. These methods can allow efficient and flexible integration of a large number of genetic and environmental factors as well as their interactions into the overall picture (49, 50). The essential technique of data mining used in such applications is pattern recognition, that is, extraction of hidden covariates of predictive value for a complex trait from a given data set. In addition, complex nonlinear interactions between the covariates may be detected. Figure 4 describes two data-mining approaches, neural network analysis and tree-based recursive partitioning, that have proved useful in linkage and population association studies with various traits relevant to atherosclerosis (49–56). In addition to neural networks and tree-based methods, other data-mining tools, such as discriminant analysis, Bayesian variable selection, combinatorial partitioning, stepwise regression, and automated detection of informative combined effects, have shown promise in dissecting the genetics of complex traits such as myocardial infarction, hypertension, and cholesterol levels (57–59).
Under Bayes's rule, prior knowledge about a parameter ($\theta$) and sample data ($x$) are used to calculate the posterior (or conditional) distribution of $\theta$ given data $x$ using the equation $P(\theta \mid x) = [P(x \mid \theta)\,P(\theta)]/P(x)$, where $P(\theta \mid x)$ is the probability of $\theta$ given $x$, $P(x \mid \theta)$ is the probability of $x$ given $\theta$, $P(\theta)$ is the probability of $\theta$, and $P(x)$ is the probability of $x$. Graphical models that use Bayes's rule of inference, termed Bayesian networks, have been used increasingly to model complex biological processes such as metabolic and transcriptional regulatory pathways (61–64). An excellent introduction to Bayesian networks can be found at http://www.ai.mit.edu/~murphyk/Bayes/bnintro.html. Bayesian networks combine probability and directed graphs to visually depict conditional dependencies between large numbers of variables. Figure 5 illustrates one possible Bayesian network for atherosclerosis. The network incorporates measurements of diet, genotype, obesity, diabetes, cholesterol, and atherosclerosis. Analysis of such a network (65–67) could be used to classify genes into functional categories. If the Bayesian network correctly captures the causal dependencies between variables, then given enough data, genes that mediate cholesterol's effect on atherosclerosis (gene A in Fig. 5) could be distinguished from those that act on atherosclerosis via obesity (gene B in Fig. 5), and both could be distinguished from those genes that act directly on atherosclerosis risk (gene C in Fig. 5). More complex models are possible and are the norm. Bayesian networks allow the integration of data from multiple studies, enable ready incorporation of medical and biochemical background knowledge, and can assess the consistency of observational and experimental data with different functional roles for genes.
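A small numerical example may help fix the meaning of Bayes's rule as stated above. The sketch assumes a parameter with two possible states (an animal does or does not carry a risk allele) and one observation (a large lesion); the prior and likelihood values are invented purely for illustration.

```python
def posterior(prior, likelihood):
    """Apply Bayes's rule over a discrete set of parameter values.

    prior:      dict mapping each parameter value to its prior probability
    likelihood: dict mapping each parameter value to P(data | value)
    """
    # P(x): total probability of the data, summed over all parameter values
    evidence = sum(prior[value] * likelihood[value] for value in prior)
    return {value: prior[value] * likelihood[value] / evidence
            for value in prior}

# Hypothetical numbers: 20% of animals carry the risk allele; a large lesion
# is seen in 60% of carriers and in 10% of noncarriers.
prior = {"risk_allele": 0.2, "no_risk_allele": 0.8}
likelihood_large_lesion = {"risk_allele": 0.6, "no_risk_allele": 0.1}

post = posterior(prior, likelihood_large_lesion)
```

With these numbers the probability of the data is 0.2 × 0.6 + 0.8 × 0.1 = 0.20, so observing a large lesion raises the probability of carrying the risk allele from 0.2 to 0.12/0.20 = 0.6. A Bayesian network chains many such conditional probabilities along the edges of a directed graph.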
| BIOLOGICAL NETWORKS |
|---|
An important characteristic of networks compared with linear pathways is increased flexibility to respond to diverse conditions. For example, in a network the same output can be produced in multiple ways. This "buffering" capacity explains in part the common finding of knockouts with little or no apparent effects. Although "redundancy" is frequently invoked to explain the absence of phenotypes in knockouts, different genes cannot be completely redundant because natural selection would not maintain two genes for exactly the same function. The plasticity of a response is also greatly increased by multicellularity (as is the case with atherosclerosis). Thus, interactions between cells that are themselves nonidentical result in exponential increases in the possible combinations (68). Although such networks have increased buffering capacity and plasticity, their extensive interactions make them sensitive to many different perturbations. Thus, in the case of cardiovascular disease, large numbers of genetic and environmental factors are seen to influence susceptibility. This is strikingly observed in mouse models of atherosclerosis, in which more than 100 different knockouts have been observed to influence the development of lesions (2).
A recent review by Barabasi and Oltvai (69) highlights the emerging properties of biological networks. Networks can be constructed using various "nodes," including proteins, metabolites, or genes. Although networks have been studied in most detail in yeast and bacteria, the networks of all organisms appear to share similar global properties. Typically, most nodes in a network have few links, although some nodes have numerous links. Such networks are termed "scale-free." These contrast with "random networks," in which all nodes have similar connectivities (Fig. 6). In scale-free networks, nodes with numerous links, also referred to as hubs, play a central role in shaping the network's behavior. Scale-free networks are characterized by a high degree of robustness. That is, if a change occurs in nodes of the network with few connections, there would likely be strong resistance against propagation of the change throughout the network. Biologically, this means that mutations or environmental factors affecting a gene or a pathway will not result in drastic changes in the overall structure of the network. For example, knockout of a gene that happens to be a node with few connections to other genes will generally have a much smaller effect than knockout of a hub gene. Consistent with this notion, Jeong and colleagues (70) reported that, in yeast, knockouts of genes with many connections were much more likely to be lethal than knockouts of genes with few connections.
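The contrast between scale-free and random networks can be reproduced with a short simulation. The sketch below grows one network by preferential attachment (each new node links to existing nodes with probability proportional to their current number of links, a standard way to generate a scale-free topology) and builds a second network with the same numbers of nodes and links placed at random. The network sizes, seed, and function names are arbitrary choices made for illustration.

```python
import random
from collections import Counter

def preferential_attachment(n_nodes, m, rng):
    """Grow a network in which new nodes prefer well-connected nodes."""
    # Start from a small fully connected core of m + 1 nodes
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each node appears in this list once per link, so a uniform draw
    # from it selects nodes in proportion to their degree
    endpoints = [node for edge in edges for node in edge]
    for new in range(m + 1, n_nodes):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(endpoints))
        for target in chosen:
            edges.append((new, target))
            endpoints.extend((new, target))
    return edges

def random_network(n_nodes, n_edges, rng):
    """Place the same number of links between randomly chosen node pairs."""
    edges = set()
    while len(edges) < n_edges:
        a, b = rng.sample(range(n_nodes), 2)
        edges.add((min(a, b), max(a, b)))
    return list(edges)

def degrees(edges):
    counts = Counter()
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return counts

rng = random.Random(1)
scale_free_edges = preferential_attachment(1000, 2, rng)
random_edges = random_network(1000, len(scale_free_edges), rng)

hub_scale_free = max(degrees(scale_free_edges).values())
hub_random = max(degrees(random_edges).values())
```

Both networks have the same average connectivity, but the largest hub in the preferential-attachment network has many times more links than any node in the random network, while most of its other nodes have only a few.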
One striking example is the Drosophila protein interaction network, assembled based on genome-wide yeast two-hybrid analysis and other data (75). In this study, the authors were able to examine local interactions and identify previously unrecognized motifs, assign pathway membership to uncharacterized proteins, assign subcellular locations to proteins, derive new links in signal transduction cascades, elucidate intercompartmental and intracompartmental interactions, and predict a mechanism of action for the ortholog of a human gene associated with B-cell lymphoma. An interesting observation was that after organizing the protein interactions according to cellular compartments (nuclear, cytoplasmic, membrane), the authors were able to demonstrate that interactions within compartments were much more frequent than those between compartments (75).
| CARDIOVASCULAR NETWORKS |
|---|
In a similar study, Stoll et al. (15) constructed a map of correlated cardiovascular traits by combining physiological profiles (correlation matrices between "likely determinant phenotypes" of cardiovascular traits) and genetic linkage analysis to unravel potential functional interactions between these traits that were not apparent using linkage analysis alone. Similarly, naturally occurring variation affecting the expression of genes in segregating populations has the potential to establish causal relationships among genes and could be used to construct gene-gene and gene-phenotype interaction networks.
Another network, illustrated in Fig. 7, represents a "pathway interaction" network. This network was constructed by identifying annotated pathways that contain three or more genes previously implicated in atherosclerosis. Briefly, a list of 92 genes (see supplementary table) associated with atherosclerosis was selected (2). The genes in this list either have been shown to affect atherosclerosis through studies in genetically altered animals (transgenic or gene-targeted mice) or have shown evidence of association with atherosclerosis-related traits in multiple population studies. Publicly available annotated biological and metabolic pathways at KEGG (http://www.genome.ad.jp/kegg) and Biocarta (http://www.biocarta.com) were then searched for the presence of these atherosclerosis genes. Each node in Fig. 7 represents a pathway that contains a minimum of three atherosclerosis genes. Of the original 92 genes, 39 occur in pathways that contain a minimum of 3 atherosclerosis genes. The links between the nodes represent the co-occurrence of a gene (or genes) in two biological pathways. As shown in Fig. 7, there are 16 pathways containing 353 unique genes, several of which overlap in various pathways. This analysis reveals numerous genes for which no known function has previously been associated with atherosclerosis. Thus, these genes should be considered as potential candidates, particularly if they reside at loci identified using linkage analysis.
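The construction rule for this pathway interaction network is simple enough to state as code. In the sketch below, a pathway becomes a node if it contains at least three genes from the disease list, and two nodes are linked if they share at least one gene. We assume that any shared gene creates a link, whether or not it is on the disease list; the pathway names and gene identifiers are placeholders rather than actual KEGG or Biocarta content.

```python
from itertools import combinations

def pathway_network(pathways, disease_genes, min_hits=3):
    """Build nodes (pathways) and links (shared genes) as described above.

    pathways:      dict mapping pathway name to the set of its genes
    disease_genes: set of genes previously implicated in the disease
    """
    nodes = {name: genes for name, genes in pathways.items()
             if len(genes & disease_genes) >= min_hits}
    links = {}
    for a, b in combinations(sorted(nodes), 2):
        shared = nodes[a] & nodes[b]
        if shared:
            links[(a, b)] = shared
    return nodes, links

# Placeholder data: g1-g8 stand for disease genes, x1-x3 for other genes
disease_genes = {"g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"}
toy_pathways = {
    "pathway_A": {"g1", "g2", "g3", "g4", "x1"},
    "pathway_B": {"g1", "g5", "g6", "g7", "x2"},
    "pathway_C": {"g7", "g8", "x3"},          # only two disease genes
    "pathway_D": {"g5", "g6", "g8", "x2"},
}

nodes, links = pathway_network(toy_pathways, disease_genes)
# Genes in the retained pathways that are not yet on the disease list
new_candidates = set().union(*nodes.values()) - disease_genes
```

The last line corresponds to the final point of the paragraph: genes that sit in the retained pathways but are not yet on the disease list (here x1 and x2) become candidates for follow-up.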
| PROSPECTS |
|---|
The elucidation of networks for atherosclerosis will certainly require genome-wide approaches such as microarray analyses. Because atherosclerosis involves many systemic influences and multiple cell types, cell-based studies will be able to reveal only a subset of the important interactions. Thus, animal models, most likely the mouse, will be central to such studies. The most promising approach at present appears to be the combination of genetics and expression array analyses. The multiple perturbations in genetic crosses should allow the modeling of networks, but the validation of such models will probably require defined perturbations such as knockouts or RNAi-based approaches.
| ACKNOWLEDGMENTS |
|---|
Manuscript received July 16, 2004 and in revised form July 27, 2004.
| REFERENCES |
|---|