Journal of Lipid Research, Vol. 48, 1-8, January 2007
Copyright © 2007 by American Society for Biochemistry and Molecular Biology
Thematic Review
* Department of Pathology and Laboratory Medicine, University of California Los Angeles, Los Angeles, CA
Department of Physiology, University of California Los Angeles, Los Angeles, CA
Department of Medicine, University of California Los Angeles, Los Angeles, CA
Published, JLR Papers in Press, October 25, 2006.
1 To whom correspondence should be addressed. e-mail: tdrake{at}mednet.ucla.edu
| ABSTRACT |
|---|
Supplementary key words: biomarker; mass spectrometry; network
| INTRODUCTION |
|---|
As a category of "omics" molecules, proteins play a central role in a systems view of biologic processes. Proteins make up a major portion of cell structural molecules and are the effector molecules for all cell functions, either directly or indirectly through their actions on metabolites and other nonprotein substrates. The complexity of the proteome far exceeds that of the genome and transcriptome as a result of a range of posttranslational modifications and interactions with other molecules. Subcellular compartmentalization and a potential multiplicity of interaction partners expand this further.
The complexity of the proteome and our current inability to globally interrogate it to the extent we can the transcriptome significantly limit the scope of current proteomics studies. In contrast to the transcriptome, in which the chemistry of nucleic acids allows a common technological approach that is capable of approaching global assessment, the technologies used for proteomic investigations are numerous and varied, often focused on a specific protein subset. Any one "high-throughput" method typically detects on the order of a few thousand proteins at most, and generally far fewer. Because these technological issues frame the questions that can be asked and the data collected, we will provide a summary of methodologies at the start (Fig. 1).
| TOOLS USED IN PROTEOMICS |
|---|
10,000, proteins are typically hydrolyzed to peptides before analysis. Liquid chromatography combined with tandem mass spectrometry (LC-MS/MS) can capture ions at a particular m/z and perform sequential MS analysis of the fragmentation products of these ions, which allows peptide identification. Other protein separation methodologies typically use mass spectrometry as a final step for identification. There are many variations in instrumentation and sample preparation and in their use in various applications. A number of recent reviews discuss these, and we will describe some in the context of their use below (1–5).
Two-dimensional gel electrophoresis has been used for many years for the separation of intact proteins. Samples typically undergo sequential electrophoresis, first using isoelectric focusing to separate by charge, followed by polyacrylamide gel electrophoresis to separate by size. Separated proteins are visualized by imaging after staining with fluorescent dye. Prelabeling of samples with dye allows the use of several different dyes for different samples and simultaneous analyses of several samples on one gel (6). This allows for internal controls and more accurate quantitation. The proteins composing the resulting "spots" can be identified by physically removing a plug of the gel containing the protein, hydrolyzing it with trypsin, and then performing tandem MS as described above. For a given protein, multiple spots are often visualized, reflecting charge differences attributable to posttranslational modifications. Resolution and sensitivity are improved for complex samples by prior fractionation or, when analyzing plasma, by removal of high-abundance proteins such as albumin and immunoglobulins.
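The hydrolysis of proteins to peptides before MS can be illustrated with a minimal in-silico sketch. It assumes the commonly used simplified trypsin rule (cleavage after lysine or arginine unless the next residue is proline) and standard monoisotopic residue masses; the function names and the example sequence are hypothetical and are not taken from the studies reviewed here.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O added to the residue sum for a free peptide
PROTON = 1.00728   # mass of one proton


def tryptic_digest(sequence):
    """Split a protein sequence after K or R, except when followed by P.

    This is a simplified rule (assumption); missed cleavages are ignored.
    """
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if is_last or (aa in "KR" and sequence[i + 1] != "P"):
            peptides.append(sequence[start:i + 1])
            start = i + 1
    return peptides


def peptide_mz(peptide, charge=1):
    """m/z of a peptide carrying `charge` protons: (M + z * proton) / z."""
    neutral_mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (neutral_mass + charge * PROTON) / charge


# Hypothetical sequence: the R-P bond is not cleaved under the simplified rule.
example_peptides = tryptic_digest("AKRPGK")
```

Each resulting peptide has a predictable m/z for a given charge state, which is what makes database matching of observed precursor and fragment ions possible.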
Protein microarray refers to a general platform for capturing proteins by immobilizing potential binding partners on a solid surface, followed by detection and identification of the bound protein (7). Binding partners are typically predefined and could be DNA that includes potential transcription factor binding sequences, an array of antibodies for a designated set of targets, or an array of proteins, etc. Miniaturization of platforms for efficient high-throughput use is an active area of development. Similarly, microfluidic in-solution assay systems are being developed that allow for multianalyte high-throughput use.
Yeast two-hybrid systems assess possible paired protein-protein interactions on a large scale, sometimes referred to as the "interactome" (8). These use specialized yeast libraries carrying cloned protein coding open reading frames (ORFs) that are expressed as fusion proteins with transcription factor elements whose activity is necessary and in principle occurs only when there is physical interaction between the two expressed ORFs. The identity of the ORF is determined by sequencing of PCR amplification products of the ORFs of positive sets. These are prone to significant rates of false-positive and false-negative results in any one experiment. Methodological variations on the fusion protein elements and the mechanism of selection of positively interacting clones have been developed.
Antibody development is an important enabling tool set being developed that can be applied in a number of different settings. The ultimate goal is to generate antibody reagents available for all mammalian proteins. Their applications in conjunction with protein microarrays, imaging technologies, and other biochemical tools will enable the illustration of the mammalian protein atlas in individual tissues/organs. In addition, they may be used in conjunction with tissue microarrays to define patterns of cellular expression for signaling networks and pathways, with two-dimensional gel electrophoresis for protein identifications, and as a prefractionation method (e.g., immunoaffinity-based purifications) for subsequent MS analyses.
Computational tools for proteomic sciences contain at least two major elements: one is bioinformatic analyses of cellular proteins, and the second is the construction of protein databases available to the public domain (9–12). The development of both of these tools in proteomics has benefited from achievements made in genomic research. For example, for most uncharacterized proteins, putative amino acid sequences may be predicted on the basis of their DNA data. Leader peptide sequences, if present, may suggest the compartment in which the protein likely resides (such as mitochondria) or that it is secreted, and various domain motifs can indicate transmembrane proteins or DNA binding functions (13). Comparative analyses across species for conserved proteins may also be helpful. There is an increased consensus in the proteomic community for protein data standardization; this effort has gained support and input from investigators, proteomic instrumentation vendors, journals, and funding agencies. Protein databases designated for protein data archiving and analyses are made available to the public and can be readily used for comparison analyses among different species, organs, organelles, cellular networks, and pathways.
| QUESTIONS PROTEOMICS CAN ANSWER |
|---|
Addressing the exploratory question "What proteins are present in a biologic compartment?" has been and continues to be a major focus of proteomic study. The starting point of any systems-based approach is the definition of the "parts list," and from a global perspective, this is far from complete for any tissue or disease. Tissue or cell samples may be analyzed in toto or undergo varying levels of cellular or subcellular fractionation before proteomic analysis (14). In the latter situation, investigators are often interested in a specific compartment or organelle, such as membrane proteins or mitochondria. When specific fractions are studied, it is important to recognize that "pure" fractions are impossible to obtain and that additional steps are necessary to validate localization, such as immunolocalization. Because this can usually be done on only a subset of proteins, there are always some false-positives in any given study.
Data from comprehensive analyses of all subcellular fractions are limited, but one such study published recently is informative as an example of protein profiling of multiple fractions and organelles (15). It demonstrates the extent of analysis involved for just one tissue and condition and is also representative of a common MS methodology used, LC-MS/MS. In preparatory steps, mouse liver homogenates underwent separation and gradient centrifugation, yielding a total of 32 fractions. Each fraction was digested to completion with trypsin before liquid chromatographic fractionation. Each LC fraction underwent MS using a linear ion trap/Fourier transform hybrid mass spectrometer (an instrument with very high resolution), and the five most abundant peaks from each spectrum underwent fragmentation and MS for identification. More than 22,000 peptides were identified, representing just under 2,200 unique proteins. Approximately 1,400 of these could be localized to one or more of 10 compartments using an analytical approach termed protein correlation profiles, which uses marker proteins to help assign compartment membership. These ranged from 50 in proteasomes to >300 in recycling endosomes. The extent of effort involved (mostly analysis time) was such that most data were obtained from one liver, emphasizing that detailed proteomic analyses remain extremely time- and effort-intensive relative to gene expression analyses and sample a smaller fraction of the total proteins expected to be present. Nevertheless, data from a number of studies using various methodologies have accumulated concerning the protein complement of various tissues and organelles relevant to cardiovascular disease, including studies of cultured endothelial and smooth muscle cells, as reviewed recently (16, 17). The plasma lipoproteins can also be considered a compartment of sorts, and several recent studies have analyzed these using the methods described above (18–21).
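The idea behind assigning compartment membership from marker proteins can be sketched as follows. This is only an illustration: it uses Pearson correlation between a protein's abundance profile across gradient fractions and each marker profile, which is an assumed stand-in and not necessarily the scoring used in the cited study. The profiles, the `min_r` threshold, and the function names are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length abundance profiles."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0


def assign_compartment(profile, marker_profiles, min_r=0.9):
    """Return the best-matching compartment (or None) and all scores.

    `min_r` is an arbitrary illustrative cutoff, not a published value.
    """
    scores = {name: pearson(profile, m) for name, m in marker_profiles.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] >= min_r else None), scores


# Hypothetical normalized abundances across six gradient fractions.
markers = {
    "mitochondria": [0.0, 0.1, 0.6, 1.0, 0.4, 0.1],
    "endosome": [0.2, 1.0, 0.5, 0.1, 0.0, 0.0],
}
unknown = [0.0, 0.2, 0.7, 0.9, 0.5, 0.1]
compartment, scores = assign_compartment(unknown, markers)
```

Because fractions are never pure, a protein whose best score falls below the cutoff is left unassigned rather than forced into a compartment, which mirrors the need for independent validation noted above.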
A caveat for this and subsequent discussion is that tissue samples are always complex with regard to the cell types present. Technical considerations preclude analysis of single cells, except by immunostaining methods. One cell type may compose the bulk of tissue, as in myocardium or liver, but in pathologic settings significant numbers of inflammatory cells may be present or there may be a change in the normal proportion of parenchymal to stromal cells (e.g., cardiomyocytes to fibroblasts). The analysis of atherosclerotic lesion material is perhaps an extreme example of this (22). The use of isolated or cultured cells addresses the complex cell type issue but raises other concerns, such as relevance to the in vivo setting. An intriguing approach that may find application in complex tissues such as the vessel wall is the use of MS to obtain spectra directly from intact tissue sections, termed "tissue profiling" (23).
A second important question is "What proteins change in concentration in response to physiological or pathological perturbations?" The ability to make sensitive quantitative measurements is fundamental to the essential nature of systems biology investigation, which is to assess system elements over a range of perturbations. Unfortunately, most standard undirected proteomic methods are inherently semiquantitative at best. Although abundance differences look obvious in a given gel or MS spectrum, obtaining accurate measures over multiple samples analyzed at different times is difficult because of numerous run-to-run sample preparation and technical variations. A sense of protein abundance can be gained simply from the frequency of detecting a given protein in a series of samples. High-abundance proteins will be identified many more times than low-abundance proteins. However, significant effort has been made to develop truly quantitative approaches, and these are available. Most of these assess relative abundance between test and control samples using differential labeling, but one recent method allows absolute quantitation.
When using two-dimensional gel methods, normalization for technical differences influencing spot location and intensity across multiple gels is the major problem for accurate quantitation. Reasonable quantitation has been obtained by careful attention to conditions and using normalization methods based on total spot intensities. However, better relative quantitation can be achieved using the difference gel electrophoresis (DIGE) technique (6). DIGE addresses this by creating a control composed of a pool of aliquots from all of the experimental samples that is run with every gel. Different fluorescent dyes are used to label the proteins in test and control samples, typically a Cy3 or Cy5 label for each individual test and control sample and Cy2 for the pooled sample. In a given gel, for each spot the results of the test samples are expressed as a ratio to the pool. Coefficients of variation of 10% or less can be obtained with this method. A related method described recently is the intact protein analysis system (IPAS), in which paired samples are labeled with Cy3 or Cy5 and then subjected to extensive fractionation based on charge (by isoelectric focusing), hydrophobicity (reverse-phase HPLC), and mass (SDS-PAGE) (24, 25).
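The pooled-standard normalization can be sketched numerically. The example below assumes spot intensities have already been extracted per dye channel; the spot identifier, the intensity values, and the function names are hypothetical and chosen only to show why the ratio to the pool reduces gel-to-gel variation.

```python
from statistics import mean, stdev


def ratios_to_pool(sample_intensity, pool_intensity):
    """Per-spot ratio of a labeled sample to the pooled standard on the same gel.

    Spots missing from the pool channel (or with zero intensity) are skipped.
    """
    return {
        spot: sample_intensity[spot] / pool_intensity[spot]
        for spot in sample_intensity
        if pool_intensity.get(spot)
    }


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical data: the same sample run on three gels with very different
# overall intensities. Each tuple is (sample channel, pooled-standard channel).
gels = [
    ({"spot_17": 1200.0}, {"spot_17": 1000.0}),
    ({"spot_17": 2300.0}, {"spot_17": 2000.0}),
    ({"spot_17": 650.0}, {"spot_17": 500.0}),
]
raw = [sample["spot_17"] for sample, _ in gels]
normalized = [ratios_to_pool(sample, pool)["spot_17"] for sample, pool in gels]

cv_raw = coefficient_of_variation(raw)
cv_normalized = coefficient_of_variation(normalized)
```

In this toy case the raw intensities vary widely between gels, whereas the ratios to the pool are closely grouped, since gel-level effects act on both channels of the same spot.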
Among MS approaches, direct quantitation of peak intensities can be used when carefully performed to minimize sample handling differences and including standards and controls for instrument calibration and normalization. However, more accurate quantitation is achievable using differential sample labeling methods. For MS analyses, differential labeling yields peaks with predictable differences in m/z ratio, allowing one to obtain a ratio of experimental to control intensity for any given peak. Such labels that change peptide mass can be applied exogenously, as in the isotope-coded affinity tag (ICAT) method, or endogenously by providing a heavy isotope in culture medium or diet, which becomes incorporated into proteins (26–28). Reagents for labeling are relatively expensive, and additional analytical steps are needed to match paired peaks for data analysis. Recently, a method was developed that allows absolute quantitation using synthesized labeled peptides (29, 30). These are added to test samples in known amounts at an early stage of sample separation and therefore are subjected to the same processing and technical variations. Because an absolutely defined amount has been added to a sample, the ratio of test to control peptide intensity can provide an absolute quantitation. Again, the reagents are expensive, and one needs to determine beforehand which proteins one wishes to measure and prepare reagents for each.
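Both ideas in this paragraph, matching paired peaks by their predictable m/z difference and converting an intensity ratio into an absolute amount, can be sketched briefly. The peak list, the tolerance, and the 50 fmol spike are hypothetical; the mass shift assumes a label containing six 13C atoms in place of 12C, used here purely as an illustration.

```python
C13_SHIFT = 1.0033548            # mass difference between 13C and 12C (Da)
LABEL_SHIFT_DA = 6 * C13_SHIFT   # assumed label: six 13C atoms per peptide


def find_heavy_partner(peaks, light_mz, mass_shift_da, charge, tol=0.02):
    """Find the labeled partner of a light peak.

    The expected partner lies at light_mz + mass_shift / charge. `peaks` is a
    list of (m/z, intensity) tuples; the most intense candidate within `tol`
    is returned, or None if there is no candidate.
    """
    expected = light_mz + mass_shift_da / charge
    candidates = [p for p in peaks if abs(p[0] - expected) <= tol]
    return max(candidates, key=lambda p: p[1]) if candidates else None


def absolute_amount(endogenous_intensity, standard_intensity, standard_amount):
    """Scale the known spiked amount by the endogenous/standard intensity ratio."""
    if standard_intensity <= 0:
        raise ValueError("labeled standard peak has no signal")
    return endogenous_intensity / standard_intensity * standard_amount


# Hypothetical doubly charged peptide with a labeled synthetic standard
# spiked in at 50 fmol.
peaks = [(500.30, 4.0e5), (503.31, 2.0e5), (510.00, 1.0e4)]
light = peaks[0]
heavy = find_heavy_partner(peaks, light[0], LABEL_SHIFT_DA, charge=2)
amount_fmol = absolute_amount(light[1], heavy[1], standard_amount=50.0)
```

The same pairing step applies to relative quantitation between test and control samples; only the final scaling by a known spiked amount makes the result absolute.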
Answering "Which proteins interact physically with one another?" is a central aim of much proteomics research. Although important relationships may exist between proteins in the absence of direct physical interactions, the converse is not likely to be true. Most important cellular processes occur with at least the transient association of multiple proteins, and in many instances associations are quite stable and required for function, such as the majority of the electron transport chain in mitochondria or the structural molecules composing organelles.
There are two general approaches for determining protein-protein interactions. A "local" approach is centered on an individual protein of interest as a "capture" protein and isolates the complex of proteins that are physically attached to that. The complex of proteins is then analyzed by MS, and the individual proteins present are identified. One can then extend this local set of interacting proteins by repeating the process using one of the newly identified proteins as the capture protein. Through repetition of this sequence, a network of interactions can be constructed. The strength of this approach is that in most configurations it examines protein interactions as they occur in the cell. However, it should be noted that although such constructs are studied in living cells, their level of expression may be outside the physiologic range and thus subject to various biases. A variety of methods and configurations have been developed for this approach (31–33). For example, some use expression of transfected constructs containing the primary protein as a fusion with elements that facilitate its capture, whereas others evaluate endogenous proteins by treating cells or tissue to cause cross-linking, followed by pull-down with antibodies. We present examples of this approach from our own work below.
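The iterative "local" strategy amounts to a breadth-first expansion from a starting capture protein. The sketch below assumes a hypothetical `pull_down` callable that returns the proteins identified by MS for a given capture protein; here it is backed by a made-up lookup table rather than experimental data. Note that a pull-down identifies members of a complex, so an edge in this sketch means co-isolation and not necessarily direct binding.

```python
from collections import deque


def expand_network(seed, pull_down, max_rounds=2):
    """Build an interaction network by repeated capture experiments.

    Starting from `seed`, each newly identified protein becomes the capture
    protein in the next round, up to `max_rounds` rounds. Returns a set of
    undirected edges stored as frozensets.
    """
    edges = set()
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        bait, depth = queue.popleft()
        if depth >= max_rounds:
            continue  # do not use this protein as a capture protein
        for partner in pull_down(bait):
            edges.add(frozenset((bait, partner)))
            if partner not in seen:
                seen.add(partner)
                queue.append((partner, depth + 1))
    return edges


# Hypothetical results table standing in for capture + MS experiments.
results = {
    "bait1": ["P2", "P3"],
    "P2": ["bait1", "P4"],
    "P3": [],
    "P4": ["P5"],
}
network = expand_network("bait1", lambda bait: results.get(bait, []), max_rounds=2)
```

Limiting the number of rounds reflects the practical cost of each capture experiment; proteins found in the last round are recorded as partners but are not themselves used as capture proteins.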
A "global" approach is represented by yeast two-hybrid analyses, in which all possible pairs of interacting proteins are examined. By the nature of the system, the protein interactions detected are not taking place in their natural setting within a cell, so a positive finding indicates the potential for interaction but not proof that the interaction occurs in a natural environment. Dynamic changes that might occur in varying physiological or disease states cannot be assessed. These approaches are also subject to relatively high false-negative and false-positive results in any one experiment, but the finding of consistent results over multiple experiments reduces these. The usefulness of this type of approach is that it examines a much larger set of proteins than is possible otherwise and suggests potential interactions among sets of proteins that can be validated by other means. A recent article reports such an analysis that is the largest to date of human proteins. It used a set that included 7,200 distinct human protein-encoding genes and identified 2,800 high-confidence interacting pairs, almost 80% of which in sampling could be validated by alternative techniques (34).
Assessing "What is the functional state of a protein?" is a major challenge. In the absence of a direct functional assay, detection of several types of common structural modifications to proteins can be used to infer functional differences. The best known is phosphorylation, which may confer enzymatic or binding activity to an otherwise unreactive protein. Modifications made by the specific addition of ubiquitin or related small molecules such as SUMO (small ubiquitin-related modifier) also confer altered protein properties, and are highly regulated in cells. Ubiquitination targets proteins for degradation while SUMOylation affects subcellular localization and stabilization of proteins. Glycosylation is a common posttranslational modification but does not have such specific functional implications. These posttranslational modifications occur at selective sites on proteins and confer an altered mass; they also establish a new epitope. The latter has been used to develop antibody reagents that can distinguish phosphorylated from nonphosphorylated states for a number of important proteins, especially those involved in signaling pathways. These have been configured into array formats for simultaneously examining many at one time. MS can be used in an undirected manner to identify phosphorylated proteins (phosphoproteins). For a recent example, see Wolf-Yadlin et al. (35). This is done in conjunction with preanalysis steps to enrich for these proteins. The most well established is immobilized metal affinity chromatography. Trypsinized homogenates of cells or tissue are applied, and the metal chelating surface selectively binds phosphorylated peptides from the digest, which then can be identified by MS. Ubiquitinated peptides from a cell digest can be enriched for using immunoaffinity chromatography with antibodies to ubiquitin, or in systems in which a recombinantly modified ubiquitin is expressed wherein a capture tag has been added. MS is then used to identify the peptides and verify the presence of the ubiquitin modification.
Another approach toward assessing the functional states of proteins relates to active versus inactive enzymes, in which the inactive form may exist as a zymogen or as an intact protein bound by an inhibitor. These may not be distinguished by the standard methods described above. An approach referred to as "activity-based protein profiling" uses active site-directed probes to capture the active enzymes, but not the inactive or inhibited forms, for identification by MS or other means (36–38).
"How can proteomic studies advance cardiovascular biology and inform clinical medicine?" is a focus of those interested in using this set of tools to advance the fundamental knowledge of biology and/or to identify potential "biomarkers" that may be indicative of a diseased phenotype.
Advancing our understanding of biology and disease by any of the above means benefits medicine. However, finding proteins that reflect health/disease state or risk and are present in noninvasively accessible samples from people is a special challenge. Biomarkers may be sought in a directed manner based on the kinds of studies described above in cells or tissue, followed by developing the means to assess their presence in blood, urine, saliva, or other body fluid, typically by immunoassay. The alternative approach is to directly analyze such samples by MS in an undirected search, and this is an active area of proteomics research.
Work in this area falls into two main categories, which are not mutually exclusive. The first encompasses studies to systematically define the proteomes of the different compartments, and the second encompasses studies searching for biomarkers for specific diseases. The plasma proteome has received the most attention, as it contains proteins originating from any part of the body that find their way into the circulation. The bulk of proteins are secreted, many from the liver, but nonsecreted proteins arising from cell "leakage" or death, matrix turnover, or pathologic processes are also present. They may be intact or proteolytic fragments of proteins and may circulate free or bound to carriers such as albumin or lipoprotein particles. These cover an extremely wide range of concentrations, estimated to be as much as 10¹⁰, which exceeds that of cells and tissues directly. The complexity and dynamic range make defining the plasma proteome a challenge (39). The Human Proteome Organization (HUPO) has supported other research efforts in coordinating a systematic analysis of the plasma proteome by a number of laboratories using a range of techniques, as recently described and reported (40–42). A composite list of reliable identifications has been compiled and is publicly available. This identifies 3,000 proteins. The coordinated effort also led to very useful data concerning the technical aspects of the proteomic analysis of plasma (43). We discuss aspects of this project specifically relevant to cardiovascular disease below.
In the second category, much of the work has been directed at cancer biomarker discovery (44–46). All of the techniques described above have been used. To define a proteome, a thorough analysis that identifies as many proteins as possible is desired, and relatively few individual samples need to be analyzed. This approach can also be taken for biomarker discovery, making the assumption that a limited number of subjects from disease and reference populations are adequately representative, because such thorough analyses are extremely time- and effort-consuming. An alternative approach that has been applied trades thoroughness for high throughput (47–49). Data are analyzed for correlation with disease state, and identification of the proteins these represent is deferred until such an association is found. However, substantial effort is needed for the identification of peaks of interest, quantitation may be problematic, peak resolution is poor with low-resolution instruments, and the sensitivity is low for low-abundance proteins. Because proteins with molecular weight > 20,000 are not well detected, gel-based methods (DIGE or IPAS, described above) can provide complementary data, but these are not high throughput. Zhang and colleagues (50) have described a high-throughput quantitative approach using LC-MS/MS that is relatively sensitive. The success of these screening approaches has been debated, and newer ways of approaching biomarker discovery are clearly needed (44).
In either situation, it is important that identified biomarkers be understood at the pathophysiological level. Likewise, proteomic-based discoveries at this level can serve as starting points for biomarker discovery. Recent investigations using proteomic tools have offered novel information of cellular organelles in cardiovascular tissues that would not have been made available in such a timely manner with conventional biochemical approaches (16, 17). These biological advancements are made in vascular endothelial cells and in cardiac cells. Detailed delineation of organelle proteins coupled with convincing target validation leads to a full characterization of vascular endothelium-based caveolae, cardiac proteasome complexes, cardiac mitochondria, and cardiac contractile apparatus (reviewed in 16). The illustration of a protein atlas in these organelles will provide the much needed fundamental data essential for future investigations aimed at exploring their functional roles in cardiovascular diseases.
A last point about clinical biomarker studies is that consistency of sample collection techniques, processing, and storage is critical for reliable results to be obtained. This can be challenging in the clinical setting, and the samples collected in the course of the usual clinical study typically are inadequate for proteomic analyses when the latter were not considered at the start.
| PROTEOMICS AND SYSTEMS BIOLOGY |
|---|
Protein interaction networks are scale-free
Systems biology attempts to characterize the structure and dynamics of interactions among elements of cells, tissue, and organisms. As presented in the introduction to this series and elaborated on by earlier reviews, biological systems are networks of interacting elements, and overall, these have a scale-free structure. Analyses based on "global" proteomic pair-wise interaction data obtained from various organisms have contributed significantly to the apparent universality of this observation (34, 51, 52). These networks are constructed with the proteins being the "nodes" and an observation of interaction being an "edge." Although not of apparent immediate relevance to cardiovascular disease per se, this general observation has important implications. The scale-free nature of the protein interaction network indicates that a limited number of proteins have a large number of interactions and function as "hubs." In general, hubs in biological networks are critical to overall network organization and function. As with gene expression-based networks, hub proteins are more frequently essential to organism survival than are nonhub proteins. A second characteristic of such networks is that they are composed of sets of "modules," groups of proteins that have significant numbers of within-group interactions compared with interactions outside the group. As physical interaction implies a functional consequence, such modules suggest that a particular group of proteins interact for a common function (or set of functions). Thus, protein interaction networks provide an indication of which proteins are more likely to be critical for overall functioning and indicate groups of proteins sharing common functions. Such information can guide more specific studies, as discussed below.
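The notions of nodes, edges, hubs, and modules described above can be made concrete with a small sketch. The edge list is entirely hypothetical, and the functions are illustrative helpers, not a published analysis pipeline: one counts interactions per protein, one ranks candidate hubs, and one compares within-group to outside-group interactions for a candidate module.

```python
from collections import Counter


def degree_table(edges):
    """Number of interactions (edges) per protein (node)."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree


def top_hubs(degree, top_n=1):
    """Proteins with the most interactions; `top_n` is an arbitrary cutoff."""
    return [protein for protein, _ in degree.most_common(top_n)]


def degree_distribution(degree):
    """How many proteins have each degree; in a scale-free network most
    proteins have low degree and a few have very high degree."""
    return Counter(degree.values())


def module_cohesion(edges, group):
    """Count interactions inside a candidate module versus those leaving it."""
    internal = sum(1 for a, b in edges if a in group and b in group)
    external = sum(1 for a, b in edges if (a in group) != (b in group))
    return internal, external


# Hypothetical pairwise interaction observations.
edges = [
    ("H", "A"), ("H", "B"), ("H", "C"), ("H", "D"), ("H", "E"),
    ("A", "B"), ("B", "C"), ("A", "C"), ("E", "F"),
]
degree = degree_table(edges)
hubs = top_hubs(degree)
internal, external = module_cohesion(edges, {"A", "B", "C"})
```

On real interaction data sets, the same degree counts would be examined over thousands of proteins, and module detection would typically rely on dedicated clustering methods instead of a hand-picked group.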
Proteomic studies are critical to elucidating functional modules
A systems-oriented paraphrasing of the saying "all politics is local" would be "all biology is local." Regardless of the high-level overall functioning of a cell, tissue, or organism, the nuts-and-bolts work happens at a local level among a set of closely interacting molecules. It is the summation and interactions of the effects of these smaller units that determine the direction of the entity as a whole, with the potential for developing the emergent properties of the larger system, as described by Weiss et al. (52a) in an earlier review in this series. One can envision the overall network as the highest level, or most abstract, view, with progressively smaller subsets of interacting molecules as modules, each with subnetworks that are dedicated to a specific functional process. From this perspective, the elucidation of these dedicated interacting subnetworks is key to defining the basic functional biochemical units of cells and tissues.
Our work in elucidating the signaling modules essential for protection of the heart against myocardial ischemic injury is illustrative in this regard (53, 54). In this line of investigation, a specific signaling kinase, protein kinase C (PKC), was first characterized to be a key signaling node in cardiac cell cytoprotection. Evidence from a number of laboratories documents that the activation of PKC is a shared common signaling event in cytoprotection of the heart in species such as mouse, rat, rabbit, and human. In subsequent analyses using a functional proteomic approach, we identified a pool of candidate proteins that may serve as key associating partners of PKC; these proteins form functional multiprotein complexes with PKC. Using a combined genetic and proteomic approach, we identified several signaling modules (e.g., PKC-Lck, PKC-Akt, and PKC-Bmx) as key signaling complexes leading to the manifestation of a cardioprotective phenotype.
| PROTEOMICS-BASED STRATEGIES FOR DISEASE BIOMARKER DETECTION ARE BASED ON A SYSTEMS BIOLOGY PERSPECTIVE |
|---|
A second way that perceptions have changed is in the appreciation of how very limited our current set of analytes is relative to the number possible (55). Work toward defining the plasma proteome has increased the number of known proteins by more than an order of magnitude, and it still has a long way to go. We have analyzed the data obtained from the cooperative HUPO Plasma Proteome Project for proteins of likely relevance to cardiovascular disease (56). Both cardiac tissue-born proteins (released into the plasma) and plasma-born proteins are included in this annotation. These proteins may be categorized into distinct functional groups, including molecular markers of inflammation and/or cardiovascular disease, vascular and coagulation, signaling, growth and differentiation, cytoskeletal, transcription factors, channels/receptors, heart failure, and remodeling. Importantly, our analyses of the peptide per protein ratio for LC-MS/MS identifications display group-specific trends, corroborating the functional classification of these plasma proteins. The incorporation of proteomic analyses into clinical trials, as described for the CardioGene Study (57), will greatly expand our knowledge in this area in the long run.
| CONCLUSIONS AND FUTURE DIRECTIONS |
|---|
Technological advances will undoubtedly be important drivers in the future, as they have been in the past. Most proteomics studies are labor- and time-intensive, require a high level of expertise, and are relatively expensive, all of which limit the widespread use of these approaches. With time, the technologies can be expected to become more accessible, as happened with gene expression microarray technologies. However, as these become more widely used, a key issue of proteomic investigation that requires attention is that these unbiased analyses often only provide a way to a narrowed search: they may not lead to a definitive identification of the true targets. It cannot be overemphasized that subsequent studies using biochemical and functional tools to verify the proteins characterized through proteomic studies are of critical importance and are an integral part of successful proteomic investigations.
| ACKNOWLEDGMENTS |
|---|
Manuscript received October 11, 2006
| REFERENCES |
|---|