Papers In Press, published online ahead of print January 1, 2008 J. Lipid Res., doi:10.1194/jlr.M700377-JLR200
Journal of Lipid Research, Vol. 49, 183-191, January 2008
The repertoire of desaturases and elongases reveals fatty acid variations in 56 eukaryotic genomes
| ABSTRACT |
|---|
Supplementary key words: genome; bioinformatics; lipid; histidine box; motif analysis; phylogenetic analysis; evolution
Abbreviations: HMM, hidden Markov model; KEGG, Kyoto Encyclopedia of Genes and Genomes; SCD, stearoyl-coenzyme A desaturase
| INTRODUCTION |
|---|
Fatty acid desaturases catalyze the introduction of a double bond into an acyl chain with strict regioselectivity and stereoselectivity. These enzymes can be classified into two phylogenetically unrelated groups: the membrane-bound fatty acid desaturases and the acyl-acyl carrier protein desaturases (10). The former are the dominant enzymes in desaturation and are ubiquitous in eukaryotes and bacteria; their sequences are characterized by three histidine box motifs containing eight histidine residues (11). The latter are plant enzymes that specifically catalyze the conversion of 18:0 to 18:1 (12). The enzymes responsible for the rate-limiting step of acyl chain extension can likewise be classified into two phylogenetically unrelated groups: the fatty acid elongases and the β-ketoacyl-CoA synthases (13). The former are the dominant enzymes in elongation and are widely distributed in eukaryotes; they are characterized by one histidine box. The latter are specific for saturated fatty acids or MUFAs, but not for PUFAs, and have been identified to date only in plants (14). Thus, fatty acid structures are generally determined by the combination of two types of enzymes: the membrane-bound fatty acid desaturases and the fatty acid elongases.
Bioinformatics approaches to lipid research have recently begun using large amounts of mass spectrometry and microarray data (15, 16). Phylogenetic analysis of protein families related to fatty acids has also been performed (10, 17). However, to our knowledge, there is no report that describes the investigation of fatty acid structures based on the comprehensive analysis of the gene contents in the genomes. In fact, the prediction of fatty acid structures from genomic information is difficult because of the functional diversity of the key enzymes. For example, the membrane desaturases constitute a highly diversified family including at least 10 different types of regioselectivities, such as Δ4, Δ5, Δ6, Δ8, Δ9, Δ10, Δ11, Δ12, Δ13, and Δ15, some of whose sequences are significantly close to one another (10). Several reports indicated that the function of experimentally characterized desaturases did not correspond to the annotation from similarity searches (18). It is thus difficult to connect enzymes with fatty acid structures from an analysis based merely on the sequence similarity of individual enzymes.
The resources and the strategy to solve such problems are provided by the Kyoto Encyclopedia of Genes and Genomes (KEGG) project, namely the integration of genomic information and chemical information (19). One successful example is that of functional glycomics, in which glycan structures are related to the repertoire of glycosyltransferases, which synthesize glycan chains in a stepwise manner with distinct substrate specificities (20, 21). Another example of the integrative analysis of genomic and chemical information is the prediction of polyketide and nonribosomal peptide structures, which are also complex natural compounds synthesized by distinct types of synthases (22). We assume that applying a similar strategy to fatty acids will allow us to understand how lipid structures differ among organisms and what those differences mean. Taking the analysis from genomic information through biological components to phenotypes is a challenge in fatty acids, which have important interactions with an organism's environment.
In this study, we first investigated the diversity of membrane fatty acid desaturases and elongases. Although these enzymes have been classified in previous work, the classifications were not comprehensive and not consistent with each other. Our phylogenetic analysis indicated that desaturases are divided into four functional subfamilies and elongases are divided into two functional subfamilies. Each subfamily has a distinct motif, whose profiles can be used for functional assignments of desaturases and elongases in newly sequenced genomes. In the next step, we examined the ability of a set of organisms to synthesize fatty acids, especially six types of fatty acids widely distributed in nature, from the pathway viewpoint. Our analysis suggests that differences in the repertoires of enzymes as well as functional divergence in each subfamily underlie the fatty acid diversity among organisms. Adaptation to individual environments and the ability to synthesize specific metabolites may provide an explanation for the diversity of enzyme functions and subsequent fatty acid structures.
| METHODS |
|---|
Searching for similar sequences with PSI-BLAST
As PSI-BLAST targets, we used amino acid sequences from 56 complete or draft-quality whole eukaryotic genomes, including 21 animals, 20 fungi, 3 plants, and 12 protists. These data were derived from KEGG GENES and DGENES Release 41.0 (http://www.genome.jp/kegg/genes.html). As the query sequences for PSI-BLAST (blastp2.2.10) search, we used experimentally known desaturase and elongase sequences from a wide range of organisms, including Homo sapiens from animals, Saccharomyces cerevisiae from fungi, Arabidopsis thaliana from plants, Trypanosoma brucei from protists, and Bacillus subtilis, Pseudomonas aeruginosa, and Mycobacterium tuberculosis from bacteria (indicated by circles in supplementary Data I). By combining all of the PSI-BLAST results with E values < 0.01 into one file and removing duplicate hits, the initial data set was obtained.
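The pooling step described above (keep hits with E values < 0.01, then remove duplicates) can be sketched as follows. This is a minimal illustration, not the authors' script: the function name, the `(subject_id, e_value)` hit representation, and the sequence identifiers are all assumptions made for the example.

```python
from typing import Dict, Iterable, Tuple

E_VALUE_CUTOFF = 0.01  # threshold stated in the text


def merge_hits(result_sets: Iterable[Iterable[Tuple[str, float]]],
               cutoff: float = E_VALUE_CUTOFF) -> Dict[str, float]:
    """Pool (subject_id, e_value) hits from several searches.

    Keeps hits with an E value strictly below the cutoff and collapses
    duplicates, remembering the lowest E value seen for each subject.
    """
    merged: Dict[str, float] = {}
    for hits in result_sets:
        for subject_id, e_value in hits:
            if e_value >= cutoff:
                continue  # not significant enough
            if subject_id not in merged or e_value < merged[subject_id]:
                merged[subject_id] = e_value
    return merged


# Hypothetical hit lists from two query sequences (identifiers are invented).
query_a = [("org1:g001", 1e-40), ("org2:g017", 0.5)]
query_b = [("org1:g001", 1e-12), ("org3:g230", 0.004)]
initial_set = merge_hits([query_a, query_b])
```

Here `org1:g001` is hit by both queries but appears once in `initial_set`, and `org2:g017` is dropped because its E value exceeds the cutoff.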
Discarding false-positive sequences
To remove the false-positive sequences from the initial data set, a hierarchical clustering analysis was performed. First, the sequence similarity for each pair of whole sequences was calculated with the SSEARCH 3.4t06 program (25), which is an implementation of the Smith-Waterman algorithm (26). Next, we defined the distance between the sequences as (distance) = 1,000/(Smith-Waterman score). Then, using this distance, hierarchical clustering was performed with the complete linkage method of the R program package for statistical computing version 1.7.1 (27) and with the BioRuby library version 1.0 (http://bioruby.org/). The resulting tree was cut into clusters at an appropriate threshold.
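As a rough illustration of this step, the sketch below applies the distance 1,000/(Smith-Waterman score) and complete-linkage clustering in plain Python. The study itself used R and BioRuby; the function names, the toy scores, and the cutoff of 5.0 are assumptions chosen only to make the example concrete.

```python
from itertools import combinations


def sw_distance(score):
    """Distance as defined in the text: 1,000 / Smith-Waterman score."""
    return 1000.0 / score


def complete_linkage_clusters(ids, scores, threshold):
    """Agglomerative clustering with complete linkage.

    ids:       list of sequence identifiers
    scores:    dict mapping frozenset({a, b}) -> Smith-Waterman score
    threshold: merging stops once the closest pair of clusters is farther
               apart than this distance (a hypothetical cutoff)
    """
    clusters = [frozenset([i]) for i in ids]

    def linkage(c1, c2):
        # Complete linkage: distance between the two most distant members.
        return max(sw_distance(scores[frozenset((a, b))])
                   for a in c1 for b in c2)

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: linkage(*pair))
        if linkage(c1, c2) > threshold:
            break
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return sorted(sorted(c) for c in clusters)


# Toy scores (invented): a, b, and c are mutually similar; d is an outlier.
ids = ["a", "b", "c", "d"]
scores = {
    frozenset(("a", "b")): 800, frozenset(("a", "c")): 500,
    frozenset(("b", "c")): 400, frozenset(("a", "d")): 50,
    frozenset(("b", "d")): 40, frozenset(("c", "d")): 60,
}
clusters = complete_linkage_clusters(ids, scores, threshold=5.0)
```

With these toy values, a, b, and c merge into one cluster (largest internal distance 2.5), while d stays alone because its distance to every other sequence exceeds the cutoff.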
We manually checked all of the clusters and then identified false-positive clusters using two criteria: literature information and motif information. If one or more proteins in a cluster were annotated as a nondesaturase or nonelongase protein in the literature or in database annotation, the cluster was discarded. Clusters that did not contain the expected histidine motif were then discarded as well, because all known desaturases and elongases conserve their respective histidine motifs.
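The motif criterion can be illustrated with a simple pattern scan. The sketch below checks a single sequence for three histidine boxes in order. The regular expressions are simplified assumptions: only the three or four residues between the histidines of the first box and the occasional glutamine in the third box come from this article; the remaining spacings and the toy sequences are illustrative. The actual check in the study was manual and applied per cluster.

```python
import re

# Assumed, simplified histidine-box patterns (lazy quantifiers so that each
# match ends as early as possible before the scan moves on).
HIS_BOXES = [
    re.compile(r"H.{3,4}?H"),      # box 1: two histidines, 3-4 residues apart
    re.compile(r"H.{2,3}?HH"),     # box 2: three histidines (spacing assumed)
    re.compile(r"[HQ].{2,3}?HH"),  # box 3: first His may be Gln (spacing assumed)
]


def has_histidine_boxes(seq):
    """Return True if all three boxes occur in order without overlapping."""
    pos = 0
    for pattern in HIS_BOXES:
        match = pattern.search(seq, pos)
        if match is None:
            return False
        pos = match.end()
    return True


# Invented toy sequences, not real proteins.
seq_with_boxes = "MA" + "HDAAAH" + "G" * 30 + "HRLHH" + "G" * 100 + "QIEHH" + "LF"
seq_without = "MA" + "HDAAAH" + "G" * 30 + "LF"
```

A cluster-level filter could then, for example, discard a cluster when none of its members passes `has_histidine_boxes`; that decision rule is likewise an assumption.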
Phylogenetic analysis
We used MAFFT version 5.8 (28) to obtain all multiple alignments for phylogenetic and motif analysis. Although phylogenetic trees of the entire desaturases and elongases were calculated with the neighbor-joining method (29) using ClustalW version 1.83 (30), trees of the individual subfamilies were calculated with the Bayesian method using MrBayes version 3.1.2 (31). In the Bayesian method, Markov chain Monte Carlo analysis was performed with 20,000–500,000 generations and four independent chains. The Markov chain was sampled every 100 generations. Both the entire trees and the subfamily trees were reconstructed using conserved regions independent of the cytochrome b5 domain. For the display and manipulation of phylogenetic trees, we used a web-based tool, Interactive Tree Of Life (32).
Motif analysis
We used HMMER version 2.3.2 (http://hmmer.janelia.org/) to build hidden Markov model (HMM) profiles in subfamilies and to search for subfamily motifs. The cytochrome b5 domain was searched with the Pfam profile PF00173 with E values < 0.05 (33). Graphical representations of the conservation patterns of consensus sequences were generated by WebLogo (34).
Microarray analysis
Our microarray data set was derived from expression data of many human tissues provided by the Genomics Institute of the Novartis Research Foundation (35). We found expression data of the genes corresponding to five desaturases and three elongases of the human enzymes obtained in this study. Expressed genes were determined by Affymetrix MAS5 Absent/Present calls.
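Selecting expressed genes from detection calls can be sketched as below. The data layout (gene to tissue to call), the gene and tissue labels, and the decision to count only "P" (Present) calls as expressed are assumptions made for illustration; the article states only that MAS5 Absent/Present calls were used.

```python
def expressed_genes(calls, tissue):
    """Return the genes that have a Present ('P') call in the given tissue."""
    return sorted(gene for gene, by_tissue in calls.items()
                  if by_tissue.get(tissue) == "P")


# Invented detection calls: 'P' = Present, 'A' = Absent.
calls = {
    "gene_1": {"liver": "P", "brain": "A"},
    "gene_2": {"liver": "A", "brain": "P"},
    "gene_3": {"liver": "P", "brain": "P"},
}
liver_genes = expressed_genes(calls, "liver")
brain_genes = expressed_genes(calls, "brain")
```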
| RESULTS |
|---|
Δ4 desaturases. All subfamilies contain sequences from animals, fungi, plants, and protists, so they probably diverged early. To clarify the functional differences between subfamilies, we discuss each of them below.
The predominant members in the First Desaturase subfamily were stearoyl-coenzyme A desaturases (SCDs), which generally introduce a double bond to the Δ9 position of palmitic acid (16:0) or stearic acid (18:0). All experimentally confirmed SCDs were detected and classified into this subfamily, including two in H. sapiens (36, 37) and four in Mus musculus (38).
The Omega Desaturase subfamily contained 13 known desaturases whose functions were Δ12 or Δ15 desaturases. The two functions did not form two distinct branches but fell in each of four branches representing the four kingdoms: animals, fungi, plants, and protists. Additional phylogenetic analysis of this subfamily indicated that these functions independently diverged in each lineage (see supplementary Data II-2). For example, in the animal kingdom, the Δ12 and Δ15 sequences of Caenorhabditis elegans diverged after C. elegans separated from other animal species. This is supported by high posterior probabilities. In the same way, the two functions diverged in the fungi kingdom after it separated from the others. Plants also obtained both Δ12 and Δ15 desaturases at different duplication points.
The Front-End Desaturase subfamily also comprised desaturases whose substrates are unsaturated acyl chains. The difference from the Omega Desaturase subfamily is the position of the double bond, which in this case is introduced between an existing double bond and the carboxyl end. This subfamily included Δ4, Δ5, Δ6, and bifunctional Δ6/sphingolipid Δ8 desaturases. Similar to Omega Desaturases, additional phylogenetic analysis suggested with high probability that the Δ5 and Δ6 desaturases of nematodes, vertebrates, and others diverged separately in each lineage (see supplementary Data II-3).
The last group is the Sphingolipid Desaturase subfamily, whose sole function is the sphingolipid Δ4 desaturase. Previous research has already indicated that these sequences form a distinct subfamily (39), and our results supported this conclusion.
Elongases consist of two functionally distinct subfamilies
We obtained 265 elongase homologs from 56 eukaryotic genomes (listed in supplementary Data I-2). The phylogenetic tree of elongase sequences was roughly separated into two branches (see supplementary Data II-4). Each branch was defined as follows: a) S/MUFA Elongase, elongating a saturated fatty acid or a MUFA; or b) PUFA Elongase, elongating a polyunsaturated fatty acid.
The S/MUFA Elongase subfamily contained 11 experimentally known elongases. Six of them elongate saturated fatty acids. Four of them elongate both saturated fatty acids and MUFAs. There is one exception, whose function is to elongate 18:2 (40). A notable feature of this subfamily is the variety of specificities for acyl chain length. For example, ELOVL6 in M. musculus elongates C12–16 (41), whereas S. cerevisiae has three different enzymes whose substrate specificities are C14–16, C14–24, and C14–26 (42, 43). In addition, recent research evaluated three elongases in T. brucei whose substrate specificities were found to be C4–10, C10–14, and C14–18 (44).
Sequences in the PUFA Elongase subfamily come from animals and protists. No sequences were detected in fungi or plants. Vertebrates have the most paralogs; for example, human and mouse have five paralogs, some of which have been experimentally confirmed to be involved in the elongation of PUFAs. Four of the seven sequences detected in protists have also been shown to be PUFA elongases in a recent study (45). This subfamily also has one exception, which has been characterized as an elongase for short MUFAs in Drosophila melanogaster (46).
Several amino acids are clearly different in subfamilies
As described above, desaturases were divided into four subfamilies and elongases were divided into two subfamilies. In this section, first, we describe the difference in amino acids between subfamilies. Next, we show that HMM profiles constructed for each subfamily can classify test sequences into appropriate subfamilies.
Figure 1A shows sequence logos of three histidine boxes in desaturase subfamilies, which were clearly different even in conserved regions. The first histidine box including two histidines is located in the N-terminal region (Fig. 1A, 1st). There were three amino acids between the histidines in three subfamilies, whereas four amino acids existed between them in the First Desaturase subfamily. The second histidine box including three histidines is positioned approximately 30 amino acids downstream of the first one (Fig. 1A, 2nd). In this region, the number of amino acids between the histidines was different only in the Front-End Desaturase subfamily. In addition, a strongly conserved arginine was observed in the First Desaturase subfamily. The last histidine box is located in the C-terminal region far from the others (Fig. 1A, 3rd). As is often reported, in many sequences of the Front-End Desaturase subfamily the first histidine is replaced by glutamine. The second amino acid also differs clearly between subfamilies: it is strongly conserved as asparagine in First Desaturases and as valine in Omega Desaturases. Figure 1B shows one histidine box and the surrounding region conserved in fatty acid elongases. Several different amino acids were found, such as leucine in the S/MUFA subfamily and glutamine in the PUFA subfamily.
200 amino acids for all of the sequences in each subfamily. To validate these profiles, we calculated the P values of test sequences against the profiles. The test sequences were 30 known desaturases and 17 elongases derived from a wide range of organisms, such as Thalassiosira pseudonana, Spodoptera littoralis, and Mortierella alpina. The sequences were not used in calculating the profiles, because our data set only consisted of nearly complete genomes. Figure 2 summarizes the P values, indicating that the profiles can clearly classify desaturases into appropriate subfamilies, namely, Δ9 into First Desaturase, Δ12 and Δ15 into Omega Desaturase, and Δ4, Δ5, Δ6, and Δ8 into Front-End Desaturase. The profiles of elongases also distinguished test sequences into two elongase subfamilies, although not as clearly as the desaturase case (see supplementary Data II-5).
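One simple way to turn per-profile P values into a subfamily assignment is to take the profile with the lowest P value. The sketch below assumes this best-match rule and a significance cutoff `max_p`; neither the rule nor the cutoff value is stated in the article, and the P values shown are invented.

```python
def assign_subfamily(p_values, max_p=0.01):
    """Pick the subfamily whose profile gives the lowest P value.

    p_values: dict mapping subfamily name -> P value of the test sequence
    max_p:    hypothetical cutoff; returns None if no profile passes it
    """
    if not p_values:
        return None
    best = min(p_values, key=p_values.get)
    return best if p_values[best] <= max_p else None


# Invented P values for one test sequence against the desaturase profiles.
example = {
    "First Desaturase": 3e-2,
    "Omega Desaturase": 1e-45,
    "Front-End Desaturase": 5e-6,
    "Sphingolipid Desaturase": 0.4,
}
assigned = assign_subfamily(example)
```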
α-linolenic acid (18:3), arachidonic acid (20:4), eicosapentaenoic acid (EPA; 20:5), and docosahexaenoic acid (DHA; 22:6), which are widely distributed in the biological membrane. The repertoires of the subfamilies and predicted fatty acids in 56 eukaryotic genomes are summarized in supplementary Data II-6. The repertoires of desaturases and elongases in organisms show great diversity, which yields the diversity of fatty acids observed. We present an overview of the prediction results using the simplified pathway illustrated in Fig. 3A. There are three reaction processes catalyzed by the distinct functional subfamilies in the pathway. First, through process 1 (Fig. 3A), oleic acid (18:1) is synthesized from stearic acid (18:0) by First Desaturase introducing the first double bond. We predicted that organisms that possessed First Desaturases could synthesize oleic acids.
α-linolenic acid (18:3) are synthesized from oleic acid (18:1) by Δ12 and Δ15 desaturases. Because they were categorized in the Omega Desaturase subfamily, we determined that the process would be present in many fungi and plants that possess the subfamily, in contrast to vertebrates, which must obtain such fatty acids from the diet to survive. These results agree with the fact that plasma membranes in such organisms are abundant in linoleic acid (18:2) and α-linolenic acid (18:3) (47, 48). Process 3 (Fig. 3A) is a complicated step involving two distinct pathways and multiple subfamilies (details are given in the Discussion). Front-End Desaturases and PUFA Elongases dominate the process and cooperatively synthesize arachidonic acid (20:4), EPA (20:5), and DHA (22:6). Therefore, we determined that organisms can perform the conversion if they possess both of the subfamilies. As a result, sea urchin and trypanosomatids, as well as vertebrates, had this pathway.
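The three-process rule described above amounts to a presence check on the subfamily repertoire. The sketch below encodes only that simplified rule; the function name and the example repertoires are hypothetical, and it does not distinguish between the alternative routes to DHA.

```python
def predict_fatty_acids(subfamilies):
    """Predict synthesizable fatty acids from a genome's subfamily repertoire.

    Process 1: First Desaturase -> oleic acid
    Process 2: Omega Desaturase -> linoleic and alpha-linolenic acid
    Process 3: Front-End Desaturase plus PUFA Elongase -> 20:4, EPA, DHA
    """
    present = set(subfamilies)
    predicted = []
    if "First Desaturase" in present:
        predicted.append("oleic acid (18:1)")
    if "Omega Desaturase" in present:
        predicted += ["linoleic acid (18:2)", "alpha-linolenic acid (18:3)"]
    if {"Front-End Desaturase", "PUFA Elongase"} <= present:
        predicted += ["arachidonic acid (20:4)", "EPA (20:5)", "DHA (22:6)"]
    return predicted


# Hypothetical repertoires, not taken from the supplementary data.
without_omega = ["First Desaturase", "Front-End Desaturase",
                 "PUFA Elongase", "S/MUFA Elongase"]
without_pufa_elongase = ["First Desaturase", "Omega Desaturase",
                         "Front-End Desaturase", "S/MUFA Elongase"]
prediction_a = predict_fatty_acids(without_omega)
prediction_b = predict_fatty_acids(without_pufa_elongase)
```

The first repertoire lacks Omega Desaturase, so process 2 is not predicted; the second lacks PUFA Elongase, so process 3 is not predicted.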
| DISCUSSION |
|---|
Δ12 and Δ15 in the Omega Desaturase subfamilies and Δ4, Δ5, Δ6, and Δ8 in the Front-End Desaturase subfamilies, within the individual subfamilies in the second phase. We discuss the history of functional diversification of desaturases and elongases through the two phases. In the first phase, desaturases and elongases diverged into four and two subfamilies, which had different ranges of substrate specificities. They can still be distinguished based on sequence similarities, because different amino acids are conserved (Fig. 1). For example, arginine residues in the first and second histidine boxes are conserved in the sequences of >95% of First Desaturases, whereas the other subfamilies possess few arginines in this region. Such specific residues could relate to the differences in functions. To elucidate the functions of such residues, further experiments, such as to determine the three-dimensional structures, are required.
In the second phase, we believe that a variety of substrate specificities or regioselectivities diverged within the individual subfamilies. Recent reports also show the independent divergence of Δ12 and Δ15 in several organisms, such as Mortierella alpina and Saprolegnia diclina (49, 50). It should be noted that the first diversification occurred in the old common ancestor, whereas the second occurred independently in each lineage. The second diversification likely causes the difference in fatty acids between even closely related organisms. In other words, the first phase restricted the range of functions and then, in the second phase, functions diverged within their ranges. However, some exceptional functions beyond the ranges of subfamilies were detected in elongases, suggesting that the functional constraints of the subfamilies are not always effective. In particular, elongases seem to be more flexible about substrate specificities.
Two pathways converting EPA to DHA are the consequences of independent divergence in the second phase. Figure 3B, pathway 1 shows the Sprecher pathway, which contains four steps: two consecutive elongation steps from EPA to 24:5, the Δ6 desaturation to 24:6, and the subsequent β-oxidation from 24:6 to 22:6 (51). The C20 and C22 elongases that convert EPA to 24:5 are key enzymes in this pathway, because the latter two steps are catalyzed by reused enzymes that also serve in other pathways. In our phylogenetic analysis (see supplementary Data II-7), it was found that many animals had multiple copies of elongases, as for the PUFA Elongase subfamily. In particular, vertebrates have five distinct branches, some of which include elongases characterized as C18, C20, and C22 elongases (52, 53). Another pathway for DHA, identified in lower eukaryotes, is simpler, consisting of two steps: an elongation from EPA to DPA, and the Δ4 desaturation to DHA (Fig. 3B, pathway 2) (54). This pathway was only detected in three trypanosomatids. We conclude that DHA can be synthesized by vertebrates and trypanosomatids, each of which acquired different enzymes for the extension of the pathway to DHA in the second phase diversification.
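The two routes from EPA to DHA can be followed step by step in the carbons:double-bonds shorthand. The sketch below is a bookkeeping illustration only: each fatty acid is a `(carbons, double_bonds)` tuple, elongation adds two carbons, desaturation adds one double bond, and one round of β-oxidation removes two carbons. The function names are made up for the example, and double-bond positions are not tracked.

```python
def elongate(fa):
    """Add two carbons to the acyl chain."""
    carbons, double_bonds = fa
    return (carbons + 2, double_bonds)


def desaturate(fa):
    """Introduce one additional double bond."""
    carbons, double_bonds = fa
    return (carbons, double_bonds + 1)


def beta_oxidize(fa):
    """Shorten the chain by two carbons (one round of beta-oxidation)."""
    carbons, double_bonds = fa
    return (carbons - 2, double_bonds)


def run_pathway(start, steps):
    """Apply the steps in order and return every intermediate."""
    trace = [start]
    for step in steps:
        trace.append(step(trace[-1]))
    return trace


EPA = (20, 5)
# Pathway 1 (Sprecher): two elongations, desaturation, beta-oxidation.
pathway_1 = run_pathway(EPA, [elongate, elongate, desaturate, beta_oxidize])
# Pathway 2: one elongation to DPA, then desaturation.
pathway_2 = run_pathway(EPA, [elongate, desaturate])
```

Both traces end at `(22, 6)`, that is, DHA: pathway 1 passes through 22:5, 24:5, and 24:6, whereas pathway 2 passes only through 22:5 (DPA).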
Fatty acids as an adaptation to environments and precursors of metabolites
Expansions and contractions of pathways caused by the diversity of desaturases and elongases lead to a considerable diversity in fatty acids among organisms. What roles do such a variety of fatty acids play in each organism? With respect to the adaptation to individual environments, one interesting example is T. brucei, a human parasite that causes sleeping sickness. When this parasite invades the human blood from the tsetse fly, it rapidly replaces fatty acids in the plasma membrane with myristic acid to evade the host's immune response (55). A recent report identified three elongases for a novel type of fatty acid synthesis, each of which has different conversion ranges, such as C4–10, C10–14, and C14–18 (44). This remarkable finding suggested that the parasite can readily produce stage-specific fatty acids by regulating the expression of elongases. Our phylogenetic analysis of S/MUFA Elongase indicates that these three elongases fall into a single branch comprising exclusively trypanosomatid sequences apart from other protists. The branch then separates into three subbranches with different substrate specificities (Fig. 4; see supplementary Data II-8). Hence, these genes have probably been acquired in relatively recent duplication events and have subsequently mutated into different substrate specificities. Other trypanosomatids, Trypanosoma cruzi and Leishmania major, also have more S/MUFA elongases, which have arisen in a similar manner to T. brucei. They are intracellular parasites unlike T. brucei and use stage-specific fatty acids in their reproductive cycles [details of the lipid biology of trypanosomatids were reviewed recently (56)]. Hence, such paralogs are likely to have specificities corresponding to their life stages. Their characterization will lead to improved understanding of the lipid biology of parasites.
From genomic information to chemical structures
Mapping microarray data to pathways is another potential use of our results (21). The human pathway for the biosynthesis of unsaturated fatty acids is divided in two, MUFA and PUFA, because of the lack of the Omega Desaturase subfamily. MUFAs are synthesized by SCDs, including two paralogs, which are expressed in a tissue-specific manner (Fig. 5A). For example, SCD2 is expressed exclusively in pancreas and kidney. Previous reports also suggested that SCD1 was expressed in liver, muscle, and other tissues (65) and SCD2 was abundantly expressed in adult brain and pancreas (37). For PUFA biosynthesis, both Front-End Desaturases and PUFA Elongases are required. Their expressions were also different across tissues, as expected (Fig. 5B). A previous study indicated that the proportion of PUFAs including three or more double bonds increased in patients with acute lymphoblastic leukemia (66). These results suggest that genes encoding desaturases and elongases are strictly regulated according to tissues or cell types to control the composition of fatty acids in membranes. Similar analysis can be performed with all of the complete genomes. Furthermore, analysis of other enzymes would enable us to predict whole lipid structures, including head groups.
| ACKNOWLEDGMENTS |
|---|
Manuscript received August 23, 2007 and in revised form September 28, 2007.
| REFERENCES |
|---|