Papers In Press, published online ahead of print December 1, 2007
Journal of Lipid Research, Vol. 48, 2736-2750, December 2007
Copyright © 2007 by American Society for Biochemistry and Molecular Biology




* Kennedy Krieger Institute, Johns Hopkins University School of Medicine, Baltimore, MD 21205
Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD 21205
Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD 21205
The online version of this article (available at http://www.jlr.org) contains supplementary data in the form of 3 Tables.
Published, JLR Papers in Press, August 30, 2007.
2 In our previous publications, motifs were designated by Arabic numerals (Motifs 1 and 2) (11, 61, 62). Because this work has led to the refinement of consensus sequences, motifs are now designated by Roman numerals.
1 To whom correspondence should be addressed. e-mail: watkins{at}kennedykrieger.org
ABSTRACT
Supplementary key words: fatty acid activation, fatty acid metabolism, conserved motifs, bioinformatics, consensus sequence, phylogenetic analysis, structure-function
Abbreviations: ACS, acyl-coenzyme A synthetase; ACSAc, yeast or bacterial acetyl-coenzyme A synthetase; ACSBG, bubblegum ACS; ACSF, ACS family; ACSL, long-chain ACS; ACSM, medium-chain ACS; ACSS, short-chain ACS; ACSVL, very long-chain ACS; BLAST, Basic Local Alignment Search Tool; EST, expressed sequence tag; FATP, fatty acid transport protein; HUGO, Human Genome Organization; LCFA, long-chain fatty acid; NCBI, National Center for Biotechnology Information; ttACS, Thermus thermophilus ACS; VLCFA, very long-chain fatty acid
INTRODUCTION
The diversity of fatty acids in nature is extensive. Fatty acid chain lengths range widely, from the 2-carbon acid acetate to acids containing >30 carbons in some waxes and plant lipids. Furthermore, fatty acids can be fully saturated, can contain one (monounsaturated) or more (polyunsaturated) double bonds, or can have methyl branches. Thus, hundreds of naturally occurring fatty acid species exist. It is not surprising, therefore, that higher organisms contain multiple enzymes with ACS activity to facilitate both anabolic and catabolic reactions of fatty acids.
Before the era of abundant bioinformatic data, fatty acid activation activity was often characterized biochemically by chain length preference. The ACS activities found in different tissues and in different subcellular locations, particularly mitochondria and endoplasmic reticulum membranes (microsomes), were also characterized. These early investigations gave rise to the notion that there was an acetyl-CoA synthetase, a butyryl-CoA synthetase, a medium-chain ACS (ACSM), and a long-chain ACS (ACSL) (2–5). Subsequent studies predicted the existence of a very long-chain ACS (ACSVL) (6). However, there is significant overlap in the chain length specificity of, for example, ACSMs and ACSLs, and this may also vary depending on the degree of unsaturation of the fatty acid substrate. Nonetheless, classification provides a useful framework for defining subfamilies of related enzymes. Short-chain ACSs (ACSSs) typically activate acetate, propionate, or butyrate. ACSMs are those that activate C6 to C10 fatty acids. ACSLs are typically thought of as those that activate palmitate (C16:0) and oleate (C18:1), the most common fatty acids found in nature. However, the optimal chain lengths for these enzymes are frequently shorter, such as C12:0 (5, 7). ACSVLs have been so named not necessarily because they prefer very long-chain fatty acid (VLCFA) substrates but because they are capable of using these substrates. These enzymes often have a higher rate of activation of long-chain fatty acids (LCFAs) than VLCFAs (8–10).
With the sequencing of the human (and other) genomes completed, it is now possible to identify the entire complement of an organism's ACS genes and predicted protein products. Using highly conserved amino acid sequence motifs, 26 proven or likely human ACS genes were detected. Some have been characterized biochemically, whereas others have not yet been investigated. Two new candidate human ACSs were found to have enzymatic activity. Examination of amino acid sequences of all identified ACSs revealed conserved residues predicted by structural or biochemical studies to be important for catalysis and/or substrate binding. The availability of this information will facilitate future studies to elucidate the specific metabolic function of each ACS.
MATERIALS AND METHODS
Two pattern searches were also performed with the following query sequence derived from Motif 2: T-G-D-x(6,8)-G-x(3)-[F,I,V,M]-x(2)-R-x(4)-[I,L,F,V]-x(3,4)-G-x(2)-[L,I,V,F]-x(4)-[V,I,L]-E. The Protein Information Resource Pattern/Peptide Match program (http://pir.georgetown.edu/pirwww/search/pattern.shtml) was used to interrogate the Protein Information Resource nonredundant reference protein database of human proteins, and pattern-hit-initiated BLAST was used to probe the human nonredundant protein database. These searches yielded no additional candidate ACSs. In the syntax used for the above query sequence and for consensus sequences, any one of the amino acids shown in square brackets can occupy that position, x(n) indicates a run of n unspecified amino acid residues (a single unspecified residue is written X), and x(m,n) indicates a run of m to n unspecified residues.
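Because the pattern syntax is regular, a query like the one above can be translated mechanically into a regular expression for local screening. The sketch below is a minimal illustration using Python's `re` module; it is not the PIR Pattern/Peptide Match implementation, the function names are hypothetical, and the pattern is written with uppercase one-letter codes throughout (L for leucine) and a hyphen between every element.

```python
import re

# Motif 2 query pattern, uppercase one-letter codes, hyphen between every element.
MOTIF_2_QUERY = ("T-G-D-x(6,8)-G-x(3)-[F,I,V,M]-x(2)-R-x(4)-[I,L,F,V]-x(3,4)"
                 "-G-x(2)-[L,I,V,F]-x(4)-[V,I,L]-E")

def pattern_to_regex(pattern: str) -> str:
    """Translate a hyphen-separated motif pattern into a Python regular expression.

    Supported elements: one residue ("T"), a bracketed set ("[F,I,V,M]"),
    a single unspecified residue ("x"), and runs of unspecified residues
    ("x(4)" or "x(6,8)").
    """
    parts = []
    for element in pattern.split("-"):
        run = re.fullmatch(r"[xX]\((\d+)(?:,(\d+))?\)", element)
        if run:
            low, high = run.groups()
            parts.append(f".{{{low},{high}}}" if high else f".{{{low}}}")
        elif element in ("x", "X"):
            parts.append(".")
        elif element.startswith("[") and element.endswith("]"):
            parts.append("[" + element[1:-1].replace(",", "") + "]")
        elif re.fullmatch(r"[A-Z]", element):
            parts.append(element)
        else:
            raise ValueError(f"unrecognized pattern element: {element!r}")
    return "".join(parts)

def find_matches(pattern: str, protein: str) -> list:
    """Return (start, end) positions of non-overlapping pattern matches in a protein sequence."""
    regex = re.compile(pattern_to_regex(pattern))
    return [(m.start(), m.end()) for m in regex.finditer(protein.upper())]
```

For example, `pattern_to_regex("T-G-D-x(6,8)-G")` gives `TGD.{6,8}G`. Note that `finditer` reports non-overlapping matches only, which is sufficient for locating a single Motif 2 per protein.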
Validation of candidate human ACS genes and proteins
To verify that previously unidentified candidate sequences a) represented likely ACSs and b) represented proteins that were expressed in humans, several analyses were performed. First, the overall size of the predicted protein was considered, as most ACSs consist of 600–700 amino acid residues. Second, amino acid sequences were examined for the presence of the aforementioned conserved residues (Motifs 1 and 2), and the positions of these motifs within the sequences were determined (see Results). Third, putative open reading frames were used in BLAST searches of the mammalian nonredundant protein database to identify potential orthologs in nonhuman species. Fourth, BLAT (BLAST-Like Alignment Tool; http://genome.ucsc.edu) was used to search the human genome to ascertain whether the sequence was supported by genomic evidence. Fifth, a human expressed sequence tag (EST) database at the National Center for Biotechnology Information (NCBI) was queried to determine whether sequences supporting the expression of putative ACS genes were present. With the exception of one protein tentatively identified as aminoadipic semialdehyde dehydrogenase [ACS family 4 (ACSF4)], candidate sequences not meeting these criteria were not considered further.
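The five checks above can be recorded per candidate and evaluated together. The sketch below is hypothetical: the `Candidate` record and `unmet_criteria` helper are illustrative names, the 600–700 residue range and the ~260-residue spacing between Motif 1 and the conserved Motif 2 Arg are taken from the text, and measuring that spacing from the start of Motif 1 with a ±40 residue tolerance is an assumption. The function only reports unmet criteria; as the ACSF4 exception shows, the final decision involved judgment.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Candidate:
    """Hypothetical summary of the evidence collected for one candidate ACS sequence."""
    name: str
    length: int                  # predicted protein length (residues)
    motif1_start: Optional[int]  # 0-based start of Motif 1; None if not found
    motif2_arg: Optional[int]    # 0-based position of the conserved Motif 2 Arg; None if not found
    has_ortholog: bool           # BLAST hit in the mammalian nonredundant protein database
    genome_support: bool         # BLAT alignment to the human genome
    est_support: bool            # supporting human ESTs at NCBI

def unmet_criteria(c: Candidate, spacing: int = 260, tolerance: int = 40) -> List[str]:
    """List the criteria a candidate fails; an empty list means every check passed.

    Assumption: spacing is measured from the Motif 1 start to the Motif 2 Arg,
    and deviations up to `tolerance` residues are accepted.
    """
    unmet = []
    if not 600 <= c.length <= 700:
        unmet.append("typical size")
    if c.motif1_start is None or c.motif2_arg is None:
        unmet.append("both motifs present")
    elif abs((c.motif2_arg - c.motif1_start) - spacing) > tolerance:
        unmet.append("motif spacing")
    if not c.has_ortholog:
        unmet.append("nonhuman ortholog")
    if not c.genome_support:
        unmet.append("genomic support")
    if not c.est_support:
        unmet.append("EST support")
    return unmet
```

A candidate that passes everything except size, for example, is returned as `["typical size"]` and can then be reviewed by hand.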
Identification of additional conserved motifs and derivation of consensus sequences
Data from the crystal structures of two bacterial ACSs (12, 13) and one yeast ACS (14) were used to identify regions other than Motifs 1 and 2 that potentially contained conserved amino acid sequences. For all protein sequences, regions of interest containing up to 30 amino acids were chosen based on their locations relative to Motifs 1 and 2 and aligned using ClustalW (http://www.ebi.ac.uk/clustalw), MAFFT (15) (http://www.ebi.ac.uk/mafft/), or MUSCLE (16) (http://www.ebi.ac.uk/muscle/). Aligned sequences were further analyzed by generation of sequence logos (17) using WebLogo (18) (http://weblogo.berkeley.edu). Consensus sequences were derived from multiple sequence alignments and sequence logos.
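As a rough illustration of how a consensus and a sequence logo relate to an alignment, the sketch below computes, for each column, a majority-rule consensus residue and the column's information content in bits (log2 20 minus the Shannon entropy of the observed residues), which is the quantity a logo stacks. It is a simplification, not WebLogo's code: it ignores WebLogo's small-sample correction, skips gap characters, and the 0.5 majority threshold is an arbitrary assumption.

```python
import math
from collections import Counter

def column_stats(alignment, threshold=0.5):
    """Majority consensus string and per-column information content (bits) for aligned proteins.

    A column whose most common residue is below `threshold` of the non-gap residues
    is written as X; gaps ('-') are ignored when counting.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to equal length")
    consensus, bits = [], []
    for i in range(length):
        residues = [seq[i] for seq in alignment if seq[i] != "-"]
        if not residues:
            consensus.append("-")
            bits.append(0.0)
            continue
        counts = Counter(residues)
        n = len(residues)
        top, top_count = counts.most_common(1)[0]
        consensus.append(top if top_count / n >= threshold else "X")
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        bits.append(math.log2(20) - entropy)
    return "".join(consensus), bits
```

For instance, the four hypothetical segments `YGLTE`, `YGQTE`, `YGITE`, and `FGVTE` give the consensus `YGXTE`, with the fully conserved G column at about 4.32 bits and the variable third column at about 2.32 bits.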
cDNA cloning and protein expression
A plasmid containing full-length human ACSF2 was purchased from OriGene (TrueClone collection, catalog number TC108350). For transfer to the mammalian expression vector pcDNA3 (Invitrogen), the open reading frame was amplified by PCR in a two-step process. The first amplification used TC108350 as template, forward oligonucleotide 5'-aatttggatccagagccatggctgtctacgtc-3', which incorporates a BamHI restriction site (GGATCC), and reverse oligonucleotide 5'-gttcccggacccatccaggag-3'. The PCR product was used as template for a second round of amplification with the same forward oligonucleotide and reverse oligonucleotide 5'-aaatttgcggccgctattcacagatttagatg-3', which incorporates a NotI restriction site (GCGGCCGC). After digestion with restriction enzymes, the resulting 1,847 bp fragment was cloned into the BamHI and NotI sites of pcDNA3. The nucleotide sequence was determined and found to be identical to that of NM_025149. Human ACSF3 full-length cDNA was also amplified by PCR using a human liver cDNA library (Clontech) as template, forward oligonucleotide 5'-cccgaattccttacctcctctctctggct-3', which incorporates an EcoRI site (GAATTC), and reverse oligonucleotide 5'-ggatctagacgtggttctcggtgtgaagg-3', which incorporates an XbaI site (TCTAGA). After restriction enzyme digestion of the PCR product, the 1,865 bp fragment was cloned into the EcoRI and XbaI sites of pcDNA3 and completely sequenced. The sequence was identical to that of NM_174917.
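As a quick sanity check, the restriction sites that each primer is said to incorporate can be located with a plain substring scan of the primer sequences. The sketch below covers only the four enzymes named in the text, using their standard recognition sequences; the dictionary and function names are illustrative.

```python
# Standard 5'->3' recognition sequences of the four enzymes used for cloning.
SITES = {"BamHI": "GGATCC", "NotI": "GCGGCCGC", "EcoRI": "GAATTC", "XbaI": "TCTAGA"}

# Primer sequences as given in the text (5'->3').
PRIMERS = {
    "ACSF2 forward": "aatttggatccagagccatggctgtctacgtc",
    "ACSF2 reverse (round 1)": "gttcccggacccatccaggag",
    "ACSF2 reverse (round 2)": "aaatttgcggccgctattcacagatttagatg",
    "ACSF3 forward": "cccgaattccttacctcctctctctggct",
    "ACSF3 reverse": "ggatctagacgtggttctcggtgtgaagg",
}

def sites_in(primer: str) -> dict:
    """Map each enzyme whose site occurs in the primer to the 0-based position of its first occurrence."""
    seq = primer.upper()
    return {enzyme: seq.find(site) for enzyme, site in SITES.items() if site in seq}

for name, primer in PRIMERS.items():
    print(name, sites_in(primer))
```

Each tailed primer carries exactly the expected site near its 5' end (BamHI, NotI, EcoRI, or XbaI), and the first-round ACSF2 reverse primer carries none, consistent with the second round being needed to add the NotI site.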
ACS assays
COS-1 cells (American Type Culture Collection) were transfected by electroporation with full-length cDNA constructs encoding ACSF2 or ACSF3, or with the empty pcDNA3 vector, as described previously (10). Three days after transfection, cells were harvested, subjected to at least one freeze-thaw cycle as described (10), and assayed for their ability to activate [1-14C]octanoic acid (American Radiolabeled Chemicals), [1-14C]palmitic acid (Moravek Biochemicals), or [1-14C]lignoceric acid (Moravek Biochemicals) as described previously (19). Final fatty acid concentrations in assays were 400 µM for octanoate and 20 µM for palmitate and lignocerate, and each assay included ~100,000 dpm (1 nmol) of labeled fatty acid. Fatty acids were solubilized using α-cyclodextrin (10 mg/ml in 10 mM Tris, pH 8.0) and incubated for 20 min at 37°C in 40 mM Tris, pH 7.5, 10 mM ATP, 10 mM MgCl2, 0.2 mM CoA, 0.2 mM DTT, and cell suspension (15 µg protein/assay for octanoate and palmitate, 60 µg protein/assay for lignocerate) in a total volume of 250 µl. For assay of ACSF3, reaction mixtures also contained Triton X-100 (final concentration, 0.1%). Assays were terminated by the addition of ice-cold Dole's solution, and the aqueous (acyl-CoA) and organic (fatty acid) phases were separated according to the method of Dole (20). Radioactivity in the aqueous phase was quantitated by scintillation counting.
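Converting aqueous-phase counts into an activity requires the specific radioactivity of the substrate, which follows from the assay composition: in 250 µl, 20 µM palmitate or lignocerate is 5 nmol total, and 400 µM octanoate is 100 nmol, each carrying ~100,000 dpm of label. The sketch below is an assumed calculation, not the authors' stated procedure: it assumes the label is uniformly diluted into the total fatty acid pool and that a background value (for example, a no-enzyme blank; the choice of blank is hypothetical) is subtracted.

```python
def specific_activity(aqueous_dpm, blank_dpm, fatty_acid_um, protein_ug,
                      total_label_dpm=100_000, volume_ul=250.0, minutes=20.0):
    """Acyl-CoA formed, in nmol/min/mg protein, from aqueous-phase scintillation counts.

    Assumes the labeled fatty acid (total_label_dpm) is uniformly mixed into the
    total fatty acid in the assay, and subtracts blank_dpm as background.
    """
    total_nmol = fatty_acid_um * volume_ul / 1000.0   # uM x ul = pmol; /1000 gives nmol
    dpm_per_nmol = total_label_dpm / total_nmol       # specific radioactivity of the substrate
    product_nmol = (aqueous_dpm - blank_dpm) / dpm_per_nmol
    return product_nmol / minutes / (protein_ug / 1000.0)

# Hypothetical palmitate assay: 20 uM substrate, 15 ug protein,
# 12,000 dpm in the aqueous phase against a 2,000 dpm blank.
print(round(specific_activity(12_000, 2_000, 20, 15), 3))
```

In this hypothetical example, 10,000 net dpm at 20,000 dpm/nmol is 0.5 nmol of palmitoyl-CoA, or about 1.667 nmol/min/mg protein. Because octanoate is used at 20-fold higher concentration, the same net counts represent 20-fold more product for that substrate.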
ACS nomenclature
Many inconsistencies in the names of the various ACS enzymes can be found in the literature, some of which have affected the approved names for genes encoding these proteins. Recently, a consensus was reached between investigators and the Human Genome Organization (HUGO) Gene Nomenclature Committee regarding the mammalian ACSLs (21). The gene name "acyl-CoA synthetase long-chain family member_" was approved. Approved gene symbols "ACSL_" have a hierarchical structure, with the root "ACS" followed by the letter "L" for long-chain and a number designating each subfamily member. In part based on findings reported here, similar changes were approved in 2005 for some, but not all, members of the ACSS, ACSM, and ACSBG subfamilies.
The rationale for the ACSS family nomenclature is as follows. The older approved gene symbols for the two known human acetyl-CoA synthetases were ACAS2 and ACAS2L (for ACAS2-like); there was no ACAS1. Converting to the uniform nomenclature system for the ACS proteins, ACAS2 became ACSS2 and ACAS2L became ACSS1. The proposed name for a third subfamily member, identified in this work, is ACSS3. The ACSM nomenclature changes were based on the following rationale. The protein encoded by the BUCS1 gene was also referred to as MACS1 (22), which was changed to ACSM1 in the uniform nomenclature system. Another well-described protein was the HXM-A form of xenobiotic/medium-chain fatty acid:CoA ligase (23). Its gene name was changed by HUGO first to ACSM2 and, more recently, to ACSM2B. The human SA protein has been designated ACSM3. Three additional ACSM subfamily members are described in the present study. The proposed name for the human protein most similar to an olfactory-specific ACSM described in rats is ACSM4. The proposed names for the two remaining human ACSM candidates are ACSM5 and ACSM6. However, because of the high sequence similarity between ACSM2B and ACSM6, the latter was recently renamed ACSM2A by HUGO. The bubblegum ACS first reported in Drosophila (24) and later in humans (11) was designated ACSBG1, and a second homolog was designated ACSBG2 (25).
The approved gene names and symbols for the six members of the ACSVL subfamily have not yet been changed. These proteins were also reported to be fatty acid transport proteins (FATPs) (26, 27); thus, their approved gene names and symbols are "solute carrier family 27 (fatty acid transporter) member_" and "SLC27A_," respectively. We propose that the first enzyme described as being capable of activating VLCFAs (28, 29), currently SLC27A2, be designated ACSVL1. We suggest that SLC27A6, the protein with the highest amino acid identity to ACSVL1, be called ACSVL2. We propose that SLC27A3 be named ACSVL3, SLC27A1 (FATP1) be named ACSVL4, and SLC27A4 (FATP4) be named ACSVL5. Finally, we suggest that SLC27A5, which preferentially activates the acyl side chain of bile acids rather than fatty acids (30, 31), be called ACSVL6.
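When cross-referencing older literature or database entries, the renaming rules in the preceding paragraphs amount to a lookup table. The sketch below is illustrative only: the dictionary and helper names are hypothetical, the entries are taken from the text, and the ACSVL names are the proposals made here rather than approved HUGO symbols.

```python
# Previous or current symbols mapped to the uniform ACS names discussed in the text.
# ACSVL entries are proposed names, not approved HUGO symbols.
UNIFORM_NAMES = {
    "ACAS2": "ACSS2",
    "ACAS2L": "ACSS1",
    "BUCS1": "ACSM1",
    "MACS1": "ACSM1",
    "ACSM2": "ACSM2B",
    "ACSM6": "ACSM2A",
    "SLC27A2": "ACSVL1",
    "SLC27A6": "ACSVL2",
    "SLC27A3": "ACSVL3",
    "SLC27A1": "ACSVL4",  # FATP1
    "SLC27A4": "ACSVL5",  # FATP4
    "SLC27A5": "ACSVL6",  # prefers the acyl side chain of bile acids
}

def uniform_name(symbol: str) -> str:
    """Return the uniform ACS name for a gene symbol, or the symbol unchanged if no rename applies."""
    return UNIFORM_NAMES.get(symbol.upper(), symbol)
```

Symbols that already follow the uniform scheme, such as ACSL1, pass through unchanged.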
Four proteins identified herein could not be assigned to the ACSS, ACSM, ACSL, ACSVL, or ACSBG subfamily. All have structural features suggesting that they belong to the greater ACS family. HUGO nomenclature advisors have suggested using the interim designation ACSF (for ACS Family) members 1–4.
Phylogenetic analysis
ACS sequences for five additional species (the mouse Mus musculus, the zebrafish Danio rerio, the fruitfly Drosophila melanogaster, the nematode Caenorhabditis elegans, and the yeast Saccharomyces cerevisiae) were identified using a limited subset of the criteria used to identify human ACS sequences. BLAST searches used YTSGTTGLPK and FTSGTTGLPK as query sequences, and only matches with provisional or final RefSeq entries in NCBI databases were considered further. Amino acid sequence alignment was performed using MUSCLE and MAFFT; if Motif 2 was absent, the sequence was discarded. Although this method was not exhaustive, 113 likely ACS sequences from nonhuman species were identified. Phylogenetic trees were generated using the PAUP (Phylogenetic Analysis Using Parsimony) (32) and MEGA4 (Molecular Evolutionary Genetics Analysis) (33) programs. For neighbor-joining analysis in MEGA4 (34), evolutionary distances were computed using the Poisson correction method. All positions containing gaps were eliminated from the data set (complete deletion option). For the 139 proteins (26 human plus 113 nonhuman) shown in Fig. 3, there were 175 total positions in the final data set. The robustness of tree topology was evaluated by bootstrap analysis with 1,000 replicates. Segregation of ACSs into families was done by phylogenetic analyses of supported clades (see Results).
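The distance and resampling steps described above are simple to state in code. The sketch below is a minimal stand-in, not MEGA4's implementation: complete deletion drops every alignment column containing a gap, the Poisson-corrected amino acid distance is d = -ln(1 - p), where p is the proportion of differing sites, and one bootstrap replicate resamples alignment columns with replacement. Building a tree from each replicate and counting how often each clade recurs across the 1,000 replicates is not shown; the function names are illustrative.

```python
import math
import random

def complete_deletion(alignment):
    """Drop every column that contains a gap ('-') in any sequence."""
    keep = [i for i in range(len(alignment[0])) if all(seq[i] != "-" for seq in alignment)]
    return ["".join(seq[i] for i in keep) for seq in alignment]

def poisson_distance(a, b):
    """Poisson-corrected distance d = -ln(1 - p) between two gap-free aligned sequences."""
    p = sum(x != y for x, y in zip(a, b)) / len(a)
    if p >= 1.0:
        raise ValueError("sequences too divergent for the Poisson correction")
    return -math.log(1.0 - p)

def distance_matrix(alignment):
    """Symmetric matrix of pairwise Poisson distances."""
    n = len(alignment)
    return [[0.0 if i == j else poisson_distance(alignment[i], alignment[j])
             for j in range(n)] for i in range(n)]

def bootstrap_replicate(alignment, rng):
    """One pseudo-alignment built by resampling columns with replacement."""
    length = len(alignment[0])
    columns = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[c] for c in columns) for seq in alignment]

aligned = complete_deletion(["MKT-LAY", "MKTELAF", "MRT-IAY"])
print(aligned, distance_matrix(aligned)[0][1])
replicate = bootstrap_replicate(aligned, random.Random(0))
```

Complete deletion keeps the data set identical for every pair, at the cost of discarding columns (here 175 positions remained for 139 proteins); each bootstrap replicate has the same length as the filtered alignment.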
RESULTS
Fatty acid + ATP → acyl-AMP + PPi (I)
Acyl-AMP + CoA → acyl-CoA + AMP (II)
In all previously documented ACS sequences, this second conserved domain was invariably located downstream of Motif I, with the conserved Arg ~260 residues from Motif I (see below). Therefore, to identify all human ACSs, we used both Motif II (Table 1, Fig. 1B) and the previously described longer sequences (11) in secondary and tertiary screens of human protein and DNA databases for candidate ACSs, as described in Materials and Methods.
Identification of new candidate human ACS genes and proteins
During the last several decades, many human and other mammalian ACS genes and their predicted protein products have been reported. These include enzymes capable of activating short-, medium-, long-, and very long-chain fatty acids and related acyl-containing compounds such as bile acids, bile acid precursors, and acetoacetate. All of these known ACSs were found to contain both Motif I and Motif II, and the relative positions of these domains within the coding sequences were as expected, with the conserved Arg of Motif II ~260 residues downstream of Motif I. We hypothesized that the human genome might encode additional proteins with ACS activity.
Our primary screen, which probed NCBI databases for the four most commonly encountered variants of Motif I, identified ~100 human proteins or protein fragments containing sequences with significant homology to Motif I of bona fide ACSs. Many of these proteins had previously been identified as ACSs, and many redundant sequences were present. However, several new candidate proteins were also detected. Secondary BLAST searches using the Motif II sequences from representative human ACSSs, ACSMs, ACSLs, and ACSVLs, as well as from ACSBG1, as query sequences did not identify additional candidate genes or proteins. However, subsequent BLAST searches using the full-length amino acid sequences of the new candidate proteins as queries identified additional proteins or protein fragments containing potential Motif I and/or Motif II sequences. A tertiary screen using either the Pattern/Peptide Match program or pattern-hit-initiated BLAST revealed no additional new sequences.
Using the above sequence analyses and related bioinformatics tools, we found a total of 26 ACS genes in the human genome (Tables 2, 3). Twenty of these genes encode proteins previously reported to have ACS activity in either humans or rodents, and six remain candidate ACS genes. The latter include genes provisionally designated ACSM2A, ACSS3, ACSM5, ACSF2, ACSF3, and ACSF4 (Tables 2, 3). [Concurrent with this work, we established that murine ACSF2 was enzymatically active (D. Maiguel, M. Morita, Z. Pei, M. L. Maguire, Z. Jia, and P. A. Watkins, unpublished observation).] The amino acid sequences of two of these candidate ACSs, ACSM2A and ACSM5, have been published because of their homology to ACSM3 (formerly known as SAH) (40, 41). However, the biological functions of ACSM2A and ACSM5 have not yet been characterized in any species.
ACSM2A and ACSM2B are distinct genes
The ACSM2A and ACSM2B genes and their encoded proteins are nearly identical and thus difficult to distinguish. The coding sequences of these genes are 98.8% identical, and their amino acid sequences are 97.6% identical. Given this similarity, one might infer that the observed sequence differences reflect polymorphisms or sequencing errors in a single gene. However, there is ample evidence supporting the existence of both genes. Both are located on chromosome 16p12.3, but whereas ACSM2A is on the plus strand, ACSM2B is on the minus strand (Table 3). Both nucleotide sequences are supported by genomic sequence data and by informative ESTs (Table 3). ACSM2A and ACSM2B have 20 nucleotide differences in the coding region (involving 19 codons), of which 14 are nonsynonymous and 6 are synonymous substitutions. Of the amino acid changes resulting from nonsynonymous substitutions, only one lies within a conserved motif: residue 463, found in Motif II, is Asn in ACSM2A and Asp in ACSM2B (Table 1). Although the 3' untranslated regions of ACSM2A and ACSM2B are also closely related (94.7% identity over 113 bp), the 5' untranslated regions of the two transcripts show more variability (59.6% identity over 146 bp). Despite these differences, distinguishing these genes/proteins experimentally (e.g., by Northern blot or Western blot) would be extremely difficult.
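Comparisons like the ACSM2A/ACSM2B tally can be reproduced from two aligned, gap-free coding sequences with the standard genetic code. The sketch below is an assumed method, not the authors' procedure: it counts nucleotide differences and percent identity, and classifies each differing codon (rather than each differing nucleotide) as synonymous or nonsynonymous, so a codon with two changes is counted once and the codon counts need not match a nucleotide-level tally exactly.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def compare_coding(seq_a: str, seq_b: str) -> dict:
    """Compare two aligned, gap-free coding sequences of equal length."""
    a, b = seq_a.upper(), seq_b.upper()
    if len(a) != len(b) or len(a) % 3:
        raise ValueError("expects two aligned coding sequences of equal length (multiple of 3)")
    nt_differences = sum(x != y for x, y in zip(a, b))
    synonymous = nonsynonymous = 0
    for i in range(0, len(a), 3):
        codon_a, codon_b = a[i:i + 3], b[i:i + 3]
        if codon_a != codon_b:
            if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
                synonymous += 1
            else:
                nonsynonymous += 1
    return {
        "nt_differences": nt_differences,
        "synonymous_codons": synonymous,
        "nonsynonymous_codons": nonsynonymous,
        "percent_identity": 100.0 * (1 - nt_differences / len(a)),
    }

# Hypothetical two-codon example: AAT (Asn) -> GAT (Asp) is nonsynonymous,
# GGA -> GGG (both Gly) is synonymous.
print(compare_coding("AATGGA", "GATGGG"))
```

The Asn-to-Asp change in the example mirrors the substitution at residue 463 described above; applied to full-length sequences, 20 differences over a coding region of roughly 1,700 nt gives the ~98.8% identity reported.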
ACS transcript variants
Multiple isoforms of seven human ACSs were detected by BLAST searching. Two variants each of human ACSS2, ACSM3, ACSL3, ACSL4, ACSL6, and ACSVL2 were found, along with three variants of ACSL5 (Table 3). The ACSL3 and ACSVL2 variants, along with two of three ACSL5 variants, differ only in their 5' untranslated regions and are expected to encode identical proteins. The ACSS2 variants are predicted, according to NCBI RefSeq annotation, to arise via an alternative splicing event whereby a different first exon (which includes the initiator methionine codon) is used. The protein encoded by ACSS2_v2 is shorter than that encoded by ACSS2_v1 by 50 amino acids at the N terminus.
The ACSL4 variants arise from the use of alternative versions of exon 3 (each containing an initiator methionine codon). ACSL4_v2 also has an additional exon (exon 4) not found in ACSL4_v1. The encoded proteins thus differ at their predicted N termini, with ACSL4_v2 containing 41 more amino acids than ACSL4_v1. The longer isoform is the predominant form in human brain (44). ACSL5_v1 is also longer than either v2 or v3 at the N terminus. Whereas exon 1 of ACSL5_v1 contains an in-frame ATG codon, exon 1 of the other two variants does not; according to NCBI RefSeq annotation, these variants use an alternative in-frame start codon found in exon 2.
ACSL6 transcript variants are distinct from the other ACS variants identified in that they differ not at their N or C termini but internally. The human ACSL6 gene contains two alternative versions of exon 11, which contains the sequence encoding the conserved domain referred to as Motif IV (see below). To date, the proteins encoded by rat ACSL6_v1 and ACSL6_v2 are the only variants whose enzymatic activity has been investigated in any species (45).
The NCBI database contains a reference sequence (NM_202000) for a variant of ACSM3 (ACSM3_v2) whose predicted protein product would lack Motif II and thus would not satisfy our criteria for a candidate ACS. This variant is predicted to arise via an alternative splicing event that results in a longer exon 9, which contains a stop codon. We predict that ACSM3_v2 would not be an enzymatically active ACS, but we include it in Table 3 for the sake of completeness.
Two of the newly identified human ACS candidates are enzymatically active
As proof of principle, we chose two candidate ACSs identified in this screen for functional studies. We cloned full-length cDNA encoding human ACSF2 and ACSF3 and expressed the proteins in COS-1 cells. We then examined the ability of ACSF2- or ACSF3-overexpressing cells to activate a representative medium-, long-, or very long-chain fatty acid. Compared with vector-transfected cells, ACSF2-expressing cells robustly activated the 8-carbon medium-chain fatty acid octanoate, but not the 16- or 24-carbon substrate (Table 4). In contrast, ACSF3-expressing cells showed a preference for lignoceric acid, a 24-carbon VLCFA (Table 4). Neither ACSF2 nor ACSF3 showed significant ability to activate the 16-carbon LCFA palmitate. Thus, both ACSF2 and ACSF3 are, as predicted, fatty acyl-CoA synthetases.
Pairwise alignments of full-length amino acid sequences of these proteins revealed that amino acid identities between subfamily members ranged from 29% to 96% [averaging 48 ± 13% (mean ± SD) for 44 alignments], whereas the identity of nonsubfamily pairs ranged from 15% to 27% (averaging 20 ± 2% for 281 alignments) (see supplementary Table I). Although the average percentage identity within subfamilies was higher for Motif II (71 ± 13% for 44 alignments) than for the full-length sequences, there was a broader range (39–97%). The average identity of nonsubfamily pairs (32 ± 7% for 281 alignments) was slightly higher than that for full-length sequences, and the values ranged from 14% to 53% (see supplementary Table II). The highest degree of intersubfamily Motif II identity was observed between the ACSS and ACSM proteins, for which the range was 42–53%.
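Summary statistics of this kind can be sketched as follows. The text does not state which denominator was used for percent identity; several conventions exist (aligned columns, shorter sequence length, or total alignment length), so this sketch assumes identity over aligned columns where neither sequence has a gap, and assumes the reported SD is the sample standard deviation. Function names are illustrative.

```python
import statistics

def percent_identity(a: str, b: str) -> float:
    """Percent identity of two aligned sequences over columns where neither has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same pairwise alignment")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    return 100.0 * sum(x == y for x, y in columns) / len(columns)

def summarize(values):
    """Mean, sample standard deviation, minimum, and maximum of a list of identities."""
    return statistics.mean(values), statistics.stdev(values), min(values), max(values)

# Hypothetical identities for three within-subfamily pairs.
print(summarize([40.0, 50.0, 60.0]))
```

Applied to each within-subfamily or between-subfamily pairwise alignment in turn, this yields figures in the "mean ± SD (range)" form used above.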
Phylogenetic analyses of human and nonhuman ACS sequences
We performed phylogenetic analyses to infer the evolutionary relationships of the human proteins. Twenty-six human proteins were multiply aligned with MUSCLE; alignments with ClustalW or MAFFT differed somewhat (as is expected given their varying strategies for optimizing alignments) but yielded comparable phylogenetic trees (data not shown). We obtained comparable results using the neighbor-joining distance-based algorithm as well as maximum parsimony, and we also obtained similar results for the relationships of a) 26 human paralogs or b) these 26 human paralogs analyzed together with 113 orthologs identified in five additional species: mouse, zebrafish, fruitfly, nematode, and yeast. A neighbor-joining tree with all 139 homologs is shown in Fig. 3. The 26 human proteins are indicated by arrows and labels with larger font size. We observed 10 major groups, including five clades corresponding to ACSS, ACSM, ACSL, ACSVL, and ACSBG proteins. We also observed five clades that we designated ACSF1, ACSF2, ACSF3, ACSF4, and worm/fly. We performed 1,000 bootstrap replicates to assess the robustness of each node and observed strong support for the topology of the tree.
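For readers unfamiliar with neighbor joining, the sketch below implements the standard algorithm on a small distance matrix (for example, Poisson-corrected amino acid distances) and returns an unrooted tree in Newick format. It is a from-scratch illustration, not MEGA4's implementation: ties are broken in favor of the first pair encountered, and negative branch lengths are reported as computed rather than reset to zero.

```python
def neighbor_joining(names, dist):
    """Unrooted neighbor-joining tree (Newick string) from a symmetric distance matrix.

    At each step the pair (i, j) minimizing Q(i, j) = (n - 2) d(i, j) - r(i) - r(j),
    where r is the row sum of d, is joined into a new node.
    """
    if len(names) < 3:
        raise ValueError("neighbor joining needs at least three taxa")
    labels = list(names)
    d = [list(row) for row in dist]
    while len(labels) > 3:
        n = len(labels)
        r = [sum(row) for row in d]
        best, bi, bj = None, 0, 1
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best:
                    best, bi, bj = q, i, j
        # Branch lengths from the joined pair to their new parent node.
        li = d[bi][bj] / 2 + (r[bi] - r[bj]) / (2 * (n - 2))
        lj = d[bi][bj] - li
        joined = f"({labels[bi]}:{li:.3f},{labels[bj]}:{lj:.3f})"
        keep = [k for k in range(n) if k not in (bi, bj)]
        # Distances from the new node to every remaining cluster.
        new_row = [(d[bi][k] + d[bj][k] - d[bi][bj]) / 2 for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[x]] for x, a in enumerate(keep)]
        d.append(new_row + [0.0])
        labels = [labels[k] for k in keep] + [joined]
    # Join the last three clusters at one central node.
    la = (d[0][1] + d[0][2] - d[1][2]) / 2
    lb = d[0][1] - la
    lc = d[0][2] - la
    return f"({labels[0]}:{la:.3f},{labels[1]}:{lb:.3f},{labels[2]}:{lc:.3f});"

# Distances that fit the tree ((A:1,B:2):2,(C:1,D:3)) exactly.
print(neighbor_joining(["A", "B", "C", "D"],
                       [[0, 3, 4, 6], [3, 0, 5, 7], [4, 5, 0, 4], [6, 7, 4, 0]]))
```

With tree-additive distances such as these, neighbor joining recovers the true topology and branch lengths, printing `(C:1.000,D:3.000,(A:1.000,B:2.000):2.000);`. For bootstrap support, the same procedure would be run on distances from each resampled alignment.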
We noted several features of the phylogenetic tree shown in Fig. 3. First, the worm/fly clade lacked human, mouse, fish, or yeast members and thus appears to represent a nonvertebrate, nonfungal expansion. The worm/fly clade consisted of two subgroups (one set of 11 Drosophila proteins forming a clade with 98% bootstrap support, and a second group of seven Drosophila and C. elegans proteins). Second, the medium-chain clade had six human members but no Drosophila or C. elegans members, and additional BLAST searching revealed no apparent medium-chain family members in other insect or nematode species. These species differences may reflect different metabolic requirements of these organisms. Five of the six human ACSM genes in the medium-chain clade are located on chromosome 16p12.2-13.11 and thus may have arisen by tandem duplication. Third, of the 10 clades we outlined in Fig. 3, most had an ancestral node with ≥95% bootstrap support, indicating a robust estimate of the topology. The ACSF4 clade included an ancestral node with >95% support, with a Drosophila protein (dm11) placed as an outgroup with less support. Also, the long-chain clade had 35 members (including the five human proteins ACSL1, ACSL3, ACSL4, ACSL5, and ACSL6). Of these proteins, a subgroup of 14 including ACSL3 and ACSL4 was evident. Nonetheless, all of these proteins share similar substrate specificity and thus were classified as a single subfamily. Finally, in addition to the 10 major clades, an outlier containing a single yeast protein (sc6) was noted. This protein, known as Pcs60p or Fat2p, is a peroxisomal protein predicted to have ACS activity (46). We previously reported that Fat2p belonged to an ACS subfamily that contains fungal, plant, and bacterial, but no mammalian, enzymes (11).
Structure-function correlations and identification of additional conserved ACS domains
At present, our knowledge of structure-function relationships among the various ACSs remains limited, particularly with respect to the mammalian enzymes. Mutagenesis experiments involving bacterial (38, 39, 47), yeast (48), and plant (49) ACSs and related proteins have identified several residues that are critical for enzyme activity. Crystal structures of yeast or bacterial acetyl-coenzyme A synthetases (ACSAcs; members of the ACSS subfamily) from the bacterium Salmonella enterica (12) and the yeast S. cerevisiae (14), and a putative bacterial ACSL from the extreme thermophile Thermus thermophilus (ttACS; a member of the ACSL subfamily) (13), have recently been described, allowing further predictions of functional residues. Not surprisingly, many of the amino acids identified as critical for enzymatic activity are those found in either Motif I or Motif II, as these are the most highly conserved residues. We hypothesized that examination of the human ACS sequences should permit the identification of additional conserved residues or domains that may be important for substrate binding, catalysis, enzyme regulation, or protein-protein interactions.
Hisanaga et al. (13) defined four structurally significant domains in ttACS that they referred to as the P-loop and the L-, A-, and G-motifs. The P-loop residues are the last 9 of 10 amino acids that constitute Motif I (Table 1), and the L-motif consists of 6 amino acids (432-DRLKDL-437) found within Motif II (T416 through E451) of ttACS (13). The A-motif of ttACS is a sequence of seven amino acids (323-GYGLTET-329) located between Motifs I and II. Examination of the human ACSs revealed that 22 of the 26 proteins contained a related sequence (consensus, YGXTE), herein referred to as Motif III (Table 1, Fig. 1C). Motif III was found 70–100 residues upstream of Motif II. Only the three members of the ACSS family and ACSF1 lacked Motif III. However, the related sequence [W,F]WQTE was found in a similar region of the ACSS proteins but not in ACSF1. The WWQTE motif was also found instead of the A-motif in S. enterica ACSAc (12). A nine amino acid G- (or gate) motif (226-VPMFHVNAW-234) is located just downstream of Motif I in ttACS (13). A conserved motif (Motif IV) homologous to the first five residues of the gate motif, with consensus LPLXH, was found in 15 human ACSs, including all ACSL, ACSVL, and ACSBG family members, as well as ACSF2 and ACSF3 (Table 1, Fig. 1D).
Mutation of a lysine residue (K592) near the C terminus of S. enterica propionyl-CoA synthetase (a member of the ACSS subfamily) to either alanine or glutamate prevented the conversion of propionate to propionyl-CoA (47). Interestingly, K592 was found to be essential for the formation of propionyl-AMP but not for the conversion of propionyl-AMP to propionyl-CoA. This residue corresponds to K609 of S. enterica ACSAc, which is regulated by acetylation/deacetylation (50). Acetylation effectively blocks the formation of acetyl-AMP (reaction I shown above) without affecting thioesterification to CoA (reaction II) (50). Deacetylation, catalyzed by the Sir2 protein in an NAD-dependent reaction, activates the enzyme by releasing this inhibition of reaction I (50). The motif containing the conserved lysine (the K immediately following G) in both S. enterica enzymes is PKTRSGKXXR (50). This motif, with consensus PKTX[S,T]GKIX[R,K], can be found in 13 human ACS sequences, including all ACSS and ACSM family members and ACSF1, ACSF2, ACSF3, and ACSF4 (Motif V) (Table 1, Fig. 1E). Although the sequence preceding the conserved K in the consensus motif is not found in members of the ACSL, ACSVL, or ACSBG families, the KXX[R,K] motif is present (Table 1). The KXX[R,K] motif is found in 24 of the human ACS sequences; in ACSF1 it is KXXE, and in ACSF4 it is KXXV.
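The consensus motifs described in the preceding paragraphs translate directly into regular expressions, with each X standing for any residue. The sketch below scans a sequence for Motifs III, IV, and V and for the shorter KXX[R,K] element. It is illustrative only: short motifs such as KXX[R,K] occur by chance in most proteins, so a regex hit is at best a candidate, whereas the motifs above were assigned from alignments and their positions relative to Motifs I and II.

```python
import re

# Consensus motifs from the text as regular expressions ("X" becomes ".").
MOTIFS = {
    "Motif III": r"YG.TE",
    "Motif III (ACSS-type)": r"[WF]WQTE",
    "Motif IV": r"LPL.H",
    "Motif V": r"PKT.[ST]GKI.[RK]",
    "KXX[R,K]": r"K..[RK]",
}

def scan(protein: str) -> dict:
    """0-based start positions of non-overlapping matches to each consensus motif."""
    seq = protein.upper()
    return {name: [m.start() for m in re.finditer(pattern, seq)]
            for name, pattern in MOTIFS.items()}

# The ttACS A-motif (GYGLTET) contains a Motif III match starting at position 1.
print(scan("GYGLTET")["Motif III"])
```

Applied to the ttACS A-motif quoted earlier, the scan finds YGLTE, consistent with Motif III being related to the A-motif.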
DISCUSSION
The strategy used here to identify human ACS sequences using two conserved amino acid sequence motifs built upon previous work by us (11) and others (37–39, 49, 51–53). All enzymatically active ACSs, including ACSF2 and ACSF3 described in this work, contained both Motifs I and II. The locations of Motifs I and II, and the distance separating them, were similar in all of these sequences, and most human ACSs were similar in size. Two exceptions were ACSM3_v2 and ACSF4 (Table 3). Transcript variant 2 of ACSM3 was shorter than other ACSs (438 amino acids) and lacked Motif II. Although experimental proof is not yet available, we predict that ACSM3_v2 is devoid of ACS activity; it may, however, have other biological functions.
ACSF4 was identified as a candidate ACS gene in this report. Although the ACSF4 amino acid sequence contains structural features suggesting that it belongs to the ACS family (e.g., the relative positions of Motifs I and II with respect to the initiator methionine and with respect to each other), the protein is substantially larger than all other ACSs (1,098 amino acids) and had previously been identified as 2-aminoadipic 6-semialdehyde dehydrogenase (43). ACSF4 is homologous to the yeast enzyme LYS2, which is required for lysine biosynthesis in lower eukaryotes (54). In humans, in which lysine is an essential amino acid, this enzyme operates in the reverse direction and serves a catabolic function. In S. cerevisiae, LYS2 is activated by the phosphopantetheinylation of serine 880 in an ATP-dependent reaction catalyzed by LYS5; CoA is the donor of the phosphopantetheine group (55). A homologous residue, serine 589, is found in ACSF4. Thus, it is possible that conserved ACS motifs in ACSF4 serve a function other than fatty acid activation. Further experimentation is necessary to establish whether ACSF4 has ACS activity.
The arginine residue in the center of Motif II (Table 1) is essentially invariant, being present in ACSs from all species from archaea to humans (P. A. Watkins, unpublished observation). To the best of our knowledge, only one bona fide ACS, human ACSBG2, has a different residue (histidine) at this position; ACSBG2 sequences from chimpanzees, rhesus monkeys, dogs, rats, and mice all retain the conserved arginine. We previously reported that the arginine-to-histidine substitution in human ACSBG2 decreased the pH optimum of the enzyme (25), but the physiologic significance of this change remains obscure.
There was a reasonably good correlation between the phylogenetic placement of ACS sequences into families of structurally related proteins and the substrate preferences of these enzymes. Enzymes known to activate short-, medium-, long-, and very long-chain fatty acids were assigned to the ACSS, ACSM, ACSL, and ACSVL families, respectively. For a few enzymes, particularly ACSS3, ACSM5, and ACSM2A, the appropriateness of their placement remains speculative until confirmed experimentally. The two ACSBG family members have unique substrate specificities. Although ACSBG1 was thought to activate VLCFA substrates based on overexpression studies (11), subsequent investigation of the endogenous enzyme using RNA interference revealed a high specificity for palmitic acid (C16:0) (56). ACSBG2 preferentially activates oleic (C18:1) and linoleic (C18:2) acid substrates (25).
In a previous study, we identified an ACS subfamily that contained Fat2p, a putative ACS from S. cerevisiae (46). This subfamily contained proteins from Schizosaccharomyces pombe, Mycobacterium tuberculosis, and Arabidopsis thaliana, but no human or other mammalian homologs were identified (11). Interestingly, Fat2p (designated sc6 and located between the worm/fly clade and the ACSF2 clade in Fig. 3) also appears to have no zebrafish, fruitfly, or worm homologs. Enzymatic activity of Fat2p has not yet been verified.
Knowing the amino acid sequences of all human ACSs facilitated the evaluation of conserved domains. This knowledge should enhance our understanding of structure-function relationships in these enzymes. Motif I (Table 1, Fig. 1A) includes the P-loop described by Hisanaga et al. (13). Often referred to as the AMP binding domain, the P-loop is found in close proximity to the adenosine moiety and helps maintain the substrate in the proper orientation. Mutagenesis of several residues within Motif I in the Escherichia coli ACSL, FadD, resulted in decreased enzyme activity (38). Mutation of the first Motif I residue in FadD, Y213, nearly abolished activity, whereas mutations in residues 2, 4, 5, and 10 (T214, G216, T217, and K222) led to reduced catalytic efficiency. A similar result was found when the homologous lysine (K248) of S. enterica propionyl-CoA synthetase, a member of the ACSS family, was mutated (47). Mutagenesis of Motif I residues 1 and 5 (Y256 and T260) of S. cerevisiae Fat1p, a member of the ACSVL family, produced only a mild reduction in enzyme activity, whereas a mutation in residue 3 (S258) had a more severe effect on activity (48).
The Motif II sequence (Table 1, Fig. 1B) contains the L-motif (432-DRLKDL-437) that in ttACS acts as a linker between the large N-terminal domain and the smaller C-terminal domain (13). The linker region is thought to be critical for catalytic function, as it facilitates a conformational change upon ATP binding that permits subsequent binding of the fatty acyl and/or CoA substrates. This hypothesis was reinforced by examination of the crystal structures of ACSAc from both yeast and bacteria, in which the "hinge" residue was identified as an aspartate residue (corresponding to the underlined D in the L-motif) (12, 14). This aspartate residue is conserved in 18 human ACS sequences. Because of the variability at this position in the other eight sequences (E, E, H, H, G, N, S, and V), this residue was not included in the Motif II consensus. However, if the conformational change predicted by the three available crystal structures is applicable to all ACSs, a hinge amino acid is likely to be critical for enzyme activity.
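One way to see why a residue conserved in most, but not all, sequences might be left out of a consensus is a simple frequency cutoff, sketched below. The threshold rule and the function name are hypothetical and are not the criterion used to build Table 1; the alignment column simply encodes the tally given above (18 aspartates plus E, E, H, H, G, N, S, and V among 26 human sequences).

```python
from collections import Counter


def consensus_residue(column: str, threshold: float) -> str:
    """Most common residue in an alignment column if its frequency reaches
    `threshold`, else 'X'. Hypothetical rule, for illustration only."""
    residue, count = Counter(column).most_common(1)[0]
    return residue if count / len(column) >= threshold else "X"


# Tally from the text: aspartate in 18 human ACSs; E, E, H, H, G, N, S, V in the rest.
hinge_column = "D" * 18 + "EEHHGNSV"
print(len(hinge_column), round(18 / len(hinge_column), 3))  # 26 0.692
print(consensus_residue(hinge_column, 0.9))  # X (a strict cutoff leaves the position open)
print(consensus_residue(hinge_column, 0.5))  # D (a lenient cutoff would keep it)
```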
The signature motif, identified by Black and coworkers (39) in a group of enzymes from diverse species belonging to the ACSL family, overlaps with the first 20 residues of Motif II and contains the first four amino acids of the linker motif. Mutagenesis of several Motif II residues in FadD, including amino acids 1 and 3 (T436 and D438), and the highly conserved arginine (R453) significantly decreased catalytic function (39). Similarly, mutagenesis of the corresponding aspartate (D508) and arginine (R523) of Fat1p was deleterious for fatty acid activation (48). Interestingly, mutations in two Fat1p Motif II residues (Y519 and S536) that are not well conserved between the different ACS families and thus are not part of the consensus also decreased catalytic function. A lysine found six residues downstream of the conserved arginine was critical for the activity of the related plant enzyme, coumarate-CoA ligase (49), and was proposed to participate catalytically in ttACS (13). Although this lysine can be found in the five human ACSL proteins and in ACSF4, it is not conserved among the other 20 human ACSs.
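The residue numbering in the two preceding paragraphs follows a simple convention: position n of a motif whose first residue is numbered s corresponds to residue s + n - 1. The small sketch below (function names hypothetical) applies it to the FadD and Fat1p residues cited above as a consistency check on the numbering.

```python
def residue_number(motif_start: int, position: int) -> int:
    """Residue number of the 1-based `position` in a motif starting at `motif_start`."""
    return motif_start + position - 1


def motif_position(motif_start: int, residue: int) -> int:
    """Inverse mapping: 1-based motif position of residue number `residue`."""
    return residue - motif_start + 1


# FadD Motif I begins at Y213; positions 1, 2, 4, 5 and 10 were mutated.
print([residue_number(213, p) for p in (1, 2, 4, 5, 10)])  # [213, 214, 216, 217, 222]

# Fat1p Motif I begins at Y256; positions 1, 3 and 5.
print([residue_number(256, p) for p in (1, 3, 5)])  # [256, 258, 260]

# FadD Motif II begins at T436: D438 is position 3 and R453 is position 18.
print(motif_position(436, 438), motif_position(436, 453))  # 3 18

# Fat1p D508 corresponds to FadD D438 (position 3), so its Motif II begins at 506
# and position 18 falls on residue 523, the conserved arginine cited above.
fat1p_motif2_start = 508 - 3 + 1
print(residue_number(fat1p_motif2_start, 18))  # 523
```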
Motif III (Table 1, Fig. 1C) was found in nearly all human ACSs and is part of the A- (or adenine) motif of ttACS (13). This region has been described as an ATP/AMP binding domain in other ACSs (38, 47, 57). Structural analysis of ttACS showed that Y324 was an adenine binding residue (13). Site-directed mutagenesis of the glutamate residue of Motif III in the E. coli ACSL, FadD (E361), abolished enzyme activity (38). The crystal structure of S. enterica ACSAc revealed that the conserved glutamate residue of Motif III is positioned near oxygen O1 of the AMP phosphate (12). The tryptophan residues in this loop, like the adenine binding Y324 of ttACS, are in proximity to the adenine ring, suggesting an essential role in substrate binding or stabilization.
Motif IV (Table 1, Fig. 1D) was found in 15 human ACS sequences. This motif comprises the first five residues of the nine-amino-acid G- (or gate) motif (226-VPMFHVNAW-234) of ttACS (13). From the crystal structure of ttACS, it was proposed that the indole ring of W234 acts as a gate that blocks the entry of fatty acids into the substrate binding tunnel until ATP binding induces a conformational change that swings the gate open (13). However, no human ACS sequence contained a tryptophan at the position corresponding to W234. Nevertheless, although no highly conserved sequences were identified, candidate gate tryptophan residues were found in the expected region of ACSS and ACSM family members as well as in ACSF1. Because ACSL, ACSVL, and ACSBG enzymes activate longer-chain fatty acid substrates, a corresponding gate residue may be located elsewhere in their structures.
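Spans written as start-SEQUENCE-end, such as 226-VPMFHVNAW-234 and 432-DRLKDL-437, can be expanded into residue numbers mechanically. The sketch below assumes only that notation; parse_span is a hypothetical helper that also checks that the stated end equals start + length - 1.

```python
def parse_span(span: str) -> dict[int, str]:
    """Expand 'start-RESIDUES-end' (e.g. '226-VPMFHVNAW-234') into
    {residue number: residue}, checking that end == start + len - 1."""
    start_text, residues, end_text = span.split("-")
    start, end = int(start_text), int(end_text)
    if start + len(residues) - 1 != end:
        raise ValueError(f"inconsistent span: {span}")
    return {start + i: aa for i, aa in enumerate(residues)}


g_motif = parse_span("226-VPMFHVNAW-234")
print(g_motif[234])                                  # W, the proposed gate residue
print("".join(g_motif[n] for n in range(226, 231)))  # VPMFH, the part shared with Motif IV

l_motif = parse_span("432-DRLKDL-437")
print([n for n, aa in l_motif.items() if aa == "D"])  # [432, 436], the two L-motif aspartates
```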
The conserved lysine residue of Motif V (Table 1, Fig. 1E), required for the catalytic activity of S. enterica propionyl-CoA synthetase (47), was recently found to be essential for the ACS activity of murine ACSF2 (D. Maiguel and P. A. Watkins, unpublished). In yeast ACSAc crystallized as a binary complex with AMP, the corresponding lysine residue was located near the catalytic site (14). In contrast, this amino acid was found on the surface of bacterial ACSAc crystallized as a ternary complex with propyl-AMP and CoA (12). These observations are consistent with the proposed role of this residue in the first half-reaction and with the subsequent large conformational change (rotation of 140°) in the C-terminal domain that may help create the CoA binding pocket. In ttACS crystallized as a complex with myristoyl-AMP, the homologous 524-KXXK-527 motif is part of a loop-helix that is also found on the surface of the protein near the C terminus (13). However, K527, not K524, was one of three residues proposed to stabilize the closed conformation of the protein (after ATP binding) by forming noncovalent interactions with residues of the L-motif (found within Motif II) and the N-terminal domain (13). Evidence that mammalian ACSS1 and ACSS2 activity is controlled by reversible acetylation of the conserved Motif V lysine residue was published recently, underscoring the importance of this domain (58, 59).
For several human ACS genes, alternative transcripts were identified. To date, only the two variants of ACSL6 have been investigated at the biochemical level. These variants, which differ in a 27 amino acid stretch encompassing Motif IV and the gate domain, differed primarily in their Km for ATP (45). Soupene and Kuypers (60) recently reported additional transcript variants of human ACSL family members, isolated primarily from erythroid cells by PCR. These authors found two additional variants of ACSL1 and three additional variants of ACSL6. We did not include these variants in Table 3, either because a) full-length sequences were not available or b) we could not find corroborating evidence in public databases to support their existence. One ACSL1 variant (a 373 amino acid fragment) contained an alternatively spliced exon encompassing Motif IV, closely resembling the situation with the ACSL6 variants. Although no supporting sequences were present in the nonredundant databases, one relevant human EST (from trachea) was found, suggesting that this represents a true variant. A fragment of a third ACSL1 variant (93 amino acids) was also reported, but no nonredundant or EST sequences were found to substantiate it. Two of the additional ACSL6 variants, designated ACSL6_v3 (622 amino acids) and ACSL6_v5 (712 amino acids), could represent full-length ACS sequences, whereas the third, ACSL6_v4, was a fragment of 115 amino acids. However, we were unable to detect unequivocal supporting evidence for any of these ACSL6 variants in nonredundant or EST databases. Further studies are thus needed to establish the validity of these ACSL variants.
Finally, to validate our ACS identification strategy, we demonstrated that two candidate human ACSs that had not been studied previously are indeed enzymatically active. For this, we chose two of the four ACS sequences that were evolutionarily divergent from established ACS families, namely ACSF2 and ACSF3. Because these putative ACSs were "orphans," we had no preconceived notions regarding their substrate chain-length preferences. ACSF2 preferred a medium-chain substrate, whereas ACSF3 preferred a very long-chain substrate. More studies of these enzymes are needed to establish their complete substrate profiles and their normal role(s) in lipid metabolism.
| ACKNOWLEDGMENTS |
|---|
Manuscript received February 9, 2007 and in revised form August 23, 2007.
| REFERENCES |
|---|