Shorthand notation for lipid structures derived from mass spectrometry.

There is a need for a standardized, practical annotation for structures of lipid species derived from mass spectrometric approaches; i.e., for high-throughput data obtained from instruments operating in either high- or low-resolution modes. This proposal is based on common, officially accepted terms and builds upon the LIPID MAPS terminology. It aims to add defined levels of information below the LIPID MAPS nomenclature, as detailed chemical structures, including stereochemistry, are usually not automatically provided by mass spectrometric analysis. To this end, rules for lipid species annotation were developed that reflect the structural information derived from the analysis. For example, commonly used head group-specific analysis of glycerophospholipids (GP) by low-resolution instruments is neither capable of differentiating the fatty acids linked to the glycerol backbone nor able to define their bond type (ester, alkyl-, or alk-1-enyl-ether). This and other missing structural information is covered by the proposed shorthand notation presented here. Beyond GPs, we provide shorthand notation for fatty acids/acyls (FA), glycerolipids (GL), sphingolipids (SP), and sterols (ST). In summary, this defined shorthand nomenclature provides a standard methodology for reporting lipid species from mass spectrometric analysis and for constructing databases.

stereospecific numbering) in glycerolipids and glycerophospholipids categories has been carried out, the fatty acyl/alkyl position level is applicable. Fatty acyl/alkyl/sphingoid base structure level describes structural details of these components. Full structural analysis of the lipid species is covered by the LIPID MAPS nomenclature. The proposal presented here covers the major lipid classes of five of the eight LIPID MAPS categories, namely, fatty acyls (FA), glycerolipids (GL), glycerophospholipids (GP), sphingolipids (SP), and sterols (ST), with a focus on mammalian lipids. Other minor lipid classes or lipids from other organisms could be the subject of further proposals.

GENERAL RULES FOR SHORTHAND NOTATION
All presentations of lipid species data include an a priori statement on structural resolution attained by the method of MS analysis. It should be a requirement that lipids are the different levels of structural information provided by MS into account (Fig. 2). At the lowest level of resolution, the respective LIPID MAPS abbreviation (Table 1) is followed either by the detected nominal mass (lipid class level mass) or by the sum of components that are expressed as their total number of carbon atoms and of double bonds (lipid species level); variable components of the species are not identified. In the presence of fatty acids with odd-numbered carbon atoms, ambiguities regarding functional groups or bond types may occur. Therefore, such species are either assigned by their molecular mass or based on assumptions, which should be presented with the result.
Bond type level additionally describes the type of linkage of the variable components to the lipid species' backbone without knowing the single components. When MS resolves the variable components of the lipid species (in most cases fatty acids), the fatty acyl/alkyl level notation is applicable. Finally, when a specific analysis for backbone position (sn- for

Fig. 1. Ambiguities in interpretation of MS data. Examples for annotation of (A) phosphatidylcholine (PC) species and (B) phosphatidylethanolamine (PE) species and typical MS approaches to identify lipid species. *The annotation is based on the assumption that ester bonds are present. **The annotation is based on the assumption of even-numbered carbon chains only. ***Unambiguous identification of an alkyl bond is only possible in the case of a saturated alkyl chain. ****Unambiguous determination of an alk-1-enyl bond requires specific MS experiments; e.g., according to Zemski, Berry, and Murphy (7).

mass may identify functional groups. The following rules apply; examples are given in Table 2:
• Shorthand notation: FA number of C-atoms:number of double bonds.
• Functional groups, whose positions in the acyl chain are not known, are shown after the number of double bonds separated by an underscore and followed by the number of groups if more than one.
• Proven positions of functional groups are shown after the number of double bonds (each type of functional group inside a separate pair of parentheses). Positions according to Δ-nomenclature are stated in front of the functional groups that are separated by a comma if more than one.
• Double bond position is indicated by a number according to Δ-nomenclature (geometry unknown) or a number followed by geometry (Z for cis, E for trans).
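The basic fatty acyl rules above can be illustrated with a minimal Python sketch. The function name `fa_shorthand` and its `positions` argument are hypothetical, and the placement of double-bond positions in parentheses directly after the double-bond count is an assumption; the authoritative examples are those in Table 2. Functional groups are deliberately left out.

```python
from typing import Optional, Sequence, Tuple


def fa_shorthand(
    carbons: int,
    double_bonds: int,
    positions: Optional[Sequence[Tuple[int, str]]] = None,
) -> str:
    """Return 'FA C:DB', optionally with double-bond positions appended.

    positions is a hypothetical argument: (delta_position, geometry) pairs,
    where geometry is 'Z' (cis), 'E' (trans), or '' when it is unknown.
    """
    if carbons < 1 or double_bonds < 0:
        raise ValueError("carbons must be >= 1 and double_bonds >= 0")
    base = f"FA {carbons}:{double_bonds}"
    if not positions:
        # Only the mass-derived composition is known.
        return base
    if len(positions) != double_bonds:
        raise ValueError("one position is expected per double bond")
    labels = []
    for position, geometry in positions:
        if geometry not in ("", "Z", "E"):
            raise ValueError("geometry must be 'Z', 'E', or ''")
        labels.append(f"{position}{geometry}")
    # Assumption: positions go in parentheses right after the double-bond count.
    return f"{base}({','.join(labels)})"


print(fa_shorthand(18, 1))              # FA 18:1
print(fa_shorthand(18, 1, [(9, "Z")]))  # FA 18:1(9Z)
```

Passing an empty geometry string, as in `fa_shorthand(18, 1, [(9, "")])`, gives the position without geometry, which corresponds to the "geometry unknown" case.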

GLYCEROLIPIDS (GL) AND GLYCEROPHOSPHOLIPIDS (GP)
Lipidomic approaches frequently apply direct infusion tandem MS using a low mass resolution analyzer, such as a triple quadrupole mass spectrometer (4, 9, 10). In this way, lipid classes and species are identified by selective precursor and neutral-loss scans (6). A major problem of lipid class-specific scans is that bond type, i.e., ester or ether linkage to the glycerol backbone of constituent acyl-, alkyl-, and alk-1-enyl-chains, cannot be differentiated because these species are quasi-isobaric (Fig. 1).
Frequently, such data are annotated based on the assumption that ester bonds are present. To demonstrate this possibility of incorrect assignment, Fig. 1 includes definition of the lipid species level for glycerophospholipids; this could equally apply to glycerolipids and fatty acyls. An approach to resolve ester and ether bonds is the application of high-resolution MS (11). However, even in high-resolution MS, unsaturated O-alkyl groups cannot be differentiated from O-alk-1-enyl linked residues (Fig. 1B). Yet, specific MS methods exist to clearly identify O-alk-1-enyl linked residues having no or further double bonds in the alkyl-chain (plasmalogens) (7).
The following rules apply for shorthand notation of both the major classes of GL and GP categories; examples are given in Tables 3 and 4:
• Shorthand notation: lipid class abbreviation followed by number of C-atoms:number of double bonds.
• Fatty acids linked to the glycerol are known:
o Separator _: sn-position of the fatty acids is not known.
o Separator /: sn-position of fatty acids is proven (order sn-1/sn-2/sn-3 for GL; sn-1/sn-2 or sn-2/sn-3 for GP); no FA linked 0:0.

defined by both their class and their nominal mass (Da). The following rules apply to all lipid categories described below:
• Lipid class abbreviation heads each species description.
• Variable components (constituents), such as fatty acids, are assigned based on their mass as number of C-atoms and number of double bonds (C-atoms:double bonds).
• Only experimentally proven structural details of constituent fatty acids are assigned according to the rules defined for fatty acyls (see below).
• When structural ambiguities are present (e.g., bond type, hydroxyl groups, branched chains; see examples in Fig. 1 and Tables 2-7), species are assigned by one of the following rules:
o Lipid class and the (uncharged) molecular mass (Da) in parentheses (preferred for reporting in databases; lipid class level mass). For fatty acyl substituents, the mass of the corresponding free fatty acid is used (see example in Table 6).
o Annotation based on assumptions must be clearly visible (preferred for publications; lipid species level).
• Detailed structures including stereochemistry are covered by LIPID MAPS nomenclature.
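To make the hierarchy of annotation levels concrete, the following Python sketch builds the lipid species level (sum of carbons and double bonds) and the fatty acyl level (separator `_` when sn-positions are unknown, `/` when they are proven) from the same list of constituents. The function names and the `(carbons, double_bonds)` tuple representation are hypothetical choices made for illustration; bond-type prefixes and functional groups are not handled.

```python
from typing import List, Tuple

Chain = Tuple[int, int]  # (number of C-atoms, number of double bonds)


def species_level(lipid_class: str, chains: List[Chain]) -> str:
    """Lipid species level: class abbreviation plus summed C-atoms:double bonds."""
    carbons = sum(c for c, _ in chains)
    double_bonds = sum(d for _, d in chains)
    return f"{lipid_class} {carbons}:{double_bonds}"


def acyl_level(lipid_class: str, chains: List[Chain], positions_proven: bool = False) -> str:
    """Fatty acyl level: individual chains joined by '_' or, if sn-positions are proven, '/'."""
    separator = "/" if positions_proven else "_"
    return f"{lipid_class} " + separator.join(f"{c}:{d}" for c, d in chains)


chains = [(16, 0), (18, 1)]
print(species_level("PC", chains))                      # PC 34:1
print(acyl_level("PC", chains))                         # PC 16:0_18:1
print(acyl_level("PC", chains, positions_proven=True))  # PC 16:0/18:1
```

With proven positions, the order of the list is taken as the sn-order, and an empty position is passed as `(0, 0)`, which yields the `0:0` placeholder described for "no FA linked".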

FATTY ACYLS (FA)
Fatty acids in free form and as variable fatty acyls in lipids are prevalent lipid structures. Therefore, we use fatty acids and acyls as a paradigm for application of the shorthand notation. For the sake of simplicity, we include some frequent functional groups of fatty acids but do not treat complex FA classes, such as eicosanoids and docosanoids (8). When an annotation of the fatty acid is only based on the mass (low mass resolution), usually it is assumed that a straight-chain fatty acid with no functional groups is present beyond double bond(s). High mass resolution with accurate

• Four ether bonds are indicated in front of the bond type as tetra, abbreviated e.
• Rules for phosphoinositides require additional information if the positions of phosphates on the inositol ring are known (see Table 4). The exception is PI3,4,5P3, as the only known tris isomer is 3,4,5. While phosphoinositides are generally presented as PIP₂ and PIP₃ in Tables 1 and 4, we adopted the annotation PIP2 and PIP3 for ease of handling by databases. Should additional species be identified, this will require further clarification.
• Bond types other than ester bonds are indicated as follows in front of the sum of C-atoms or fatty acid:
o O = proven O-alkyl-bond (it is important to note that "O" after the number of carbons designates a keto bond; see previous section)
o P = proven O-alk-1-enyl-bond (acid-sensitive ether bond in "plasmalogens").
• More than one "non"-ester bond is indicated in front of the bond type as d for di, t for tri.
Additional rules for glycerophospholipids (GP):
• Lysophospholipid classes are abbreviated as stated in the LIPID MAPS nomenclature (Table 1). Where applicable, they can be presented formally by their respective phospholipid class indicating the empty sn-position by 0:0.

of hydroxyl groups is known, it is shown in front of the number of carbons (hydroxyl group level, e for tetra).
• For further characterization of N-linked fatty acids, rules as described in an earlier section apply. A fatty acid that is ester-bound to an N-linked -OH fatty acid is shown in square brackets as [FA C-atoms:double bonds].
• Shorthand notation for sugar moieties is stated in Table 1 .
This proposal, however, does not cover complex glycosphingolipids, which we suggest could be subject to a separate proposal.
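For the simpler sphingolipids that the notation does cover, the backbone rule (a letter for the number of hydroxyl groups of the sphingoid base, then the base, a slash, and the N-linked fatty acid) can be illustrated with a small Python sketch. The function names, the tuple representation, and the default of two hydroxyl groups (the mammalian assumption mentioned in the text) are choices made here for illustration only; positions of hydroxyl groups and double bonds are not handled.

```python
from typing import Optional, Tuple

Chain = Tuple[int, int]  # (number of carbons, number of double bonds)

# Prefix letters for the number of hydroxyl groups in the sphingoid base.
HYDROXYL_PREFIX = {1: "m", 2: "d", 3: "t"}


def _prefix(hydroxyls: int) -> str:
    if hydroxyls not in HYDROXYL_PREFIX:
        raise ValueError("this sketch only handles 1, 2, or 3 hydroxyl groups")
    return HYDROXYL_PREFIX[hydroxyls]


def sp_shorthand(lipid_class: str, base: Chain, n_acyl: Chain, hydroxyls: int = 2) -> str:
    """Sphingoid base and N-linked fatty acid known: e.g. 'Cer d18:1/16:0'."""
    return (
        f"{lipid_class} {_prefix(hydroxyls)}{base[0]}:{base[1]}"
        f"/{n_acyl[0]}:{n_acyl[1]}"
    )


def sp_sum_shorthand(
    lipid_class: str, base: Chain, n_acyl: Chain, hydroxyls: Optional[int] = None
) -> str:
    """Sphingoid base not resolved: sum of base and fatty acid.

    The hydroxyl prefix is added only when the number of hydroxyl groups is known.
    """
    carbons = base[0] + n_acyl[0]
    double_bonds = base[1] + n_acyl[1]
    prefix = _prefix(hydroxyls) if hydroxyls is not None else ""
    return f"{lipid_class} {prefix}{carbons}:{double_bonds}"


print(sp_shorthand("Cer", (18, 1), (16, 0)))                  # Cer d18:1/16:0
print(sp_sum_shorthand("SM", (18, 1), (16, 0)))               # SM 34:1
print(sp_sum_shorthand("SM", (18, 1), (16, 0), hydroxyls=2))  # SM d34:1
```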

STEROLS (ST)
We use the term sterol to embrace all molecules based on the cyclopentanoperhydrophenanthrene skeleton. All natural mammalian sterols are derived from cholesterol or its precursors, although plant sterols can also be a source. The stereochemistry of the cholesterol molecule is maintained to a large extent by mammalian sterols, which all contain at least one alcohol or oxo group attached to carbon 3. Thus, at the lipid species level, we assume the sterol has at least one alcohol group. High-resolution MS with accurate mass may identify other functional groups, as will precursor ion and neutral loss scans. Stereochemistry can often be defined by comparing the chromatographic retention time to authentic standards and, in some cases, by MS/MS. The following rules for shorthand nomenclature have been adopted in the examples given in Table 7:
• Shorthand notation: ST number of carbon atoms:number of double bonds.

SPHINGOLIPIDS (SP)
Several MS methods for sphingolipid species analysis use fragments resulting from the sphingoid base (12), but the commonly used precursor ion scan of m/z 184 for SM analysis is not able to differentiate between the N-linked fatty acid and sphingoid base (13), although there is a more time-consuming procedure that can provide this information (12). In this case, lipid species level annotation could be based on the assumption of the major sphingoid backbone in the respective organism; e.g., sphingosine (d18:1) in mammals or phytosphingosine (t18:0) in yeast. This assumption must be indicated a priori. High-resolution MS allows identification of the number of hydroxyl groups in sphingolipids together with the sum of carbons and double bonds in the sphingoid base and N-linked fatty acid (14).
The following rules apply; examples are given in Tables 5 and 6:
• The sphingoid backbone is annotated by the number of hydroxyl groups in the sphingoid base (m for mono, d for di, t for tri) and separated by a slash from the number of carbons:number of double bonds of the N-linked fatty acid. Positions of hydroxyl groups and double bonds including geometry are indicated as described for fatty acyls (FA).
• If the sphingoid base is not known, the sum of sphingoid base and fatty acid is shown as number of carbons:number of double bonds. Calculations are based on the number of hydroxyl groups of the major sphingoid base for that organism (dihydroxy in mammals). When the number

can be used (Table 1). CE is followed by number of carbons:number of double bonds of the fatty acid esterified to the hydroxyl group at position 3 (Table 7).
• In the case of unproven structures and other sterol esters (SE), the shorthand notation is used as above, followed by slash number of carbons:number of double bonds of the fatty acid esterified to the hydroxyl group (Table 7).
• In the case of bile acids, the shorthand notation is prefaced by A to indicate an acid, and at the structure level, the location of the acid group (CO₂H) is indicated.
• Precursor-ion scans reveal the presence of conjugating groups, i.e., taurine (T) or glycine (G), conjugated to the carboxylic acid group of bile acids through an amide bond, sulfuric acid (S) conjugated to a hydroxyl group through

• Annotation at the lipid species level is based on natural sterols possessing 18, 19, 21, 24, or 27 carbons and at least one hydroxyl group at position 3.
• Functional groups, including all hydroxyl groups, are shown after the number of double bonds, separated by an underscore and followed by the number of groups if more than one.
• Proven positions of functional groups are shown after the number of double bonds, separated by a slash. Specific stereochemistry of functional groups is shown in square brackets. (R) and (S) configurations are preferred for side-chain stereochemistry and are given in italics in parentheses.
• In the case of fully proven structures of cholesterol and cholesteryl esters, abbreviations FC and CE, respectively,

different labs, its interpretation, and more importantly, the storage of the information in LIMS systems or lipid-centric public databases. The existence of such resources would also assist bioinformaticians in developing novel tools and/or approaches required to perform analyses using integrative data-mining approaches. At the moment, such studies are extremely cumbersome. We have experienced this problem in the context of the EU LipidomicNet project (http://www.lipidomicnet.org), in which more than 20 different European groups are involved. This problem triggered the concept to generate the presented standard nomenclature system. As an outcome of the LipidomicNet project, a new resource called LipidHome (http://www.ebi.ac.uk/apweiler-srv/lipidhome) has been developed specifically to accommodate data from high-throughput MS-based lipidomics approaches (15). LipidHome is a database of theoretical lipid species and not only uses this nomenclature but also organizes the lipids into the same hierarchy. This proposal should be considered as an important further step building upon and contributing to the work of the ILCNC. Currently, the proposal covers only the major lipid category/classes of mammalian lipids. Future proposals could add minor lipid classes or lipid classes of other organisms.
an ester bond, glucuronic acid (GlcA), N-acetylglucosamine (GlcNAc), and hexose (Hex) sugars assumed to be linked to a hydroxyl group through an acetal linkage.
• If full stereochemistry is known, the abbreviations given in Table 1 can be used.

DISCUSSION AND CONCLUSIONS
The presented shorthand notation for MS-derived lipid structures provides a system to report MS lipidomic data in a standardized way. The system presents two options to assign species in the presence of ambiguities, e.g., when bond type or functional groups may not be resolved by the analysis. Annotation may be either based on assumptions or according to the molecular mass. When using assumptions, these should be based on current biological knowledge and should be made clearly visible. Self-explanatory annotations based on assumptions are advantageous for publications compared with annotation of molecular mass, which should be reserved for reporting in databases. However, it is desirable and, indeed, must be the long-term aim that assumptions are not used.
Establishment of a standardized approach with which to report results from high-throughput MS lipid experiments would facilitate the interchange of information between