Submitted on December 6, 2005
Revised on January 17, 2006
Accepted on January 27, 2006
Prediction of the functional class of lipid-binding proteins from sequence derived properties irrespective of sequence similarity
Honghuang Lin, Lianyi Han, Hailei Zhang, Chanjuan Zheng, Bin Xie, and Yuzong Chen
Department of Computational Science, National University of Singapore, Singapore 117543
Corresponding Author: csccyz{at}nus.edu.sg
Lipid-binding proteins play important roles in signaling, gene regulation, membrane trafficking, immune response, and lipid metabolism and transport. Because these proteins are diverse in both function and sequence, methods that can predict them irrespective of sequence similarity are desirable. This work explores one such method, the statistical learning technique of support vector machines (SVM). A total of 14776 lipid-binding and 133441 non-lipid-binding proteins are used to develop the SVM prediction systems, which are evaluated on an independent set of 6768 lipid-binding and 64761 non-lipid-binding proteins. The computed prediction accuracy is 78.9%, 79.5%, 82.2%, 79.5%, 84.4%, 76.6%, 90.6%, 79.0%, and 89.9% for the lipid degradation, lipid metabolism, lipid synthesis, lipid transport, lipid-binding, lipopolysaccharide biosynthesis, lipoprotein, and lipoyl classes and for all lipid-binding proteins, respectively. The accuracy for the non-member proteins of each of these classes is 99.9%, 99.2%, 99.6%, 99.8%, 99.9%, 99.8%, 98.5%, 99.9%, and 97.0%, respectively. Of the 76 lipid-binding proteins that are non-homologous to any protein in the Swissprot database and not included in the SVM training sets, 86.8% are correctly predicted. These results suggest that SVM is useful for facilitating the prediction of lipid-binding proteins, particularly novel ones. Our software can be accessed at the SVMProt server http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi.
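The two sets of figures reported above, one for class members and one for non-members, can be read as the fraction of each group that is predicted correctly. The following is a minimal sketch of that arithmetic, assuming member accuracy is TP / (TP + FN) and non-member accuracy is TN / (TN + FP); the abstract does not state these formulas, and the function names are hypothetical. The count of 66 correct predictions is not given in the text; it is inferred here as the only whole number out of 76 that rounds to 86.8%.

```python
def percent_correct(correct: int, total: int) -> float:
    """Percentage of proteins in a group that were predicted correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


def member_accuracy(tp: int, fn: int) -> float:
    """Accuracy for class members: true positives over all members (assumed definition)."""
    return percent_correct(tp, tp + fn)


def non_member_accuracy(tn: int, fp: int) -> float:
    """Accuracy for non-members: true negatives over all non-members (assumed definition)."""
    return percent_correct(tn, tn + fp)


# Worked example: the 76 novel lipid-binding proteins from the abstract.
novel_total = 76
novel_correct = 66  # inferred from the reported 86.8%, not stated in the text
print(f"{member_accuracy(novel_correct, novel_total - novel_correct):.1f}%")  # prints 86.8%
```

Reporting the two figures separately matters because the non-member set is roughly ten times larger than the member set, so a single pooled accuracy would be dominated by the non-members.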