A Support Vector Machine based Framework for Protein Membership Prediction



The support vector machine (SVM) is a key algorithm for learning from biological data, including tasks such as protein membership prediction. Predicting structural information for a protein from its sequence alone is possible, but the extreme complexity of the data demands specially designed string kernels, such as the state-of-the-art profile kernel, which exploits a very large feature space. Such a large representation, together with the enormous databases used in proteomics, increases processing time, which must be reduced to an acceptable level. Given the current computing paradigm, implementations of such systems must take advantage of parallelization and concurrency. In this paper, a machine learning architecture based on binary SVM models and a neural network is proposed to handle the very large multiclass problem of protein superfamily prediction. The architecture is parallelized through a multi-agent strategy built on the JADE (Java Agent DEvelopment Framework) software development framework to reduce the total processing time needed to obtain a prediction for a new query protein. The efficiency of the algorithm and the advantages of the parallelization are shown.


Kernel machines; Biomedical data mining


International Symposium on Computational Intelligence for Engineering Systems 2009, November 2009

