A Support Vector Machine based Framework for Protein Membership Prediction



The support vector machine (SVM) is a key algorithm for learning from biological data, including tasks such as protein membership prediction. Predicting structural information for a protein from its sequence alone is possible, but the extreme complexity of the data demands specially designed string kernels, such as the state-of-the-art profile kernel, which exploits a very large feature space. Such a large representation, together with the enormous databases used in proteomics, increases processing time, which must be reduced to an acceptable level. Given the current computing paradigm, implementations of such systems must take advantage of parallelization and concurrency. This paper proposes a machine learning architecture based on binary SVM models and a neural network to handle the very large multiclass problem of protein superfamily prediction. The architecture is parallelized through a multi-agent strategy built on the JADE (Java Agent DEvelopment Framework) software framework, which reduces the total processing time needed to obtain a prediction for a new query protein. The efficiency of the algorithm and the advantages of the parallelization are shown.
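
The architecture described in the abstract (one binary model per superfamily, evaluated concurrently, with the outputs merged into a single multiclass decision) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes linear decision functions in place of profile-kernel SVMs, a thread pool in place of JADE agents, and a softmax in place of the trained neural network combiner. All names (`BINARY_MODELS`, `score_all`, `combine`, `predict`) and all weights are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import math

# Hypothetical stand-ins for trained binary SVM models: one linear decision
# function (weights, bias) per superfamily. A real system would evaluate a
# string kernel (e.g. the profile kernel) against support vectors instead.
BINARY_MODELS = {
    "superfamily_A": ([1.0, -0.5, 0.0], -0.1),
    "superfamily_B": ([-0.3, 0.8, 0.2], 0.0),
    "superfamily_C": ([0.0, 0.1, 0.9], -0.2),
}


def decision_value(model, features):
    """Signed score of one binary model for a query feature vector."""
    weights, bias = model
    return sum(w * x for w, x in zip(weights, features)) + bias


def score_all(features, models=BINARY_MODELS, max_workers=3):
    """Evaluate every binary model concurrently.

    Each submitted task plays the role that a worker agent would play in a
    multi-agent setup; the thread pool is only an assumption made here to
    keep the sketch self-contained.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            name: pool.submit(decision_value, model, features)
            for name, model in models.items()
        }
        return {name: future.result() for name, future in futures.items()}


def combine(scores):
    """Merge binary scores into class probabilities.

    A softmax is used as a placeholder for the neural network combiner.
    """
    peak = max(scores.values())
    exps = {name: math.exp(s - peak) for name, s in scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


def predict(features):
    """Return the most probable superfamily and all class probabilities."""
    probs = combine(score_all(features))
    return max(probs, key=probs.get), probs


if __name__ == "__main__":
    label, probs = predict([0.2, 0.1, 1.0])
    print(label, probs)
```

The sketch only shows how the work is divided: the binary evaluations are independent of one another, so they can be distributed, and only the final combination step needs all the scores. Python threads do not give real CPU parallelism for this kind of arithmetic, so no speed-up should be inferred from it.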


Kernel machines; Biomedical data mining


International Symposium on Computational Intelligence for Engineering Systems 2009, November 2009

Cited by

Year 2016: 1 citation

 Natural vs. random protein sequences: Discovering combinatorics properties on amino acid words
D Santoni, G Felici, D Vergni - Journal of theoretical biology, 2016 - Elsevier
Abstract Casual mutations and natural selection have driven the evolution of protein amino
acid sequences that we observe at present in nature. The question about which is the
dominant force of proteins evolution is still lacking of an unambiguous answer. Casual ...

Year 2015: 1 citation

 Machine Learning for Adaptive Many-Core Machines-A Practical Approach
N Lopes, B Ribeiro - 2015 - Springer
Today the increasing complexity, performance requirements and cost of current (and future)
applications in society is transversal to a wide range of activities, from science to business
and industry. In particular, this is a fundamental issue in the Machine Learning (ML) area, ...

Year 2013: 1 citation

 Computational intelligence techniques in bioinformatics
AE Hassanien, ET Al-Shammari, NI Ghali - Computational biology and …, 2013 - Elsevier
Abstract Computational intelligence (CI) is a well-established paradigm with current systems
having many of the characteristics of biological computers and capable of performing a
variety of tasks that are difficult to do using conventional techniques. It is a methodology ...