Machine Learning for Adaptive Multi-Core Machines



Today, the increasing complexity, performance requirements and cost of current (and future) applications cut across a wide range of activities, from science to industry. The scale of the data from Web growth and advances in sensor data collection technology have been rapidly increasing the magnitude and complexity of the tasks that Machine Learning (ML) algorithms have to solve. This growth is driving the need to extend the applicability of existing ML algorithms to larger datasets and to devise parallel algorithms that scale well with the volume of data or, in other words, can handle “Big Data”. In this Thesis, we partly contribute to solving this problem by making use of two complementary components: a body of novel ML algorithms and a set of high-performance ML parallel implementations for adaptive multi-core machines.
In the first component, we present a new adaptive step size technique that enhances the convergence of Restricted Boltzmann Machines (RBMs), thereby effectively decreasing the training time of Deep Belief Networks (DBNs). We also propose a novel Semi-Supervised Non-Negative Matrix Factorization (SSNMF) algorithm, which aims at extracting the most discriminating characteristics of each class while substantially reducing the overall time required for generating the models. In addition, we present a novel Incremental Hypersphere Classifier (IHC) with built-in multi-class support, which is able to accommodate memory and computational restrictions while providing good classification performance. This highly scalable algorithm can update models and classify new data in real time, as well as handle concept drift scenarios. Moreover, since it keeps the samples that are near the decision frontier while removing noisy and less relevant ones, it can select a representative subset of the data, to which more sophisticated algorithms can then be applied in a fraction of the time required for the complete dataset. A learning framework (IHC-SVM), encompassing the IHC and Support Vector Machine (SVM) algorithms, is validated in a real-world case study of protein membership prediction. Overall, the resulting system was able to outperform the baseline SVM (with an F-measure of 96.39%) using only a subset of the data (ca. 50%) and demonstrated its capacity to deal with the everyday dynamic changes of real-world biological databases. In another direction, and motivated by the need to deal with the missing data that often occurs in large-scale datasets, a novel solution, designated the Neural Selective Input Model (NSIM), is proposed. The method empowers Neural Networks (NNs) with the ability to handle Missing Values (MVs); it outperforms single imputation techniques while offering classification performance better than or similar to that of state-of-the-art multiple imputation methods. With the new methodology we have successfully addressed a real-world case study of bankruptcy prediction in a large dataset of French companies, with results (F-measure of 95.70%) that are superior to previous approaches.
The backbone of the second component of this Thesis is a Graphics Processing Unit (GPU) computational framework, named GPU Machine Learning Library (GPUMLib), which aims to provide the building blocks for developing high-performance GPU parallel ML software, promote cooperation within the field and contribute to the development of innovative applications. The rationale consists of taking advantage of the GPU's high-throughput parallel architecture to expand the scalability of supervised, semi-supervised and unsupervised ML algorithms. Since its release, GPUMLib, now with over 2,000 downloads, has benefited researchers worldwide. New GPU parallel implementations of the Back-Propagation (BP) and Multiple Back-Propagation (MBP) supervised algorithms, integrating the NSIM, are presented, providing significant speedups (up to 180×). In particular, these implementations played an important role in the detection of Ventricular Arrhythmias (VAs) (with a sensitivity of 98.07%), improving on previous work by reducing the computation time from weeks to hours. Along the same line, an Autonomous Training System (ATS) is designed to automatically find high-quality solutions on the GPU. On the unsupervised side, a GPU parallel implementation of the Contrastive Divergence (CD-k) algorithm, which considerably boosts the training speed of RBMs and DBNs, is presented, achieving speedups of up to 46×. Additionally, new GPU parallel implementations of the Non-Negative Matrix Factorization (NMF) algorithm are presented, yielding speedups of up to 706×. Both unsupervised implementations are tested on benchmarks and on real datasets. Overall, this Thesis contributes to the use of adaptive multi-core machines for exploring “Big Data”, which, we hope, will have a positive impact in solving otherwise intractable ML problems.
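
For readers unfamiliar with NMF, the factorization mentioned in the abstract can be illustrated with a minimal CPU sketch. This is not the GPUMLib implementation and not necessarily the variant used in the Thesis. It assumes the standard multiplicative update rules for the Euclidean (Frobenius) cost, and the function name `nmf_multiplicative` and its parameters are hypothetical, chosen only for this illustration.

```python
import numpy as np


def nmf_multiplicative(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative matrix V (n x m) as W @ H.

    Illustrative sketch only: standard multiplicative updates for the
    Frobenius-norm cost, assumed here; not the Thesis's GPU code.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    # Strictly positive random initialisation keeps the factors non-negative.
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    errors = []
    for _ in range(n_iter):
        # Element-wise updates; eps guards against division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        # Track the reconstruction error ||V - WH||_F after each iteration.
        errors.append(np.linalg.norm(V - W @ H))
    return W, H, errors


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Synthetic non-negative data with an exact rank-5 factorization.
    V = rng.random((20, 5)) @ rng.random((5, 30))
    W, H, errors = nmf_multiplicative(V, rank=5)
    print("first error:", errors[0], "last error:", errors[-1])
```

Each update consists of dense matrix products followed by element-wise operations, which is the kind of workload that maps naturally onto a high-throughput parallel architecture such as a GPU.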


Machine Learning, GPU computing

PhD Thesis

Machine Learning for Adaptive Multi-Core Machines, January 2014
