# Combining Active Learning and Relevance Vector Machines for Text Classification

### Authors

### Abstract

Relevance Vector Machines (RVM) have proven successful in many learning tasks. However, they scale poorly to large applications. In many settings there is a large amount of unlabeled data that could be actively chosen by a learner and integrated into the learning procedure. The idea is to improve performance while reducing the cost of data categorization.

In this paper we propose an Active Learning RVM method based on the kernel trick. The underpinning idea is to define a working space between the Relevance Vectors (RV) initially obtained from a small labeled data set and the new unlabeled examples, from which the most informative instances are chosen. Using kernel distance metrics, such a space can be defined and the more informative examples can be added to the training set, increasing performance without significantly affecting the problem dimension. We detail the proposed method with illustrative examples on the Reuters-21578 benchmark. Results show performance improvement and scalability.
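The abstract's selection step can be illustrated with a small sketch. The paper's exact informativeness criterion is not given here, so the following is an assumption: it ranks unlabeled examples by their distance to the current relevance vectors, computed entirely in the kernel-induced feature space via the kernel trick, i.e. ||φ(x) − φ(rv)||² = k(x,x) − 2k(x,rv) + k(rv,rv). All function names (`select_informative`, `kernel_distances`) are hypothetical.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Pairwise RBF kernel matrix: k(x, y) = exp(-gamma * ||x - y||^2)
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_distances(X_unlabeled, relevance_vectors, gamma=1.0):
    # Squared distance in feature space via the kernel trick:
    # ||phi(x) - phi(rv)||^2 = k(x,x) - 2*k(x,rv) + k(rv,rv)
    Kxx = np.ones(len(X_unlabeled))        # k(x,x) = 1 for the RBF kernel
    Krr = np.ones(len(relevance_vectors))  # k(rv,rv) = 1 likewise
    Kxr = rbf_kernel(X_unlabeled, relevance_vectors, gamma)
    return Kxx[:, None] - 2.0 * Kxr + Krr[None, :]

def select_informative(X_unlabeled, relevance_vectors, n_select=5, gamma=1.0):
    # One plausible criterion: treat points far from every current
    # relevance vector as the most informative ones to label next.
    D = kernel_distances(X_unlabeled, relevance_vectors, gamma)
    dist_to_nearest_rv = D.min(axis=1)
    return np.argsort(dist_to_nearest_rv)[::-1][:n_select]
```

In an active-learning loop, the selected examples would be labeled by an oracle, appended to the training set, and the RVM retrained; because only a handful of points is added per round, the problem dimension stays small, matching the scalability claim in the abstract.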

### Subject

Active learning; Text classification; RVM

### Related Project

CATCH - Inductive Inference for Large Scale Data Bases Text CATegorization

### Conference

IEEE ICMLA 2007, December 2007

### Cited by

#### Year 2011 : 2 citations

Fokoué, E., Sun, D. - Fully Bayesian analysis of the relevance vector machine with an extended hierarchical prior structure. Statistical Methodology, 2011, Elsevier.

Fokoué, E., Goel, P. - An optimal experimental design perspective on radial basis function regression. Communications in Statistics - Theory and Methods, 40 (7), pp. 1184-1195, 2011.

#### Year 2010 : 1 citation

Fokoué, E. - An optimal experimental design perspective on radial basis function regression. John D. Hromi Center for Quality and Applied Statistics (KGCOE), 2010, ritdml.rit.edu.