The Importance of Stop Word Removal on Recall Values in Text Categorization



Given a data set and a learning task such as classification, there are two main motives for reducing the data set. First, reduction may improve the performance of the learning algorithm. Second, a smaller data set requires less storage space and less computing time. Our purpose is to determine the importance of several basic reduction techniques for Support Vector Machines by comparing the relative performance improvement each one yields on the standard REUTERS-21578 benchmark.
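As a rough illustration of the kind of reduction and measurement discussed above, the following minimal Python sketch removes stop words from tokenized documents, reports how much the vocabulary shrinks, and computes recall for one category. The stop list, the sample documents, and the function names are assumptions made for this example; they are not the stop list, data, or code used in the paper, and no classifier is trained here.

```python
# Hypothetical stop list for illustration only; not the list used in the paper.
STOP_WORDS = {"the", "a", "of", "to", "in", "and", "is", "on"}


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Return the tokens that are not stop words (case-insensitive match)."""
    return [t for t in tokens if t.lower() not in stop_words]


def vocabulary(docs):
    """Return the set of distinct lowercased tokens across all documents."""
    return {t.lower() for doc in docs for t in doc}


def recall(y_true, y_pred, positive=1):
    """Recall = true positives / (true positives + false negatives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy documents standing in for news stories (assumed, not REUTERS-21578 text).
docs = [
    "the price of crude oil is rising".split(),
    "wheat exports to the region fell".split(),
]
reduced = [remove_stop_words(d) for d in docs]

print(len(vocabulary(docs)), "terms before stop word removal")
print(len(vocabulary(reduced)), "terms after stop word removal")

# Recall for one category, given assumed true and predicted labels.
print(recall([1, 1, 1, 0], [1, 0, 1, 1]))
```

In an actual experiment the reduced documents would be turned into feature vectors and passed to a Support Vector Machine, and recall would be compared with and without each reduction step.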

Related Project

CATCH - Inductive Inference for Large Scale Data Bases Text CATegorization


IJCNN 2003, July 2003

Cited by

Year 2011 : 4 citations

 Ayral, H., Yavuz, S.
An automated domain specific stop word generation method for natural language text classification
(2011) INISTA 2011 - 2011 International Symposium on INnovations in Intelligent SysTems and Applications, art. no. 5946149, pp. 500-503.

 Yao, Z., Ze-Wen, C.
Research on the construction and filter method of stop-word list in text preprocessing
(2011) Proceedings - 4th International Conference on Intelligent Computation Technology and Automation, ICICTA 2011, 1, art. no. 5750595, pp. 217-221.

 Abdelmoneim, Dareen, "Semantic deontic modeling and text classification for supporting automated environmental compliance checking in construction", MSc Thesis, University of Illinois at Urbana-Champaign, USA, 2011

 Agrawal, N., "Auto complete using graph mining: A different approach", Proceedings of IEEE Southeastcon Conference, 2011, pp. 268-271, 17-20 March, 2011, doi:10.1109/SECON.2011.5752947

Year 2010 : 3 citations

 Sentiment text classification of customers reviews on the Web based on SVM, Huosong Xia; Min Tao; Yi Wang, Sixth International Conference on Natural Computation (ICNC), 2010, pp. 3633 - 3637.

 Characteristic pattern discovery in videos, Mihir Jain, C. V. Jawahar, ICVGIP '10 Proceedings of the Seventh Indian Conference on Computer Vision, Graphics and Image Processing, 2010.

 Ditch the Smileys: Customizing a Stopword List for Email-based Data, Dinesh Rathi, Michael B. Twidale, Canadian Association for Information Science Conference, 2010.

Year 2009 : 2 citations

 Evaluation of stop word list in Mongolian, Bao, Y., Yang, G., Jin, W., Journal of Information and Computational Science 6 (3), pp. 1139-1145, 2009.

 Prevention of Spyware by Runtime Classification of End User License Agreements, Muhammad Usman Rashid, MSc Thesis, Blekinge Institute of Technology, Sweden, 2009.

Year 2008 : 1 citation

 Automatic classifications of Malay proverbs using Naïve Bayesian Algorithm, Noah, S.A., Ismail, F., Information Technology Journal 7 (7), pp. 1016-1022, 2008.

Year 2007 : 2 citations

 CINDI robot: An intelligent web crawler based on multi-level inspection, Chen, R., Desai, B.C., Zhou, C., Proceedings of the International Database Engineering and Applications Symposium, IDEAS, art. no. 4318093, pp. 93-101, 2007.


Year 2005 : 1 citation

 Automatic selection of Chinese stoplist
Gu, Y.-J., Fan, X.-Z., Wang, J.-H., Wang, T., Huang, W.-J. (2005) Beijing Ligong Daxue Xuebao/Transaction of Beijing Institute of Technology 25 (4), pp. 337-340.