Feature Selection: An Important Issue in Text Categorization

Text categorization is the problem of assigning a document to one or more predefined classes. Feature selection is one of the important issues in text categorization. A wide variety of feature selection methods exist for text categorization, such as Information Gain (IG), Document Frequency (DF), Term Strength (TS), and Mutual Information (MI). Feature selection methods can improve both the efficiency and the performance of text categorization. This paper reports a controlled study of a large number of feature selection techniques for text classification. We also discuss some variations and combinations of these feature selection methods.
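To make two of the methods named above concrete, here is a minimal Python sketch of Document Frequency and Information Gain scoring. It is an illustration under stated assumptions, not the procedure used in the paper: the toy corpus, the function names (`document_frequency`, `information_gain`, `select_top_k`), and the choice to treat each term as simply present or absent in a document are all hypothetical. Information Gain is computed with the standard definition IG(t) = H(C) - P(t) H(C|t) - P(not t) H(C|not t).

```python
import math
from collections import Counter


def document_frequency(docs):
    """Count, for each term, how many documents contain it at least once."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # set() so a term counts once per document
    return df


def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """IG(t) = H(C) - P(t) * H(C | t present) - P(not t) * H(C | t absent)."""
    classes = sorted(set(labels))
    with_term = Counter()
    without_term = Counter()
    for tokens, label in zip(docs, labels):
        if term in set(tokens):
            with_term[label] += 1
        else:
            without_term[label] += 1

    n = len(docs)
    n_with = sum(with_term.values())
    n_without = n - n_with

    h_classes = entropy([with_term[c] + without_term[c] for c in classes])
    h_with = entropy([with_term[c] for c in classes])
    h_without = entropy([without_term[c] for c in classes])
    return h_classes - (n_with / n) * h_with - (n_without / n) * h_without


def select_top_k(docs, labels, k):
    """Return the k terms with the highest information gain (ties broken alphabetically)."""
    vocabulary = sorted({t for tokens in docs for t in tokens})
    scored = [(information_gain(docs, labels, t), t) for t in vocabulary]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored[:k]]


# Hypothetical toy corpus: tokenized documents and their class labels.
DOCS = [
    ["goal", "match", "team"],
    ["team", "score", "goal"],
    ["stock", "market", "price"],
    ["market", "trade", "team"],
]
LABELS = ["sport", "sport", "finance", "finance"]

if __name__ == "__main__":
    print(document_frequency(DOCS).most_common(3))
    print(select_top_k(DOCS, LABELS, 2))
```

In this toy corpus, "team" has the highest document frequency but separates the two classes poorly, while "goal" and "market" each occur in only one class and so receive a higher information gain. This illustrates why the two criteria can rank terms differently.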

Refer to the attachment for the full article.