TY - JOUR
T1 - Alternative Relative Discrimination Criterion Feature Ranking Technique for Text Classification
AU - Alshalif, Sarah Abdulkarem
AU - Senan, Norhalina
AU - Saeed, Faisal
AU - Ghaban, Wad
AU - Ibrahim, Noraini
AU - Aamir, Muhammad
AU - Sharif, Wareesa
N1 - Publisher Copyright: © 2013 IEEE.
PY - 2023/7/12
Y1 - 2023/7/12
AB - The use of high-dimensional text data degrades classifier performance, so efficient feature selection (FS) is necessary to reduce dimensionality. In text classification, FS algorithms based on a ranking approach are employed to improve classification performance. To rank terms, most feature ranking algorithms, such as the Relative Discrimination Criterion (RDC) and the Improved Relative Discrimination Criterion (IRDC), use document frequency (DF) and term frequency (TF). Existing feature ranking algorithms take the actual TF values of a term, for both frequently and rarely occurring terms. However, these algorithms focus on the number of terms in a document rather than the number of terms in the category. In this research, an alternative to RDC, called the Alternative Relative Discrimination Criterion (ARDC), was proposed to improve the accuracy and effectiveness of RDC feature ranking. Specifically, ARDC is designed to identify terms that occur commonly in the positive class. The results were compared with the existing RDC methods, namely RDC and IRDC, and with standard benchmark methods such as Information Gain (IG), the Pearson Correlation Coefficient (PCC), and ReliefF. The experimental results reveal that the proposed ARDC provides better precision, recall, F-measure, and accuracy on the Reuters21578, 20newsgroup, and TDT2 datasets when used with well-known classifiers such as multinomial naïve Bayes (MNB), support vector machine (SVM), multilayer perceptron (MLP), k-nearest neighbor (KNN), and decision tree (DT). A further experiment was performed to validate the proposed technique and showcase the novelty of the ARDC approach. This experiment used the 20newsgroup dataset and employed the Relevant-Based Feature Ranking (RBFR) technique, with Naïve Bayes (NB), Random Forest (RF), and Logistic Regression (LR) classifiers, to demonstrate the effectiveness of the proposed ARDC.
KW - Dimensionality Reduction
KW - Text Classification
KW - Feature Selection
KW - Feature Ranking
KW - Relative Discrimination Criterion
UR - https://www.open-access.bcu.ac.uk/14612/
U2 - 10.1109/ACCESS.2023.3294563
DO - 10.1109/ACCESS.2023.3294563
M3 - Article
AN - SCOPUS:85164742088
SN - 2169-3536
VL - 11
SP - 71739
EP - 71755
JO - IEEE Access
JF - IEEE Access
ER -