Dimension Reduction and Classifier-Based Feature Selection for Oversampled Gene Expression Data and Cancer Classification

Olutomilayo Olayemi Petinrin (Corresponding / Lead Author), Faisal Saeed (Corresponding / Lead Author), Naomie Salim (Corresponding / Lead Author), Muhammad Toseef (Corresponding / Lead Author), Zhe Liu (Corresponding / Lead Author), Ibukun Omotayo Muyide (Corresponding / Lead Author)

    Research output: Contribution to journal, Article, peer-review

    1 Citation (SciVal)

    Abstract

    Gene expression data typically have a large number of features, some of which are irrelevant or redundant. In some cases, however, all features, despite being numerous, are important and contribute to the data analysis. Gene expression data also often have few instances and a high rate of imbalance among the classes. This can limit the exposure of a classification model to instances of the different classes, thereby affecting the performance of the model. In this study, we propose a cancer detection approach that combines data preprocessing (oversampling and feature selection) with classification models. We used SVMSMOTE to oversample the six examined datasets. We then examined different techniques for feature selection, using dimension reduction methods and classifier-based feature ranking and selection. We trained six machine learning algorithms using repeated 5-fold cross-validation on different microarray datasets. The performance of the algorithms differed depending on the dataset and the feature reduction technique used.
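The repeated 5-fold cross-validation mentioned in the abstract can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' implementation: the function name `repeated_stratified_kfold`, the choice of three repeats, the stratification by class, and the toy labels are all assumptions made here, since the abstract does not give those details. In practice, scikit-learn's `RepeatedStratifiedKFold` offers equivalent splitting.

```python
import random
from collections import defaultdict


def repeated_stratified_kfold(labels, n_splits=5, n_repeats=3, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) tuples.

    Hypothetical helper: stratification and n_repeats=3 are assumptions,
    not details taken from the abstract.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    for repeat in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        offset = 0
        for label in sorted(by_class):
            members = by_class[label][:]
            rng.shuffle(members)  # fresh shuffle on every repeat
            # Deal class members round-robin so each fold keeps the class ratio.
            for pos, idx in enumerate(members):
                folds[(pos + offset) % n_splits].append(idx)
            offset += len(members)

        for fold, test_idx in enumerate(folds):
            test = sorted(test_idx)
            held_out = set(test)
            train = [i for i in range(len(labels)) if i not in held_out]
            yield repeat, fold, train, test


# Hypothetical imbalanced labels: 40 "tumor" samples and 10 "normal" samples.
labels = ["tumor"] * 40 + ["normal"] * 10
splits = list(repeated_stratified_kfold(labels, n_splits=5, n_repeats=3))
```

Each of the resulting splits holds out one fifth of the samples while preserving the 4:1 class ratio of the toy labels. Whether oversampling and feature selection were applied before splitting or inside each training fold is not stated in the abstract, so the sketch covers only the splitting step.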
    Original language: English
    Article number: 1940
    Pages (from-to): 1
    Number of pages: 13
    Journal: Processes
    Volume: 11
    Issue number: 7
    DOIs
    Publication status: Published (VoR) - 27 Jun 2023

    Funding

    The authors would like to thank the Research Management Center at Universiti Teknologi Malaysia for funding this research using (Vot No: Q.J130000.21A6.00P48) and the Ministry of Higher Education, Malaysia (JPT(BKPI)1000/016/018/25(58)) through the Malaysia Big Data Research Excellence Consortium (BiDaREC) (Vot No: R.J130000.7851.4L933), (Vot No: R.J130000.7851.5F568), (Vot No: R.J130000.7851.4L942), (Vot No: R.J130000.7851.4L938), and (Vot No: R.J130000.7851.4L936). We are also grateful to (Project No: KHAS-KKP/2021/FTMK/C00003) and (Project No: KKP002-2021) for their financial support of this research.

    Funders: Funder number
    JPT: R.J130000.7851.4L938, R.J130000.7851.4L936, KKP002-2021, R.J130000.7851.4L933, KHAS-KKP/2021/FTMK/C00003, R.J130000.7851.4L942, R.J130000.7851.5F568, 1000/016/018/25
    Ministry of Higher Education, Malaysia
    Universiti Teknologi Malaysia
    Research Management Centre, Universiti Teknologi Malaysia: Q.J130000.21A6.00P48

      Keywords

      • cancer classification
      • gene expression
      • machine learning
      • microarray data
      • sampling methods
