Deep Behavioral Analysis of Machine Learning Algorithms Against Data Poisoning

Anum Paracha, Junaid Arshad, Mohamed Farah, Khalid N. Ismail

Research output: Contribution to journal › Article › peer-review

Abstract

Poisoning attacks represent one of the most common and practical adversarial attempts against machine learning systems. In this paper, we conduct a deep behavioural analysis of six machine learning (ML) algorithms, analyzing the impact of poisoning and the correlation between poisoning levels and classification accuracy. Adopting an empirical approach, we highlight the practical feasibility of data poisoning and comprehensively analyze the factors of each algorithm that are affected by poisoning. We used public datasets (UNSW-NB15, BotDroid, CTU13, and CIC-IDS-2017) and varying poisoning levels (5%-25%) to conduct a rigorous analysis across different settings. In particular, we analyzed the accuracy, precision, recall, F1-score, false positive rate, and ROC of the chosen algorithms. Further, we conducted a sensitivity analysis of each algorithm to understand the impact of poisoning on its performance and the characteristics underpinning its susceptibility to data poisoning attacks. Our analysis shows that, at 15% poisoning of the UNSW-NB15 dataset, the accuracy of Decision Tree (DT) decreases by 15.04% with an increase of 14.85% in false positive rate. Further, at 25% poisoning of the BotDroid dataset, the accuracy of K-nearest neighbours (KNN) decreases by 15.48%. On the other hand, Random Forest (RF) is comparatively more resilient against poisoned training data, with a decrease in accuracy of 8.5% at 15% poisoning of the UNSW-NB15 dataset and 5.2% for the BotDroid dataset. Our results highlight that poisoning 10%-15% of a dataset is the most effective rate, significantly disrupting classifiers without introducing overfitting, whereas 25% poisoning is readily detectable because of the severe performance degradation and overfitting it causes. Our analysis also helps understand how asymmetric features and noise affect the impact of data poisoning on machine learning classifiers. Our experimentation and analysis are publicly available at: https://github.com/AnumAtique/Behavioural-Analaysis-of-Poisoned-ML/
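To make the experimental setup in the abstract concrete, the following is a minimal sketch of a label-flipping poisoning experiment of the kind described: training labels are flipped at rates from 0% to 25% and classifier performance is measured on clean test data. It uses scikit-learn with a synthetic dataset as a stand-in (the intrusion detection datasets named above are not bundled with scikit-learn), and the poison_labels helper is illustrative, not the authors' released code; see the GitHub repository above for the actual experimentation.

```python
# Sketch of a label-flipping data poisoning experiment (assumptions noted above).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def poison_labels(y, rate, rng):
    """Flip the binary labels of a random `rate` fraction of training samples."""
    y = y.copy()
    n_flip = int(rate * len(y))
    idx = rng.choice(len(y), size=n_flip, replace=False)
    y[idx] = 1 - y[idx]  # flip 0 <-> 1
    return y

rng = np.random.default_rng(0)
# Synthetic stand-in for a network intrusion dataset such as UNSW-NB15.
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for rate in (0.0, 0.05, 0.10, 0.15, 0.20, 0.25):  # 0%-25% poisoning levels
    y_poisoned = poison_labels(y_tr, rate, rng)
    for name, clf in (("DT", DecisionTreeClassifier(random_state=0)),
                      ("RF", RandomForestClassifier(random_state=0))):
        clf.fit(X_tr, y_poisoned)          # train on poisoned labels
        y_pred = clf.predict(X_te)          # evaluate on clean test data
        tn, fp, fn, tp = confusion_matrix(y_te, y_pred).ravel()
        fpr = fp / (fp + tn)                # false positive rate
        print(f"{name} poison={rate:.0%}: "
              f"acc={accuracy_score(y_te, y_pred):.3f}, fpr={fpr:.3f}")
```

Running this sketch reproduces the qualitative pattern reported in the abstract: single-classifier models such as DT degrade faster under label flipping, while the ensemble averaging in RF dampens the effect of poisoned samples.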
Original language: English
Journal: International Journal of Information Security
Publication status: Published (VoR) - 25 Nov 2024
