Comparison of Multiple Feature Selection Techniques for Machine Learning-Based Detection of IoT Attacks

The Internet of Things (IoT) has become increasingly practical in applications such as smart homes, autonomous vehicles, and environmental monitoring. However, this rapid expansion has led to significant cybersecurity threats. Detecting these threats is critical, and while machine learning techniques are valuable, they struggle with high-dimensional data. Feature selection helps by reducing computational costs while maintaining model generalization. Choosing the most effective feature selection method is therefore crucial. This research addresses that question by testing five feature selection methods on the CIC-IoT 2023 dataset: Random Forest (RF), Recursive Feature Elimination (RFE), Logistic Regression (LR), XGBoost Regression (XGBoost), and Information Gain (IG). It evaluates these methods in combination with five machine learning models: Decision Tree (DT), Random Forest (RF), k-Nearest Neighbors (k-NN), Gradient Boosting (GB), and Multi-layer Perceptron (MLP), using metrics such as accuracy, precision, recall, and F1-score across three datasets. The results show that RFE, especially with the RF model, achieves the highest accuracy (99.57%) with 30 features. RF is the most stable, with accuracy from 83% to 99.56%. Additionally, the 5-feature scheme is best for implementing IDS on resource-limited IoT devices, with RFE paired with the k-NN model being the optimal combination.
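
As a rough illustration of one of the five feature selection methods named in the abstract, the sketch below ranks discrete features by Information Gain and keeps the top k. It is not the authors' pipeline: the toy rows, the feature names (`protocol`, `syn_flag`, `size_bucket`), and the helper functions are hypothetical, and the data is not taken from CIC-IoT 2023.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for one discrete feature X."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_k_features(rows, labels, names, k):
    """Rank features by information gain and keep the k highest."""
    scores = {
        name: information_gain([row[i] for row in rows], labels)
        for i, name in enumerate(names)
    }
    ranked = sorted(names, key=lambda n: scores[n], reverse=True)
    return ranked[:k], scores


# Hypothetical toy flows (assumption, for illustration only).
NAMES = ["protocol", "syn_flag", "size_bucket"]
ROWS = [
    ("tcp", 1, "small"),
    ("tcp", 1, "large"),
    ("udp", 0, "small"),
    ("tcp", 0, "large"),
    ("udp", 0, "large"),
    ("tcp", 1, "small"),
]
LABELS = ["attack", "attack", "benign", "benign", "benign", "attack"]

selected, scores = top_k_features(ROWS, LABELS, NAMES, k=2)
print(selected)  # features kept for the downstream classifier
```

In this toy data `syn_flag` separates the two classes perfectly, so it receives the highest score. A classifier such as k-NN would then be trained only on the selected columns.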
Keywords

IoT, Anomaly Detection, IDS, Machine Learning, Feature Selection

Citation

ARES '24: Proceedings of the 19th International Conference on Availability, Reliability and Security. 2024, p. 1-10.
https://dl.acm.org/doi/10.1145/3664476.3670440

Document type

Peer-reviewed

Document version

Published version

Language of document

en

DOI

10.1145/3664476.3670440

URI

http://hdl.handle.net/11012/250054

Collections

Ústav telekomunikací

Creative Commons license

Except where otherwise noted, this item's license is described as Creative Commons Attribution 4.0 International
