Methodology for detecting anomalies in cyber attack assessment data using Random Forest and Gradient Boosting in machine learning
https://doi.org/10.21822/2073-6185-2024-51-3-72-85
Abstract
Objective. The study applies machine learning models, specifically Random Forest and gradient boosting, to detect anomalies in network activity data and identify cyberattacks. The topic is relevant because cyberattacks are becoming increasingly complex and sophisticated. Developing effective methods for anomaly detection and protection against cyber threats is becoming a priority for organizations.
Method. The study uses two machine learning algorithms: Random Forest and gradient boosting. The process includes analyzing key metrics, visualizing the model's decisions, evaluating the performance of each model, and analyzing confusion (error) matrices for each attack category.
Result. The Random Forest model achieved an accuracy of about 94% using the 10 most important features. The resulting visualization shows how the model makes decisions based on feature values. The XGBoost gradient boosting model achieved high accuracy and reliable results. A per-category report describes the model's performance for each attack category.
Conclusion. The work presents a comprehensive analysis of a machine learning model designed to detect cyberattacks. It comprises several key steps and methods for evaluating the model's effectiveness, identifying important features, and analyzing its performance across different attack types.
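The Random Forest workflow summarized in the abstract (rank features by importance, keep the 10 most important, retrain, measure accuracy, then compare with gradient boosting) can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' code. The data are synthetic (`make_classification` with four hypothetical traffic classes) rather than the dataset used in the study, and the hyperparameters are arbitrary. scikit-learn's `GradientBoostingClassifier` stands in for XGBoost, and training it on all features is likewise an assumption. The printed accuracy will not reproduce the 94% reported in the study.

```python
# Minimal sketch (not the authors' code): Random Forest feature ranking,
# retraining on the 10 most important features, and a gradient-boosting
# comparison. Synthetic data replaces the study's network-traffic dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Hypothetical data: 4 classes (e.g. normal traffic plus 3 attack types)
# described by 30 features, only 8 of which are informative.
X, y = make_classification(
    n_samples=2000, n_features=30, n_informative=8,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0,
)

# 1. Fit a Random Forest on all features and rank them by impurity-based importance.
rf_full = RandomForestClassifier(n_estimators=200, random_state=0)
rf_full.fit(X_train, y_train)
top10 = rf_full.feature_importances_.argsort()[::-1][:10]  # indices, most important first

# 2. Retrain the forest on the 10 most important features only.
rf_top = RandomForestClassifier(n_estimators=200, random_state=0)
rf_top.fit(X_train[:, top10], y_train)
rf_acc = accuracy_score(y_test, rf_top.predict(X_test[:, top10]))

# 3. Gradient boosting on all features; GradientBoostingClassifier is a
#    stand-in for xgboost.XGBClassifier, which has the same fit/predict interface.
gb = GradientBoostingClassifier(random_state=0)
gb.fit(X_train, y_train)
gb_pred = gb.predict(X_test)
gb_acc = accuracy_score(y_test, gb_pred)

print("Top-10 feature indices:", top10.tolist())
print(f"Random Forest accuracy (top 10 features): {rf_acc:.3f}")
print(f"Gradient boosting accuracy (all features): {gb_acc:.3f}")
# Per-class precision, recall and F1 for the boosting model.
print(classification_report(y_test, gb_pred))
```

Impurity-based `feature_importances_` are computed on training data and can favor high-cardinality features, so `sklearn.inspection.permutation_importance` on held-out data is a common cross-check before selecting a top-k subset. To use XGBoost itself, `xgboost.XGBClassifier` can be substituted for `GradientBoostingClassifier`.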
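The per-category analysis of error matrices mentioned in the abstract can be illustrated without any libraries. The sketch below assumes the error matrix is a standard confusion matrix with true classes as rows and predicted classes as columns. It builds the matrix and derives per-class precision, recall and F1, which is the kind of information a per-category classification report contains. The category names `normal`, `dos` and `probe` and the label vectors are hypothetical.

```python
# Minimal sketch: confusion ("error") matrix and per-class metrics in pure Python.
# Rows are true classes, columns are predicted classes.

def confusion_matrix(y_true, y_pred, labels):
    """Count how often each true class was predicted as each class."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix


def per_class_report(matrix, labels):
    """Derive precision, recall, F1 and support for every class."""
    report = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp                 # class i missed: row total minus diagonal
        fp = sum(row[i] for row in matrix) - tp  # other classes predicted as i: column total minus diagonal
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = {"precision": precision, "recall": recall,
                         "f1": f1, "support": tp + fn}
    return report


# Hypothetical categories and predictions, for illustration only.
labels = ["normal", "dos", "probe"]
y_true = ["normal", "normal", "dos", "dos", "probe", "probe"]
y_pred = ["normal", "dos", "dos", "dos", "probe", "normal"]

matrix = confusion_matrix(y_true, y_pred, labels)
report = per_class_report(matrix, labels)
print(matrix)  # [[1, 1, 0], [0, 2, 0], [1, 0, 1]]
for label, m in report.items():
    print(f"{label:>6}: precision={m['precision']:.2f} recall={m['recall']:.2f} "
          f"f1={m['f1']:.2f} support={m['support']}")
```

Reading the matrix row by row shows which categories are confused with which. Here one `probe` sample is misclassified as `normal`, which lowers `probe` recall to 0.50 while its precision stays at 1.00.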
About the Authors
A. S. Kechedzhiev
Russian Federation
Alexander S. Kechedzhiev, Master's Student, Department of Computer Systems and Information Security
1 Gagarina Square, Rostov-on-Don, 344002
O. L. Tsvetkova
Russian Federation
Olga L. Tsvetkova, Cand. Sci. (Eng.), Assoc. Prof., Assoc. Prof., Department of Computer Systems and Information Security
1 Gagarina Square, Rostov-on-Don, 344002
A. I. Dubrovina
Russian Federation
Angelina I. Dubrovina, Assistant, Department of Computer Systems and Information Security
1 Gagarina Square, Rostov-on-Don, 344002
For citations:
Kechedzhiev A.S., Tsvetkova O.L., Dubrovina A.I. Methodology for detecting anomalies in cyber attack assessment data using Random Forest and Gradient Boosting in machine learning. Herald of Dagestan State Technical University. Technical Sciences. 2024;51(3):72-85. (In Russ.) https://doi.org/10.21822/2073-6185-2024-51-3-72-85