Methodology for detecting anomalies in cyber attack assessment data using Random Forest and Gradient Boosting in machine learning

A. S. Kechedzhiev; O. L. Tsvetkova; A. I. Dubrovina

doi:10.21822/2073-6185-2024-51-3-72-85

Abstract

Objective. The study aims to detect anomalies in network activity data, and thereby cyberattacks, using two machine learning models: random forest and gradient boosting. The topic is relevant because cyberattacks are becoming increasingly complex and sophisticated, which makes effective anomaly detection and protection against cyber threats a priority for organizations. Method. Two machine learning algorithms are used: Random Forest and gradient boosting. The process includes analyzing key metrics, visualizing model decisions, evaluating the performance of each model, and analyzing confusion matrices for the attack categories. Result. The Random Forest model reached an accuracy of about 94% when trained on the 10 most important features. A graph of the model gives insight into how it makes decisions based on those features. The XGBoost gradient boosting model achieved high accuracy and reliable results, and a report describes its performance for each category. Conclusion. The work is a comprehensive analysis of machine learning models designed to detect cyberattacks. It covers several key steps and methods that make it possible to evaluate model effectiveness, identify important features, and analyze performance across different attack types.
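
The workflow outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the data are synthetic (generated with scikit-learn's `make_classification`), the three class labels are hypothetical stand-ins for "normal" traffic and two attack categories, and scikit-learn's `GradientBoostingClassifier` is used in place of XGBoost so that the example depends on a single library. The accuracy it prints has no relation to the 94% reported above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for network traffic records (assumption, not the paper's data):
# class 0 = normal, classes 1 and 2 = two hypothetical attack categories.
X, y = make_classification(
    n_samples=1500, n_features=30, n_informative=8, n_redundant=4,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Step 1: fit a Random Forest on all features and rank them by importance.
rf_full = RandomForestClassifier(n_estimators=100, random_state=0)
rf_full.fit(X_train, y_train)
top10 = np.argsort(rf_full.feature_importances_)[::-1][:10]

# Step 2: refit a Random Forest on the 10 most important features only.
rf_top = RandomForestClassifier(n_estimators=100, random_state=0)
rf_top.fit(X_train[:, top10], y_train)
rf_pred = rf_top.predict(X_test[:, top10])
rf_acc = accuracy_score(y_test, rf_pred)

# Step 3: gradient boosting on all features
# (scikit-learn implementation, swapped in for XGBoost).
gb = GradientBoostingClassifier(n_estimators=50, random_state=0)
gb.fit(X_train, y_train)
gb_pred = gb.predict(X_test)
gb_acc = accuracy_score(y_test, gb_pred)

# Step 4: per-category evaluation with a confusion matrix and a report.
gb_cm = confusion_matrix(y_test, gb_pred)
print("Top-10 feature indices:", top10.tolist())
print("Random Forest accuracy (top-10 features):", round(rf_acc, 3))
print("Gradient boosting accuracy:", round(gb_acc, 3))
print("Gradient boosting confusion matrix:\n", gb_cm)
print(classification_report(y_test, gb_pred))
```

Each row of the confusion matrix corresponds to a true class and each column to a predicted class, so off-diagonal cells show which categories the model confuses with one another. Refitting on the top-ranked features is one simple way to test whether a reduced feature set preserves accuracy; the abstract does not state how the authors selected their 10 features.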

Keywords

data anomaly, machine learning, Random Forest algorithm, gradient boosting model

About the Authors

A. S. Kechedzhiev

Don State Technical University
Russian Federation

Alexander S. Kechedzhiev, Master's Student, Department of Computer Systems and Information Security

1 Gagarina Square, Rostov-on-Don, 344002

O. L. Tsvetkova

Don State Technical University
Russian Federation

Olga L. Tsvetkova, Cand. Sci. (Eng.), Assoc. Prof., Department of Computer Systems and Information Security

1 Gagarina Square, Rostov-on-Don, 344002

A. I. Dubrovina

Don State Technical University
Russian Federation

Angelina I. Dubrovina, Assistant, Department of Computer Systems and Information Security

1 Gagarina Square, Rostov-on-Don, 344002

For citations:

Kechedzhiev A.S., Tsvetkova O.L., Dubrovina A.I. Methodology for detecting anomalies in cyber attack assessment data using Random Forest and Gradient Boosting in machine learning. Herald of Dagestan State Technical University. Technical Sciences. 2024;51(3):72-85. (In Russ.) https://doi.org/10.21822/2073-6185-2024-51-3-72-85

This work is licensed under a Creative Commons Attribution 4.0 License.

ISSN 2073-6185 (Print)
ISSN 2542-095X (Online)
