Detection of phishing portals through machine learning algorithms
https://doi.org/10.21822/2073-6185-2024-51-3-154-162
Abstract
Objective. To analyze and practically implement phishing portal detection using machine learning algorithms. Method. The study relies on systematizing disparate information, analyzing the field, and describing available developments. The work is divided into three parts. The first analyzes the concept of machine learning, describes the main ways to correctly interpret input data, and lists the most popular techniques and dataset sources. The second analyzes artificial neural networks: their varieties are presented with a description of implementation features, and they are compared with biological neurons. The third presents a practical implementation of two techniques, compares them, and gives recommendations on their use in detecting phishing portals. Result. The paper investigates methods for analyzing phishing portals. The analysis showed that a random forest is the most rational choice, because it achieves 98% on the precision, recall, and F1-score metrics with a significant number of input parameters. Conclusions. When implementing methods for detecting phishing portals, it is necessary to take into account that their effectiveness decreases depending on the input parameters, so preliminary tests are important. However, the test results can be interpreted in different ways. In particular, the effectiveness of the methods can be improved by limiting the number of input parameters while structuring them rigidly according to a single search criterion.
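For readers unfamiliar with the three metrics named in the abstract, the following is a minimal sketch of how precision, recall, and F1-score are computed for a binary phishing/legitimate classifier. It is not the author's code: the function name `precision_recall_f1` and the toy labels are hypothetical and chosen only for illustration, and the numbers do not reproduce the results reported in the paper.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for a single positive class."""
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy labels: 1 = phishing, 0 = legitimate (illustrative only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Precision measures how many portals flagged as phishing really are phishing, recall measures how many phishing portals were caught, and F1 is their harmonic mean, which is why the three are usually reported together when comparing classifiers.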
About the Author
E. A. Trushnikov
Russian Federation
Evgenij A. Trushnikov, Graduate student
32 Narodnogo Opolcheniya St., Moscow 123423
For citations:
Trushnikov E.A. Detection of phishing portals through machine learning algorithms. Herald of Dagestan State Technical University. Technical Sciences. 2024;51(3):154-162. (In Russ.) https://doi.org/10.21822/2073-6185-2024-51-3-154-162