Tjada Nelson, Austin O’Brien and Cherie Noteboom, Dakota State University, South Dakota
Using a text mining and bibliometrics approach, this study reviews the literature on the evolution of malware classification using machine learning. It examines literature published from 2008 to 2022 to understand the impact of machine learning on malware classification. Throughout the study, we seek to answer three main research questions: RQ1: Is the application of machine learning for malware classification growing? RQ2: What is the most common machine learning application for malware classification? RQ3: What are the outcomes of the most common machine learning applications? An analysis of 2186 articles collected from peer-reviewed databases shows the trajectory of this technology's application to malware classification, as well as trends in both the machine learning and malware classification fields of study. The study performs quantitative and qualitative analysis, using statistical and N-gram analysis techniques and a formal literature review, to answer the proposed research questions. The research reveals that methods such as support vector machines and random forests are standard machine learning methods for malware classification, used to detect maliciousness or to categorize malware by family. Machine learning is a highly researched technology with many applications, in malware classification and beyond.
Malware, Malware Classification, Machine Learning.