Academic research on machine-learning-based malware classification appears to leave very little room for improvement, boasting F1 scores of up to 0.99. Is the problem solved? In this talk, we argue that there is an endemic issue of inflated results due to two pervasive sources of experimental bias: spatial bias, caused by distributions of training and testing data that are not representative of a real-world deployment, and temporal bias, caused by incorrect splits of training and testing sets (e.g., in cross-validation) that lead to impossible configurations, such as training on samples that postdate the testing samples. To overcome this issue, we propose a set of space and time constraints for experiment design. Furthermore, we introduce a new metric that summarizes the performance of a classifier over time, i.e., its expected robustness in a real-world setting. Finally, we present an algorithm to tune the performance of a given classifier. We have implemented our solutions in TESSERACT, an open-source evaluation framework that allows a fair comparison of malware classifiers in a realistic setting. We used TESSERACT to evaluate two well-known malware classifiers from the literature on a dataset of 129K applications, demonstrating the distortion of results due to experimental bias and showcasing significant improvements from tuning.
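As a rough illustration of the ideas in the abstract, the Python sketch below shows a time-aware split, a check of the malware ratio in the testing set, and one simple way to summarize per-period scores over time. It is not the TESSERACT API: the function names, the toy data, and the choice of a normalized trapezoidal area as the summary are all assumptions made for this example.

```python
from datetime import date

# Hypothetical toy data: (first-seen date, label), with 1 = malware, 0 = goodware.
samples = [
    (date(2014, 1, 10), 0),
    (date(2014, 3, 5), 1),
    (date(2014, 6, 20), 0),
    (date(2015, 1, 15), 1),
    (date(2015, 2, 1), 0),
    (date(2015, 7, 9), 0),
]

def temporal_split(samples, split_date):
    """Return (train, test) so that every training sample predates split_date."""
    train = [s for s in samples if s[0] < split_date]
    test = [s for s in samples if s[0] >= split_date]
    return train, test

def malware_ratio(samples):
    """Fraction of samples labelled as malware (0.0 for an empty list)."""
    if not samples:
        return 0.0
    return sum(label for _, label in samples) / len(samples)

def area_under_time(scores):
    """Trapezoidal area under per-period scores, normalized to [0, 1].

    Assumed summary for illustration: a classifier that keeps a perfect
    score in every period gets 1.0; one that decays quickly gets less.
    """
    if len(scores) < 2:
        raise ValueError("need at least two time periods")
    pairs = zip(scores, scores[1:])
    return sum((a + b) / 2 for a, b in pairs) / (len(scores) - 1)

train, test = temporal_split(samples, date(2015, 1, 1))
```

Unlike a random split or k-fold cross-validation, `temporal_split` never lets the classifier train on samples from the "future" of the testing set. `malware_ratio(test)` can then be compared against the ratio expected in deployment, and `area_under_time` can be applied to a list of per-period scores (for example, monthly F1 values) to see how quickly performance decays.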
The main results of this talk are published in: Feargus Pendlebury, Fabio Pierazzi, Roberto Jordaney, Johannes Kinder, Lorenzo Cavallaro. TESSERACT: Eliminating Experimental Bias in Malware Classification across Space and Time. USENIX Security Symposium, 2019.
Fabio Pierazzi is currently a Lecturer (Assistant Professor) in Computer Science at King's College London, where he is also a member of the Cybersecurity (CYS) group. His research focuses on statistical methods for malware analysis and intrusion detection, with a particular emphasis on settings in which attackers adapt quickly to new defenses (i.e., high non-stationarity). He obtained his Ph.D. in Computer Science in 2017 from the University of Modena and Reggio Emilia, Italy, under the supervision of Prof. Michele Colajanni, and spent most of 2016 as a Visiting Researcher at the University of Maryland, College Park, USA, under the supervision of Prof. V.S. Subrahmanian. Between Oct 2017 and Sep 2019, before joining King's College London as a Lecturer, he was a Post-Doctoral Researcher in the Systems Security Research Lab (S2Lab), first at Royal Holloway, University of London and then at King's College London, under the supervision of Prof. Johannes Kinder and Prof. Lorenzo Cavallaro. Home page: https://fabio.pierazzi.com