TDDEHT: Threat Detection using Distributed Ensembles of Hoeffding Trees on Streaming Cyber Datasets
In the evolving world of technology, massive streams of diverse data are generated from disparate sources at an unprecedented rate. Recently, more advanced data stream mining (DSM) machine learning approaches have been proposed to process this growing volume of data efficiently. Most of these studies use a well-known state-of-the-art classifier, the Hoeffding Tree, and generally focus on improving accuracy when exceedingly complex concept drifts are present. However, only a few have explored the challenges that advanced DSM faces in anomaly-based network Intrusion Detection Systems (IDS), and those that do frequently validate against outdated cyber datasets, despite the common relation between anomalies and concept drift. In this paper, we propose an enhanced distributed Hoeffding Tree ensemble framework for intrusion detection, built on Spark Streaming. Our approach extends an existing ensemble-based machine learning approach by combining diverse Hoeffding Trees and producing evaluation metrics that identify the most efficient type of Hoeffding Tree for detecting cyber-attacks, while remaining extensible to additional linear classifiers. To demonstrate the accuracy of our approach, we evaluate it on various up-to-date real-world and synthetic cyber-attack and concept-drift datasets from reputable sources. Our experimental results demonstrate that our approach correctly identifies the most efficient classifiers while improving accuracy and supplemental evaluation metrics, using fewer resources and reducing processing time.
ETD Collection for Tennessee State University.