Scalable network traffic analysis on cloud computing platform

Sabah Mohammed Alzahrani, Tennessee State University


Understanding and quantifying network performance usually requires analyzing a large volume of network traffic. Current network analysis tools do not scale well to terabytes or petabytes of traffic. Distributed computing platforms such as Spark facilitate the analysis of large volumes of data; however, there is no seamless method for analyzing vast volumes of network traffic. In this study, a network traffic analysis framework was developed on the Amazon cloud computing environment. Different network scenarios were created in CloudSim, and the generated network traffic was analyzed using scalable clustering machine learning techniques. The proposed system has two major subsystems: (i) data collection, which generates network traffic corresponding to different network topologies; and (ii) data analysis and distributed processing, in which Amazon EC2 was used to run the Spark program with different numbers of machine cores. The model was implemented with Spark MLlib and used three different clustering algorithms. The scalable K-means++ (K-means||) clustering algorithm was selected for testing the system based on its speed and scalability. It was faster than both K-means and the Gaussian mixture model (GMM): for 150 million lines of records, the analysis time for K-means|| was 30.10% less than for K-means and 75.18% less than for GMM. These findings allow this technology to be applied to more complex problems with vast network traffic and large network topologies.
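The two reported percentages can be read as relative slowdown factors. The following is a minimal sketch, assuming each percentage is measured against the slower algorithm's running time (the abstract does not give the raw timings, so this reading is an assumption); the function and variable names are hypothetical and used only for illustration.

```python
def slowdown_factor(percent_less: float) -> float:
    """Return t_baseline / t_fast, given that t_fast is `percent_less`
    percent below t_baseline (assumed reading of the abstract's figures)."""
    if not 0.0 <= percent_less < 100.0:
        raise ValueError("percent_less must be in [0, 100)")
    return 1.0 / (1.0 - percent_less / 100.0)


# Percentages quoted in the abstract for the 150-million-line data set.
kmeans_factor = slowdown_factor(30.10)  # K-means relative to the scalable variant
gmm_factor = slowdown_factor(75.18)     # GMM relative to the scalable variant

print(f"K-means took about {kmeans_factor:.2f}x the scalable K-means++ time")
print(f"GMM took about {gmm_factor:.2f}x the scalable K-means++ time")
```

Under that assumption, K-means took roughly 1.43 times and GMM roughly 4.03 times as long as the scalable K-means++ run.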

Subject Area

Computer Engineering|Electrical engineering|Computer science

Recommended Citation

Sabah Mohammed Alzahrani, "Scalable network traffic analysis on cloud computing platform" (2015). ETD Collection for Tennessee State University. Paper AAI1599443.