research interests
My PhD research focused on designing machine learning models to identify cyber attack patterns and to predict imminent cyber threats in order to alleviate their impact. Towards this aim, much of my work is centered on (i) analyzing technical data acquired by network sensors such network telescopes and honeypots, and (ii) finding the relationship between these technical data on the one hand, and social, political and economic events on the other hand.
I am currently interested in artificial intelligence with a particular attention to machine learning, deep learning, reinforcement learning, computer perception and natural language processing.
Journal Publications
Mehdi Zakroum, Jérôme François, Mounir Ghogho, Isabelle Chrisment, "Self-Supervised Latent Representations of Network Flows and Application to Darknet Traffic Classification," IEEE Access, March, 2023.
Abstract. Characterizing network flows is essential for security operators to enhance their awareness about cyber-threats targeting their networks. The automation of network flow characterization with machine learning has received much attention in recent years. To this aim, raw network flows need to be transformed into structured and exploitable data. In this research work, we propose a method to encode raw network flows into robust latent representations exploitable in downstream tasks. First, raw network flows are transformed into graph-structured objects capturing their topological aspects (packet-wise transitional patterns) and features (used protocols, packets’ flags, etc.). Then, using self-supervised learning techniques like Graph Auto-Encoders and Anonymous Walk Embeddings, each network flow graph is encoded into a latent representation that encapsulates both the structure of the graph and the features of its nodes, while minimizing information loss. This results in semantically-rich and robust representation vectors which can be manipulated by machine learning algorithms to perform downstream network-related tasks. To evaluate our network flow embedding models, we use probing flows captured with two /20 network telescopes and labeled using reports originating from different sources. The experimental results show that the proposed network flow embedding method allows for reliable darknet probing activity classification. Furthermore, a comparison between our self-supervised approach and a fully-supervised graph convolutional network shows that, in situations with limited labeled data, the downstream classification model that uses the derived latent representations as inputs outperforms the fully-supervised graph convolutional network. There are many applications of this research work in cybersecurity, such as network flow clustering, attack detection and prediction, malware detection, vulnerability exploit analysis, and inference of attacker’s intentions.
Mehdi Zakroum, Jérôme François, Isabelle Chrisment, Mounir Ghogho, "Monitoring Network Telescopes and Inferring Anomalous Traffic Through the Prediction of Probing Rates," IEEE Transactions on Network and Service Management, June, 2022.
Abstract. Network reconnaissance is the first step preceding a cyber-attack. Hence, monitoring the probing activities is imperative to help security practitioners enhancing their awareness about Internet's large-scale events or peculiar events targeting their network. In this paper, we present a framework for an improved and efficient monitoring of the probing activities targeting network telescopes. Particularly, we model the probing rates which are a good indicator for measuring the cyber-security risk targeting network services. The approach consists of first inferring groups of network ports sharing similar probing characteristics through a new affinity metric capturing both temporal and semantic similarities between ports. Then, sequences of probing rates targeting similar ports are used as inputs to stacked Long Short-Term Memory (LSTM) neural networks to predict probing rates 1 hour and 1 day in advance. Finally, we describe two monitoring indicators that use the prediction models to infer anomalous probing traffic and to raise early threat warnings. We show that LSTM networks can accurately predict probing rates, outperforming the nonstationary autoregressive model, and we demonstrate that the monitoring indicators are efficient in assessing the cyber-security risk related to vulnerability disclosure.
Conference Publications
Mehdi Zakroum, Abdellah Houmz, Mounir Ghogho, Ghita Mezzour, Abdelkader Lahmadi, Jérôme François and Mohammed El Koutbi, Exploratory Data Analysis of a Network Telescope Traffic and Prediction of Port Probing Rates, IEEE Intelligence and Security Informatics, 2018-11-08, Miami, FL, USA.
Abstract. Understanding the properties exhibited by large scale network probing traffic would improve cyber threat intelligence. In addition, the prediction of probing rates is a key feature for security practitioners in their endeavors for making better operational decisions and for enhancing their defense strategy skills. In this work, we study different aspects of the traffic captured by a /20 network telescope. First, we perform an exploratory data analysis of the collected probing activities. The investigation includes probing rates at the port level, services interesting top network probers and the distribution of probing rates by geolocation. Second, we extract the network probers exploration patterns. We model these behaviors using transition graphs decorated with probabilities of switching from a port to another. Finally, we assess the capacity of Non-stationary Autoregressive and Vector Autoregressive models in predicting port probing rates as a first step towards using more robust models for better forecasting performance.
Mehdi Zakroum, Mounir Ghogho, Mustapha Faqir, Deep Learning for Inferring the Surface Solar Irradiance from Sky Imagery, IEEE International Renewable and Sustainable Energy Conference, 2017-12-04, Tangier, Morocco.
Abstract. We present a novel approach to perform ground-based estimation and prediction of the surface solar irradiance with the view to predicting photovoltaic energy production. We propose the use of mini-batch k-means clustering to extract features, referred to as per cluster number of pixels (PCNP), from sky images taken by a low-cost fish eye camera. These features are first used to classify the sky as clear or cloudy using a single hidden layer neural network; the classification accuracy achieves 99.7%. If the sky is classified as cloudy, we propose to use a deep neural network having as input features the PCNP to predict intra-hour variability of the solar irradiance. Toward this objective, in this paper, we focus on estimating the deep neural network model relating the PCNP features and the solar irradiance, which is an important step before performing the prediction task. The proposed deep learning-based estimation approach is shown to have an accuracy of 95%.