Chen Hajaj — AI Research
I am an Associate Professor and Head of the Data Science Track at Ariel University — working at the intersection of AI theory and real-world impact. I also serve as Head of the Faculty Review Board and NVIDIA University Ambassador.
Leadership & Impact
Senior Faculty & Administrative Leadership
NVIDIA University Ambassador (2024–present) — Leading GPU computing and AI education initiatives at Ariel University.
Head of Faculty Review Board (2021–present) — Directing research quality and ethical standards across the Faculty of Engineering.
Head of Data Science Track (2021–present) — Overseeing curriculum development, student programs, and industry partnerships.
Director, Data Science & AI Research Center (2019–2024) — Founded and directed a research center that secured over $1.2M in competitive funding, produced 4 patents, and established national collaborations with the Israel Innovation Authority and the Ministry of Innovation.
Postdoctoral Research
Developed machine learning algorithms for cybersecurity and network traffic analysis. Served on the Postdoc Association Committee, contributing to research policy and mentoring programs.
Ph.D. in Computer Science
Specialized in game theory and mechanism design, developing theoretical models and algorithms for strategic decision-making in competitive and cooperative environments.
Research Focus
Multimodal Learning
Fusing text, vision, audio, and structured data for tasks like item similarity, emotion recognition, and cross-modal retrieval.
Cybersecurity & Network Analysis
Classifying encrypted traffic, detecting anomalies, and building ML-based intrusion detection systems without breaking encryption.
Medical Data Science
Applying machine learning to clinical data for ICU nutrition optimization, patient outcome prediction, and medical decision support.
eCommerce & Recommender Systems
Designing intelligent recommendation engines, pricing strategies, and incentive-aware models that improve user experience and revenue.
Design Spaces
Exploring human factors and cognitive dimensions in interface design, virtual environments, and constructivist learning systems.
Human Behavior Analysis
Modeling strategic and social behavior using computational tools — from multi-agent simulations to sentiment and emotion analysis.
Research Grants & Funding
Total Secured: $1M+
Multi-layered Classification of PQC-Encrypted Traffic
Ministry of Innovation, Science and Technology (2025)
Co-PIs: Chen Hajaj and Amit Dvir
Amount: 482,900 NIS
Clinical Progression of Dystrophinopathies: Exploring the Role of Autonomic Nervous System Activity and Machine Learning Models
Ministry of Innovation, Science and Technology (2025)
Co-PIs: Sharon Barak, Chen Hajaj and Riki Tesler
Amount: 412,569 NIS
Encrypted Networks Traffic Monitoring
Israel Innovation Authority (2021)
PI: Chen Hajaj
Amount: 341,000 USD
APK Malware Detection and Analysis
Israel National Cyber Directorate (2019)
PI: Chen Hajaj
Amount: 179,000 USD
Quality of Service Monitoring
Israel Innovation Authority (2019)
Co-PIs: Chen Hajaj and Amit Dvir
Amount: 100,000 USD
Violence Prediction in Public Events
Israel Innovation Authority (2019)
Co-PIs: Chen Hajaj, Amit Dvir and Uzi Ben-Shalom
Amount: 100,000 USD
Strategy Proof Mechanisms for Kidney Exchange
BSF (2015)
PI: Chen Hajaj
Amount: 4,000 USD
Join Our Lab
We welcome motivated graduate students passionate about advancing AI research. Our lab offers a collaborative environment, access to cutting-edge projects, strong industry connections, and a track record of high-impact publications.
Selected Publications
- ★ Cleaner Adversarial CAPTCHAs: Intelligent Targets and Precise Noise for Usable Security (2026). Proceedings of the 25th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2026).
  Abstract: Traditional CAPTCHAs are increasingly vulnerable to deep learning-based solvers that decode text and images with high accuracy. In this work, we propose methods to strengthen adversarial CAPTCHAs without compromising human usability. First, we introduce a Precise Gradient Method (PGM) that preserves gradient magnitude (rather than discarding it via a sign operator), producing adversarial perturbations with significantly lower perceptual noise. Second, we develop intelligent target class selection, using either dataset-level confusion structure (Class Relations Network) or image-specific softmax probabilities (Distance-Based Target), to steer adversarial perturbations more efficiently. Across multiple modern architectures (MobileNets, EfficientNets, ResNet, and Vision Transformer), our framework achieves faster convergence (fewer iterations), reduced visual distortion, and notably greater robustness under iterative adversarial retraining. Experiments show that our methods consistently reduce iteration counts and perceptual distortion while significantly increasing the difficulty for automated attacks. Our results offer a practical, scalable path toward the next generation of CAPTCHA systems and contribute new insights to the adversarial machine learning landscape focused on security and usability.
- ★ A Classification-by-Retrieval Framework for Few-Shot Anomaly Detection to Detect API Injection (2025). Computers & Security.
  Abstract: Application Programming Interface (API) Injection attacks refer to the unauthorized or malicious use of APIs, which are often exploited to gain access to sensitive data or manipulate online systems for illicit purposes. Identifying actors that deceitfully utilize an API poses a demanding problem. Although there have been notable advancements and contributions in the field of API security, there remains a significant challenge when dealing with attackers who use novel approaches that don't match the well-known payloads commonly seen in attacks. Also, attackers may exploit standard functionalities unconventionally and with objectives surpassing their intended boundaries. Thus, API security needs to be more sophisticated and dynamic than ever, with advanced computational intelligence methods, such as machine learning models that can quickly identify and respond to abnormal behavior. In response to these challenges, we propose a novel unsupervised few-shot anomaly detection framework composed of two main parts: first, we train a dedicated generic language model for API based on FastText embedding; next, we use Approximate Nearest Neighbor search in a classification-by-retrieval approach. Our framework allows for training a fast, lightweight classification model using only a few examples of normal API requests. We evaluated the performance of our framework using the CSIC 2010 and ATRDF 2023 datasets. The results demonstrate that our framework improves API attack detection accuracy compared to the state-of-the-art (SOTA) unsupervised anomaly detection baselines.
- ★ The Art of Time-Bending: Data Augmentation and Early Prediction for Efficient Traffic Classification (2024). Expert Systems with Applications.
  Abstract: Computational efficiency is an important consideration for deploying machine learning models for time series prediction in an online setting. Machine learning algorithms adjust model parameters automatically based on the data, but often require users to set additional parameters, known as hyperparameters. Hyperparameters can significantly impact prediction accuracy. Traffic measurements, typically collected online by sensors, are serially correlated. Moreover, the data distribution may change gradually. A typical adaptation strategy is periodically re-tuning the model hyperparameters, at the cost of computational burden. In this work, we present an efficient and principled online hyperparameter optimization algorithm for Kernel Ridge regression applied to traffic prediction problems. In tests with real traffic measurement data, our approach requires as little as one-seventh of the computation time of other tuning methods, while achieving better or similar prediction accuracy.
- ★ Measuring Flight-Destination Similarity: A Multidimensional Approach (2024). Expert Systems with Applications.
  Abstract: E-tourism websites offer users a vast array of travel destinations and opportunities, necessitating tools that enable destination comparison and intelligent search capabilities. One key requirement for such tools is the ability to measure the similarity between destinations. Over the years, various similarity measurement techniques have been proposed, including user-based and content-based approaches. However, many of these techniques require data preparation or prior domain knowledge from experts. In contrast, this study proposes an innovative approach that requires no prior domain knowledge of flight destinations or their relationships, and utilizes only readily available data. Our approach draws upon concepts from image recognition and natural language processing (NLP) to extract hidden aspects of destinations. Using data from a flight-search website as a testbed, we analyze similarity metrics based on state-of-the-art methods for image recognition, NLP, and product-network analysis. We then compare these metrics to those obtained by human subjects. Our findings suggest that no single method dominates in all aspects, leading us to propose a hybrid method that leverages the strengths of each. The proposed method can be readily applied to measure product similarity in other domains.
- ★ Hybrid Speech and Text Analysis Methods for Speaker Change Detection (2021). IEEE/ACM Transactions on Audio, Speech, and Language Processing.
  Abstract: Speaker Change Detection (SCD) is the task of segmenting an input audio recording according to speaker interchanges. Many applications, such as Speaker Diarization (SD) and automatic vocal transcription, depend on this segmentation task. In this paper, we focus on the essential step of the SD problem, audio segmentation: we propose a solution to the SCD problem and to the assignment of clustered speaker labels to the extracted segments, and we apply it to two datasets, a commercial dataset in Hebrew and the ICSI Meeting Corpus. To this end, we propose a hybrid framework for the SCD problem that learns from textual information, speech signals, and the metadata features that can be extracted from them. Moreover, we demonstrate that the overall diarization system's performance degrades as the number of speakers in the training dataset increases, and that performance is improved by our efficient SCD component. Finally, we show that our hybrid framework remains robust on the ICSI Meeting Corpus, as the experimental evaluation's training and testing span two languages.
- ★ Improving Robustness of ML Classifiers Against Realizable Evasion Attacks Using Conserved Features (2019). 28th USENIX Security Symposium (USENIX Security 19).
  Abstract: Machine learning (ML) techniques are increasingly common in security applications, such as malware and intrusion detection. However, ML models are often susceptible to evasion attacks, in which an adversary makes changes to the input (such as malware) in order to avoid being detected. A conventional approach to evaluate ML robustness to such attacks, as well as to design robust ML, is by considering simplified feature-space models of attacks, where the attacker changes ML features directly to effect evasion, while minimizing or constraining the magnitude of this change. We investigate the effectiveness of this approach to designing robust ML in the face of attacks that can be realized in actual malware (realizable attacks). We demonstrate that in the context of structure-based PDF malware detection, such techniques appear to have limited effectiveness, but they are effective with content-based detectors. In either case, we show that augmenting the feature space models with conserved features (those that cannot be unilaterally modified without compromising malicious functionality) significantly improves performance. Finally, we show that feature space models enable generalized robustness when faced with a variety of realizable attacks, as compared to classifiers which are tuned to be robust to a specific realizable attack.
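The CAPTCHA paper (AAMAS 2026) contrasts sign-based perturbation steps with a Precise Gradient Method that keeps gradient magnitude. The Python sketch below illustrates that contrast on a toy linear softmax classifier; it is not the authors' PGM. Assumptions, all hypothetical: the magnitude-preserving step is modeled as an L2-normalized gradient step, the "Distance-Based Target" is read as "the most probable wrong class", and the names `sign_step`, `magnitude_step`, `closest_target`, and `attack` are invented for illustration.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift by the max for numerical stability
    return e / e.sum()

def targeted_grad(W, x, target):
    """Gradient w.r.t. the input x of the cross-entropy toward `target`,
    for a toy linear classifier with logits = W @ x."""
    p = softmax(W @ x)
    p[target] -= 1.0  # dCE/dlogits = p - onehot(target)
    return W.T @ p

def sign_step(W, x, target, eps):
    # FGSM-style step: only the sign of each gradient component survives,
    # so every input dimension moves by exactly eps.
    return x - eps * np.sign(targeted_grad(W, x, target))

def magnitude_step(W, x, target, eps):
    # Magnitude-preserving step (assumption: L2 normalization). Dimensions
    # with large gradients move more, small ones barely move.
    g = targeted_grad(W, x, target)
    return x - eps * g / (np.linalg.norm(g) + 1e-12)

def closest_target(W, x, true_label):
    # Hypothetical "distance-based" target: the most probable wrong class,
    # i.e. the class whose decision boundary is likely nearest.
    p = softmax(W @ x)
    p[true_label] = -np.inf
    return int(np.argmax(p))

def attack(step_fn, W, x, true_label, eps, max_iter=200):
    """Iterate step_fn until the model predicts the chosen target.
    Returns (x_adv, iterations used), or (x_adv, None) on failure."""
    target = closest_target(W, x, true_label)
    x_adv = x.copy()
    for i in range(1, max_iter + 1):
        x_adv = step_fn(W, x_adv, target, eps)
        if int(np.argmax(W @ x_adv)) == target:
            return x_adv, i
    return x_adv, None

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(5, 64))  # toy 5-class model on 64 "pixels"
    x = rng.normal(size=64)
    y = int(np.argmax(W @ x))
    x_adv, n = attack(magnitude_step, W, x, y, eps=0.1)
    print("iterations:", n, "L2 perturbation:", np.linalg.norm(x_adv - x))
```

The design point the sketch makes concrete: a sign step spends the same budget on every dimension (L2 size eps times the square root of the dimension per step), while the magnitude-preserving step concentrates change where the gradient is large, which is one way lower perceptual noise can arise.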
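The API-injection paper (Computers & Security, 2025) combines a FastText-based language model with Approximate Nearest Neighbor search in a classification-by-retrieval scheme. A minimal sketch of the retrieval idea follows, with two substitutions stated plainly: a bag of hashed character trigrams stands in for the trained FastText model, and brute-force cosine search stands in for an ANN index. The leave-one-out threshold rule, `DIM`, `embed`, and `RetrievalDetector` are hypothetical choices for illustration, not the paper's method.

```python
import zlib
import numpy as np

DIM = 1024  # hypothetical hashing dimension

def embed(request: str, n: int = 3) -> np.ndarray:
    """Unit-normalized bag of hashed character n-grams. This is a crude
    stand-in (assumption) for FastText subword embeddings."""
    s = f"<{request}>"  # boundary markers, in the spirit of FastText subwords
    v = np.zeros(DIM)
    for i in range(len(s) - n + 1):
        # crc32 is deterministic across runs, unlike Python's str hash()
        v[zlib.crc32(s[i:i + n].encode()) % DIM] += 1.0
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

class RetrievalDetector:
    """Few-shot anomaly detector: a request's score is its cosine distance
    to the nearest known-normal request (exact search, not ANN)."""

    def __init__(self, normal_requests):
        if len(normal_requests) < 2:
            raise ValueError("need at least two normal examples")
        self.index = np.stack([embed(r) for r in normal_requests])
        # Threshold (assumption): the largest leave-one-out nearest-neighbour
        # distance observed among the normal examples themselves.
        sims = self.index @ self.index.T
        np.fill_diagonal(sims, -np.inf)
        self.threshold = float(1.0 - sims.max(axis=1).min())

    def score(self, request: str) -> float:
        return float(1.0 - (self.index @ embed(request)).max())

    def is_anomalous(self, request: str) -> bool:
        return self.score(request) > self.threshold

if __name__ == "__main__":
    detector = RetrievalDetector([
        "GET /api/users?id=12",
        "GET /api/users?id=7",
        "GET /api/users?id=345",
        "GET /api/users?id=88",
    ])
    probe = "GET /api/users?id=1 UNION SELECT password FROM admins --"
    print(detector.score(probe), detector.threshold)
```

Only a handful of normal requests are needed to build the index, which mirrors the few-shot framing; swapping the brute-force `index @ embed(...)` for an ANN library is what would make this scale to large request logs.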
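The central idea of the USENIX Security 2019 paper, that a feature-space attack model should leave conserved features untouched, can be illustrated with a toy linear detector. The sketch below is hypothetical throughout: the feature layout, weights, and the names `best_response` and `evades` are invented for illustration and do not come from the paper. The attacker optimally resets every mutable binary feature but cannot change the conserved one.

```python
import numpy as np

def best_response(w, x, mutable):
    """Feature-space evasion against a linear detector (score = w @ x + b,
    malicious if >= 0). Features lie in [0, 1]; the attacker may change only
    the `mutable` ones and picks, per feature, the value that lowers the
    score most. Conserved features keep their original values."""
    x_adv = np.array(x, dtype=float)
    x_adv[mutable] = np.where(w[mutable] > 0, 0.0, 1.0)
    return x_adv

def evades(w, b, x, mutable):
    return float(w @ best_response(w, x, mutable) + b) < 0.0

# Hypothetical toy setup: feature 0 is conserved (changing it would break
# the malicious functionality); features 1-3 are mutable.
mutable = np.array([False, True, True, True])
x_malicious = np.array([1.0, 1.0, 1.0, 0.0])

# Detector A ignores the conserved feature; detector B weights it heavily.
w_a, b_a = np.array([0.0, 1.0, 1.0, -0.5]), -1.0
w_b, b_b = np.array([2.0, 1.0, 1.0, -0.5]), -1.0

if __name__ == "__main__":
    print("A evaded:", evades(w_a, b_a, x_malicious, mutable))  # True
    print("B evaded:", evades(w_b, b_b, x_malicious, mutable))  # False
```

Both detectors flag the original sample. After the best response, detector A's score falls to -1.5 and the sample slips through, while detector B still scores it at 0.5 because the conserved feature cannot be removed.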