Publications | Chen Hajaj


2026

Machine Learning Tools for Predicting Pediatric Urinary Tract Infections Caused by ESBL-Producing Bacteria

2026

Chen Hajaj, Shani Alkoby, Shai Ashkenazi, Avner Herman Cohen, Yael Reichenberg, Shiri Kushnir, and Vered Shkalim Zemer

Pediatric Infectious Disease Journal

Background: The prevalence of pediatric urinary tract infections (UTIs) caused by extended-spectrum β-lactamase (ESBL)-producing bacteria is increasing worldwide and is difficult to predict. As these infections require special antibiotic treatment, which is often not started empirically, they are associated with higher rates of intensive care unit admission, morbidity and prolonged hospitalization. We aimed to develop machine learning-based tools to aid pediatricians in predicting ESBL-positive UTIs and initiating appropriate empiric antibiotics. Methods: The electronic medical records of a large Health Maintenance Organization were searched for all children one month to 18 years of age with confirmed UTIs from January 1, 2010, to August 31, 2020. Demographic, clinical and laboratory data were retrieved, and following univariate analysis, machine learning-based tools were used to develop models to predict a UTI caused by an ESBL-producing bacterium. Results: A total of 35,830 pediatric UTI events comprised the study group. Age, sex, socioeconomic status, site of infection (community or hospital), prior antibiotic use, previous ESBL-positive UTI and the specific uropathogen were significantly associated with the rates of ESBL-positive infection. Using patients’ data available on presentation, the 5 models developed had a very high negative predictive value of 0.98, indicating strong rule-out performance for ESBL-positive UTIs. Conclusions: Our study indicates that machine learning models based on data available at UTI presentation may support clinicians in estimating the likelihood of ESBL-producing bacteria UTIs. Prospective studies are required to improve the models’ performance and determine their actual impact on clinical outcomes.
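
For readers less familiar with the metric quoted in the Results, negative predictive value is the share of negative predictions that are truly negative, TN / (TN + FN). The sketch below only illustrates that arithmetic; the function name and the counts are hypothetical and are not taken from the study.

```python
def negative_predictive_value(true_negatives: int, false_negatives: int) -> float:
    """Share of negative predictions that are truly negative: TN / (TN + FN)."""
    predicted_negative = true_negatives + false_negatives
    if predicted_negative == 0:
        raise ValueError("the model made no negative predictions")
    return true_negatives / predicted_negative


# Hypothetical counts, chosen only to illustrate the arithmetic.
example_npv = negative_predictive_value(true_negatives=980, false_negatives=20)
```

A high value means that a "not ESBL" prediction is rarely wrong, which is why the abstract describes it as rule-out performance.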

Learning through Concept Generation: The Case of Academic and Practitioner Teaching in the Design Engineering Studio

2026

Hadas Sopher, Chen Hajaj, and John S Gero

Proceedings of the International Conference on Design, Computing and Cognition (DCC 2026)

Whether engineering design should be taught by academic or practitioner tutors remains debated. This paper examines this debate by comparing design learning behaviours through concept introduction in design engineering critiques given by academic and practitioner tutors. Taking a syntactic approach, the study applies natural language processing techniques to track the generation of design concepts, identified by the first occurrence of nouns and their reoccurrences by each speaker in each session. The tutor’s first occurrences reflect her demonstration. Once repeated by a student, they reflect learning through imitation. These learning behaviours are further examined by analysing the semantic distance between students’ and tutors’ concepts, reflecting convergence of conceptual alignment through semantic proximity. The method is demonstrated by analysing the protocols of a natural case study involving two groups (n = 30) across five consecutive sessions. Results revealed differences between the cohorts. Academic-taught students exhibited a higher rate of first occurrences than practitioner-taught students. While tutors showed similarity in repeating students’ concepts, students imitated academic tutors at a higher rate than practitioner tutors. Practitioner tutors drifted further from student concepts, whereas students in both groups maintained a semantic proximity to their tutors, reflecting the role of the tutor in leading the educational process. Temporal analysis revealed a decrease in semantic distance throughout the course, indicating greater alignment. This study lays the foundation for articulating critique pedagogies empirically to improve learning behaviours in studio-based education.
Uncovering Microservice Faults: A Temporal Graph Approach to Root Cause Analysis

2026

Udi Aharon, Amit Dvir, Ran Dubin, Revital Marbel, and Chen Hajaj

Proceedings of the IEEE International Conference on Communications (ICC 2026)

Real-time processing demands for massive IoT sensor data necessitate reliance on distributed microservice systems within edge clusters. However, pinpointing the root cause of anomalies within these edge microservice clusters poses a critical challenge for intelligent IoT operation and maintenance. To address the issue, a spatio-temporal graph propagation model ST-GraphRCA is proposed for root cause analysis in IoT edge environments. Our approach begins by resolving the fundamental issue of time-series asynchrony across distributed multi-source metrics. A PCA-DTW hybrid feature extraction method is introduced with a dynamic alignment strategy to mitigate the effects of random network delays and data deformation without requiring prior synchronization. Subsequently, ST-GraphRCA constructs a stream-based forward propagation graph based on the flow conservation principle. By integrating dynamic edge weights with node-level input–output anomaly scores, ST-GraphRCA precisely infers fault propagation pathways and identifies potential root cause candidates through causal reasoning. Finally, a topology-constrained high-utility mining algorithm filters these candidates. Using a constraint matrix, the algorithm filters out unreachable service combinations to locate low-frequency and high-risk root causes. Experimental results indicate that ST-GraphRCA achieves an F1-Score of 0.89, outperforming existing methods. In resource-constrained edge scenarios, its average localization time is merely 238.8 ms, representing a six-fold improvement over key benchmarks. Thus, ST-GraphRCA not only provides an efficient anomaly fault tracing solution for large-scale IoT systems but also offers technical support for the intelligent operation and maintenance of distributed microservice systems.
Real-Time Network Security: Integrating ANN and Dynamic Graph-Based Clustering

2026

Zohar Simhon, Matan Weiss, Revital Marbel, Chen Hajaj, Amit Dvir, and Ran Dubin

Computer Networks

Quality of Experience Prediction for First Person Shooter Online Gaming: The Case Study of Call of Duty

2026

Yehonatan Zion, Eyal Paz, Ran Dubin, Amit Dvir, and Chen Hajaj

Proceedings of the IEEE Consumer Communications & Networking Conference (CCNC 2026)

Latency is the factor with the greatest impact on fairness and Quality of Experience (QoE) in First-Person Shooter (FPS) games. High latency degrades the QoE of players, who may leave the game if unsatisfied with their QoE. Modern FPS games make great efforts to maintain an excellent QoE even under a poor Internet connection with high latency. Those efforts include the wide distribution of game servers and many software optimizations to smooth the effect of lags in the games. This study aims to provide insights into QoE estimation for network-intensive applications by examining one of the most prominent FPS games of the past two decades: Call of Duty. We observed that the game dynamically adjusts its network traffic behavior, including packet size and transmission rate, in response to variations in network quality. However, the ISP does not have this capability since the network traffic is encrypted; observing the game’s network traffic does not expose its nature and most certainly does not expose the game player’s intensity, latency, or QoE. We propose a novel technique for estimating latency and QoE in FPS games from an ISP-level perspective. In our evaluation, the model detected problematic latency in near real-time with 81% accuracy and an F1 score of 80% on a 10-second window, highlighting a trade-off between responsiveness and predictive performance. The dataset generated for this study is publicly available to support further research.
Cloudy with a Chance of Anomalies: Dynamic Graph Neural Network for Early Detection of Cloud Services’ User Anomalies

2026

Revital Marbel, Yanir Victor Cohen, Ran Dubin, Amit Dvir, and Chen Hajaj

Proceedings of the IEEE Consumer Communications & Networking Conference (CCNC 2026)

In today’s digital landscape, ensuring the security of cloud environments is critical for organizational resilience, growth, and operational efficiency. As cloud services become more prevalent, so do sophisticated attacks targeting cloud users, making early detection essential. This paper introduces a novel time-based embedding approach for Cloud Services Graph-based Anomaly Detection (CS-GAD) that leverages a Graph Neural Network (GNN) to detect anomalous user behavior. We propose a dynamic tripartite graph to model interactions among users, actions, and cloud services over time. Using behavioral patterns, our GNN generates user embeddings to enable early detection of anomalies. We evaluate this approach on a novel dataset simulating five real-world attacks: cryptojacking, billing abuse, lateral movement, monitor exploitation, and service targeting. The dataset comprises 107,116 Application Programming Interface (API) calls over 32 days, tracking 79 AWS services, with attacks embedded within legitimate cloud traffic. Our results demonstrate that the proposed method achieves a lower false positive rate and higher detection accuracy than a prevailing method, as evidenced by improved accuracy, precision, recall, and F1-score.

The Sound of Emotions: An Artificial Intelligence Approach to Predicting Emotions from Musical Selections

2026

Ron Hirshprung, Ori Lesman, and Chen Hajaj

Multimedia Systems

Music in the current digital era is consumed primarily on-demand, for example by streaming from Spotify or YouTube, rather than by broadcasting, for example, by listening to a radio station. A strong correlation has been found between music and emotions. Hence, with our ability to select our own music and the availability of artificial intelligence (AI) tools, the importance of music emotion recognition (MER) is significant. This research proposes a novel methodology for predicting emotions from a musical piece that is selected by the listener based on a two-layer AI. The dataset and the process are exceptional in two ways: first, by their qualitative character, which minimized the inherent subjectivity of self-reported emotions, and also the high reliability of the collected data in comparison to other methods such as a questionnaire; and second, by their quantitative character, given songs from about 4,447 people. The model succeeded in predicting the primary emotion with and one of the two leading emotions (the primary and the secondary) with, and without accommodating demographic factors in the model. We believe this knowledge may be a useful tool for therapists, coaches, teachers, etc. and may also raise public awareness of a potential privacy violation.

★ Cleaner Adversarial CAPTCHAs: Intelligent Targets and Precise Noise for Usable Security

2026

Meir Litman and Chen Hajaj

Proceedings of the 25th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2026)

Traditional CAPTCHAs are increasingly vulnerable to deep learning-based solvers that decode text and images with high accuracy. In this work, we propose methods to strengthen adversarial CAPTCHAs without compromising human usability. First, we introduce a Precise Gradient Method (PGM) that preserves gradient magnitude (rather than discarding it via a sign operator), producing adversarial perturbations with significantly lower perceptual noise. Second, we develop intelligent target class selection, using either dataset-level confusion structure (Class Relations Network) or image-specific softmax probabilities (Distance-Based Target), to steer adversarial perturbations more efficiently. Across multiple modern architectures (MobileNets, EfficientNets, ResNet, and Vision Transformer), our framework achieves faster convergence (fewer iterations), reduced visual distortion, and notably greater robustness under iterative adversarial retraining. Experiments show that our methods consistently reduce iteration counts and perceptual distortion while significantly increasing the difficulty for automated attacks. Our results offer a practical, scalable path toward the next generation of CAPTCHA systems and contribute new insights to the adversarial machine learning landscape focused on security and usability.
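
The contrast the abstract draws between a sign-based update and one that keeps gradient magnitude can be illustrated with a minimal sketch. This is not the paper's Precise Gradient Method: the max-abs normalization, the function names, and the toy numbers are all assumptions made for illustration.

```python
def sign_step(pixels, gradient, eps):
    """Sign-based step: every pixel with a nonzero gradient moves by exactly eps."""
    return [p + eps * ((g > 0) - (g < 0)) for p, g in zip(pixels, gradient)]


def magnitude_step(pixels, gradient, eps):
    """Magnitude-preserving step: each pixel moves in proportion to its gradient.

    Scaling by the largest absolute gradient is an assumption of this sketch,
    chosen so that no pixel moves by more than eps.
    """
    peak = max(abs(g) for g in gradient)
    if peak == 0:
        return list(pixels)
    return [p + eps * g / peak for p, g in zip(pixels, gradient)]


def total_change(before, after):
    """Sum of absolute per-pixel changes, a crude proxy for visible noise."""
    return sum(abs(a - b) for a, b in zip(after, before))


# Toy three-pixel "image" with one strong and one weak gradient component.
pixels = [0.5, 0.5, 0.5]
gradient = [0.8, -0.1, 0.0]
sign_result = sign_step(pixels, gradient, eps=0.1)
magnitude_result = magnitude_step(pixels, gradient, eps=0.1)
```

In this toy case the weakly contributing pixel moves by the full eps under the sign step but only by a small fraction of it under the magnitude step, which is the intuition behind lower perceptual noise.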


2025

Survival Forest Models for ICU Mortality Prediction based on Nutrition and Clinical Factors

2025

Anat Goldstein, Chen Hajaj, Pierre Singer, and Orit Raphaeli

IEEE Access

Objective: This study investigates the predictive role of nutritional factors and gastrointestinal dysfunction in determining mortality among critically ill patients admitted to the intensive care unit (ICU). Our goal is to enhance mortality prediction by incorporating detailed nutritional and clinical data collected during the first 48 hours following ICU admission. Methods: The study included data from 2,942 patients admitted to Beilinson Hospital ICU. Variables collected comprised nutritional measures, gastrointestinal dysfunction symptoms, patient demographics, vital signs, respiratory parameters, and laboratory results. For each clinical variable, per-patient 48-hour summary statistics were derived. Three feature selection strategies were compared: stepwise Cox proportional hazards modeling, Random Forest feature importance, and Principal Component Analysis (PCA). For each feature set, we developed Random Survival Forest (RSF) models and evaluated predictive performance using six-fold cross-validation. Performance evaluation included the Concordance Index (CI), time-dependent AUC at 10 days, and Integrated Brier Score (IBS ≤10 days) for Random Survival Forest models, while Akaike Information Criterion (AIC) and CI were used to monitor model fit and discrimination during stepwise Cox proportional hazards feature selection. Results: The stepwise-selected feature set produced the most accurate model (CI=0.76) and achieved the strongest overall performance across complementary metrics (AUC10d=0.874, IBS ≤10 d=0.049). The Random Forest–selected feature set performed closely behind (CI=0.747; AUC10d=0.872; IBS ≤10 d=0.050). PCA-based feature reduction resulted in substantially poorer discrimination (CI=0.73) and higher prediction error (AUC10d=0.715; IBS ≤10 d=0.068). Across approaches, nutritional and gastrointestinal (GI) dysfunction variables, including nasogastric output, feeding tolerance, and timing of enteral nutrition, emerged as important predictors of early mortality. Conclusion: Our findings underscore the critical importance of early nutritional intervention and GI dysfunction management for ICU patient survival. The study demonstrates that targeted feature selection, which incorporates clinical insight, yields superior discriminative performance compared with purely data-driven reduction methods. Future research should validate these models externally and integrate real-time monitoring data to further refine clinical decision-making and improve patient outcomes in ICU settings.
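
As background for the Concordance Index reported above, here is a minimal sketch of how a concordance index can be computed on right-censored data. It assumes Harrell's formulation (the abstract does not say which estimator was used), and the function name and the four toy patients are hypothetical.

```python
def concordance_index(times, events, risks):
    """Harrell-style concordance index for right-censored survival data.

    times  : observed follow-up time for each patient
    events : 1 if death was observed, 0 if the patient was censored
    risks  : model risk score, where higher means earlier expected death
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # earlier death had the higher risk score
                elif risks[i] == risks[j]:
                    concordant += 0.5   # ties in risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Four hypothetical patients; the third one is censored at day 6.
toy_ci = concordance_index(
    times=[2, 4, 6, 8],
    events=[1, 1, 0, 1],
    risks=[0.9, 0.7, 0.8, 0.1],
)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by risk.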

Cloudy with a Chance of Anomalies: Dynamic Graph Neural Network for Early Detection of Cloud Services’ User Anomalies

2025

Revital Marbel, Yanir Cohen, Ran Dubin, Amit Dvir, and Chen Hajaj

Proceedings of the 34th International Conference on Computer Communications and Networks

In today’s digital landscape, ensuring the security of cloud environments is critical for organizational resilience, growth, and operational efficiency. As cloud services become more prevalent, so do sophisticated attacks targeting cloud users, making early detection essential. This paper introduces a novel time-based embedding approach for Cloud Services Graph-based Anomaly Detection (CS-GAD) that leverages a Graph Neural Network (GNN) to detect anomalous user behavior. We propose a dynamic tripartite graph to model interactions among users, actions, and cloud services over time. Using behavioral patterns, our GNN generates user embeddings to enable early detection of anomalies. We evaluate this approach on a novel dataset simulating five real-world attacks: cryptojacking, billing abuse, lateral movement, monitor exploitation, and service targeting. The dataset comprises 107,116 Application Programming Interface (API) calls over 32 days, tracking 79 AWS services, with attacks embedded within legitimate cloud traffic. Our results demonstrate that the proposed method achieves a lower false positive rate and higher detection accuracy than a prevailing method, as evidenced by improved accuracy, precision, recall, and F1-score.

Optimized File Type Detection and One-Shot Reclassification Model

2025

Simona Lisker, Ayelet Botman, Chen Hajaj, Ran Dubin, and Amit Dvir

Proceedings of the IEEE International Conference on Communications

File type classification is critical in digital forensics and file carving. However, the increasing diversity of file formats challenges accurate classification. Traditional methods rely on hand-crafted features or compact neural networks but face long training times, limited training data, and lower accuracy. This paper introduces three novel, content-based file-type classification approaches to address these challenges. These approaches improve accuracy and streamline the integration of new file types using pre-trained models, enhancing both speed and reliability. The first approach utilizes Natural Language Processing (NLP) with a transformer architecture, while the second combines statistical features with a pre-trained model via transfer learning. These methods achieved accuracy rates of 72.4% and 69.2%, respectively, surpassing state-of-the-art Convolutional Neural Network (CNN) models. The third approach employs one-shot learning, achieving 100% accuracy in several scenarios, enabling efficient training with minimal data.

A New D-MAGIC: Dynamic Model for Cybersecurity Attack Detection Using GNNs into Clustering

2025

Zohar Simhon, Matan Weiss, Chen Hajaj, Revital Marbel, Ran Dubin, and Amit Dvir

Proceedings of the IEEE International Conference on Communications

The increasing sophistication and frequency of cyberattacks have made Network Intrusion Detection Systems (NIDS) a critical component of modern cybersecurity. This work presents D-MAGIC, a novel real-time NIDS that leverages zero-shot learning and graph-based dynamic clustering to detect known and unknown threats. Unlike traditional systems that rely on labeled datasets and predefined attack signatures, D-MAGIC operates unsupervised, identifying anomalies by detecting deviations from normal network behavior. By embedding the relationships between network flows into a graph structure and dynamically clustering similar patterns, D-MAGIC can detect coordinated attacks and emerging threats with minimal delay. Experimental results on the CIC-IDS-2017 and CSE-CIC-IDS-2018 datasets demonstrate that D-MAGIC achieves an improvement of up to 12% in the standard F1 score compared to state-of-the-art methods, while significantly reducing false positives and ensuring rapid, real-time detection with minimal latency.

Enhancing Encrypted Internet Traffic Classification Through Advanced Data Augmentation Techniques

2025

Yehonatan Zion, Porat Aharon, Ran Dubin, Amit Dvir, and Chen Hajaj

Proceedings of the IEEE International Conference on Communications

The increasing popularity of online services has made Internet Traffic Classification a critical field of study. However, the rapid development of internet protocols and encryption limits usable data availability. This paper addresses the challenges of classifying encrypted Internet Traffic, focusing on the scarcity of open-source datasets and limitations of existing ones. We propose two Data Augmentation (DA) techniques to synthetically generate data based on real samples: Average augmentation and MTU augmentation. Both augmentations aim to improve the performance of the classifier, each from a different perspective: the Average augmentation aims to increase dataset size by generating new synthetic samples, while the MTU augmentation enhances classifier robustness to varying Maximum Transmission Units (MTUs). Our experiments, conducted on two well-known academic datasets and a commercial dataset, demonstrate the effectiveness of these approaches in improving model performance and mitigating constraints associated with limited and homogeneous datasets. Our findings underscore the potential of data augmentation in addressing the challenges of modern Internet Traffic classification. Specifically, we show that our augmentation techniques significantly enhance encrypted traffic classification models. This improvement can positively impact user Quality of Experience (QoE) by more accurately classifying traffic as video streaming (e.g., YouTube) or chat (e.g., Google Chat). Additionally, it can enhance Quality of Service (QoS) for file downloading activities (e.g., Google Docs).

PQClass: Classification of Post-Quantum Encryption Applications in Internet Traffic

2025

Angelos Marnerides, Chen Hajaj, Revital Marbel, Ran Dubin, and Amit Dvir

Proceedings of the IEEE International Conference on Communications

Post-quantum cryptography (PQC) is expected to revolutionize secure communications in next-generation digital ecosystems. Previous and ongoing activities demonstrate that different PQC algorithms significantly impact traffic latency, but they do not yet provide a scheme to assess the existence of the PQC algorithm or its identification when encrypted traffic is analyzed for traffic engineering purposes. Hence, this work is the first to propose a novel PQClass pipeline for classifying encrypted Internet traffic of recently NIST-approved PQC algorithms. It thereby establishes solid grounds for enabling engineers to optimize their networks and, in parallel, for cybersecurity practitioners to familiarise themselves with PQC algorithmic properties for enhancing or devising security architectures in diverse setups. Our pipeline demonstrates impressive performance on real-world data, achieving 86% accuracy in detecting the presence of a PQC algorithm and 91% and 98% accuracy in identifying the browser and OS, respectively, based on PQC-based traffic.

Sentiment Analysis of Student-Tutor Interactions in VR Design Crits

2025

Hadas Sopher, Chen Hajaj, and John S Gero

The Association for Computer-Aided Architectural Design Research in Asia (CAADRIA)

This research evaluates the effectiveness of using immersive VR media in tutor-student interactions in design studio crits as they progress by generating new concepts. Crit interactions, a core educational component in design education, encourage learner-centredness in concept generation. However, the interaction is often tutor-centric, resulting in sentiments that may hinder participation. Immersive VR media offer a new opportunity for design crits to interact at a tangible real-scale display of the design, potentially altering participants’ sentiments. Research lacks evidence on the cognitive effect of this high-end technology on design learning. This gap produces limitations: assessing the effectiveness of immersive media to improve concept generation and understanding the role of sentiments in this process. We conducted a natural experiment comparing crits given to ten students in immersive and non-immersive media. We then employed natural language processing techniques to measure the distribution and temporal distribution of new concepts generated within either positive or negative sentiment for each medium. Findings exhibit significant differences in concept generation per sentiment within the medium used, with a higher rate of concepts in immersive VR. These demonstrate its role in enhancing stimuli to support learner ideation and advance design studio education.
★ A Classification-by-Retrieval Framework for Few-Shot Anomaly Detection to Detect API Injection

2025

Udi Aharon, Ran Dubin, Amit Dvir, and Chen Hajaj

Computers & Security

Application Programming Interface (API) Injection attacks refer to the unauthorized or malicious use of APIs, which are often exploited to gain access to sensitive data or manipulate online systems for illicit purposes. Identifying actors that deceitfully utilize an API poses a demanding problem. Although there have been notable advancements and contributions in the field of API security, there remains a significant challenge when dealing with attackers who use novel approaches that don’t match the well-known payloads commonly seen in attacks. Also, attackers may exploit standard functionalities unconventionally and with objectives surpassing their intended boundaries. Thus, API security needs to be more sophisticated and dynamic than ever, with advanced computational intelligence methods, such as machine learning models that can quickly identify and respond to abnormal behavior. In response to these challenges, we propose a novel unsupervised few-shot anomaly detection framework composed of two main parts: First, we train a dedicated generic language model for API based on FastText embedding. Next, we use Approximate Nearest Neighbor search in a classification-by-retrieval approach. Our framework allows for training a fast, lightweight classification model using only a few examples of normal API requests. We evaluated the performance of our framework using the CSIC 2010 and ATRDF 2023 datasets. The results demonstrate that our framework improves API attack detection accuracy compared to the state-of-the-art (SOTA) unsupervised anomaly detection baselines.
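
The classification-by-retrieval idea in this abstract can be sketched in a few lines: index a handful of normal requests, then flag a new request when even its nearest indexed neighbour is not similar enough. This is only an illustration under loud assumptions. Character trigram counts stand in for the FastText-based language model, an exhaustive scan stands in for Approximate Nearest Neighbor search, and the requests, function names, and the 0.8 threshold are all hypothetical.

```python
from collections import Counter
from math import sqrt


def embed(request, n=3):
    """Toy stand-in for a learned embedding: counts of character n-grams."""
    return Counter(request[i:i + n] for i in range(len(request) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def nearest_similarity(request, normal_index):
    """Similarity to the closest indexed normal request (exhaustive search)."""
    query = embed(request)
    return max(cosine(query, reference) for reference in normal_index)


def is_anomalous(request, normal_index, threshold=0.8):
    """Flag a request whose best match among normal traffic falls below the threshold."""
    return nearest_similarity(request, normal_index) < threshold


# A few hypothetical examples of normal API requests form the whole "training set".
normal_index = [embed(r) for r in (
    "GET /api/users?id=12",
    "GET /api/users?id=7",
    "POST /api/login",
)]
benign = "GET /api/users?id=45"
injected = "GET /api/users?id=1' OR '1'='1"
```

Because adding a class or a new normal pattern only means adding vectors to the index, this style of model needs no retraining, which is consistent with the lightweight training the abstract describes.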

Predicting Onset and Progression of Neurodegenerative Diseases Using Blood Test Data and Machine Learning Models

2025

Amir Glik, Chen Hajaj, Orit Raphaeli, and Anat Goldstein
Leveraging OSINT for Advanced Proactive Cybersecurity: Strategies and Solutions

2025

Zafrir Avrahami, Moti Zwilling, and Chen Hajaj

IEEE Access

The growing complexity of the digital environment has increased the need for proactive and intelligence-based approaches to cybersecurity. This study examines the role of open-source intelligence (OSINT) as a strategic tool in proactive cybersecurity operations. Drawing on a wide range of peer-reviewed literature and professional sources, it reviews definitions, operational processes, areas of application, benefits, and challenges associated with OSINT. The analysis highlights OSINT’s contribution to situational awareness, early threat detection, and cyber threat intelligence (CTI) capabilities. By using publicly accessible data from the internet and social platforms, organizations can strengthen their defensive posture against diverse cyber threats. The study outlines workflows for collecting and analyzing OSINT, with attention to its integration into organizational intelligence frameworks and cybersecurity strategies. Examples from both the public and private sectors demonstrate how OSINT supports decision-making, incident response, and preparedness for emerging threats. The review also considers OSINT’s main advantages, including cost effectiveness, accessibility, and real-time relevance, alongside its main challenges such as data reliability, legal concerns, and information overload. The findings suggest that, despite these limitations, OSINT has significant potential to enhance proactive cybersecurity measures, support compliance with standards, assist law enforcement, prevent terrorism, and contribute to business decision-making. The paper concludes with a call for further research on integration with advanced technologies, real-time data analysis, and effective intelligence collaboration.

Measuring and Analyzing Defects of Additive Manufactured Ti-6Al-4V Specimens Through Image Segmentation

2025

Ro’i Lang, Or Haim Anidjar, Sahar Slonimsky, Chen Hajaj, Oz Golan, Carmel Matias, Alex Diskin, Strokin Evgeny, and Mor Mega

Fatigue & Fracture of Engineering Materials & Structures

Additive manufacturing (AM) has expanded significantly, particularly in aerospace; however, AM materials often have defects that impair fatigue performance. This study examines the geometry and morphology of critical defects in Ti‐6Al‐4V specimens produced using three printing quality settings, followed by hot isostatic pressing (HIP) or heat treatment (HT). We present an automated fatigue failure analysis framework using computer vision and AI to identify critical defects, measure surface proximity, and quantify 14 geometric and morphological features. The model achieved a mean IoU of 0.836 and approximately 10% error in feature measurement. Results show that surface proximity is the most influential factor on fatigue life, with near‐surface defects degrading performance for HT specimens with lack‐of‐fusion (LOF) defects. For HIP specimens, failure sources were typically within 0.16–0.6 mm from the surface. Additionally, for LOF defects, the ‐parameter model achieved with measured cycles to failure.
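
The segmentation score quoted above, intersection over union (IoU), compares a predicted defect mask with a reference mask. The sketch below shows the calculation on tiny binary masks. The masks and function names are hypothetical, and averaging per image is an assumption, since the abstract does not say how the mean was taken.

```python
def intersection_over_union(predicted, reference):
    """IoU of two equally sized binary masks given as nested lists of 0/1."""
    intersection = 0
    union = 0
    for predicted_row, reference_row in zip(predicted, reference):
        for p, r in zip(predicted_row, reference_row):
            if p and r:
                intersection += 1
            if p or r:
                union += 1
    # Two empty masks agree perfectly, so treat that case as IoU = 1.
    return intersection / union if union else 1.0


def mean_iou(mask_pairs):
    """Average IoU over (predicted, reference) pairs, one pair per image."""
    scores = [intersection_over_union(p, r) for p, r in mask_pairs]
    return sum(scores) / len(scores)


# Hypothetical 2x3 masks: 2 pixels overlap out of 4 marked in either mask.
predicted_mask = [[1, 1, 0],
                  [0, 1, 0]]
reference_mask = [[1, 0, 0],
                  [0, 1, 1]]
single_iou = intersection_over_union(predicted_mask, reference_mask)
```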


2024

Phenotypical Characteristics of Nontuberculous Mycobacterial Infection in Patients with Bronchiectasis

2024

Assaf Frajman, Shimon Izhakian, Ori Mekiten, Ori Hadar, Ariel Lichtenstadt, Chen Hajaj, Shon Shchori, Moshe Heching, Dror Rosengarten, and Mordechai R Kramer

Respiratory Research

Background: The global mortality and morbidity rates of bronchiectasis patients due to nontuberculous mycobacteria (NTM) pulmonary infection are on a concerning upward trend. This study aimed to identify the phenotype of NTM-positive individuals with bronchiectasis. Methods: A retrospective single-center observational study was conducted in adult patients with bronchiectasis who underwent bronchoscopy in 2007-2020. Clinical, laboratory, pulmonary function, and radiological data were compared between patients with a positive or negative NTM culture. Results: Compared to the NTM-negative group (n=677), the NTM-positive group (n=94) was characterized (P ≤0.05 for all) by older age, greater proportion of females, and higher rates of gastroesophageal reflux disease and muco-active medication use; lower body mass index, serum albumin level, and lymphocyte and eosinophil counts; lower values of forced expiratory volume in one second, forced vital capacity, and their ratio, and lower diffusing lung capacity for carbon monoxide; higher rates of bronchiectasis in both lungs and upper lobes and higher number of involved lobes; and more exacerbations in the year prior to bronchoscopy. On multivariate analysis, older age (OR 1.05, 95% CI 1.02-1.07, P=0.001), lower body mass index (OR 1.16, 95% CI 1.16-1.07, P <0.001), and increased number of involved lobes (OR 1.26, 95% CI 1.01-1.44, P=0.04) were associated with NTM infection. Conclusions: Patients with bronchiectasis and NTM pulmonary infection are more likely to be older and female with more severe clinical, laboratory, pulmonary function, and radiological parameters than those without NTM infection. This phenotype can be used for screening patients with suspected NTM disease.

Warm Recommendation: Enhancing Cold Start Recommendations Using Multimodal Product Representations

2024

Anat Goldstein, Amit Alony, and Chen Hajaj

International Conference on Information Systems (ICIS)

In the domain of online recommendation systems, the cold-start problem presents a persistent challenge, one that is particularly acute in the fashion industry with its rapid product turnover. Our study introduces a novel two-phase solution to generate recommendations when no prior user-item interactions exist. The first phase generates new item embeddings based on images, text descriptions, and attributes, then identifies the most similar existing item. The second phase utilizes a pre-established item-network to find items frequently purchased with the identified similar item. Our approach also enhances collaborative filtering when user history is available. Preliminary results, based on data from H&M’s store, indicate our method’s enhanced performance, with multimodal embeddings outperforming individual modalities. Furthermore, incorporating our method into a collaborative-filtering algorithm yielded a relative improvement of 7% in hit-rate in an item cold-start scenario. This approach does not require retraining for new items or users, thus offering a promising solution to e-commerce’s prevalent cold-start issue.
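
The two phases described in this abstract can be sketched as a nearest-item lookup followed by a co-purchase lookup. Everything concrete below is hypothetical: the three-dimensional vectors stand in for multimodal embeddings, the dictionary of counts stands in for the item network, and the item names and function names are invented for illustration.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend_for_new_item(new_embedding, catalog, co_purchases, k=2):
    """Return the most similar existing item and its top-k co-purchased items."""
    # Phase 1: find the existing item whose embedding is closest to the new item.
    anchor = max(catalog, key=lambda item: cosine(new_embedding, catalog[item]))
    # Phase 2: rank the items most often bought together with that anchor item.
    neighbours = co_purchases.get(anchor, {})
    ranked = sorted(neighbours, key=neighbours.get, reverse=True)
    return anchor, ranked[:k]


# Hypothetical catalog embeddings and co-purchase counts.
catalog = {
    "red_dress": [0.9, 0.1, 0.0],
    "blue_jeans": [0.1, 0.9, 0.2],
    "sneakers": [0.0, 0.2, 0.9],
}
co_purchases = {
    "red_dress": {"heels": 12, "clutch": 7, "scarf": 3},
    "blue_jeans": {"belt": 9, "t_shirt": 8},
}
anchor_item, suggestions = recommend_for_new_item([0.8, 0.2, 0.1], catalog, co_purchases)
```

Neither phase involves fitting a model to the new item, which matches the abstract's point that no retraining is needed for new items.
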
★ The Art of Time-Bending: Data Augmentation and Early Prediction for Efficient Traffic Classification

2024

Chen Hajaj, Porat Aharon, Ran Dubin, and Amit Dvir

Expert Systems with Applications

Computational efficiency is an important consideration for deploying machine learning models for time series prediction in an online setting. Machine learning algorithms adjust model parameters automatically based on the data, but often require users to set additional parameters, known as hyperparameters. Hyperparameters can significantly impact prediction accuracy. Traffic measurements, typically collected online by sensors, are serially correlated. Moreover, the data distribution may change gradually. A typical adaptation strategy is periodically re-tuning the model hyperparameters, at the cost of computational burden. In this work, we present an efficient and principled online hyperparameter optimization algorithm for Kernel Ridge regression applied to traffic prediction problems. In tests with real traffic measurement data, our approach requires as little as one-seventh of the computation time of other tuning methods, while achieving better or similar prediction accuracy.

DOI
CBR–Boosting Adaptive Classification By Retrieval of Encrypted Network Traffic with Out-of-Distribution

2024

Amir Lukach, Ran Dubin, Amit Dvir, and Chen Hajaj

arXiv preprint arXiv:2403.11206

Encrypted network traffic classification has been approached in different ways and with different goals. One common approach uses Machine Learning or Deep Learning-based solutions over a fixed number of classes, leading to misclassification when an unknown class is given as input. One solution for handling unknown classes is to retrain the model; however, retraining models every time they become obsolete is both resource- and time-consuming. Therefore, there is a growing need for classification models that detect and adapt to new classes dynamically, without retraining, using few-shot learning [1]. In this paper, we introduce Adaptive Classification By Retrieval (CBR), a novel approach for encrypted network traffic classification. Our new approach is based on an ANN-based method, which allows us to effectively identify new and existing classes without retraining the model. The novel approach is simple yet effective: it achieved results similar to RF, with up to a 5% difference (usually less) in the classification tasks, and showed a slight decrease for new samples (from new classes) without retraining. To summarize, the new method classifies in real time and can handle new classes without retraining. Furthermore, our solution can be used as a complementary solution alongside RF or any other machine/deep learning classification method, as an aggregated solution.
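The idea of classifying by retrieval rather than by a fixed output layer can be sketched as follows. This is a rough illustration, not the paper's method: the class name `RetrievalClassifier`, the distance threshold, the 2-dimensional toy vectors, and the traffic labels are all hypothetical, and an exact brute-force nearest-neighbour search stands in for whatever ANN-based index the authors use.

```python
import math

class RetrievalClassifier:
    """Nearest-neighbour lookup over labelled flow embeddings (illustrative only)."""

    def __init__(self, threshold):
        # Assumed rule: anything farther than this from every stored example is "unknown".
        self.threshold = threshold
        self.index = []  # list of (vector, label) pairs

    def add(self, vector, label):
        """Register one labelled example; no model is retrained."""
        self.index.append((tuple(vector), label))

    def predict(self, vector):
        """Return the label of the closest stored example, or "unknown" if it is too far."""
        if not self.index:
            return "unknown"
        dist, label = min((math.dist(v, vector), lbl) for v, lbl in self.index)
        return label if dist <= self.threshold else "unknown"

clf = RetrievalClassifier(threshold=1.0)
clf.add([0.0, 0.0], "video")
clf.add([5.0, 5.0], "chat")

before = clf.predict([10.0, 0.0])   # far from both known classes
clf.add([10.0, 0.5], "voip")        # a single example of a new class
after = clf.predict([10.0, 0.0])    # now retrieved as the new class
print(before, after)
```

Adding a class is just an insertion into the index, which is why this style of classifier can pick up new classes from a few examples without retraining.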
Exploring the Role of Sentiment in Tutor-Student Interaction. The Case Study of CS and Architecture Formative Studio Critiques

2024

Hadas Sopher, Chen Hajaj, and John S Gero

2024 IEEE Frontiers in Education Conference (FIE)

This work-in-progress research presents a novel approach to assessing tutor-student interactions in design studio critiques, thereby advancing studio assessment methodologies. Following constructivist theories, students learn by practicing professional design behaviors. They introduce new concepts and progress according to the tutor’s feedback and demonstration. Despite their principal role, critique interactions are often tutor-centric and ambiguous, resulting in negative sentiments that may hinder participation. Current methods measure solely learners’ performance of design practices while neglecting the cognitive actions involved, such as the learner’s sentiment. This gap limits assessing the effectiveness of the interaction. In this study, we measure the effect of sentiment on introducing new design concepts, identified as first occurrences (FOs). We demonstrate the approach in two natural case studies comprising six critiques of three CS students and three Architecture students. NLP algorithms are developed to quantify the distribution of FOs generated by students and tutors within positive and negative sentiment environments. Changes in the sentiment throughout critiques measure the temporal role of sentiment in generating FOs. Observations exhibit differences in sentiment between CS and Architecture students. In both courses, more learner-centric FOs occurred in a positive sentiment environment. Using explicit assessment methods opens new possibilities for establishing real-time feedback systems, leading studio-based education to its next step in reshaping constructivist pedagogy.

DOI
Few-Shot API Attack Detection: Overcoming Data Scarcity with GAN-Inspired Learning

2024

Udi Aharon, Revital Marbel, Ran Dubin, Amit Dvir, and Chen Hajaj

arXiv preprint arXiv:2405.11258

Web applications and APIs face constant threats from malicious actors seeking to exploit vulnerabilities for illicit gains. To defend against these threats, it is essential to have anomaly detection systems that can identify a variety of malicious behaviors. However, a significant challenge in this area is the limited availability of training data. Existing datasets often do not provide sufficient coverage of the diverse API structures, parameter formats, and usage patterns encountered in real-world scenarios. As a result, models trained on these datasets often struggle to generalize and may fail to detect less common or emerging attack vectors. To enhance detection accuracy and robustness, it is crucial to access larger and more representative datasets that capture the true variability of API traffic. To address this, we introduce a GAN-inspired learning framework that extends limited API traffic datasets through targeted, domain-aware synthesis. Drawing on techniques from Natural Language Processing (NLP), our approach leverages Transformer-based architectures, particularly RoBERTa, to enhance the contextual representation of API requests and generate realistic synthetic samples aligned with security-specific semantics. We evaluate our framework on two benchmark datasets, CSIC 2010 and ATRDF 2023, and compare it with a previous data augmentation technique to assess the importance of domain-specific synthesis. In addition, we apply our augmented data to various anomaly detection models to evaluate its impact on classification performance. Our method achieves up to a 4.94% increase in F1 score on CSIC 2010 and up to 21.10% on ATRDF 2023. The source code of this work is available online.
Extending Limited Datasets with GAN-Like Self-Supervision for SMS Spam Detection

2024

Or Haim Anidjar, Revital Marbel, Ran Dubin, Amit Dvir, and Chen Hajaj

Computers & Security

DOI
OSF-EIMTC: An Open-Source Framework for Standardized Encrypted Internet Traffic Classification

2024

Ofek Bader, Adi Lichy, Amit Dvir, Ran Dubin, and Chen Hajaj

Computer Communications

Internet traffic classification plays a key role in network visibility, Quality of Service (QoS), intrusion detection, Quality of Experience (QoE) and traffic-trend analyses. In order to improve privacy, integrity, confidentiality, and protocol obfuscation, current traffic is based on encryption protocols, e.g., SSL/TLS. With the increased use of Machine-Learning (ML) and Deep-Learning (DL) models in the literature, comparison between different models and methods has become cumbersome and difficult due to the lack of a standardized framework. In this paper, we propose an open-source framework, named OSF-EIMTC, which can provide the full pipeline of the learning process. From the well-known datasets to extracting new and well-known features, it provides implementations of well-known ML and DL models (from the traffic classification literature) as well as evaluations. Such a framework can facilitate research in traffic classification domains, so that it will be more repeatable, reproducible, easier to execute, and will allow a more accurate comparison of well-known and novel features and models.

DOI
Hidden in Time, Revealed in Frequency: Spectral Features and Multiresolution Analysis for Encrypted Internet Traffic Classification

2024

Nathan Dillbary, Roi Yozevitch, Amit Dvir, Ran Dubin, and Chen Hajaj

2024 IEEE 21st Consumer Communications & Networking Conference (CCNC)

In recent years, privacy and security concerns have led to the wide adoption of encrypted protocols, making encrypted traffic a major portion of overall communications online. The transition into more secure protocols poses significant challenges for internet service providers to utilize traditional traffic classification techniques in order to guarantee the Quality of Service (QoS), Quality of Experience (QoE), and cyber-security of their customers. In this work, we introduce two methods, namely STFT-TC and DWT-TC, leveraging compact time-series representation coupled with well-known techniques from the field of Digital Signal Processing (DSP): the short-time Fourier transform (STFT) and the discrete wavelet transform (DWT). The STFT-TC method extracts a suite of statistical and spectral features from the magnitude spectrogram, offering a fresh perspective on interpreting and classifying encrypted traffic. The DWT-TC method extracts statistical features from the wavelet coefficients and incorporates unique characteristics that describe the signal’s shape and energy distribution. Evaluating our methods on two public QUIC datasets demonstrated improvements in accuracy of up to 5.7%. Similarly, the F1-scores also showed enhancements, with increments of up to 5.9% for the same datasets.
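To make the STFT-based feature idea above concrete, here is a minimal pure-Python sketch. It is an illustration under assumptions, not the STFT-TC pipeline itself: the function names, the Hann window, the frame length, the hop size, and the toy input series are all hypothetical choices, and the two features shown (peak bin and spectral centroid) are just common examples of spectral features.

```python
import cmath
import math

def stft_magnitude(signal, frame_len, hop):
    """Magnitude spectrogram: one list of |DFT| values (bins 0..frame_len/2) per frame."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        # Hann window to reduce spectral leakage at the frame edges.
        windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1)))
                    for n, x in enumerate(frame)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            coeff = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n, x in enumerate(windowed))
            spectrum.append(abs(coeff))
        frames.append(spectrum)
    return frames

def spectral_centroid(spectrum):
    """Magnitude-weighted mean frequency bin of one frame."""
    total = sum(spectrum)
    return sum(k * m for k, m in enumerate(spectrum)) / total if total else 0.0

# Toy time series standing in for a per-flow signal (e.g. packet sizes over time):
# a pure oscillation that completes 2 cycles every 8 samples.
series = [math.cos(2 * math.pi * 2 * n / 8) for n in range(16)]
spectrogram = stft_magnitude(series, frame_len=8, hop=4)
peak_bins = [max(range(len(s)), key=s.__getitem__) for s in spectrogram]
centroids = [spectral_centroid(s) for s in spectrogram]
print(peak_bins, centroids)
```

Per-frame values like these would then be summarised (for example by mean and variance across frames) into a fixed-length feature vector for a classifier; the DWT-based variant is not shown here.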

DOI
★ Measuring Flight-Destination Similarity: A Multidimensional Approach

2024

Anat Goldstein, and Chen Hajaj

Expert Systems with Applications

E-tourism websites offer users a vast array of travel destinations and opportunities, necessitating tools that enable destination comparison and intelligent search capabilities. One key requirement for such tools is the ability to measure the similarity between destinations. Over the years, various similarity measurement techniques have been proposed, including user-based and content-based approaches. However, many of these techniques require data preparation or prior domain knowledge from experts. In contrast, this study proposes an innovative approach that requires no prior domain knowledge of flight destinations or their relationships, and utilizes only readily available data. Our approach draws upon concepts from image recognition and natural language processing (NLP) to extract hidden aspects of destinations. Using data from a flight-search website as a testbed, we analyze similarity metrics based on state-of-the-art methods for image recognition, NLP, and product-network analysis. We then compare these metrics to those obtained by human subjects. Our findings suggest that no single method dominates in all aspects, leading us to propose a hybrid method that leverages the strengths of each. The proposed method can be readily applied to measure product similarity in other domains.

DOI
Revolutionizing Our Way to Better Classifiers: Leveraging Synthetic Data with Generative Models for Encrypted Network Traffic Classification

2024

Yehonatan Zion, Chen Hajaj, Amit Dvir, Gil Ben-Artzi, Shahar Mahpod, and Ran Dubin

Available at SSRN 4654236

2023

Speech and Multilingual Natural Language Framework for Speaker Change Detection and Diarization

2023

Or Haim Anidjar, Yannick Estève, Chen Hajaj, Amit Dvir, and Itshak Lapidot

Expert Systems with Applications

Pretrained multilingual language models have become a common tool in transferring NLP capabilities to low-resource languages, often with adaptations. In this work, we study the performance, extensibility, and interaction of two such adaptations: vocabulary augmentation and script transliteration. Our evaluations on part-of-speech tagging, universal dependency parsing, and named entity recognition in nine diverse low-resource languages uphold the viability of these approaches while raising new questions around how to optimally adapt multilingual models to low-resource settings.

DOI
Online Temporary Learning Groups in Higher Education–Interactions, Compensation, and Maximisation of Achievements in an Israeli Case Study

2023

Uzi Ben-Shalom, Chen Hajaj, Nitza Davidovitch, and Corinne Berger

Journal of Education Culture and Society

Thesis. This article provides an analysis of online social interactions in two online Temporary Learning Groups (TLG) and their correlates with both pre-admissions scores and academic achievements. Concept. The effect of Social Networking Systems (SNS) use on academic achievements is most often indirectly assessed through surveying attitudes of students and teachers. Contrary to this approach, we directly assessed the content on a TLG and paired it with objective admission scores and academic achievements. Results and conclusion. The results reveal that the content of the discussions on the TLGs is practical, immediate, and focuses on the allocation of information required for academic achievements. The users of the TLGs are usually students with lower admission scores and academic achievements. They use these platforms as a compensating mechanism to improve their achievements. In addition, some of the TLG users serve as maximising agents of other students’ achievements. TLGs’ implications for teaching, class-attendance and level of schooling must be recognised by teachers. Originality. While researchers focus on the presence of SNSs in class and their hampering of schooling through multitasking, the effect of TLG activity must also be addressed.

DOI
Breaking the Structure of MaMaDroid

2023

Harel Berger, Amit Dvir, Enrico Mariconti, and Chen Hajaj

Expert Systems with Applications

The rise in popularity of the Android platform has resulted in an explosion of malware threats targeting it. As both Android malware and the operating system itself constantly evolve, it is very challenging to design robust malware mitigation techniques that can operate for long periods of time without the need for modifications or costly re-training. In this paper, we present MaMaDroid, an Android malware detection system that relies on app behavior. MaMaDroid builds a behavioral model, in the form of a Markov chain, from the sequence of abstracted API calls performed by an app, and uses it to extract features and perform classification. By abstracting calls to their packages or families, MaMaDroid maintains resilience to API changes and keeps the feature set size manageable. We evaluate its accuracy on a dataset of 8.5K benign and 35.5K malicious apps collected over a period of six years, showing that it not only effectively detects malware (with up to 99% F-measure), but also that the model built by the system keeps its detection capabilities for long periods of time (on average, 86% and 75% F-measure, respectively, one and two years after training). Finally, we compare against DroidAPIMiner, a state-of-the-art system that relies on the frequency of API calls performed by apps, showing that MaMaDroid significantly outperforms it.
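The Markov-chain feature extraction described above can be sketched in a few lines. This is a simplified illustration: it assumes a single linear sequence of already-abstracted calls, whereas the abstract only says the chain is built from "the sequence of abstracted API calls", and the state names, the function name `markov_features`, and the example trace are hypothetical.

```python
from collections import Counter
from itertools import product

def markov_features(calls, states):
    """Flatten the transition-probability matrix of a call sequence into a feature vector.

    Entry (src, dst) is P(next call is dst | current call is src), estimated from
    counts of consecutive pairs; rows for states never left are all zeros.
    """
    pairs = list(zip(calls, calls[1:]))
    pair_counts = Counter(pairs)
    out_totals = Counter(src for src, _ in pairs)
    features = []
    for src, dst in product(states, repeat=2):
        total = out_totals[src]
        features.append(pair_counts[(src, dst)] / total if total else 0.0)
    return features

# Hypothetical abstraction of API calls to three package families.
states = ["android", "java", "self-defined"]
trace = ["java", "android", "java", "java", "android", "self-defined"]
features = markov_features(trace, states)
print(features)
```

The vector length depends only on the number of abstract states (here 3 x 3 = 9), not on the concrete API methods an app calls, which is what keeps the feature set manageable and tolerant of API changes.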

DOI
Detecting Parallel Covert Data Transmission Channels in Video Conferencing Using Machine Learning

2023

Ofir Joseph, Avshalom Elmalech, and Chen Hajaj

Electronics

Covert communication channels are a concept in which a policy-breaking method is used in order to covertly transmit data from inside an organization to an external or accessible point. VoIP and Video systems are exposed to such attacks on different layers, such as the underlying real-time transport protocol (RTP) which uses Transmission Control Protocol (TCP) or User Datagram Protocol (UDP) packet streams to punch a hole through Network address translation (NAT). This paper presents different innovative attack methods utilizing covert communication and RTP channels to spread malware or to create a data leak channel between different organizations. The demonstrated attacks are based on a UDP punch hole created using Skype peer-to-peer video conferencing communication. The different attack methods were successfully able to transmit a small text file in an undetectable manner by observing the communication channel, and without causing interruption to the audio/video channels or creating a noticeable disturbance to the quality. While these attacks are hard to detect by the eye, we show that applying classical Machine Learning algorithms to detect these covert channels on statistical features sampled from the communication channel is effective for one type of attack.

DOI
When a RF Beats a CNN and GRU, Together—A Comparison of Deep Learning and Classical Machine Learning Approaches for Encrypted Malware Traffic Classification

2023

Adi Lichy, Ofek Bader, Ran Dubin, Amit Dvir, and Chen Hajaj

Computers & Security

Internet traffic classification is widely used to facilitate network management. It plays a crucial role in Quality of Service (QoS), Quality of Experience (QoE), network visibility, intrusion detection, and traffic trend analyses. While there is no theoretical guarantee that deep learning (DL)-based solutions perform better than classic machine learning (ML)-based ones, DL-based models have become the common default. This paper compares well-known DL-based and ML-based models and shows that in the case of malicious traffic classification, state-of-the-art DL-based solutions do not necessarily outperform the classical ML-based ones. We exemplify this finding using two well-known datasets for a varied set of tasks, such as malware detection, malware family classification, detection of zero-day attacks, and classification of an iteratively growing dataset. Note that it is not feasible to evaluate all possible models to make a concrete statement; thus, the above finding is not a recommendation to avoid DL-based models, but rather empirical proof that in some cases there are simpler solutions that may perform even better.

DOI
Protein Intake and Clinical Outcomes of Enterally Fed Critically Ill Patients

2023

O Raphaeli, L Statlender, C Hajaj, I Bendavid, and Pierre Singer

Clinical Nutrition ESPEN

DOI
Using Machine-Learning to Assess the Prognostic Value of Early Enteral Feeding Intolerance in Critically Ill Patients: A Retrospective Study

2023

Orit Raphaeli, Liran Statlender, Chen Hajaj, Itai Bendavid, Anat Goldstein, Eyal Robinson, and Pierre Singer

Nutrients

Background: The association between gastrointestinal intolerance during early enteral nutrition (EN) and adverse clinical outcomes in critically ill patients is controversial. We aimed to assess the prognostic value of enteral feeding intolerance (EFI) markers during early ICU stays and to predict early EN failure using a machine learning (ML) approach. Methods: We performed a retrospective analysis of data from adult patients who were admitted to Beilinson Hospital ICU between January 2011 and December 2018 for more than 48 h and received EN. Clinical data, including demographics, severity scores, EFI markers, and medications, along with 72 h after admission, were analyzed by ML algorithms. Prediction performance was assessed by the area under the receiver operating characteristic curve (AUCROC) of a ten-fold cross-validation set. Results: The datasets comprised 1584 patients. The means of the cross-validation AUCROCs for 90-day mortality and early EN failure were 0.73 (95% CI 0.71–0.75) and 0.71 (95% CI 0.67–0.74), respectively. Gastric residual volume above 250 mL on the second day was an important component of both prediction models. Conclusions: ML underlined the EFI markers that predict poor 90-day outcomes and early EN failure and supports early recognition of at-risk patients. Results have to be confirmed in further prospective and external validation studies.

DOI

2022

SimCSE for Encrypted Traffic Detection and Zero-Day Attack Detection

2022

Rotem Bar, and Chen Hajaj

IEEE Access

Traffic detection has attracted much attention in recent years, playing an essential role in intrusion detection systems (IDS). This paper proposes a new approach for traffic detection at the packet level, inspired by natural language processing (NLP), using simple contrastive learning of sentence embeddings (SimCSE) as an embedding model. The new approach can learn the features of traffic from raw packet data. Experiments were conducted on two well-known datasets to evaluate our approach. For detecting malicious activity, our model achieved an accuracy of 99.99% on the USTC-TFC2016 dataset, whereas for detecting virtual private network (VPN) activity, our model achieved an accuracy of 99.98% on the ISCXVPN2016 dataset. Furthermore, the resulting model was found to be robust based on zero-day attack detection, which shows the model’s ability to detect attacks that have not been seen before. Experiments show that our approach can effectively detect network traffic and outperforms many other state-of-the-art methods.

DOI
MaMaDroid2.0–The Holes of Control Flow Graphs

2022

Harel Berger, Chen Hajaj, Enrico Mariconti, and Amit Dvir

arXiv preprint arXiv:2202.13922

Android malware is a continuously expanding threat to billions of mobile users around the globe. Detection systems are updated constantly to address these threats. However, a backlash takes the form of evasion attacks, in which an adversary changes malicious samples such that those samples will be misclassified as benign. This paper fully inspects a well-known Android malware detection system, MaMaDroid, which analyzes the control flow graph of the application. Changes to the portion of benign samples in the train set and models are considered to see their effect on the classifier. The changes in the ratio between benign and malicious samples have a clear effect on each one of the models, resulting in a decrease of more than 40% in their detection rate. Moreover, adopted ML models are implemented as well, including 5-NN, Decision Tree, and Adaboost. Exploration of the six models reveals a typical behavior in different cases, of tree-based models and distance-based models. Moreover, three novel attacks that manipulate the CFG and their detection rates are described for each one of the targeted models. The attacks decrease the detection rate of most of the models to 0%, with regards to different ratios of benign to malicious apps. As a result, a new version of MaMaDroid is engineered. This model fuses the CFG of the app and static analysis of features of the app. This improved model is proved to be robust against evasion attacks targeting both CFG-based models and static analysis models, achieving a detection rate of more than 90% against each one of the attacks.
Problem-Space Evasion Attacks in the Android OS: A Survey

2022

Harel Berger, Chen Hajaj, and Amit Dvir

arXiv preprint arXiv:2205.14576

Android is the most popular OS worldwide. Therefore, it is a target for various kinds of malware. As a countermeasure, the security community works day and night to develop appropriate Android malware detection systems, with ML-based or DL-based systems considered as some of the most common types. Against these detection systems, intelligent adversaries develop a wide set of evasion attacks, in which an attacker slightly modifies a malware sample to evade its target detection system. In this survey, we address problem-space evasion attacks in the Android OS, where attackers manipulate actual APKs, rather than their extracted feature vector. We aim to explore this kind of attack, which is frequently overlooked by the research community due to a lack of knowledge of the Android domain, or due to a focus on general mathematical evasion attacks, i.e., feature-space evasion attacks. We discuss the different aspects of problem-space evasion attacks, using a new taxonomy, which focuses on key ingredients of each problem-space attack, such as the attacker model, the attacker’s mode of operation, and the functional assessment of post-attack applications.
Do You Think You Can Hold Me? The Real Challenge of Problem-Space Evasion Attacks

2022

Harel Berger, Amit Dvir, Chen Hajaj, and Rony Ronen

arXiv preprint arXiv:2205.04293

Given the continually rising frequency of cyberattacks, the adoption of artificial intelligence methods, particularly Machine Learning (ML), Deep Learning (DL), and Reinforcement Learning (RL), has become essential in the realm of cybersecurity. These techniques have proven to be effective in detecting and mitigating cyberattacks, which can cause significant harm to individuals, organizations, and even countries. Machine learning algorithms use statistical methods to identify patterns and anomalies in large datasets, enabling security analysts to detect previously unknown threats. Deep learning, a subfield of ML, has shown great potential in improving the accuracy and efficiency of cybersecurity systems, particularly in image and speech recognition. On the other hand, RL is again a subfield of machine learning that trains algorithms to learn through trial and error, making it particularly effective in dynamic environments. We also evaluated the usage of ChatGPT-like AI tools in cyber-related problem domains on both sides, positive and negative. This article provides an overview of how ML, DL, and RL are applied in cybersecurity, including their usage in malware detection, intrusion detection, vulnerability assessment, and other areas. The state-of-the-art studies using ML, DL, and RL models are evaluated in each section based on the main idea, techniques, and important findings. It also discusses these techniques’ challenges and limitations, including data quality, interpretability, and adversarial attacks. Overall, the use of ML, DL, and RL in cybersecurity holds great promise for improving the effectiveness of security systems and enhancing our ability to protect against cyberattacks. However, it is essential to continue developing and refining these techniques to address the ever-evolving nature of cyber threats. 
Besides, some promising solutions that rely on machine learning, deep learning, and reinforcement learning are susceptible to adversarial attacks, underscoring the importance of factoring in this vulnerability when devising countermeasures against sophisticated cyber threats. We also concluded that ChatGPT can be a valuable tool for cybersecurity, but it should be noted that ChatGPT-like tools can also be manipulated to threaten the integrity, confidentiality, and availability of data.
The Hidden Conversion Funnel of Mobile vs. Desktop Consumers

2022

Anat Goldstein, and Chen Hajaj

Electronic Commerce Research and Applications

DOI
Less Is More: Robust and Novel Features for Malicious Domain Detection

2022

Chen Hajaj, Nitay Hason, and Amit Dvir

Electronics

Malicious domains are increasingly common and pose a severe cybersecurity threat. Specifically, many types of current cyber attacks use URLs for attack communications (e.g., C&C, phishing, and spear-phishing). Despite the continuous progress in detecting cyber attacks, there are still critical weak spots in the structure of defense mechanisms. Since machine learning has become one of the most prominent malware detection methods, a robust feature selection mechanism is proposed that results in malicious domain detection models that are resistant to evasion attacks. This mechanism exhibits a high performance based on empirical data. This paper makes two main contributions: First, it provides an analysis of robust feature selection based on widely used features in the literature. Note that even though the feature set dimensional space is cut by half, the performance of the classifier is still improved (an increase in the model’s F1-score from 92.92% to 95.81%). Second, it introduces novel features that are robust with regard to the adversary’s manipulation. Based on an extensive evaluation of the different feature sets and commonly used classification models, this paper shows that models based on robust features are resistant to malicious perturbations and concurrently are helpful in classifying non-manipulated data.

DOI
MalDIST: From Encrypted Traffic Classification to Malware Traffic Detection and Classification

2022

Ofek Bader, Adi Lichy, Chen Hajaj, Ran Dubin, and Amit Dvir

2022 IEEE 19th annual consumer communications & networking conference (CCNC)

The world of malware is shifting towards using encrypted traffic. While encryption improves the privacy of users, it brings challenges in the fields of QoS, QoE, and cybersecurity. Recent state-of-the-art Deep-Learning architectures for encrypted traffic classifications demonstrated superb results in tasks of traffic categorization over encrypted traffic. In this paper, we leverage the feasibility to use such architectures for the tasks of malware detection and classification to gain insights into how well these architectures perform in the domain of malware traffic. Specifically, we present a Deep-Learning model for malware traffic detection and classification (MalDIST), which outperforms both classical ML and DL malware traffic classification models both in terms of detection and classification.

DOI
Using the Cardio-Vascular Index (CVRI) to Predict Mortality in Septic Shock

2022

O Raphaeli, I Bendavid, C Hajaj, L Statlender, A Goldstein, E Chen, P Singer, and U Gabbay

ISICEM

2021

Crystal Ball: From Innovative Attacks to Attack Effectiveness Classifier

2021

Harel Berger, Chen Hajaj, Enrico Mariconti, and Amit Dvir

IEEE Access

Android OS is one of the most popular operating systems worldwide, making it a desirable target for malware attacks. Some of the latest and most important defensive systems are based on machine learning (ML), and cybercriminals continuously search for ways to overcome the barriers posed by these systems. Thus, the focus of this work is on evasion attacks, in an attempt to show the weaknesses of state-of-the-art research and how more resilient systems can be built. Evasion attacks consist of manipulating either the actual malicious application (problem-based) or its extracted feature vector (feature-based), to avoid being detected by ML systems. This study presents a set of innovative problem-based evasion attacks against well-known Android malware detection systems, which decrease their detection rate by up to 97%. Moreover, an analysis of the effectiveness of these attacks against VirusTotal (VT) scanners was conducted, empirically showing their efficiency against well-known scanners (e.g., McAfee and Comodo) as well. The VT system proved to be a great candidate for the attacks, as in 98% of the apps, fewer scanners detected the manipulated apps than the original malicious apps. As not all the attacks are equally effective against the VT scanners, attack efficiency classifiers are proposed. Each classifier predicts the applicability of one of the attacks. The set of classifiers creates an ensemble, which shows high success rates, allowing the attacker to decide which attack is best to use for each malicious app and defense system.

DOI
Robust Coordination in Adversarial Social Networks: From Human Behavior to Agent-Based Modeling

2021

Chen Hajaj, Zlatko Joveski, Sixie Yu, and Yevgeniy Vorobeychik

Network Science

Decentralized coordination is one of the fundamental challenges for societies and organizations. While extensively explored from a variety of perspectives, one issue that has received limited attention is human coordination in the presence of adversarial agents. We study this problem by situating human subjects as nodes on a network, and endowing each with a role, either regular (with the goal of achieving consensus among all regular players), or adversarial (aiming to prevent consensus among regular players). We show that adversarial nodes are, indeed, quite successful in preventing consensus. However, we demonstrate that having the ability to communicate among network neighbors can considerably improve coordination success, as well as resilience to adversarial nodes. Our analysis of communication suggests that adversarial nodes attempt to exploit this capability for their ends, but do so in a somewhat limited way, perhaps to prevent regular nodes from recognizing their intent. In addition, we show that the presence of trusted nodes generally has limited value, but does help when many adversarial nodes are present, and players can communicate. Finally, we use experimental data to develop computational models of human behavior and explore additional parametric variations: features of network topologies and densities, and placement, all using the resulting data-driven agent-based (DDAB) model.

DOI
Prediction Model for the Spread of the COVID-19 Outbreak in the Global Environment

2021

Ron S Hirschprung and Chen Hajaj

Heliyon

COVID-19 has long since become a worldwide pandemic. It is responsible for the death of over two million people and has caused an economic recession. This paper studies the spread pattern of COVID-19, aiming to establish a prediction model for this event. We harness Data Mining and Machine Learning methodologies to train regression models to predict the number of confirmed cases in a spatial-temporal space. We introduce an innovative concept, the Center of Infection Mass (CoIM), adapted from the field of physics. We empirically evaluated our model on western European countries, based on the CoIM index and other features, and showed that a relatively highly accurate prediction of the spread can be obtained. Our contribution is twofold: first, we introduced a prediction methodology and proved empirically that a prediction can be made even for a range of over a month; second, we showed promise in adopting the CoIM index in prediction models, as models that adopt the CoIM yield significantly better results than those that discard it. By applying our model, and better controlling the inherent tradeoff between saving lives and the economy, we believe that decision-makers can take close to optimal measures. Thus, this methodology may contribute to public welfare.

DOI
A Thousand Words Are Worth More Than One Recording: Word-Embedding Based Speaker Change Detection

2021

Or Haim Anidjar, Itshak Lapidot, Chen Hajaj, and Amit Dvir

Interspeech

Speaker Change Detection (SCD) is the task of segmenting an input audio recording according to speaker interchanges. This task is essential for many applications, such as automatic voice transcription or Speaker Diarization (SD). This paper focuses on the essential task of audio segmentation and suggests a word-embedding-based solution for the SCD problem. Moreover, we show how to use our approach to outperform voice-based solutions for the SD problem. We empirically show that our method can accurately identify the speaker turns in an audio recording, with 82.12% and 89.02% success in the Recall and F1-score measures.

DOI
★ Hybrid Speech and Text Analysis Methods for Speaker Change Detection

2021

Or Haim Anidjar, Itshak Lapidot, Chen Hajaj, Amit Dvir, and Issachar Gilad

IEEE/ACM Transactions on Audio, Speech, and Language Processing

Speaker Change Detection (SCD) is the task of segmenting an input audio recording according to speaker interchanges. Nowadays, many applications, such as Speaker Diarization (SD) or automatic vocal transcription, depend on this segmentation task. In this paper, we focus on the essential task of the SD problem, the audio segmenting process, and suggest a solution for the SCD problem, as well as for the assignment of clustered speaker labels to the extracted segments, and apply the solution to two datasets: a commercial dataset in Hebrew and the ICSI Meeting Corpus. As such, we propose a hybrid framework for the SCD problem that is learned from textual information, speech signals, and the meta-data features that can be extracted from them. Moreover, we demonstrate the negative correlation between an increase in the number of speakers in the training dataset and the overall diarization system’s performance, which is improved using our efficient SCD component. Finally, we show how our proposed hybrid framework remains robust compared to the ICSI Meeting Corpus, as the experimental evaluation’s training and testing are based on two languages.

DOI
PCL: Packet Classification with Limited Knowledge

2021

Vitalii Demianiuk, Chen Hajaj, and Kirill Kogan

IEEE INFOCOM 2021-IEEE Conference on Computer Communications

We introduce a novel representation of packet classifiers that allows them to operate on partially available, dynamically varying input data. Given a packet classifier, the availability of fields or the complexity of field computations, and free target-specific resources, the proposed infrastructure computes a classifier representation satisfying performance and robustness requirements. We show the feasibility of reconstructing a classification result in this noisy environment, allowing for improved performance and additional levels of robustness in network infrastructure. Our results are supported by extensive evaluations in various settings where only a partial input is available.

DOI
Feeding Intolerance as a Predictor of Clinical Outcomes in Critically Ill Patients: A Machine Learning Approach

2021

O Raphaeli, C Hajaj, I Bendavid, A Goldstein, E Chen, and P Singer

Clinical Nutrition ESPEN

DOI
Using machine learning to support early prediction of feeding intolerance in critically ill patients

2021

O Raphaeli, C Hajaj, I Bendavid, A Goldstein, E Chen, and P Singer

ESICM LIVES

BACKGROUND: The association between gastrointestinal intolerance during early enteral nutrition (EN) and adverse clinical outcomes in critically ill patients is controversial. We aimed to assess the prognostic value of enteral feeding intolerance (EFI) markers during early ICU stays and to predict early EN failure using a machine learning (ML) approach. METHODS: We performed a retrospective analysis of data from adult patients who were admitted to the Beilinson Hospital ICU between January 2011 and December 2018 for more than 48 h and received EN. Clinical data from the 72 h after admission, including demographics, severity scores, EFI markers, and medications, were analyzed by ML algorithms. Prediction performance was assessed by the area under the receiver operating characteristic curve (AUCROC) of a ten-fold cross-validation set. RESULTS: The dataset comprised 1584 patients. The means of the cross-validation AUCROCs for 90-day mortality and early EN failure were 0.73 (95% CI 0.71-0.75) and 0.71 (95% CI 0.67-0.74), respectively. Gastric residual volume above 250 mL on the second day was an important component of both prediction models. CONCLUSIONS: ML underlined the EFI markers that predict poor 90-day outcomes and early EN failure, and it supports early recognition of at-risk patients. These results have to be confirmed in further prospective and external validation studies.

2020

Encrypted Video Traffic Clustering Demystified

2020

Amit Dvir, Angelos K Marnerides, Ran Dubin, Nehor Golan, and Chen Hajaj

Computers & Security

Cyber threat intelligence officers and forensics investigators often require the behavioural profiling of groups based on their online video viewing activity. It has been demonstrated that encrypted video traffic can be classified under the assumption of using a known subset of video titles based on temporal video viewing trends of particular groups. Nonetheless, composing such a subset is extremely challenging in real situations. Therefore, this work presents a novel profiling scheme for encrypted video traffic with no a priori assumption of a known subset of titles. It introduces a seminal synergy of Natural Language Processing (NLP) and Deep Encoder-based feature embedding algorithms with refined clustering schemes from off-the-shelf solutions, in order to group viewing profiles with unknown video streams. This study is the first to highlight the most computationally effective and accurate combinations of feature embedding and clustering using real datasets, thereby paving the way to future forensics tools for automated behavioural profiling of malicious actors.

DOI
Robust Machine Learning for Encrypted Traffic Classification

2020

Jonathan Muehlstein, Yehonatan Zion, Ofir Pele, Chen Hajaj, Ran Dubin, and Amit Dvir

CoRR

Desktops and laptops can be maliciously exploited to violate privacy. In this paper, we consider the daily battle between a passive attacker who is targeting a specific user and a user who may be an adversarial opponent. In this scenario, the attacker tries to choose the best attack vector by surreptitiously monitoring the victim’s encrypted network traffic in order to identify user parameters such as the Operating System (OS), browser, and apps, while the user may use tools such as a Virtual Private Network (VPN) or even change protocol parameters to protect his/her privacy. We provide a large dataset of more than 20,000 examples for this task. We run a comprehensive set of experiments that achieves high (above 85%) classification accuracy, robustness, and resilience to changes of features as a function of different network conditions at test time. We also show the effect of a small training set on the accuracy.
Evasion Is Not Enough: A Case Study of Android Malware

2020

Harel Berger, Chen Hajaj, and Amit Dvir

International symposium on cyber security cryptography and machine learning
Robust Malicious Domain Detection

2020

Nitay Hason, Amit Dvir, and Chen Hajaj

Cyber Security Cryptography and Machine Learning: Fourth International Symposium, CSCML 2020, Be’er Sheva, Israel, July 2–3, 2020, Proceedings 4

The Domain Name System (DNS) is a crucial part of the Internet, yet it has been widely exploited by cyber attackers. Apart from making static methods like blacklists or sinkholes infeasible, some evasive attackers can even bypass detection systems with machine learning based classifiers. As a solution to this problem, we propose a robust domain detection system named HinDom. Instead of relying on manually selected features, HinDom models the DNS scene as a Heterogeneous Information Network (HIN) consisting of clients, domains, IP addresses, and their diverse relationships. In addition, the metapath-based transductive classification method enables HinDom to detect malicious domains with only a small fraction of labeled samples. To the best of our knowledge, this is the first work to apply HIN in DNS analysis. We build a prototype of HinDom and evaluate it in CERNET2 and TUNET. The results reveal that HinDom is accurate and robust, and that it can identify previously unknown malicious domains.

2019

Adversarial Coordination on Social Networks

2019

Chen Hajaj, Sixie Yu, Zlatko Joveski, and Yevgeniy Vorobeychik

Proceedings of the 18th International Conference on Autonomous Agents and Multiagent Systems

Extensive literature exists studying decentralized coordination and consensus, with considerable attention devoted to ensuring robustness to faults and attacks. However, most of the latter literature assumes that non-malicious agents follow simple stylized rules. In reality, decentralized protocols often involve humans, and understanding how people coordinate in adversarial settings is an open problem. We initiate a study of this problem, starting with a human subjects investigation of human coordination on networks in the presence of adversarial agents, and subsequently using the resulting data to bootstrap the development of a credible agent-based model of adversarial decentralized coordination. In human subjects experiments, we observe that while adversarial nodes can successfully prevent consensus, the ability to communicate can significantly improve robustness, with the impact particularly significant in scale-free networks. On the other hand, and contrary to typical stylized models of behavior, we show that the existence of trusted nodes has limited utility. Next, we use the data collected in human subject experiments to develop a data-driven agent-based model of adversarial coordination. We show that this model successfully reproduces observed behavior in experiments, is robust to small errors in individual agent models, and illustrate its utility by using it to explore the impact of optimizing network location of trusted and adversarial nodes.

DOI
★ Improving Robustness of ML Classifiers Against Realizable Evasion Attacks Using Conserved Features

2019

Liang Tong, Bo Li, Chen Hajaj, Chaowei Xiao, Ning Zhang, and Yevgeniy Vorobeychik

28th USENIX Security Symposium (USENIX Security 19)

Machine learning (ML) techniques are increasingly common in security applications, such as malware and intrusion detection. However, ML models are often susceptible to evasion attacks, in which an adversary makes changes to the input (such as malware) in order to avoid being detected. A conventional approach to evaluate ML robustness to such attacks, as well as to design robust ML, is by considering simplified feature-space models of attacks, where the attacker changes ML features directly to effect evasion, while minimizing or constraining the magnitude of this change. We investigate the effectiveness of this approach to designing robust ML in the face of attacks that can be realized in actual malware (realizable attacks). We demonstrate that in the context of structure-based PDF malware detection, such techniques appear to have limited effectiveness, but they are effective with content-based detectors. In either case, we show that augmenting the feature space models with conserved features (those that cannot be unilaterally modified without compromising malicious functionality) significantly improves performance. Finally, we show that feature space models enable generalized robustness when faced with a variety of realizable attacks, as compared to classifiers which are tuned to be robust to a specific realizable attack.


2018

Adversarial task assignment

2018

Chen Hajaj and Yevgeniy Vorobeychik

International Joint Conference on Artificial Intelligence

The problem of assigning tasks to workers is of long-standing fundamental importance. Examples of this include the classical problem of assigning computing tasks to nodes in a distributed computing environment, assigning jobs to robots, and crowdsourcing. Extensive research into this problem generally addresses important issues such as uncertainty and incentives. However, the problem of adversarial tampering with the task assignment process has not received as much attention. We are concerned with a particular adversarial setting where an attacker may target a set of workers in order to prevent the tasks assigned to these workers from being completed. When all tasks are homogeneous, we provide an efficient algorithm for computing the optimal assignment. When tasks are heterogeneous, we show that the adversarial assignment problem is NP-Hard, and present an algorithm for solving it approximately. Our theoretical results are accompanied by extensive experiments showing the effectiveness of our algorithms.

DOI
A crowdsourcing framework for medical data sets

2018

Cheng Ye, Joseph Coco, Anna Epishova, Chen Hajaj, Henry Bogardus, Laurie Novak, Joshua Denny, Yevgeniy Vorobeychik, Thomas Lasko, Bradley Malin, and others

AMIA Summits on Translational Science Proceedings

Crowdsourcing services like Amazon Mechanical Turk allow researchers to ask questions to crowds of workers and quickly receive high quality labeled responses. However, crowds drawn from the general public are not suitable for labeling sensitive and complex data sets, such as medical records, due to various concerns. Major challenges in building and deploying a crowdsourcing system for medical data include, but are not limited to: managing access rights to sensitive data and ensuring data privacy controls are enforced; identifying workers with the necessary expertise to analyze complex information; and efficiently retrieving relevant information in massive data sets. In this paper, we introduce a crowdsourcing framework to support the annotation of medical data sets. We further demonstrate a workflow for crowdsourcing clinical chart reviews including (1) the design and decomposition of research questions; (2) the architecture for storing and displaying sensitive data; and (3) the development of tools to support crowd workers in quickly analyzing information from complex data sets.

2017

Non-Cooperative Team Formation and a Team Formation Mechanism

2017

Matthew Chambers, Chen Hajaj, Greg Leo, Jian Lou, Martin Linden, Yevgeniy Vorobeychik, and Myrna H Wooders

Available at SSRN 3054771

We model decentralized team formation as a game in which players make offers to potential teams whose members then either accept or reject the offers. The games induce no-delay subgame perfect equilibria with unique outcomes that are individually rational and match soulmates. We provide sufficient conditions for equilibria to implement core coalition structures, and show that when each player can make a sufficiently large number of proposals, outcomes are Pareto optimal. We then design a mechanism to implement equilibrium of this game and provide sufficient conditions to ensure that truthful reporting of preferences is a strong ex post Nash equilibrium. Moreover, we show empirically that players rarely have an incentive to misreport preferences more generally. Furthermore, for the problem with cardinal preferences, we show empirically that the resulting mechanism results in significantly higher social welfare than serial dictatorship, and the outcomes are highly equitable.
Enhancing Comparison Shopping Agents Through Ordering and Gradual Information Disclosure

2017

Chen Hajaj, Noam Hazon, and David Sarne

Autonomous Agents and Multi-Agent Systems

The plethora of comparison shopping agents (CSAs) in today’s markets enables buyers to query more than a single CSA when shopping, thus expanding the list of sellers whose prices they obtain. This potentially decreases the chance of a purchase within any single interaction between a buyer and a CSA, and consequently decreases each CSA’s expected revenue per query. Obviously, a CSA can improve its competence in such settings by acquiring more sellers’ prices, potentially resulting in a more attractive “best price”. In this paper we suggest a complementary approach that improves the attractiveness of the best result returned by intelligently controlling the order in which the results are presented to the user, in a way that utilizes several known cognitive biases of human buyers. The advantage of this approach is its ability to affect the buyer’s tendency to terminate her search for a better price, and hence avoid querying further CSAs, without spending valuable resources on finding additional prices to present. The effectiveness of our method is demonstrated using real data, collected from four CSAs for five products. Our experiments confirm that the suggested method effectively influences people in a way that is highly advantageous to the CSA compared to the common method for presenting the prices. Furthermore, we experimentally show that all of the components of our method are essential to its success.

DOI
Selective Opportunity Disclosure at the Service of Strategic Information Platforms

2017

Chen Hajaj and David Sarne

Autonomous Agents and Multi-Agent Systems

This paper studies the strategic behavior of platforms that provide agents easier access to the type of opportunities in which they are interested (e.g., eCommerce platforms, used car bulletins and dating websites). We show that under four common service schemes, a platform can benefit from not necessarily listing all the opportunities with which it is familiar, even if there is no marginal cost for listing any additional opportunity. The main implication of this result is that platforms should treat the subset of opportunities to be included in their listings as a decision variable, alongside the fees set for the service, in their expected-profit-maximizing optimization problem. We show that none of the four schemes generally dominates any of the others or is dominated by any. For the case of homogeneous preferences, however, several dominance relationships can be proved. Furthermore, the analysis provides a game-theoretic search-based explanation for a possible preference of buyers to pay for the service rather than receive it for free (e.g., when the service is sponsored by ads). The paper shows that this preference can hold both for the users and the platform simultaneously in a given setting, even if both sides are fully strategic. Finally, the paper analyzes the potential improvement in the platform’s expected profit that can be achieved by considering hybrid service schemes that combine the basic ones. In particular, we focus on the Two-Part Tariff scheme, which combines the two commonly used subscription and pay-per-click schemes.

DOI
Enhancing Crowdworkers’ Vigilance

2017

Avshalom Elmalech, David Sarne, Esther David, and Chen Hajaj

Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17)

This paper presents methods for improving the attention span of workers in tasks that heavily rely on their attention to the occurrence of rare events. The underlying idea in our approach is to dynamically augment the task with some dummy (artificial) events at different times throughout the task, rewarding the worker upon identifying and reporting them. The proposed approach is an alternative to the traditional approach of exclusively relying on rewarding the worker for successfully identifying the event of interest itself. We propose three methods for timing the dummy events throughout the task. Two of these methods are static and determine the timing of the dummy events at random or uniformly throughout the task. The third method is dynamic and uses the identification (or misidentification) of dummy events as a signal for the worker’s attention to the task, adjusting the rate of dummy events generation accordingly.
Agent-Mediated Electronic Commerce. Designing Trading Strategies and Mechanisms for Electronic Markets

2017

Sofia Ceppi, Esther David, Chen Hajaj, Valentin Robu, and Ioannis A Vetsikas

2016

Extending Workers’ Attention Span Through Dummy Events

2016

Avshalom Elmalech, David Sarne, Esther David, and Chen Hajaj

Proceedings of the AAAI Conference on Human Computation and Crowdsourcing

This paper studies a new paradigm for improving the attention span of workers in tasks that heavily rely on the worker’s attention to the occurrence of rare events. Such tasks are highly common, ranging from crime monitoring to controlling autonomous complex machines, and many of them are ideal for crowdsourcing. The underlying idea in our approach is to dynamically augment the task with some dummy (artificial) events at different times throughout the task, rewarding the worker upon identifying and reporting them. This serves as an alternative to the traditional approach of exclusively relying on rewarding the worker for successfully identifying the event of interest itself. We propose three methods for timing the dummy events throughout the task. Two of these methods are static and determine the timing of the dummy events at random or uniformly throughout the task. The third method is dynamic and uses the identification (or misidentification) of dummy events as a signal for the worker’s attention to the task, adjusting the rate of dummy event generation accordingly. We use extensive experimentation to compare the three methods with one another and with the traditional approach of inducing attention through rewarding the identification of the event of interest. The analysis of the results indicates that with the use of dummy events a substantially more favorable tradeoff between the detection probability (of the event of interest) and the expected expense can be achieved, and that among the three proposed methods the one that decides on dummy events on the fly is (by far) the best.
Intelligent Mechanisms for Platforms in Modern Markets

2016

Chen Hajaj

This thesis explores the use of autonomous agents serving as mediators between different entities in modern markets to maximize an expected revenue (e.g., the agents’ own profit or users’ social welfare). This mediation can take many forms and serve many purposes, such as mediating between buyers and sellers in online shopping environments, men and women on dating sites, and different traders in barter markets. These agents differ from one another in terms such as the cost of their mediation services (e.g., they can offer their services for a fee or free of charge). In the first case, when the services are costly, agents must ponder the terms they set for their services. Similarly, when offering their services for free, they have to be as appealing as possible in order to overcome external competition. This thesis provides theoretically justified and empirically validated approaches for designing and operating intelligent mechanisms for mediation agents in both electronic and barter markets. While two of the mechanisms presented within this thesis guide the platform on how to use selective information disclosure for its benefit, a different one is designed specifically in order to avoid selective information disclosure to the platform. The research is based on both theoretical analysis and online validation of suggested heuristics. Theoretical analysis is carried out using concepts from search theory and game theory, and the online validation is based on Amazon Mechanical Turk, a well-known crowdsourcing platform imitating people’s behavior in real life. The models used in this thesis consider settings of autonomous, fully-rational, and self-interested agents interacting with both fully-rational agents and bounded-rational people, and how the difference in their decision-making processes may imply different, yet correlated, strategies for the system designer.
The different model variants can be mapped onto various real-life applications (e.g., online shopping, car purchasing, and barter markets). The models differ primarily in the assumptions they make. In electronic markets, the assumptions relate to the agents’ source of income and the users’ computational capabilities (fully-rational agents vs. bounded-rational people). For barter markets, this thesis focuses on fully-rational agents that differ in the number of goods they hold that can be traded on the market. Based on the common model for these markets, this thesis provides a proof of an impossibility result on the ability to enforce the agents’ strategy-proof behavior regardless of the means used. Still, the thesis shows that by adding a set of real-life assumptions to the model, one can circumvent this impossibility result and validate that the truthful strategy is indeed the dominant one. The analysis is accompanied by extensive simulations and online experiments, making use of various testbeds. The simulations and online experiments are used to validate the applicability of the suggested mechanisms to modern markets and to the different entities populating them.

2015

Improving Comparison Shopping Agents’ Competence Through Selective Price Disclosure

2015

Chen Hajaj, Noam Hazon, and David Sarne

Electronic Commerce Research and Applications

A new approach for increasing the attractiveness of comparison shopping agents. The approach, selective price-disclosure, affects the buyer’s decision-making. Two methods for selective price-disclosure for fully-rational agents are given. The effectiveness and efficiency of the methods are demonstrated using real data. A third method is experimentally shown to be highly effective with people. The plethora of comparison shopping agents (CSAs) in today’s markets enables buyers to query more than a single CSA when shopping, and inter-CSA competition naturally arises. We suggest a new approach, termed "selective price disclosure", which improves the attractiveness of a CSA by removing some of the prices in the outputted list. The underlying idea behind this approach is to affect the buyer’s beliefs regarding the chance of obtaining more attractive prices. The paper presents two methods, which are suitable for fully-rational buyers, for deciding which prices among those known to the CSA should be disclosed. The effectiveness and efficiency of the methods are evaluated using real data collected from five CSAs. The methods are also evaluated with human subjects, showing that selective price disclosure can be highly effective in this case as well; however, the disclosed subset of prices should be extracted in a different (simplistic) manner.

DOI
Automatic Anatomical Shape Correspondence and Alignment Using Mesh Features

2015

Tal Darom, Yaniv Gur, Chen Hajaj, and Yosi Keller

Biomedical Imaging (ISBI), 2015

In this work, we propose a fully automatic and computationally efficient group registration approach for sets of three-dimensional models represented as mesh objects. Our approach is based on agglomerating the set of pairwise model-to-model rigid registrations by a robust spectral synchronization scheme. The pairwise registration is computed using spectral graph matching applied to meshes via the LD-SIFT local mesh features. We applied the proposed scheme to sets of subcortical surfaces, and it was shown to provide accurate and robust registration results.
Strategy-Proof and Efficient Kidney Exchange Using a Credit Mechanism

2015

Chen Hajaj, John P Dickerson, Avinatan Hassidim, Tuomas Sandholm, and David Sarne

In proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence

We present a credit-based matching mechanism for dynamic barter markets—and kidney exchange in particular—that is both strategy proof and efficient, that is, it guarantees truthful disclosure of donor-patient pairs from the transplant centers and results in the maximum global matching. Furthermore, the mechanism is individually rational in the sense that, in the long run, it guarantees each transplant center more matches than the center could have achieved alone. The mechanism does not require assumptions about the underlying distribution of compatibility graphs—a nuance that has previously produced conflicting results in other aspects of theoretical kidney exchange. Our results apply not only to matching via 2-cycles: the matchings can also include cycles of any length and altruist-initiated chains, which is important at least in kidney exchanges. The mechanism can also be adjusted to guarantee immediate individual rationality at the expense of economic efficiency, while preserving strategy proofness via the credits. This circumvents a well-known impossibility result in static kidney exchange concerning the existence of an individually rational, strategy-proof, and maximal mechanism. We show empirically that the mechanism results in significant gains on data from a national kidney exchange that includes 59% of all US transplant centers.

2014

Advanced Service Schemes for a Self-Interested Information Platform

2014

Chen Hajaj, David Sarne, and Lea Perets

Proceedings of the 2014 international conference on Autonomous agents and multi-agent systems

This paper deals with platforms that bring together agents and opportunities of the type in which they are interested (e.g., eCommerce platforms, used car bulletins and dating websites). It shows that the platform can benefit from not necessarily listing all of the opportunities that it can potentially list, even if there is no marginal cost for listing any additional opportunity. Two important implications of this result are discussed and demonstrated. The first is that the platform should take into account the option to disclose different subsets of opportunities whenever setting its expected-profit-maximizing service fee. The second is that, from the platform users’ point of view, it might turn out to be more beneficial to pay for the platform’s service rather than get it for free (e.g., when the service is sponsored by ads), as the costly case is characterized by more favorable listings.
Ordering Effects and Belief Adjustment in the Use of Comparison Shopping Agents

2014

Chen Hajaj, Noam Hazon, and David Sarne

Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence (AAAI 2014)

The popularity of online shopping has contributed to the development of comparison shopping agents (CSAs) aiming to facilitate buyers’ ability to compare prices of online stores for any desired product. Furthermore, the plethora of CSAs in today’s markets enables buyers to query more than a single CSA when shopping, thus expanding even further the list of sellers whose prices they obtain. This potentially decreases the chance of a purchase based on the prices outputted as a result of any single query, and consequently decreases each CSA’s expected revenue per query. Obviously, a CSA can improve its competence in such settings by acquiring more sellers’ prices, potentially resulting in a more attractive “best price”. In this paper we suggest a complementary approach that improves the attractiveness of a CSA by presenting the prices to the user in a specific intelligent manner, which is based on known cognitive biases. The advantage of this approach is its ability to affect the buyer’s tendency to terminate her search for a better price, and hence avoid querying further CSAs, without having the CSA spend any of its resources on finding better prices to present. The effectiveness of our method is demonstrated using real data, collected from four CSAs for five products. Our experiments with people confirm that the suggested method effectively influences people in a way that is highly advantageous to the CSA.
Strategic Information Platforms: Selective Disclosure and the Price of Free

2014

Chen Hajaj, and David Sarne

Proceedings of the Fifteenth ACM conference on Economics and Computation

This paper deals with platforms that provide agents easier access to the type of opportunities in which they are interested (e.g., eCommerce platforms, used car bulletins and dating web sites). We show that under various common service schemes, a platform can benefit from not necessarily listing all the opportunities with which it is familiar, even if there is no marginal cost for listing any additional opportunity. The main implication of this result is that platforms should extract their expected-profit-maximizing service terms not based solely on the fees charged from users, but they should also use the subset that will be listed as the decision variable in the optimization problem. The analysis applies to four well-known service schemes that a platform may use to price its services. We show that none of these schemes generally dominates the others or is dominated by any of the others. For the common case of homogeneous preferences, however, several dominance relationships can be proved, enabling the platform to identify the schemes that should be used as a default. Furthermore, the analysis provides a game-theoretic search-based explanation for a possible preference of buyers to pay for the service rather than receive it for free (e.g., when the service is sponsored by ads), a phenomenon that has been justified in prior literature typically with the argument of willingness to pay a premium for an ad-free experience or more reliable platforms. The paper shows that this preference can hold both for the users and the platform in a given setting, even if both sides are fully strategic.

2013

Search More, Disclose Less

2013

Chen Hajaj, Noam Hazon, David Sarne, and Avshalom Elmalech

Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence

The blooming of comparison shopping agents (CSAs) in recent years enables buyers in today’s markets to query more than a single CSA while shopping, thus substantially expanding the list of sellers whose prices they obtain. From the individual CSA’s point of view, however, multi-CSA querying is definitely unfavorable, as the benefit of most of today’s CSAs depends on payments they receive from sellers upon transferring buyers to their websites (and making a purchase). The most straightforward way for the CSA to improve its competence is through spending more resources on getting more sellers’ prices, potentially resulting in a more attractive "best price". In this paper we suggest a complementary approach that improves the attractiveness of the best price returned to the buyer without having to extend the CSA’s price database. This approach, which we term "selective price disclosure", relies on removing some of the prices known to the CSA from the list of results returned to the buyer. The advantage of this approach is in the ability to affect the buyer’s beliefs regarding the probability of obtaining more attractive prices if querying additional CSAs. The paper presents two methods for choosing the subset of prices to be presented to a fully-rational buyer, attempting to overcome the computational complexity associated with evaluating all possible subsets. The effectiveness and efficiency of the methods are demonstrated using real data, collected from five CSAs for four products. Furthermore, since people are known to have an inherently bounded rationality, the two methods are also evaluated with human buyers, demonstrating that selective price disclosure can be highly effective with people; however, the subset of prices that needs to be used should be extracted in a different (and more simplistic) manner.

2012

Three Dimensional Group Registration of Mesh Objects

2012

Chen Hajaj
