A Multimodal Approach for Measuring Item Similarity

Multimodal AI that fuses computer vision and NLP to measure item similarity the way humans do — powering smarter recommendations in tourism, e-commerce, and real estate.

Multimodal Item Similarity

When humans judge similarity, they consider appearance, function, atmosphere, and context simultaneously. Simple text matching or category tags miss most of this. We develop multimodal AI that fuses computer vision and NLP to measure item similarity the way people actually experience it.

The Challenge

Two hotels can look similar in photos but feel completely different in reviews. Two tourist destinations can share atmosphere despite different geographies. Traditional similarity metrics capture one dimension at a time — and fail at the nuanced comparisons that drive real user decisions.

Our Approach

  • Visual Feature Extraction — Deep convolutional networks and vision transformers extract semantic features (architectural style, landscape type, activity level, color palette) directly from images without manual annotation.
  • Textual Semantic Analysis — Large language models encode review text and descriptions to capture cultural character, climate, offerings, and subjective user sentiment as dense semantic vectors.
  • Multimodal Fusion — Cross-attention mechanisms align visual and textual embeddings into a unified similarity space, validated against human expert judgments across thousands of item pairs.
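
As a rough illustration of the fusion step, the sketch below combines one image embedding and one text embedding per item and compares two items by cosine similarity. It swaps in a simple weighted late fusion (normalize, scale, concatenate) in place of the cross-attention described above. Every name in it (`fuse`, `multimodal_similarity`, `image_weight`, the toy vectors) is an assumption made for illustration, not the project's actual implementation.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; an all-zero vector is returned as-is."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def fuse(image_vec, text_vec, image_weight=0.5):
    """Weighted late fusion (an assumed stand-in for cross-attention).

    Each modality is L2-normalized, scaled by the square root of its weight,
    and the two parts are concatenated into a single vector.
    """
    if not 0.0 <= image_weight <= 1.0:
        raise ValueError("image_weight must be between 0 and 1")
    w_img = math.sqrt(image_weight)
    w_txt = math.sqrt(1.0 - image_weight)
    return ([w_img * x for x in l2_normalize(image_vec)]
            + [w_txt * x for x in l2_normalize(text_vec)])


def multimodal_similarity(item_a, item_b, image_weight=0.5):
    """Cosine similarity between the fused representations of two items."""
    fused_a = fuse(item_a["image"], item_a["text"], image_weight)
    fused_b = fuse(item_b["image"], item_b["text"], image_weight)
    return cosine(fused_a, fused_b)


# Hypothetical toy embeddings: two hotels with identical photo features
# but unrelated review features.
hotel_a = {"image": [1.0, 0.0], "text": [1.0, 0.0]}
hotel_b = {"image": [1.0, 0.0], "text": [0.0, 1.0]}

score = multimodal_similarity(hotel_a, hotel_b)
print(round(score, 3))
```

With this weighting, and as long as both embeddings are non-zero, the fused score equals `image_weight` times the image cosine plus `1 - image_weight` times the text cosine. The two toy hotels, identical in photos but orthogonal in reviews, therefore score about 0.5 at equal weights, where an image-only comparison would call them identical.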

Applications

  • Tourism & Travel Recommendations — Suggesting alternative destinations when a user's top choice is unavailable, matched on experience rather than category. Validated against real booking patterns.
  • E-commerce & Real Estate — Finding visually and contextually similar products or properties at different price points, improving discovery and reducing search abandonment.

Our AI analyzes visual and textual features of destinations to build similarity judgments that match human intuition across diverse travel contexts.
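
The alternative-destination use case can be sketched as a nearest-neighbor lookup over item embeddings that skips anything currently unavailable. This is a minimal sketch under assumptions: the function name `suggest_alternatives`, the item identifiers, and the three-dimensional vectors are all made up for illustration, and a real system would presumably use the fused multimodal embeddings rather than hand-written numbers.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def suggest_alternatives(query_id, embeddings, unavailable=(), k=3):
    """Return up to k available item ids, most similar to the query first."""
    query_vec = embeddings[query_id]
    excluded = set(unavailable) | {query_id}  # never suggest the query itself
    scored = [
        (cosine(query_vec, vec), item_id)
        for item_id, vec in embeddings.items()
        if item_id not in excluded
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item_id for _, item_id in scored[:k]]


# Hypothetical destination embeddings (illustrative values only).
embeddings = {
    "coastal_town_a": [0.9, 0.1, 0.0],
    "coastal_town_b": [0.8, 0.2, 0.1],
    "cliffside_village": [0.7, 0.3, 0.0],
    "northern_city": [0.0, 0.2, 0.9],
}

# The user's top choice is coastal_town_a; coastal_town_b is sold out.
alternatives = suggest_alternatives(
    "coastal_town_a", embeddings, unavailable={"coastal_town_b"}, k=2
)
print(alternatives)
```

The same lookup could serve the e-commerce and real-estate case by adding a price filter to the exclusion step, though that extension is not shown here.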

Related Publications

2024

  1. Warm Recommendation: Enhancing Cold Start Recommendations Using Multimodal Product Representations
    2024
    Anat Goldstein, Amit Alony, and Chen Hajaj
    International Conference on Information Systems (ICIS)
  2. Measuring Flight-Destination Similarity: A Multidimensional Approach
    2024
    Anat Goldstein and Chen Hajaj
    Expert Systems with Applications