A Multimodal Approach for Measuring Item Similarity
Multimodal AI that fuses computer vision and NLP to measure item similarity the way humans do — powering smarter recommendations in tourism, e-commerce, and real estate.
When humans judge similarity, they consider appearance, function, atmosphere, and context simultaneously. Simple text matching or category tags miss most of this. We develop multimodal AI that fuses computer vision and NLP to measure item similarity the way people actually experience it.
The Challenge
Two hotels can look similar in photos but feel completely different in reviews. Two tourist destinations can share an atmosphere despite very different geography. Traditional similarity metrics capture one dimension at a time and miss the nuanced comparisons that drive real user decisions.
Our Approach
- Visual Feature Extraction — Deep convolutional networks and vision transformers extract semantic features (architectural style, landscape type, activity level, color palette) directly from images without manual annotation.
- Textual Semantic Analysis — Large language models encode review text and descriptions to capture cultural character, climate, offerings, and subjective user sentiment as dense semantic vectors.
- Multimodal Fusion — Cross-attention mechanisms align visual and textual embeddings into a unified similarity space, validated against human expert judgments across thousands of item pairs.
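As a rough illustration of the fusion step, the sketch below implements single-head scaled dot-product cross-attention in which an item's text tokens attend to its image patches, then compares items by cosine similarity of the pooled result. It is a minimal, untrained sketch under stated assumptions, not the production architecture: the token and patch vectors are random stand-ins for image-encoder and language-model outputs, and the projection matrices (`W_q`, `W_k`, `W_v`), the residual-plus-mean-pool readout, and every function name are hypothetical. In a real system the projections would be learned, for example by training fused similarities to agree with human judgments.

```python
# Hypothetical sketch: text-to-image cross-attention fusion with untrained weights.
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(text_tokens, image_patches, w_q, w_k, w_v):
    """Single-head scaled dot-product attention in which text tokens query image patches."""
    q = matmul(text_tokens, w_q)
    k = matmul(image_patches, w_k)
    v = matmul(image_patches, w_v)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    weights = [softmax([s / scale for s in row]) for row in matmul(q, k_t)]
    return matmul(weights, v), weights

def fused_embedding(text_tokens, image_patches, params):
    """Add attended visual context to each text token, mean-pool, then L2-normalize."""
    attended, _ = cross_attention(text_tokens, image_patches, *params)
    fused = [[t + a for t, a in zip(t_row, a_row)]
             for t_row, a_row in zip(text_tokens, attended)]
    pooled = [sum(col) / len(fused) for col in zip(*fused)]
    norm = math.sqrt(sum(x * x for x in pooled)) or 1.0
    return [x / norm for x in pooled]

def cosine(u, v):
    # Both inputs are unit-length, so their dot product is the cosine similarity.
    return sum(x * y for x, y in zip(u, v))

# --- Hypothetical usage with random stand-ins for encoder outputs ---
DIM = 8
rng = random.Random(0)

def random_matrix(rows, cols):
    return [[rng.gauss(0.0, 1.0 / math.sqrt(cols)) for _ in range(cols)] for _ in range(rows)]

params = tuple(random_matrix(DIM, DIM) for _ in range(3))  # untrained W_q, W_k, W_v
hotel_a = (random_matrix(5, DIM), random_matrix(9, DIM))   # 5 text tokens, 9 image patches
hotel_b = (random_matrix(7, DIM), random_matrix(9, DIM))   # 7 text tokens, 9 image patches
emb_a = fused_embedding(*hotel_a, params)
emb_b = fused_embedding(*hotel_b, params)
print(f"similarity(a, b) = {cosine(emb_a, emb_b):.3f}")
```

Letting text query image regions is only one design choice; image-to-text or bidirectional attention are common alternatives.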
Applications
- Tourism & Travel Recommendations — Suggesting alternative destinations when a user's top choice is unavailable, matched on experience rather than category. Validated against real booking patterns.
- E-commerce & Real Estate — Finding visually and contextually similar products or properties at different price points, improving discovery and reducing search abandonment.
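Once items share an embedding space, the alternative-destination use case reduces to a nearest-neighbor search that skips unavailable items. The sketch below assumes precomputed embeddings and uses cosine similarity with a brute-force scan; the item names, the three-dimensional toy vectors, and `suggest_alternatives` are hypothetical.

```python
import heapq
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def suggest_alternatives(target, embeddings, unavailable, k=3):
    """Return up to k available items most similar to `target`, best first."""
    query = embeddings[target]
    candidates = (
        (cosine(query, vec), name)
        for name, vec in embeddings.items()
        if name != target and name not in unavailable
    )
    return [name for _, name in heapq.nlargest(k, candidates)]

# Hypothetical toy embeddings; axes loosely read as (beach, mountain, urban).
embeddings = {
    "beach_a": [0.90, 0.10, 0.20],
    "beach_b": [0.85, 0.15, 0.30],
    "beach_c": [0.80, 0.30, 0.10],
    "alpine_a": [0.10, 0.95, 0.10],
    "city_a": [0.10, 0.10, 0.95],
}

print(suggest_alternatives("beach_a", embeddings, unavailable={"beach_b"}, k=2))
# ['beach_c', 'city_a']
```

A brute-force scan is fine for small catalogs; larger ones would typically use an approximate nearest-neighbor index.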
Our AI analyzes visual and textual features of destinations to build similarity judgments that match human intuition across diverse travel contexts.
Related Publications
2024
- Warm Recommendation: Enhancing Cold Start Recommendations Using Multimodal Product Representations (2024). International Conference on Information Systems (ICIS).
- ★ Measuring Flight-Destination Similarity: A Multidimensional Approach (2024). Expert Systems with Applications.
  Abstract: E-tourism websites offer users a vast array of travel destinations and opportunities, necessitating tools that enable destination comparison and intelligent search capabilities. One key requirement for such tools is the ability to measure the similarity between destinations. Over the years, various similarity measurement techniques have been proposed, including user-based and content-based approaches. However, many of these techniques require data preparation or prior domain knowledge from experts. In contrast, this study proposes an innovative approach that requires no prior domain knowledge of flight destinations or their relationships, and utilizes only readily available data. Our approach draws upon concepts from image recognition and natural language processing (NLP) to extract hidden aspects of destinations. Using data from a flight-search website as a testbed, we analyze similarity metrics based on state-of-the-art methods for image recognition, NLP, and product-network analysis. We then compare these metrics to those obtained by human subjects. Our findings suggest that no single method dominates in all aspects, leading us to propose a hybrid method that leverages the strengths of each. The proposed method can be readily applied to measure product similarity in other domains.
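The comparison described in the abstract, scoring each similarity method against human judgments and then combining methods into a hybrid, can be illustrated with a small sketch. The combination rule here (a grid-searched weighted average of per-method scores) and the use of Spearman rank correlation as the agreement measure are assumptions for illustration, not necessarily the paper's method; all function names are hypothetical and the pair scores and ratings are made-up toy numbers.

```python
# Hypothetical sketch: weighted hybrid of similarity methods, scored against human ratings.
import itertools

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def hybrid_scores(method_scores, weights):
    """Weighted average of several methods' similarity scores for each item pair."""
    total = sum(weights)
    return [sum(w * scores[p] for w, scores in zip(weights, method_scores)) / total
            for p in range(len(method_scores[0]))]

def fit_weights(method_scores, human, step=0.25):
    """Grid-search non-negative weights that maximize Spearman agreement with human ratings."""
    grid = [i * step for i in range(round(1 / step) + 1)]
    best_weights, best_rho = None, float("-inf")
    for weights in itertools.product(grid, repeat=len(method_scores)):
        if sum(weights) == 0:
            continue  # skip the all-zero combination
        rho = spearman(hybrid_scores(method_scores, weights), human)
        if rho > best_rho:
            best_weights, best_rho = weights, rho
    return best_weights, best_rho

# Made-up scores for six destination pairs (higher = more similar).
image = [0.9, 0.2, 0.6, 0.4, 0.8, 0.1]
text = [0.7, 0.5, 0.3, 0.6, 0.9, 0.2]
network = [0.5, 0.1, 0.7, 0.2, 0.6, 0.4]
human = [4.5, 1.5, 3.0, 2.5, 4.8, 1.0]

for name, scores in [("image", image), ("text", text), ("network", network)]:
    print(f"{name:8s} rho = {spearman(scores, human):.3f}")
weights, rho = fit_weights([image, text, network], human)
print(f"hybrid   weights = {weights}, rho = {rho:.3f}")
```

Because the grid contains each single method on its own, the fitted hybrid can never score below the best individual method on the same ratings; a held-out split of the human judgments would be needed to check that the weights generalize.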