Hybrid Search Combines the Best of Both Worlds

Introduction

Imagine managing a vast reservoir of marketing collateral: thousands of images, videos, and documents crafted to elevate your brand. Finding these assets the conventional way, by exact file name, ID, or file type, was arduous and time-consuming. Users had to memorize detailed information about each asset, a hurdle that stifled both efficiency and creativity.

Our core challenge was clear: how could we deliver highly relevant search results while requiring users to know as little as possible about the assets themselves? This bottleneck also slowed onboarding for our client’s new team members, who had to internalize the entire content library before they could become productive.

Specific Pain Points

  • Irrelevant Search Results: Traditional keyword search frequently failed to return the results users actually needed.
  • Misleading File Names: When a file’s name did not reflect its actual content, search results that displayed that name caused confusion and frustration.

Solution

To give marketers fast access to relevant content and the inspiration it provides, we built a semantic search capability and combined it with keyword search in a Hybrid Search feature. Hybrid Search pairs the precision of standard keyword queries (such as file name, ID, or file type) with the flexibility of natural language descriptions. A user can now type a phrase like “black car on the road,” and the system surfaces the most relevant digital assets.

Description of the Solution

The hybrid search works by interpreting the meaning and context behind a user’s query rather than only matching keywords. It does this by combining machine-learning embedding models with a vector database. Users can choose among three search modes: keyword, semantic, or hybrid.

For semantic searches, the user’s natural language input is transformed into a numerical representation (a vector embedding). This “query vector” is then compared against a vast database of pre-indexed vectors representing all digital assets and their detailed descriptions. The closest matches, indicating semantic relevance, are then returned.
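As a minimal sketch of the comparison step, the snippet below scores a query vector against a few pre-indexed asset vectors using cosine similarity and returns the top k. The asset IDs and three-dimensional vectors are made up for illustration. Real embeddings have hundreds or thousands of dimensions, and a vector database such as OpenSearch typically uses an approximate nearest-neighbor index rather than this exhaustive scan.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_top_k(query_vec, index, k=3):
    """Exhaustively score every indexed asset and return the k closest.

    `index` maps a (hypothetical) asset ID to its embedding vector.
    """
    scored = [(asset_id, cosine_similarity(query_vec, vec)) for asset_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; real models produce far larger vectors.
index = {
    "img-001": [0.9, 0.1, 0.0],  # e.g. a photo of a black car on a road
    "img-002": [0.1, 0.9, 0.2],  # e.g. a beach at sunset
    "vid-003": [0.8, 0.2, 0.1],  # e.g. a video of traffic at night
}
query = [1.0, 0.0, 0.0]  # pretend embedding of "black car on the road"
print(semantic_top_k(query, index, k=2))
```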

Hybrid search fuses this semantic ranking with traditional keyword matching. Queries for exact identifiers, such as file names or IDs, still match precisely, while descriptive natural-language queries benefit from semantic relevance.
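When both the keyword and semantic queries run in OpenSearch, one way to express the fusion is OpenSearch’s `hybrid` query (available since version 2.10). It is paired with a search pipeline whose `normalization-processor` rescales and combines the two score sets. The sketch below only builds the request bodies as Python dictionaries. The field names `file_name`, `description`, and `embedding`, the pipeline weights, and the value of `k` are illustrative assumptions, not the production configuration.

```python
import json


def build_hybrid_pipeline(keyword_weight: float = 0.3, semantic_weight: float = 0.7) -> dict:
    """Body for PUT /_search/pipeline/<name>.

    Min-max normalizes each sub-query's scores, then combines them with a
    weighted arithmetic mean. Weights are applied in the same order as the
    sub-queries in build_hybrid_query (keyword first, semantic second).
    The default weights are illustrative and chosen to sum to 1.0.
    """
    return {
        "description": "Hybrid keyword + semantic scoring (illustrative)",
        "phase_results_processors": [
            {
                "normalization-processor": {
                    "normalization": {"technique": "min_max"},
                    "combination": {
                        "technique": "arithmetic_mean",
                        "parameters": {"weights": [keyword_weight, semantic_weight]},
                    },
                }
            }
        ],
    }


def build_hybrid_query(text: str, query_vector: list[float], k: int = 10) -> dict:
    """Body for GET /<index>/_search?search_pipeline=<name>."""
    return {
        "size": k,
        "query": {
            "hybrid": {
                "queries": [
                    # 1) Keyword sub-query over hypothetical text fields.
                    {"multi_match": {"query": text, "fields": ["file_name", "description"]}},
                    # 2) Semantic sub-query: k-NN search on a hypothetical vector field.
                    {"knn": {"embedding": {"vector": query_vector, "k": k}}},
                ]
            }
        },
    }


print(json.dumps(build_hybrid_query("black car on the road", [0.1, 0.2, 0.3], k=5), indent=2))
```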

Specific Steps Taken

  • Enriching Digital Assets: We used Large Language Models (LLMs) to automatically generate rich, descriptive text for each digital asset.
  • Vectorization for Understanding: Both the content of digital assets (e.g., image features, video transcripts) and their generated descriptions were converted into high-dimensional numerical representations known as vector embeddings. User queries were also vectorized in real time.
  • Optimized Storage: The vector embeddings were stored in OpenSearch, whose vector (k-NN) search support enables fast similarity searches.
  • Intelligent Comparison: The vectorized user query was then compared with the stored asset vectors to identify the most semantically similar assets.
  • Semantic Results: The system returned a ranked list of relevant results based purely on semantic similarity.
  • Hybrid Search Logic: We developed a ranking step that combines keyword and semantic scores into a single result list, so that exact-identifier queries and natural-language queries alike return relevant results.
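For the storage step, an OpenSearch index needs a `knn_vector` field for the embeddings, alongside the ordinary keyword and text fields used by standard search. The sketch below builds such an index body and a matching document. Several details are assumptions for illustration: the field names, the HNSW method settings, and the 1536-dimension size (which matches some OpenAI embedding models). The dimension must equal the output size of whatever embedding model is actually used.

```python
import json

EMBEDDING_DIM = 1536  # assumption: must match the embedding model's output size


def build_asset_index_body(dim: int = EMBEDDING_DIM) -> dict:
    """Body for PUT /<index> that enables k-NN search on an `embedding` field."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "asset_id": {"type": "keyword"},   # exact-match IDs
                "file_type": {"type": "keyword"},  # exact-match filters
                "file_name": {"type": "text"},     # full-text keyword search
                "description": {"type": "text"},   # LLM-generated description
                "embedding": {
                    "type": "knn_vector",
                    "dimension": dim,
                    "method": {"name": "hnsw", "space_type": "cosinesimil", "engine": "lucene"},
                },
            }
        },
    }


def build_asset_document(asset_id, file_name, file_type, description, embedding, dim=EMBEDDING_DIM):
    """One document to index; rejects vectors of the wrong size early."""
    if len(embedding) != dim:
        raise ValueError(f"expected {dim} dimensions, got {len(embedding)}")
    return {
        "asset_id": asset_id,
        "file_name": file_name,
        "file_type": file_type,
        "description": description,
        "embedding": list(embedding),
    }


print(json.dumps(build_asset_index_body(), indent=2))
```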

Unique Value Proposition

This Hybrid Search empowers users to find what they need, faster and more intuitively than ever before. It bridges the gap between how users think about content and how traditional systems require them to search, dramatically improving content discoverability, accelerating workflows, and reducing the learning curve for new team members. It’s not just about finding files; it’s about unlocking the full potential of a content library with intelligent, context-aware discovery.

Results

Key Metrics:

  • Exceptional Ranking Accuracy (NDCG@10): A mean NDCG@10 of 0.997 across the test queries indicates near-perfect ranking, with the most relevant results consistently placed at the top of the search output.
  • High First-Result Relevance (MRR): A Mean Reciprocal Rank of 1.000 means that for every test query, the first result returned was relevant, giving users immediate access to critical content.
  • Reduced Onboarding Time: Onboarding time was not measured directly by the test queries. However, the improved content discoverability reflected in these metrics is expected to shorten the time new team members need to become proficient at finding and using assets. This should translate into faster ramp-up and higher productivity.
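To make the two metrics concrete, here is a minimal sketch of how NDCG@10 and reciprocal rank can be computed from graded relevance judgments. It uses the linear-gain form of DCG (gain = relevance / log2(rank + 1)) and builds the ideal ranking from all judged items for the query. The document does not say which variant or judgment scale was used, so treat these choices and the toy data as assumptions.

```python
import math


def dcg(gains):
    """Discounted cumulative gain with linear gains: sum(rel / log2(rank + 1))."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(gains, start=1))


def ndcg_at_k(ranked_ids, judgments, k=10):
    """NDCG@k: DCG of the returned ranking divided by DCG of the ideal ranking.

    `judgments` maps asset ID -> graded relevance (0 = not relevant).
    """
    gains = [judgments.get(doc_id, 0) for doc_id in ranked_ids[:k]]
    ideal = sorted(judgments.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


def reciprocal_rank(ranked_ids, judgments):
    """1 / rank of the first relevant result, or 0.0 if none is relevant."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if judgments.get(doc_id, 0) > 0:
            return 1.0 / rank
    return 0.0


# Hypothetical test set: (ranked results, relevance judgments) per query.
test_set = [
    (["a", "b", "c"], {"a": 3, "b": 2, "c": 1}),  # ideal order
    (["x", "a", "b"], {"a": 2, "b": 1}),          # first result not relevant
]
mean_ndcg = sum(ndcg_at_k(r, j) for r, j in test_set) / len(test_set)
mrr = sum(reciprocal_rank(r, j) for r, j in test_set) / len(test_set)
print(f"Mean NDCG@10 = {mean_ndcg:.3f}, MRR = {mrr:.3f}")
```

Under this definition, an MRR of 1.000 means the first result was relevant for every query. On its own, it does not show that the first result was the single most relevant item; NDCG@10 captures that ordering.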

SmartTips

Enrich Before You Embed
The quality of semantic search is only as good as the descriptions generated. Before vectorizing assets, use LLMs to enrich metadata with natural language summaries or inferred topics. This significantly improves retrieval quality, especially for visual or audio content with sparse labels.
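Here is a minimal sketch of this tip. It assumes a hypothetical `Asset` record and a description already returned by an LLM. The sketch builds the prompt that asks the model to describe the asset. It then combines the model’s description with the existing metadata into the single text that gets embedded. The field names and prompt wording are illustrative, and the LLM call itself is left out. For images, a vision-capable model would also need the image itself, which this sketch omits.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """Hypothetical asset record with sparse human-entered metadata."""
    asset_id: str
    file_name: str
    file_type: str
    tags: list[str] = field(default_factory=list)
    transcript: str = ""  # e.g. speech-to-text output for video or audio


def build_enrichment_prompt(asset: Asset) -> str:
    """Prompt asking an LLM for a searchable natural-language description."""
    lines = [
        "Describe this marketing asset in 2-3 sentences for a search index.",
        "Mention the main subjects, setting, colors, mood, and likely topics.",
        f"File name: {asset.file_name}",
        f"File type: {asset.file_type}",
    ]
    if asset.tags:
        lines.append("Existing tags: " + ", ".join(asset.tags))
    if asset.transcript:
        lines.append("Transcript excerpt: " + asset.transcript[:500])
    return "\n".join(lines)


def build_embedding_text(asset: Asset, llm_description: str) -> str:
    """Combine the LLM description with metadata into the text to vectorize."""
    parts = [llm_description.strip(), f"Type: {asset.file_type}"]
    if asset.tags:
        parts.append("Tags: " + ", ".join(asset.tags))
    return "\n".join(parts)


asset = Asset("img-001", "IMG_2043.jpg", "image", tags=["automotive"])
prompt = build_enrichment_prompt(asset)
text = build_embedding_text(asset, "A black sports car driving on a coastal road at dusk.")
print(text)
```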

Tune Your Alpha – Balance Precision and Recall
When blending keyword and semantic scores in a hybrid search, the alpha parameter sets the relative weight of each method. If alpha weights the semantic score, a higher alpha favors semantic recall, while a lower alpha favors keyword precision. Tune this value through user testing or offline relevance metrics (such as NDCG or MRR) to find the balance that fits your users’ queries.
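Here is a minimal sketch of alpha-weighted fusion, assuming each method returns raw scores per asset ID. It first min-max normalizes each score set to [0, 1] so the two scales are comparable. It then computes alpha * semantic + (1 - alpha) * keyword. Here alpha weights the semantic side, matching the tip above. The function names and scores are hypothetical. Min-max normalization is one common choice, not necessarily the one used in production.

```python
def min_max_normalize(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, map them to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_rank(keyword_scores, semantic_scores, alpha=0.5):
    """Return (asset_id, fused_score) pairs, best first.

    alpha = 1.0 -> purely semantic; alpha = 0.0 -> purely keyword.
    An asset missing from one result list contributes 0 for that method.
    """
    kw = min_max_normalize(keyword_scores)
    sem = min_max_normalize(semantic_scores)
    fused = {
        doc_id: alpha * sem.get(doc_id, 0.0) + (1 - alpha) * kw.get(doc_id, 0.0)
        for doc_id in kw.keys() | sem.keys()
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical raw scores: full-text keyword scores and cosine similarities.
keyword = {"a": 10.0, "b": 5.0, "c": 0.0}
semantic = {"a": 0.2, "b": 0.9, "c": 0.5}
for alpha in (0.0, 0.5, 1.0):
    print(alpha, [doc_id for doc_id, _ in hybrid_rank(keyword, semantic, alpha)])
```

Sweeping alpha this way over a labeled query set and keeping the value with the best offline metric is one simple way to do the tuning described above.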

Use Filtering to Supercharge Precision
Allowing users to filter by file type, category, or date narrows the dataset, improves result speed and precision, and helps users find exactly what they need.
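In OpenSearch, one way to apply such filters to the semantic side is a `filter` clause inside the `knn` query. This restricts candidates during the vector search rather than after it. It requires an engine that supports filtering within k-NN search (Lucene, or Faiss in recent OpenSearch versions). The field names `embedding`, `file_type`, and `created_at` are hypothetical, and the sketch below only builds the request body.

```python
import json


def build_filtered_knn_query(query_vector, k=10, file_types=None, created_after=None):
    """k-NN query body restricted by hypothetical metadata fields."""
    clauses = []
    if file_types:
        # Exact match on one or more file types, e.g. ["image", "video"].
        clauses.append({"terms": {"file_type": list(file_types)}})
    if created_after:
        # Date lower bound, e.g. "2024-01-01".
        clauses.append({"range": {"created_at": {"gte": created_after}}})
    knn_params = {"vector": list(query_vector), "k": k}
    if clauses:
        # Filter context: restricts candidates without affecting scores.
        knn_params["filter"] = {"bool": {"filter": clauses}}
    return {"size": k, "query": {"knn": {"embedding": knn_params}}}


print(json.dumps(build_filtered_knn_query([0.1, 0.2, 0.3], k=5, file_types=["image"]), indent=2))
```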

SmartFact

Semantic search can uncover hidden connections in data that even humans might miss. A query like “eco-friendly packaging” could surface images of biodegradable materials, even if tagged only with “sustainable solutions” – bridging the gap between user intent and discovery.

Technologies Used

OpenSearch (vector database), OpenAI models (GPT-4), Go, Python
