Introducing New Open-Source Model for Generating User Queries in E-commerce

Milutin Studen | Mar 12, 2025

In the dynamic world of e-commerce, search functionality is a crucial component of the user experience. When customers type queries like “blue Nike Air Force 1” into a search bar, they expect instant, accurate results. However, many existing large language models (LLMs) designed to generate such queries are either too expensive to run or not optimized for real-world search behavior.

That’s why we’re proud to unveil the latest work of our ML team (Mentor: Milutin Studen; Engineers: Petar Surla, Andjela Radojevic): an open-source machine learning model designed to generate user queries for e-commerce platforms. Built on the T5 architecture, this solution produces more natural, concise, and effective search queries, helping users find exactly what they’re looking for.

The Problem We’re Solving

E-commerce platforms rely heavily on user queries to connect customers with products. However, many existing query-generation models produce overly literal or unnatural queries, such as “What shoe sizes does Nike Air Force 1 have?” instead of intuitive ones like “blue Nike Air Force 1.” This disconnect leads to low-quality training data and search systems, and ultimately to a frustrating user experience, because the generated queries fail to meaningfully enhance search capabilities.

To tackle this problem, we developed an open-source model designed specifically to generate realistic, user-aligned queries that reflect actual search behavior. Our model helps build better datasets, improve query suggestions, and enhance the overall performance of search systems, offering a cost-effective and high-quality alternative to expensive solutions.

The Solution

For fine-tuning, we started with a pre-trained T5 model specifically designed for query generation.

Through extensive experimentation, we developed several iterations of the model, each optimized for different input configurations.

After testing these variants, T5-GenQ-TDC-v1 emerged as the top performer, consistently generating user queries that align with real-world search behavior.
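The variants differ mainly in which product fields are concatenated into the encoder input. As a rough sketch of the idea, the helper below assembles that input text; the field combination and separator token are illustrative assumptions, not the exact format used in training:

```python
def build_model_input(title, description=None, separator=" </s> "):
    """Assemble the text fed to the T5 encoder from product fields.

    Hypothetical helper: which fields are joined, and the separator
    token, are illustrative assumptions rather than the exact
    configuration behind each model variant.
    """
    parts = [title.strip()]
    if description:
        parts.append(description.strip())
    return separator.join(parts)
```

A title-only variant would call `build_model_input("Nike Air Force 1")`, while a title-plus-description variant would pass both fields.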

How We Built It

The project was divided into four main phases:

  1. Data Preprocessing
    We created a custom dataset using Amazon Reviews, carefully curating and processing the data to ensure it was suitable for training. This dataset became the foundation for our model’s training process.
  2. Training
    Using the preprocessed Amazon Reviews dataset, we fine-tuned the pre-trained T5 model to generate user queries. The dataset was split into training and testing sets to ensure robust evaluation, and we measured performance with the ROUGE-L metric, which scores generated queries against reference queries. Throughout the process, we experimented with various input text combinations to optimize the model’s accuracy and effectiveness.
  3. Evaluation
    To measure success, we created a new dataset containing queries generated by both our model and the base model. We calculated various metrics to compare performance and determine which model delivered better results.
  4. Analysis
    We analyzed the results, creating visualizations and graphs to highlight where our model outperformed the base model. These insights were crucial in refining the final version of the model.
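Both the training and evaluation phases above lean on ROUGE-L, which scores a generated query against a reference by the longest common subsequence of their tokens. A minimal pure-Python sketch of the metric (a real pipeline would typically use a library implementation):

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l_f1(reference, candidate):
    """ROUGE-L F1 between a reference query and a generated query."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Against the reference “blue nike air force 1”, the overly literal query “what shoe sizes does nike air force 1 have” scores about 0.57, while an exact match scores 1.0, which is why the metric rewards concise, user-aligned outputs.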

Why This Matters

This model generates user queries that are significantly better aligned with real search behavior than those produced by the base model. That improvement has the potential to enhance search functionality on e-commerce platforms, making it easier for users to find the products they’re looking for.

Experiment

To assess the performance of our fine-tuned query-generation model, we conducted an additional experiment on a dataset of real user queries that was not part of the fine-tuning data. The goal was to verify the model’s effectiveness on real user queries for e-commerce products. The fine-tuned model outperformed the base model, indicating that it generates queries more similar to real user queries and is therefore a better fit for e-commerce applications.
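One simple way to quantify “more similar to real user queries” is a token-overlap F1 between each generated query and its real counterpart, averaged over the test set. A sketch of that comparison; the example queries are invented for illustration and are not our actual evaluation data:

```python
from collections import Counter

def token_f1(generated, real):
    """Bag-of-words F1 overlap between a generated and a real query."""
    g, r = generated.lower().split(), real.lower().split()
    overlap = sum((Counter(g) & Counter(r)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(g), overlap / len(r)
    return 2 * precision * recall / (precision + recall)

def mean_similarity(generated_queries, real_queries):
    """Average similarity of generated queries to the real user queries."""
    scores = [token_f1(g, r) for g, r in zip(generated_queries, real_queries)]
    return sum(scores) / len(scores)
```

For instance, if the real query is “blue nike air force 1”, a fine-tuned output that matches it exactly scores higher than a base-model output like “what shoe sizes does nike air force 1 have”, mirroring the gap we observed at dataset scale.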

What’s Next?

After fine-tuning the model, the next steps involve evaluating its performance in different ways, experimenting with different configurations, and adapting it for new tasks. Deploying the model into real-world applications and continuously improving it with fresh data are also key. These ongoing iterations will ensure the model remains effective and continues to improve over time.

Explore the Project

If you’re as excited about this project as we are, you can explore the details yourself! Check out the following resources:
