
Introducing a New Open-Source Model for Generating User Queries in E-commerce

Milutin Studen

Mar 12, 2025

In the dynamic world of e-commerce, search is a crucial component of the user experience. When customers type queries like “blue Nike Air Force 1” into a search bar, they expect instant, accurate results. However, many existing large language models (LLMs) that can generate such queries are either too expensive to run or not optimized for real-world search behavior.

That’s why we’re proud to unveil the latest work from our ML team (Mentor: Milutin Studen; Engineers: Petar Surla, Andjela Radojevic): an open-source machine learning model that generates user queries for e-commerce platforms. Built on the T5 model, it produces more natural, concise, and effective search queries, helping users find exactly what they’re looking for.

The Problem We’re Solving

E-commerce platforms rely heavily on user queries to connect customers with products. However, many existing query-generation models produce overly literal or unnatural queries, such as “What shoe sizes does Nike Air Force 1 have?”, instead of the short, keyword-style queries users actually type, like “blue Nike Air Force 1.” When such queries are used to build datasets, models, and search systems, this mismatch lowers their quality, so the generated queries do little to improve search and ultimately lead to a frustrating user experience.

To tackle this problem, we developed an open-source model designed specifically to generate realistic, user-aligned queries that reflect actual search behavior. Our model helps build better datasets, improve query suggestions, and enhance the overall performance of search systems, offering a cost-effective and high-quality alternative to expensive solutions.

The Solution

For fine-tuning, we started with a pre-trained T5 model specifically designed for query generation.

Through extensive experimentation, we developed several iterations of the model, each fine-tuned on a different combination of input text.
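
The post does not list the exact fields behind each input configuration, so the following is only a minimal sketch, assuming hypothetical product fields (`title`, `description`, `category`) and a hypothetical `" | "` separator. It shows how one product record can yield different model inputs depending on which fields a configuration includes:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Product:
    # Hypothetical product record; the real dataset schema may differ.
    title: str
    description: str
    category: str


# Hypothetical configurations: each lists the fields, in order,
# that are concatenated into the model's input text.
CONFIGS = {
    "title": ("title",),
    "title_description": ("title", "description"),
    "title_description_category": ("title", "description", "category"),
}


def build_input(product: Product, fields: tuple[str, ...], sep: str = " | ") -> str:
    """Join the selected product fields into one input string, skipping empty ones."""
    parts = [getattr(product, name).strip() for name in fields]
    return sep.join(part for part in parts if part)
```

For `p = Product("Nike Air Force 1", "Classic low-top sneaker in blue leather.", "Shoes")`, calling `build_input(p, CONFIGS["title_description"])` gives `"Nike Air Force 1 | Classic low-top sneaker in blue leather."`. Skipping empty fields keeps the separator from appearing twice when a record has no description.
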

Here are the key models we created:

After testing, T5-GenQ-TDC-v1 emerged as the top performer, consistently generating user queries that align with real-world search behavior.

How We Built It

The project was divided into four main phases:

Data Preprocessing
We created a custom dataset using Amazon Reviews, carefully curating and processing the data to ensure it was suitable for training. This dataset became the foundation for our model’s training process.
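
The curation steps are not spelled out, so this is a hedged sketch of typical cleaning for text-to-query pairs, assuming each record already holds a hypothetical `input_text` and `target_query`. It normalizes whitespace, drops incomplete pairs, removes case-insensitive duplicates, and makes a reproducible train/test split like the one mentioned in the Training phase:

```python
from __future__ import annotations

import random
import re


def normalize(text: str) -> str:
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def preprocess(records: list[dict], test_fraction: float = 0.2, seed: int = 42):
    """Clean, deduplicate and split (input_text, target_query) records.

    The field names "input_text" and "target_query" are hypothetical.
    Returns (train, test) lists of cleaned records.
    """
    seen = set()
    cleaned = []
    for rec in records:
        src = normalize(rec.get("input_text", ""))
        tgt = normalize(rec.get("target_query", ""))
        if not src or not tgt:
            continue  # skip records missing either side of the pair
        key = (src.lower(), tgt.lower())
        if key in seen:
            continue  # skip case-insensitive duplicates
        seen.add(key)
        cleaned.append({"input_text": src, "target_query": tgt})

    rng = random.Random(seed)  # fixed seed makes the split reproducible
    rng.shuffle(cleaned)
    n_test = int(len(cleaned) * test_fraction)
    return cleaned[n_test:], cleaned[:n_test]
```

Deduplicating before splitting matters here: otherwise the same pair could land in both sets and inflate the test scores.
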
Training
Using the preprocessed dataset derived from Amazon Reviews, we fine-tuned the pre-trained T5 model to generate user queries. The dataset was split into training and test sets to support robust evaluation. To measure the model’s performance, we used the ROUGE-L metric, which scores a generated query by its longest common subsequence of words with the reference query. Throughout the process, we experimented with various input text combinations to improve the model’s accuracy and effectiveness.
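
The exact ROUGE-L implementation used in training is not stated. As a minimal sketch, this version computes the common LCS-based F1 form with a simplified lowercase, alphanumeric-only tokenizer; library implementations may differ in tokenization and options such as stemming:

```python
from __future__ import annotations

import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphanumeric tokens only (simplified tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            curr[j] = prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1: harmonic mean of LCS-based precision and recall."""
    cand, ref = tokenize(candidate), tokenize(reference)
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Against the reference “blue Nike Air Force 1”, the keyword-style query “nike air force 1 blue” scores 0.8 (LCS of 4 out of 5 tokens on each side), while the question “What shoe sizes does Nike Air Force 1 have?” scores 4/7 ≈ 0.57: it shares the same 4 tokens, but its 9-token length lowers precision.
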
Evaluation
To measure success, we created a new dataset containing queries generated by both our model and the base model. We calculated various metrics to compare performance and determine which model delivered better results.
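
Apart from ROUGE-L, the metrics are not named, so this sketch compares two models using two simple stand-in metrics, exact match and bag-of-words token F1, averaged over a shared set of reference queries. The function names and the `outputs_by_model` layout are hypothetical. Any scorer with the same `(candidate, reference) -> float` shape, such as an LCS-based ROUGE-L function, can be added to the `metrics` dictionary:

```python
from __future__ import annotations

import re
from collections import Counter
from statistics import mean
from typing import Callable

Metric = Callable[[str, str], float]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphanumeric tokens only."""
    return re.findall(r"[a-z0-9]+", text.lower())


def exact_match(candidate: str, reference: str) -> float:
    """1.0 if both queries have identical token sequences, else 0.0."""
    return float(tokenize(candidate) == tokenize(reference))


def token_f1(candidate: str, reference: str) -> float:
    """Bag-of-words F1 between candidate and reference tokens (order ignored)."""
    cand, ref = tokenize(candidate), tokenize(reference)
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(cand), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def compare_models(
    references: list[str],
    outputs_by_model: dict[str, list[str]],
    metrics: dict[str, Metric],
) -> dict[str, dict[str, float]]:
    """Average every metric over the references for each model's aligned outputs."""
    results = {}
    for model, outputs in outputs_by_model.items():
        if len(outputs) != len(references):
            raise ValueError(f"{model}: expected {len(references)} outputs, got {len(outputs)}")
        results[model] = {
            name: mean(fn(out, ref) for out, ref in zip(outputs, references))
            for name, fn in metrics.items()
        }
    return results
```

Keeping the outputs aligned with the references by index is what makes the per-model averages comparable; the length check catches a model that skipped an example.
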
Analysis
We analyzed the results, creating visualizations and graphs to highlight where our model outperformed the base model. These insights were crucial in refining the final version of the model.
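
Charts like these typically start from per-example scores. As a hedged sketch (hypothetical helper names; the scores are assumed to come from any per-example metric, such as ROUGE-L), the head-to-head counts and the examples with the largest gains can be computed as shown below. The resulting counts could then feed a bar chart in any plotting library:

```python
from __future__ import annotations


def head_to_head(base_scores: list[float], tuned_scores: list[float], tol: float = 1e-9) -> dict[str, int]:
    """Count per-example wins, ties and losses of the fine-tuned model vs. the base model."""
    if len(base_scores) != len(tuned_scores):
        raise ValueError("score lists must be aligned example by example")
    wins = ties = losses = 0
    for base, tuned in zip(base_scores, tuned_scores):
        if abs(tuned - base) <= tol:
            ties += 1  # treat near-equal scores as a tie
        elif tuned > base:
            wins += 1
        else:
            losses += 1
    return {"wins": wins, "ties": ties, "losses": losses}


def largest_gains(base_scores: list[float], tuned_scores: list[float], k: int = 3) -> list[int]:
    """Indices of the k examples where the fine-tuned model improved most over the base."""
    gains = [tuned - base for base, tuned in zip(base_scores, tuned_scores)]
    return sorted(range(len(gains)), key=lambda i: gains[i], reverse=True)[:k]
```

Inspecting the examples returned by `largest_gains` (and the losses) is a quick way to see which kinds of queries benefit from fine-tuning.
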

Why This Matters

Our model generates user queries that are substantially better than those produced by the base model. This could make search on e-commerce platforms more effective, making it easier for users to find the products they’re looking for.

Experiment

To assess our fine-tuned query generation model further, we ran an additional experiment on a dataset of real user queries that was not part of the fine-tuning data. The goal was to verify how well the model performs on real user queries for e-commerce products. The fine-tuned model outperformed the base model, indicating that it generates queries more similar to real user queries and is therefore a better fit for e-commerce applications.

What’s Next?

With fine-tuning complete, the next steps are to evaluate the model’s performance in additional ways, experiment with different configurations, and adapt it to new tasks. Deploying the model in real-world applications and continuously improving it with fresh data are also key. These ongoing iterations should keep the model effective and help it improve over time.