Introducing a New Open-Source Model for Generating User Queries in E-commerce
Milutin Studen | Mar 12, 2025
In the dynamic world of e-commerce, search functionality is a crucial part of the user experience. When customers type queries like “blue Nike Air Force 1” into a search bar, they expect instant, accurate results. However, many existing large language models (LLMs) used to generate such queries are either too expensive to run at scale or not optimized for real-world search behavior.
That’s why we’re proud to unveil the latest work from our ML team (Mentor: Milutin Studen, Engineers: Petar Surla, Andjela Radojevic): an open-source machine learning model for generating user queries for e-commerce platforms. Built on the T5 model, it produces more natural, concise, and effective search queries, helping users find exactly what they’re looking for.
The Problem We’re Solving
E-commerce platforms rely heavily on user queries to connect customers with products. However, many existing query-generation models produce overly literal or unnatural queries, such as “What shoe sizes does Nike Air Force 1 have?” instead of the more intuitive “blue Nike Air Force 1.” Such queries are a poor basis for training search models and systems, and the resulting search experience frustrates users.
To tackle this problem, we developed an open-source model designed specifically to generate realistic, user-aligned queries that reflect actual search behavior. Our model helps build better datasets, improve query suggestions, and enhance the overall performance of search systems, offering a cost-effective and high-quality alternative to expensive solutions.
The Solution
For fine-tuning, we started with a pre-trained T5 model specifically designed for query generation.
Through extensive experimentation, we developed several iterations of the model, each optimized for a different input configuration.
After testing, T5-GenQ-TDC-v1 emerged as the top performer, consistently generating user queries that align with real-world search behavior.
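As a quick illustration, here is a minimal inference sketch using the Hugging Face transformers library. The repository ID and the example product text are assumptions for illustration only; check the project resources below for the exact model name.

```python
# Minimal inference sketch; the Hub repository ID and product text are assumed, not confirmed.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "smartcat/T5-GenQ-TDC-v1"  # assumed repository ID for illustration
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# A product description goes in; a short, user-style search query comes out.
product_text = "Nike Air Force 1 '07 men's sneakers, blue and white leather upper, rubber sole"
inputs = tokenizer(product_text, return_tensors="pt", truncation=True, max_length=512)
outputs = model.generate(**inputs, max_new_tokens=32, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Expected style of output: a concise query such as "blue nike air force 1"
```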
How We Built It
The project was divided into four main phases:
Data Preprocessing: We created a custom dataset from Amazon Reviews, carefully curating and processing the data to make it suitable for training. This dataset became the foundation for the model’s training process.
Training: Using the preprocessed Amazon Reviews dataset, we fine-tuned the model to generate user queries. The dataset was split into training and testing sets to ensure robust evaluation, and we used the ROUGE-L metric to assess the quality of the generated queries. Throughout the process, we experimented with various input text combinations to optimize the model’s accuracy and effectiveness (a training and evaluation sketch follows this list).
Evaluation: To measure success, we created a new dataset containing queries generated by both our model and the base model, then calculated various metrics to compare their performance and determine which model delivered better results.
Analysis: We analyzed the results, creating visualizations and graphs that highlight where our model outperformed the base model. These insights were crucial in refining the final version of the model.
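For readers who want a concrete picture of the training and evaluation setup described above, the sketch below shows one way to fine-tune a query-generation T5 checkpoint and score it with ROUGE-L using the Hugging Face Seq2SeqTrainer. The base checkpoint, file name, column names, and hyperparameters are illustrative assumptions, not our exact configuration.

```python
# Illustrative fine-tuning sketch; checkpoint, data file, column names, and hyperparameters are assumptions.
import numpy as np
import evaluate
from datasets import load_dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer, DataCollatorForSeq2Seq,
                          Seq2SeqTrainer, Seq2SeqTrainingArguments)

base_model = "BeIR/query-gen-msmarco-t5-base-v1"  # assumed query-generation T5 starting point
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForSeq2SeqLM.from_pretrained(base_model)
rouge = evaluate.load("rouge")

# Hypothetical preprocessed Amazon Reviews data with "product_text" and "query" columns.
dataset = load_dataset("csv", data_files="amazon_reviews_queries.csv")["train"]
dataset = dataset.train_test_split(test_size=0.1, seed=42)

def preprocess(batch):
    # Tokenize product text as the input and the target user query as the label.
    model_inputs = tokenizer(batch["product_text"], truncation=True, max_length=512)
    labels = tokenizer(text_target=batch["query"], truncation=True, max_length=64)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, batched=True,
                        remove_columns=dataset["train"].column_names)

def compute_metrics(eval_pred):
    # Decode generated queries and references, then report ROUGE-L.
    preds, labels = eval_pred
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    scores = rouge.compute(predictions=decoded_preds, references=decoded_labels)
    return {"rougeL": scores["rougeL"]}

args = Seq2SeqTrainingArguments(
    output_dir="t5-genq-finetuned",
    learning_rate=3e-4,
    per_device_train_batch_size=16,
    num_train_epochs=3,
    eval_strategy="epoch",
    predict_with_generate=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    compute_metrics=compute_metrics,
)
trainer.train()
```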
Why This Matters
The fine-tuned model generates user queries that align far more closely with how people actually search than those produced by the base model. That improvement can translate directly into better search functionality on e-commerce platforms, making it easier for users to find the products they’re looking for.
Experiment
To assess the performance of our fine-tuned query generation model, we ran an additional experiment on a dataset of real user queries that was not part of the fine-tuning data. The goal was to verify the model’s effectiveness on real user queries for e-commerce products. The fine-tuned model outperformed the base model, indicating that it generates queries more similar to real user queries and making it a better fit for e-commerce applications. A minimal sketch of this kind of comparison is shown below.
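The sketch assumes the real queries and each model’s generated queries are available as parallel lists; the example strings are placeholders drawn from the earlier Nike Air Force 1 example, not actual outputs.

```python
# Illustrative comparison of base vs. fine-tuned outputs against real user queries.
import evaluate

rouge = evaluate.load("rouge")

# Placeholder parallel lists; in practice these come from the held-out real-query dataset.
real_queries = ["blue nike air force 1"]
base_outputs = ["what shoe sizes does nike air force 1 have?"]
finetuned_outputs = ["blue nike air force 1"]

for name, outputs in [("base", base_outputs), ("fine-tuned", finetuned_outputs)]:
    scores = rouge.compute(predictions=outputs, references=real_queries)
    print(f"{name}: ROUGE-L = {scores['rougeL']:.3f}")
```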
What’s Next?
After fine-tuning, the next steps are to evaluate the model’s performance in additional ways, experiment with different configurations, and adapt it to new tasks. Deploying the model in real-world applications and continuously refreshing it with new data are also key; these ongoing iterations will keep the model effective and improving over time.
Explore the Project
If you’re as excited about this project as we are, you can explore the details yourself! Check out the following resources: