How to optimize the costs of using AI at your online marketplace



Mar 11, 2024

This age will be remembered as the age when AI spoke, sang, and painted. Even though AI has been present for years in various forms (predictions, recommendations, analytics, your Google Maps…), it has only come into the spotlight through generative AI technologies.

Large Language Models, such as GPT, can effectively mimic human language. That’s not their only use, however. 

The models behind them can summarize and analyze swathes of data in seconds: product reviews, large chunks of text, and messages that may signal fraud attempts. They provide near-human-like support, generate images, and enable advanced search that relies not on exact matches but on intent and similarity (more on that later in the article).

And as of September 2023, leading LLMs have gained the ability to take text, audio, video, and image input and provide output in those same modalities.

However, the cutting edge always comes at a hefty cost, and every ounce of optimization matters. In this article, you'll discover how much it costs to run an LLM and why simple token pricing isn't the whole story.

The reality of costs of using LLMs in online marketplaces

Major players are already implementing Large Language Models (LLMs) and generative AI. Chances are, you too are experimenting with AI-driven support, search, fraud protection and other functionalities.

However, the bigger the business, the more experimentation and budget overruns it can absorb. Even if an implementation isn't the most cost-effective solution, a large player can afford to run it longer.

For any other player that’s not a big corporation like Amazon or Alibaba, it’s imperative to consider every optimization parameter, especially as the volume of data continues to grow.

There are four main ways an LLM generates costs for your company.

  1. Prompt and response processing

When your users type in requests (prompts), it takes compute resources to produce responses. According to OpenAI's pricing page, 1M tokens for their best model (GPT-4) cost $10 for input and $30 for output.

To make things a bit clearer, 1,000 tokens is roughly 750 words.

In other words, sending 750 words and getting 750 words back (about 1,000 tokens each way) will cost you around $0.01 + $0.03 = $0.04. This may not sound like much if your users are the only ones who communicate with the AI, and if they only do it once per month…
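The arithmetic above can be wrapped in a small helper for back-of-the-envelope planning. The prices below are the per-1M-token figures quoted earlier and will drift over time, so treat them as placeholders, not current pricing; the monthly request volume is a made-up example:

```python
# Back-of-the-envelope cost estimator for prompt/response processing.
# Prices are the per-1M-token figures quoted above (assumed, not current).
PRICE_PER_M_INPUT = 10.00    # USD per 1M input tokens
PRICE_PER_M_OUTPUT = 30.00   # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single prompt/response pair."""
    return (input_tokens * PRICE_PER_M_INPUT
            + output_tokens * PRICE_PER_M_OUTPUT) / 1_000_000

# 750 words in and 750 words out is roughly 1,000 tokens each way:
per_request = request_cost(1_000, 1_000)
monthly = per_request * 100_000   # e.g. 100k such requests per month
print(f"per request: ${per_request:.2f}, monthly: ${monthly:,.2f}")
```

At 100,000 requests a month, that single rounding-error-sized $0.04 turns into thousands of dollars, which is where optimization starts to matter.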

Yet, GenAI on marketplaces usually entails frequent communication in the backend, especially if chatbots aren’t the only tool you’re implementing.

Did you know that adding “Be concise” to your prompts can reportedly cut your token costs by as much as 90%?

  2. Fine-tuning costs

Fine-tuning means taking a pre-trained model and training it further on your own data sets to suit your needs. This costs around $24 per hour. Creating your fine-tuned model might take just 15 minutes… or 72 hours, which works out to 72 × $24 = $1,728.

The benefit of fine-tuned models is cheaper, faster responses. Plain prompting often requires a few shots before you get the response you wanted, while tuned models are already trained to give more precise answers.

Don’t forget expert costs for these operations.

  3. Infrastructure costs

Hosting and running LLMs is also costly, especially if the number of monthly users is high. For example, hosting an open-source LLM on AWS can cost $7–10 per hour, which at the upper end means around $240 per day… and $7,200 per month.

You'll need to consider the number of monthly visitors to your website, their behavior, and your available infrastructure.
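Putting the hosting figures above in one place, the monthly bill is a trivial calculation. The hourly rate below is the upper end of the range quoted above; it is an assumption, so replace it with your cloud provider's actual instance pricing:

```python
# Rough monthly bill for self-hosting an open-source LLM on a cloud GPU
# instance. The rate is assumed (upper end of the $7-10/hour range above).
HOURLY_RATE = 10.0
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30

daily = HOURLY_RATE * HOURS_PER_DAY       # 240.0 USD
monthly = daily * DAYS_PER_MONTH          # 7200.0 USD
print(f"~${daily:.0f}/day, ~${monthly:.0f}/month at full utilization")
```

Note that this assumes the instance runs around the clock; spot instances, autoscaling, or serverless inference can bring the effective rate down considerably.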

  4. Implementation costs

There are multiple standard ways to implement LLMs on your marketplace. While some have opted to train and develop models on their own data (bear in mind that OpenAI reportedly spent $3.2 million to train ChatGPT), in reality the vast majority integrate an out-of-the-box, open-source LLM and fine-tune it on their own data.

This requires fewer development resources and is much faster. Yet even in those cases, many issues can cause hidden costs.

  • Scalability issues

As online marketplaces grow, so does the volume of data. The bigger the volume, the more processing power you need.

  • Database and resource control

Not all databases are optimized for AI operations. Traditional relational databases might not be the best fit for storing and retrieving the high-dimensional, property-rich data LLMs rely on.

These databases can lead to slower query times and imprecise recall, affecting the quality of response.

  • Speed issues

Improper infrastructure and database choice means the response won’t be fast or precise enough, which leads to customer dissatisfaction and directly affects revenue.

Vector databases provide huge cost optimizations for AI-driven marketplaces

We already have solutions on the market that can handle many of these issues. One of them is the so-called vector database.

For this AI tech to work cost-effectively, the data needs to be stored efficiently. Today's data is high-dimensional and multimodal (text, video, image, audio), which makes it hard to process and retrieve from traditional row-based databases.

That's how vector databases came to be. They take all that complex data and store it as so-called “embeddings”. Embedding is the process of transforming information into a point in a multidimensional vector space, typically with the help of machine-learning models.

Illustration: On the left, this is how VDBs store information vs traditional databases

In other words, think of it as taking the essential information about the data and transforming it into a packaged, compressed form. 

Traditional databases store data in rows and columns, and you have to use exact-match search to go through them. As the volume of data rises, the entire system slows down because it processes so many properties.

With slower processing come greater costs.

In vector databases, search is much faster because it looks for similarity between these “packages” of data. Instead of scanning every row and processing everything, the engine compares a finite set of parameters, finds the most similar package, and retrieves it. This so-called similarity search is the backbone of these databases' success.
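A minimal sketch of similarity search is shown below. The three-dimensional vectors are hand-made stand-ins for real embeddings (a production system would get high-dimensional vectors from an embedding model and use an approximate-nearest-neighbor index such as HNSW instead of a linear scan):

```python
import math

# Toy 3-D "embeddings" (size, dog-ness, vehicle-ness), made up for
# illustration. Real systems get these vectors from an embedding model.
CATALOG = {
    "small puppy":   [0.9, 1.0, 0.0],
    "large dog bed": [0.1, 0.8, 0.0],
    "compact car":   [0.8, 0.0, 1.0],
}

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def nearest(query_vec, catalog):
    # Linear scan for clarity; vector DBs use ANN indexes so lookup
    # stays fast as the catalog grows to millions of items.
    return max(catalog.items(), key=lambda kv: cosine(query_vec, kv[1]))[0]

query = [0.85, 0.95, 0.0]   # pretend embedding of the query "small puppies"
print(nearest(query, CATALOG))
```

Even though the query never matches any stored string exactly, the nearest-neighbor lookup still returns the small puppy, because its vector points in almost the same direction as the query's.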

For example, if a user types a query for “small puppies”, a traditional database would return exact keyword matches: everything that contains the phrase “small puppies”. If the database is huge, that would take a while, especially if it holds many puppy breeds plus toys and equipment for them…

A vector database would understand the context of the query and the intent behind it, and present the user with all the small, happy puppies. If the user is on the page for a specific breed, it would take that into account as well.

Illustration: Information in the VDB stored closely by their similarity

Look at the illustration above. Instead of running through thousands of rows (cars, dogs, and puppies alike), the algorithm in the VDB would find and retrieve the nearest neighbor with a similar set of information.

Bearing in mind that retrieving information from a vector store can be around 250 times cheaper than generating it with GPT-4, it's easy to see why vector databases are becoming so popular for optimizing the cost of running an AI solution in a business.

Using vector databases in synergy with LLMs, where you send the LLM only the right packages from a sea of data, can save you a fortune in token expenses; more on that later.

The almighty similarity search as a revenue engine in online retail

The key feature vector databases really excel at is similarity search. As we have seen in the example above, similarity search is a search algorithm used to find data points that are most similar to a query. Instead of searching for exact matches, it aims to find items that are closest or most similar to the query based on certain metrics or criteria.

In online marketplaces, where users are constantly seeking products or services that match their preferences, the ability to efficiently and accurately find similar items makes a difference between buying and bouncing.

When dealing with rich, high-dimensional data, whether visual, audio, or text, it makes for a noticeably smoother experience for shoppers.

Now imagine a user browsing an online clothing store and coming across a shirt they like, but it’s not quite the right color or pattern. With optimized visual search, they can upload a picture of their desired shirt, and the system will find items that visually match the uploaded image by many parameters. This enhances the shopping experience by allowing users to find products based on visual cues, not just textual descriptions.

Or let’s consider a user searching for a “lightweight summer jacket.” Traditional search might return jackets that are either lightweight or meant for summer. However, similarity search, combined with a semantic understanding of AI algorithms, understands the context and intent behind the query. It will prioritize jackets that are both lightweight and suitable for summer, ensuring that the results are more aligned with the user’s intent.

In a scenario where a user wants to find a product using both text and image, multimodal search comes to the rescue. They might upload a picture of a dress and add the text “in blue color.” The system will consider both the image and the text to return blue dresses that match the style of the uploaded image – and much faster because these characteristics are already made to be easily searchable within the vector database.
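One common way to implement this kind of multimodal query, assuming both modalities are embedded into the same vector space (as CLIP-style models do), is a weighted average of the image and text vectors, re-normalized before the similarity search. The vectors below are made up for illustration:

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def combine(image_vec, text_vec, text_weight=0.5):
    # Weighted average of the two modality vectors, re-normalized so the
    # result lives on the same unit sphere as the catalog embeddings.
    mixed = [(1 - text_weight) * i + text_weight * t
             for i, t in zip(image_vec, text_vec)]
    return normalize(mixed)

# Pretend embeddings: the uploaded dress photo and the text "in blue color".
image_vec = normalize([0.9, 0.1, 0.0])   # style-heavy direction
text_vec  = normalize([0.0, 0.0, 1.0])   # color-heavy direction
query_vec = combine(image_vec, text_vec)
```

The combined `query_vec` then goes through the same nearest-neighbor search as any single-modality query; `text_weight` controls how strongly the typed refinement pulls results away from the uploaded image.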

To summarize the benefits:

  • By understanding the context and nuances of queries, similarity search ensures that the results are more accurate and relevant to the user’s intent.
  • It allows users to search using various inputs, be it text, image, or a combination of both, offering a more flexible and intuitive search experience.
  • Vector databases are optimized for data with multiple parameters (high-dimensional data), ensuring quick and efficient retrieval of similar items.
  • By returning results that closely match user preferences, similarity search offers a more personalized shopping experience, leading to increased user satisfaction and higher sales. Users sift less through irrelevant results, leading to a smoother and more efficient shopping journey.

Similarity search, with its various forms and benefits, is transforming the way users interact with online marketplaces. By offering more precise, context-aware, and visually aligned results, it ensures that users find what they’re looking for with ease and efficiency.

Technical POV: Savings that come from the interplay between vector databases (VDBs) and Large Language Models (LLMs)

While LLMs provide the capability to understand and generate human-like text, VDBs ensure efficient storage and retrieval of enormous volumes of data. But how does this combination help in cost optimization?

Faster data retrieval

As we’ve covered earlier, LLMs, especially when dealing with vast amounts of data, can be resource intensive. VDBs, with their capability for quick similarity searches, ensure that relevant data is retrieved efficiently. This reduces the computational load on LLMs, leading to faster response times and much lower operational costs.

Reduced token costs

One of the significant expenses associated with LLMs is token cost. A token is a unit of text, roughly a word or part of a word. The more text you send and receive, the more tokens you pay for.

By integrating with VDBs, businesses can store frequently used responses or data as “embeddings” (simplified packages of data). When the user makes a similar query, instead of regenerating responses using LLMs, the system can retrieve the stored embeddings from the VDB.

Even when the connected LLM does need to generate something new, you send it only the most relevant retrieved data rather than everything, which reduces the number of tokens used.
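The caching idea above can be sketched as a “semantic cache”: store the embedding of each answered query, and serve the stored answer whenever a new query's embedding is close enough. The class below is a simplified illustration; the 2-D vectors stand in for real embeddings, and the 0.95 threshold is an arbitrary choice to tune in practice:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

class SemanticCache:
    """Cache LLM answers keyed by query embedding instead of exact text."""

    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.entries = []   # list of (embedding, answer) pairs

    def get(self, query_vec):
        best = max(self.entries,
                   key=lambda e: cosine(query_vec, e[0]), default=None)
        if best and cosine(query_vec, best[0]) >= self.threshold:
            return best[1]   # cache hit: no LLM call, no token cost
        return None          # cache miss: caller falls through to the LLM

    def put(self, query_vec, answer):
        self.entries.append((query_vec, answer))

cache = SemanticCache()
cache.put([1.0, 0.0], "Our return policy is 30 days.")
hit  = cache.get([0.99, 0.05])   # near-duplicate question, served from cache
miss = cache.get([0.0, 1.0])     # unrelated question, goes to the LLM
```

In a real deployment the linear scan in `get` would be replaced by a vector-database lookup, so the cache stays fast even with millions of stored answers.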

Scale whenever you want

VDBs can dynamically scale based on needs. This means that as the volume of data grows, you don’t need to make significant infrastructure investments. Your LLM operations remain consistent and cost-effective, regardless of the data volume.

No more over-provisioning or under-used resources. Vector databases adjust in real-time to the demands of the system.

Reduce dependency on commercial solutions

By using VDBs and storing your embeddings, you can reduce your dependency on commercial LLM solutions. This can lead to significant cost savings if you have high query volumes, which is the case for online shops with millions of users.

Direct business POV: How this way of AI implementation drives further revenue

Every decision related to technology and user experience has a direct impact on your bottom line. Implementing advanced similarity search and vector databases isn't just about optimizing the backend of your online marketplace. In fact, it's mostly about the front end, where real people use it.

Direct impact on revenue is what offsets the costs.

  • In a saturated market, offering advanced search functionalities can set a platform apart from its competitors, attracting more users and vendors to the platform.

Optimize the use of AI in your online marketplace today

It's never too late to cut down unnecessary expenses. Yet we're aware that this process takes time. Our advice is to run pilot tests and start with micro-experiments. That way, you'll see the benefits while avoiding major problems.

Implement training and workshops for your team, because this type of technology requires thoughtful education. Implement feedback loop processes within the marketplace, so the customers can also tell you what you’re doing right, and what could be different.

You know there's nothing better than a satisfied customer who leaves a positive review or sends unsolicited praise to customer support.

However, if you wish to start as soon as possible, contact Smart Cat. We have extensive experience in this area (see how we improved customer experience in a convenience store chain, or how we increased the precision of churn-rate prediction by 20% for another client).

We also partner with the hottest vector database vendors on the market, Pinecone and DataStax. Through our workshops, you'll get not only the infrastructure but also training and overall GenAI clarity.

On top of that, we’ve been implementing AI-based solutions for quite some time now and we know how to optimize your current systems.

Contact us today and grab this 360 deal. Stop the leaks in your budget or build a solid system from the ground up.
