Databases are commonly thought of as either relational, object-based, or NoSQL.
All of these types are widely used in every sphere of life. They have been optimized and improved over time, and they help us store data and put it to use.
As modern technology develops rapidly, we strive to develop technologies more suitable for our needs.
Advances in Machine Learning have brought new challenges around the speed and accuracy of real-time predictions. At the same time, more and more Machine Learning approaches rely on vector embeddings. Vectors are now everywhere in Machine Learning. Whether you work with text, images, videos, or behavioral data, you can transform and represent this data as vector embeddings. Some companies set out to build a solution better suited to this kind of data and introduced vector databases. One of the solutions that we found and decided to test is Pinecone, a fully managed vector database.
Vector databases may be new to you, but you have probably used them without knowing it whenever you searched on Google or shopped online.
Pinecone is a fully managed service that provides vector search (or similarity search) through a straightforward API. Such a service lets you index as many vectors as you want and search vector representations of data to find items in close proximity to the query. This means that it not only stores the embeddings in the vector index, but also performs the similarity search. The vector embeddings are the basis for the similarity search.
Personal item recommendations, finding relevant ads, best image/article match, …
All of these are Machine Learning problems, and although they are very different from one another, from the Pinecone perspective they all look the same. The only things that matter to Pinecone are the vectors and the similarities between them. It doesn’t matter whether you upload image, text, or user vector embeddings: at real-time inference, you retrieve the vectors from the vector index that are most similar to the query vector.
Sometimes you may have more than one model (e.g. user model and product model) that will generate (user and product) embeddings in the same vector space. This way you could index all the product embeddings and perform queries based on user embeddings. Still, Pinecone only cares which pairs of vectors have the highest similarity score and retrieves the ids of those items.
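To make the user and product example concrete, here is a minimal sketch in plain NumPy. It is not Pinecone code: the `top_k_similar` helper, the product ids, and the three-dimensional vectors are all made up for illustration, and we assume cosine similarity as the score.

```python
import numpy as np

def top_k_similar(query, item_ids, item_vectors, k=2):
    """Return (id, cosine similarity) for the k items most similar to query."""
    # Normalize so that a plain dot product equals cosine similarity.
    items = item_vectors / np.linalg.norm(item_vectors, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    scores = items @ q
    best = np.argsort(-scores)[:k]  # indices of the k highest scores
    return [(item_ids[i], float(scores[i])) for i in best]

# Hypothetical embeddings produced by a product model.
product_ids = ["shoes", "boots", "laptop"]
product_vectors = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.3, 0.1],
    [0.0, 0.2, 0.9],
])

# Hypothetical embedding from a user model that shares the same vector space.
user_vector = np.array([1.0, 0.2, 0.0])

recommended = top_k_similar(user_vector, product_ids, product_vectors, k=2)
print(recommended)  # "shoes" first, then "boots"
```

Only the ids and scores come back, which mirrors the point above: the search itself never needs to know whether a vector came from a user, a product, or an image.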
Getting Started with Pinecone
Whether you want to build a simple application with a few thousand vectors or a robust system with millions of vectors to query in real-time, the approach is the same.
Pinecone provides an API that is simple to use and that lets you launch a distributed similarity search service in a few lines of code. Depending on the size of data, the vector database scales, but all of that distributed infrastructure is behind the scenes – no need to worry about it.
After initializing Pinecone within the app, you can make calls to upload new vectors, fetch existing vectors or query any vector to retrieve the most similar items that are already indexed.
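The three operations can be pictured with a toy in-memory index. This is only a sketch of the idea: `ToyVectorIndex` and its method names are our own illustration, not the Pinecone client, and it scans every vector instead of using a real distributed index.

```python
import numpy as np

class ToyVectorIndex:
    """In-memory stand-in for a vector index (illustration only)."""

    def __init__(self, dimension):
        self.dimension = dimension
        self._vectors = {}  # id -> unit-length vector

    def upsert(self, items):
        """Insert or overwrite (id, vector) pairs."""
        for item_id, vector in items:
            v = np.asarray(vector, dtype=float)
            if v.shape != (self.dimension,):
                raise ValueError(f"expected a vector of dimension {self.dimension}")
            self._vectors[item_id] = v / np.linalg.norm(v)

    def fetch(self, item_id):
        """Return the stored (normalized) vector for an id."""
        return self._vectors[item_id]

    def query(self, vector, top_k=3):
        """Return the top_k (id, cosine similarity) pairs, best first."""
        q = np.asarray(vector, dtype=float)
        q = q / np.linalg.norm(q)
        scored = [(item_id, float(v @ q)) for item_id, v in self._vectors.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]

index = ToyVectorIndex(dimension=2)
index.upsert([("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])])
matches = index.query([0.9, 0.1], top_k=2)
print(matches)  # "a" is the closest match, followed by "c"
```

A managed service exposes the same kind of calls but hides storage, scaling, and the search algorithm behind the API.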
But all of these steps (upload, fetch, or query) require vector embeddings. How do you create them? There are many embedding models today for all kinds of data: BERT for text, ResNet for images, VGGish for audio, etc. Besides these pre-trained models, you can create a custom model and train it on your own data. An embedding model takes raw data as input and outputs a vector embedding that can be uploaded to the vector search index and then used for vector search.
Let’s say we want to build an image search app. We will use an embedding model to create vector embeddings for the images we have. Once created, we will upload them into the Pinecone vector index. When we want to test the service with a new image, we first need to generate its embedding. It is important to use the same embedding model to generate the vector embedding for this image, because vectors from different models are not comparable. Only after we create the embedding can we query Pinecone to retrieve the most similar images from the vector index.
Pinecone vs. Model Inference
Generating embeddings may take some time depending on the model that you use. But once you generate and index them, you practically have a deployed model. The vector index can be easily scaled by adding new vectors at any time. This “model deployment” could be done in minutes and with a few lines of code. Still, it gives fast and accurate results.
It also gives results comparable to those of a trained model. If you use model inference instead of similarity search, the deployment process will be more time-consuming and the solution will not scale as easily. Model inference is also slower than Pinecone vector search, which becomes more obvious once you query multiple vectors at the same time. Try it out: you will be amazed how fast Pinecone retrieves results compared to the model!
Let’s see an example. Imagine you are building an application for anomaly detection. You can do this by training a classifier, deploying the trained model, and using it inside your application. Or you can create a model that produces an embedding for each event that you have stored, and upload these embeddings into the vector index. This way you can constantly add new records to the index, and if you decide to change the model, you simply create a new vector index and index all of the new vectors. By retrieving the top K nearest neighbors from the vector index, you can check whether a new event is an anomaly or not.
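One way to turn the top K neighbors into an anomaly decision is sketched below in plain NumPy. The rule itself is an assumption on our part: an event is flagged when its mean cosine similarity to its K nearest stored events falls below a threshold. The `is_anomaly` helper, the values of `k` and `threshold`, and the two-dimensional embeddings are hypothetical and would need tuning on real data.

```python
import numpy as np

def is_anomaly(event, known_events, k=3, threshold=0.8):
    """Flag an event whose mean similarity to its k nearest neighbors is low."""
    known = known_events / np.linalg.norm(known_events, axis=1, keepdims=True)
    e = event / np.linalg.norm(event)
    scores = known @ e              # cosine similarity to every stored event
    top_k = np.sort(scores)[-k:]    # the k highest similarities
    return bool(top_k.mean() < threshold)

# Hypothetical embeddings of past, normal events.
normal_events = np.array([
    [1.0, 0.1],
    [0.9, 0.2],
    [1.0, 0.0],
    [0.8, 0.1],
])

typical = is_anomaly(np.array([0.95, 0.1]), normal_events)
unusual = is_anomaly(np.array([-0.1, 1.0]), normal_events)
print(typical, unusual)  # False True
```

With a vector database, the similarity scores would come from the top K query results instead of a full scan, while the thresholding step stays in the application.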
What is great about vector databases, and Pinecone in particular, is that any ML problem in which raw data can be transformed into vector embeddings can be implemented using this service. With a simple similarity search, you can build search engines, recommendation systems, document/information retrieval, chatbots, and many other applications that involve searching. Vector data is easy to search, and this idea can make many applications better.
You may wonder why you should not implement similarity search some other way. Why not compute cosine similarity and perform the search within the application? Well, this might work if you have a really small number of items. Remember, you would create a vector for every item, and whether you store the vectors in memory or in a database, retrieving results for a specific item requires a brute-force approach: computing the similarity between the query item and every other item that you have. With 1,000 items that might be fine, but as the data grows, you would not be happy with how long your application takes to calculate and retrieve results. That’s why a service like Pinecone is a much better choice. It uses approximate nearest neighbor (ANN) search instead of an exhaustive search, which gives much better performance.
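The difference between an exhaustive scan and an approximate search can be sketched with random-hyperplane hashing, one simple ANN technique. We do not know which algorithm Pinecone uses internally, so treat this only as an illustration of the general idea; the data is random and every name and parameter below is made up.

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
dim, n_items, n_planes = 16, 1000, 6

# Random unit vectors standing in for item embeddings.
vectors = rng.normal(size=(n_items, dim))
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)

# Random hyperplanes: the sign pattern of the projections is the bucket key.
planes = rng.normal(size=(n_planes, dim))

def bucket_key(v):
    return tuple((planes @ v > 0).astype(int))

buckets = defaultdict(list)
for i, v in enumerate(vectors):
    buckets[bucket_key(v)].append(i)

def brute_force(query):
    """Exact search: one similarity per stored vector."""
    return int(np.argmax(vectors @ query))

def approximate(query):
    """Approximate search: only score vectors that share the query's bucket."""
    candidates = buckets.get(bucket_key(query), [])
    if not candidates:  # empty bucket: fall back to the exact scan
        return brute_force(query), n_items
    scores = vectors[candidates] @ query
    return candidates[int(np.argmax(scores))], len(candidates)

query = vectors[42]  # a stored vector, so the exact best match is itself
exact = brute_force(query)
approx, scanned = approximate(query)
print(exact, approx, scanned, "of", n_items)
```

Here the approximate search scores only the vectors in one bucket rather than all 1,000. The trade-off is that for an arbitrary query the true nearest neighbor may sit in a different bucket and be missed, which is why such methods are called approximate.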
By now you probably understand that all the logic behind the similarity computation is hidden and runs on the backend side of this API. It’s a black box that lets you pick a similarity algorithm and does the rest for you. And it works great. Believe us, we tried it!
We have not run into many downsides so far. Let’s just mention a few:
- Currently, Pinecone provides only a Python API Client. Reportedly the REST API will be available in September, and Go and Java clients in September or October.
- There are several open-source solutions using the same similarity search approach, if you don’t mind building out and maintaining the distributed infrastructure yourself.
- The quality of vector search results depends heavily on the quality of the vector embeddings. As the saying goes, “Garbage in, garbage out.”
Vector databases are certainly becoming more and more popular. It is not only the biggest IT companies that use this concept to provide their services, but also small companies like Pinecone, which offer it as a SaaS solution to anyone.
In this article, we presented a short overview of our experience working with the Pinecone vector database. Our impression is that it is a fast, reliable, and elegant way to solve ML problems. If this is something that you find interesting, if you want to create a similarity search application or simply play and explore this exciting ML domain, make sure to visit Pinecone.
Written by: Anđela Kojanić
February 10, 2022