RAG is all the rage right now! Haven't heard of it? In this blog, we'll help you kickstart your generative AI application development journey by building a Retrieval Augmented Generation (RAG) app using MongoDB Atlas and Fireworks AI. We'll also discuss how to optimize the architecture to achieve better cost and performance.
In a post-ChatGPT world, hearing about a new AI advancement or large language model (LLM) is, for a developer, as common as hearing about a new JavaScript framework. Playing around with LLMs is fun, but building AI-enabled experiences is the real skill builder for any developer.
Building apps or experiences on pre-trained LLMs alone has limitations. GPT, Claude, Llama, and Mixtral each have a knowledge cutoff at a specific date, and methods for adding custom knowledge, like fine-tuning, come with restrictions of their own: cost, specialist skills, and knowledge still limited to the training data.
MongoDB and Fireworks have partnered to help enterprises build the next generation of scalable, secure, cost-effective RAG applications grounded in their operational data.
Retrieval Augmented Generation (RAG) combines the best of both worlds by leveraging a retrieval component to fetch relevant information from a database (or vector store) and a generative component (an LLM) to synthesize a coherent response to the user's query.
Through a RAG architecture, an LLM gets a second brain: the ability to fetch relevant, up-to-date information that turns it into a real-time response generation engine grounded in your own data.
Some more reasons that make a RAG app special are:
**Data Efficiency:** RAG is data-efficient because it dynamically pulls in relevant information that the model may not have seen during training. This saves time, effort, and money compared to more data-hungry approaches like fine-tuning, which also often demand specialist, hard-to-find skills.
**Flexibility:** RAG enables dynamic updating of the underlying knowledge bases or documents, making it easier to maintain the system without regular retraining. This is especially beneficial when domain information changes frequently, as with stock prices or weather.
A RAG architecture pairs a large language model with a data store (also called a vector store). The user's query is embedded and sent to the vector store, which returns the most relevant chunks. The LLM absorbs those chunks into its context and generates a response grounded in them that answers the user's query.
As mentioned at the beginning of this blog, we'll create a RAG-based app that recommends movies based on the user's query. We'll index a corpus of movie documents and retrieve from it at query time, building the example on top of MongoDB Atlas and Fireworks AI.
💡 Note: While this tutorial focuses on building a basic RAG pipeline, we also have guides on optimized RAG architectures with tips to reduce costs, improve throughput, add batching, and introduce function calling. These options help you customize and scale your RAG architecture to suit your specific needs.
Note: You can follow along with this tutorial using the accompanying notebook.
Before we dive into the code, make sure to set up your environment. This involves installing the necessary packages: `pymongo`, `fireworks-ai`, and `openai`.
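A minimal install command (run it in your terminal, or prefix with `!` in a notebook):

```bash
pip install pymongo fireworks-ai openai
```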
Note: We use the OpenAI Python SDK because the Fireworks API is compatible with it.
To interact with Fireworks AI and your MongoDB Atlas cluster, we need to initialize their respective clients. Replace "FIREWORKS_API_KEY" and "MONGODB_URI" with your actual credentials.
You can create a MongoDB Atlas cluster and copy its MongoDB URI by following the steps below.
After creating your account at Fireworks.ai, you can find the Fireworks API key under Account Settings -> API Keys.
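Here's a minimal initialization sketch. The `sample_mflix` database and `movies` collection are assumptions based on MongoDB Atlas' sample movie dataset; point these at your own data if it lives elsewhere:

```python
import openai
import pymongo

# Replace with your actual credentials.
FIREWORKS_API_KEY = "FIREWORKS_API_KEY"
MONGODB_URI = "MONGODB_URI"

# The OpenAI client works against Fireworks' OpenAI-compatible endpoint.
fw_client = openai.OpenAI(
    api_key=FIREWORKS_API_KEY,
    base_url="https://api.fireworks.ai/inference/v1",
)

mongo_client = pymongo.MongoClient(MONGODB_URI)
db = mongo_client["sample_mflix"]  # assumption: Atlas sample movie dataset
collection = db["movies"]
```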
Fireworks serves many state-of-the-art embedding models; here is the full list of models Fireworks supports.
In this blog, we use the Nomic AI model, specifically the [nomic-ai/nomic-embed-text-v1.5](https://huggingface.co/nomic-ai/nomic-embed-text-v1.5) variant, to generate embeddings from the document corpus. The function `generate_embeddings` below takes a list of texts and returns their embeddings.
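Here's a sketch of what `generate_embeddings` might look like against the OpenAI-compatible client initialized above; the character-based truncation is a rough, illustrative stand-in for proper token counting:

```python
def generate_embeddings(
    texts: list[str],
    model: str = "nomic-ai/nomic-embed-text-v1.5",
) -> list[list[float]]:
    """Embed a batch of texts with a Fireworks-hosted embedding model."""
    # Crudely truncate very long inputs so they stay within the model's
    # context window (assumption: a few thousand characters is safe).
    inputs = [text[:8000] for text in texts]
    response = fw_client.embeddings.create(input=inputs, model=model)
    return [item.embedding for item in response.data]
```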
We will be adding more OSS embedding models as the space evolves; please check the fireworks.ai website for the most up-to-date list of embedding models.
Now, let's process our movie data through the `generate_embeddings` function created above.
We'll extract key information from our MongoDB collection and generate embeddings for each movie. Set `NUM_DOC_LIMIT` to cap the number of documents processed.
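A sketch of the indexing loop. The `title` and `plot` fields come from the sample movie dataset; the `NUM_DOC_LIMIT` value, batch size, and the `embedding` field name are illustrative assumptions:

```python
NUM_DOC_LIMIT = 250  # cap on how many documents we embed in this demo
BATCH_SIZE = 32      # embed in small batches to keep request sizes reasonable

movies = list(collection.find({"plot": {"$exists": True}}).limit(NUM_DOC_LIMIT))

for start in range(0, len(movies), BATCH_SIZE):
    batch = movies[start : start + BATCH_SIZE]
    texts = [f"Title: {doc['title']}\nPlot: {doc['plot']}" for doc in batch]
    embeddings = generate_embeddings(texts)
    for doc, embedding in zip(batch, embeddings):
        # Store the vector alongside the document for Atlas Vector Search.
        collection.update_one(
            {"_id": doc["_id"]},
            {"$set": {"embedding": embedding}},
        )
```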
For our system to efficiently search through movie embeddings, we need to set up a vector search index in MongoDB Atlas. Define the index structure as shown:
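Here's a sketch of the index definition, shown as a Python dict. The 768 dimensions match nomic-embed-text-v1.5's default output size; the index name `movie_index` is an assumption carried through the rest of the examples. You can create the index in the Atlas UI (Search tab) or programmatically:

```python
index_definition = {
    "mappings": {
        "dynamic": True,
        "fields": {
            "embedding": {
                "type": "knnVector",       # Atlas Search vector field type
                "dimensions": 768,         # nomic-embed-text-v1.5 output size
                "similarity": "dotProduct",
            }
        }
    }
}

# Programmatic creation requires PyMongo 4.5+ and suitable Atlas permissions.
collection.create_search_index(
    {"name": "movie_index", "definition": index_definition}
)
```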
Let's test our recommender system. We'll create a query for superhero movies and exclude Spider-Man movies, per the user's preference.
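A query sketch using the `$vectorSearch` aggregation stage. The index name, the `numCandidates`/`limit` values, and the client-side title filter are assumptions; you could equally leave the exclusion to the LLM in the next step:

```python
query = "I like superhero movies, but please don't recommend Spider-Man movies."
[query_embedding] = generate_embeddings([query])

pipeline = [
    {
        "$vectorSearch": {
            "index": "movie_index",      # must match the search index name
            "path": "embedding",
            "queryVector": query_embedding,
            "numCandidates": 100,        # breadth of the approximate search
            "limit": 10,                 # number of results returned
        }
    },
    {"$project": {"_id": 0, "title": 1, "plot": 1}},
]

candidates = [
    doc
    for doc in collection.aggregate(pipeline)
    if "Spider-Man" not in doc["title"]  # simple client-side exclusion
]
for movie in candidates:
    print(movie["title"])
```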
Finally, we use Fireworks' chat API to generate a personalized movie recommendation based on the user's query and preferences.
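A sketch of the generation step. The Mixtral model ID is just one of the chat models Fireworks serves, and the prompt wording is illustrative:

```python
context = "\n".join(
    f"- {movie['title']}: {movie.get('plot', 'No plot available.')}"
    for movie in candidates
)

response = fw_client.chat.completions.create(
    model="accounts/fireworks/models/mixtral-8x7b-instruct",
    messages=[
        {
            "role": "system",
            "content": (
                "You are a movie recommender. Recommend only movies from the "
                "candidate list provided, and briefly explain each pick."
            ),
        },
        {"role": "user", "content": f"{query}\n\nCandidate movies:\n{context}"},
    ],
)
print(response.choices[0].message.content)
```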
We successfully built a RAG-based movie recommendation system using Fireworks, MongoDB Atlas, and the nomic-ai embedding model.
While this tutorial focused on building a basic RAG pipeline, we have guides on building optimized RAG architectures that can be further customized and scaled to suit various needs, with tips on reducing cost, improving throughput, batching, and function calling.