In this article, we guide you to kickstart your AI exploration, paving the way for more advanced, tailored applications with Fieldbox’s expertise.
Almost every company relies on a wide range of documents to operate effectively. These documents capture critical knowledge, best practices, rules, and guidelines. They also log day-to-day operations: reports, incidents, and other live data. However, sorting through this flood of information to find relevant data can be overwhelming. While AI promises a solution, you may be wondering where to start.
Large Language Models (LLMs) such as the one powering ChatGPT can be combined with Information Retrieval (IR) approaches to create a Retrieval Augmented Generation (RAG) chatbot. To minimize risk and quickly assess whether an LLM is the right fit, you can leverage a cloud platform like Azure to explore its viability.
This article shows how to set up a RAG chatbot on Azure within days, discusses the questions it raises, and presents possibilities for further exploration. This solution on Azure uses a mix of proprietary Azure services, open-source tools (for the front-end, for instance), and Software as a Service (an API is used to query GPT-4).
Ultimately, building a functional RAG chatbot on Azure is both straightforward and quick. This solution not only helps you better understand user needs but also opens the door to exploring alternative approaches and more advanced implementations.
Understanding IR, LLM, and RAG
The emergence of Large Language Models (LLMs) promises easy interaction with both general information and the knowledge contained within proprietary documentation. In particular, LLMs exhibit powerful semantic understanding and generation capabilities.
An information retrieval (IR) system is designed to fetch relevant information from a large dataset or database in response to a user’s query. IR has been an industry topic for many years.
The IR process typically involves four key components:
- The Query Rewriter is responsible for refining and optimizing the user’s original query to improve the retrieval process.
- The Retriever is the core component that searches the database or corpus to find documents or pieces of information that match the query.
- The Re-Ranker takes the set of documents or information pieces retrieved by the Retriever and ranks them according to their relevance to the query.
- The Answer Writer generates a response based on the top-ranked documents.
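To make these four roles concrete, here is a minimal Python sketch of such a pipeline. The function names, the keyword-overlap scoring, and the Document structure are illustrative stand-ins, not the API of any particular library.

```python
from dataclasses import dataclass

@dataclass
class Document:
    doc_id: str
    text: str
    score: float = 0.0

def rewrite_query(query: str) -> str:
    """Query Rewriter: refine the user's query (real systems also expand synonyms, fix typos, etc.)."""
    return query.strip().lower()

def retrieve(query: str, corpus: list[Document], k: int = 20) -> list[Document]:
    """Retriever: naive keyword matching, standing in for a real search index."""
    terms = set(query.split())
    return [d for d in corpus if terms & set(d.text.lower().split())][:k]

def rerank(query: str, docs: list[Document]) -> list[Document]:
    """Re-Ranker: order candidates by relevance (here, simple term overlap)."""
    terms = set(query.split())
    for d in docs:
        d.score = len(terms & set(d.text.lower().split()))
    return sorted(docs, key=lambda d: d.score, reverse=True)

def write_answer(query: str, docs: list[Document]) -> str:
    """Answer Writer: compose a response from the top-ranked documents."""
    return docs[0].text if docs else "No relevant document found."

def answer(query: str, corpus: list[Document]) -> str:
    q = rewrite_query(query)
    return write_answer(q, rerank(q, retrieve(q, corpus)))
```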
RAG (Retrieval Augmented Generation) is a recently popularized mechanism that boosts the accuracy of LLM responses by integrating them with your organization’s data and knowledge base.
Essentially, the core steps of RAG are as follows: create an embedding of the query, compare this embedding to the document index and select the top k documents, aggregate these k documents and the query into a “templated prompt”, and provide this prompt to an LLM that generates the final answer.
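The following is a minimal sketch of this loop, assuming hypothetical embed() and call_llm() helpers that stand in for a real embedding model and LLM client; the NumPy similarity computation and the prompt assembly are the concrete parts.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in: plug in a real embedding model here."""
    raise NotImplementedError

def call_llm(prompt: str) -> str:
    """Stand-in: plug in a real LLM client here."""
    raise NotImplementedError

PROMPT_TEMPLATE = """Answer the question using only the sources below.
Sources:
{sources}

Question: {question}
Answer:"""

def rag_answer(question: str, chunks: list[str],
               chunk_embeddings: np.ndarray, k: int = 3) -> str:
    # 1. Create an embedding of the query.
    q = embed(question)
    # 2. Compare it to the document index (cosine similarity) and keep the top k.
    sims = chunk_embeddings @ q / (
        np.linalg.norm(chunk_embeddings, axis=1) * np.linalg.norm(q))
    top_k = np.argsort(sims)[::-1][:k]
    # 3. Aggregate the k documents and the query into the templated prompt.
    prompt = PROMPT_TEMPLATE.format(
        sources="\n".join(chunks[i] for i in top_k), question=question)
    # 4. Let the LLM generate the final answer.
    return call_llm(prompt)
```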
RAG enriches IR with LLM capabilities, integrating the principles of both retrieval-based and generation-based models to improve the quality of information retrieval and response generation. It follows the same four-step structure as a typical information retrieval system (Query Rewriter, Retriever, Re-Ranker, Answer Writer), with each step augmented by an LLM.
In the end, RAG LLMs provide more accurate and relevant responses, tailored to specific information.
Leveraging Azure to kickstart a RAG LLM chatbot
Azure subscriptions provide a comprehensive suite of services that can be picked and combined to set up a RAG chatbot quickly and easily.
For this write-up, we decided to use the code provided by Microsoft Azure in the azure-search-openai-demo GitHub repository.
This simple and effective chatbot interacts with specific documents, using Azure AI Search and Azure OpenAI Service.
Its main objective is to help the user find information in a large textual corpus composed of multiple documents.
The chatbot has a “conversation” mode, which uses the conversation history and the user’s last question to generate a search query. It then retrieves relevant documents from the knowledge base using Azure AI Search and generates a response based on the conversation history, the user’s original question, and the retrieved documents. The chatbot can optionally suggest follow-up questions based on the conversation history, and it shows its thought process (the prompt it used) for each answer. Most importantly, it provides a link to the source documents it used to generate its answers.
The frontend is built using React and the Fluent UI framework. It provides a user interface for interacting with the chatbot.
The backend, built using Python and the Quart framework, handles user requests, communicates with Azure AI Search and OpenAI Service, and manages the knowledge base.
Azure AI Search, a cloud-based search service by Microsoft Azure, offers comprehensive tools for building advanced search solutions. In particular, it provides the essential components for implementing production-ready Retrieval-Augmented Generation (RAG) systems, including efficient document indexing and powerful query processing capabilities.
Azure OpenAI Service provides access to the OpenAI LLM of your choice, for understanding and generating natural language.
Other Azure services can be used as well, such as Azure Blob Storage (a storage service used for documents and images) or Azure Active Directory (an authentication and authorization service).
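To illustrate how these services fit together, here is a condensed, hypothetical sketch of the retrieve-then-generate loop using the azure-search-documents and openai Python packages. The endpoints, keys, index name, “content” field, and deployment name are placeholders to adapt to your own resources; the demo application’s actual code is more elaborate.

```python
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from openai import AzureOpenAI

# Placeholder endpoints, keys, and names: adapt them to your own resources.
search_client = SearchClient(
    endpoint="https://<your-search-service>.search.windows.net",
    index_name="<your-index>",
    credential=AzureKeyCredential("<search-key>"))

openai_client = AzureOpenAI(
    azure_endpoint="https://<your-openai-resource>.openai.azure.com",
    api_key="<openai-key>",
    api_version="2024-02-01")

def chat_answer(question: str) -> str:
    # Retrieve the most relevant chunks from the Azure AI Search index
    # (assumes the index stores the chunk text in a "content" field).
    hits = search_client.search(search_text=question, top=3)
    sources = "\n".join(hit["content"] for hit in hits)
    # Ask the deployed GPT model to answer from those sources only.
    response = openai_client.chat.completions.create(
        model="<your-gpt-deployment>",
        messages=[
            {"role": "system",
             "content": "Answer using only the provided sources."},
            {"role": "user",
             "content": f"Sources:\n{sources}\n\nQuestion: {question}"}])
    return response.choices[0].message.content
```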
Building a RAG LLM step by step
The documentation that comes with the GitHub repository is comprehensive; here is a summary of its main steps, along with a couple of twists that can be applied:
- set up the Azure account permissions
- set up a local environment and deploy the application using the Azure Developer CLI (azd)
- create the embedding database from the proprietary pages. Various options are possible at this stage. We chose PDF as the base format for the documents, as it is a widespread format for internal information and makes it easy for the chatbot to cite the source of its answers. We also used a local PDF parser rather than the Azure native tool, in order to have more control over the chunking and embedding steps of the process (a sketch of this step follows the list).
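As an illustration of that last step, here is a minimal sketch of local PDF parsing and chunking using the pypdf package; the fixed chunk size, the overlap, and the dictionary layout are illustrative choices, not the demo repository’s exact implementation.

```python
from pypdf import PdfReader

def pdf_to_chunks(path: str, chunk_size: int = 1000, overlap: int = 100) -> list[dict]:
    """Extract text page by page and split it into overlapping fixed-size
    chunks, keeping the page number so the chatbot can cite its sources."""
    chunks = []
    for page_num, page in enumerate(PdfReader(path).pages, start=1):
        text = page.extract_text() or ""
        start = 0
        while start < len(text):
            chunks.append({
                "source": f"{path}#page={page_num}",
                "content": text[start:start + chunk_size],
            })
            start += chunk_size - overlap
    return chunks
```

Each chunk is then embedded and pushed to the search index.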
For our experiment, we used an extract of Fieldbox’s internal documentation hosted in our “Science and Technology” workspace.
At this stage, the application is ready to use on custom data!
As an illustration, here are a few examples of queries that we would like to use our chatbot for.
- I need to start a computer vision project from scratch, what should I know?
- What is a foundation model?
- What is the URL of the Fieldbox MLflow server?
Here is the home page of the chatbot:

[Screenshot: the chatbot’s home page]
Here is the answer to a genuine question: “Can you list open audio datasets?”. Note that the answer refers to internal documentation only, and correctly links to the relevant page of the documentation.

[Screenshot: the chatbot’s answer, with a link to the source document]
How to evaluate the quality of the answers
At this stage, the key challenge lies in ensuring that the system composed of the LLM and RAG retrieves and generates the most accurate and relevant answers.
We built an evaluation benchmark with a direct and simple approach: manually create a list of test questions with “ground truth” answers, and have a human evaluate the chatbot’s results.
We decided to split our questions into several categories:
- Basic Retrieval: when the information is contained in a single, easily identifiable paragraph.
- Conditional Retrieval: similar to basic retrieval, but with a filtering criterion.
- Aggregation: requires aggregating and summarizing a long text, or multiple paragraphs from several sources.
- Semantic Retrieval: requires a high semantic understanding of the question.
And for metrics, we chose to evaluate the answers on three axes.
- First, the “retrieval score”: did the chatbot retrieve the relevant document from the knowledge base?
- Second, the “final answer score”: is the formulation of the answer satisfactory?
- Third, the “hallucination score”: was there any hallucination in the answer?
Those three indicators were simply evaluated by a human on a three-point scale: “good”, “average”, or “bad”.
On our simple test of about fifteen questions, more than two thirds had a satisfactory retrieval score, almost two thirds had a satisfactory answer score, and only one question produced a hallucination in its answer. These metrics are encouraging given the moderate and direct setup effort.
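For reference, here is a hypothetical sketch of how such a hand-scored benchmark can be structured and summarized in Python; the example entry and the field names are illustrative.

```python
from collections import Counter

# Each test case pairs a question with its hand-written ground-truth answer;
# the three scores are filled in by a human reviewer ("good"/"average"/"bad").
benchmark = [
    {"category": "Basic Retrieval",
     "question": "What is the URL of the Fieldbox MLflow server?",
     "ground_truth": "...",
     "retrieval": "good", "final_answer": "good", "hallucination": "good"},
    # ... about fifteen questions spread across the four categories
]

def summarize(cases: list[dict], axis: str) -> dict:
    """Count how many cases got each grade on the given axis."""
    counts = Counter(case[axis] for case in cases)
    return {grade: counts[grade] for grade in ("good", "average", "bad")}

for axis in ("retrieval", "final_answer", "hallucination"):
    print(axis, summarize(benchmark, axis))
```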
There are more elaborate ways to assess the performance of RAG systems; for instance, the Azure team provides a sample code repository called ai-rag-chat-evaluator to evaluate the demo app with pre-built metrics.
Enhancing the performance of your RAG chatbot
There are many ways to enhance the chatbot.
Some are related to the core RAG steps:
- Improving the knowledge database / index design – The current index creation pipeline uses a rather simple split of the documents into fixed-size chunks. Correctly designing the content of the index is a key enhancement.
- Improving the document search – The way the documents are retrieved corresponds to a simple vector-similarity search. It can be improved by adding additional descriptions to each chunk (e.g., keywords), or by using more advanced Azure services like cognitive search.
- Creating a better prompt – Rather than simply aggregating the documents that were retrieved to generate the prompt, some processing can be performed, such as re-ranking and/or filtering.
- Enhancing the “standard” RAG pipeline by building on the initial query – It is possible to add a first step right after the user query is submitted, using an initial LLM call to rephrase it. The vector database is then queried with both the initial query and the alternate query generated by the LLM, and the top-k results of the two queries are merged to generate the final answer (see the sketch after this list).
- Query routing – Another option is to split the knowledge database into several distinct databases, each holding domain-specific knowledge. The initial query is then examined to determine which kind of knowledge is needed, so that only the corresponding vector database is queried.
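As a sketch of the query-rephrasing enhancement described above, the snippet below assumes hypothetical call_llm() and vector_search() helpers (the latter returning chunk dictionaries with “id” and “content” keys):

```python
def multi_query_answer(question: str, k: int = 3) -> str:
    """Query the vector store with both the original question and an
    LLM-generated rephrasing, then answer from the merged top-k results."""
    # First LLM call: produce an alternate formulation of the query.
    alternate = call_llm(f"Rephrase this question in different words: {question}")
    # Query the vector database with both formulations.
    hits = vector_search(question, k) + vector_search(alternate, k)
    # Merge the two result sets, dropping duplicates but preserving rank order.
    seen, merged = set(), []
    for chunk in hits:
        if chunk["id"] not in seen:
            seen.add(chunk["id"])
            merged.append(chunk)
    # Final LLM call: generate the answer from the merged context.
    context = "\n".join(chunk["content"] for chunk in merged)
    return call_llm(f"Sources:\n{context}\n\nQuestion: {question}")
```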
Other enhancements will come from giving users an easy way to provide feedback, and from logging every call made to the chatbot, creating an archive of queries and answers.
Some recent techniques such as GraphRAG could also be explored. GraphRAG is an advanced variant of RAG that builds knowledge graphs from your data to provide more contextually rich answers, which is particularly useful in complex domains involving interconnected data. However, this comes with additional processing and computing costs that need to be assessed beforehand (typically a factor of 10, as commonly reported as of August 2024), and it can be overkill for the simpler queries that RAG already addresses directly.
In addition to Azure services, some open-source frameworks are gaining popularity and traction, and they are worth considering as complements. LlamaIndex and LangChain in particular are emerging as well-maintained, widely used open-source frameworks with a permissive, non-contaminating license (MIT).
Start your RAG LLM journey now
In today’s data-heavy world, rapidly accessing the right information is critical to staying competitive. With the RAG chatbot setup on Azure, you gain a fast, efficient way to tap into your knowledge base and surface the most relevant insights, all with the power of LLMs.
This Azure-based solution not only allows you to assess whether RAG is suitable for your needs but also opens the door to broader exploration.
Once your chatbot is operational, you can explore more advanced applications like tailored conversational agents for industry-specific data or even full-fledged business applications powered by LLMs.
Fieldbox is here to support you throughout this journey—from rapidly setting up a prototype to scaling it into a sophisticated tool that matches your specific use cases.
Want to explore real-world use cases where Fieldbox is transforming businesses with generative AI? Download Fieldbox’s white paper on Generative AI to discover how these technologies can transform your business and see how companies are already achieving industrial excellence through AI-driven innovation.
At Fieldbox, we are dedicated to guiding you through this RAG LLM journey, helping you navigate the complexities of AI, and ensuring a seamless integration of these powerful tools into your business.
Contact us at enquiry@fieldbox.ai to shape your AI journey together.
