Building a Text-based Generative MVP Using OpenAI, ChatGPT and RAG

In this article by Surjeet Singh, Software Engineer I at GeekyAnts, learn to build smart chatbots with RAG using LangChain, Streamlit, and LangSmith.


Link to GitHub repo: https://github.com/surjeet176/ragBlog

In today’s digital era, communication has evolved significantly. What once required human intervention for every query is now handled efficiently by smart chatbots. This shift has enabled many companies to reduce their support workforce and reallocate resources more effectively. Some industry forecasts anticipate that by 2025, over 80% of customer queries will be addressed by intelligent AI chatbots.

In this blog, we will explore the inner workings of these chatbots through a small MVP (Minimum Viable Product). We will provide a step-by-step guide on implementing Retrieval-Augmented Generation (RAG) systems using LangChain and Streamlit. Additionally, we will leverage LangSmith to test and debug our application. By the end of this tutorial, you will have a functional chatbot that demonstrates the potential of RAG technology and its impact on the market.

Understanding RAG (Retrieval-Augmented Generation)

RAG combines retrieval mechanisms and generative models to produce high-quality, contextually relevant text. Unlike standalone models such as OpenAI’s ChatGPT or Google’s Gemini, a RAG system first retrieves relevant documents from a vector database and injects them as additional context into the final prompt, enabling more accurate responses.

Key components of RAG:

  • Retrieval Mechanism: It identifies and retrieves relevant documents or passages from a knowledge base using techniques such as keyword matching, semantic similarity, or machine learning-based retrieval. These documents are then passed as context to the generative model.

  • Generative Model: It generates text based on the input prompt and the context retrieved from the vector database, producing more accurate and informative responses.

With this understanding of RAG, we can proceed to create our chatbot using this technology.

Building Our Chatbot with RAG: Putting Theory into Practice

Before we start developing our own chatbot, let’s look at all the steps involved in the process:

Step 1 : Basic Flow of Chatbot

Step 2 : Setting up the Development Environment and Installing Requirements

Step 3 : Setting Up Necessary Environment Variables

Step 4 : Creating Basic RAG System

Step 5 : Creating Streamlit Chatbot Web App

Step 6 : Integrating LangSmith into Our Application for Testing and Monitoring the Chain

Step 1 : Basic Flow of Chatbot

The chatbot operates through a structured flow that combines retrieval-augmented generation (RAG) with OpenAI’s GPT-3.5-turbo LLM to produce contextually relevant responses. Here is how it works:

  1. 📝 User submits a question.

  2. 🔍 Chatbot retrieves relevant documents or passages from the knowledge base.

  3. 📚 Retrieved context is integrated into the input prompt for GPT-3.5-turbo LLM.

  4. 🤖 The LLM then processes the augmented input and generates a response.

  5. 💬 Chatbot delivers the response to the user.
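
In code terms, the whole loop reduces to just a few lines. Here is a minimal sketch of the flow above (the function names are illustrative only, not the actual code we build later):

# A minimal sketch of the RAG flow; names here are illustrative only
def answer(question, retriever, llm):
    context_docs = retriever.invoke(question)  # Step 2: retrieve relevant documents
    augmented_prompt = (
        f"[Question]: {question}\n[Context]: {context_docs}"  # Step 3: augment the prompt
    )
    return llm.invoke(augmented_prompt)  # Steps 4-5: generate and deliver the response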

Step 2 : Setting Up the Development Environment

We will be using Python 3.11 for the complete project. The steps are as follows:

  1. Create a virtual environment for Python.

  2. Activate the environment.

  3. Install all the required dependencies.

python3.11 -m venv myenv
source myenv/bin/activate
pip install langchain langchain_openai langchain_core langchain_community langchain_chroma python-dotenv

Step 3 : Setting Up Necessary Environment Variables

Since we will be using OpenAI’s GPT-3.5-turbo model, we only need to set the OpenAI API key as an environment variable.

  • Create a .env file and add this variable for now.
OPENAI_API_KEY="your-api-key"
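
To confirm the key is actually picked up, here is a quick sanity check using python-dotenv (the print statement is just for verification; remove it afterwards):

import os
from dotenv import load_dotenv

load_dotenv()
# Should print True if the .env file was found and the key is set
print(os.getenv("OPENAI_API_KEY") is not None)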

Step 4 : Creating a Basic RAG System

With the basic setup complete, we can start building the RAG system using LangChain.

We will be using LangChain as the framework.

  1. Create Files:

    • app.py in the root folder for all code.

    • faq.txt in the root folder for FAQs. This file contains all the questions and answers and will act as the knowledge base. (A sample is available in the GitHub repo linked at the beginning of the article.)

  2. Set Up Environment:

    • Import requirements and load environment variables.

    • Initialize OpenAI's text embeddings model to create vector embeddings of documents for storage in the vector-store database.

from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_core.runnables import (
    RunnableBinding,
    RunnableLambda, 
    RunnableAssign
)
from langchain_community.document_loaders import TextLoader
from langchain_chroma import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter
from dotenv import load_dotenv
from langchain_core.prompts import ChatPromptTemplate


load_dotenv()
embeddings = OpenAIEmbeddings(
    model='text-embedding-3-large'
)
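
As an optional sanity check (this makes a billable API call), you can embed a sample string and inspect the vector size; text-embedding-3-large returns 3072-dimensional vectors by default:

# Embed a sample query and check the dimensionality of the resulting vector
vector = embeddings.embed_query("How do I reset my ATM PIN?")
print(len(vector))  # 3072 for text-embedding-3-large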

  3. Process Documents:

    • Load faq.txt as docs.

    • Use LangChain's Text Splitter to split docs.

    • Create a db instance with Chroma for retrieval.

loader = TextLoader("faq.txt")
documents = loader.load()

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=600, 
    chunk_overlap=100
)

docs = text_splitter.split_documents(documents)
db = Chroma.from_documents(
    docs, embeddings
)

retriever = db.as_retriever()
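
Before wiring the retriever into a chain, you can query it directly to confirm that sensible documents come back (the question text here is just an example):

# Fetch the chunks most similar to a sample question and preview them
relevant_docs = retriever.invoke("How will I receive my ATM PIN?")
for doc in relevant_docs:
    print(doc.page_content[:100])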

  4. Create Prompt:

    • Define a prompt template for GPT-3.5-turbo with placeholders for {question} and {context}.

    • Define the system and human prompts accordingly.

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            '''You are a helpful chatbot that takes a question and context.
                You will only answer based on the context. Give the answer directly, like a conversation; do not mention the context.
                If you don't know the answer, say that you don't know. Do not make up an answer.
            ''',
        ),
        (
            "human", 
            "[Question] : {question},\n[Context] : {context}"
        ),
    ]
)
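
You can preview the filled-in template by invoking the prompt on its own with dummy values (the context string below is a placeholder, not real FAQ content):

# Render the template with sample values to see the exact messages sent to the LLM
preview = prompt.invoke({
    "question": "How will I receive my ATM PIN?",
    "context": "Placeholder context retrieved from the FAQ.",
})
print(preview.to_messages())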

  5. Build Final Chain:

    • Use LCEL (LangChain Expression Language) to compose multiple chains.

    • First format the input, then pass it to the retriever to get the related docs.

# Formats the input question coming from the user and converts it into plain text
def format_input(inputs):
    return f"Question: {inputs['question']}"

# Initializes an instance of the OpenAI chat model
open_ai_gpt = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0)

# Creates a Runnable chain that retrieves the documents and assigns them to the 'context' variable;
# this context, along with the question, is passed to the next chain
prompt_chain = RunnableBinding(
    bound=RunnableAssign(
        mapper={
            "context" : RunnableLambda(format_input) | retriever
        }
    )
)

# Final chain composed of all the chains. They run one by one in order, and the output of one chain is fed to the next as input.
faq_chain = prompt_chain | prompt | open_ai_gpt

  6. Run the Chain:

    • Execute the chain to get an AI-generated response for a query. Since the chain's first step (RunnableAssign) expects a dict, pass the question under the 'question' key.

output = faq_chain.invoke({"question": "How will I receive my ATM PIN?"})
print(output.content)
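
Optionally, you can append LangChain's StrOutputParser to the end of the chain so it returns a plain string instead of a message object, removing the need for .content:

from langchain_core.output_parsers import StrOutputParser

# With an output parser at the end, the chain returns a plain string
faq_chain = prompt_chain | prompt | open_ai_gpt | StrOutputParser()
output = faq_chain.invoke({"question": "How will I receive my ATM PIN?"})
print(output)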

Step 5 : Creating Streamlit Chatbot Web App

We have created our RAG chatbot that processes user queries, searches the retrieval database, and generates natural language responses using OpenAI. Now, we will create a simple Streamlit chatbot for user interaction.

Steps :

  1. Install Streamlit:
pip install streamlit

  2. Wrap Logic in a Function: Modify app.py to wrap the logic in a function that returns the RAG chain (faq_chain):
def get_faq_chain():
    embeddings = OpenAIEmbeddings(
        model='text-embedding-3-large'
    )

    loader = TextLoader("faq.txt")
    documents = loader.load()

    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=600, 
        chunk_overlap=100
    )
    docs = text_splitter.split_documents(documents)
    db = Chroma.from_documents(
        docs, embeddings
    )
    retriever = db.as_retriever()
    prompt = ChatPromptTemplate.from_messages(
        [
            (
                "system",
                '''You are a helpful chatbot that takes a question and context.
                    You will only answer based on the context. Give the answer directly, like a conversation; do not mention the context.
                    If you don't know the answer, say that you don't know. Do not make up an answer.
                ''',
            ),
            (
                "human", 
                "[Question] : {question},\n[Context] : {context}"
            ),
        ]
    )
    open_ai_gpt = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0)
    prompt_chain = RunnableBinding(
        bound=RunnableAssign(
            mapper={
                "context" : RunnableLambda(format_input) | retriever
            }
        )
    )
    faq_chain = prompt_chain | prompt | open_ai_gpt
    return faq_chain
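
Note that get_faq_chain re-embeds faq.txt every time it is called. We store the chain in session state below, so this happens once per browser session; if you would rather build it once and share it across all sessions, Streamlit's st.cache_resource is one option (a sketch, using a hypothetical wrapper name):

import streamlit as st
from app import get_faq_chain

@st.cache_resource  # Builds the chain once and reuses it across all sessions
def get_shared_faq_chain():
    return get_faq_chain()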

  3. Create Streamlit App: Create a new file streamlit_app.py for the Streamlit app.
import streamlit as st
from dotenv import load_dotenv
from app import get_faq_chain

load_dotenv() 

st.title("FAQ chat")

if "messages" not in st.session_state:
    st.session_state.messages = []

# Get the faq_chain and store it in the session
if "model" not in st.session_state:
    st.session_state["model"] = get_faq_chain()

for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

if prompt := st.chat_input("Enter your query"):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)

    with st.chat_message("assistant"):
            # invoking the model with question received from the user
        response = st.session_state.model.invoke({
            "question" : f"{prompt}",
        })
        st.markdown(response.content)
    # Add assistant response to chat history
    st.session_state.messages.append({"role": "assistant", "content": response.content})

  4. Run the Streamlit app:
streamlit run streamlit_app.py

The UI will allow users to interact with the chatbot seamlessly.

Step 6 : Integrating LangSmith

Integrating LangSmith into our RAG system takes our development process to the next level. With LangSmith, we gain a powerful tool for debugging, testing, and monitoring our RAG pipeline through an interactive dashboard. This allows us to identify weaknesses, optimize performance, and ensure our chatbot delivers top-notch results in real-world scenarios.

Integration Steps:

  1. Sign up for LangSmith at https://smith.langchain.com/ and create an API key for access.


  2. Install LangSmith:
pip install langsmith

  3. Configure Environment: Add the LangChain API key and environment variables to the .env file, including the project name and tracing settings.
LANGCHAIN_ENDPOINT="https://api.smith.langchain.com"
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY="langchain-api-key"
LANGCHAIN_PROJECT="your-project-name"

  4. View Chain Calls: Visit the LangSmith dashboard to view all chain calls under the specified project. Monitor inputs, outputs, processing time, and token consumption for each chain (see the optional snippet below for tagging runs).
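
Once tracing is enabled, every chain invocation is traced automatically; you can also attach a run name or tags to make traces easier to filter in the dashboard (optional; the names used here are just examples):

# Attach a run name and tags so this call is easy to find in LangSmith
output = faq_chain.invoke(
    {"question": "How will I receive my ATM PIN?"},
    config={"run_name": "faq-query", "tags": ["streamlit", "faq"]},
)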

Hurray!! 🎉

We have achieved a major milestone: our chatbot now leverages Retrieval-Augmented Generation (RAG) to produce the most relevant output. With this powerful capability, our chatbot seamlessly processes user queries, searches a retrieval database for context, and generates accurate, natural language responses using OpenAI.

Try Out These Steps

To take our chatbot to the next level, we can:

  • Fine-tune Prompts: Enhance accuracy and contextual relevance by tweaking prompts.

  • Implement Enhancements: Add features and improvements to boost reliability and versatility (one example follows below).
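
For example, one small enhancement is controlling how many chunks the retriever pulls into the context (the values below are assumptions to experiment with, not tuned recommendations):

# Retrieve more (or more diverse) chunks per question
retriever = db.as_retriever(
    search_type="similarity",  # or "mmr" for more diverse results
    search_kwargs={"k": 6},    # number of chunks passed as context
)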

By implementing these enhancements, our chatbot will become an even more effective tool, providing users with accurate and helpful information.