What is RAG?

Before we define RAG, let us look at a different scenario and relate it back to RAG. Say you have watched a lot of films. I tell you that I watched an interesting film a long time ago but cannot recall its title. You might ask whether I remember a few scenes, so that you can help me with the title. I start describing it: a ship hits an iceberg and people drown. You would immediately guess the title: "Titanic".

Even though you have watched a lot of films, you guessed the title correctly from a single scene.

This is the idea behind RAG, or Retrieval-Augmented Generation. A RAG system is given a collection of documents, and when you ask it a question, it retrieves the relevant pieces from those sources and answers from them, which reduces hallucination.

How does RAG work?

Let us relate it to the movie example. You have watched a lot of films, I described only one scene, and you still guessed the title right. I did not have to narrate the complete movie to you.

In RAG, we divide documents into chunks, pass the chunks through an embedding model, and store the resulting vectors in a vector database.

When a user asks a question, the RAG system converts the question into a vector, searches the database for the closest vectors, and retrieves the matching chunks (Retrieval). It passes them to the LLM as context (Augmentation), and the LLM generates the answer (Generation).
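The retrieve, augment, generate flow described above can be sketched in plain Python. This is only a toy illustration under loud assumptions: word overlap stands in for vector similarity, `fake_llm` is a hypothetical placeholder for a real model, and the chunk texts are made up for the example.

```python
import re

# Hypothetical chunks, standing in for pieces of ingested documents.
CHUNKS = [
    "A ship hits an iceberg and sinks in the Atlantic.",
    "A scientist turns into a green giant when angry.",
    "Humans travel to a distant moon with blue inhabitants.",
]


def words(text):
    """Lowercase a text and return its set of words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question, chunks, k=1):
    """Retrieval: rank chunks by how many words they share with the question.

    Word overlap is only a stand-in for real vector similarity.
    """
    q = words(question)
    ranked = sorted(chunks, key=lambda c: len(q & words(c)), reverse=True)
    return ranked[:k]


def augment(question, context_chunks):
    """Augmentation: place the retrieved chunks into the prompt as context."""
    context = "\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion: {question}"


def fake_llm(prompt):
    """Generation: hypothetical placeholder for a real LLM call."""
    return f"(model answer based on a prompt of {len(prompt)} characters)"


question = "In which film does a ship hit an iceberg?"
top = retrieve(question, CHUNKS)
prompt = augment(question, top)
print(prompt)
print(fake_llm(prompt))
```

The point of the sketch is the separation of the three stages: the LLM never sees the whole collection, only the few chunks that retrieval selected.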

Let us keep the movie example as a constant reference to keep things simple. You can think of a movie as a set of scenes: the introduction scene, interval scene, highlight scenes, climax scene, and comedy scenes.

When I describe even one scene from a movie, you know which movie I am referring to.

In the same way, say I have a lot of KT or internal documents. I cannot go through each and every document, as it is time consuming. This is where RAG comes into play.

We ingest all our documents into the RAG system.

We split our existing documents into chunks (compare them to the different scenes of a movie) and then convert each chunk into a vector, because machines work with numbers, not words.

In this context, a vector is simply a list of numbers that represents the meaning of a word, sentence, or document.

For example, say I ask "Which year did the iceberg hit?" and we have ingested movies such as Avatar, Titanic, Hulk...

The system converts the question into a vector and searches for the nearest vectors. A question about an iceberg hit would most likely match chunks from the movie Titanic.

How does vector search help? Let us understand it better with an example.

Dog → [0.82, 0.15, -0.44, 0.71]

Cat → [0.79, 0.18, -0.41, 0.68]

Car → [-0.23, 0.91, 0.55, -0.77]

Here each number could stand for some feature, such as animal, pet, or fur; it could be anything.

Now if I search for "Dog" using RAG, it will most likely return the closest vector available. If it finds a document that mentions "Cat", it treats that document as relevant and looks for context in it.

It would completely ignore "Car", as the vectors are not close.
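To make "closest vector" concrete, here is a minimal sketch that compares the three example vectors using cosine similarity. Cosine similarity is one common way to measure closeness and is an assumption here; the metric a particular vector database uses may differ. The numbers are the illustrative ones from above, not real embeddings.

```python
import math

# Example vectors from the text above (illustrative, not real embeddings).
vectors = {
    "Dog": [0.82, 0.15, -0.44, 0.71],
    "Cat": [0.79, 0.18, -0.41, 0.68],
    "Car": [-0.23, 0.91, 0.55, -0.77],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Compare the "Dog" query against every other stored vector.
query = vectors["Dog"]
scores = {
    name: cosine_similarity(query, vec)
    for name, vec in vectors.items()
    if name != "Dog"
}

for name, score in scores.items():
    print(f"Dog vs {name}: {score:.3f}")

closest = max(scores, key=scores.get)
print("Closest to Dog:", closest)
```

"Cat" scores close to 1 (nearly the same direction as "Dog"), while "Car" scores below 0, which is why a search for "Dog" would surface the "Cat" document and skip "Car".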

Implementing RAG

Let us understand RAG by implementing it. Since RAG needs a model, we will use Ollama for this project.

Download Ollama

Let us assume that we work at a company where production failures have occurred multiple times, and each time we documented the incident with its situation and resolution.

Clone the repo to your local machine - RAG-demo

git clone https://github.com/sagarkakkalasworld/RAG-demo.git

Now let us create a virtual environment

command:

python -m venv venv

source venv/bin/activate

Let us install the required packages

command:

pip install langchain

pip install langchain-community

pip install langchain-ollama

pip install langchain-chroma

pip install chromadb

pip install markdown

pip install unstructured

Here langchain is the core framework that provides retrievers, prompts, document abstractions, and the main RAG workflow.

chromadb is the vector database that stores the embeddings.

markdown converts Markdown text to HTML, and unstructured helps parse documents such as .md files when loading them.

Now pull the model that converts our data into embeddings:

command: ollama pull nomic-embed-text

Next, pull the LLM that generates answers for the user:

command: ollama pull llama3

Now create a file ingest.py

import os

from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import MarkdownHeaderTextSplitter
from langchain_ollama import OllamaEmbeddings
from langchain_chroma import Chroma

# Load every Markdown file from the incidents folder
docs = []

for file in os.listdir("./incidents"):
    if file.endswith(".md"):
        loader = TextLoader(f"./incidents/{file}", encoding="utf-8")
        docs.extend(loader.load())

headers = [
    ("#", "title"),
    ("##", "section")
]

splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=headers
)

chunks = []

for doc in docs:
    chunks.extend(
        splitter.split_text(doc.page_content)
    )

print(f"Total Chunks: {len(chunks)}")

embeddings = OllamaEmbeddings(
    model="nomic-embed-text"
)

vectordb = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./chroma_db"
)

print("Vector DB created successfully")

The code above goes into the incidents folder, reads all .md files, and splits the text into chunks by title (#) and section (##). The embedding model then converts these chunks into vectors, which are stored in ChromaDB.

Now let us run it

command

python ingest.py

Once ingestion is done, you will see a chroma_db folder created locally; this is the vector database.

We have now successfully converted our data into chunks and embedded them into the vector database.

Next, we can use this vector database and integrate it with our LLM agent.

Create a file rag.py
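The header-based chunking step can be illustrated without any library. The function below is a simplified imitation written for this post, not the implementation of MarkdownHeaderTextSplitter; `split_by_headers` and the sample incident text are hypothetical.

```python
def split_by_headers(markdown_text):
    """Split Markdown text into chunks, one per '#' or '##' section.

    Simplified imitation for illustration; not the library's implementation.
    """
    chunks = []
    title, section, lines = None, None, []

    def flush():
        # Save the text collected so far under the current title and section.
        body = "\n".join(lines).strip()
        if body:
            chunks.append({"title": title, "section": section, "text": body})
        lines.clear()

    for line in markdown_text.splitlines():
        if line.startswith("## "):
            flush()
            section = line[3:].strip()
        elif line.startswith("# "):
            flush()
            title = line[2:].strip()
            section = None
        else:
            lines.append(line)
    flush()
    return chunks


# Hypothetical incident document with a situation and a resolution.
sample = """# Database outage
## Situation
The primary database ran out of disk space.
## Resolution
Old logs were archived and disk alerts were added.
"""

chunks = split_by_headers(sample)
for chunk in chunks:
    print(chunk)
```

Each chunk keeps its title and section as metadata, so a retrieved "Resolution" chunk still carries the name of the incident it belongs to.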

from langchain_ollama import OllamaEmbeddings, OllamaLLM
from langchain_chroma import Chroma

# Minimal sketch of a query script; the rag.py in the repository may differ.
# Use the same embedding model that ingest.py used.
embeddings = OllamaEmbeddings(
    model="nomic-embed-text"
)

# Open the vector database that ingest.py created
vectordb = Chroma(
    persist_directory="./chroma_db",
    embedding_function=embeddings
)

llm = OllamaLLM(model="llama3")

question = input("Ask a question: ")

# Retrieval: fetch the chunks closest to the question (k=3 is an arbitrary choice)
results = vectordb.similarity_search(question, k=3)
context = "\n\n".join(doc.page_content for doc in results)

# Augmentation: give the retrieved chunks to the LLM as context
prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {question}"
)

# Generation: the LLM writes the answer
print(llm.invoke(prompt))

Now let us test our RAG system

command

python rag.py

It prompts you to ask a question. Ask something related to the ingested documents to check that it works.

This concludes the blog.

Sagar Kakkala's World


Retrieval-Augmented Generation 🚀 | How does RAG work? 🧠📚 | Sagar Kakkala’s World 🌍
