Retrieval-Augmented Generation 🚀 | How does RAG work? 🧠📚 | Sagar Kakkala’s World 🌍

What is RAG?

Before we discuss what exactly RAG is, let us look at a different scenario and relate it back to RAG. Let's say you have watched a lot of films. I tell you that I watched an interesting film a long time ago but cannot recall its title, and you ask me whether I remember a few scenes, so that you can help me with the title. I say it is a film where a ship hits an iceberg and people drown, and you immediately guess the title "Titanic".

You have watched a lot of films, yet a single scene was enough for you to guess the title correctly.

This is exactly how RAG, or Retrieval-Augmented Generation, works. A RAG system is fed your data (the model itself is not retrained on it), and when you ask it a question, it picks up the relevant pieces from those sources and answers from them, which reduces hallucination.


How does RAG work?

Let's relate it to the movie example we took earlier. You have watched a lot of films, and I described only one scene from the movie, yet you guessed the title right and I did not have to explain the complete movie to you.

In RAG, we divide documents into chunks, pass the chunks through an embedding model, and store the resulting vectors in a vector database.

When a user asks a question, the RAG system converts the question into a vector, searches the database for the closest vectors, and retrieves the matching chunks (Retrieval). It gives them to the LLM as context (Augmentation), and the LLM generates the answer (Generation).
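The Augmentation step is easier to picture with a small sketch. This is only an illustration, not how any particular library does it: `build_prompt` is a made-up helper, and the retrieved chunk is hard-coded here instead of coming from a vector database.

```python
def build_prompt(question, retrieved_chunks):
    """Augmentation: place the retrieved text in front of the user's question."""
    context = "\n".join(retrieved_chunks)
    return (
        "Answer using only this context:\n"
        f"{context}\n\n"
        f"Question: {question}"
    )


# Pretend the retrieval step already returned this chunk (hypothetical data).
retrieved = ["Titanic: a ship hits an iceberg and people drown"]
question = "Which film has a ship and an iceberg?"

prompt = build_prompt(question, retrieved)
print(prompt)  # this string is what would be sent to the LLM for Generation
```

The LLM never searches the documents itself. It only sees the question plus whatever text retrieval placed in the prompt.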

Let's keep our movie example as a constant reference to keep things simple. Think of a movie as a set of scenes: the introduction scene, interval scene, highlight scenes, climax scene, and comedy scenes.

And when I tell you even one scene from the movie, you know which movie I am referring to.

In the same way, let's say I have a lot of KT (knowledge transfer) or internal documents. I cannot go through each and every document, as it is time consuming. This is where RAG comes into play.

We will ingest all our documents into RAG.

We split our existing documents into chunks (you can compare them to the different kinds of scenes in a movie) and then convert each chunk into a vector, because systems ultimately understand only numbers (1s and 0s).
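To make "chunks" concrete, here is a minimal hand-rolled sketch that splits a Markdown document at its headers. It is a toy for illustration only: `split_by_headers` and the sample notes are made up, and a real project would normally rely on a library splitter instead.

```python
def split_by_headers(markdown_text):
    """Start a new chunk at every '#' or '##' header line."""
    chunks = []
    current = None
    for line in markdown_text.splitlines():
        if line.startswith("# ") or line.startswith("## "):
            current = {"header": line.lstrip("#").strip(), "lines": []}
            chunks.append(current)
        elif current is not None and line.strip():
            current["lines"].append(line.strip())
    return [(c["header"], " ".join(c["lines"])) for c in chunks]


# Made-up sample document: one movie, split into "scenes".
notes = """# Titanic
## Opening scene
Passengers board the ship.
## Climax scene
The ship hits an iceberg and people drown.
"""

scenes = split_by_headers(notes)
for header, text in scenes:
    print(f"{header}: {text}")
```

Each scene becomes its own chunk, so a question about the iceberg only has to match the climax chunk, not the whole movie.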

In this context, a vector is simply a list of numbers that represents the meaning of a word, sentence, or document.


So let's say, for example, I ask the question "Which year did the iceberg hit?", and we have ingested movies like Avatar, Titanic, Hulk...

It converts our question into a vector and searches for the nearest vector. "iceberg hit" would most likely match the movie Titanic.
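A toy sketch of that lookup is below. The big assumption here is the `embed` function: it simply counts words, whereas a real embedding model returns learned vectors that capture meaning. The movie descriptions are made up for the example.

```python
import math
from collections import Counter


def embed(text):
    """Toy 'embedding': word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().replace("?", "").split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical one-line descriptions of the ingested movies.
movies = {
    "Avatar": "humans travel to a distant moon called Pandora",
    "Titanic": "a ship hits an iceberg and people drown",
    "Hulk": "a scientist turns into a green giant when angry",
}

question = "Which year did the iceberg hit?"
q_vec = embed(question)  # the question is embedded the same way as the documents
best = max(movies, key=lambda title: cosine(q_vec, embed(movies[title])))
print(best)
```

With these descriptions only the Titanic entry shares a word with the question, so it scores highest. A real embedding model would also match related words that are not spelled the same.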

How does vector search help? Let us understand it better with an example.

Dog → [0.82, 0.15, -0.44, 0.71]

Cat → [0.79, 0.18, -0.41, 0.68]

Car → [-0.23, 0.91, 0.55, -0.77]

Here you can think of each number as capturing some trait such as animal, pet, or fur. In practice the model learns these dimensions, and they are not individually labelled.

Now if I search for "Dog" using RAG, it will most probably bring back the closest vectors available. If it finds a document that has "Cat" in it, it treats that document as relevant and looks for context there.

It would completely ignore "Car", as the vectors are not close.
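"Close" can be measured. One common choice (an assumption here, since vector databases support several distance measures) is cosine similarity, which approaches 1 when two vectors point the same way and drops toward -1 when they point in opposite directions. A minimal sketch with the three vectors above:

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


dog = [0.82, 0.15, -0.44, 0.71]
cat = [0.79, 0.18, -0.41, 0.68]
car = [-0.23, 0.91, 0.55, -0.77]

dog_cat = cosine_similarity(dog, cat)
dog_car = cosine_similarity(dog, car)

print(f"Dog vs Cat: {dog_cat:.3f}")  # very close to 1
print(f"Dog vs Car: {dog_car:.3f}")  # negative, so far apart
```

Dog and Cat score almost 1, while Dog and Car score below zero, which is why a search for "Dog" surfaces the "Cat" document and skips "Car".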


Implementing RAG

Let us understand RAG by implementing it. Since RAG needs a model, let us use Ollama for our project.

Download Ollama



Let us assume that we are working in a company where production failures have occurred multiple times, and each time a failure occurred, we documented it with its situation and resolution.

Clone the repo to your local machine - RAG-demo



command: git clone https://github.com/sagarkakkalasworld/RAG-demo.git



Now let us create a virtual environment

command:
python -m venv venv
source venv/bin/activate



Let us install the required packages

command:
pip install langchain
pip install langchain-community
pip install langchain-ollama
pip install langchain-chroma
pip install chromadb
pip install markdown
pip install unstructured





Here langchain is the main core framework that helps with retrieval, prompts, document abstraction, and the main RAG workflows.

chromadb is the vector database that helps us store the embedded values.

markdown converts Markdown to HTML.

Now, to convert our data into embeddings, pull the embedding model:

command: ollama pull nomic-embed-text


And now pull the model that the agent uses to answer the user:

command: ollama pull llama3



Now create a file ingest.py

import os

from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import MarkdownHeaderTextSplitter
from langchain_ollama import OllamaEmbeddings
from langchain_chroma import Chroma

# Load every Markdown file from the incidents folder.
docs = []

for file in os.listdir("./incidents"):
    if file.endswith(".md"):
        loader = TextLoader(f"./incidents/{file}")
        docs.extend(loader.load())

# Split on "#" and "##" headers; the header text is stored
# in each chunk's metadata under "title" and "section".
headers = [
    ("#", "title"),
    ("##", "section")
]

splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=headers
)

chunks = []

for doc in docs:
    chunks.extend(
        splitter.split_text(doc.page_content)
    )

print(f"Total Chunks: {len(chunks)}")

# Embed each chunk with the model pulled through Ollama.
embeddings = OllamaEmbeddings(
    model="nomic-embed-text"
)

# Store the chunks and their vectors in a local Chroma database.
vectordb = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./chroma_db"
)

print("Vector DB created successfully")

At the top, the code goes into the incidents folder and reads all the .md files. It then splits the text into chunks along the title and section headers, and using the embedding model, these chunks are embedded and stored in chromadb.

Now let's run it

command

python ingest.py


Once ingestion is done, you will see that a chroma_db folder has been created locally; this is the vector database.



Now we have successfully converted our data into chunks and embedded them into the vector database.

Now we can use this vector DB and integrate it with our LLM agent.

Create a file rag.py. A minimal version, assuming the ./chroma_db folder created by ingest.py and the llama3 model pulled earlier, can look like this:

from langchain_ollama import OllamaEmbeddings, OllamaLLM
from langchain_chroma import Chroma

# Use the same embedding model as ingest.py so that question vectors
# are comparable with the stored chunk vectors.
embeddings = OllamaEmbeddings(
    model="nomic-embed-text"
)

# Open the vector database that ingest.py persisted.
vectordb = Chroma(
    persist_directory="./chroma_db",
    embedding_function=embeddings
)

# Retrieval: fetch the 3 closest chunks (k=3 is an arbitrary choice).
retriever = vectordb.as_retriever(search_kwargs={"k": 3})

llm = OllamaLLM(model="llama3")

question = input("Ask a question: ")
docs = retriever.invoke(question)

# Augmentation: give the retrieved chunks to the LLM as context.
context = "\n\n".join(doc.page_content for doc in docs)
prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {question}"
)

# Generation: the LLM writes the answer.
print(llm.invoke(prompt))



Now let us test our RAG system

command

python rag.py

It prompts you to ask a question. Ask something related to the ingested documents to see if it is working fine.



This concludes the blog.

