RAG - Basic RAG using llama3, langchain and chromadb
embedding
RAG
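A primer on RAG
What the hell is it?
- RAG (Retrieval-Augmented Generation) is built on an information retrieval (IR) mechanism: it retrieves relevant documents and gives them to an LLM as context for generating an answer.
- Information retrieval is the process of retrieving the information in a dataset that is relevant to your query.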
How basic RAG works in a nutshell
- Input: a query and an array of documents (the dataset)
- Output: the top N relevant documents
- Basic RAG uses a bi-encoder and cosine similarity search to retrieve the top N documents
- First, the query is converted into embeddings (floating-point representations), i.e. an array of vectors (one per token), which are then pooled into a single vector, say Q.
- Second, each document in the input array is converted into embeddings in the same way and pooled into a single vector; call the resulting list of document vectors Docs.
- Then, the cosine similarity between Q and each document vector in Docs is computed, giving a list of scores, say cosine_similarity_score_list, which is sorted in descending order.
- The top N documents from cosine_similarity_score_list are returned as the relevant documents.
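
The steps above can be sketched in plain Python. This is only a minimal illustration, not how llama3, langchain or chromadb do it internally: `toy_token_vector` is a hypothetical stand-in that hashes each token into a few floats, where a real bi-encoder would use a trained embedding model, and `DIM`, `embed` and `top_n` are names made up for this sketch. The pooling (mean of the token vectors), the cosine similarity and the descending sort follow the description above.

```python
import hashlib
import math

DIM = 8  # toy embedding size (assumption; real models use hundreds of dimensions)


def toy_token_vector(token):
    """Hypothetical stand-in for a learned token embedding: hash a token into DIM floats."""
    digest = hashlib.sha256(token.lower().encode("utf-8")).digest()
    return [byte / 255.0 - 0.5 for byte in digest[:DIM]]


def embed(text):
    """Embed every token, then mean-pool the token vectors into one vector."""
    tokens = text.split()
    if not tokens:
        raise ValueError("cannot embed empty text")
    vectors = [toy_token_vector(token) for token in tokens]
    # zip(*vectors) walks the vectors dimension by dimension
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_n(query, documents, n=2):
    """Return the n (score, document) pairs most similar to the query."""
    q = embed(query)                               # single pooled query vector Q
    docs = [embed(doc) for doc in documents]       # one pooled vector per document
    cosine_similarity_score_list = [
        (cosine_similarity(q, doc_vector), doc)
        for doc_vector, doc in zip(docs, documents)
    ]
    # Highest similarity first
    cosine_similarity_score_list.sort(key=lambda pair: pair[0], reverse=True)
    return cosine_similarity_score_list[:n]


documents = ["cats purr", "the stock market fell", "dogs bark loudly"]
for score, doc in top_n("cats purr", documents, n=2):
    print(round(score, 3), doc)
```

Because the toy vectors come from a hash, the scores carry no real meaning beyond exact token overlap; swapping `embed` for a real embedding model is what makes the ranking semantic. In a setup like the one in the title, the vector store would typically hold the document vectors and run the similarity search for you.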