RAG - Basic RAG using llama3, langchain and chromadb

embedding
RAG
A primer on RAG
Author

fastdaima

Published

July 31, 2024

What the hell is it?

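  • RAG (retrieval-augmented generation) builds on information retrieval (IR): it retrieves the documents relevant to a query and hands them to a language model as context for its answer.
  • Information retrieval is the process of retrieving the information relevant to your query from a dataset.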

How basic RAG works in a nutshell

  • Input: a query and an array of documents (the dataset)
  • Output: the top n relevant documents
  • Basic RAG uses a bi-encoder and cosine similarity search to retrieve the top n documents
  • First, the query is converted into embeddings (floating-point representations), i.e. an array of vectors, which are then pooled into a single vector, say Q
  • Second, each document in the input array is converted into embeddings in the same way, i.e. an array of vectors, which are pooled into a single vector per document; call the resulting collection of document vectors Docs
  • Then the cosine similarity between Q and each document vector in Docs is computed; the scores are collected into a list, say cosine_similarity_score_list, which is sorted in descending order
  • The top n documents in cosine_similarity_score_list are returned as the relevant documents
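
The steps above can be sketched in a few lines of Python. This is only a minimal illustration, not how llama3, langchain or chromadb do it internally: the `token_vector` function is a hypothetical stand-in for a real bi-encoder model (it just derives a deterministic pseudo-random vector from each token), mean pooling is assumed as the pooling strategy, and the names `embed`, `retrieve` and `DIM` are made up for this example.

```python
import hashlib

import numpy as np

DIM = 64  # embedding size, chosen arbitrarily for this sketch


def token_vector(token: str) -> np.ndarray:
    """Hypothetical stand-in for a learned token embedding.

    A real bi-encoder would produce these vectors with a trained model;
    here each token gets a deterministic pseudo-random vector.
    """
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:4], "big")
    return np.random.default_rng(seed).standard_normal(DIM)


def embed(text: str) -> np.ndarray:
    """Turn a text into an array of token vectors, then mean-pool into one vector."""
    tokens = text.lower().split()
    if not tokens:
        raise ValueError("cannot embed empty text")
    token_vectors = np.stack([token_vector(t) for t in tokens])
    return token_vectors.mean(axis=0)


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """cos(a, b) = (a . b) / (|a| * |b|)"""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def retrieve(query: str, documents: list[str], n: int = 2) -> list[tuple[float, str]]:
    """Return the top n (score, document) pairs, most similar first."""
    q = embed(query)                        # single pooled query vector Q
    docs = [embed(d) for d in documents]    # one pooled vector per document
    cosine_similarity_score_list = [
        (cosine_similarity(q, d), text) for d, text in zip(docs, documents)
    ]
    cosine_similarity_score_list.sort(key=lambda pair: pair[0], reverse=True)
    return cosine_similarity_score_list[:n]


documents = [
    "chroma is a vector database",
    "llamas live in the andes",
    "langchain chains llm calls together",
]

for score, text in retrieve("vector database", documents, n=2):
    print(f"{score:.3f}  {text}")
```

Because the query and the documents go through the same `embed` function independently, the document vectors can be computed once and stored, which is the job a vector database such as chromadb takes over in a real setup.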