Building an Efficient RAG Pipeline using Open Source LLMs
08-24, 13:45–15:15 (Asia/Kuala_Lumpur), Classroom

Large Language Models (LLMs) are everywhere, driving the advancement of AI today. For enterprises and businesses, integrating LLMs with custom data sources is crucial for providing more contextual understanding and reducing hallucinations. In this talk, Tarun will focus on building an effective RAG pipeline for production using Open Source LLMs. In simple terms, Retrieval Augmented Generation (RAG) involves retrieving relevant documents as context for user queries and leveraging LLMs to generate more accurate responses.
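The retrieve-then-generate loop described above can be sketched in a few lines. A minimal sketch, assuming a toy keyword-overlap scorer standing in for a real embedding-based retriever; the assembled prompt would then be sent to an open-source LLM:

```python
# Minimal RAG sketch: retrieve relevant documents, then augment the prompt.
# The keyword-overlap scorer is a toy stand-in for embedding similarity search.

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the query."""
    q_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str, context_docs: list[str]) -> str:
    """Assemble an augmented prompt: retrieved context plus the user question."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return (
        f"Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

docs = [
    "RAG retrieves documents to ground LLM answers.",
    "Vector databases store embeddings for similarity search.",
    "Streamlit builds simple web apps in Python.",
]
query = "How does RAG ground answers?"
prompt = build_prompt(query, retrieve(query, docs))
# `prompt` would now be passed to an open-source LLM for generation.
```

In a production pipeline, the retriever would query a vector database over embedded documents rather than matching keywords.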

This is a fully hands-on workshop where participants will construct an entire RAG pipeline using open-source LLMs, vector databases, and embeddings.


Problem Statement

  • Closed-source models like GPT, Claude, and Gemini demonstrate significant potential as LLMs, but enterprises and startups with sensitive data hesitate to rely on them due to data privacy and security concerns.
  • While numerous solutions and resources on the internet utilize closed-source models like GPT and Gemini to construct RAG pipelines, there is limited information available on building effective RAG pipelines using Open Source LLMs.
  • When using Open Source LLMs, it is important to understand which prompt template to use to get responses in a specific format. While those with a basic grasp of Transformers can adjust parameters to enhance results, this approach may not be suitable for everyone.
  • Basic RAG solutions often struggle with retrieval quality and tend to produce hallucinations.
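To illustrate the prompt-template point above: each open-source model family expects its own template, and sending a bare question often degrades output quality. A small sketch using the Llama 2 chat format as one example:

```python
# Open-source models each expect a specific prompt template. As one example,
# the Llama 2 chat models wrap the instruction in [INST] tags with an
# optional <<SYS>> system block.

def llama2_prompt(system: str, user: str) -> str:
    """Format a single-turn prompt in the Llama 2 chat template."""
    return f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"

prompt = llama2_prompt(
    "You are a helpful assistant. Answer only from the given context.",
    "Summarise the attached document in two sentences.",
)
print(prompt)
```

Other model families (Mistral, Zephyr, and so on) use different templates, which is exactly why understanding the expected format matters before building a pipeline around a given model.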

Session Outline

In this hands-on workshop, participants will construct an entire RAG pipeline using open-source LLMs, vector databases, and embeddings. Additionally, the speaker will demonstrate two advanced techniques for improving results from LLMs. Below is the outline of the workshop:

  • Issues with Large Language Models
  • Understanding the need for RAG and Open Source LLMs
  • Prompt Engineering Basics: Zero-Shot and Few-Shot
  • Tour of Open Source LLM parameters: temperature, top_p, and so on
  • Building a basic RAG pipeline using Open Source LLMs, embeddings, and vector stores
  • Advanced Technique 1: Re-ranking with Cross-Encoder sentence transformers
  • Advanced Technique 2: Fine-tuning embeddings for RAG and hybrid search
  • Building a Streamlit app for your RAG application
  • Deploying it with secrets on share.streamlit.io

Tarun Jain is a Data Scientist at AI Planet, a Belgium-based AI startup. He is recognised as a Google Developer Expert in AI/ML and is also part of GSoC'24 at RedHenLab.