From naive to advanced RAG: the complete guide

This deep-dive session addresses common challenges with Retrieval Augmented Generation (RAG) and provides insights on overcoming them, based on experiences of developers across Europe. The talk, led by the team building a vector database, covers various techniques to enhance RAG using LangChain4j, including semantic chunking, query expansion, metadata filtering, document reranking, and data lifecycle processes. It also discusses how to effectively evaluate and present results.

Guillaume Laforge, Google

Cédrick Lunven, DataStax

When and where

Monday, October 7, 13:30-16:30

Room 6

Description

It’s easy to get started with Retrieval Augmented Generation, but you’ll quickly be disappointed with the generated answers: inaccurate or incomplete, missing context or outdated information, bad text chunking strategy, not the best documents returned by your vector database, and the list goes on.

After meeting thousands of developers across Europe, we’ve explored those pain points, and will share with you how to overcome them. As part of the team building a vector database, we are aware of the different flavors of searches (semantic, metadata, full-text, multimodal) and embedding model choices. We have been implementing RAG pipelines across different projects and frameworks and are contributing to LangChain4j.

In this deep-dive, we will examine various techniques using LangChain4j to bring your RAG to the next level: semantic chunking, query expansion and compression, metadata filtering, document reranking, data lifecycle processes, and how to best evaluate and present the results to your users.

Vector database

LangChain4j

Retrieval Augmented Generation

Semantic chunking

Speakers

Guillaume Laforge

Google

France

Guillaume Laforge is a Developer Advocate for Google Cloud, where he specializes in Generative AI, service orchestration, and serverless compute solutions. He is also a Java Champion and the co-creator of the Apache Groovy programming language.

Cédrick Lunven

DataStax

France

Cédrick is a software engineer and a member of Developer Relations at DataStax. A passionate Java developer for the past 20 years, he has built a wide range of applications on distributed systems and tools. He is also the CTO of GoodBards, a platform that leverages generative AI to run end-to-end marketing campaigns.
Committed to open source, Cédrick created a feature toggle library called FF4J in 2013, which he has been actively maintaining. Over the past year, he has been implementing drivers, clients, SDKs, and various integrations of DataStax solutions with Spring AI and LangChain4j, to which he has also contributed.