DevNation Day
Conference, 50 min
INTERMEDIATE

A look inside the LLM closed box: test, observe and evaluate your RAG assisted chatbot

This talk explores testing LLM-infused applications by combining deterministic assertions with an LLM-as-a-judge approach. It demonstrates using LangChain4j 1.0 for observing application behavior, creating datasets from collected traces, and evaluating performance in RAG-assisted LLM chatbots during retrieval and generation stages.

Dimitrios Kafetzis
Red Hat

When and where

Thursday, May 8, 13:20-14:10
Room B
Description
Checking the correctness of an application with an exhaustive suite of unit and integration tests is a natural task for any respectable software developer. Such a test suite also brings other advantages, like documenting the expected behavior of the application and enabling a fast feedback loop. This is all relatively straightforward when the components of your software are entirely deterministic, but how can you achieve something similar when a key part of it has a probabilistic nature?

This probabilistic nature makes it even more important to observe and collect real user inputs from production, both to better understand user needs and to automate the evaluation of your LLM-infused application.

This talk will show in practice how to test an LLM-infused application with a mix of deterministic assertions and an LLM-as-a-judge approach. It will also demonstrate how LangChain4j 1.0 allows us to extensively observe the behavior of this application and create a dataset out of the collected traces. Finally, this dataset will be used in an evaluation framework to assess the performance of our RAG-assisted LLM chatbot on both its retrieval and generation stages.
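As a rough illustration of the mix described above, the following Java sketch pairs two deterministic checks (one on the retrieval stage, one on the generated answer) with an LLM-as-a-judge check. It is not taken from the talk and does not use any LangChain4j API: the class `ChatbotEvaluation`, the `Sample` record, the method names, the grading prompt, and the sample data are all hypothetical, and the judge is modeled as a plain `Function<String, String>` so that a stub can stand in for a real chat model.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// Hypothetical names throughout; nothing here is LangChain4j API.
final class ChatbotEvaluation {

    /** One recorded interaction, e.g. rebuilt from a collected trace (assumed shape). */
    record Sample(String question, List<String> retrievedDocIds, String answer) {}

    /** Deterministic check on retrieval: was the expected document returned? */
    static boolean retrievalHit(Sample sample, String expectedDocId) {
        return sample.retrievedDocIds().contains(expectedDocId);
    }

    /** Deterministic check on generation: does the answer mention every keyword? */
    static boolean mentionsAll(Sample sample, List<String> keywords) {
        String answer = sample.answer().toLowerCase(Locale.ROOT);
        return keywords.stream()
                .allMatch(k -> answer.contains(k.toLowerCase(Locale.ROOT)));
    }

    /** Builds the grading prompt handed to the judge (wording is an assumption). */
    static String judgePrompt(Sample sample, String referenceAnswer) {
        return "Question: " + sample.question() + "\n"
                + "Reference answer: " + referenceAnswer + "\n"
                + "Candidate answer: " + sample.answer() + "\n"
                + "Reply with PASS if the candidate agrees with the reference, otherwise FAIL.";
    }

    /** LLM-as-a-judge check: the judge maps a prompt to a verdict string. */
    static boolean judgedCorrect(Sample sample, String referenceAnswer,
                                 Function<String, String> judge) {
        String verdict = judge.apply(judgePrompt(sample, referenceAnswer));
        return verdict.trim().toUpperCase(Locale.ROOT).startsWith("PASS");
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        Sample sample = new Sample(
                "What is the refund window?",
                List.of("policy-refunds", "policy-shipping"),
                "Refunds are accepted within 30 days of purchase.");

        // Stub judge so the sketch runs offline; a real setup would call a chat model.
        Function<String, String> stubJudge = prompt ->
                prompt.contains("Candidate answer: Refunds are accepted within 30 days")
                        ? "PASS" : "FAIL";

        System.out.println("retrieval hit: " + retrievalHit(sample, "policy-refunds"));
        System.out.println("keywords ok:   " + mentionsAll(sample, List.of("refund", "30 days")));
        System.out.println("judge verdict: "
                + judgedCorrect(sample, "Customers have 30 days to request a refund.", stubJudge));
    }
}
```

Keeping the judge behind a function boundary means the deterministic checks can run on every build, while the judge call can be swapped between a stub and a real model depending on the environment.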
evaluation
dataset
probabilistic
langchain4j
Speakers
Dimitrios Kafetzis

Red Hat

Greece

Software engineer and 3D printing enthusiast, currently working on a master's thesis in Data Science at NCSR Demokritos.

Work Experience:

Fullstack Developer at Crowd Policy (2022-2023)
https://www.crowdpolicy.com

Software Engineer at Agile Actors (2023-now)
Creating software solutions for various organizations. Currently part of the EUS team at Red Hat as a contractor.
https://www.agileactors.com

Education:

Harokopio University of Athens (2018-2022)
Bachelor's degree from the Department of Informatics and Telematics
https://www.hua.gr

National Center for Scientific Research Demokritos & University of the Peloponnese (2023-now)
MSc in Data Science
https://msc-data-science.iit.demokritos.gr/en
