Data & AI
Conference · 45 min
INTERMEDIATE

Testing Agents Before They Test You

This talk shares practical approaches to evaluating agentic AI systems before and after deployment, highlighting the need for trust, observability, and tailored testing methods. Attendees will learn how to design effective evaluation pipelines, combine offline and live testing, and build trustworthy AI agents through real-world examples and lessons learned.

Jettro Coenradie (Yuma)
Daniël Spee (Yuma)

When and Where

Thursday, April 2, 15:30-16:15
Zaal 2

Abstract


Would you let a stranger handle your customer data?
Would you let a new hire talk to a client on their first day?
Would you put your kid in a self-driving car and just say, "Have fun at school"?

Then why do we trust our shiny new AI Agents to behave correctly in production without testing them?

In this talk, we share our journey of exploring how to evaluate Agentic Systems before and after deployment. We’ll walk through how to move from “it works in the demo” to trustworthy and observable systems that you can confidently run in production.

We’ll show practical examples of building evaluation pipelines and how we experiment with simple, measurable ways to understand an agent’s behavior over time. We’ll share what we’ve learned so far: where things go wrong, what helps, and what’s still an open challenge as we build toward more mature evaluation practices.

Expect real experiences, not just theory. Expect live examples, and ideas you can take home to build trust into your own agents.

Key Takeaways

  • Why testing AI Agents is different from traditional software testing
  • How to design evaluation frameworks that fit your use case
  • How to combine offline testing with live production observation
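To make the offline-testing takeaway concrete, here is a minimal sketch of an offline evaluation pipeline. Everything in it is a hypothetical assumption for illustration: `fake_agent` stands in for a real agent, and the keyword-based scorer is one simple, measurable check among many — it is not the speakers' actual method.

```python
# Hypothetical sketch: evaluate an agent offline against a fixed set of cases.
# `fake_agent`, the cases, and the keyword scorer are illustrative assumptions.

def fake_agent(question: str) -> str:
    """Stand-in agent; a real system would call an LLM, possibly with tools."""
    canned = {
        "refund policy": "Refunds are possible within 30 days of purchase.",
        "opening hours": "We are open Monday to Friday, 9:00-17:00.",
    }
    for topic, answer in canned.items():
        if topic in question.lower():
            return answer
    return "I don't know."

def evaluate(agent, cases):
    """Run each case through the agent and score with a simple keyword check."""
    results = []
    for case in cases:
        reply = agent(case["question"])
        passed = all(kw.lower() in reply.lower() for kw in case["expect_keywords"])
        results.append({"question": case["question"], "passed": passed})
    return results

cases = [
    {"question": "What is your refund policy?", "expect_keywords": ["30 days"]},
    {"question": "What are your opening hours?", "expect_keywords": ["Monday"]},
    {"question": "Do you ship internationally?", "expect_keywords": ["shipping"]},
]

results = evaluate(fake_agent, cases)
pass_rate = sum(r["passed"] for r in results) / len(results)
print(f"pass rate: {pass_rate:.0%}")  # 2 of 3 cases pass in this sketch
```

Running the same case set after every change to the agent turns "it works in the demo" into a tracked pass rate; in production, the same scorer can run over sampled live traffic as a complementary observation signal.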

Target Audience
Developers, architects, and AI practitioners who are experimenting with or building agent-based systems and want to learn how to evaluate and test them effectively.


Tags: testing, agents, trust, evaluation
Speakers
Jettro Coenradie

Yuma

Netherlands

Jettro is a software architect, search relevance expert, and data enthusiast who enjoys discussing his job, hobbies, and other topics that inspire people. Jettro truly believes in the Luminis mantra that the only thing that grows by sharing is knowledge. After more than ten years of creating the best search engines for multiple customers, Jettro is very active in the Generative AI domain. He has extensive experience with Retrieval Augmented Generation and AI Agents.
Daniël Spee

Yuma

Netherlands

Daniël Spee is a software engineer at Yuma with a passion for search, AI, and data-centric systems. He believes that great search goes beyond technology and requires a deep understanding of user intent, data semantics, and the business context that drives decision-making.

By combining AI techniques with classical search approaches, Daniël builds smarter, context-aware systems that bridge the gap between information and insight. His current focus is on leveraging AI agents and retrieval pipelines to automate and enhance real-world workflows, turning data into action and intelligence.
