
Conference session · 50 minutes
Beyond the Prompt: Evaluating, Testing, and Securing LLM Applications
This presentation discusses how to evaluate LLM applications by measuring the impact of changes to prompts and RAG pipelines. It explores evaluation frameworks such as Vertex AI Evaluation, DeepEval, and Promptfoo, and emphasizes the need for robust guardrails on inputs and outputs to protect against prompt injection and harmful responses, keeping applications safe and scoped to what they are meant to do.

Mete Atamel, Google
When and Where
Friday, June 13, 09:00-09:50
Room 2
When you change prompts or modify the Retrieval-Augmented Generation (RAG) pipeline in your LLM applications, how do you know it’s making a difference? You don’t—until you measure. But what should you measure, and how? Similarly, how can you ensure your LLM app is resilient against prompt injections or avoids providing harmful responses? More robust guardrails on inputs and outputs are needed beyond basic safety settings.

In this talk, we’ll explore various evaluation frameworks such as Vertex AI Evaluation, DeepEval, and Promptfoo to assess LLM outputs, understand the types of metrics they offer, and see how these metrics are useful. We’ll also dive into testing and security frameworks like LLM Guard to ensure your LLM apps are safe and limited to precisely what you need.
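To make the idea of measuring concrete, here is a minimal sketch of the kind of evaluation the talk covers, using DeepEval to score a single RAG answer for relevancy. The test input, answer, retrieval context, and threshold are hypothetical, and the API shapes (LLMTestCase, AnswerRelevancyMetric, evaluate) are assumed from DeepEval's public documentation, not taken from the talk material; check them against the release you use.

# Minimal DeepEval sketch (assumed API; example data is hypothetical)
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

# One test case: the user prompt, the model's answer, and the RAG context it saw.
test_case = LLMTestCase(
    input="What is the refund policy?",
    actual_output="Purchases can be refunded within 30 days with a receipt.",
    retrieval_context=["Refunds are accepted within 30 days of purchase with proof of payment."],
)

# Score answer relevancy; cases scoring below the threshold are reported as failures.
metric = AnswerRelevancyMetric(threshold=0.7)
evaluate(test_cases=[test_case], metrics=[metric])

Running a suite of such cases before and after a prompt or pipeline change is what turns "did it make a difference?" into something you can actually measure.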