Data & AI
Conference · 40 min
INTERMEDIATE

Beyond the Prompt: Evaluating, Testing, and Securing LLM Applications

This talk covers evaluating and securing LLM applications: measuring whether changes to prompts or RAG pipelines actually improve results, using evaluation frameworks such as Vertex AI Evaluation, DeepEval, and Promptfoo, and adding security measures with LLM Guard so apps resist prompt injections and avoid harmful responses, with an emphasis on robust input and output guardrails.

Mete Atamel, Google

When and where

Saturday, September 27, 15:50-16:30
Concert Hall
Description
When you change prompts or modify the Retrieval-Augmented Generation (RAG) pipeline in your LLM applications, how do you know it’s making a difference? You don’t—until you measure. But what should you measure, and how? Similarly, how can you ensure your LLM app is resilient against prompt injections or avoids providing harmful responses? More robust guardrails on inputs and outputs are needed beyond basic safety settings.

In this talk, we’ll explore various evaluation frameworks such as Vertex AI Evaluation, DeepEval, and Promptfoo to assess LLM outputs, understand the types of metrics they offer, and see how these metrics are useful. We’ll also dive into testing and security frameworks like LLM Guard to ensure your LLM apps are safe and constrained to exactly what you need.
Tags: security, metrics, frameworks, evaluation
Speakers
Mete Atamel

Google

United Kingdom

I’m a Software Engineer and Developer Advocate at Google in London. I build tools, demos, and tutorials, and give talks to educate developers and help them succeed on Google Cloud.
