Architecture
Conference, 40 min
INTERMEDIATE

GenAI on Kubernetes: training, inference and serving tutorial

This tutorial guides participants through end-to-end GenAI workload management on Kubernetes, covering distributed training, optimized inference, production model serving, and key components like autoscaling, operators, and OSS frameworks. Attendees gain hands-on experience designing efficient, scalable, and secure GenAI deployments using practical, reproducible Kubernetes patterns and workflows.

Alessandro Vozza, Microsoft

Friday, April 24, 15:15-15:55
Banquet
Kubernetes is quickly becoming the preferred platform for running GenAI workloads, but wiring together training jobs, inference pipelines, and scalable model serving can feel overwhelming. This hands-on tutorial walks you through the full lifecycle of GenAI on Kubernetes—from launching distributed training jobs with GPU scheduling, to running optimized inference workloads, to exposing production-grade model endpoints using cloud-native serving stacks. We’ll cover key building blocks like operators, autoscaling (HPA/KEDA), vector stores, model registries, and popular OSS frameworks (KServe, Ray, vLLM, Kubeflow, and more). You’ll learn how to design resource-efficient GPU clusters, fine-tune models securely, and deploy multi-model serving architectures that behave predictably under real traffic. By the end, you’ll have a working reference setup and a clear understanding of how to operationalize GenAI workloads using the Kubernetes patterns you already know. No magic—just practical, reproducible workflows you can take back to your platform today.
genai
inference
kubernetes
training

Alessandro Vozza

Microsoft

Netherlands

Alessandro, a seasoned community leader, has spent the last few years architecting cloud-native infrastructures for Microsoft customers, energizing the Dutch tech community, and helping professionals achieve CKx certification. With over 25 years immersed in open-source technologies, Alessandro is deeply passionate about the cloud-native ecosystem. He's now back at Microsoft as a Senior Technical Specialist in Application Innovation & AI.
