GenAI on Kubernetes: training, inference and serving tutorial

This tutorial guides participants through end-to-end GenAI workload management on Kubernetes, covering distributed training, optimized inference, production model serving, and key components like autoscaling, operators, and OSS frameworks. Attendees gain hands-on experience designing efficient, scalable, and secure GenAI deployments using practical, reproducible Kubernetes patterns and workflows.

Alessandro Vozza, Microsoft

Friday, April 24, 15:15-15:55

Banquet

Kubernetes is quickly becoming the preferred platform for running GenAI workloads, but wiring together training jobs, inference pipelines, and scalable model serving can feel overwhelming. This hands-on tutorial walks you through the full lifecycle of GenAI on Kubernetes—from launching distributed training jobs with GPU scheduling, to running optimized inference workloads, to exposing production-grade model endpoints using cloud-native serving stacks. We’ll cover key building blocks like operators, autoscaling (HPA/KEDA), vector stores, model registries, and popular OSS frameworks (KServe, Ray, vLLM, Kubeflow, and more). You’ll learn how to design resource-efficient GPU clusters, fine-tune models securely, and deploy multi-model serving architectures that behave predictably under real traffic. By the end, you’ll have a working reference setup and a clear understanding of how to operationalize GenAI workloads using the Kubernetes patterns you already know. No magic—just practical, reproducible workflows you can take back to your platform today.

Tags: genai, inference, kubernetes, training

Speaker: Alessandro Vozza (Microsoft, Netherlands)

Alessandro, a seasoned community leader, has spent the last few years architecting cloud-native infrastructures for Microsoft customers, energizing the Dutch tech community, and helping professionals achieve CKx certification. With over 25 years immersed in open-source technologies, Alessandro is deeply passionate about the cloud-native ecosystem. He's now back at Microsoft as a Senior Technical Specialist in Application Innovation & AI.
