Build & Deploy
Conference, 50 min
INTERMEDIATE

RESOURCE_EXHAUSTED - Managing Google Cloud infrastructure for Cost-Effective AI and Agentic Workloads

This session offers practical Google Cloud strategies to manage scarce AI infrastructure, including GPUs/TPUs, CPU, and memory. It covers scheduling, reservations, agentic workload bottlenecks, and FinOps tactics like Spot VMs and CUDs, helping teams keep workloads running efficiently and cost-effectively.

Maciej Strzelczyk, Google Cloud

When and where

Friday, June 19, 13:35-14:25
Room 4A
Description
The gold rush for AI has transformed cloud infrastructure. If you've tried to spin up a GPU or TPU lately, there's a high chance you've been stopped dead in your tracks by the dreaded RESOURCE_EXHAUSTED error.
As we transition into the era of agentic workflows, where thousands of LLM calls, vector searches, and autonomous loops run concurrently, even standard CPU and memory limits are being pushed to their breaking point. Throwing money at the problem is no longer a guaranteed solution, so it's time to learn the features and tricks that keep your computing running smoothly.

In this session, we will move past the Google Cloud marketing slides and dive into concrete, practical strategies to keep your workloads running without draining your company's wallet.

Bonus: Most of the techniques and features presented can be applied to non-AI infrastructure as well.

We will cover:
  • Navigating Scarcity: How to leverage Dynamic Workload Scheduler, Reservation Sharing, and queuing systems to guarantee GPU/TPU availability.
  • Agentic Bottlenecks: Mitigating CPU and memory spikes when running highly concurrent AI agents.
  • FinOps for AI: Mixing Spot VMs, Committed Use Discounts (CUDs), and secondary node pools to optimize price-to-performance.
  • TPUs: What is a TPU? Is it a valid alternative to a GPU?

Who is this session for?
Anyone responsible for acquiring and managing the necessary resources for their teams and products.
cloud
gpu
tpu
ai
Speakers
Maciej Strzelczyk

Google Cloud

Poland

Developer Relations Engineer at Google Cloud. I started my cloud adventure in the Cloud Support team, where I learned firsthand what problems customers can have. As a DevRel Engineer, I aim to make life easier for small and medium customers of Google Cloud!