Less Compute, More Impact: How Model Quantization Fuels the Next Wave of Agentic AI

This session covers model quantization as a key technique for creating efficient, affordable AI models—especially Small Language Models and agentic AI. Attendees will learn practical quantization methods, deployment strategies across hardware types, and techniques for maximizing performance and cost-effectiveness in multi-agent systems using the Agent2Agent protocol.

David vonThenen, NetApp

Friday, April 24, 17:15-17:55

MC 2

The AI industry is shifting from bigger to better. As companies chase efficiency and performance, quantization has emerged as one of the most effective ways to make models smaller, faster, and more affordable—without crippling accuracy. With recent breakthroughs from teams like DeepSeek proving that optimization can shake entire markets, developers are rethinking what "efficient AI" really means. The real question isn't whether we can make models smarter... it's whether we can make them smarter per watt, per dollar, and per millisecond.

This session explores the full lifecycle of model quantization and how it powers the rise of Small Language Models (SLMs) and agentic AI systems. We'll cover how quantization works, when it pays off, and how it changes deployment tradeoffs across CPUs, GPUs, and AI accelerators. Attendees will walk away with practical techniques for compressing models, tuning quantization-aware training, and deploying specialized SLMs in multi-agent systems using the Agent2Agent (A2A) protocol. The end goal is to get the most out of available hardware while keeping systems responsive and hardware costs under control.
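As a rough illustration of what "how quantization works" can mean in practice, the sketch below shows symmetric per-tensor int8 quantization in plain Python. It is not taken from the session: the function names `quantize_int8` and `dequantize` and the sample weights are assumptions chosen for illustration, and production toolchains typically add per-channel scales, calibration data, and hardware-specific kernels.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using one shared scale (hypothetical helper)."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Fall back to a scale of 1.0 so an all-zero or empty tensor does not divide by zero.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step, then clamp to the symmetric int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover approximate float values from the integer codes."""
    return [v * scale for v in q]


# Illustrative weights only; real tensors hold millions of values.
weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The round-trip error per weight is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each weight is stored in one byte instead of four (for float32), at the cost of a rounding error of at most half the scale. That trade of precision for size and bandwidth is what the abstract refers to.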

efficiency

deployment

quantization

slm

David vonThenen

NetApp

United States of America

David is a Senior AI/ML Engineer within the Office of the CTO at NetApp, where he’s dedicated to empowering developers to build, scale, and deploy AI/ML solutions in production environments. He brings deep expertise in building and training models for applications such as NLP, vision, real-time analytics, and even classifying debilitating diseases. His mission is to help users build, train, and deploy AI models efficiently, making advanced machine learning accessible to users of all levels.

Before NetApp, he was heavily involved in the AI/ML community, specifically in conversational AI solutions and driving AI platform growth in a DevRel and pre-sales role. David frequently shares his insights at industry conferences and events, offering hands-on guidance for implementing AI/ML in cloud environments. David's prior experience includes contributing to the Kubernetes and CNCF ecosystems, working hands-on with VMware virtualization, implementing backup/recovery solutions, and developing hardware storage adapter firmware and drivers.
