Data & AI | Conference | 40 min
Less Compute, More Impact: How Model Quantization Fuels the Next Wave of Agentic AI
This session covers model quantization as a key technique for creating efficient, affordable AI models—especially Small Language Models and agentic AI. Attendees will learn practical quantization methods, deployment strategies across hardware types, and techniques for maximizing performance and cost-effectiveness in multi-agent systems using the Agent2Agent protocol.
David vonThenen, NetApp
When and where
Thursday, April 23, 12:05-12:45
Skalkotas
The AI industry is shifting from bigger to better. As companies chase efficiency and performance, quantization has emerged as one of the most effective ways to make models smaller, faster, and more affordable—without crippling accuracy. With recent breakthroughs from teams like DeepSeek proving that optimization can shake entire markets, developers are rethinking what "efficient AI" really means. The real question isn't whether we can make models smarter... it's whether we can make them smarter per watt, per dollar, and per millisecond.
This session explores the full lifecycle of model quantization and how it powers the rise of Small Language Models (SLMs) and agentic AI systems. We'll cover how quantization works, when it pays off, and how it changes deployment tradeoffs across CPUs, GPUs, and AI accelerators. Attendees will walk away with practical techniques for compressing models, tuning quantization-aware training, and deploying specialized SLMs in multi-agent systems using the Agent2Agent (A2A) protocol. The end goal is to get the most out of the available hardware while keeping systems responsive and hardware costs under control.
David vonThenen
David is a Senior AI/ML Engineer within the Office of the CTO at NetApp, where he’s dedicated to empowering developers to build, scale, and deploy AI/ML solutions in production environments. He brings deep expertise in building and training models for applications such as NLP, vision, real-time analytics, and even classifying debilitating diseases. His mission is to help users build, train, and deploy AI models efficiently, making advanced machine learning accessible to users of all levels.
Before NetApp, he was heavily involved in the AI/ML community, specifically in conversational AI solutions and driving AI platform growth in a DevRel and pre-sales role. David frequently shares his insights at industry conferences and events, offering hands-on guidance for implementing AI/ML in cloud environments. David's prior experience includes contributing to the Kubernetes and CNCF ecosystems, working hands-on with VMware virtualization, implementing backup/recovery solutions, and developing hardware storage adapter firmware and drivers.