Java
Conference, 40 min
INTERMEDIATE

Building and Running LLMs on GPUs Directly from Java with TornadoVM and GPULlama3.java

This session presents GPULlama3.java, an open-source framework enabling efficient, GPU-accelerated LLM inference directly from Java. Leveraging TornadoVM, it supports modern models and data types, integrates with LangChain4j and Quarkus, and offers live demos across diverse hardware, highlighting seamless Java-based GPU computing and performance profiling tools for AI workloads.

Michalis Papadimitriou
University of Manchester / TornadoVM

When and where

Thursday, April 23, 10:55-11:35
MC 3
Description
Java stands at a critical inflection point in the AI revolution. Running LLM inference on GPUs has traditionally forced developers to leave the JVM for Python and CUDA, or to accept CPU-bound performance, neither of which is feasible for enterprise Java systems. With modern JDK features like the Vector API and frameworks such as llama3.java and JLama, developers can now perform efficient CPU inference for LLMs. GPU computing in Java is also maturing, with TornadoVM and Project Babylon enabling seamless offloading of computations to GPUs.
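To give a flavour of what the Vector API enables on the CPU side, the sketch below shows a SIMD dot product, the basic building block of the matrix-vector multiplications at the heart of inference. It is a generic illustration (the class name DotProduct is ours), not code from llama3.java or JLama, and it requires --add-modules jdk.incubator.vector since the API is still incubating.

// Minimal sketch: a SIMD dot product using the JDK Vector API.
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

public class DotProduct {
    private static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    static float dot(float[] a, float[] b) {
        float sum = 0f;
        int i = 0;
        int upper = SPECIES.loopBound(a.length);
        // Vectorised main loop: processes SPECIES.length() floats per iteration.
        for (; i < upper; i += SPECIES.length()) {
            FloatVector va = FloatVector.fromArray(SPECIES, a, i);
            FloatVector vb = FloatVector.fromArray(SPECIES, b, i);
            sum += va.mul(vb).reduceLanes(VectorOperators.ADD);
        }
        // Scalar tail for the remaining elements.
        for (; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }
}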

This session introduces GPULlama3.java, an open-source framework built atop llama3.java and TornadoVM that transparently accelerates LLM inference on GPUs. It supports half-precision and quantized data types (FP16, Q8, Q4) mapped directly to GPU-native types, GPU-optimized matrix operations, and fast Flash Attention, and it is compatible with Llama 2/3, Mistral, Qwen2/3, Phi-3, and Gemma3 models. Integration with LangChain4j and Quarkus allows developers to build GPU-powered inference engines entirely in Java with minimal overhead.
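As a rough sketch of how such GPU offloading is expressed with TornadoVM's TaskGraph API, the example below offloads a simplified, generic FP32 matrix-vector kernel. It is not GPULlama3.java's actual FP16/quantized kernel; the off-heap FloatArray type and the execution-plan API follow recent TornadoVM releases and may differ in older versions.

import uk.ac.manchester.tornado.api.ImmutableTaskGraph;
import uk.ac.manchester.tornado.api.TaskGraph;
import uk.ac.manchester.tornado.api.TornadoExecutionPlan;
import uk.ac.manchester.tornado.api.annotations.Parallel;
import uk.ac.manchester.tornado.api.enums.DataTransferMode;
import uk.ac.manchester.tornado.api.types.arrays.FloatArray;

public class MatVec {
    // Kernel: each output row is independent; @Parallel marks the loop that the
    // TornadoVM JIT maps onto GPU threads.
    static void matVec(FloatArray matrix, FloatArray vector, FloatArray out, int rows, int cols) {
        for (@Parallel int i = 0; i < rows; i++) {
            float sum = 0f;
            for (int j = 0; j < cols; j++) {
                sum += matrix.get(i * cols + j) * vector.get(j);
            }
            out.set(i, sum);
        }
    }

    public static void main(String[] args) {
        int rows = 1024, cols = 1024;
        FloatArray matrix = new FloatArray(rows * cols);
        FloatArray vector = new FloatArray(cols);
        FloatArray out = new FloatArray(rows);
        matrix.init(1f);
        vector.init(1f);

        // Build the task graph: copy weights once, copy results back every run.
        TaskGraph graph = new TaskGraph("s0")
                .transferToDevice(DataTransferMode.FIRST_EXECUTION, matrix, vector)
                .task("matvec", MatVec::matVec, matrix, vector, out, rows, cols)
                .transferToHost(DataTransferMode.EVERY_EXECUTION, out);

        ImmutableTaskGraph itg = graph.snapshot();
        TornadoExecutionPlan plan = new TornadoExecutionPlan(itg);
        plan.execute(); // JIT-compiles the kernel and runs it on the default accelerator
    }
}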

Finally, the session will feature a live demo of GPULlama3.java running on any JDK and on hardware ranging from Apple Silicon to high-end NVIDIA GPUs, showcasing GPU-accelerated LLM inference integrated with Quarkus and agentic LangChain4j workflows. It will also demonstrate how to leverage TornadoVM’s profiling tools for GPU performance analysis to complement standard JVM tooling.
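On the profiling side, the sketch below shows one way the TornadoVM profiler can be enabled programmatically. It assumes the execution-plan profiling API of recent TornadoVM releases (ProfilerMode, withProfiler, getProfilerResult); exact class and method names may vary by version, and the profiler can alternatively be switched on from the tornado launcher.

import uk.ac.manchester.tornado.api.ImmutableTaskGraph;
import uk.ac.manchester.tornado.api.TornadoExecutionPlan;
import uk.ac.manchester.tornado.api.TornadoExecutionResult;
import uk.ac.manchester.tornado.api.enums.ProfilerMode;

class ProfiledRun {
    // Runs a previously built task graph with the profiler enabled and prints
    // device-side timings (nanoseconds). API names here are assumptions based on
    // the TornadoVM 1.x execution-plan API and may differ between releases.
    static void runWithProfiler(ImmutableTaskGraph graph) {
        TornadoExecutionPlan plan = new TornadoExecutionPlan(graph);
        TornadoExecutionResult result = plan.withProfiler(ProfilerMode.SILENT).execute();
        System.out.println("Kernel time (ns):  " + result.getProfilerResult().getDeviceKernelTime());
        System.out.println("Host->device (ns): " + result.getProfilerResult().getDeviceWriteTime());
        System.out.println("Device->host (ns): " + result.getProfilerResult().getDeviceReadTime());
    }
}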
gpu
llm
tornadovm
java
Speakers
Michalis Papadimitriou

University of Manchester/TornadoVM

United Kingdom

Michalis Papadimitriou is a Research Fellow at the University of Manchester and a Staff Software Engineer on the TornadoVM team. His core expertise includes open-source software development, hardware abstractions for high-level programming languages, compiler optimizations for GPU computing, and enabling large language model (LLM) inference on GPUs for the Java Virtual Machine (JVM).
Michalis is focused on advancing GPU acceleration for machine learning workloads on the JVM through the TornadoVM framework and actively maintains the GPULlama3.java project.
Before joining the University of Manchester, he worked on a range of software stacks at Huawei Technologies and, while at OctoAI (formerly OctoML, later acquired by NVIDIA), contributed to the open-source machine learning compiler Apache TVM.
