Data & (Gen)AI
Conference · 50min
INTERMEDIATE

Practical LLM Inference in Modern Java

This session explores practical methods for implementing local Large Language Model (LLM) inference in Java environments. It demonstrates how to use the latest Java features for local inference of open-source LLMs, optimize CPU performance, build a flexible LLM framework, and integrate with LangChain4j for streamlined execution. Attendees will learn how to optimize performance with the Java Vector API and GraalVM, and how to create efficient AI applications with modern Java.

Alina Yurenko
Oracle Labs
Alfonso² Peterssen
Oracle Labs

When and Where

Thursday, October 10, 13:50-14:40
Room 6
Description
Large Language Models (LLMs) have become essential in many applications, but integrating them effectively into Java environments can still be challenging. This session will explore practical approaches to implementing local LLM inference using modern Java.

We'll demonstrate how to leverage the latest Java features to implement local inference for a variety of open-source LLMs, starting with Llama 2 and 3 (Meta). Importantly, we'll show how the same approach can easily be extended to run other popular open-source models on standard CPUs without the need for specialized hardware.

Key topics we'll cover:
- Implementing efficient LLM inference engines in modern Java for local execution
- Utilizing Java 21+ features for optimized CPU-based performance
- Creating a flexible framework adaptable to multiple LLM architectures
- Maximizing standard CPU utilization for inference without GPU dependencies
- Integrating with LangChain4j for streamlined local inference execution
- Optimizing performance with the Java Vector API for accelerated matrix operations, and leveraging GraalVM to reduce latency and memory consumption

Join us to learn about implementing and optimizing local LLM inference for open-source models in your Java projects, and about creating fast and efficient AI applications using the latest Java technologies.
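To give a flavor of the Vector API topic above, here is a minimal sketch (an illustration, not code from the session itself) of a vectorized dot product, which is the core kernel behind the matrix-vector multiplications in LLM inference. It assumes JDK 21+ and needs the incubator module enabled with --add-modules jdk.incubator.vector at both compile and run time:

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

public class DotProduct {
    // The widest vector shape supported by the current CPU (e.g. 256-bit on AVX2).
    static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    // Vectorized dot product: processes SPECIES.length() floats per iteration,
    // then handles the remaining elements with a scalar tail loop.
    static float dot(float[] a, float[] b) {
        float sum = 0f;
        int i = 0;
        int upper = SPECIES.loopBound(a.length);
        for (; i < upper; i += SPECIES.length()) {
            FloatVector va = FloatVector.fromArray(SPECIES, a, i);
            FloatVector vb = FloatVector.fromArray(SPECIES, b, i);
            sum += va.mul(vb).reduceLanes(VectorOperators.ADD);
        }
        for (; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f, 4f, 5f};
        float[] b = {2f, 2f, 2f, 2f, 2f};
        System.out.println(dot(a, b));
    }
}
```

Because SPECIES_PREFERRED picks the widest shape the hardware supports, the same source compiles down to AVX2, AVX-512, or NEON instructions without any per-platform changes, which is what makes CPU-only inference in pure Java practical.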
LLMs
Performance
Java
Inference
Speakers
Alina Yurenko

Oracle Labs

Switzerland

Alina is a developer advocate for GraalVM at Oracle Labs, a research and development organization at Oracle. She loves both programming languages and natural languages, as well as compilers and open source.
Alfonso² Peterssen

Oracle Labs

Switzerland

Industry researcher, member of the GraalVM team, working on Espresso: a meta-circular Java bytecode interpreter.