Data & (Gen)AI
Hands-On Lab (2h), 120 min
INTERMEDIATE

Putting AI Into Real-time ETL with Apache Flink, Debezium, and LangChain4j

This hands-on lab explores real-time ETL using Apache Flink, Debezium, and LangChain4j. Participants will set up real-time data pipelines, stream data from an operational database to an analytics data store, and enable use cases such as full-text search and live dashboarding. They'll learn to build data pipelines, use Flink's connector capabilities, implement data transformations, and integrate a large language model for sentiment analysis. Prerequisites include a laptop with Java 11, Apache Maven, and Docker installed.

Gunnar Morling, Decodable
Hans-Peter Grahsl, Decodable

When and where

Tuesday, October 8, 16:50-18:50
BOF 2
Description
As the saying goes: nothing is older than yesterday’s news, uhm, data. Join us for an immersive hands-on lab to explore real-time ETL using the triumphant trio of Apache Flink, Debezium, and LangChain4j.

Participants will gain practical experience in setting up different end-to-end real-time data pipelines that stream data from an operational database to an analytics data store continuously, efficiently, and with very low latency. These pipelines enable use cases such as full-text search and live dashboarding, enriched with LLM-derived metadata.

In the lab, you will learn how to:
- Build a real-time data pipeline from Postgres to OpenSearch, based on Apache Flink and Debezium for change data capture (CDC)
- Use Flink's connector capabilities to set up seamless real-time ETL pipelines between various data sources and sinks
- Implement data transformations, filtering, and aggregations on top of CDC streams in real time with the help of streaming SQL
- Integrate a large language model (LLM) for sentiment analysis based on LangChain4j, enabling deeper insights into the processed data

Join this lab to advance your skills in working with real-time data and learn how robust, leading open-source technologies support your business-critical stream processing workloads.

Please pull the following Docker images onto your laptop beforehand. This will save some time and network bandwidth on the day of the event:

docker image pull quay.io/debezium/example-postgres:2.7.3.Final
docker image pull quay.io/debezium/tooling:latest
docker image pull docker.io/opensearchproject/opensearch:1.3.19
docker image pull docker.io/flink:1.19.1-scala_2.12-java17
docker image pull docker.io/hpgrahsl/hol-devoxxbe-model-serving-app:1.0.0
docker image pull docker.io/hpgrahsl/hol-devoxxbe-review-app:1.0.1
docker image pull docker.io/hpgrahsl/data-generator:1.1.4
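For a rough idea of what the change data capture part of such a pipeline deals with, here is a minimal sketch in plain Java. It is not the lab's code and uses neither Flink nor Debezium. It only mimics the shape of a Debezium change event (an `op` code plus `before` and `after` row images) and applies a few events to an in-memory map standing in for the analytics store. The class `CdcApplySketch`, the nested `ChangeEvent` type, the `review` column, and the sample rows are all hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: applies Debezium-style change events to an in-memory "sink". */
public class CdcApplySketch {

    /** Simplified stand-in for a Debezium change event envelope (illustrative, not the real class). */
    public static final class ChangeEvent {
        final String op;                   // "c" = create, "u" = update, "d" = delete, "r" = snapshot read
        final Map<String, Object> before;  // row image before the change (null for creates)
        final Map<String, Object> after;   // row image after the change (null for deletes)

        public ChangeEvent(String op, Map<String, Object> before, Map<String, Object> after) {
            this.op = op;
            this.before = before;
            this.after = after;
        }
    }

    /** Upserts or deletes one row in the sink, keyed by the given primary key field. */
    public static void apply(Map<Object, Map<String, Object>> sink, ChangeEvent event, String keyField) {
        switch (event.op) {
            case "c":
            case "r":
            case "u":
                // Creates, snapshot reads, and updates all become an upsert of the new row image.
                sink.put(event.after.get(keyField), event.after);
                break;
            case "d":
                // Deletes carry the key in the "before" image.
                sink.remove(event.before.get(keyField));
                break;
            default:
                // Other operation types are ignored in this sketch.
                break;
        }
    }

    public static void main(String[] args) {
        Map<Object, Map<String, Object>> sink = new HashMap<>();
        apply(sink, new ChangeEvent("c", null, Map.of("id", 1, "review", "Great talk")), "id");
        apply(sink, new ChangeEvent("u", Map.of("id", 1, "review", "Great talk"),
                Map.of("id", 1, "review", "Great lab")), "id");
        apply(sink, new ChangeEvent("c", null, Map.of("id", 2, "review", "Too short")), "id");
        apply(sink, new ChangeEvent("d", Map.of("id", 2, "review", "Too short"), null), "id");
        // Only row 1 remains, holding its updated review text.
        System.out.println(sink);
    }
}
```

In a real pipeline this upsert/delete logic would be handled by the stream processor and its sink connector. The sketch only shows why a stream of change events is enough to keep a downstream store in sync with the source table.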
Apache Flink
Real-time ETL
Data Pipelines
LangChain4j
Speakers
Gunnar Morling

Decodable

Germany

Gunnar Morling is a software engineer and open-source enthusiast at heart, currently working at Decodable on real-time ETL based on Apache Flink. In his prior role as a software engineer at Red Hat, he led the Debezium project, a distributed platform for change data capture. He is a Java Champion and has founded multiple open-source projects such as JfrUnit, kcctl, and MapStruct. Gunnar is an avid blogger (morling.dev) and has spoken at various conferences like QCon, JavaOne, and Devoxx. He lives in Hamburg, Germany.
Hans-Peter Grahsl

Decodable

Austria

Hans-Peter Grahsl is a Staff Developer Advocate at Decodable. He is an open-source community enthusiast who is particularly passionate about event-driven architectures, distributed stream processing systems, and data engineering. Hans-Peter has received multiple community awards for his code contributions, conference talks, and blog posts at the intersection of the Apache Kafka and MongoDB communities. He likes to code and is a regular speaker at developer conferences around the world.