Flink Forward conference happened last week in Barcelona, Spain. It’s the main Apache Flink event organized by Ververica, which includes many practitioners from companies like Netflix, Apple, Shopify, LinkedIn, etc.
Ververica Announcements
Ververica has made several announcements during the keynote:
Apache Fluss is now available in the Ververica Platform. This makes a lot of sense given the significant investment in the project.
VERA-X: A native vectorized Apache Flink engine. I share some thoughts below.
“Real-Time AI With Rag And LLM Support”: It sounds fancy, but in practice, it typically means having a few UDFs that call OpenAI APIs, which is not that exciting. Also, RAG is so last year! 🙂
The rest of the keynote was a bit underwhelming, but I did appreciate Ben Gamble’s whirlwind demo of the latest features, like the Delta join.
“AI co-host” was very cringe. Please don’t do it.
VERA-X
Here’s my reaction:
As I said, VERA-X is not a new product; Alibaba has been working on it for years (under the name Flash).
After studying the implementation details, I was surprised to realize how similar it is to Iron Vector. E.g., having specialized Row-to-Column and Column-to-Row operations, columnar UDF support, a memory manager, etc.
I’m really excited about this direction. People have tried to replace Flink with new stream-processing engines, but they haven’t succeeded. I think Iron Vector or VERA-X is a healthier approach to getting a significant runtime upgrade.
The Atmosphere on the Ground
The event felt on the smaller side. Only a handful of sponsors and no real “expo hall”. When talks started, the hallways immediately emptied, which to me indicates the large number of engineers in attendance. And whenever I talked to someone in the crowd, they almost always ended up being an engineer or a manager closely involved in using or building on top of Flink.
I also heard many questions like “Why do you think data streaming is not getting enough adoption?”. I have some thoughts, so stay tuned. In the meantime, I’d be curious to compare this event to Current next week.
Interesting Talks
The *Big State* Monster: Taming State Size in Multi-Way Joins with FLIP-516
Great overview of a problem with large joins: they can grow in complexity and state size very quickly. For example, multi-step joins are usually translated into a chain of binary joins, which explodes the state.
The new multi-way join operator can alleviate this problem (and multi-way joins can also be chained to support complex topologies).
This reminds me of the work we did at Shopify to support joins of 10+ streams.
Apache Fluss and the Seven Deadly Sins of Streaming Analytics
Great talk. The reasoning behind Fluss is solid - I’m sold! The adoption is lacking, and the fact that Fluss doesn’t support the Kafka protocol (and may never will!) is not helping. Still, I hope Fluss will get more recognition.
Petabytes, Pipelines & PyFlink: How We Stream, Enrich & Validate Billions of Events
Good talk showing how to build data pipelines with PyFlink. Mentions advanced concepts like data enrichment, DLQs, etc.
Redefining Flink Reliability — Blue/Green Deployments in Production
Blue/Green deployments were one of the key features absent from the Flink Kubernetes Operator. However, basic support for Blue/Green has been added in recent versions. This is a good talk covering the reasons why you may want to use it, along with lots of gotchas and implementation details. Make sure to listen to the Q&A after the talk.
Democratizing Flink SQL at Shopify: Scaling Streaming for Every Developer
In my opinion, my ex-coworker Ryan delivered the best talk of the conference.
Shopify has been busy building an impressive developer experience for Flink. A custom VS Code extension is at the center of it.
It offers:
Flink Notebook experience right in your IDE
Flink Catalog integration
Access to local and remote Flink session clusters
CLI tool for creating UDFs
It seems to be inspired by this project from eBay. Shopify is planning to open-source its extension as well.
The talk also covered a brief history of Flink at Shopify, Stream/Batch unification efforts and their K8S setup for Flink.
Flink SQL 2025: Powering Real-Time AI and Stream Processing Innovations
A good overview of the latest additions to Flink SQL: things like CREATE MODEL and ML_PREDICT for working with LLMs, VARIANT type for making it easier to work with JSON, Delta and Multi-way Joins.
Powering Stateful Joins at Scale with Flink SQL at LinkedIn
This talk describes some of the internals of the managed streaming SQL platform at LinkedIn, focusing on practical challenges when running a large stateful Flink pipeline.
A few interesting observations:
LinkedIn still uses Flink 1.16 (released exactly 3 years ago!)
They switched from SATA SSDs to NVMe SSDs to boost IO performance (duh). Whenever someone asks me how to speed up a stateful Flink pipeline, my top answer is always: use the fastest SSDs you can get.
Dynamic, Scalable, and Schema-Evolving: Introducing the Flink Dynamic Iceberg Sink
A must-watch if you use Apache Iceberg sink with Flink.
There were many more great talks I didn’t have a chance to attend. The recordings will be available soon, and I highly recommend checking them out.
Irontools
Iron Vector is a native, columnar, vectorized, high-performance accelerator for Apache Flink SQL and Table API pipelines.
It’s easy to install, requires no code changes, and can increase compute efficiency by up to 2x (as of now).
Check the announcement here.
Events
Find me at Current, New Orleans 🇺🇸 next week (October 29th - October 30th).