Current London 2025 took place last week, and I was fortunate enough to attend. I’d love to share some notes with you.
Key Themes
Of course, AI was everywhere: keynote, talks, and hallway chats. AI for coding, AI agents, AI for writing Flink jobs (!). It seems inevitable for any tech conference nowadays, regardless of its focus. To me, this is both exciting and scary; I’ll expand on this another day.
Iceberg, Tableflow, Delta Lake, and similar tech are still extremely relevant, but people seem better informed nowadays: the current version of Iceberg is not ideal for streaming, and it’s outright terrible for changelog data streams.
Finally, I feel like the data streaming industry is still in a tough spot. Growth is slow, and sales cycles are long. One person I spoke with said, “80% of the companies in the Expo hall will be dead in two years.” I don’t want to believe them, but it might be true.
Keynote
The keynote didn’t have any big, jaw-dropping announcements (I hope to see some at the next Current in New Orleans).
AI was obviously mentioned several times. Confluent’s messaging has stayed consistent over the years: AI needs data, specifically real-time data. But this year they introduced the idea of Flink jobs as AI agents! Coincidentally, at the same time, FLIP-531: Initiate Flink Agents as a new Sub-Project was introduced, and two of its three authors are Confluent employees. This tells me that Confluent is serious about getting open-source adoption for the agents idea quickly.
One of the most interesting announcements for me was snapshot queries. This is how I understood them:
Confluent Flink SQL queries can operate in a “snapshot” / batch way: they don’t run continuously, but stop after getting initial (?) results.
If Tableflow is enabled for a source topic, the query leverages the underlying Iceberg storage first, and then switches to Kafka, if needed (?).
This makes a lot of sense! I’ve been sharing this pattern for a while: Flink’s HybridSource makes it really straightforward to implement. It’s nice to see this as a fully managed product feature.
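To make the pattern concrete, here’s a minimal, language-agnostic sketch of the hybrid read (plain Python, not actual Flink `HybridSource` code; all names and the offset convention are illustrative): drain the bounded snapshot first, then switch to the live stream starting at the first offset the snapshot doesn’t cover.

```python
def hybrid_read(snapshot, stream, boundary_offset):
    """Read historical records first, then switch to the live stream.

    `snapshot` holds records for offsets below `boundary_offset`
    (e.g. an Iceberg table produced by Tableflow); `stream` is a
    sequence of (offset, record) pairs (e.g. a Kafka topic tail),
    which may overlap with the snapshot.
    """
    # Phase 1: bounded read from the snapshot.
    for record in snapshot:
        yield record
    # Phase 2: unbounded read from the stream, skipping anything
    # already covered by the snapshot.
    for offset, record in stream:
        if offset >= boundary_offset:
            yield record

snapshot = ["a", "b", "c"]                     # offsets 0..2, from the table
stream = [(2, "c"), (3, "d"), (4, "e")]        # topic tail, overlaps at offset 2
print(list(hybrid_read(snapshot, stream, 3)))  # → ['a', 'b', 'c', 'd', 'e']
```

The key design point is the boundary offset: the snapshot and the stream must agree on where one ends and the other begins, otherwise you get duplicates or gaps at the switchover.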
I also captured several great quotes during the keynote:
“Apache Flink is the key to making shift left practical” (Shaun Clowes).
“With Flink, what was possible in the analytical estate, now is possible in the operational estate” (Shaun Clowes).
“We don’t need ETL, we don’t need ELT”, in the context of Tableflow (Shaun Clowes).
“With Tableflow, your streams are tables” (Ahmed Saef Zamzam).
“Companies are becoming software” (Jay Kreps).
Talks
Here are a few solid talks I had a chance to attend:
Flink Jobs as Agents 🤖 – Unlocking Agentic AI with Stream Processing. As I mentioned above, AI was everywhere. This talk showed a way to leverage Flink for building agents: using various sources and transformations to build context, performing actions using that context, and then consuming events emitted by those actions, which can in turn update the context. Steffen also mentioned using AI for creating Flink SQL pipelines (!) and Complex Event Processing (CEP).
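The feedback loop described above (events build context, the context drives actions, and the actions emit new events) can be sketched roughly like this. This is my own toy illustration of the loop, not code from the talk; `decide` and `act` are stand-ins for whatever model call and side effect a real agent would perform:

```python
def run_agent(events, decide, act, max_steps=100):
    """Event-driven agent loop: events -> context -> action -> new events."""
    context = []
    queue = list(events)
    for _ in range(max_steps):          # cap iterations so feedback can't loop forever
        if not queue:
            break
        event = queue.pop(0)
        context.append(event)           # transformations accumulate the context
        action = decide(context)        # e.g. an LLM call in a real agent
        if action is not None:
            queue.extend(act(action))   # actions emit events that feed back in
    return context

# Toy example: react once to a "signal" event by emitting a follow-up event.
decide = lambda ctx: "follow_up" if ctx[-1] == "signal" else None
act = lambda action: ["handled"]
print(run_agent(["signal"], decide, act))  # → ['signal', 'handled']
```

In a streaming setting, the queue would be a Kafka topic and the loop a Flink job, which is what makes the “Flink jobs as agents” framing natural.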
Building Stream Processing Platform at OpenAI. Great coverage of OpenAI’s data platform and their challenges with PyFlink. OpenAI introduced proxies for consumers and producers, hiding the cluster details behind them [1]. This has a number of benefits, like straightforward HA, better scaling, etc. OpenAI heavily uses PyFlink; they shared their concerns about PyFlink’s efficiency and some missing features.
FlinkSQL Powered Asynchronous Data Processing in Pinterest’s Rule Engine Platform. Pinterest shared some insights about their rule engine, which helps fight spam (one of many use cases). It was interesting to see backfilling mentioned as a first-class citizen: this is a very mature and pragmatic decision, and I’d like to see more data streaming projects acknowledge it.
Unified CDC Ingestion and Processing with Apache Flink and Iceberg. How far can you go to engineer a feature that’s not supported out of the box? Medidata Solutions and Decodable gave a masterclass 🙂. They had to build a sophisticated system to handle changelog data streams in Flink’s Iceberg integration, which is not supported out of the box [2].
Simplifying Real-Time Vector Store Ingestion with Apache Flink. This was another masterclass, specifically on writing SQL UDFs. I loved the detailed code snippets shared by Hans-Peter: we need more hands-on talks like this one.
Democratising Stream Processing: How Netflix Empowers Teams with Data Mesh and Streaming SQL. Netflix has been working on their internal data streaming platform for a while. It currently handles 14 trillion records a day (~160M records/s on average), which is very impressive! Sujay talked about their latest initiative, called “Data Mesh” [3], which offers a high-level interface for defining streaming sources, sinks, and processors (using Flink SQL). I liked how rigorously they rely on schemas to prevent breaking changes. Data Mesh also has some really neat features: Iceberg lookup joins, query preview, revision history, and autoscaling. Unfortunately, they still haven’t figured out updates for stateful Flink SQL jobs.
Flink SQL Revolutions: Breaking Out of the Matrix with PTFs. This talk was similar to Timo’s previous talks about PTFs (process table functions), but it had many neat examples! So, if you’re interested in learning about PTFs, I recommend this one. By the end, I was convinced that once PTFs are mature and polished, the need to use the DataStream API will be significantly reduced.
There were so many great talks I haven’t had a chance to watch! I’m eagerly waiting for the recordings of these:
From Zero to Hero: petabyte-scale Tiered Storage lessons.
Building Stream Processing Platform at OpenAI.
Tableflow: Not Just Another Kafka-to-Iceberg Connector.
Queues for Kafka.
… and many more!
Kafka Summit → Current
This was the first Current London. In previous years, London hosted Kafka Summits instead.
When Confluent announced Current as the next generation of Kafka Summit, they talked about “a place for everyone in the ecosystem to come together and share their knowledge and best practices”. Hilariously, the reality was quite the opposite.
Redpanda was banned from participating in the event.
I don’t know the details, but I got a confirmation from several Redpanda employees.
I guess this could’ve been one of the reasons to rebrand the community Kafka Summit conference, an event about the open-source Apache Kafka technology, into a vendor-specific event [4].
Also, AWS didn’t have a booth at the conference despite being a gold sponsor (most of the other gold sponsors had a presence). I can’t say why. AWS competes with Confluent through its managed Kafka and Flink offerings.
All of this suggests that the competition in the data streaming space is only intensifying 🍿.
Personal Announcement
I’ve recently launched Irontools: a suite of Apache Flink extensions to make your streaming pipelines faster, leaner, and more flexible.
[1] This is a well-known pattern. I believe Netflix started using it as early as 2017.
[2] IMO, implementing this as a feature in Iceberg could’ve probably been easier…
[3] Which has nothing to do with this Data Mesh.
[4] Yes, Confluent organized many Kafka Summits in the past. However, my point is that the spirit of those events was always community-first.