Discussion about this post

Tom Scott:

Great piece, and thanks for the Streambased shout-out. It's true that we follow the same principles as Iceberg Topics, but we invert the flow: instead of serving Kafka clients from Iceberg data, as Iceberg Topics does, we serve Iceberg from Kafka topics (Iceberg processors read data directly from Kafka).

The nice thing about this approach is that it does not depend on tiered storage, so it works with any Kafka-compatible service. What's more, we don't have to wait for data to be tiered before it becomes available as Iceberg: everything from the beginning of the topic to the latest offset is accessible.

Kaiming Wan:

> ... but this seems to be the first free and open-source implementation

This is not accurate. AutoMQ's Table Topic capability was officially released in 2024 and open-sourced on May 19th. You can find the release information here: https://github.com/AutoMQ/automq/releases/tag/1.5.0

The difference from Aiven's solution is that AutoMQ adopts a copy-based strategy, mainly for the following reasons.

1. Materializing topic data into Iceberg already requires writing logs, converting them to table format, and persisting the result. Since the storage and network costs are incurred either way, keeping the logs adds little extra cost.

2. Table Topic lets users set shorter log TTLs, balancing cold-read performance against cost, a flexibility that users value.

3. Iceberg is a table, while topics are append-only. Can a "zero copy" approach handle primary-key (PK) tables, e.g., CDC streams? When that data lands in Iceberg, the inserts, updates, and deletes must actually be applied.

4. And if "zero copy" does support PK tables, does Kafka lose its status as the source of truth? Can the original stream still be reconstructed from the table?
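To make points 3 and 4 concrete, here is a minimal sketch (not AutoMQ's or Aiven's actual code; the function and event names are hypothetical) of folding an append-only CDC stream into the keyed table state that an Iceberg PK table must hold. The fold is lossy, which is the crux of point 4: the resulting table state alone cannot reproduce the original event stream.

```python
# Hypothetical illustration: reducing an append-only CDC stream to keyed
# table state. apply_cdc and the event tuples are illustrative names only.

def apply_cdc(events):
    """Fold (op, key, value) CDC events into current state keyed by PK."""
    table = {}
    for op, key, value in events:
        if op in ("insert", "update"):
            table[key] = value        # upsert: last write per key wins
        elif op == "delete":
            table.pop(key, None)      # tombstone removes the row
    return table

events = [
    ("insert", 1, {"name": "alice"}),
    ("insert", 2, {"name": "bob"}),
    ("update", 1, {"name": "alicia"}),
    ("delete", 2, None),
]

state = apply_cdc(events)
# state == {1: {"name": "alicia"}}
# Lossy: from `state` alone you cannot recover the four-event history,
# which is why keeping the Kafka log preserves the source of truth.
```

Intermediate updates and the deleted key are gone from the final state, so only the retained log can replay the stream.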

We also have a blog post explaining how we implemented this feature: https://www.automq.com/blog/automq-kafka-to-iceberg-table-topic
