The KFC Stack: When Kafka, Flink and ClickHouse Actually Earn Their Keep
The KFC stack - Kafka, Flink and ClickHouse - has become the default blueprint for real-time analytics. It is on architecture diagrams at every scale-up, in every vendor deck and in every “modern data platform” conference talk. It is a genuinely excellent architecture for the problems it was designed to solve.
It is also massively over-adopted. Most teams running all three components could delete Flink entirely and lose nothing. Some could delete Kafka too. The stack has become a cargo cult choice - adopted because it is what the big logos run, not because the workload actually demands it.
This post is the honest breakdown. What each piece does, when the full stack earns its keep and when a much simpler version of it will do the same job for a tenth of the operational cost.
What Each Component Actually Does
Kafka is a durable, ordered log. Producers write events. Consumers read them at their own pace, independently of each other, with configurable retention. That is it. Everything else you have heard about Kafka is built on that primitive. It is the event backbone because nothing else in open source does durable ordered fan-out at this scale with the same operational maturity.
Flink is a stateful stream processor. It sits between raw events and analytical storage and does the work that cannot be expressed as a simple transform - windowed joins across multiple streams, exactly-once semantics for financial or fraud workloads, complex event patterns, sessionization with custom logic. Flink is the piece that does real computation on moving data.
ClickHouse is a columnar analytical database. It is built to answer aggregate queries across billions of rows in sub-second time. Dashboards, user-facing analytics, ad hoc exploration over huge time-series datasets. If you have ever watched Postgres crawl on a GROUP BY over a 10 billion row table, ClickHouse is what the other side of that problem looks like.
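To make that concrete, here is the shape of query ClickHouse is built for - a sketch against a hypothetical `events` table with made-up columns, not a benchmark claim:

```sql
-- Hypothetical billions-of-rows events table on a MergeTree engine.
-- Columnar storage means only event_time and user_id get scanned,
-- which is why an aggregate like this stays interactive at scale.
SELECT
    toStartOfHour(event_time) AS hour,
    count() AS events,
    uniq(user_id) AS unique_users
FROM events
WHERE event_time >= now() - INTERVAL 7 DAY
GROUP BY hour
ORDER BY hour;
```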
Put together you get a pipeline that ingests high-velocity events, does stateful processing on them in flight and lands the results somewhere that can answer analytical queries at interactive speed. When you actually need all three capabilities, it is a beautiful architecture.
When The Full Stack Earns Its Keep
The full KFC stack is the right call when you have all three of these things at once:
High-velocity event streams that multiple systems consume. You are ingesting millions of events per second, and more than one downstream system needs the same stream. Without Kafka you end up with point-to-point spaghetti.
Stateful stream computation you cannot express as a query over stored data. Windowed joins across streams with different arrival rates, exactly-once processing where duplicates would cause real money to move twice, complex event pattern matching. This is Flink’s actual job.
Interactive analytical queries on the processed output. You need sub-second aggregate queries on billions of rows. A dashboard that has to load in 400ms. A user-facing analytics product where customers filter and slice huge datasets live.
Real examples where the full stack earns it: fraud detection pipelines that join card transactions, device fingerprints and user history in a ten-second window and need exactly-once guarantees. Ad attribution that joins impressions and conversions with 30-day windows across petabyte-scale streams. Financial market data platforms doing online feature computation for trading models. IoT telemetry at the scale where you are ingesting from millions of devices and doing real-time anomaly detection.
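To make the fraud case concrete, here is a hedged Flink SQL sketch of that ten-second join. The topic names, fields and JSON format are assumptions for illustration, not a reference implementation:

```sql
-- Two Kafka-backed streams with event-time watermarks (hypothetical topics).
CREATE TABLE transactions (
    txn_id   STRING,
    card_id  STRING,
    amount   DECIMAL(18, 2),
    txn_time TIMESTAMP(3),
    WATERMARK FOR txn_time AS txn_time - INTERVAL '5' SECOND
) WITH (
    'connector' = 'kafka',
    'topic' = 'transactions',
    'properties.bootstrap.servers' = 'kafka:9092',
    'scan.startup.mode' = 'earliest-offset',
    'format' = 'json'
);

CREATE TABLE device_events (
    card_id     STRING,
    fingerprint STRING,
    event_time  TIMESTAMP(3),
    WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND
) WITH (
    'connector' = 'kafka',
    'topic' = 'device-events',
    'properties.bootstrap.servers' = 'kafka:9092',
    'scan.startup.mode' = 'earliest-offset',
    'format' = 'json'
);

-- Interval join: match each transaction to device activity from the
-- preceding ten seconds. Flink keeps both streams' state in between.
SELECT t.txn_id, t.amount, d.fingerprint
FROM transactions t
JOIN device_events d
  ON t.card_id = d.card_id
 AND d.event_time BETWEEN t.txn_time - INTERVAL '10' SECOND AND t.txn_time;
```

The state Flink holds for that join - ten seconds of both streams, keyed by card - is exactly what a materialized view over a single table cannot give you.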
If that is your workload, KFC is not over-engineering. It is the right tool.
When You Should Not Be Running Flink
Most teams do not have that workload. They have a moderate event stream, some transformations and a need for fast analytical queries. Flink is doing nothing in that pipeline that could not be done by ClickHouse itself.
ClickHouse has a Kafka Table Engine. You point it at your Kafka topic and it consumes directly. A materialized view does the transformation and writes into your actual table. No Flink job. No checkpoints to tune. No state backend to operate. For straightforward stateless transforms, filtering, enrichment joins against a reference table and time-based aggregations, the Kafka Table Engine plus materialized views covers 80% of what teams think they need Flink for.
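A minimal sketch of that pipeline, assuming a JSON topic called `events` and a handful of made-up columns:

```sql
-- Kafka table engine: ClickHouse consumes the topic directly.
CREATE TABLE events_queue
(
    event_time DateTime,
    user_id    UInt64,
    event_type String
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka:9092',
         kafka_topic_list  = 'events',
         kafka_group_name  = 'clickhouse_events',
         kafka_format      = 'JSONEachRow';

-- Real storage: the MergeTree table your dashboards query.
CREATE TABLE events
(
    event_time DateTime,
    user_id    UInt64,
    event_type String
)
ENGINE = MergeTree
ORDER BY (event_type, event_time);

-- The "Flink job": a materialized view that filters and writes through.
CREATE MATERIALIZED VIEW events_mv TO events AS
SELECT event_time, user_id, event_type
FROM events_queue
WHERE event_type != 'heartbeat';
```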
The tell that you do not need Flink: your stream processing logic is a SQL query with a GROUP BY and maybe a join against a small dimension table. If your Flink job code is effectively a SQL statement, ClickHouse can do it natively without adding an entire stateful processing framework to your operational footprint.
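If the job really is just a GROUP BY, the whole thing collapses into one more materialized view. A sketch, reusing the hypothetical `events_queue` table from above:

```sql
-- Pre-aggregated per-minute counts; SummingMergeTree merges partial sums.
CREATE TABLE events_per_minute
(
    minute     DateTime,
    event_type String,
    events     UInt64
)
ENGINE = SummingMergeTree
ORDER BY (event_type, minute);

CREATE MATERIALIZED VIEW events_per_minute_mv TO events_per_minute AS
SELECT
    toStartOfMinute(event_time) AS minute,
    event_type,
    count() AS events
FROM events_queue
GROUP BY minute, event_type;

-- Background merges are async, so sum at query time rather than
-- assuming each (minute, event_type) pair is already one row.
SELECT minute, event_type, sum(events) AS events
FROM events_per_minute
GROUP BY minute, event_type;
```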
You might not need Kafka either. If you have a single producer and a single consumer, and your retention requirement is a few hours of replay, a simpler queue or even direct HTTP ingestion into ClickHouse might be the right answer. Kafka earns its keep when multiple systems consume the same stream or when the event log itself is valuable for replay and reprocessing.
The Operational Cost Nobody Mentions In The Architecture Diagram
Architecture diagrams make KFC look clean. Three boxes with arrows. The operational reality is not three boxes.
Kafka means a ZooKeeper or KRaft quorum, broker tuning, partition strategy, consumer group management, schema registry, monitoring for rebalance storms and a story for what happens when a broker loses its disk. This is a real platform team’s worth of work if you run it yourself. Confluent Cloud or MSK removes some of the operational burden but adds cost and vendor lock-in.
Flink is the expensive one. Checkpoint tuning is its own subdiscipline. State backends (RocksDB, filesystem) need capacity planning. Job upgrades with state migration are genuinely hard. Backpressure debugging requires deep understanding of the runtime. Savepoints, watermarks, late events, exactly-once semantics across sources and sinks - every one of those is a paragraph in an on-call runbook.
ClickHouse is the easiest of the three to operate but still has real gotchas. MergeTree tuning, replication setup, the fact that mutations are expensive and asynchronous, the peculiar behavior of JOIN compared to a traditional OLTP database. ClickHouse Cloud helps. Running it yourself is not hard but is not free either.
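The mutation gotcha in one line - an UPDATE in ClickHouse is an asynchronous rewrite of whole data parts, not an in-place row edit (table and values are hypothetical):

```sql
-- Returns immediately, but rewrites every data part containing matching
-- rows in the background; track progress in system.mutations.
ALTER TABLE events UPDATE event_type = 'redacted' WHERE user_id = 42;
```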
If you adopt the full KFC stack, budget for at least one dedicated data platform engineer and probably more. If you cannot, pick a smaller stack you can actually operate.
A Saner Starting Point
Here is the rollout I would recommend to most teams building real-time analytics today:
Stage one. Kafka plus ClickHouse with the Kafka Table Engine and materialized views. No Flink. You get durable event intake, fan-out to multiple consumers and fast analytical queries. This covers a huge range of use cases with two moving pieces instead of three.
Stage two. Add Flink only when you hit a concrete requirement the ClickHouse materialized view cannot express. Stateful windowed joins across streams. Exactly-once semantics on a pipeline where duplicates cost real money. Complex event pattern detection. Do not add Flink because a blog post said you needed it.
Stage three. If you are not at Kafka-scale yet and have one producer and one consumer, start even simpler. ClickHouse can ingest directly from HTTP, from S3, from object storage with scheduled loads. Graduate to Kafka when you have the second consumer or the replay requirement.
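For stage three, the object storage path can be as small as one statement - a sketch using ClickHouse's s3 table function, with a made-up bucket and layout:

```sql
-- Scheduled load from object storage; no broker anywhere in the path.
-- Bucket, path and Parquet schema are assumptions for illustration.
INSERT INTO events
SELECT event_time, user_id, event_type
FROM s3('https://example-bucket.s3.amazonaws.com/events/*.parquet', 'Parquet');
```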
Scale up when the evidence pushes you there, not in advance.
What The Managed Services Story Looks Like
If you do adopt the full stack, the managed service landscape in 2026 is mature enough that running these yourself is mostly a choice. Confluent Cloud and AWS MSK for Kafka. Ververica Cloud, AWS Managed Flink and Decodable for Flink. ClickHouse Cloud, Altinity and Aiven for ClickHouse. The managed offerings are not free but they remove most of the operational tax.
The case for self-hosting is regulatory, cost at very large scale or deep customization. For most teams the managed path is the right call. Spend the engineering time on your actual product instead of tuning JVM flags on a Flink task manager.
Takeaway
KFC is a legitimate architecture. It solves real problems that smaller stacks cannot. It is also the default answer to “what is our real-time analytics stack” at far too many companies that do not have the workload to justify it.
Start with ClickHouse. Add Kafka when you have multiple consumers or need durable replay. Add Flink when you have stateful processing that ClickHouse genuinely cannot express. Every piece of the stack has an operational cost, and an architecture you cannot run is worse than a simpler one that ships.
FAQ
Do I need Flink if I am already running Kafka and ClickHouse?
Usually not. ClickHouse has a native Kafka Table Engine and materialized views that cover most stateless transforms, filtering and aggregation use cases. Add Flink only when you have stateful windowed joins across streams, exactly-once requirements or complex event pattern detection that cannot be expressed as a SQL materialized view.
Can ClickHouse replace Kafka?
For very simple single-producer single-consumer pipelines, yes. ClickHouse can ingest directly from HTTP, S3 or object storage. You should add Kafka when you have multiple downstream consumers of the same event stream, or when the event log itself needs to be replayable across reprocessing runs.
Is ClickHouse a good fit for operational workloads?
No. ClickHouse is built for analytical queries. It handles inserts and aggregations at huge scale but is not a replacement for Postgres or MySQL for transactional, row-level, update-heavy workloads. Use it for the analytical side of your system, not the transactional side.
What is the biggest operational gotcha in the KFC stack?
Flink state management. Checkpoint tuning, state backend sizing, job upgrades that require state migration and backpressure debugging are all genuinely hard. If you are adopting Flink, budget for someone on the team who can go deep on its runtime, or pay for a managed offering that abstracts most of it away.
Should I use managed services or run KFC myself?
For most teams, managed. Confluent Cloud or MSK for Kafka, Ververica or AWS Managed Flink for Flink, ClickHouse Cloud for ClickHouse. Self-hosting makes sense when you have strict regulatory requirements, very large scale where the cost curve flips, or deep customization needs that managed offerings do not support.