Kafka

The Kafka Data Pool lets you ingest real-time streaming data into Propel. It provides an easy way to power real-time dashboards, streaming analytics, and workflows with a low-latency data API on top of your Kafka topics.

Consider using Propel on top of Kafka when:

  • You need an API on top of a Kafka topic.
  • You need to power real-time analytics applications with streaming data from Kafka.
  • You need to ingest Kafka messages into ClickHouse.
  • You need to ingest from self-hosted Kafka, Confluent Cloud, AWS MSK, or Redpanda into ClickHouse.
  • You need to transform or enrich your streaming data.
  • You need to power real-time personalization and recommendation use cases.

Get started

Set up guide

Follow our step-by-step Kafka setup guide to connect your Kafka cluster to Propel.

Architecture Overview

Kafka Data Pools connect to the specified Kafka topics and ingest data in real time into Propel to power your data applications.

The architectural overview when connecting Kafka to Propel.

Features

Kafka Data Pools support the following features:

| Feature name | Supported | Notes |
| --- | --- | --- |
| Real-time ingestion | Yes | See How the Kafka Data Pool works. |
| Deduplication | Yes | See the deduplication section. |
| Batch Delete API | Yes | See Batch Delete API. |
| Batch Update API | Yes | See Batch Update API. |
| API configurable | Coming soon | |
| Terraform configurable | Coming soon | |

How does the Kafka Data Pool work?

Propel creates a Data Pool that collects messages from one or more Kafka topics. The Data Pool can be queried via SQL or the API, or transformed with Materialized Views.

Once the connection is established, Propel synchronizes all accessible messages from the Kafka topics: it starts at the earliest available offset of each topic partition and consumes messages from there. These messages are then loaded into the Data Pool.
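The initial sync can be pictured as a walk over each partition from its earliest offset. The sketch below is a simplified model of that behavior (Propel's actual consumer internals are not public; partitions are modeled as plain lists where the list index is the offset):

```python
# Simplified model of an initial sync: consume every partition from its
# earliest available offset (offset 0 here) and collect the messages.

def initial_sync(partitions):
    """partitions: dict mapping partition id -> list of messages,
    where the list index is the message offset.
    Returns (partition, offset, message) tuples in consumption order."""
    rows = []
    for partition_id in sorted(partitions):
        for offset, message in enumerate(partitions[partition_id]):
            rows.append((partition_id, offset, message))
    return rows

topic = {0: ["a", "b"], 1: ["c"]}
print(initial_sync(topic))  # [(0, 0, 'a'), (0, 1, 'b'), (1, 0, 'c')]
```

In a real consumer this corresponds to starting with no committed offsets and an "earliest" offset-reset policy, so no historical messages are skipped.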

Schemaless ingestion

The Kafka Data Pool ingests the message body into the _propel_payload column and all Kafka- and ingestion-related metadata into the other columns. This approach provides flexibility: JSON Kafka messages can be ingested without a pre-defined schema, which is particularly useful when dealing with dynamic or constantly evolving data structures.

| Column | Type | Description |
| --- | --- | --- |
| _timestamp | TIMESTAMP | The timestamp of the message. |
| _topic | STRING | The Kafka topic of the message. |
| _key | STRING | The key of the message. |
| _offset | INT64 | The offset of the message. |
| _partition | INT64 | The partition of the Kafka topic. |
| _propel_payload | JSON | The raw message payload in JSON. |
| _propel_received_at | TIMESTAMP | When the message was read by Propel. |
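As an illustration, this sketch maps a raw Kafka message onto a row with the columns listed above. The column names match the documented schema; everything else (the function name and input shape) is an assumption made for the example:

```python
import json
from datetime import datetime, timezone

# Illustrative only: build a schemaless Data Pool row from one Kafka
# message. The message body is parsed as JSON into _propel_payload;
# Kafka metadata goes into the remaining columns.

def to_data_pool_row(topic, partition, offset, key, timestamp, value_bytes):
    return {
        "_timestamp": timestamp,                     # the message timestamp
        "_topic": topic,
        "_key": key,
        "_offset": offset,
        "_partition": partition,
        "_propel_payload": json.loads(value_bytes),  # schemaless JSON body
        "_propel_received_at": datetime.now(timezone.utc),
    }

row = to_data_pool_row("orders", 0, 42, "order-1",
                       datetime(2024, 1, 1, tzinfo=timezone.utc),
                       b'{"amount": 9.99, "status": "paid"}')
print(row["_propel_payload"]["status"])  # paid
```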

Message deduplication

The Kafka Data Pool automatically deduplicates messages. Duplicates arise when messages are sent twice by the producer or read twice due to intermittent connectivity between Propel and the Kafka stream. The uniqueness of a message is determined by the combination of _topic, _partition, and _offset.
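The idea can be sketched as follows: a message's identity is the tuple (_topic, _partition, _offset), so a redelivered message collapses into the row already stored. This is only a model of the behavior, not Propel's actual implementation:

```python
# Offset-based deduplication sketch: messages with the same
# (_topic, _partition, _offset) tuple are stored only once.

def deduplicate(messages):
    seen = {}
    for msg in messages:
        key = (msg["_topic"], msg["_partition"], msg["_offset"])
        seen[key] = msg  # a redelivery overwrites the same entry
    return list(seen.values())

batch = [
    {"_topic": "orders", "_partition": 0, "_offset": 7, "body": "a"},
    {"_topic": "orders", "_partition": 0, "_offset": 7, "body": "a"},  # redelivered
    {"_topic": "orders", "_partition": 0, "_offset": 8, "body": "b"},
]
print(len(deduplicate(batch)))  # 2
```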

Supported formats

The Kafka Data Pool supports the ingestion of JSON messages that are stored in the _propel_payload column.

If you need AVRO support, please contact us.

Transforming data

Once the data has been ingested into the Kafka Data Pool, you can create Materialized Views to transform it. This includes transformations such as filtering, aggregation, and joining with other data. These transformations are defined in SQL and are updated in real time as new data arrives.

Materialized Views can be used to:

  • Separate the messages from a specific topic into their own tables, each with its own schema.
  • Handle real-time updates and deletes for mutable data.
  • Transform data in real time.
  • Enrich data by joining it with other Data Pools.

Learn more about Transforming your data with Materialized Views.
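To make the idea concrete, here is a Python sketch of the kind of filter-and-aggregate transformation a Materialized View would express in SQL (for example, SELECT _propel_payload.status, COUNT(*) ... GROUP BY status). The "orders" topic and "status" payload field are made-up names for illustration:

```python
from collections import Counter

# Model of a filter + aggregation over schemaless rows: keep one topic,
# then count rows by a field inside the JSON payload.

def orders_by_status(rows):
    return Counter(r["_propel_payload"]["status"]
                   for r in rows if r["_topic"] == "orders")

rows = [
    {"_topic": "orders", "_propel_payload": {"status": "paid"}},
    {"_topic": "orders", "_propel_payload": {"status": "paid"}},
    {"_topic": "orders", "_propel_payload": {"status": "refunded"}},
    {"_topic": "clicks", "_propel_payload": {"status": "n/a"}},  # filtered out
]
print(orders_by_status(rows))  # Counter({'paid': 2, 'refunded': 1})
```

Unlike this one-shot function, a Materialized View keeps its result continuously up to date as new messages arrive.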

API reference documentation

Below is the relevant API documentation for the Kafka Data Pool.

Queries

Mutations

Limits

No limits at this point.