Kafka
The Kafka Data Pool lets you ingest real-time streaming data into Propel. It provides an easy way to power real-time dashboards, streaming analytics, and workflows with a low-latency data API on top of your Kafka topics.
Consider using Propel on top of Kafka when:
- You need an API on top of a Kafka topic.
- You need to power real-time analytics applications with streaming data from Kafka.
- You need to ingest Kafka messages into ClickHouse.
- You need to ingest from self-hosted Kafka, Confluent Cloud, AWS MSK, or Redpanda into ClickHouse.
- You need to transform or enrich your streaming data.
- You need to power real-time personalization and recommendation use cases.
Get started
Follow our step-by-step Kafka setup guide to connect your Kafka cluster to Propel.
Architecture Overview
The Kafka Data Pool connects to the specified Kafka topics and ingests data in real time into Propel to power your data applications.
Features
Kafka Data Pools support the following features:
| Feature name | Supported | Notes |
| --- | --- | --- |
| Real-time ingestion | ✅ | See How the Kafka Data Pool works. |
| Deduplication | ✅ | See the deduplication section. |
| Batch Delete API | ✅ | See Batch Delete API. |
| Batch Update API | ✅ | See Batch Update API. |
| API configurable | ✅ | See the Management API docs. |
| Terraform configurable | ✅ | See the Propel Terraform docs. |
How does the Kafka Data Pool work?
Propel creates a Data Pool that collects messages from one or more Kafka topics. The Data Pool can be queried via SQL and the API, or transformed with Materialized Views.
Once the connection is established, Propel synchronizes all accessible messages from the Kafka topics. It starts from the earliest available offset of each topic partition and consumes messages from there. These messages are then loaded into the Data Pool.
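Once ingested, the Data Pool can be queried like any other table through Propel's SQL interface. The query below is only a minimal sketch: the Data Pool name my_kafka_data_pool is hypothetical, and the _timestamp column is one of the metadata columns described in the next section.

```sql
-- Minimal sketch: inspect the most recently ingested messages.
-- "my_kafka_data_pool" is a hypothetical Data Pool name.
SELECT *
FROM "my_kafka_data_pool"
ORDER BY _timestamp DESC
LIMIT 10;
```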
Schemaless ingestion
The Kafka Data Pool ingests the message body into the _propel_payload column and all the Kafka and ingestion-related metadata into the other columns. This approach provides flexibility, allowing JSON Kafka messages to be ingested without requiring pre-defined schemas. It is particularly useful when dealing with dynamic or constantly evolving data structures.
| Column | Type | Description |
| --- | --- | --- |
| _timestamp | TIMESTAMP | The timestamp of the message. |
| _topic | STRING | The Kafka topic. |
| _key | STRING | The key of the message. |
| _offset | INT64 | The offset of the message. |
| _partition | INT64 | The partition of the Kafka topic. |
| _propel_payload | JSON | The raw message payload in JSON. |
| _propel_received_at | TIMESTAMP | When the message was read by Propel. |
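Because the payload is stored as JSON, you can filter and aggregate on fields inside it directly. The following sketch assumes ClickHouse-style dot notation for JSON subcolumns, a hypothetical Data Pool named my_kafka_data_pool, and a payload that happens to contain a status field.

```sql
-- Hypothetical example: count messages per payload status over the last hour.
SELECT
  _propel_payload.status AS status,
  count() AS messages
FROM "my_kafka_data_pool"
WHERE _timestamp > now() - INTERVAL 1 HOUR
GROUP BY status
ORDER BY messages DESC;
```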
Message deduplication
The Kafka Data Pool automatically deduplicates messages. Duplicates can occur when messages are sent twice by the producer or read twice due to intermittent connectivity between Propel and the Kafka stream. The uniqueness of a message is determined by the combination of _topic, _partition, and _offset.
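To illustrate what that uniqueness key means in practice, the sketch below groups rows by (_topic, _partition, _offset). Because Propel deduplicates on this key automatically, it should return no rows; the Data Pool name is hypothetical.

```sql
-- Sanity-check sketch: look for repeated (_topic, _partition, _offset) keys.
-- Propel deduplicates on this key, so the query is expected to return nothing.
SELECT _topic, _partition, _offset, count() AS copies
FROM "my_kafka_data_pool"
GROUP BY _topic, _partition, _offset
HAVING count() > 1;
```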
Supported formats
The Kafka Data Pool supports the ingestion of JSON messages, which are stored in the _propel_payload column.
If you need AVRO support, please contact us.
Transforming data
Once the data has been ingested into the Kafka Data Pool, you can create Materialized Views to transform it. This includes transformations such as filtering, aggregation, and joining with other data. These transformations are defined in SQL and are applied in real time as new data arrives.
Materialized Views can be used to:
- Separate the messages from a specific topic into their own tables, each with its own schema.
- Handle real-time updates and deletes for mutable data.
- Transform data in real time.
- Enrich data by joining it with other Data Pools.
Learn more about Transforming your data with Materialized Views.
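As a rough illustration, the query below sketches the kind of SQL a Materialized View might use to split one topic's messages into its own typed shape. The Data Pool name, topic name, and payload fields are hypothetical, and the JSON access again assumes ClickHouse-style dot notation.

```sql
-- Hypothetical Materialized View query: flatten "orders" messages into typed columns.
SELECT
  _timestamp               AS event_time,
  _propel_payload.order_id AS order_id,
  _propel_payload.status   AS status,
  _propel_payload.amount   AS amount
FROM "my_kafka_data_pool"
WHERE _topic = 'orders';
```

Filtering on _topic first is what lets each topic get its own table and schema, as described in the list above.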
Management API
Below is the relevant API documentation for the Kafka Data Pool.
Queries
Mutations
- Create Data Pool
- Modify Data Pool
- Delete Data Pool by ID
- Delete Data Pool by unique name
- Create Kafka Data Source - coming soon
- Modify Kafka Data Source - coming soon
- Delete Data Source by ID
- Delete Data Source by unique name
Limits
No limits at this point.