Kafka

The Kafka Data Pool lets you ingest real-time streaming data into Propel. It provides an easy way to power real-time dashboards, streaming analytics, and workflows with a low-latency data API on top of your Kafka topics.

Consider using Propel on top of Kafka when:

  • You need an API on top of a Kafka topic.
  • You need to power real-time analytics applications with streaming data from Kafka.
  • You need to ingest Kafka messages into ClickHouse.
  • You need to ingest from self-hosted Kafka, Confluent Cloud, AWS MSK, or Redpanda into ClickHouse.
  • You need to transform or enrich your streaming data.
  • You need to power real-time personalization and recommendation use cases.

Get started

Set up guide

Follow our step-by-step Kafka setup guide to connect your Kafka cluster to Propel.

Architecture Overview

Kafka Data Pools connect to the Kafka topics you specify and ingest their data into Propel in real time to power your data applications.

Figure: Architectural overview of connecting Kafka to Propel.

Features

Kafka Data Pools support the following features:

Feature name | Supported | Notes
Real-time ingestion | ✅ | See How the Kafka Data Pool works.
Deduplication | ✅ | See the deduplication section.
Batch Delete API | ✅ | See Batch Delete API.
Batch Update API | ✅ | See Batch Update API.
API configurable | ✅ | See Management API docs.
Terraform configurable | ✅ | See Propel Terraform docs.

How does the Kafka Data Pool work?

Propel creates a Data Pool that collects messages from one or more Kafka topics. You can query the Data Pool with SQL or through the API, or transform it with Materialized Views.

Once the connection is established, Propel synchronizes all accessible messages from the Kafka topics. It starts at the earliest available offset of each topic partition, consumes the messages from there, and loads them into the Data Pool.

Schemaless ingestion

The Kafka Data Pool ingests the message body into the _propel_payload column and the Kafka and ingestion-related metadata into the other columns. This approach lets you ingest JSON Kafka messages without defining a schema in advance, which is particularly useful for dynamic or constantly evolving data structures.

Column | Type | Description
_timestamp | TIMESTAMP | The timestamp of the message.
_topic | STRING | The Kafka topic.
_key | STRING | The key of the message.
_offset | INT64 | The offset of the message.
_partition | INT64 | The partition of the Kafka topic.
_propel_payload | JSON | The raw message payload in JSON.
_propel_received_at | TIMESTAMP | When the message was read by Propel.
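
To make the column layout concrete, here is a minimal Python sketch of how one Kafka message could map onto the columns in the table above. It is an illustration only, not Propel's implementation: the to_row function, its parameters, and the sample "orders" message are hypothetical, and it assumes a UTF-8 key and a timestamp in epoch milliseconds.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def to_row(topic: str, partition: int, offset: int, key: Optional[bytes],
           value: bytes, timestamp_ms: int,
           received_at: Optional[datetime] = None) -> Dict[str, Any]:
    """Map one Kafka message to the Data Pool columns (hypothetical helper)."""
    return {
        # Assumption: the message timestamp is given in epoch milliseconds.
        "_timestamp": datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc),
        "_topic": topic,
        # Assumption: keys are UTF-8 text; a message may also have no key.
        "_key": key.decode("utf-8") if key is not None else None,
        "_offset": offset,
        "_partition": partition,
        # The body is stored whole, so no schema is needed up front.
        "_propel_payload": json.loads(value),
        "_propel_received_at": received_at or datetime.now(timezone.utc),
    }


# Hypothetical sample message from a topic named "orders".
row = to_row("orders", 0, 42, b"order-1",
             b'{"order_id": 1, "total": 9.5}', 1_700_000_000_000)
print(row["_topic"], row["_partition"], row["_offset"], row["_propel_payload"])
```

Because the payload is kept as a single JSON value, a producer can add or remove fields without any change to the table's columns.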

Message deduplication

The Kafka Data Pool automatically deduplicates messages. Duplicates can occur when a message is sent twice by the producer or read twice because of intermittent connectivity between Propel and the Kafka stream. A message is uniquely identified by the combination of _topic, _partition, and _offset.
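
The uniqueness rule can be sketched in a few lines of Python. This is an illustration of deduplicating on the (_topic, _partition, _offset) combination, not a description of how Propel implements it internally; the deduplicate function and the sample rows are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Set, Tuple


def deduplicate(rows: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep the first row seen for each (_topic, _partition, _offset)."""
    seen: Set[Tuple[str, int, int]] = set()
    unique: List[Dict[str, Any]] = []
    for row in rows:
        identity = (row["_topic"], row["_partition"], row["_offset"])
        if identity in seen:
            continue  # same message seen again, for example after a reconnect
        seen.add(identity)
        unique.append(row)
    return unique


# Hypothetical rows: the second one repeats the first one's identity.
rows = [
    {"_topic": "orders", "_partition": 0, "_offset": 7, "_propel_payload": {"id": 1}},
    {"_topic": "orders", "_partition": 0, "_offset": 7, "_propel_payload": {"id": 1}},
    {"_topic": "orders", "_partition": 1, "_offset": 7, "_propel_payload": {"id": 2}},
]
unique_rows = deduplicate(rows)
print(len(unique_rows))  # 2
```

Note that the third row is kept even though its offset matches: offsets are only unique within a partition, which is why all three columns form the identity.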

Supported formats

The Kafka Data Pool ingests JSON messages and stores them in the _propel_payload column.

If you need Avro support, please contact us.

Transforming data

Once the data has been ingested into the Kafka Data Pool, you can create Materialized Views to transform the data. This includes transformations such as filtering, aggregation, and joining with other data. These transformations are defined using SQL and can be updated in real time as new data arrives.

Materialized Views can be used to:

  • Separate the messages from a specific topic into their own tables, each with its own schema.
  • Handle real-time updates and deletes for mutable data.
  • Transform data in real time.
  • Enrich data by joining it with other Data Pools.

Learn more about Transforming your data with Materialized Views.
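
In Propel these transformations are written in SQL as Materialized Views. Purely to illustrate the first use case in the list above (separating one topic's messages into its own table with its own schema), the following Python sketch shows the equivalent filter-and-project logic. The orders_view function, the "orders" topic, and the order_id and total payload fields are all hypothetical.

```python
from typing import Any, Dict, Iterable, List


def orders_view(rows: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Filter rows to one topic and project payload fields into typed columns."""
    view: List[Dict[str, Any]] = []
    for row in rows:
        if row["_topic"] != "orders":  # hypothetical topic name
            continue
        payload = row["_propel_payload"]
        view.append({
            "order_id": int(payload["order_id"]),  # hypothetical payload field
            "total": float(payload["total"]),      # hypothetical payload field
            "ordered_at": row["_timestamp"],
        })
    return view


# Hypothetical Data Pool rows from two different topics.
pool_rows = [
    {"_topic": "orders", "_timestamp": "2024-01-01T00:00:00Z",
     "_propel_payload": {"order_id": "1", "total": "9.50"}},
    {"_topic": "clicks", "_timestamp": "2024-01-01T00:00:01Z",
     "_propel_payload": {"page": "/home"}},
]
orders = orders_view(pool_rows)
print(orders)
```

The result contains only the "orders" message, with its payload fields promoted to typed columns, which is the shape a per-topic table would have.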

Management API

Below is the relevant API documentation for the Kafka Data Pool.

Queries

Mutations

Limits

No limits at this point.