Skip to main content

Airbyte

The Airbyte destination lets you synchronize data from over 350+ sources to Propel's Serverless ClickHouse infrastructure. It provides an easy way to power your customer-facing analytics and data applications with data from any SaaS application, database, or platform supported by Airbyte.

Consider using Airbyte to sync data to Propel when:

  • You need to sync data from a SaaS application or platform like Hubspot, Stripe or Shopify.
  • You need to sync data from a database that Propel does not support natively like Oracle, MSSQL, MySQL, or PostgreSQL.
  • You need to sync data from streaming sources that Propel does not support natively like AWS Kinesis or Google Pub/Sub.

Get started

Set up guide

Follow our step-by-step Airbyte setup guide to connect any Airbye source to Propel.

Features

Data Pools created with the Airbyte destination are Webhook Data Pools that support the following features:

Feature nameSupportedNotes
Syncs inserts, updates, and deletesPropel collects inserts, updates, and deletes sent from Airbyte.
Re-syncYou can trigger a resync in Airbyte.
Configurable sync intervalYou can configure sync interval in Airbyte.
Sync Pausing / ResumingYou can pause / resume syncs in Airbyte.
Sync mode: Full refresh - OverwriteSupportes the Full refresh - Overwrite Airbyte sync mode.
Sync mode: Full refresh - AppendSupportes the Full refresh - Append Airbyte sync mode.
Sync mode: Incremental - AppendSupportes the Incremental - Append Airbyte sync mode.
Sync mode: Incremental - Append + DedupedSupport for the Incremental - Append + Deduped Airbyte sync mode is coming soon.
Schema changesWhen Airbyte detects a column was added in your source table, Propel will automatically add it to the payload in the _propel_payload column. It will not add it as a column in your destination Data Pool.
Delete Job APISee Delete Job API.
API configurableSee API reference docs.
Terraform configurableSee Propel Terraform docs.

How does the Airbyte destination work?

When syncing a Airbyte source, Propel creates a Webhook Data Pool for every source table. Airbyte treats each source table as a stream and sends to Propel every the insert, update, and delete as individual records.

Airbyte tables example

The Data Pools will have the same columns as the source table plus the following columns added by Airbyte and Propel.

ColumnTypeDescription
_propel_received_atTIMESTAMPThe timestamp when the record was received by Propel in UTC.
_propel_payloadJSONThe JSON Payload of the record.
_airbyte_raw_idSTRINGThe unique ID of the record generated by Airbyte.
_airbyte_extracted_atTIMESTAMPThe timestamp when the data was extracted from the source by Airbyte in UTC.

Once connected, Airbyte will perform an initial sync and subsequent syncs based on the sync frequency you have configured. Airbyte will send the data to Propel via the Webhook Data Pool API. Propel will process the data and make it available for querying via the SQL and API.

Default timestamp column

Propel will use the _airbyte_extracted_at as the deafult timestamp.

Unique ID column

Propel will use the _airbyte_raw_id as the unique ID.

Sort Key

Propel use the unique ID and timestamp as the Data Pool's sort key. You can change the sort key by creating a Materialized View that orders the data by a different field.

Updates and deletes

Inserts, updates, and deletes are captured by the Airbyte stream and ingested into the Data Pool. You can handle updates and deletes by creating a Materialized View that processes inserts, updates, and deletes and inserts them into a new Data Pool.

Schema changes

Any schema changes in the source will be automatically reflected in the _propel_payload data in the Data Pool. Schema changes that happen after the connection is set up will NOT be reflected in the Data Pool schema.

Data Types

The following table illustrates how we transform Airbyte data types into Propel-supported types:

Airbyte TypeAirbyte FormatPropel Type
(Not specified or Null)-String
String-String
StringDateDate
StringDateTime (TimestampWOTZ)String
StringDateTimeTimestamp
StringTimeString
Boolean-Boolean
Number-Double
Integer-Int64
Object-JSON
Array-JSON

Transforming data

Once the data has been ingested into the Webhook Data Pool, you can create Materialized Views to transform the data. This includes transformations such as ordering, filtering, aggregation, and joining with other data. These transformations are defined using SQL and can be updated in real time as new data arrives.

Materialized Views can be used to:

  • Separate the messages from a specific topic into their own tables, each with its own schema.
  • Handle real-time updates and deletes for mutable data.
  • Transform data in real time.
  • Enrich data joining with other Data Pools.

Learn more about Transforming your data with Materialized Views.

API reference documentation

Below is the relevant API documentation for the Webhook Data Pool.

Queries

Mutations