
Connect your data

In this section, you will find all the information you need to connect your data to Propel. This overview covers the supported data sources, how data is synced from those sources into Propel, and the key concepts you need to be familiar with.

High-Level Overview

The two key concepts to understand how data flows are Data Sources and Data Pools. Data Sources represent the source of the data, while Data Pools are Propel’s high-speed data store and cache optimized for serving data with low latency via the API.

Propel supports various types of data sources, including data warehouses and data lakes, event-based data sources, and databases. The diagram below shows an example of a Snowflake Data Source and a webhook Data Source syncing and collecting data into Propel Data Pools, which then serve it to your apps via the GraphQL API.

A high-level overview of how data is connected to Propel.

Key Concept #1: Data Sources

The first concept to familiarize yourself with is Data Sources. They represent the source of the data, which can be a data warehouse, such as Snowflake or BigQuery, a database like Postgres or MySQL, or an event producer, such as a webhook or a stream. The Data Source object in Propel holds the necessary configuration information to connect to an underlying data store or to collect events.

Sync modes

Sync modes determine how Propel reads or collects data from the source and writes it into Propel. Different Data Sources support different sync modes, so it is worth understanding how each one gets your data into Propel. The available sync modes, illustrated with a conceptual sketch after this list, are:

  • Incremental append: This mode incrementally syncs new records from the source and appends them to Propel. Ideal for immutable event data.
  • Incremental append + update (preview): This mode incrementally syncs new and updated records from the source and appends or updates them in Propel. Ideal for tables that slowly update. Contact us for early access to the preview.
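
Conceptually, the difference between the two modes comes down to whether previously synced rows can ever change. The sketch below is not how Propel is implemented; it is only a minimal illustration of the two behaviors, assuming each record has a unique id.

```typescript
// Conceptual illustration only, not Propel's implementation.
type SourceRecord = { id: string; value: number; updated_at: string };

// Incremental append: new records are added; existing rows are never modified.
// A good fit for immutable event data.
function incrementalAppend(pool: SourceRecord[], batch: SourceRecord[]): SourceRecord[] {
  return [...pool, ...batch];
}

// Incremental append + update: new records are added, and records whose id
// already exists replace the previously synced version. A good fit for
// tables that slowly update.
function incrementalAppendUpdate(pool: SourceRecord[], batch: SourceRecord[]): SourceRecord[] {
  const byId = new Map(pool.map((record) => [record.id, record] as const));
  for (const record of batch) byId.set(record.id, record);
  return [...byId.values()];
}
```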

Also, see the Deleting Data section below to learn more about deleting data from Propel.

Available Data Sources

Propel supports three types of data sources: data warehouses and data lakes, event-based data sources, and databases. They are supported either via a native integration or via AWS S3. A native integration is a direct connection configured in Propel to connect to the data source. An integration via AWS S3 requires you to export records incrementally to Parquet files in an AWS S3 bucket, which are then synced into Propel via the AWS S3 Data Source. We offer a step-by-step guide for each natively supported data source as well as for each S3 + Parquet integration; you can find these guides in the corresponding section for each data source.
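
For the sources integrated via S3 + Parquet, the export step happens outside of Propel. The sketch below shows the general idea in TypeScript using the parquetjs and @aws-sdk/client-s3 packages; the schema, bucket name, and key layout are hypothetical examples, and your warehouse or database may already offer a built-in Parquet export that is simpler to use.

```typescript
// Hedged sketch: write a batch of records to a Parquet file and upload it to
// the S3 bucket that your AWS S3 Data Source points at. The schema, bucket
// name, and key layout below are illustrative assumptions.
import parquet from "parquetjs";
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";
import { readFile } from "fs/promises";

const schema = new parquet.ParquetSchema({
  order_id: { type: "UTF8" },
  amount: { type: "DOUBLE" },
  created_at: { type: "TIMESTAMP_MILLIS" },
});

async function exportBatch(records: { order_id: string; amount: number; created_at: Date }[]) {
  // 1. Write the batch to a local Parquet file.
  const localPath = "/tmp/orders-batch.parquet";
  const writer = await parquet.ParquetWriter.openFile(schema, localPath);
  for (const record of records) {
    await writer.appendRow(record);
  }
  await writer.close();

  // 2. Upload the file to the bucket configured in the AWS S3 Data Source,
  //    using a new key per batch so files accumulate incrementally.
  const s3 = new S3Client({ region: "us-east-2" }); // example region
  await s3.send(
    new PutObjectCommand({
      Bucket: "my-propel-export-bucket", // hypothetical bucket
      Key: `orders/batch-${Date.now()}.parquet`,
      Body: await readFile(localPath),
    })
  );
}
```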

Data warehouses and data lakes

| Data Source | Sync mode supported | Integration |
| --- | --- | --- |
| Snowflake | Incremental append, Incremental append + updates (preview) | Native |
| AWS S3 (Parquet) | Incremental append | Native |
| BigQuery | Incremental append | Via S3 + Parquet |
| AWS Redshift | Incremental append | Native (Preview) |
| Databricks | Incremental append | Via S3 + Parquet |

Event sources

| Data Source | Sync mode supported | Integration |
| --- | --- | --- |
| Webhooks | Incremental append | Native (Preview) |
| Kinesis Firehose | Incremental append | Via S3 + Parquet |
| Kafka | Incremental append | Via S3 + Parquet |

Databases

| Data Source | Sync mode supported | Integration |
| --- | --- | --- |
| Postgres | Incremental append | Via S3 + Parquet |
| MySQL | Incremental append | Via S3 + Parquet |
| Dynamo | Incremental append | Via S3 + Parquet |
| MongoDB | Incremental append | Via S3 + Parquet |

Don’t see a data source you need, or want access to one of the previews? Let us know.

Key Concept #2: Data Pools

The second key concept to familiarize yourself with is Data Pools. Data Pools are Propel's high-speed data store and cache, optimized for serving data with low latency (sub-second response times) and high concurrency (thousands or millions of users). All queries to the Propel APIs are served from Data Pools, not from their underlying data sources.

A high-level overview of how data is connected to Propel.

Understanding Event-Based Data Pools

Important

🚧 The Webhook Data Source is in preview. Please reach out if you are interested in getting early access.

Event-based Data Sources, such as Webhooks or Kinesis Firehose, collect events and write them into Data Pools. Events are collected and synced to Data Pools every minute. These Data Pools have a very simple schema:

| Column | Type | Description |
| --- | --- | --- |
| received_at | TIMESTAMP | The timestamp when the event was collected, in UTC. |
| payload | JSON | The JSON payload of the event. |
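
As a rough illustration, a row in an event-based Data Pool can be thought of as the following shape. The field names come from the table above, while the payload contents are a hypothetical example.

```typescript
// Shape of a row in an event-based Data Pool, mirroring the schema above.
interface EventRow {
  received_at: string;              // UTC timestamp when the event was collected
  payload: Record<string, unknown>; // the event's JSON payload, stored as-is
}

// Hypothetical example of a collected webhook event.
const example: EventRow = {
  received_at: "2024-01-15T12:34:56Z",
  payload: { event: "order_created", order_id: "ord_123", amount: 42.5 },
};
```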

The event-based Data Pools have a one-to-one association with their Data Sources. If an event-based Data Source is deleted, its corresponding Data Pool will be deleted as well.

Understanding Data Warehouse and Data Lake-Based Data Pools

Data Pools created from data warehouse and data lake Data Sources, such as Snowflake or AWS S3, synchronize records from the source table at a given interval and write them into the Data Pool. You can create multiple Data Pools from a single Data Source, for example when you need to bring multiple tables into Propel to query them via the API.

Data warehouses and data lake-based Data Pools also offer additional properties that enable you to control their synchronization behavior. These include:

  • Sync interval: Determines how often Propel checks for new data to synchronize. For near real-time applications, the interval can be as short as 1 minute, while for applications with more relaxed data freshness requirements, it can be set to once a day or anything in between.
  • Pausing and Resuming Syncing: Control whether a Data Pool is synchronizing data or not. When paused, Propel stops synchronizing records to your Data Pool. When resumed, it will start syncing on the configured interval.
  • (Advanced) Custom Cursor: The cursor is a pointer to a row that tracks which records have been synchronized. The row is identified by a value in a given column, like a date in a timestamp column. Data Pools default to the primary timestamp column to drive the cursor. However, in specific cases, such as when you have to handle late-arriving events or updates, you might need to select a different column to drive the cursor; see the sketch after this list for the general idea. For more details, read our guide on Working with Custom Cursors.
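
The sketch below is a conceptual illustration, not Propel's implementation, of what the cursor accomplishes: each sync reads only the rows whose cursor column is greater than the last value seen, then advances the cursor.

```typescript
// Conceptual illustration of cursor-driven incremental syncs.
type SourceRow = { id: string; created_at: string }; // created_at acts as the cursor column

function syncOnce(sourceRows: SourceRow[], lastCursor: string): { synced: SourceRow[]; nextCursor: string } {
  // Only rows past the previous cursor value are read on this sync.
  const synced = sourceRows.filter((row) => row.created_at > lastCursor);
  // The cursor advances to the largest value seen, so the next sync resumes from there.
  const nextCursor = synced.reduce(
    (max, row) => (row.created_at > max ? row.created_at : max),
    lastCursor
  );
  return { synced, nextCursor };
}
```

This also shows why a late-arriving event with an old timestamp would be missed once the cursor has moved past it, which is exactly the situation where choosing a different cursor column helps.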

Deleting data

Deleting data is an important capability for compliance requirements such as GDPR. Propel provides a simple way to delete data using the requestDelete API. This schedules a new Deletion Job on the specified Data Pool, and data matching the provided filters will be deleted. Keep in mind that deleting data is permanent and cannot be undone, so use this feature with caution.

Here's an example of how to use this API:

mutation {
  requestDelete(
    input: {
      dataPool: "DPO00000000000000000000000000"
      filters: [
        { column: "column_name", operator: "equals", value: "value_to_delete" }
      ]
    }
  ) {
    id
  }
}

Note that this API may take some time to complete, depending on the amount of data to be deleted.
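
If you trigger deletions programmatically, the same mutation can be sent as a regular GraphQL HTTP request. The sketch below assumes the API endpoint and an access token are provided via environment variables and that the API uses bearer-token authentication; check the API reference for the exact endpoint and authentication details.

```typescript
// Hedged sketch: sending the requestDelete mutation shown above from TypeScript.
// The endpoint and authentication scheme are assumptions; consult the API reference.
const PROPEL_API_URL = process.env.PROPEL_API_URL!;         // assumed: your Propel GraphQL endpoint
const PROPEL_ACCESS_TOKEN = process.env.PROPEL_ACCESS_TOKEN!; // assumed: an API access token

const REQUEST_DELETE = `
  mutation {
    requestDelete(
      input: {
        dataPool: "DPO00000000000000000000000000"
        filters: [
          { column: "column_name", operator: "equals", value: "value_to_delete" }
        ]
      }
    ) {
      id
    }
  }
`;

async function runRequestDelete(): Promise<unknown> {
  const response = await fetch(PROPEL_API_URL, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Assumed bearer-token auth; verify against the Propel API reference.
      Authorization: `Bearer ${PROPEL_ACCESS_TOKEN}`,
    },
    body: JSON.stringify({ query: REQUEST_DELETE }),
  });
  return response.json();
}
```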

Key Guides

Here are some key guides to help you as you onboard your data to Propel:

Frequently Asked Questions

How long does it take for my data to be synced into Propel? Is Propel real-time?

Once data gets to Propel via syncs or events, it is available via the API in 2-4 minutes.

In what region is the data stored?

The data is stored in the AWS US East 2 region. We are working on expanding our region coverage. If you are interested in using Propel in a different region, please contact us.

How much data can I bring into Propel?

As much as you need. Propel does not have any limits on how much data you bring. You should think of the data in Propel as the data you need to serve to your applications.

How long does Propel keep the data?

You can keep data in Propel for as long as you need. For instance, if your application requires data for only 90 days, you can use the Delete API to remove data after 90 days.
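
For example, a periodic clean-up job could compute the cutoff timestamp and issue a requestDelete with a filter on your timestamp column. The column name and, in particular, the operator used for "older than" below are assumptions; check the supported filter operators in the API reference before relying on this.

```typescript
// Hedged sketch of a 90-day retention job built on requestDelete.
// "timestamp_column" and the "lessThan" operator name are placeholder
// assumptions; verify both against your Data Pool schema and the API reference.
const NINETY_DAYS_MS = 90 * 24 * 60 * 60 * 1000;
const cutoff = new Date(Date.now() - NINETY_DAYS_MS).toISOString();

const retentionMutation = `
  mutation {
    requestDelete(
      input: {
        dataPool: "DPO00000000000000000000000000"
        filters: [
          { column: "timestamp_column", operator: "lessThan", value: "${cutoff}" }
        ]
      }
    ) {
      id
    }
  }
`;
```

You would send this mutation exactly like the requestDelete example in the Deleting data section above.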

Can you sync only certain columns from a table into a Data Pool?

Yes. When you create the Data Pool, you can select which columns from the underlying table you want to sync. This is useful if there is PII or any other data that you don’t need in Propel.

What happens if the underlying data source is not available? For example, what happens if Snowflake is down?

Even if the underlying data source is down, Propel will continue to serve data via the API. New data will not sync until the data store comes back online.

When does the syncing interval start?

The syncing interval starts when your Data Pool goes LIVE or when Syncing is resumed.

Data Source and Data Pool APIs

Everything that you can do in the Propel Console, you can do via the API. This means you can programmatically create and manage Data Sources and Data Pools using the APIs below:

Queries

Mutations