
Connect Your Data

In this section, you will find all the information you need to connect your data to Propel. This overview covers the different types of Data Pools supported, how data is synced from your data sources to Propel, and the key concepts you need to be familiar with.

High-level overview

The key concept to understand how data flows is Data Pools. These are Propel's high-speed tables, optimized for serving data with low latency via the API.

Propel supports various types of data, including data from warehouses and lakes, event-based data, and databases. The diagram below shows an example of how data is collected into Propel Data Pools and then served via the GraphQL API to your apps.

A high-level overview of how data is connected to Propel.

Supported data sources

Propel integrates with event and streaming sources, data warehouses and data lakes, and databases, either through a native connection or via Amazon S3 Parquet. We offer step-by-step guides for each supported integration.

Event and streaming sources

Data source          Integration
Webhooks             Native
Kafka                Native
AWS Kinesis          Via ELT/ETL Platforms

Data warehouses and data lakes

Data source          Integration
Snowflake            Native
Amazon S3 Parquet    Native
BigQuery             Preview
Databricks           Via ELT/ETL Platforms
AWS Redshift         Via ELT/ETL Platforms

Databases

Data source          Integration
ClickHouse           Native
PostgreSQL           Coming soon
MySQL                Via ELT/ETL Platforms
DynamoDB             Via ELT/ETL Platforms
MongoDB              Via ELT/ETL Platforms

ELT / ETL Platforms

Data source          Integration
Fivetran             Native
Airbyte              Native

Don't see a data source you need, or want access to a preview feature? Let us know.

Understanding Data Pools

Data Pools are Propel's high-speed data store and cache, optimized for serving data with low latency (sub-second response times) and high concurrency (thousands or millions of users). All queries to the Propel APIs are served from Data Pools, not from their underlying data sources.

A screenshot of a Data Pool in the Propel Console.

Understanding event-based Data Pools

Event-based data sources, such as the Webhook Data Pool, collect events and sync them to Data Pools every minute. These Data Pools have a very simple schema:

Column                 Type         Description
_propel_received_at    TIMESTAMP    The timestamp when the event was collected, in UTC.
_propel_payload        JSON         The JSON payload of the event.

During the setup of a Webhook Data Pool, you can optionally unpack top-level or nested keys from the incoming JSON event into specific columns. See the Webhook Data Pool for more details.

Understanding data warehouse and data lake-based Data Pools

Data warehouse- and data lake-based Data Pools, such as Snowflake or Amazon S3 Parquet, synchronize records from the source table at a given interval. You can create multiple Data Pools, one for each table.

These Data Pools also offer additional properties that let you control their synchronization behavior. These include:

  • Scheduled Syncs: A Data Pool's sync interval determines how often Propel checks for new data to synchronize. For near real-time applications, the interval can be as short as 1 minute, while for applications with more relaxed data freshness requirements, it can be set to once a day or anything in between.
  • Manually triggered Syncs: Syncs can be triggered on-demand when a Data Pool's underlying data source has changed, or in order to re-sync the Data Pool from scratch.
  • Pausing and resuming syncing: Controls whether a Data Pool syncs data or not. When paused, Propel stops synchronizing records to your Data Pool. When resumed, it will start syncing on the configured interval.

Deleting data

Real-time deletes

Most Propel Data Pools support real-time deletions that automatically propagate from the data source. Please refer to the documentation for the specific Data Pool you are using.

Batch deletion

The ability to delete batches of data is crucial for compliance with regulations such as GDPR. Propel provides a simple way to delete data via the Console or the createDeletionJob API. When you perform a delete operation on a Data Pool, the data matching the provided filters is deleted. Keep in mind that deleting data is permanent and cannot be undone, so use this feature with caution.

In the console, you can initiate a delete operation by navigating to the Data Pool from which you need to delete data, clicking on the "Operations" tab, and then clicking “Delete data.”

Data Pool operations menu

Here you can specify the filters of the data to delete.

Delete data form in the Propel Console.

Once the deletion job is created, you can monitor its progress in the “Operations” tab. Note that the job may take some time to complete, depending on how much data is deleted.

Here's an example of how to delete data using the API (read the docs for more details):

mutation {
  createDeletionJob(
    input: {
      dataPool: "DPO00000000000000000000000000"
      filters: [{ column: "taco_name", operator: "EQUALS", value: "Breakfast" }]
    }
  ) {
    id
  }
}

Remember, deleting data is permanent and cannot be undone, so use this feature cautiously.

Updating data

Real-time updates

Most Propel Data Pools support real-time updates that automatically propagate from the data source. Please refer to the documentation for the specific Data Pool you are using.

Batch updates

Updating batches of data is an important feature to maintain data integrity and to backfill data when there are schema changes. Propel provides a simple way to update data asynchronously using the Console or the createUpdateDataPoolRecordsJob API. An update data operation on a Data Pool updates the data matching the filters provided.

In the console, you can initiate an update job by navigating to the Data Pool whose data you need to update, clicking on the "Operations" tab, and then clicking “Update data.”

Data Pool operations menu

Here you can specify the filters of the data to update and set the values to update.

Update data form in the Propel Console.

Once the update job is created, you can monitor its progress in the “Operations” tab. Note that the job may take some time to complete, depending on how much data is updated.

Here's an example of how to update data using the API (read the docs for more details):

mutation {
  createUpdateDataPoolRecordsJob(
    input: {
      dataPool: "DPO00000000000000000000000000"
      filters: [
        { column: "restaurant_name", operator: "EQUALS", value: "Farolito" },
        { column: "taco_name", operator: "EQUALS", value: "Veggie" }
      ]
      set: [{ column: "taco_name", expression: "'Vegetarian'" }]
    }
  ) {
    id
  }
}

Remember, updating data is permanent and cannot be undone, so use this feature cautiously.

Notes on updating non-nullable columns:

  • Suppose we have a non-nullable column A and a nullable column B, and we run an update that sets A = B + 1. If the job encounters a record where B is null, the expression evaluates to null, which cannot be assigned to the non-nullable column A. The job fails at that point: records processed before the null value are updated, while the remaining records are left unchanged (see the sketch after this list).
  • If the column being updated has a different data type than the update expression, the result is also null, which causes the same error as above when the column is non-nullable.
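
For illustration, here is a minimal sketch of an update job that could hit the first case above. The column names A, B, and tenant_id are hypothetical; the mutation shape follows the createUpdateDataPoolRecordsJob example shown earlier.

mutation {
  createUpdateDataPoolRecordsJob(
    input: {
      dataPool: "DPO00000000000000000000000000"
      # Hypothetical filter column; narrow the update to the records you intend to change.
      filters: [{ column: "tenant_id", operator: "EQUALS", value: "42" }]
      # If any matching record has B = null, "B + 1" evaluates to null and the
      # assignment to the non-nullable column A fails the job partway through.
      set: [{ column: "A", expression: "B + 1" }]
    }
  ) {
    id
  }
}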

Schema changes

Propel supports non-breaking schema changes, specifically adding columns to existing Data Pools. For more details, please refer to the documentation related to the specific Data Pool you are using.


Frequently asked questions

How long does it take for my data to be synced into Propel? Is Propel real-time?

Once data gets to Propel via syncs or events, it is available via the API in 2-4 minutes.

In what region is the data stored?

The data is stored in the AWS US East 2 region. We are working on expanding our region coverage. If you are interested in using Propel in a different region, please contact us.

How much data can I bring into Propel?

As much as you need. Propel does not have any limits on how much data you bring. You should think of the data in Propel as the data you need to serve to your applications.

How long does Propel keep the data?

You can keep data in Propel for as long as you need. For instance, if your application requires data for only 90 days, you can use the Delete API to remove data after 90 days.
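
As a sketch, such a retention policy could be implemented with a periodic deletion job like the one below. The created_at column, the LESS_THAN operator, and the cutoff value are illustrative assumptions; check the createDeletionJob documentation for the filter operators and timestamp formats your Data Pool supports.

mutation {
  createDeletionJob(
    input: {
      dataPool: "DPO00000000000000000000000000"
      # "created_at", LESS_THAN, and the cutoff timestamp are placeholders;
      # replace them with your timestamp column and a value 90 days in the past.
      filters: [{ column: "created_at", operator: "LESS_THAN", value: "2024-01-01T00:00:00Z" }]
    }
  ) {
    id
  }
}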

Can you sync only certain columns from a table into a Data Pool?

Yes. When you create the Data Pool, you can select which columns from the underlying table you want to sync. This is useful if there is PII or any other data that you don’t need in Propel.

What happens if the underlying data source is not available? For example, what happens if Snowflake is down?

Even if the underlying data source is down, Propel will continue to serve data via the API. New data will not sync until the data source comes back online.

When does the Data Pool syncing interval start?

The syncing interval starts when your Data Pool goes LIVE or when syncing is resumed.

APIs for data management

Everything that you can do in the Propel Console, you can also achieve via the API. This enables you to create and manage Data Pools programmatically.
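
For example, listing your Data Pools programmatically might look like the following minimal sketch, which assumes the dataPools connection query and its id and uniqueName fields; consult the Data Pool API reference for the exact schema.

query {
  # Assumes a Relay-style connection with a nodes field; see the Data Pool API reference.
  dataPools(first: 10) {
    nodes {
      id
      uniqueName
    }
  }
}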

Data Pool APIs

Note on Data Source APIs

The Data Source APIs are deprecated and are documented only for reference.
