Connect Your Data
In this section, you will find all the information you need to connect your data to Propel. This overview covers the different types of Data Pools supported, how data is synced from your data sources to Propel, and the key concepts you need to be familiar with.
High-level overview
The key concept to understand how data flows is Data Pools. These are Propel's high-speed tables, optimized for serving data with low latency via the API.
Propel supports various types of data, including data from warehouses and lakes, event-based data, and databases. The diagram below shows an example of how data is collected into Propel Data Pools and then served via the GraphQL API to your apps.
Supported data sources
Propel supports integration with various types of data: event-based data, data warehouses and data lakes, and databases. These can be integrated either via a native connection or through Amazon S3. We offer step-by-step guides for each supported data integration type, whether native or via Amazon S3 Parquet.
Event and streaming sources
Data source | Integration |
---|---|
Webhooks | Native |
Kafka | Native |
AWS Kinesis | Via ELT/ETL Platforms |
Data warehouses and data lakes
Data source | Integration |
---|---|
Snowflake | Native |
Amazon S3 Parquet | Native |
BigQuery | Preview |
Databricks | Via ELT/ETL Platforms |
AWS Redshift | Via ELT/ETL Platforms |
Databases
Data source | Integration |
---|---|
ClickHouse | Native |
PostgreSQL | Coming soon |
MySQL | Via ELT/ETL Platforms |
DynamoDB | Via ELT/ETL Platforms |
MongoDB | Via ELT/ETL Platforms |
ELT / ETL Platforms
Data source | Integration |
---|---|
Fivetran | Native |
Airbyte | Native |
Don't see a data source you need, or want access to one of the previews? Let us know.
Understanding Data Pools
Data Pools are Propel's high-speed data store and cache, optimized for serving data with low latency (sub-second response times) and high concurrency (thousands to millions of users). All queries to the Propel APIs are served from Data Pools, not from their underlying data sources.
Understanding event-based Data Pools
Event-based data sources like the Webhook Data Pool collect and write events into Data Pools. Events are collected and synced to Data Pools every minute. These Data Pools have a very simple schema:
Column | Type | Description |
---|---|---|
_propel_received_at | TIMESTAMP | The timestamp when the event was collected, in UTC. |
_propel_payload | JSON | The JSON payload of the event. |
During the setup of a Webhook Data Pool, you can optionally unpack top-level or nested keys from the incoming JSON event into specific columns. See the Webhook Data Pool for more details.
Understanding data warehouse and data lake-based Data Pools
Data warehouse- and data lake-based Data Pools, such as Snowflake or Amazon S3 Parquet, synchronize records from the source table at a given interval and write them into Data Pools. You can create multiple Data Pools, one for each table.
Data warehouse- and data lake-based Data Pools also offer additional properties that let you control their synchronization behavior. These include:
- Scheduled Syncs: A Data Pool's sync interval determines how often Propel checks for new data to synchronize. For near real-time applications, the interval can be as short as 1 minute, while for applications with more relaxed data freshness requirements, it can be set to once a day or anything in between.
- Manually triggered Syncs: Syncs can be triggered on demand when a Data Pool's underlying data source has changed, or to re-sync the Data Pool from scratch.
- Pausing and resuming syncing: Controls whether a Data Pool syncs data or not. When paused, Propel stops synchronizing records to your Data Pool. When resumed, it will start syncing on the configured interval.
Deleting data
Real-time deletes
Most Propel Data Pools support real-time deletions that automatically propagate from the data source. Please refer to the documentation for the specific Data Pool you are using.
Batch deletion
The ability to delete batches of data is crucial for compliance and GDPR. Propel provides a simple way to delete data via the Console or the createDeletionJob API. When you perform a delete operation on a Data Pool, the data matching the provided filters gets deleted. Keep in mind that deleting data is permanent and cannot be undone, so use this feature with caution.
In the console, you can initiate a delete operation by navigating to the Data Pool from which you need to delete data, clicking on the "Operations" tab, and then clicking “Delete data.”
There, you can specify the filters that select the data to delete.
Once initiated, you can monitor the progress of the delete operation in the “Operations” tab. Note that the job may take some time to complete, depending on how much data is deleted.
Here's an example of how to delete data using the API (read the docs for more details):
```graphql
mutation {
  createDeletionJob(
    input: {
      dataPool: "DPO00000000000000000000000000"
      filters: [{ column: "taco_name", operator: "EQUALS", value: "Breakfast" }]
    }
  ) {
    id
  }
}
```
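As a companion to the mutation above, here is a sketch of sending it over HTTP from Python. The endpoint URL and bearer-token header are assumptions for illustration; check Propel's API documentation for the actual GraphQL URL and authentication scheme for your account.

```python
import json
import urllib.request

# Assumed endpoint -- verify against Propel's API docs before use.
PROPEL_GRAPHQL_URL = "https://api.us-east-2.propeldata.com/graphql/v1"

def build_deletion_request(data_pool: str, column: str, value: str) -> dict:
    """Build a GraphQL HTTP request body for a createDeletionJob mutation."""
    mutation = f'''mutation {{
  createDeletionJob(
    input: {{
      dataPool: "{data_pool}"
      filters: [{{ column: "{column}", operator: "EQUALS", value: "{value}" }}]
    }}
  ) {{ id }}
}}'''
    return {"query": mutation}

body = build_deletion_request(
    "DPO00000000000000000000000000", "taco_name", "Breakfast"
)

def send(body: dict, token: str) -> bytes:
    """POST the request body to the GraphQL endpoint (network call)."""
    req = urllib.request.Request(
        PROPEL_GRAPHQL_URL,
        data=json.dumps(body).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```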
Remember, deleting data is permanent and cannot be undone, so use this feature cautiously.
Updating data
Real-time updates
Most Propel Data Pools support real-time updates that automatically propagate from the data source. Please refer to the documentation for the specific Data Pool you are using.
Batch updates
Updating batches of data is an important feature for maintaining data integrity and for backfilling data when there are schema changes. Propel provides a simple way to update data asynchronously using the Console or the createUpdateDataPoolRecordsJob API. An update operation on a Data Pool updates the data matching the provided filters.
In the console, you can initiate an update job by navigating to the Data Pool from which you need to update data, clicking on the "Operations" tab, and then clicking “Update data.”
There, you can specify the filters that select the data to update and the new values to set.
Once initiated, you can monitor the progress of the update operation in the “Operations” tab. Note that the job may take some time to complete, depending on how much data is updated.
Here's an example of how to update data using the API (read the docs for more details):
```graphql
mutation {
  createUpdateDataPoolRecordsJob(
    input: {
      dataPool: "DPO00000000000000000000000000"
      filters: [
        { column: "restaurant_name", operator: "EQUALS", value: "Farolito" }
        { column: "taco_name", operator: "EQUALS", value: "Veggie" }
      ]
      set: [{ column: "taco_name", expression: "'Vegetarian'" }]
    }
  ) {
    id
  }
}
```
Remember, updating data is permanent and cannot be undone, so use this feature cautiously.
Notes on updating non-nullable columns:
- Suppose we have a non-nullable column A and a nullable column B, and we execute an update setting A = B + 1. If the job encounters a record where B is null, the expression yields null, which cannot be assigned to the non-nullable column A. The job then fails: records processed before the null value are updated, while the remaining records are left unchanged.
- If the column being updated has a different data type than the update expression, the result will be null. If the column is non-nullable, this causes the same kind of failure as in the previous example.
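The partial-failure behavior described above can be illustrated with a toy simulation. This is not Propel's implementation, only a model of the semantics: rows are processed in order, earlier updates stick, and the job fails at the first record where the expression yields null for a non-nullable column.

```python
def run_update_job(rows, set_expr):
    """Toy simulation of a batch update against a non-nullable column A.

    Processes rows in order; if set_expr yields None (null) for a row,
    the job fails there: earlier rows keep their updates, later rows are
    left untouched. Returns (rows_updated, status).
    """
    updated = 0
    for row in rows:
        value = set_expr(row)  # e.g. the expression A = B + 1
        if value is None:
            # Non-nullable column A cannot hold null -> job fails here.
            return updated, "failed"
        row["A"] = value
        updated += 1
    return updated, "succeeded"

rows = [{"A": 0, "B": 1}, {"A": 0, "B": None}, {"A": 0, "B": 3}]
result = run_update_job(
    rows, lambda r: None if r["B"] is None else r["B"] + 1
)
# The first row is updated (A = 2), then the job fails on the null B;
# the third row is never touched.
```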
Schema changes
Propel supports non-breaking schema changes, specifically adding columns to existing Data Pools. For more details, please refer to the documentation related to the specific Data Pool you are using.
Key guides
Here are some key guides to help you as you onboard your data to Propel:
- Selecting a Timestamp for Your Data Pool
- Working with Propel and dbt
- Building multi-tenant applications
Frequently asked questions
How long does it take for my data to be synced into Propel? Is Propel real-time?
Once data gets to Propel via syncs or events, it is available via the API in 2-4 minutes.
In what region is the data stored?
The data is stored in the AWS US East 2 region. We are working on expanding our region coverage. If you are interested in using Propel in a different region, please contact us.
How much data can I bring into Propel?
As much as you need. Propel does not have any limits on how much data you bring. You should think of the data in Propel as the data you need to serve to your applications.
How long does Propel keep the data?
You can keep data in Propel for as long as you need. For instance, if your application requires data for only 90 days, you can use the Delete API to remove data after 90 days.
Can you sync only certain columns from a table into a Data Pool?
Yes. When you create the Data Pool, you can select which columns from the underlying table you want to sync. This is useful if there is PII or any other data that you don’t need in Propel.
What happens if the underlying data source is not available? For example, what happens if Snowflake is down?
Even if the underlying data source is down, Propel will continue to serve data via the API. New data will not sync until the data store comes back online.
When does the Data Pool syncing interval start?
The syncing interval starts when your Data Pool goes LIVE or when syncing is resumed.
APIs for data management
Everything that you can do in the Propel Console, you can also achieve via the API. This enables you to create and manage Data Pools programmatically.
Data Pool APIs
Queries
Mutations
- Create a Data Pool
- Delete a Data Pool by ID
- Delete a Data Pool by name
- Modify Data Pool
- Disable Data Pool syncing
- Enable Data Pool syncing
- Inspect Data Pool schema
- Reconnect Data Pool
- Retry Data Pool setup by ID
- Retry Data Pool setup by name
- Test Data Pool
- Request delete
Note on Data Source APIs
The following Data Source APIs are deprecated and are listed here for reference.