
Connect Your Data

In this section, you will find everything you need to connect your data to Propel. This overview covers the different types of Data Pools supported, how data is synced from your data sources to Propel, and the key concepts you need to be familiar with.

High-level overview

The key concept to understand how data flows is Data Pools. These are Propel's high-speed tables, optimized for serving data with low latency via the API.

Propel supports various types of data, including data from warehouses and lakes, event-based data, and databases. The diagram below shows an example of how data is collected into Propel Data Pools and then served via the GraphQL API to your apps.

A high-level overview of how data is connected to Propel.

Supported data sources

Propel integrates with event and streaming sources, data warehouses and data lakes, and databases, either through a native connection or via Amazon S3 Parquet. We offer step-by-step guides for each supported integration.

Event and streaming sources

Data source    Integration
Webhooks       Native
Kafka          Native
AWS Kinesis    Via ELT/ETL Platforms

Data warehouses and data lakes

Data source          Integration
Snowflake            Native
Amazon S3 Parquet    Native
BigQuery             Preview
Databricks           Via ELT/ETL Platforms
AWS Redshift         Via ELT/ETL Platforms

Databases

Data source    Integration
ClickHouse     Native
PostgreSQL     Coming soon
MySQL          Via ELT/ETL Platforms
DynamoDB       Via ELT/ETL Platforms
MongoDB        Via ELT/ETL Platforms

ELT / ETL Platforms

Data source    Integration
Fivetran       Native
Airbyte        Native

Don't see a data source you need, or want access to a preview? Let us know.

Understanding Data Pools

Data Pools are Propel's high-speed data store and cache, optimized for serving data with low latency (sub-second response times) and high concurrency (thousands to millions of users). All queries to the Propel APIs are served from Data Pools, not from their underlying data sources.

A screenshot of a Data Pool in the Propel Console.

Understanding event-based Data Pools

Event-based Data Pools, such as the Webhook Data Pool, collect incoming events and sync them every minute. These Data Pools have a very simple schema:

Column               Type       Description
_propel_received_at  TIMESTAMP  The timestamp when the event was collected, in UTC.
_propel_payload      JSON       The JSON payload of the event.

During the setup of a Webhook Data Pool, you can optionally unpack top-level or nested keys from the incoming JSON event into specific columns. See the Webhook Data Pool documentation for more details.
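As an illustration of what this unpacking does conceptually (not Propel's implementation), the sketch below maps dotted key paths from a JSON event payload to flat column values; the function name and path syntax are ours:

```python
import json

def unpack_event(payload: str, columns: dict) -> dict:
    """Map dotted key paths (e.g. "user.id") to flat column values."""
    event = json.loads(payload)
    row = {}
    for column, path in columns.items():
        value = event
        for key in path.split("."):
            # Descend one level; stop if the path does not exist.
            value = value.get(key) if isinstance(value, dict) else None
            if value is None:
                break
        row[column] = value
    return row

event = '{"user": {"id": "u_123"}, "action": "signup"}'
row = unpack_event(event, {"user_id": "user.id", "action": "action"})
print(row)  # {'user_id': 'u_123', 'action': 'signup'}
```

Keys you leave unmapped remain available in the `_propel_payload` column, so unpacking is additive rather than destructive.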

Understanding data warehouse- and data lake-based Data Pools

Data warehouse- and data lake-based Data Pools, such as those for Snowflake or Amazon S3 Parquet, sync records from the source table at a given interval. You can create multiple Data Pools, one for each table.

These Data Pools also offer additional properties that let you control their synchronization behavior:

  • Scheduled Syncs: A Data Pool's sync interval determines how often Propel checks for new data to synchronize. For near real-time applications, the interval can be as short as 1 minute, while for applications with more relaxed data freshness requirements, it can be set to once a day or anything in between.
  • Manually triggered Syncs: Syncs can be triggered on-demand when a Data Pool's underlying data source has changed, or in order to re-sync the Data Pool from scratch.
  • Pausing and resuming syncing: Controls whether a Data Pool syncs data or not. When paused, Propel stops synchronizing records to your Data Pool. When resumed, it will start syncing on the configured interval.
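To make the scheduling behavior concrete, here is a small sketch (our own names, not Propel's API) of how a fixed sync interval translates into sync times, starting from the moment the Data Pool goes LIVE or syncing is resumed:

```python
from datetime import datetime, timedelta

def next_syncs(live_at: datetime, interval: timedelta, count: int) -> list:
    """Return the first `count` scheduled sync times after going LIVE."""
    return [live_at + interval * i for i in range(1, count + 1)]

# A Data Pool that goes LIVE at noon with a 1-minute interval
live = datetime(2024, 1, 1, 12, 0)
for t in next_syncs(live, timedelta(minutes=1), 3):
    print(t.isoformat())  # 12:01, 12:02, 12:03
```

Pausing simply stops this schedule; resuming restarts it from the resume time rather than from the original LIVE time.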

Schema changes

Propel supports non-breaking schema changes, specifically adding columns to existing Data Pools. For more details, please refer to the documentation related to the specific Data Pool you are using.

Key guides

Here are some key guides to help you as you onboard your data to Propel:

Frequently asked questions

How long does it take for my data to be synced into Propel? Is Propel real-time?

Once data gets to Propel via syncs or events, it is available via the API in 2-4 minutes.

In what region is the data stored?

The data is stored in the AWS US East 2 region. We are working on expanding our region coverage. If you are interested in using Propel in a different region, please contact us.

How much data can I bring into Propel?

As much as you need. Propel does not have any limits on how much data you bring. You should think of the data in Propel as the data you need to serve to your applications.

How long does Propel keep the data?

You can keep data in Propel for as long as you need. For instance, if your application requires data for only 90 days, you can use the Delete API to remove data after 90 days.
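A 90-day retention policy like the one described boils down to computing a cutoff timestamp and deleting rows older than it. The sketch below computes that cutoff; the actual delete request is omitted, since the exact mutation and filter shape belong to the Propel API reference:

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=90)

def retention_cutoff(now: datetime) -> str:
    """ISO-8601 timestamp before which rows are eligible for deletion."""
    return (now - RETENTION).isoformat()

# Run on a schedule (e.g. daily) and pass the cutoff to the Delete API
now = datetime(2024, 4, 1, tzinfo=timezone.utc)
print(retention_cutoff(now))  # 2024-01-02T00:00:00+00:00
```

Filtering on the timestamp column you sync (or `_propel_received_at` for event data) keeps the policy independent of when individual syncs ran.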

Can you sync only certain columns from a table into a Data Pool?

Yes. When you create the Data Pool, you can select which columns from the underlying table you want to sync. This is useful if there is PII or any other data that you don't need in Propel.

What happens if the underlying data source is not available? For example, what happens if Snowflake is down?

Even if the underlying data source is down, Propel will continue to serve data via the API. New data will not sync until the data source comes back online.

When does the Data Pool syncing interval start?

The syncing interval starts when your Data Pool goes LIVE or when syncing is resumed.

APIs for data management

Everything that you can do in the Propel Console, you can also achieve via the API. This enables you to create and manage Data Pools programmatically.
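For example, listing Data Pools programmatically is a single GraphQL request. The sketch below only builds the HTTP payload; the endpoint URL and the field names (`dataPools`, `first`, `uniqueName`) are assumptions for illustration, so check the Propel API reference for the actual schema and authentication:

```python
import json

# Assumed endpoint; verify against the Propel docs for your account
PROPEL_API = "https://api.us-east-2.propeldata.com/graphql"

# Hypothetical query shape, shown only to illustrate the request format
query = """
query ListDataPools($first: Int) {
  dataPools(first: $first) {
    nodes { id uniqueName }
  }
}
"""

payload = json.dumps({"query": query, "variables": {"first": 10}})

# Send with any HTTP client, e.g.:
# requests.post(PROPEL_API, data=payload,
#               headers={"Authorization": "Bearer <token>",
#                        "Content-Type": "application/json"})
print(json.loads(payload)["variables"])  # {'first': 10}
```

The same pattern applies to mutations such as creating a Data Pool or triggering a sync: change the query string and variables, keep the transport identical.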

Data Pool APIs

Queries

Mutations

Note on Data Source APIs

The following Data Source APIs are deprecated and are listed here for reference.

Data Source APIs (Deprecated)

Queries

Mutations