Ingest data from Amazon Data Firehose into Propel.
Step-by-step instructions to ingest an Amazon Data Firehose stream into Propel.
Amazon Data Firehose Data Pools in Propel provide you with an HTTP endpoint and an `X-Amz-Firehose-Access-Key` to configure as a destination in Amazon Data Firehose.
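For instance, you can point a Firehose stream at this endpoint with the AWS SDK. The sketch below is illustrative only: the endpoint URL, access key, IAM role, and S3 backup bucket are placeholders you would replace with your own values.

```typescript
import {
  FirehoseClient,
  CreateDeliveryStreamCommand,
} from "@aws-sdk/client-firehose";

const client = new FirehoseClient({ region: "us-east-1" });

// Placeholder values — use the HTTP endpoint URL and access key
// from your own Amazon Data Firehose Data Pool.
await client.send(
  new CreateDeliveryStreamCommand({
    DeliveryStreamName: "propel-events",
    DeliveryStreamType: "DirectPut",
    HttpEndpointDestinationConfiguration: {
      EndpointConfiguration: {
        Name: "Propel",
        Url: "https://<your-data-pool-http-endpoint>", // placeholder
        AccessKey: "<your X-Amz-Firehose-Access-Key>", // placeholder
      },
      // Firehose requires an S3 backup destination for records it
      // cannot deliver to the HTTP endpoint.
      S3BackupMode: "FailedDataOnly",
      S3Configuration: {
        RoleARN: "arn:aws:iam::123456789012:role/firehose-backup", // placeholder
        BucketARN: "arn:aws:s3:::my-firehose-backup", // placeholder
      },
    },
  })
);
```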
Amazon Data Firehose ingestion supports the following features:
Feature name | Supported | Notes |
---|---|---|
Event collection | ✅ | Collects individual events and batches of events in JSON format. |
Real-time updates | ✅ | See the Real-time updates section. |
Real-time deletes | ✅ | See the Real-time deletes section. |
Batch Delete API | ✅ | See Batch Delete API. |
Batch Update API | ✅ | See Batch Update API. |
Bulk insert | ✅ | Up to 500 events per HTTP request. |
API configurable | ✅ | See API docs. |
Terraform configurable | ✅ | See Terraform docs. |
The Amazon Data Firehose Data Pool works by receiving events from an Amazon Data Firehose stream via an HTTP endpoint destination.
Propel handles the special encoding, data format, and basic authentication required for receiving Amazon Data Firehose events.
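To illustrate what Propel decodes for you, a Firehose HTTP endpoint delivery request wraps your events in an envelope roughly like the following (simplified sketch; the exact format is defined by the Amazon Data Firehose HTTP endpoint delivery specification):

```typescript
// Simplified sketch of the envelope Amazon Data Firehose sends to an
// HTTP endpoint destination. Each record's payload is base64-encoded;
// Propel decodes it and parses the JSON for you.
const event = { event_name: "page_view", user_id: "u_123" }; // example payload

const firehoseRequest = {
  requestId: "ed4acda5-034f-9f42-bba1-f29aea6d7d8f", // set by Firehose
  timestamp: 1730000000000, // epoch milliseconds, set by Firehose
  records: [
    { data: Buffer.from(JSON.stringify(event)).toString("base64") },
  ],
};
// Firehose sends this with headers including Content-Type: application/json
// and X-Amz-Firehose-Access-Key: <your access key>.
```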
By default, the Amazon Data Firehose Data Pool includes two columns:
Column | Type | Description |
---|---|---|
`_propel_received_at` | TIMESTAMP | The timestamp when the event was collected, in UTC. |
`_propel_payload` | JSON | The JSON payload of the event. |
When creating an Amazon Data Firehose Data Pool, you can flatten top-level or nested JSON keys into specific columns.
See our step-by-step setup guide.
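As an illustration, suppose each event looks like the object below. The top-level `event_name` key and the nested `user.id` key could each be flattened into their own columns (the field and column names here are hypothetical):

```typescript
// Example event sent through Firehose (illustrative only).
const event = {
  event_name: "page_view", // → flattened column event_name (STRING)
  user: {
    id: "u_123",           // → flattened nested column user_id (STRING)
    plan: "pro",
  },
  duration_ms: 42,         // → flattened column duration_ms (DOUBLE)
};
// The full object is still stored in the _propel_payload column as JSON.
```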
Use the `disable_partial_success=true` query parameter to ensure that if any event in a batch fails validation, the entire request fails. For example:
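You would append the parameter to the endpoint URL you configure as the Firehose destination (the URL below is a placeholder):

```typescript
// Placeholder — use your Data Pool's HTTP endpoint URL.
const endpoint = "https://<your-data-pool-http-endpoint>";

// With this parameter, a batch containing any invalid event is
// rejected as a whole instead of being partially ingested.
const destinationUrl = `${endpoint}?disable_partial_success=true`;
```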
The Amazon Data Firehose Data Pool is designed to handle semi-structured, schema-less JSON data. This flexibility allows you to add new properties to your payload as needed. The entire payload is always stored in the `_propel_payload` column.
However, Propel enforces the schema for required fields. If you stop providing data for a required field that was previously unpacked into its own column, Propel will return an error.
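For example, if a hypothetical required `user_id` field was flattened into its own column, events must keep providing it:

```typescript
// Accepted: includes the required user_id field.
const ok = { event_name: "page_view", user_id: "u_123" };

// Rejected: user_id was unpacked as a required column, so Propel
// returns an error for events that omit it.
const rejected = { event_name: "page_view" };
```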
To add a new column for a new JSON property:
1. Go to the Schema tab: open the Data Pool, click the “Schema” tab, then click the “Add Column” button to define the new column.
2. Add the column: specify the JSON property to extract, the column name, and the type, then click “Add column”.
3. Track progress: after you click “Add column”, an asynchronous operation begins adding the column to the Data Pool. You can track its progress in the “Operations” tab.
Note that adding a column does not backfill existing rows. To backfill, run a batch update operation.
Column deletions, modifications, and data type changes are not supported as they are breaking changes to the schema. If you need to change the schema, you can create a new Data Pool.
The table below shows the default mappings from JSON types to Propel types. You can change these mappings when creating an Amazon Data Firehose Data Pool.
JSON Type | Propel Type |
---|---|
String | STRING |
Number | DOUBLE |
Object | JSON |
Array | JSON |
Boolean | BOOLEAN |
Null | JSON |
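Put together, a sample event would map to Propel types as follows by default (field names are illustrative):

```typescript
const event = {
  event_name: "page_view", // String  → STRING
  duration_ms: 42.5,       // Number  → DOUBLE
  user: { id: "u_123" },   // Object  → JSON
  tags: ["web", "beta"],   // Array   → JSON
  is_returning: true,      // Boolean → BOOLEAN
  referrer: null,          // Null    → JSON
};
```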
Events are timestamped on arrival in the `_propel_received_at` column. Once your data is in an Amazon Data Firehose Data Pool, you can use Materialized Views to transform it for your use case.
How long does it take for an event to be available via SQL or the API?
It depends on the buffer you set in Firehose and on the internal buffers Propel uses to optimize performance; depending on buffer size, it can range from 10 seconds to 2 minutes.