Ingest data from S3.
Ingest Parquet files from an Amazon S3 bucket to Propel.
Step-by-step instructions to ingest Parquet files in S3 to Propel.
Amazon S3 Data Pools connect to a specified Amazon S3 bucket and automatically synchronize Parquet files from the bucket into your Data Pool.
Amazon S3 Parquet Data Pools support the following features:
Feature name | Supported | Notes |
---|---|---|
Syncs new records | ✅ | |
Configurable sync interval | ✅ | The sync interval can be set anywhere from every minute to every 24 hours. See the How Propel syncs section below. |
Sync Pausing / Resuming | ✅ | |
Real-time updates | ✅ | See the Real-time updates section. |
Real-time deletes | ❌ | See the Real-time deletes section. |
Batch Delete API | ✅ | See Batch Delete API. |
Batch Update API | ✅ | See Batch Update API. |
API configurable | ✅ | See API docs. |
Terraform configurable | ✅ | See Terraform docs. |
The Amazon S3-based Data Pool syncs Parquet files from your S3 bucket. You specify:
During each sync, Propel retrieves and synchronizes all new files in the specified S3 bucket path.
To sync all Parquet files, use the following path:
To sync files in a specific directory (e.g., “sales”):
Use the *.parquet pattern to sync only Parquet files, excluding other file types.
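To get a feel for how a wildcard pattern narrows the set of files, here is a minimal Python sketch that filters a list of object keys with shell-style matching from the standard library. The bucket layout, the key names, the patterns, and the helper matching_keys are all hypothetical, and Propel's own path matching may treat wildcards and directory separators differently, so treat this as an illustration only.

```python
from fnmatch import fnmatchcase

# Hypothetical object keys in a bucket (not taken from any real bucket).
KEYS = [
    "sales/2024/part-0.parquet",
    "sales/readme.txt",
    "inventory/part-0.parquet",
]


def matching_keys(keys, pattern):
    """Return the keys that match a shell-style wildcard pattern.

    Assumption: fnmatchcase semantics, where "*" also matches "/".
    Propel's actual matching rules may differ.
    """
    return [key for key in keys if fnmatchcase(key, pattern)]


# Only Parquet files, anywhere in the bucket.
all_parquet = matching_keys(KEYS, "*.parquet")

# Only Parquet files under the hypothetical "sales" directory.
sales_parquet = matching_keys(KEYS, "sales/*.parquet")
```

Running a check like this against a listing of your own keys before creating the Data Pool can help catch a pattern that matches more or fewer files than intended.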
How records are ingested depends on the table engine you select when you create your Data Pool.
Read our guide on Selecting table engine and sorting key for details.
Propel lets you add new columns to Amazon S3 Data Pools through the AddColumnToDataPool job.
For breaking changes like column deletions or type modifications, recreate the Data Pool.
See our Changing Schemas section for more details.
The table below shows default Parquet to Propel data type mappings. When creating an Amazon S3 Parquet Data Pool, you can customize these mappings.
Parquet Type | Propel Type | Notes |
---|---|---|
BOOLEAN | BOOLEAN | |
INT8 | INT8 | |
UINT8 | INT16 | |
INT16 | INT16 | |
UINT16 | INT32 | |
INT32 | INT32 | |
UINT32 | INT64 | |
INT64 | INT64 | |
UINT64 | INT64 | |
FLOAT | FLOAT | |
DOUBLE | DOUBLE | |
DECIMAL(p ≤ 9, s=0) | INT32 | |
DECIMAL(p ≤ 9, s>0) | FLOAT | |
DECIMAL(p ≤ 18, s=0) | INT64 | |
DECIMAL(p ≤ 18, s>0) | DOUBLE | |
DECIMAL(p ≤ 76, s) | DOUBLE | |
DATE | DATE | |
TIME (ms) | INT32 | |
TIME (µs, ns) | INT64 | |
TIMESTAMP | TIMESTAMP | |
INT96 | TIMESTAMP | |
BINARY | STRING | |
STRING | STRING | |
ENUM | STRING | |
FIXED_LENGTH_BYTE_ARRAY | STRING | |
MAP | JSON | |
LIST | JSON | |
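The DECIMAL rows are the only ones where the default Propel type depends on parameters, so they can be easier to read as a function. The sketch below restates those rows in Python. The function name propel_type_for_decimal is hypothetical and not part of any Propel API, and the handling of out-of-range inputs is an assumption, since the table does not cover them.

```python
def propel_type_for_decimal(precision: int, scale: int) -> str:
    """Default Propel type for a Parquet DECIMAL(precision, scale).

    Restates the DECIMAL rows of the mapping table. Rejecting inputs
    outside the table's range is an assumption of this sketch.
    """
    if not 1 <= precision <= 76:
        raise ValueError("precision must be between 1 and 76")
    if scale < 0:
        raise ValueError("scale must not be negative")

    if precision <= 9:
        # Small decimals: integer when there is no fractional part.
        return "INT32" if scale == 0 else "FLOAT"
    if precision <= 18:
        return "INT64" if scale == 0 else "DOUBLE"
    # Precision 19 through 76 maps to DOUBLE regardless of scale.
    return "DOUBLE"
```

For example, propel_type_for_decimal(9, 2) gives FLOAT and propel_type_for_decimal(38, 0) gives DOUBLE. Because FLOAT and DOUBLE are approximate types, columns that need exact decimal values are a natural candidate for the custom mappings mentioned above.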