Amazon S3
This Data Source enables you to use Parquet files stored in your Amazon S3 bucket with Propel.
To set up an S3 Data Source, you provide AWS credentials for connecting to the S3 bucket where you store the Parquet files of table data you want to use with Propel.
Each table in an Amazon S3 Data Source requires a name and a path. The path specifies where in your bucket the Parquet files for that table reside.
An S3 Data Source can contain one or many tables. Propel determines which Parquet files belong to which tables using the tables' S3 paths. For example, your S3 bucket might contain Parquet files representing sales data and signup data under two different paths:
s3://your-bucket
├── sales
│   ├── metadata.txt
│   ├── sales_1.parquet
│   ├── sales_2.parquet
│   └── sales_3.parquet
└── signups
    ├── metadata.txt
    ├── signups_1.parquet
    ├── signups_2.parquet
    └── signups_3.parquet
A single S3 Data Source can contain a table for each by specifying two different S3 paths:
sales/**/*.parquet
signups/**/*.parquet
Notice that the S3 paths above match only Parquet files, because they end in the *.parquet wildcard pattern. This is important because we don't want to attempt to sync non-Parquet files, like metadata.txt.
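To make the path matching concrete, here is a minimal Python sketch of how object keys could be assigned to tables. It is not Propel's implementation. It assumes a common glob convention in which `**/` spans zero or more directories and `*` never crosses a `/`. The names `glob_to_regex`, `TABLE_PATHS`, and `table_for_key` are hypothetical.

```python
import re
from typing import Optional


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a glob into a regex.

    Assumed convention (not confirmed by Propel): '**/' matches zero or
    more directories, and '*' matches any characters except '/'.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append(r"(?:[^/]+/)*")
            i += 3
        elif pattern[i] == "*":
            parts.append(r"[^/]*")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


# Hypothetical table-to-path mapping mirroring the example bucket.
TABLE_PATHS = {
    "sales": "sales/**/*.parquet",
    "signups": "signups/**/*.parquet",
}


def table_for_key(key: str) -> Optional[str]:
    """Return the table whose path matches the object key, if any."""
    for table, pattern in TABLE_PATHS.items():
        if glob_to_regex(pattern).fullmatch(key):
            return table
    return None
```

Under these assumptions, `table_for_key("sales/sales_1.parquet")` returns `"sales"`, while `table_for_key("sales/metadata.txt")` returns `None`, so the metadata file is never considered for syncing.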
Setup guide
Follow our step-by-step Amazon S3 setup guide to connect your Amazon S3 bucket to Propel.
Supported Sync behaviors
- Approximately every minute, Propel checks the Amazon S3 bucket for new files at the tables' specified paths.
- Append-only: Syncs data from new files only. Updated or deleted Parquet files are ignored. Append is suitable for immutable event data.
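The append-only behavior can be pictured with a small Python sketch. It is an illustration only and assumes files are identified by their object key. How Propel actually tracks synced files is not described here, and `select_new_files` is a hypothetical helper.

```python
from typing import Iterable, List, Set


def select_new_files(listed_keys: Iterable[str], synced_keys: Set[str]) -> List[str]:
    """Return keys from the current listing that were not synced before.

    Keys already synced are skipped even if the file changed, and keys
    that disappeared from the listing are simply not returned.
    """
    return sorted(set(listed_keys) - synced_keys)


synced: Set[str] = set()

# First poll: both files are new.
first_poll = ["sales/sales_1.parquet", "sales/sales_2.parquet"]
first_batch = select_new_files(first_poll, synced)
synced.update(first_batch)

# Later poll: sales_1 was overwritten, sales_2 was deleted, sales_3 is new.
second_poll = ["sales/sales_1.parquet", "sales/sales_3.parquet"]
second_batch = select_new_files(second_poll, synced)
synced.update(second_batch)
```

In this sketch the second poll yields only `sales/sales_3.parquet`: the overwritten and deleted files have no effect, which is why this mode suits immutable event data.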
Usage
Once you have created your Amazon S3 Data Source, you can create Data Pools from its tables:
- Click on "Data Pools" on the left-hand side menu.
- Click on the "New Data Pool" button.
- Enter a name and description for your Data Pool.
- Select your Amazon S3 Data Source.
- Click the "Create" button.
After you create your Data Pool, you can define Metrics with your Data Pool's data.
API reference documentation
Below is the relevant API documentation for the S3 Data Source.
Objects
Enums
Inputs
- S3 Connection Settings Input
- Partial S3 Connection Settings Input
- Create Amazon S3 Data Source Input
- Modify Amazon S3 Data Source Input
Queries
Mutations
- Create Amazon S3 Data Source
- Modify Amazon S3 Data Source
- Delete Data Source by ID
- Delete Data Source by unique name
Limits
No limits at this point.