
Amazon S3

This Data Source enables you to use Parquet files stored in your Amazon S3 bucket with Propel.

To set up an S3 Data Source, you provide AWS credentials for connecting to the S3 bucket where you store the Parquet files containing your table data.

Each table in an Amazon S3 Data Source requires a name and an S3 path. The path specifies where in your bucket the table's Parquet files reside.

An S3 Data Source can contain one or many tables. Propel determines which Parquet files belong to which tables using the tables' S3 paths. For example, your S3 bucket might contain Parquet files representing sales data and signup data under two different paths:

s3://your-bucket
├── sales
│   ├── metadata.txt
│   ├── sales_1.parquet
│   ├── sales_2.parquet
│   └── sales_3.parquet
└── signups
    ├── metadata.txt
    ├── signups_1.parquet
    ├── signups_2.parquet
    └── signups_3.parquet

A single S3 Data Source can contain a table for each dataset by specifying a different S3 path for each table:

sales/**/*.parquet
signups/**/*.parquet

Tip

Notice that the S3 paths above match only Parquet files, using the *.parquet wildcard pattern. This is important because Propel should not attempt to sync non-Parquet files, such as metadata.txt.
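The following TypeScript sketch illustrates how a set of table paths can partition the example bucket into tables while skipping metadata.txt. It assumes standard "globstar" semantics (`*` matches within a single path segment, `**/` matches zero or more directories). The `TableConfig` shape and the `globToRegExp` and `assignFilesToTables` functions are hypothetical and are not part of Propel's API; Propel's own matcher may behave differently.

```typescript
// Hypothetical sketch: assign S3 object keys to tables by glob path.
// Assumes standard globstar semantics; Propel's matcher may differ.

interface TableConfig {
  name: string; // table name (hypothetical field)
  path: string; // glob relative to the bucket root, e.g. "sales/**/*.parquet"
}

// Compile a glob pattern into an anchored regular expression.
function globToRegExp(glob: string): RegExp {
  let re = "";
  for (let i = 0; i < glob.length; i++) {
    const c = glob[i];
    if (c === "*" && glob[i + 1] === "*" && glob[i + 2] === "/") {
      re += "(?:.*/)?"; // "**/": zero or more directories
      i += 2;
    } else if (c === "*" && glob[i + 1] === "*") {
      re += ".*"; // "**" elsewhere: any characters, including "/"
      i += 1;
    } else if (c === "*") {
      re += "[^/]*"; // "*": any characters within one path segment
    } else if (c === "?") {
      re += "[^/]"; // "?": exactly one character other than "/"
    } else {
      re += c.replace(/[.+^${}()|[\]\\]/g, "\\$&"); // escape "." and friends
    }
  }
  return new RegExp(`^${re}$`);
}

// Group keys by the first table whose path matches. Keys that match
// no table (such as metadata.txt) are skipped.
function assignFilesToTables(
  keys: string[],
  tables: TableConfig[],
): Map<string, string[]> {
  const compiled = tables.map((t) => ({ name: t.name, re: globToRegExp(t.path) }));
  const result = new Map<string, string[]>(
    tables.map((t): [string, string[]] => [t.name, []]),
  );
  for (const key of keys) {
    const table = compiled.find((t) => t.re.test(key));
    if (table) result.get(table.name)!.push(key);
  }
  return result;
}

// Usage with the example bucket layout: both metadata.txt keys are skipped.
const byTable = assignFilesToTables(
  [
    "sales/metadata.txt",
    "sales/sales_1.parquet",
    "signups/metadata.txt",
    "signups/signups_1.parquet",
  ],
  [
    { name: "sales", path: "sales/**/*.parquet" },
    { name: "signups", path: "signups/**/*.parquet" },
  ],
);
console.log(byTable);
```

Because `**/` also matches zero directories, `sales/**/*.parquet` picks up files directly under `sales/` as well as files in nested folders such as `sales/2024/`.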

Setup guide

Follow our step-by-step Amazon S3 setup guide to connect your Amazon S3 bucket to Propel.

Supported Sync behaviors

  • Approximately every minute, Propel checks the Amazon S3 bucket for new files at the tables' specified paths.
  • Append-only: Propel syncs data from new files only; updates to or deletions of already-synced Parquet files are ignored. This makes append-only sync suitable for immutable event data.
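To make the append-only behavior concrete, here is a hypothetical TypeScript model of it. `AppendOnlySync` and its `poll` method are illustrative names, not Propel APIs. The sketch assumes a file is identified by its object key and that each poll (roughly once a minute, per the schedule above) receives the keys currently matching a table's path.

```typescript
// Hypothetical model of append-only sync (not Propel's implementation).
// Files are identified by object key; only keys never seen before are synced.
class AppendOnlySync {
  private readonly synced = new Set<string>();

  // Called on each poll with the keys currently matching a table's path.
  // Returns the keys to ingest on this poll.
  poll(currentKeys: string[]): string[] {
    const newKeys: string[] = [];
    for (const key of currentKeys) {
      // A previously synced key is skipped, even if the file was overwritten.
      if (!this.synced.has(key)) {
        this.synced.add(key);
        newKeys.push(key);
      }
    }
    // Keys absent from currentKeys (deleted files) are simply ignored.
    return newKeys;
  }
}

const sync = new AppendOnlySync();
sync.poll(["sales/sales_1.parquet", "sales/sales_2.parquet"]); // both are new
sync.poll(["sales/sales_2.parquet", "sales/sales_3.parquet"]); // only sales_3 is new
```

Keying on object names alone is what makes this model append-only: overwriting an existing file keeps its key, so it is never re-synced, and a missing key triggers nothing.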

Usage

Once you create your Amazon S3 Data Source, you can create Data Pools from its tables:

  1. Click "Data Pools" in the left-hand menu.
  2. Click the "New Data Pool" button.
  3. Enter a name and description for your Data Pool.
  4. Select your Amazon S3 Data Source.
  5. Click the "Create" button.

An animated screen capture of how to use an Amazon S3 Data Source to create a Data Pool.

After you create your Data Pool, you can define Metrics on its data.


API reference documentation

Below is the relevant API documentation for the S3 Data Source.

Objects

Enums

Inputs

Queries

Mutations

Limits

No limits at this point.