
How to set up an Amazon S3 Data Source

This step-by-step guide explains how to connect Propel to your Amazon S3 bucket.

It covers how to grant Propel access to your Amazon S3 bucket, how to create the Data Source in the Propel Console, and how to verify that it is working.

  1. Configure an Amazon S3 bucket for Propel to access
  2. Create the Data Source in the Propel Console

Requirements

  • You have a Propel account.
  • You have an AWS account.
  • You have an Amazon S3 bucket configured with at least one Parquet file at a specific path. The Parquet file(s) contain data you intend to operationalize for use in Propel.

Step 1: Create an IAM policy and user to allow the Propel Data Source to read from the Amazon S3 bucket

1. Create a new IAM policy

As a first step, create an AWS IAM policy that allows two kinds of actions: listing the bucket contents and retrieving individual files.

Create an IAM policy in your AWS account using the AWS Console or CLI.

Replace <YOUR_BUCKET_NAME> with the name of your bucket.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:List*",
        "s3:Get*"
      ],
      "Resource": [
        "arn:aws:s3:::<YOUR_BUCKET_NAME>",
        "arn:aws:s3:::<YOUR_BUCKET_NAME>/*"
      ]
    }
  ]
}
Important

We recommend that you configure the policy to be as specific as possible. For example, if all of your Parquet files are written to a specific location within the bucket, then grant access to that path only. The above example policy grants access to all files in your named Amazon S3 bucket.
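One way to follow this recommendation is to narrow the object-level resource to the prefix that holds your Parquet files. The sketch below builds such a policy document in Python using only the standard library. The function name `build_read_policy` and the prefix `exports/parquet` are illustrative assumptions, not part of Propel or AWS. Note that bucket listing stays bucket-wide in this sketch; only object reads are limited to the prefix.

```python
import json


def build_read_policy(bucket: str, prefix: str = "") -> dict:
    """Return an IAM policy document allowing list/read access to a bucket.

    `prefix` is a hypothetical key prefix such as "exports/parquet". When it
    is empty, the result is the same bucket-wide policy shown in this guide.
    """
    bucket_arn = f"arn:aws:s3:::{bucket}"
    cleaned = prefix.strip("/")
    # Limit object reads to the prefix when one is given.
    object_arn = f"{bucket_arn}/{cleaned}/*" if cleaned else f"{bucket_arn}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:List*", "s3:Get*"],
                "Resource": [bucket_arn, object_arn],
            }
        ],
    }


def to_json(policy: dict) -> str:
    """Serialize the policy for pasting into the AWS Console or CLI."""
    return json.dumps(policy, indent=2)
```

For example, `to_json(build_read_policy("tacosoft-sample-data", "exports/parquet"))` produces a document whose second resource is `arn:aws:s3:::tacosoft-sample-data/exports/parquet/*`.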

With the IAM policy created, you can create a specific AWS IAM user to access your Amazon S3 bucket based on the above policy.

2. Create a new IAM user

Create an IAM user in your AWS account using the AWS Console or CLI.

3. Connect the IAM policy created above directly to the IAM user

Attach the policy you created to the user.

4. Create an access key for your IAM user

Once the user is created, create a new Access Key. Note the Secret Access Key before closing the dialog, because it is not displayed again once the dialog is closed.

You will use the Access Key and the Secret Access Key in Step 2.
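If you prefer to script the four sub-steps above rather than use the AWS Console, they could look roughly like the following sketch. It assumes the `boto3` package and an IAM client created with `boto3.client("iam")` under credentials that are allowed to manage IAM. The function name `provision_reader` and the user and policy names in the usage comment are hypothetical.

```python
import json


def provision_reader(iam, bucket: str, user_name: str, policy_name: str) -> dict:
    """Create the read policy and user, attach them, and mint an access key.

    `iam` is expected to be a boto3 IAM client. The secret access key is only
    returned once by AWS, so store the result securely.
    """
    policy_document = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:List*", "s3:Get*"],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
            }
        ],
    }
    # 1. Create the IAM policy.
    policy = iam.create_policy(
        PolicyName=policy_name,
        PolicyDocument=json.dumps(policy_document),
    )
    # 2. Create the IAM user.
    iam.create_user(UserName=user_name)
    # 3. Attach the policy directly to the user.
    iam.attach_user_policy(UserName=user_name, PolicyArn=policy["Policy"]["Arn"])
    # 4. Create an access key for the user.
    key = iam.create_access_key(UserName=user_name)["AccessKey"]
    return {
        "access_key_id": key["AccessKeyId"],
        "secret_access_key": key["SecretAccessKey"],
    }


# Usage (requires `pip install boto3`; names below are placeholders):
#   import boto3
#   creds = provision_reader(
#       boto3.client("iam"), "<YOUR_BUCKET_NAME>", "propel-s3-reader", "propel-s3-read"
#   )
```

The client is passed in as a parameter so the function can be exercised with a stub before it is run against a real AWS account.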

Step 2: Create the Data Source in the Propel Console

Now that you have configured your AWS IAM user for access, you can create an Amazon S3 Data Source in the Propel Console.
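Before entering the credentials in the Console, it can help to confirm that the new key can actually list the bucket and read a file. The following is a minimal sketch, not a Propel tool: it assumes a `boto3` S3 client built with the new access key, and the function name `check_read_access` is hypothetical. It relies on the fact that unencrypted Parquet files begin with the 4-byte magic `PAR1`.

```python
def check_read_access(s3, bucket: str, prefix: str = "") -> str:
    """Return the key of the first readable Parquet object under `prefix`.

    `s3` is expected to be a boto3 S3 client. Only the first page of the
    listing (up to 1000 keys) is inspected. Permission problems surface as
    exceptions raised by the client itself.
    """
    # Exercises the "list" permission.
    listing = s3.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1000)
    keys = [
        obj["Key"]
        for obj in listing.get("Contents", [])
        if obj["Key"].endswith(".parquet")
    ]
    if not keys:
        raise RuntimeError(f"No .parquet objects found under {bucket}/{prefix}")

    # Exercises the "get" permission by reading only the first four bytes.
    key = keys[0]
    head = s3.get_object(Bucket=bucket, Key=key, Range="bytes=0-3")["Body"].read()
    if head != b"PAR1":
        raise RuntimeError(f"{key} does not start with the Parquet magic bytes")
    return key


# Usage (requires `pip install boto3`; values below are placeholders):
#   import boto3
#   s3 = boto3.client(
#       "s3",
#       aws_access_key_id="<ACCESS_KEY_ID>",
#       aws_secret_access_key="<SECRET_ACCESS_KEY>",
#   )
#   print(check_read_access(s3, "<YOUR_BUCKET_NAME>", "path/to/your/files/"))
```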

1. Go to the Data Sources section and click "Create"

In the left-hand menu, click "Data Sources", then "Create Data Source". Then select the "Amazon S3" Data Source type.

2. Enter a unique name and description

Enter a name and description for your Data Source. The name must be unique within the environment.

3. Enter your Amazon S3 connection details

  1. S3 bucket name. For example: tacosoft-sample-data. Do not include the "s3://".
  2. The Access key and secret access key you created above.

Then click "Next".

Propel Data Source creation: Entering Amazon S3 bucket name and credentials.
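If the bucket name is copied from a script or an `s3://` URI, a small helper can strip the scheme before you paste it into the form. This is an illustrative sketch; the function name `normalize_bucket_name` is an assumption. The check encodes the general S3 naming rules (3 to 63 characters; lowercase letters, digits, dots, and hyphens; starting and ending with a letter or digit), not anything specific to Propel.

```python
import re

# General S3 bucket naming shape: 3-63 chars, lowercase alphanumerics,
# dots and hyphens, beginning and ending with a letter or digit.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def normalize_bucket_name(value: str) -> str:
    """Strip an optional "s3://" scheme and any trailing path from `value`."""
    name = value.strip()
    if name.startswith("s3://"):
        name = name[len("s3://"):]
    # Keep only the bucket part; any key path belongs in the table path field.
    name = name.split("/", 1)[0]
    if not _BUCKET_RE.fullmatch(name):
        raise ValueError(f"Not a valid S3 bucket name: {name!r}")
    return name
```

For example, `normalize_bucket_name("s3://tacosoft-sample-data/")` returns `tacosoft-sample-data`.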

4. Enter a table name

This can be anything you want, but it should make sense according to the data in your Parquet files. After you create the S3 Data Source, you can define more tables for different schemas.

5. Enter a path to the location of the Parquet files that constitute your table

For example: path/to/your/files/**/*.parquet. If left blank, the entire bucket will be scanned.

Important

Specifying a path is strongly recommended, especially as the number of Parquet files in your S3 bucket increases. By specifying a path, Propel can skip over directories it doesn't need to read, thereby speeding up the syncing process.
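To preview which object keys a path pattern would select, you could test it locally against a list of keys. The sketch below assumes common glob semantics: `*` matches within a single path segment, `?` matches one character, and `**/` matches zero or more directories. Propel's exact matching rules are not described in this guide and may differ, so treat this as an approximation. The function names are hypothetical.

```python
import re


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a glob pattern into a compiled regex (assumed semantics)."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:[^/]+/)*")  # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")  # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")  # anything within one path segment
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")  # a single non-slash character
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def matching_keys(pattern: str, keys: list) -> list:
    """Return the keys that fully match `pattern`, preserving input order."""
    regex = glob_to_regex(pattern)
    return [key for key in keys if regex.fullmatch(key)]
```

Under these assumptions, the pattern `path/to/your/files/**/*.parquet` selects both `path/to/your/files/a.parquet` and `path/to/your/files/2024/01/b.parquet`, and skips keys outside that directory or with a different extension.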


6. Define table schema

There are two ways to define the schema for the table.

  • Drag and drop a Parquet file from your filesystem. The Parquet file's schema should match the Parquet files located in your S3 bucket. Propel will scan the file and generate the schema based on the data.
Important

Propel does not upload the file. It is scanned locally in the browser, solely to discover the schema.

  • Alternatively, you can define the schema manually by specifying the name, type, and nullability of each column you want to import.


Once the schema is defined, click "Next".

If successful, you'll see a green "Data Source created" notification along with the S3 Data Source details.

7. Verify your setup ✅

To verify your setup, click on the "Table details" button, where you can view the S3 bucket details and table schema for the imported table.

Now that you have created your Amazon S3 Data Source, you can proceed to create a Data Pool.