This guide covers how to:
  1. Configure the DynamoDB Stream
  2. Create a Propel DynamoDB Data Pool
  3. Create the Data Firehose stream to deliver events to Propel
  4. Transform DynamoDB events in Propel

Requirements


Step 1: Configure the DynamoDB Stream

1

Go to DynamoDB dashboard

Go to your DynamoDB dashboard in the AWS Console and click “Tables”. In the tables section, click the table you need to replicate to Propel.
Screenshot showing the DynamoDB tables list.
2

Configure DynamoDB Stream

Go to the “Exports and streams” tab and click “Turn on” under “Amazon Kinesis stream details”.
Screenshot showing the “Exports and streams” tab.
Choose an existing Kinesis data stream or create a new one.
Screenshot showing the Kinesis data stream selection.
3

Create or verify stream

If you need to create a new stream, click “Create New”.
Screenshot showing the “Create New” stream option.
Once turned on, you will see your stream with an “On” status.
Screenshot showing the stream with an “On” status.
You are all set for this step: all of your DynamoDB changes are now flowing into a Kinesis data stream.
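Each record on the Kinesis stream is a JSON document describing one table change. The sketch below shows roughly what that shape looks like; the table and attribute names are hypothetical, and attribute values keep their DynamoDB type descriptors such as “S” for string and “N” for number.

```python
import json

# A hypothetical change record, roughly in the shape DynamoDB
# writes to the Kinesis data stream. Names here are made up.
record = {
    "eventName": "MODIFY",  # INSERT, MODIFY, or REMOVE
    "tableName": "orders",
    "dynamodb": {
        "Keys": {"order_id": {"S": "order-123"}},
        "NewImage": {"order_id": {"S": "order-123"}, "quantity": {"N": "2"}},
        "OldImage": {"order_id": {"S": "order-123"}, "quantity": {"N": "1"}},
    },
}

payload = json.dumps(record)

# A downstream consumer can recover the event type and key like this:
event = json.loads(payload)
event_name = event["eventName"]
order_id = event["dynamodb"]["Keys"]["order_id"]["S"]
```

Note that the old image is only present for MODIFY and REMOVE events, since an INSERT has no prior state.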

Step 2: Create a Propel DynamoDB Data Pool

Next, we’ll need to create a DynamoDB Data Pool in Propel. This is the table where the data will be stored.
1

Create a Data Pool

Log in to the Propel Console, click “Data Pools” in the left menu, and then click “Create Data Pool”. Then, select “DynamoDB”.
Screenshot showing the DynamoDB Data Pool selection.
2

Define schema

In the “Define schema” section, leave the defaults. We want to land the data in its original DynamoDB change data format and transform it later using Materialized Views if necessary.
Screenshot showing the schema definition page.
3

Configure authentication

Enter a user and password for the Basic Authentication required by Amazon Data Firehose.
Screenshot showing the authentication configuration page.
4

Configure the table settings

Select whether your data is “Append-only” or “Mutable data”. For collecting change data capture events, select “Append-only data”. To learn more, read our guide on Selecting table engine and sorting key.
Selecting table settings for a DynamoDB Data Pool
Answer the questions in the wizard to complete the setup.
Completing the table settings wizard for a DynamoDB Data Pool
Confirm your table settings and click “Continue”.
Confirming table settings for a DynamoDB Data Pool
5

Name your Data Pool

Name your Data Pool and give it a description.
Screenshot showing the Data Pool naming page.
6

Get the Data Pool's HTTP endpoint

You’re done! You’ve created a DynamoDB Data Pool. Note the Amazon Data Firehose HTTP endpoint and the X-Amz-Firehose-Access-Key; you’ll need them to configure Amazon Data Firehose.
Screenshot showing the Data Pool URL.

Step 3: Create the Data Firehose stream to deliver events to Propel

Once you have created the DynamoDB Data Pool, we’ll create an Amazon Data Firehose stream that takes events from the Kinesis data stream and sends them to Propel via HTTP POST requests.
1

Go to Amazon Data Firehose console

Navigate to the “Amazon Data Firehose” console and click on “Create Firehose stream”.
Screenshot showing the Amazon Data Firehose console.
2

Select source and destination

Select “Kinesis Data streams” as the source and “HTTP Endpoint” as the destination.
Screenshot showing the source and destination selection.
3

Configure the source

In the “Source settings,” click “Browse.”
Screenshot showing the source settings.
Select the Kinesis data stream from your DynamoDB table.
Screenshot showing the Kinesis data stream selection.
4

Name your stream

Give your stream a name for future reference.
Screenshot showing the stream name.
5

Configure the destination

We’ll need the information about the Propel Data Pool we created earlier.
Screenshot showing the Propel Data Pool URL.
Enter the destination details:
  • “HTTP endpoint name” - Name your endpoint (e.g. “Propel Orders”)
  • “HTTP endpoint URL” - Enter the DynamoDB Data Pool HTTP endpoint URL.
  • “Access key” - Enter the X-Amz-Firehose-Access-Key key that you’ll find under the Data Pool’s HTTP endpoint in the Propel Console.
  • “Retry duration” - Set to “7200” (seconds)
  • “Content encoding” - Set to “Not enabled”
Screenshot showing the Propel Data Pool URL.
  • “Buffer hints” - Set Buffer size to 1MiB
Screenshot showing the destination settings.
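For context, Amazon Data Firehose delivers batches to an HTTP endpoint as a JSON body of base64-encoded records, passing the access key in the X-Amz-Firehose-Access-Key header. A minimal sketch of that request shape, with made-up values:

```python
import base64
import json

# One DynamoDB change event, base64-encoded the way Firehose
# packs each record into the request body.
event = {"eventName": "INSERT", "tableName": "orders"}
encoded = base64.b64encode(json.dumps(event).encode()).decode()

# Rough shape of the HTTP POST body Firehose sends to the endpoint.
body = {
    "requestId": "hypothetical-request-id",
    "timestamp": 1715200000000,
    "records": [{"data": encoded}],
}

# The access key travels in a request header:
headers = {
    "Content-Type": "application/json",
    "X-Amz-Firehose-Access-Key": "<your-access-key>",
}

# The receiving side decodes each record back to the original event.
decoded = json.loads(base64.b64decode(body["records"][0]["data"]))
```

This is why the buffer size matters: Firehose accumulates records up to the buffer size (or interval) before sending one batched request.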
6

Configure backup S3 bucket

Set up an S3 bucket for failed deliveries under “Backup settings”.
Screenshot showing the backup S3 bucket configuration.
7

Create the stream

Click “Create Firehose stream” to complete the setup.
1

Monitor for delivery failures

If everything is set up correctly, you should not see any deliveries to the S3 bucket.
Screenshot showing no deliveries to the S3 bucket.
2

Preview data

Once items are inserted into your DynamoDB table, they stream directly to Propel. By going to your Data Pool and clicking “Preview Data”, you’ll be able to see the records as they land.
Screenshot showing the data preview in a DynamoDB Data Pool.

Step 4: Transform DynamoDB events in Propel

Now that we are collecting DynamoDB change data capture events in Propel, we can transform them into an entity table. The following is an example SQL query that flattens the DynamoDB event into an entity table.
SELECT
  -- The unique identifier for the entity
  keys.order_id.S AS order_id, -- DynamoDB partition key
  parseDateTimeBestEffort(keys.timestamp.S) AS timestamp, -- DynamoDB sort key

  -- Save the deleted flag
  if(event_name = 'REMOVE', true, false) AS deleted,

  -- All the columns you want to flatten, for example:
  toString(new_image.taqueria.S) AS taqueria,
  toInt32OrNull(new_image.quantity.N) AS quantity,
  toFloat64OrNull(new_image.total_price.S) AS total_price,
  new_image.orderItems AS order_items,

  -- Always save the new and old image
  new_image,
  old_image
FROM my_dynamo_orders_change_events
Replace my_dynamo_orders_change_events with your table’s name and customize the columns to your needs. You can test the SQL query in the “SQL Console” to make sure it works for your events and make any necessary changes. Once you’ve tested the query, you can create a Materialized View.
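For intuition, here is the flattening the SQL performs, applied to a single event in plain Python. The event shape and column names follow the example above; this is illustrative only, not part of Propel’s API.

```python
from datetime import datetime

# A single change event in the shape the SQL reads. Type
# descriptors like "S" and "N" come from DynamoDB's JSON format.
event = {
    "event_name": "REMOVE",
    "keys": {
        "order_id": {"S": "order-123"},
        "timestamp": {"S": "2024-05-08 13:20:37"},
    },
    "new_image": {
        "taqueria": {"S": "El Buen Sabor"},
        "quantity": {"N": "2"},
        "total_price": {"S": "25.50"},
    },
}

# Flatten the nested DynamoDB structure into one row per entity.
flattened = {
    "order_id": event["keys"]["order_id"]["S"],
    "timestamp": datetime.fromisoformat(event["keys"]["timestamp"]["S"]),
    "deleted": event["event_name"] == "REMOVE",
    "taqueria": event["new_image"]["taqueria"]["S"],
    "quantity": int(event["new_image"]["quantity"]["N"]),
    "total_price": float(event["new_image"]["total_price"]["S"]),
}
```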
  1. Go to the “Materialized Views” section in the Console and click “Create Materialized View”.
  2. Enter the SQL above in the SQL to define the Materialized View.
  3. Select “New Data Pool”.
  4. Give your new Data Pool a name, for example, dynamodb_<TABLE_NAME>. Replace <TABLE_NAME> with the name of your table.
  5. Select “Mutable records”, as the records are updatable.
  6. Answer the following questions to set the Table settings.
    1. “Which columns uniquely identify your records?”
    Select the columns that uniquely identify a record. In DynamoDB, the combination of the partition key and the sort key uniquely identifies a record. In the example above, order_id uniquely identifies a record.
    2. “Will you query your data by time?”
    Select the timestamp column.
    3. “Are there columns you always filter by?”
    For small tables (under 100M records), leave empty. For very large tables, see the Handling updates in large tables guide.
    4. “Is there an updated_at or version column?”
    Leave empty. New records that arrive will replace old ones.
  7. Confirm the table settings. You should see:
    1. Table engine: REPLACING_MERGE_TREE
    2. Sorting key: order_id
    3. Default timestamp: timestamp
  8. Select “Backfill” to ensure that existing records get copied over.
  9. Give your Materialized View a name and description, for example, flattened_dynamodb_<TABLE_NAME>_materialized_view.
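The table settings above can be pictured as a last-write-wins upsert keyed by the sorting key. Below is a simplified sketch of how records with the same order_id collapse; this is an analogy, not how ClickHouse actually implements REPLACING_MERGE_TREE.

```python
# Simplified last-write-wins dedup keyed by the sorting key,
# mimicking how a REPLACING_MERGE_TREE table collapses records.
incoming = [
    {"order_id": "order-1", "quantity": 1, "deleted": False},
    {"order_id": "order-2", "quantity": 3, "deleted": False},
    {"order_id": "order-1", "quantity": 2, "deleted": True},  # later version wins
]

table = {}
for row in incoming:
    table[row["order_id"]] = row  # the newest record replaces the old one

current = list(table.values())  # one row per order_id
```

This is why leaving the “updated_at or version column” question empty is fine here: arrival order alone decides which record survives.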
Now that you have created a Data Pool for your table, you can go to the “Data Pools” section and click on your Data Pool. You can click on “Preview Data” to see your transformed data.