How to get your data from an AWS RDS database into Snowflake

In this article, you'll learn how to move data from Amazon RDS into Snowflake so that it can be used for analytics.


Amazon Relational Database Service (Amazon RDS) is a popular option for storing data, since you can use it to easily set up, operate, and scale multiple databases in the cloud. When you want to analyze that data at scale, though, you'll typically need to move it to a data warehouse for processing.

One solution is to use Snowflake, a software-as-a-service (SaaS) platform that can quickly and flexibly process all types of data, from unstructured to streaming.

Why a data warehouse?

A data warehouse creates a central repository of information that can be used in analyzing your data. This is helpful for multiple use cases, including the following:

  • Customer sales data: Analyzing customer sales data helps you to get insights into your customers, such as their average purchase amount, their lifetime value, or their level of satisfaction, so that you can improve your sales results.
  • Retail transactions: Retail transaction analysis can enable you to project revenue and calculate profits, allowing you to be more selective in your marketing efforts. You can then maximize the returns on your marketing investment.
  • Log data: You can analyze batch or streaming log data generated by your applications, network traffic, or access logs. This is helpful if, for example, you must meet regulatory requirements to keep logs. The analysis can help you identify your most active users, the most common actions performed, and any possible threats, giving you visibility into your entire ecosystem.

How Does Snowflake Work?

Snowflake is a data platform provided as a cloud service. This means that you have no hardware or software to manage, because Snowflake handles maintenance, management, and upgrades. It also meets regulatory compliance to ensure data integrity, security, and governance. 

Why Use Snowflake?

In addition to its data storage and analysis capabilities, Snowflake offers other benefits:

  • It's cloud agnostic, with unlimited, seamless scalability across AWS, GCP, or Microsoft Azure.
  • It provides complete ANSI SQL language support.
  • It's not built on any existing database technology or big data software platforms such as Hadoop.

To see this in action, you're going to move data from an RDS account into Snowflake. 

Prerequisites

You'll need the following for this tutorial:

  • An AWS account with access to Amazon RDS and Amazon S3
  • A Snowflake account
  • pgAdmin and the SnowSQL CLI installed locally
  • The Northwind sample database SQL script to load into PostgreSQL

Setting Up an Amazon RDS Database

Log in to the Amazon RDS console and click "Create database":

RDS Console

Select the "Standard create" option as the database creation method and "PostgreSQL" as the engine type. Then, select the latest version, which is 14.4-R1 at the time of this article:

Create database

Under "Templates", select "Free tier" if you have that option. If not, select "Dev/Test".

Select free tier

In "Settings", put 'rds-to-snowflake' as the DB instance identifier, keep 'postgres' as the default master username, and set 'RDS-sn0wflake' as the master password. With configuration complete, it will look like this:

Postgres configuration

Keep everything else as default except "Public access" under the connectivity configurations. Make sure you select "Yes"; otherwise, you'll have to set up Amazon EC2 instances to connect to the database:

Enable public access

Scroll down to "Additional configuration". You need to specify an initial database name under "Database options" to create the database. Since you'll be using the Northwind sample database, name this database 'northwind':

Set database name

Check your estimated monthly costs to ensure that running this service is within your budget, then click "Create database":

Check estimated monthly costs

You'll get a confirmation message:

Creating database

After the database is created, make sure its status is Available so you can connect to the instance:

Database successfully created

Click the button in the top right corner to view the connection details. Confirm the master username and master password that you set up, and keep the password and endpoint URL in a safe place:

Connection details

Connecting and Loading Data

With the database set up, you need to connect to it and load your sample data. You'll be using pgAdmin, a graphical user interface (GUI) tool for working with both local and remote PostgreSQL databases. Install it and then open it:

Opening pgAdmin

Once it opens up, right-click "Servers" and then select "Register > Server":

Register server

Add the name of the server on the "General" tab. This tutorial uses 'RDS-Snowflake':

Set up server name

On the "Connection" tab, for the host name/address, put in the endpoint URL that you got when you set up the RDS database. Make sure to put in the correct port and username, and add the password. This configuration looks as follows:

Set up server hostname

Click "Save". This will save the server details and connect to your RDS Postgres instance. You should now have 'RDS-Snowflake' under Servers:

Connection successful

Expand the 'RDS-Snowflake' server on the left and then expand the list of databases. You should see your 'northwind' database:

Northwind database

Open the query tool and run the following SQL statement to check if you have any tables in the database:

select count(*)
from information_schema.tables
where table_schema = 'public';

It should return a count of 0:

Number of tables verification

To load the database with your sample data, copy the Northwind database SQL and paste it into the query window:

Loading Northwind sample data

Run it. If all goes well, you should get a "query returned successfully" message and there should now be eight tables available. Refresh the database if the tables aren't showing.

Query returned successfully

Run the following query to select all the employees and verify that you have data in the database:

select * from employees;

You should get a response with ten employee details:

Select all employees

You now have an RDS database with sample data. Next, you'll move it into Snowflake.

Exporting Data from RDS

You can export the data either manually or by using an extract, transform, load (ETL) tool like Airbyte, Fivetran, or Hevo Data. The following are the steps for both methods.

Manual Export

This section will walk you through steps to manually export your data:

  • Creating an S3 bucket to store the database snapshot.
  • Taking a snapshot of the database.
  • Granting Amazon RDS access to your S3 bucket.
  • Exporting the snapshot.
  • Loading the data.

Creating the Bucket and Taking a Snapshot

To create the S3 bucket, go to the Amazon S3 console and click "Create bucket". Name your bucket 'rds-to-snowflake' and choose the AWS region that has your RDS instance. Keep all the other options as default and click "Create bucket" at the bottom to create the bucket:

Create S3 bucket

To take a snapshot, you can use the AWS CLI or the RDS API, but this tutorial uses the console. Log in to the RDS console and select your 'rds-to-snowflake' instance, then click "Actions > Take snapshot":

Take snapshot

Name your snapshot 'northwind-default-data' and click "Take snapshot":

Set snapshot name

When the snapshot is created, the status will be Available and it will be listed under "RDS > Snapshots":

Snapshot
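As mentioned, you could also take the snapshot through the RDS API instead of the console. Below is a minimal boto3 sketch; the API call itself is commented out since it requires live AWS credentials, and the identifiers mirror the ones used in this tutorial:

```python
# Parameters for taking the same snapshot through the RDS API.
# These identifiers mirror the ones used in this tutorial.
snapshot_params = {
    "DBSnapshotIdentifier": "northwind-default-data",
    "DBInstanceIdentifier": "rds-to-snowflake",
}

# With AWS credentials configured, the call would be:
# import boto3
# rds = boto3.client("rds", region_name="us-east-1")
# rds.create_db_snapshot(**snapshot_params)
print(snapshot_params["DBSnapshotIdentifier"])
```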

Allowing Access to S3 Buckets

Next, you'll need to create an IAM role and attach a policy that will allow Amazon RDS to access your Amazon S3 buckets. Go to the IAM console, select "Policies", then click "Create policy":

Create policy

Select the "JSON" tab and paste into it the following policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ExportPolicy",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject*",
                "s3:ListBucket",
                "s3:GetObject*",
                "s3:DeleteObject*",
                "s3:GetBucketLocation"
            ],
            "Resource": [
                "arn:aws:s3:::rds-to-snowflake",
                "arn:aws:s3:::rds-to-snowflake/*"
            ]
        }
    ]
}

If you named your bucket something other than 'rds-to-snowflake', make sure to replace both entries in the 'Resource' list with the name of your S3 bucket.
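If you script your setup, a small helper can render this policy for any bucket name instead of hand-editing the JSON. The helper below is hypothetical, not part of any AWS tooling:

```python
import json

def render_export_policy(bucket: str) -> str:
    """Render the RDS snapshot-export IAM policy for a given S3 bucket."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ExportPolicy",
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject*",
                    "s3:ListBucket",
                    "s3:GetObject*",
                    "s3:DeleteObject*",
                    "s3:GetBucketLocation",
                ],
                # Grant access to the bucket itself and to its objects.
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
            }
        ],
    }
    return json.dumps(policy, indent=4)

print(render_export_policy("rds-to-snowflake"))
```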

Click "Next: Tags" to go to the tags page. You're not going to add any tags, so click "Next: Review" to review your policy. Name it `ExportPolicy` and give it a description, then click "Create policy":

Review policy

You should get a confirmation that the policy was created successfully:

Policy created successfully

Now, you need to create a role and attach this policy. Go to the IAM console, select "Roles", and click "Create role".

Select "Custom trust policy":

Create trusted entity role

In the space underneath, paste in the following JSON:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "export.rds.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}

Click "Next" to go to the permissions. Select the `ExportPolicy` permissions you created to add them to the role, then click "Next" to review your role:

Export policy permissions

Name the role 'RDS-S3-Export-Role' and give it a description, then click "Create role":

Role name

You should get a confirmation that the role was created successfully:

Role created successfully

With your role ready, you can export the snapshot you created to an S3 bucket. You'll use the 'rds-to-snowflake' bucket you specified in the IAM policy.

Exporting the Snapshot

Go to "RDS > Snapshots" and select the snapshot that you want to export. Then, click "Actions > Export to Amazon S3":

Export to Amazon S3

When the "Export to Amazon S3" dialog shows, put in the following configurations, including 'northwind-export' as the export identifier and the appropriate S3 bucket and IAM role:

Export configurations

Under "Encryption", select a KMS key or enter a key ARN to use for encrypting the snapshot. You can't export the snapshot without a KMS key, so if you don't have one, go to the AWS Key Management Service console in a different browser tab and click "Create a key":

AWS KMS Console

Then, keep the default options selected on the "Configure key" step:

Configure key

Click "Next" to go to the "Add labels" step. Here, you can give the key an alias, which is the display name, and put in an optional description and tags:

AWS KMS labels

Click "Next" to define the key administrative permissions, or IAM users who can administer the key. Then, click "Next" to define key usage permissions, meaning IAM users and roles who can use this key. Make sure the user you are using to make the snapshot export is selected.

Click "Next" to go to the review page. If everything looks good, click "Finish" to create the key. You'll get confirmation that the key was created successfully:

KMS key created

Click "View key" to see the key details, then copy the ARN:

View key details

Back in the RDS console, paste the ARN, then click "Export to Amazon S3":

KMS key

This will start the export. When it's done, the status will change to Complete:

Complete export
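The export, too, can be scripted through the RDS API rather than the console. Here's a boto3 sketch using `start_export_task`; the account ID and KMS key ARN below are placeholders you'd replace with your own, and the call itself is commented out since it needs live AWS credentials:

```python
# Parameters for scripting the snapshot export with the RDS API.
# The account ID (123456789012) and KMS key ID are placeholders.
export_params = {
    "ExportTaskIdentifier": "northwind-export",
    "SourceArn": "arn:aws:rds:us-east-1:123456789012:snapshot:northwind-default-data",
    "S3BucketName": "rds-to-snowflake",
    "IamRoleArn": "arn:aws:iam::123456789012:role/RDS-S3-Export-Role",
    "KmsKeyId": "arn:aws:kms:us-east-1:123456789012:key/REPLACE-WITH-YOUR-KEY-ID",
}

# With AWS credentials configured, the call would be:
# import boto3
# rds = boto3.client("rds", region_name="us-east-1")
# rds.start_export_task(**export_params)
print(export_params["ExportTaskIdentifier"])
```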

In the S3 console, if you go into the bucket you created, then into the name of the export, you'll see a folder called 'northwind' that contains all the tables in the Northwind database:

All tables exported

This will correspond to all the tables you saw in pgAdmin.

Loading the Data

Log in to Snowflake using the SnowSQL CLI. If you don't have it installed, go through the process in the documentation to install it. Pass in your organization ID, region, and username, then enter your password.

The login syntax is as follows:

snowsql -a OrganizationID.REGION -u USERNAME

In this tutorial, the organization ID is 'YXB51689', the region is 'us-east-1', and the username is 'kletters':

Log in to Snowflake
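If you automate your Snowflake sessions, the login invocation can be assembled programmatically. A small sketch follows; the helper itself is hypothetical, and SnowSQL still prompts for the password interactively:

```python
def snowsql_login_command(org_id: str, region: str, username: str) -> list:
    """Assemble the SnowSQL login command shown above."""
    return ["snowsql", "-a", f"{org_id}.{region}", "-u", username]

cmd = snowsql_login_command("YXB51689", "us-east-1", "kletters")
print(" ".join(cmd))  # snowsql -a YXB51689.us-east-1 -u kletters
```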

Create the database where you want to load the data. This will automatically select this database for subsequent steps. If you already have a database set up, you can skip this step. Use the following command:

create or replace DATABASE manual_northwind;

You'll get a confirmation that the database was created successfully:

manual_northwind db created successfully

If you already have a database, you have to set it up for use in your session. Run the following command and replace 'name_of_database' with your database:

use database name_of_database;

The snapshot objects in S3 are in the Parquet file format, so you need to create a Snowflake file format object that specifies the Parquet file format. To do that, run the following command in SnowSQL:

create or replace file format parquet_format type = 'parquet';

You'll get a confirmation that the file format object was created successfully:

Confirmation of file format creation

Create a target table in Snowflake to load the data from the snapshot. You'd have to do this for every table, but for demonstration purposes, you're just going to load the employees data. Run the following command to create the target table:

create or replace table employees(employees_raw variant);

You'll get a confirmation that the 'employees' table was successfully created:

Table employees successfully created

You'll now load the data into the table using Snowflake's 'copy into' command.

The data is in the S3 folder 'northwind/public.employees', and you need to get the full path name by navigating there in the S3 console, selecting the .parquet file, and copying the S3 URI:

Employees S3 file link

Because you're reading directly from your S3 bucket, you also have to provide credentials. In the command that you'll run, the 'aws_key_id' and the 'aws_secret_key' are the credentials you used to set up your AWS CLI. More information about setting up the AWS CLI can be found in the AWS documentation.

Run the following command to load the employees data from your snapshot in S3, making sure to substitute in the appropriate values from above:

copy into employees
from 
's3://rds-to-snowflake/northwind-export/northwind/public.employees/part-00000-3fe695ac-296b-4386-9e7e-630ca61520a0-c000.gz.parquet' credentials=(aws_key_id='YOUR AWS KEY ID' aws_secret_key='YOUR AWS SECRET KEY')
file_format = (format_name = parquet_format);
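Since you'd repeat this command for every table, a small helper that builds the statement can save hand-editing the long S3 paths each time. The helper below is hypothetical, not part of SnowSQL, and the path in the example is a shortened placeholder:

```python
def build_copy_statement(table, s3_uri, key_id, secret_key,
                         file_format="parquet_format"):
    """Build a Snowflake COPY INTO statement for a Parquet file in S3."""
    return (
        f"copy into {table}\n"
        f"from '{s3_uri}'\n"
        f"credentials=(aws_key_id='{key_id}' aws_secret_key='{secret_key}')\n"
        f"file_format = (format_name = {file_format});"
    )

# Placeholder path and credentials -- substitute your own values.
print(build_copy_statement(
    "employees",
    "s3://rds-to-snowflake/northwind-export/northwind/public.employees/part-00000.parquet",
    "YOUR AWS KEY ID",
    "YOUR AWS SECRET KEY",
))
```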

After running the command, you'll get a message that the data was successfully loaded into the employees table:

Copy into the employees table

You can see that the columns 'rows_parsed' and 'rows_loaded' both have a value of ten, which corresponds to the ten employee rows in your database.

To verify that data was loaded, run the following query:

select * from employees;

Or, simply go to the Snowflake GUI, open the 'MANUAL_NORTHWIND' database, and expand its list of tables to see the 'EMPLOYEES' table that you created. If you select the "Data Preview" tab, you should see the ten employee records in the 'EMPLOYEES_RAW' column:

Employee load verification
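Because the Parquet rows land in a single variant column, you'd typically extract named fields with Snowflake's colon path syntax (for example, 'employees_raw:first_name::string'). Here's a sketch that builds such a query; the helper is hypothetical, and the field names are assumptions based on the Northwind schema:

```python
def build_extract_query(table, variant_col, fields):
    """Build a query that pulls named fields out of a variant column."""
    select_list = ",\n  ".join(
        # Snowflake path syntax: variant_col:field, cast to string.
        f"{variant_col}:{f}::string as {f}" for f in fields
    )
    return f"select\n  {select_list}\nfrom {table};"

print(build_extract_query("employees", "employees_raw",
                          ["first_name", "last_name"]))
```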

Automated Export

A tool like Airbyte, Fivetran, or Hevo Data allows you to automatically transfer your data from RDS into Snowflake in real time. All three are official Snowflake partners, so any of them is a good choice here.

For example, this is what the process would look like with Hevo Data.

Log in to Hevo Data and then create a pipeline. You'll need to configure a source of your data.

If this is your first time using Hevo Data, you need to select "Source", choose "Amazon RDS PostgreSQL", then proceed with the following steps. Name your pipeline 'Northwind Tables PostgreSQL Source', since you'll just be using this pipeline to get data from a few tables. Fill in the hostname, database name, database user, and database password details. On "Select an Ingestion Mode" select "Table". The page should look something like this:

Configure source

Click "TEST & CONTINUE". If the connection is successful, you'll move to the next page. Next, select the objects you want to replicate, in this case 'customers', 'employees', and 'products', then click "CONTINUE":

Selected objects

Leave "Full Load" selected for the three tables and click "CONTINUE":

Query loads

With the source and objects selected, configure the destination. Select Snowflake and provide the configurations as shown in the following screenshot:

Destination configuration

The destination configuration includes 'AUTOMATIC_NORTHWIND' as the database name. Note that you'll have to create the 'AUTOMATIC_NORTHWIND' database in Snowflake first, otherwise the export will fail.

Click "TEST CONNECTION". If the connection is successful, click "SAVE & CONTINUE".

Connection successful

Add a prefix to identify which tables were loaded by a specific pipeline. Use 'hevo', then click "CONTINUE":

Prefix

The Hevo Data tool processes and loads the data:

Processing

After it has finished loading, you'll get a summary of the process:

Loading summary

Go to Snowflake and open the 'AUTOMATIC_NORTHWIND' database. You'll see the three tables: 'HEVO_CUSTOMERS', 'HEVO_EMPLOYEES', and 'HEVO_PRODUCTS'.

Look at the 'HEVO_EMPLOYEES' table to see the employee details.

Employees loaded by Hevo Data

Conclusion

You should now have a solid understanding of how to move data from Amazon RDS into Snowflake. As you saw, the two main options are manual and automated.

The manual option comes with some limitations, as not all database engines in RDS are supported for export, and the process is a bit cumbersome since you have to constantly update the scripts yourself. The automated method using the Hevo Data tool, though, provides a quick setup and allows you to efficiently load incremental data. Whichever method you prefer, Snowflake will help you optimize your data analytics.
