How-to

How to set up development and production environments in Snowflake

Learn Propel's recommended way to set up development and production environments in Snowflake to develop and test code without impacting end users.

Graphic by Propel

Abstract Databases

In software development, engineers typically build applications in a separate system from the one where the end product will be running. Such a system can be a simple stand-alone machine or a complex multi-cloud setup with hundreds of nodes, load balancers, API Gateways, etc. However, the idea remains the same: you develop and test your application in an environment different from the one your end users will use. So you have one environment for development, one for testing, one for production, and so on. Each environment needs to have sufficient resources to run the application, including databases, network bandwidth, storage capacity, CPU power, and the like.

The production environment is where the latest stable version of the application runs. Software engineers primarily interact with the development environment to develop new features or patches and validate any new code before releasing it for testing. Therefore, the software version in development will be more recent than the production version. The development environment also won't have the same traffic as the production. Both environments could be running in physically separate networks or at least using different resources within the same network.

Keeping this distinction between development and production environments during the software development lifecycle is vital for a few reasons:

First, it helps in preventing downtime. A production environment runs a stable, tested version of your software. Directly making changes to this version is a surefire way to break something and cause an outage. A development environment reduces this risk and validates modifications or enhancements.

Second, a development environment can help catch security vulnerabilities and bugs in the code before deploying it in production. Similarly, developers and QA engineers won't have full access to the production environment, ensuring best security practices.

A development environment also allows developers to identify and test new dependencies for their code. Such dependencies can be infrastructure or software related. Developers can test the dependencies before implementing them in production.

Modern software development teams typically follow the agile methodology, where the team makes incremental enhancements or changes to the code, tests them, and ships them to production. Agile development also sees stakeholders or customers taking part in initial testing so developers can take on any feedback in the early stages. A development environment allows you to showcase any changes to such external parties.

Now that you know why you need different environments for software development, the following sections explain the essential criteria for ensuring this separation and how you can implement such separation using the Snowflake data warehouse.

Requirements to Create Separate Development and Production Environments

You must fulfill a few requirements to separate your application environments.

Isolation

With isolation, you run production and nonproduction workloads separately, so one environment doesn't affect the other.

Using separate networks is one way you can implement isolation. For example, you can use different Virtual Private Clouds (VPCs) within the same cloud account or different accounts from the same cloud provider, preventing network traffic in each environment from reaching the other.

Another method is using separate fleets of servers, load balancers, API Gateways, web application firewalls (WAFs), or databases for separate environments. These resources can reside within the same network or use the same managed resources with separate tenancies.

Businesses may sometimes host development and production environments on the same resource. For example, a physical server could run a set of virtual machines for both development and production purposes. Similarly, a web server can have different VirtualHosts to serve development and production traffic. Although having both development and production environments running on the same server or application is not ideal, companies often use such separation for noncritical workloads to save costs.

Access Control

Environment separation may also require a separate authentication and authorization layer, where each environment will have its user accounts, groups, roles, and permissions.

Security is the primary reason for separate access control. Typically, development environments are less secure than production environments as developers and testers have elevated server, database, and file-share privileges and sometimes know service account passwords. The firewalls in the development environment may also allow a wide IP range. However, even if a cyberattack breaches the development environment's security, the attackers won't be able to compromise the production data because developers usually do not have wide-ranging access to production servers. Separate service accounts could be running applications in the production environment, and firewall rules may be far stricter.

Data Separation

A development environment must host a different data set from its production equivalent.

Sometimes, production data could be sensitive. Examples include personally identifiable information (PII), medical data, financial information, or user credentials. Using the same data in a development environment unnecessarily exposes it to exploitation and misuse. When using a copy of production data, all sensitive information must be redacted, encrypted, scrambled, or replaced with dummy versions.

Using the latest version of production data also requires backing it up and restoring it to development on a regular basis. Although you can automate this process, it may be unnecessary because functional testing doesn't always need the latest data. Replicating data from production to development may also overwrite any data changes that were made during testing.

Another problem that could occur when duplicating data from development to production is that program code that depends on auto-incrementing fields may fail when restoring a new data set from production. A blanket restore could overwrite configuration parameters and replace them with production values. Such configuration updates might have serious consequences if parameters are changed from their original values. For example, developers and testers may inadvertently update or delete data without knowing that it's impacting the production system.

Development teams often use scripts to generate dummy data or load from sample data files to implement data separation. As an alternative to replicating the latest data from production to development, they could keep an older copy of production data and restore it when necessary.

If you need to use the latest version of production data in development, consider restoring a subset of it wherever possible. A partial restore will ensure you have a small dataset to roll back if something goes wrong.

You need to be aware of two things whenever you are restoring a copy of production into development. First, you must ensure the development environment uses the original configuration values. Such configurations can include ODBC strings, IP addresses, service accounts, usernames, or databases. Second, you must check that no sensitive data is present in the restored copy, such as information related to medical history or data related to credit card details or account transactions, which only a certified DBA should have access to in case of a data defect.

How to Implement Separate Development and Production Environments in Snowflake

Snowflake is an SQL-based cloud data platform that is best known as a feature-rich data warehouse. Organizations are increasingly deploying their analytics workloads on Snowflake due to its unlimited scaling capabilities, flexible pricing, support for major cloud providers, and almost zero administration overhead.

The next sections outline two methods you can use to separate development and production environments in Snowflake; specifically, you can achieve this using either of the following:

- One Database or Schema per Environment or

- One Snowflake Account per Environment

One Database or Schema per Environment

With this approach, you will have a single Snowflake account and create one database for each environment, as shown below:

One database for each environment in Snowflake
One database for each environment in Snowflake

You can also have a single database and create a separate schema for each environment:

One schema for each environment in Snowflake database
One schema for each environment in Snowflake database

However, creating a separate schema for each environment is not recommended for a couple of reasons. Firstly, code that is meant to run on a development schema may sometimes accidentally be run on the production schema, causing serious data issues. Secondly, using the same database for both production and development purposes will mean fine-tuning security for each schema, which can be a tedious and error-prone process. Lastly, you might want to leverage the database and schema constructs in both development and production environments, using a database or a schema as an environment would prevent you from using them to separate marketing and finance data for example.

Advantages of One Database or Schema per Environment

Using separate databases (or schemas) for each environment has some advantages:

  • All databases (or schemas) remain in the same account (for separate databases) or in the same database (for separate schemas). This simplifies administration compared to using multiple Snowflake accounts.
  • You can clone a database or schema at any time with almost zero latency. Unlike loading data, cloning doesn't incur any cost as Snowflake runs the command without starting a warehouse.
  • You can assign a separate warehouse for running queries against each database, ensuring performance by having separate resources for the development and production environments. Using separate warehouses means that the development environment won't interfere with the performance of analytical queries run in the production environment
  • Snowflake Time Travel allows you to create a new database from a particular version of the source database. For example, if you want to create a development database from a particular point-in-time snapshot of the production database, you can run a command like this:

        <span class="code-example">
         CREATE DATABASE MY_DEV_DATABASE</br>   
         CLONE SAMPLE_DB</br>    
         AT ( TIMESTAMP => '2022-07-04 12:00:00'::timestamp);</br>
         </span>

    The point in time is useful when you want to recover data from a saved backup to your staging/ or production database from a specific point in time. This is particularly helpful if a developer makes a mistake, for example overriding the staging database with production copy while losing the last configuration updates that were meant to roll out on the next release.

The point in time is useful when you want to recover data from a saved backup to your staging/ or production database from a specific point in time. This is particularly helpful if a developer makes a mistake, for example overriding the staging database with production copy while losing the last configuration updates that were meant to roll out on the next release.

Disadvantages of One Database or Schema per Environment

Using separate databases (or schemas) for each environment also has some disadvantages:

  • A single database or Snowflake account is a single point of failure, particularly if the database or account is compromised or if there's data corruption and you haven't configured Time Travel.
  • If there's an issue with the Snowflake account, all environments, including production, are impacted.
  • You can't use the Database or Schema constructs to represent anything other than the environment. This is a pretty big limitation and the main reason we don't recommend this approach.

One Snowflake Account Per Environment

You can also implement separate development and production environments in Snowflake by creating a separate account for each environment. You use the Snowflake Organization feature to set up the accounts under the same organization.

Advantages of One Snowflake Account per Environment

There are many advantages to having one Snowflake account per environment, such as:

  • You can create your environments in different regions and cloud service providers—allowing you to comply with data-sovereignty regulations and test Snowflake's performance on different cloud platforms.
  • The accounts are completely isolated, ensuring enhanced security.
  • You can assign different levels of security to each environment's account.
  • You can use Snowflake's Data Sharing capabilities to use the same data across environments if needed.
  • You can replicate an account's data to a separate account in a different region.
  • You can take advantage of different pricing tiers for each account. For example, the development environment can be Standard Edition, while the production environment can be Enterprise or Business Critical Edition.

Disadvantages of One Snowflake Account per Environment

There are some disadvantages to having one Snowflake account per environment, but these mainly affect the operation and the financial side and not the quality or security of your development cycle. These disadvantages include:

  • You have to spend more time designing your Snowflake architecture and carefully manage each account separately, which requires extra effort from the technical team.
  • Your total cost of ownership (TCO) will increase. Having separate accounts per environment requires multiple warehouses, as at least one data warehouse will be created for each environment. You usually pay by credit, depending on the warehouse size (X-Small, Small, Medium, Large, X-Large, 2X-Large, 3X-Large, 4X-Large). This cost will probably be at least around 1.5 times more expensive than having one account per environment but can be as much as two or three times more expensive. It all depends on your use case because you might use the X-Large on production but the medium on staging or less.
  • Cloning databases, schemas, tables, or any object across different accounts isn't feasible as this is an anti-pattern and not supported. However, Snowflake provides sharing of secured views that you can create and share across different accounts and regions. 

Snowflake ORGAMIN

When choosing how to separate your development and production environments in Snowflake, it's important to consider the organization administrator role (ORGADMIN). In Snowflake, an organization is the object that links the accounts owned by your business entity, and the ORGADMIN is responsible for managing all operations on the organization level. 

The ORGADMIN user will have the following benefits:

The ORGADMIN role is usually enabled for a single account in an organization if a user creates an account using the self-service portal. Users with an ACCOUNTADMIN role can assign the ORGADMIN role to a user. 

Example of Assigning ORGADMIN role:

use role accountadmin;
grant role orgadmin to user ;
grant role orgadmin to role ;

![Assigning ORGADMIN Role](https://i.imgur.com/Lllysdw.png)

Assigning ORGADMIN Role
Assigning ORGADMIN Role

The ORGADMIN role can only be enabled for one account; if you want to enable it for multiple accounts, you will need to contact Snowflake support. However, this practice is not recommended by Snowflake.

Conclusion

Now you know why you need separate application environments and some of the criteria to ensure such separation. You also learned how Snowflake, a robust cloud data warehouse, can implement environment separation for analytics workloads using either one database (or schema) or one Snowflake account per environment. 

There are some advantages and disadvantages whether you choose to have separate Snowflake accounts per environment or one account for all environments. The best option for you depends on the use case and the budget you are working with. If you have a limited budget and often require repetitive cloning operations from different environments, then using the same Snowflake account for all your environments is the best fit. If you don't have a budget constraint and you want to limit production access, for example, due to data compliance and policies, then having separate Snowflake accounts is more suitable.

Propel Data is a serverless analytics API platform to power customer-product analytics for web and mobile applications. Propel's Analytics API platform integrates seamlessly with Snowflake so you can use the data you already have in your customer-facing applications.

To learn more, join Propel Data's waitlist.

Related Content

Propel Data offers a blazing fast GraphQL API for building data apps from data that organizations already have in Snowflake data warehouse, as illustrated by this photograph of a laptop showing a real-time product dashboard with analytical reports and data visualization.

How-to

How to build Snowflake data apps with GraphQL

Need to build a Snowflake data app? Here's how to create and query a Metric on top of Snowflake data warehouse using Propel’s GraphQL API.

duplicate abstract graphic

How-to

How to Deduplicate Events in Snowflake with dbt

This article will demonstrate how to deduplicate events in Snowflake using dbt