Snowflake is a data analytics platform that offers advantages over traditional databases for a variety of interesting data engineering and analytics applications.
Snowflake uses databases to hold large amounts of data (“storage”) separate from the “virtual warehouses” (or just “warehouses”), which process and manipulate that data (“compute”).
While data warehouses are central repositories of data used for reporting and data analysis, Snowflake uses the term “warehouse” specifically to mean a virtual computational cluster that allows you to manipulate and process data for analytical queries.
“A virtual warehouse is a cluster of compute resources. A warehouse is needed to execute certain types of SQL statements because it provides resources such as CPU, memory, and local storage.” –Snowflake Docs
Meanwhile, databases in Snowflake have a more traditional definition. Snowflake’s relational databases are built using SQL (Structured Query Language), and they’re where you store the data you’ll be using in the platform.
“All data in Snowflake is maintained in databases. Each database consists of one or more schemas, which are logical groupings of database objects, such as tables and views.” –Snowflake Docs
Each schema belongs to a single Snowflake database, and each database belongs to only one Snowflake account. Together, a database and schema are called a “namespace” in Snowflake. Snowflake does not limit the number of databases, the number of schemas (within a database), or the number of objects (within a schema) that you are able to create in a single account.
Snowflake’s warehouses are the compute engines that allow you to easily scale up and down the size of the computing cluster running your analytics queries. The warehouse is what runs the analytics in Snowflake, while the database itself is just a static repository holding the data.
Snowflake has a unique approach to databases compared to other data warehousing solutions. Snowflake’s advantage comes from the use of micro-partitioning, which are small partitions of 50 to 500MB that are created automatically and enable faster queries than static partitions.
The automatic creation of clusters in Snowflake is called automatic clustering, a process that consumes Snowflake credits and thus costs money. However, automatic clustering does not require you to set up a virtual warehouse manually; it happens automatically behind the scenes. Nevertheless, you’ll have to pay for those warehouse resources in addition to the storage costs.
Snowflake’s storage databases include Continuous Data Protection (CDP) features known as Time Travel and Fail-safe. Snowflake Time Travel allows querying, cloning, and restoring historical data from Snowflake tables, schemas, and databases for 1 day (Snowflake Standard Edition) or for up to 90 days (Snowflake Enterprise Edition). Snowflake Fail-safe is a disaster recovery system for historical data that allows recovery for 7 days after Time-Travel expires. Neither of these CDP features requires a warehouse to be running, but both incur storage costs.
In conclusion, Snowflake is a data warehouse that separates storage from compute resources. The data held at Snowflake is held inside namespaces, which are composed of databases and schemas, while virtual warehouses handle the analytical queries. You use Snowflake credits to pay for both storage in databases as well as processing time (when warehouses are running).
If you enjoyed this article on Snowflake, please sign up for Propel’s email newsletter, follow us on our Twitter account @propeldatacloud, or join the customer waiting list to stay in touch. We’re onboarding Snowflake customers as fast as we can!
Snowflake data platform is referred to as a data warehouse or data lake because it separates storage (data) from compute (processing power).