Snowflake is a data analytics platform that has become popular because of its excellent performance when running business intelligence (BI) and other data analytics queries. The term “data warehouse” refers to any service or computing infrastructure used to store analytical data, especially one that separates storage (the data) from compute (the processing of said data).
A data warehouse’s servers hold a copy of your data, which you originally stored somewhere else, so data warehouses are analogous to storing physical inventory in an off-site warehouse.
On the other hand, Snowflake’s concept of “virtual warehouses” (or just “warehouses”) is a little different from “data warehouse.” In Snowflake, warehouses are compute engines similar to virtual machines, and they’re how you scale up Snowflake. When you need more processing power to run faster queries, you make your Snowflake warehouses bigger in order to do so.
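For instance, resizing a warehouse is a one-line SQL statement in Snowflake (the warehouse name here is illustrative):

```sql
-- Scale an existing warehouse up to handle heavier queries
ALTER WAREHOUSE my_wh SET WAREHOUSE_SIZE = 'LARGE';
```

Each size step roughly doubles the compute resources available to queries running on that warehouse.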
Colloquially, many people refer to the Snowflake data platform as a data warehouse. After all, data analysts and data engineers copy data from various sources (such as databases) into Snowflake. In that sense, you store, or “warehouse,” your data on Snowflake’s platform.
However, unlike a private server, a data warehouse typically bills storage space separately from compute time. Snowflake’s documentation also uses the related term “data lake” to highlight that Snowflake handles massive amounts of unstructured and semistructured data, not just structured data with a fixed schema (as a traditional SQL database would require).
Snowflake is a data management system, one designed to support business intelligence (BI) activities, especially data analytics. Many people refer to Snowflake as the “data warehouse” in their data analytics stack, since it is optimized to store and compute massive amounts of data.
However, as mentioned above, the term “warehouse” has a specific meaning when using Snowflake. In Snowflake terminology, a warehouse (or “virtual warehouse”) is an on-demand allocation of compute resources that the platform uses to execute queries and other database tasks.
A warehouse in Snowflake behaves like a virtual machine (VM) running Snowflake’s cloud-based SQL engine for big data analytics. In other words, a warehouse is like a computing cluster, with its own allocation of CPU (processing power) and memory (RAM), but one that you don’t have to manage yourself. By Snowflake’s own definition, these “warehouses” are not data warehouses, no matter how many virtual warehouses you run in the same Snowflake account; even so, most people refer to the Snowflake platform itself as a data warehouse or data lake.
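As a sketch of how you would provision one of these compute engines, here is a minimal `CREATE WAREHOUSE` statement (the name and settings are illustrative, not a recommendation):

```sql
-- Provision a virtual warehouse: a managed compute cluster, not a data store
CREATE WAREHOUSE IF NOT EXISTS analytics_wh
  WAREHOUSE_SIZE = 'MEDIUM'
  AUTO_SUSPEND   = 300    -- suspend after 300 seconds of inactivity
  AUTO_RESUME    = TRUE;  -- wake up automatically when a query arrives
```

Note that this statement allocates compute only; the data itself lives in Snowflake databases, independently of any warehouse.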
The main advantage of using Snowflake as a data warehouse or data lake is its performance while processing huge amounts of data, especially when running complex analytical queries. Snowflake shines when compared to traditional transactional databases, whether relational (SQL) or NoSQL.
For example, generating a leaderboard of sales revenue by product category will be much more efficient and rapid with Snowflake than with a traditional transactional database. Snowflake would also be faster at finding the store with the highest sales or the salesperson with the highest revenue numbers.
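A category leaderboard like the one described above is a straightforward aggregation query. Assuming a hypothetical `sales` table with `product_category` and `revenue` columns, it might look like:

```sql
-- Leaderboard of total revenue by product category (table name is illustrative)
SELECT product_category,
       SUM(revenue) AS total_revenue
FROM sales
GROUP BY product_category
ORDER BY total_revenue DESC;
```

This kind of scan-and-aggregate workload over millions of rows is exactly where a columnar engine outperforms a row-based transactional database.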
The difference between Snowflake and more traditional database technologies has to do with whether the underlying database is organized into rows or columns. A row-based database like PostgreSQL or MySQL is extremely fast for recording transactions, so they are great for applications that need to handle customer sales and other types of transactional information at high speeds. On the other hand, a columnar database like Snowflake is optimized for analyzing large amounts of data quickly, but it isn’t as good at recording each transaction as it occurs.
That trade-off between “transactional performance” and “analytical performance” is why data warehouses like Snowflake exist, and it’s also why the term data warehouse became popular. Nearly everyone who uses Snowflake copies their data from some original source, like a SaaS application, into Snowflake, instead of using Snowflake as the only database technology. By warehousing data in Snowflake, running analytics doesn’t slow down any other application.
Using any data warehouse requires a lot of manual setup and maintenance, especially when it comes to connecting the data warehouse to customer-facing applications to provide insights. While a data analyst may be comfortable working with SQL and waiting 10 or 15 minutes for a database query to finish, that query performance won’t be good enough to build something like a customer-facing data app that provides customers with in-product analytics and insights.
Think of Google Flights, which processes huge amounts of data to tell you when to buy plane tickets, as an example of an app that offers useful in-product insights as a key feature. That feature would be practically useless if it took half an hour to run on every search. Customers expect snappy product experiences and actionable insights instantaneously – and on demand.
Of course, it’s possible to write custom middleware using open table formats like Delta Lake and Apache Iceberg to bridge between Snowflake’s data platform and the data apps that generate customer insights. The tricky part is that Snowflake’s virtual warehouses are optimized for a single user at a time, not hundreds of concurrent users, so you’ll need to perform a lot of custom data engineering.
We built Propel Data to solve exactly this problem. Our data analytics platform allows you to access Snowflake data directly via a high-performance GraphQL API without any unnecessary overhead. Propel connects the data warehouse (Snowflake) to your customer-facing data apps, while we handle elasticity, scalability, performance, and security, all automatically. At Propel, our goal is to help you build the data apps you’ve always wanted with the team you already have.
Snowflake bills compute usage in “credits”: each virtual warehouse consumes credits at a rate determined by its size for as long as it runs, and those credits pay for the compute that powers Snowflake’s analytical query engine.
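As a rough sketch of the arithmetic: a Medium warehouse consumes 4 credits per hour, so a workload’s cost is the warehouse’s credit rate times its running time times the price per credit (the per-credit price below is an illustrative assumption; actual rates depend on your Snowflake edition and cloud region):

```sql
-- Estimated cost of running a Medium warehouse (4 credits/hour) for 2.5 hours.
-- The $3.00-per-credit price is an assumption for illustration only.
SELECT 4 * 2.5        AS credits_consumed,    -- 10 credits
       4 * 2.5 * 3.00 AS estimated_cost_usd;  -- 30.00
```

Auto-suspend settings matter here: a warehouse that suspends when idle stops consuming credits, which is why short suspend timeouts are a common cost-control practice.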
Snowflake uses databases for data storage, while a “Snowflake warehouse” is a virtual computing cluster that processes analytical queries.