Snowflake, like any data warehouse, comes with a set of inherent trade-offs between performance, cost, and concurrency. If you need more performance, meaning faster analytics queries, you’ll need larger virtual warehouses, which cost more. If you need to run multiple simultaneous queries, you’ll need to set up more warehouses ahead of time to handle the load.
One of the real downsides of Snowflake is the fact that you have to manually configure and set up your warehouses, including specifying their size (and therefore their cost). For simple analytics, that’s not a problem, but it becomes a huge issue when dealing with concurrency.
While the simplest solution to handle concurrent requests would be to choose a very large warehouse, one with a lot of processing power, you’ll need to pay to keep that warehouse running any time you want to run analytics. That can come with a huge financial cost, since you’ll need to keep those warehouses spun-up and running, waiting for concurrent queries.
Of course, if your database queries are simple, you’ll do fine with a small Snowflake warehouse, which can even handle some level of concurrent queries. But, if you have multiple users running complex analytical queries, you’ll probably need to set up multiple warehouses in Snowflake.
As mentioned above, Snowflake asks you to pay for performance by sizing up your virtual warehouses. If you have a complex database query, you’ll need a larger warehouse, and if you need to run concurrent analytics queries, you might need a multi-cluster virtual warehouse. So what makes a complex analytics query? And when do you need a multi-cluster warehouse?
Three of the most important factors that affect SQL database query performance are table size, joins, and aggregations. Taken together, these three factors create complex analytical queries.
" Table size: If your query hits one or more tables with millions of rows or more, it could affect performance.
 Joins: If your query joins two tables in a way that substantially increases the row count of the result set, your query is likely to be slow. [...]
 Aggregations: Combining multiple rows to produce a result requires more computation than simply retrieving those rows."
Each of these factors increases the number of calculations required to perform your analytics query. For example, querying 1,000,000 rows joined from 10 tables where you’ve pre-aggregated timestamp data into a date without the time, you’ll need a lot more CPU time than when querying that timestamp data directly from one of the 100,000-row component tables.
Now, let’s imagine you have 100 customers who expect an in-product dashboard to be updated every 15 minutes. You’ve transformed the analytics data from your servers and loaded it into Snowflake as a data warehouse, and you’ve configured your software to run an analytics query for 7 customers every minute to meet your service level agreement (SLA) of 15 minute latency.
You could just keep a very, very large warehouse (maybe 4X-Large) running in Snowflake, but that will cost you a lot of money: a 4X-Large uses 128 credits hourly, 64 times more expensive than a Small warehouse, which uses just 2 credits per hour.
Assuming a 4X-Large warehouse could handle each query in 5 seconds, you might still end up with delays due to overwhelming concurrency needs if 10 customers were to request updated analytics from their dashboards at the same time. Plus, to pause your queries overnight, you’d have to manually turn off the warehouse; otherwise you’d have to pay Snowflake to keep it on.
Instead of setting up 7-10 Small warehouses, which you’d still have to shut down manually at the end of the day, you’ll probably want to explore multi-cluster virtual warehouses instead.
Multi-cluster warehouses are a feature of Snowflake Enterprise Edition and have the main benefit of automatically scaling based upon the demand placed on them. In comparison, a regular virtual warehouse in Snowflake always has to be resized manually, meaning you have to notice and resize the warehouse yourself when your queries are backing up or taking too long.
Additionally, while a Snowflake warehouse can handle some limited concurrency (simultaneous queries), Snowflake will place additional queries into a pending queue if insufficient resources are available to execute all of the queries that have been submitted to run on the warehouse. There’s no way to specify that you’d like the queries to run at the same time (but slower) in parallel, other than just increasing the size of the virtual warehouse and hoping for the best.
In comparison, a multi-cluster virtual warehouse in the “automatic scaling” (auto-scale) mode can handle a flexible number of concurrent user sessions and/or queries. As the queries start to queue up in pending status, Snowflake will automatically add additional clusters within the multi-cluster warehouse. Similarly, as the load on the virtual warehouse decreases, Snowflake will automatically shut down clusters from the multi-cluster warehouse to reduce the number of running clusters and thus the cost (in Snowflake credits) to keep the warehouse running.
Compare this to what happens with a standard warehouse with a single cluster. With a standard warehouse, when user or query load increases to the point where queries are taking too long, you have to intervene manually and either increase the size of the warehouse or start up additional warehouses. If you decide to spin up additional warehouses, you will have to redirect traffic (users or queries) to those new warehouses by hand. Then, to save costs, you’ll have to downsize or suspend the warehouses when the compute resources are no longer needed.
In comparison to a standard warehouse, a multi-cluster warehouse enables larger numbers of users to connect to the same size warehouse, because there are multiple computing clusters. There are still some manual settings needed for multi-cluster virtual warehouses, since you need to specify the minimum and maximum number of clusters that fits with your needs for concurrency. Combined with Snowflake’s ability to automatically start up and stop additional clusters in auto-scale mode, the benefits of multi-cluster warehouses should become clear.
One word of warning: multi-cluster warehouses should be used for large numbers of concurrent users or queries in order to increase analytical throughput in parallel. If a single analytics query is running slowly, or if it’s taking too long to load data into Snowflake, then it makes more sense to just use a standard warehouse of a larger size in order to improve Snowflake’s performance.
Not everyone has access to the Enterprise Edition of Snowflake, though, meaning multi-cluster warehouses aren’t available to everyone. Additionally, many organizations want to build data apps from data already in Snowflake without hiring data engineers to manage user concurrency. That’s why we built Propel – because we believe developers should be able to build dashboards and other data apps in minutes, not months, and it’s why we offer a GraphQL API for Snowflake.
If you enjoyed this article on Snowflake data platform, please sign up for our free email newsletter, follow us @propeldatacloud on Twitter, or join our new customer waiting list for the latest news and educational content from Propel Data Cloud, Inc.
Snowflake data platform is referred to as a data warehouse or data lake because it separates storage (data) from compute (processing power).