Let’s imagine your company has a single data analyst who handles everything Snowflake and is running complex database queries on a daily basis. He or she might be fine waiting a few minutes, but not a few hours, so they increase the size of a single virtual warehouse until the query processing speed is correct. In this case, a single Snowflake warehouse works great.
On the other hand, building in-product analytics would be a great example of when you’ll need to account for concurrency. If you offer in-product analytics or insights to your customers, then your customers might request dozens of unique data queries simultaneously. You don’t want any individual customer’s results to be slowed down because of other requests in the queue.
The preferred approach would be to just turn the dial up to 11 and spring for the 6X-Large warehouse, but your budget may not love that idea. At a hefty 512 credits per hour (costing about $1024 per hour at $2 per credit), you’ll need to pay $747,520 per month to keep that behemoth online. And if you only need user concurrency sometimes, not 24/7, you’ll be wasting money.
When you’re dealing with complex concurrent requests, you’ll probably want to explore Snowflake’s multi-cluster warehouses, which offer fully-automated scaling for concurrency. Multi-cluster warehouses allow you to make a larger pool of processing power available for running queries, and they are a feature available only in the Enterprise version of Snowflake.
While a standard virtual warehouse has only a single cluster of compute resources, a multi-cluster warehouse can have up to 10 clusters inside of it. Like standard warehouses, there seems to be no limit on the number of multi-cluster warehouses you can have in your Snowflake account, but each multi-cluster warehouse is limited to a maximum of 10 compute clusters.
Multi-cluster warehouses give you the concurrency benefits of multiple warehouses, with the added advantage that Snowflake automatically starts and shuts down running clusters as needed to handle peaks and troughs in demand for user concurrency. In other words, the number of clusters used by the warehouse automatically scales based on the current demand on Snowflake for data analytics. But, you’ll likely need some additional data engineering to connect more than 1 multi-cluster warehouse of 10 compute resources into a single application.
There are two modes of operation available for multi-cluster virtual warehouses in Snowflake:
The auto-scale mode is obviously better, but it’s also way more complicated. For example, you have to specify a scaling policy, where “Standard” always increases the number of clusters according to demand, while “Economy” only adds a cluster if there’s enough query load to keep the new cluster busy for 6 minutes. As you might guess from the name, the latter saves money, but it can lead to queries still queuing up, even though you are using a multi-cluster warehouse.
Of course, the alternative is complicated as well. To set up multiple individual warehouses manually, you’ll need to choose a size appropriate for each one, based on your own estimates of the CPU load of having multiple users and queries running simultaneously. That’s why it is so useful to have multi-cluster warehouses that auto-scale the amount of compute available up and down as needed, but your company will need to sign up for the Enterprise edition of Snowflake.
Building any data app with multiple users, especially in-product analytics, is the perfect example of when you might need a multi-cluster virtual warehouse in Snowflake. Let’s say a company builds a leaderboard metric inside its product, which has 1000s of customers. At first, they’re comfortable just resizing a single warehouse to be larger in order to improve slow queries and data loading issues, but then they see the bill. In response, they switch to a multi-cluster virtual warehouse in auto-scale mode, and they get improved performance at lower cost.
At first, the company tries the Standard scaling policy, but spinning up lots of warehouses is still costing them too much money. The customers of their SaaS (software as a service) product are comfortable seeing their leaderboard metrics delayed by 15 minutes, so the company is able to queue up the requests from their serverless functions efficiently. After a few experiments, they are able to provide fresh data to the leaderboards using a single multi-cluster virtual warehouse in auto-scale mode with the Economy scaling policy selecting between 1 and 6 clusters.
Of course, that company may have needed to hire an entire team of data engineers to build that in-products analytics app on top of their data warehouse. If you’re considering such a project, you should check out Propel Data, where you can build analytics with the team you have now.
At Propel, we provide developers with easy-to-use GraphQL APIs that allow frontend and full-stack developers to build analytics products that support massive user concurrency at scale. Our backend handles multi-cluster Snowflake warehouses, data access policies, and fast responsive times in a performant, cost-effective manner, without extra data engineering. Propel is the clear choice for building in-product analytics with Snowflake as a data warehouse or data lake, because we help developers query the data they already have in Snowflake via our GraphQL API.
Snowflake data platform is referred to as a data warehouse or data lake because it separates storage (data) from compute (processing power).
Snowflake uses credits, which are analogous to CPU nodes, in order to pay for the virtual warehouses that power its analytical query engine.