Facing high Snowflake costs? Check out our practitioner's five-point checklist to reduce Snowflake costs and optimize your data warehouse usage.
Snowflake is an incredibly powerful and scalable data warehouse, but without proper management, your costs can skyrocket and get out of control. In this checklist, we will explore five key strategies for optimizing Snowflake costs, ensuring that your data warehouse remains efficient and cost-effective.
1. Set the correct warehouse auto-suspend
What is the Auto-Suspend feature?
Auto-Suspend is a Snowflake feature that allows your warehouse to automatically suspend itself after a specified period of inactivity. This helps to reduce costs by preventing unnecessary credit usage when the warehouse is not in use.
The default value and why it's inefficient
The default auto-suspend value is 10 minutes, but this is often too long for many workloads. For example, if a query lasts 20 seconds, the warehouse will still be charged for 10 minutes of usage, leading to unnecessary costs.
Example query for querying and modifying auto-suspend
To query the auto-suspend settings for all warehouses, use the following query:
The <span class="code-exp">auto_suspend</span> column specifies how long a running warehouse can remain inactive, in seconds, before automatically suspending and stopping credit usage. A <span class="code-exp">null</span> value means that the warehouse will never automatically suspend. You’ll want to avoid having any warehouse with a <span class="code-exp">null</span> auto-suspend. Typically, <span class="code-exp">60</span> , meaning an auto-suspend of a minute is a good place to start.
To modify the auto-suspend setting for a specific warehouse to 1 minute (60 seconds), you can run the following:
2. Right-size your warehouses: monitor remote disk spillage
What is remote disk spillage, and why it causes costs to increase
Remote disk spillage occurs when a virtual warehouse runs out of memory and begins spilling intermediate results to remote storage. This can lead to increased query times and higher Snowflake costs due to increased I/O operations and additional credit usage.
Identify the warehouses that have remote disk spillable
To monitor remote disk spillage, you can use Snowflake's QUERY_HISTORY function. Here's an example query to identify the warehouses with remote disk spillage over the last 30 days:
Fixing remote disk spillage by increasing warehouse size
To fix remote disk spillage, you can increase the size of your warehouse. By doing so, you will allocate more memory to the warehouse and reduce the need for remote storage, ultimately lowering costs. With a larger warehouse, your queries will complete more than twice as fast. Make sure to monitor and adjust warehouse sizes as needed to prevent spillage.
3. Use a serving layer for high-concurrency sub-second queries
Sometimes, you require high-concurrency sub-second queries. These are critical for customer-facing dashboards, data APIs, and usage metering, where fast response times are essential.
How a serving layer on top of Snowflake works
A serving layer on top of Snowflake syncs data to high-speed storage, optimizes it, and serves it via an API without consuming Snowflake credits. This reduces Snowflake costs while providing high-performance analytics to your end-user applications.
It is a better alternative to traditional embedded analytics because it reduces the load on Snowflake. Furthermore, by providing an API, it offers greater flexibility and control.
Propel's data API platform provides engineering teams with a unified platform for delivering high-performance customer-facing analytics. Its serving layer solution offers an analytics backend with a GraphQL API and React UI component library, requiring no infrastructure scaling or management. To learn more about Propel, read the docs.
4. Identify and fix inefficient queries
Inefficient queries are the source of a lot of waste. In this section, we show you how to identify them and share some tips to optimize them.
How to identify inefficient queries
Inefficient queries can significantly contribute to higher Snowflake costs. To identify them, you can monitor the <span class="code-exp">QUERY_HISTORY</span> view and look for queries with long execution times or high resource consumption.
You can use the following query to identify your slowest queries in the last 30 days in Snowflake:
Tips to improve query efficiency
To make queries more efficient, consider the following tips:
- Use selective filtering and avoid SELECT *.
- Optimize JOIN operations.
- Make sure tables have an ORDER BY.
- Limit the number of rows returned using LIMIT.
- Use materialized views for pre-computed results.
- Leverage clustering keys to improve query performance.
5. Set up Resource Monitors
What are Resource Monitors?
Resource Monitors in Snowflake allow you to track and manage your credit usage, helping to prevent unexpected costs. You can set thresholds that, when exceeded, trigger actions such as suspending a warehouse or sending notifications.
Example: Creating a Resource Monitor
To create a Resource Monitor that suspends a warehouse when credit usage exceeds a specified amount, you can use the following SQL command:
Optimizing Snowflake costs is crucial for maintaining an efficient and cost-effective data warehouse. By following this five-point checklist, you can optimize warehouse usage by preventing remote disk spillage, adjusting auto-suspend settings, leveraging a serving layer for high-concurrency queries, and monitoring resource consumption. So, start implementing these strategies today and make the most of your Snowflake investment.
- How to build a Snowflake API
- Introducing the new Metric Report API: Powerful reports for any app with a single GraphQL request
- Propel UI Kit: Data visualization and dashboard React components
If you don’t have a Propel account yet, you can try Propel for free and start building data apps.