What are Snowflake Tasks?

A Complete Guide to Snowflake Tasks and Their Use Cases

Photo: Propel

‍

What are Snowflake Tasks?

Snowflake Tasks are user-defined objects in Snowflake that enable the automation of repetitive SQL operations. They allow for the scheduling and execution of SQL statements on a regular basis while also providing flexibility in terms of time intervals and chaining of tasks. Snowflake Tasks help streamline the process of data ingestion, transformation, and monitoring by offering an efficient and flexible mechanism for managing these operations.

Source: Snowflake Tasks Documentation

What are the use cases for Snowflake Tasks

There are numerous use cases for Snowflake Tasks, including:

Scheduled queries

Snowflake Tasks can be employed to automate the execution of recurring SQL queries. By scheduling queries, you can ensure that data is consistently refreshed and that analytics are always up-to-date. These scheduled queries may be employed for various purposes, such as data validation, monitoring, and generating reports or dashboards.

Transformations

Data transformations are a critical aspect of any data pipeline. Snowflake Tasks can facilitate these transformations by allowing the scheduling of SQL statements that manipulate and transform raw data into a consumable format. This enables users to maintain clean and structured data while reducing manual intervention.

Change data capture

In combination with Snowflake Streams, Tasks can be used for change data capture (CDC). This process involves monitoring changes to your source data and applying those changes to a target system. Snowflake Tasks and Streams make CDC efficient and reliable, allowing you to keep your data synchronized across multiple systems.

Source: Snowflake Streams and Tasks for CDC

Snowflake Tasks vs. Airflow: Key Differences

While Snowflake Tasks and Apache Airflow both provide automation and scheduling capabilities, there are key differences between the two:

Feature	Snowflake Tasks	Apache Airflow
Platform Intergration	Native to Snowflake	External to Snowflake
Language	SQL	Python
Dependency Management	Built-in	Requires external management
Scalability	Managed by Snowflake	Requires user management
Maintenance	Low	Moderate to Hign

Snowflake Task example: Creating a Task for a Scheduled Query

To create a Snowflake Task for a scheduled query, follow the steps below:

1. Create a warehouse for the task:

CREATE WAREHOUSE task_warehouse
WITH WAREHOUSE_SIZE = 'XSMALL'
AUTO_SUSPEND = 1
AUTO_RESUME = TRUE;

2. Create the scheduled SQL statement:

CREATE OR REPLACE TABLE daily_summary AS
SELECT
  DATE_TRUNC('DAY', timestamp) AS day,
  COUNT(*) AS total_events
FROM event_data
GROUP BY day;

3. Create the Snowflake Task:

CREATE TASK scheduled_daily_summary
  WAREHOUSE = task_warehouse
  SCHEDULE = 'USING CRON 0 0 * * * UTC'
AS
  INSERT INTO daily_summary (day, total_events)
  SELECT
    DATE_TRUNC('DAY', timestamp) AS day,
    COUNT(*) AS total_events
  FROM event_data
  GROUP BY day;

4. Describe the task you just created. You will notice that it is created in a suspended state.

DESCRIBE TASK scheduled_daily_summary RESUME;

5. Enable the task:

ALTER TASK scheduled_daily_summary RESUME;

Snowflake Tasks can be chained together to create complex data processing workflows. Chaining tasks allows you to define dependencies between tasks, ensuring that a task is executed only after its predecessor tasks have been completed successfully. This feature enables you to create a directed acyclic graph (DAG) of tasks, making the execution of multi-stage workflows more manageable.

Source: Snowflake CREATE TASK Documentation

Chaining Tasks Example

Consider a scenario where you need to perform two sequential operations on your data: first, you aggregate the data at a daily level, and then you compute the moving average over a seven-day window. You can achieve this by chaining two Snowflake Tasks.

1. Create the table for storing the daily summary:

CREATE OR REPLACE TABLE daily_summary (
  day DATE,
  total_events INT
);

2. Create the table for storing the seven-day moving average:

CREATE OR REPLACE TABLE moving_average (
  day DATE,
  average_events FLOAT
);

3. Create the first task for the daily summary:

CREATE TASK daily_summary_task
  WAREHOUSE = task_warehouse
  SCHEDULE = 'USING CRON 0 0 * * * UTC'
AS
  INSERT INTO daily_summary (day, total_events)
  SELECT
    DATE_TRUNC('DAY', timestamp) AS day,
    COUNT(*) AS total_events
  FROM event_data
  GROUP BY day;

4. Create the second task for the moving average, with a dependency on the first task. Note that the second task does not have a predefined schedule. It runs after the first task.

CREATE TASK moving_average_task
  WAREHOUSE = task_warehouse
  AFTER daily_summary_task
AS
  INSERT INTO moving_average (day, average_events)
  SELECT
    day,
    AVG(total_events) OVER (ORDER BY day ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS average_events
  FROM daily_summary;

5. Enable the tasks:

ALTER TASK daily_summary_task RESUME;
ALTER TASK moving_average_task RESUME;

Visual DAG

In this example, the DAG of tasks consists of four nodes ( PARENT_TASK, CHILD_TASK1, CHILD_TASK2, and CHILD_TASK3) and a single edge representing the dependency between them.

The DAG can be a bit tricky to find in the Snowflake UI. Make sure you are in Snowsight the new UI, then go to “Data”, then “Databases”, then “Tasks”.

This simple DAG illustrates how Snowflake Tasks can be chained to create multi-stage workflows. More complex DAGs can be created by chaining multiple tasks with different dependencies, resulting in a powerful and flexible way to manage your data processing workflows.

How to debug Snowflake Tasks

Debugging Snowflake Tasks is essential to ensure that your data pipelines run smoothly and as expected. This subsection covers the process of debugging tasks.

Debugging Tasks

When a Snowflake Task encounters an error, the task execution stops and does not automatically resume. To investigate issues with tasks, you can use the following techniques:‍

1. Examine the task history: Snowflake maintains a history of task executions, including their status, the query ID, and the execution time. Use the TASK_HISTORY table function to review this information:

SELECT *
FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY('your_task_name'))
ORDER BY start_time DESC;

2. Inspect the query history: Once you have identified a problematic task execution from the task history, examine the corresponding query using the QUERY_HISTORY function to gain more insight into the issue:

SELECT *
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
WHERE QUERY_ID = 'your_query_id';

3. Review the error message: The error message associated with the failed query can provide valuable information about the root cause of the issue. Use the SYSTEM$EXTRACT_ERROR_MESSAGE function to retrieve the error message:

SELECT SYSTEM$EXTRACT_ERROR_MESSAGE('your_query_id');

In summary, debugging Snowflake Tasks is a crucial step in ensuring that your data pipelines run smoothly. In a future blog post, we’ll cover how to send notifications to AWS SNS. By employing the techniques and examples provided in this subsection, you can effectively monitor your tasks, identify issues, and promptly address any problems that arise.

Conclusion

Snowflake Tasks offer a powerful and flexible way to automate SQL operations, making them an essential tool for managing data pipelines. With a variety of use cases, such as scheduled queries, transformations, and change data capture, Snowflake

Tasks help streamline and optimize data processing workflows. While Apache Airflow provides more extensibility and advanced monitoring features, Snowflake Tasks' native integration and simplicity make them a compelling choice for Snowflake users.

By understanding the fundamentals of Snowflake Tasks and leveraging their capabilities, you can improve the efficiency and reliability of your data pipelines, ensuring that your data is always up-to-date and ready for analysis.

Snowflake API: Comprehensive Guide to 3 Methods With Examples

This is some text inside of a div block.

Heading 1

Heading 2

Heading 3

Heading 4

Heading 5

Heading 6

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.