Data Engineering

How to Build an Incremental Model for Events Using dbt and Snowflake

An incremental model updates your data storage by adding only new or changed records instead of overwriting or reprocessing the entire data set. dbt has popularized this approach for data transformations as an easier and more performant way to update your database, since each run only processes records added since the last time dbt ran.


In this tutorial, you'll learn how to use the incremental model in dbt to manage data streams in your Snowflake warehouse. This should help you optimize your data engineering structure and improve data accessibility.

What Are Incremental Models?

Incremental models play an important part in ensuring data readiness for your downstream processes. They drastically cut down on the volume of data that needs to be transformed on each run, which in turn enhances warehousing efficiency and lowers compute costs.

An incremental model is especially useful when implementing extract, transform, load (ETL) processes within your organization and when syncing event-style data across systems such as your enterprise resource planning (ERP) and customer relationship management (CRM) platforms, since it effectively limits the amount of data actively being changed and processed.

Incremental models are also the better choice in cases where compute costs need to be optimized, for example when processing large source data with millions of rows, or with necessary transformations that are expensive or time-consuming to run.

Incremental Models in dbt


In dbt, an incremental model of your data is built as a saved materialization of all rows of the source data. On dbt's first run, all available data is processed and stored as its own table. After that, each successive dbt run will only process the rows that you specify, inserting them or updating existing rows in the saved materialization of your source data.

On an incremental run, you typically filter for the rows in your source data that were added or updated since dbt last ran. As a result, your model is built up incrementally with each dbt execution.

While setting up an incremental model adds some complexity at the start, it leads to significant improvements in performance.
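As a sketch of the pattern (the `raw.events` source and its column names here are hypothetical, not part of this tutorial's data):

```sql
-- models/my_incremental_model.sql (hypothetical example)
{{ config(materialized='incremental', unique_key='event_id') }}

select event_id, event_ts, payload
from raw.events

{% if is_incremental() %}
  -- on incremental runs, only process rows newer than what the model already holds;
  -- {{ this }} refers to the model's own existing table
  where event_ts > (select max(event_ts) from {{ this }})
{% endif %}
```

On the first run, the `is_incremental()` block is skipped and the whole source is materialized; on every later run, only the filtered rows are processed.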

Building an Incremental Model Using dbt and Snowflake

dbt generally streamlines your data pipeline process, giving you more flexibility and boosting the transform aspects of your extract, load, transform (ELT) or ETL tasks. With dbt, you can optimize data readiness. You can aggregate, normalize, or sort your data however you wish for your downstream processes (analytics, AI, etc.) without continuously updating your pipeline and resending data.

Architecture diagram


Prerequisites

You'll need the following to build your incremental model:

• dbt, which can be installed on Mac, Windows, and Linux using package managers such as Homebrew and pip, or as a Docker image. This tutorial uses Windows.

• A Snowflake account with sample streaming data. You can sign up for a thirty-day trial.

Note: If using Windows, Python 3.7+ and git are prerequisites for dbt/Snowflake installation.

Setting Up Snowflake

On your Snowflake account, go to the **Marketplace** and search for the "Global Weather & Climate Data for BI" database. It contains weather and climate-specific data across countries and updates on an hourly basis.

Click **Get** and add the database to your Snowflake account:

Weather data

You can check that the database has been added under **Data > Databases**:

Data view

Click the **+ Database** button and create another database, which will be a dbt access point where the software can create and store tables and views for you:

Create database

Lastly, create a warehouse on your Snowflake account under **Admin > Warehouses**. Warehouses are clusters of compute resources that are needed to perform various operations in Snowflake.

For this tutorial, create an **X-Small** warehouse:

Create Warehouse

Setting Up dbt

dbt uses technology-specific adapters to connect and run commands against your database platform. For your Snowflake connection, you'll install the `dbt-snowflake` adapter package and configure dbt with Snowflake-specific information.

Install dbt locally with the following command on your CLI:

pip install dbt-snowflake

Next, input the following command to create your dbt project and connect to your Snowflake account:

dbt init

You'll be asked to input the name of the project and the database platform you want to use (Snowflake), as well as some authentication details such as your account name, username, role, and password. You'll also input details such as the database and schema that dbt is using (the second database you created and the PUBLIC schema). The process should look something like this:

dbt project

With this, your dbt project should be created. A folder named after the project will be generated with the information you inputted, and it'll contain some default files. The connection credentials themselves are stored in a profiles.yml file in the .dbt folder of your home directory.

In your CLI, change directory to the created folder, then run the `dbt debug` command to check the status of your dbt project and the Snowflake connection:

cd {dbt_project_name}
dbt debug
Debug connection

Creating an Incremental Model in dbt

Now that your Snowflake database has been set up and connected with dbt, you can use dbt to create models and transformations of your Snowflake data.

The weather data you added to your account has three available tables: `CLIMATOLOGY_DAY`, `FORECAST_DAY`, and `HISTORY_DAY`. You can find them as views under the `STANDARD_TILE` schema. You'll use the `FORECAST_DAY` table, which contains records of locations and their weather forecasts and measurements.
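Before modeling, you can preview the source in a Snowflake worksheet. The query below is a quick sanity check; replace `{Weather_Database}` with the name the database has in your account, and note the column list is drawn from the fields used later in this tutorial:

```sql
select postal_code, time_init_utc, doy_std,
       min_temperature_air_2m_f, max_temperature_air_2m_f
from {Weather_Database}.STANDARD_TILE.FORECAST_DAY
limit 10;
```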

To create your incremental model, create a `.sql` file within the `models` folder of your dbt project:

Folder location

Copy the following code into your SQL file, replacing `{Weather_Database}` with the name of your imported weather data, and save:

{{
    config(
        materialized='incremental',
        unique_key=['postal_code', 'time_init_utc', 'doy_std']
    )
}}

select
    *,
    max_temperature_air_2m_f - min_temperature_air_2m_f as range_temperature_2m_f
from {Weather_Database}.STANDARD_TILE.FORECAST_DAY

{% if is_incremental() %}

-- this filter will only be applied on an incremental run
where time_init_utc > (select max(time_init_utc) from {{ this }})

{% endif %}

The `config()` block tells dbt the kind of model you are building and constructs a unique ID for records, limiting duplicates in the incremental model. Your `unique_key` is a combination of the variables representing location, date of forecast, and the time it is valid for. This ensures that only new forecasts for each location are appended.
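On Snowflake, dbt's default incremental strategy is `merge`, which upserts rows matching the `unique_key`. You can spell the strategy out, or switch to an alternative, in the same config block; here is a sketch of the equivalent explicit configuration:

```sql
{{
    config(
        materialized='incremental',
        -- 'merge' is the dbt-snowflake default; 'delete+insert' and 'append' are alternatives
        incremental_strategy='merge',
        unique_key=['postal_code', 'time_init_utc', 'doy_std']
    )
}}
```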

The `is_incremental()` function gates the filter logic for your incremental model. Here, you specify that on dbt's subsequent executions, only records with a `time_init_utc` later than the maximum already in the table should be added.
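If records can arrive late or be revised after the watermark has moved on, a common variant widens the filter with a lookback window; the `unique_key` then deduplicates any rows that get re-selected. A sketch using Snowflake's `dateadd` (the three-day window is an arbitrary choice):

```sql
{% if is_incremental() %}
  -- reprocess a trailing three-day window to pick up late-arriving forecasts
  where time_init_utc > (select dateadd(day, -3, max(time_init_utc)) from {{ this }})
{% endif %}
```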

For the main code body, the `select` statement, you get the raw weather data as well as a transformation of the air temperature data, creating a range variable from the minimum and maximum values per forecast. The incremental model will have all of the original data plus the range of air temperature for each record.

Once saved, this incremental model can be run with the following command:

dbt run

This command executes all available operations in your dbt project, including the default dbt-generated examples. (To build only your new model, you can pass its name with `dbt run --select <model_name>`.)

dbt execution

Since this is the first dbt execution, the incremental model will collate all available data:

Database model

As the source data updates, you can rerun the command to get an updated table with new records added in. (If you ever need to rebuild the model from scratch, ignoring the incremental filter, add the `--full-refresh` flag to `dbt run`.)

Updated model

When using dbt for your production use cases, you'll want to schedule your dbt operations and executions rather than manually running `dbt run`. dbt offers a cloud service called dbt Cloud where you can create and schedule jobs with alerts and logs on the operations.

You can also utilize external scheduler software like Apache Airflow or GitLab CI/CD, or code cron jobs yourself.

Conclusion

Incremental models offer numerous benefits to your data-intensive projects. Using them helps you streamline your data readiness process while reducing compute costs and time. This helps you ensure more effective and efficient downstream operations, such as analytics, dashboarding, reporting, and predictive modeling.

dbt is a crucial tool in helping you work with this model. As you saw, it gives you the ability to execute more complex business logic earlier in your data structure. This will boost your data transformation processes and provide greater flexibility.
