Transform data with Materialized Views.
In ClickHouse, Materialized Views are the primary way to transform data. A Materialized View persists the result of a query and keeps it automatically updated as the underlying data changes.
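You can use them to reshape, filter, or enrich data from one or more source Data Pools into a new Data Pool. For example, you could create a Materialized View that flattens JSON into columns, explodes a JSON array into rows, enriches events with a JOIN, calculates new columns, aggregates data incrementally, re-sorts data by a different sorting key, filters rows, or deduplicates events.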
The transformed data in the destination Data Pool is automatically kept up-to-date as the underlying source data changes.
Illustration of data flowing from a source Data Pool, through a Materialized View, and into a destination Data Pool.
Materialized Views in ClickHouse work by automatically executing a specified SQL query over new data inserted into a source Data Pool and writing the query results into a destination Data Pool.
When creating a Materialized View, you define a SELECT query that transforms or aggregates data from one or more source Data Pools. You also define a destination Data Pool where the resulting data will be written.
Diagram of (1) an insert to a source Data Pool, (2) the insert triggering a Materialized View, (3) the Materialized View executing its SQL query on the newly inserted data, and (4) the Materialized View's query results being written to the destination Data Pool.
Whenever new rows are inserted into a source Data Pool, Propel automatically runs the Materialized View's SQL query over just the new rows and writes the results to the destination Data Pool. This updates the destination Data Pool incrementally, without recomputing the entire query from scratch.
Setting the Materialized View’s destination to a SummingMergeTree or AggregatingMergeTree Data Pool enables efficient incremental updates and storage of aggregations.
Materialized Views can be chained, with one Materialized View reading from the destination Data Pool of another, enabling multi-stage data transformation pipelines.
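The trigger-and-write cycle described above, including chaining, can be sketched as a small in-memory Python simulation. Everything here (`DataPool`, `MaterializedView`, the lambda queries) is a hypothetical stand-in, not Propel's or ClickHouse's API. It only models two behaviors: each view processes just the newly inserted block, and a view's destination can itself be the source for another view.

```python
from typing import Callable, Dict, List

Row = Dict[str, int]
Query = Callable[[List[Row]], List[Row]]


class DataPool:
    """Hypothetical append-only table that hands each inserted block to its attached views."""

    def __init__(self) -> None:
        self.rows: List[Row] = []
        self.views: List["MaterializedView"] = []

    def insert(self, block: List[Row]) -> None:
        self.rows.extend(block)
        for view in self.views:
            view.on_insert(block)  # the view sees only the new block


class MaterializedView:
    """Hypothetical view: runs its query on each inserted block and writes the result."""

    def __init__(self, source: DataPool, query: Query, destination: DataPool) -> None:
        self.query = query
        self.destination = destination
        self.rows_processed = 0
        source.views.append(self)

    def on_insert(self, block: List[Row]) -> None:
        self.rows_processed += len(block)
        # Writing to the destination may trigger views attached to it (chaining).
        self.destination.insert(self.query(block))


events, positive, doubled = DataPool(), DataPool(), DataPool()
stage1 = MaterializedView(events, lambda b: [r for r in b if r["x"] > 0], positive)
stage2 = MaterializedView(positive, lambda b: [{"x": r["x"] * 2} for r in b], doubled)

events.insert([{"x": 1}, {"x": -2}])
events.insert([{"x": 3}])
print(doubled.rows)           # [{'x': 2}, {'x': 6}]
print(stage1.rows_processed)  # 3: each source row is processed exactly once
```

Because `stage2` reads from `stage1`'s destination, the two views form a two-stage pipeline, and neither view re-reads rows it has already processed.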
This section provides step-by-step instructions on creating a Materialized View in the Console, the API, and Terraform.
To start, go to the “Materialized Views” section of the Console, then click on “Create new Materialized View”.
First, you need to enter the SQL query that will define the transformation. Once you have the query ready, click “Continue”.
For this example, we are going to create a new Data Pool, so select “New Data Pool” and give it a name.
For this example, we are going to use the “Append-only data” settings. Answer the questions to generate the table settings: for the first question, select the “timestamp” column, then click “Continue”.
Here, you will see your recommended table settings. Click “Continue”.
To learn more, see our How to select a table engine and sorting key guide.
Next, decide whether you want to backfill the existing data in the source Data Pool to the destination Data Pool. In most cases, you’d want to backfill. Propel takes care of this process for you.
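The effect of the backfill choice can be illustrated with a minimal Python sketch. The functions `create_destination` and `apply_insert` are hypothetical. The sketch assumes that backfilling means running the view's query once over the rows already in the source when the view is created. After that, only new inserts are transformed.

```python
from typing import Callable, Dict, List

Row = Dict[str, int]
Query = Callable[[List[Row]], List[Row]]


def create_destination(existing: List[Row], query: Query, backfill: bool) -> List[Row]:
    """Initial destination contents when the view is created (hypothetical helper)."""
    # Backfill transforms the rows that were already in the source; otherwise start empty.
    return query(existing) if backfill else []


def apply_insert(destination: List[Row], block: List[Row], query: Query) -> None:
    """After creation, every new insert is transformed and appended (hypothetical helper)."""
    destination.extend(query(block))


keep_even: Query = lambda rows: [r for r in rows if r["n"] % 2 == 0]
existing = [{"n": 1}, {"n": 2}]  # rows inserted before the view existed

with_backfill = create_destination(existing, keep_even, backfill=True)
without_backfill = create_destination(existing, keep_even, backfill=False)

new_block = [{"n": 3}, {"n": 4}]
apply_insert(with_backfill, new_block, keep_even)
apply_insert(without_backfill, new_block, keep_even)

print(with_backfill)     # [{'n': 2}, {'n': 4}]
print(without_backfill)  # [{'n': 4}]
```

Without backfill, the pre-existing row `{'n': 2}` never reaches the destination, so the destination is missing that history. This is why backfilling is usually the right choice.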
Lastly, give your Materialized View a name and description.
You’ll notice the new Data Pool is created with the new schema and data.
Click on the “Preview Data” tab to see your transformed records.
This section provides examples of common use cases solved with Materialized Views.
For all the examples, we’ll use a source Data Pool called events
with two columns:
_propel_received_at
(TIMESTAMP)_propel_payload
(JSON)To replicate the examples, create a Webhook Data Pool with just the _propel_received_at
and _propel_payload
columns. Then click on the Data Pool, click on “Schema” tab and paste the event below to create sample records.
The JSON events in the _propel_payload
column are of the form:
For the API examples, you can copy and paste them into the API Playground.
The following Materialized View flattens the JSON into individual columns. In Propel, you can access nested JSON keys using dot notation. We also use the parseDateTimeBestEffort function to parse the timestamp from a string into a ClickHouse DateTime value.
Destination Data Pool | |
---|---|
Table Engine | MergeTree |
Sorting Key | created_at |
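The following Python sketch illustrates what flattening does. It turns nested JSON keys into dot-notation column names and parses the timestamp string. The payload keys (`customer`, `taco_count`, `created_at`) are hypothetical. `datetime.fromisoformat` is only a stand-in for ClickHouse's `parseDateTimeBestEffort`, which accepts many more formats than ISO 8601.

```python
import json
from datetime import datetime
from typing import Any, Dict


def flatten(obj: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested objects into dot-notation keys: {"a": {"b": 1}} -> {"a.b": 1}."""
    flat: Dict[str, Any] = {}
    for key, value in obj.items():
        path = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, path + "."))
        else:
            flat[path] = value
    return flat


# Hypothetical _propel_payload value, used only for illustration.
payload = json.loads(
    '{"customer": {"id": "c1"}, "taco_count": 3, "created_at": "2024-01-15T10:30:00+00:00"}'
)
row = flatten(payload)
# Stand-in for parseDateTimeBestEffort: this only handles ISO 8601 strings.
row["created_at"] = datetime.fromisoformat(row["created_at"])
print(row["customer.id"], row["taco_count"], row["created_at"].year)  # c1 3 2024
```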
The following Materialized View flattens a JSON array into rows. Given a table TacoOrders with the following schema:
With the following data:
Destination Data Pool | |
---|---|
Table Engine | MergeTree |
Sorting Key | orderDate |
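Flattening an array into rows emits one output row per array element and copies the parent row's other columns onto each new row. This is the behavior of ClickHouse's `ARRAY JOIN` clause. The Python sketch below mimics it. The `TacoOrders` field names (`orderId`, `orderDate`, `items`) and their values are hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def explode(rows: List[Row], array_column: str) -> List[Row]:
    """Emit one row per array element, merging the element into a copy of its parent row."""
    out: List[Row] = []
    for row in rows:
        parent = {k: v for k, v in row.items() if k != array_column}
        # An empty array yields no rows, as with ARRAY JOIN (LEFT ARRAY JOIN would keep one).
        for element in row[array_column]:
            out.append({**parent, **element})
    return out


# Hypothetical TacoOrders rows.
orders = [
    {"orderId": 1, "orderDate": "2024-03-01",
     "items": [{"taco": "al pastor", "qty": 2}, {"taco": "carnitas", "qty": 1}]},
    {"orderId": 2, "orderDate": "2024-03-02", "items": [{"taco": "pollo", "qty": 3}]},
    {"orderId": 3, "orderDate": "2024-03-02", "items": []},
]

for line in explode(orders, "items"):
    print(line)
```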
Given an additional table, stores, with two columns, the Materialized View below performs a JOIN to enrich each event with the store name.
Materialized Views are triggered only by inserts into the left-most table of the join, which is considered the source Data Pool. The Materialized View reads values from the right-side tables of the join, but changes to those tables do not trigger it.
Destination Data Pool | |
---|---|
Table Engine | MergeTree |
Sorting Key | created_at |
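The trigger rule above has a practical consequence: rows already written to the destination keep the store name that was current when their event arrived. The Python sketch below models this with a dictionary lookup standing in for the JOIN. The `store_id` and `order_id` names are hypothetical, and so is the empty-string fallback for unknown stores. The sketch does not reproduce the exact join type of the real query.

```python
from typing import Dict, List

stores: Dict[str, str] = {"s1": "Downtown"}  # right-side table: store_id -> store name
destination: List[Dict[str, str]] = []


def on_events_insert(block: List[Dict[str, str]]) -> None:
    """Runs only when the left-most table (events) receives rows."""
    for event in block:
        # Look up the store name as it is right now; unknown stores fall back to "".
        destination.append({**event, "store_name": stores.get(event["store_id"], "")})


on_events_insert([{"order_id": "o1", "store_id": "s1"}])
stores["s1"] = "Uptown"  # changing the right-side table triggers nothing
on_events_insert([{"order_id": "o2", "store_id": "s1"}])

print([row["store_name"] for row in destination])  # ['Downtown', 'Uptown']
```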
This Materialized View calculates the total price by multiplying the taco_count column by the price column.
Destination Data Pool | |
---|---|
Table Engine | MergeTree |
Sorting Key | created_at |
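The Python sketch below shows the per-row arithmetic by adding a `total_price` column to each inserted row. The column names come from the example above. Storing `price` as `Decimal` is an assumption, made to avoid binary floating-point rounding in currency math.

```python
from decimal import Decimal
from typing import Any, Dict, List

Row = Dict[str, Any]


def add_total_price(block: List[Row]) -> List[Row]:
    """Return copies of the rows with total_price = taco_count * price."""
    return [{**row, "total_price": row["taco_count"] * row["price"]} for row in block]


block = [
    {"taco_count": 3, "price": Decimal("2.50")},
    {"taco_count": 1, "price": Decimal("4.25")},
]
for row in add_total_price(block):
    print(row["total_price"])  # 7.50, then 4.25
```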
The Materialized View below incrementally aggregates the number of tacos sold and sales by customer_id and month. It uses the SummingMergeTree table engine, which combines rows that share the same sorting key by summing their numeric columns as data parts are merged in the background. To learn more, read our guide on How to select a table engine and sorting key.
Destination Data Pool | |
---|---|
Table Engine | SummingMergeTree |
Sorting Key | month, customer_id |
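SummingMergeTree does not sum rows at insert time. It combines rows that share the same sorting key when data parts are merged in the background, and a merge may or may not have happened before you query. The Python sketch below models that merge step. It assumes the sorting key covers both `customer_id` and `month`, because rows are only combined when the entire sorting key matches. The `tacos` and `sales` column names and all values are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]
Key = Tuple[str, str]  # (customer_id, month): the assumed sorting key


def merge_parts(parts: List[List[Row]]) -> List[Row]:
    """Collapse rows with equal sorting keys into one row, summing the numeric columns."""
    totals: Dict[Key, Dict[str, int]] = defaultdict(lambda: {"tacos": 0, "sales": 0})
    for part in parts:
        for row in part:
            acc = totals[(row["customer_id"], row["month"])]
            acc["tacos"] += row["tacos"]
            acc["sales"] += row["sales"]
    return [{"customer_id": c, "month": m, **sums} for (c, m), sums in sorted(totals.items())]


# Each insert writes its own part; until a merge runs, all three rows are stored.
parts = [
    [{"customer_id": "c1", "month": "2024-01", "tacos": 2, "sales": 10},
     {"customer_id": "c2", "month": "2024-01", "tacos": 1, "sales": 5}],
    [{"customer_id": "c1", "month": "2024-01", "tacos": 3, "sales": 15}],
]
for row in merge_parts(parts):
    print(row)
```

Some rows may still be unmerged when you query. For that reason, aggregate with `sum()` and `GROUP BY` in your queries rather than assuming there is exactly one row per key.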
The Materialized View below creates a destination Data Pool with a different sorting key. It sorts the rows by the checkout_time column instead of the source Data Pool's _propel_received_at column.
Destination Data Pool | |
---|---|
Table Engine | MergeTree |
Sorting Key | checkout_time |
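Re-sorting helps because data sorted by the column you filter on lets the engine locate a range of rows instead of scanning everything. In the Python sketch below, a sorted list plus binary search (`bisect`) stands in, loosely, for ClickHouse's primary index over the sorting key. The rows and timestamps are hypothetical.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime
from typing import Dict, List

Row = Dict[str, datetime]

# Hypothetical rows in arrival (_propel_received_at) order; checkout_time is not monotonic.
source: List[Row] = [
    {"_propel_received_at": datetime(2024, 5, 1, 12, 0), "checkout_time": datetime(2024, 5, 1, 11, 58)},
    {"_propel_received_at": datetime(2024, 5, 1, 12, 1), "checkout_time": datetime(2024, 4, 30, 9, 0)},
    {"_propel_received_at": datetime(2024, 5, 1, 12, 2), "checkout_time": datetime(2024, 5, 1, 8, 15)},
]

# Destination: the same rows, ordered by the new sorting key.
destination = sorted(source, key=lambda r: r["checkout_time"])
keys = [r["checkout_time"] for r in destination]


def checkouts_between(start: datetime, end: datetime) -> List[Row]:
    """Binary-search the sorted keys instead of scanning every row."""
    return destination[bisect_left(keys, start):bisect_right(keys, end)]


for row in checkouts_between(datetime(2024, 5, 1), datetime(2024, 5, 2)):
    print(row["checkout_time"])
```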
The Materialized View below filters out rows dated before 2024.
Destination Data Pool | |
---|---|
Table Engine | MergeTree |
Sorting Key | created_at |
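The filter itself is a per-row predicate applied to each inserted block. This Python sketch makes two assumptions: the filter column is `created_at` (the destination's sorting key), and "before 2024" means before 2024-01-01 00:00.

```python
from datetime import datetime
from typing import Dict, List

Row = Dict[str, datetime]
CUTOFF = datetime(2024, 1, 1)  # assumed boundary: rows before this are dropped


def drop_pre_2024(block: List[Row]) -> List[Row]:
    """Keep only rows whose created_at is on or after January 1, 2024."""
    return [row for row in block if row["created_at"] >= CUTOFF]


block = [
    {"created_at": datetime(2023, 12, 31, 23, 59)},
    {"created_at": datetime(2024, 1, 1, 0, 0)},
    {"created_at": datetime(2024, 6, 15, 12, 0)},
]
print(len(drop_pre_2024(block)))  # 2
```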
The Materialized View below flattens and deduplicates events. It uses the ReplacingMergeTree table engine to deduplicate events that have the same sorting key. To learn more, read our guide on How to select a table engine and sorting key.
Destination Data Pool | |
---|---|
Table Engine | ReplacingMergeTree |
Sorting Key | created_at, order_ids |
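ReplacingMergeTree removes duplicates only when data parts are merged. Until a merge runs, duplicate rows can still be visible, unless a query uses the `FINAL` modifier. Without a version column, the last inserted row for each sorting key is kept. The Python sketch below models that merge step. The row values and the `status` column are hypothetical, and `order_ids` is converted to a tuple only so it can serve as part of a dictionary key.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def replacing_merge(parts: List[List[Row]]) -> List[Row]:
    """Keep one row per sorting key; a later-inserted row replaces an earlier one."""
    latest: Dict[Tuple[Any, ...], Row] = {}
    for part in parts:  # parts in insertion order
        for row in part:
            key = (row["created_at"], tuple(row["order_ids"]))
            latest[key] = row
    return list(latest.values())


parts = [
    [{"created_at": "2024-02-01 10:00:00", "order_ids": [1, 2], "status": "pending"}],
    [{"created_at": "2024-02-01 10:00:00", "order_ids": [1, 2], "status": "paid"},
     {"created_at": "2024-02-01 11:00:00", "order_ids": [3], "status": "paid"}],
]
print(sum(len(p) for p in parts))  # 3 rows stored before the merge
for row in replacing_merge(parts):
    print(row["order_ids"], row["status"])
```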
What is the difference between Materialized Views and views?
In ClickHouse, a view is a virtual table based on the result set of a SELECT statement. It is used to simplify complex queries by breaking them up into manageable parts. A view always shows up-to-date data—the query is run every time the view is referenced in a query.
On the other hand, a Materialized View is a persisted version of a SELECT query’s result set, which is automatically updated when the data underlying the query changes.
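The practical difference is when the query runs. In this Python sketch (hypothetical classes, not a ClickHouse API), the view re-runs its query over the whole source on every read. The Materialized View runs its query once per insert, over only the inserted rows, and serves stored results on read.

```python
from typing import Callable, List

Query = Callable[[List[int]], List[int]]


class View:
    """Stores only the query; each read re-runs it over the full source."""

    def __init__(self, source: List[int], query: Query) -> None:
        self.source, self.query, self.query_runs = source, query, 0

    def read(self) -> List[int]:
        self.query_runs += 1
        return self.query(self.source)


class MaterializedView:
    """Stores results; the query runs once per insert, over the new rows only."""

    def __init__(self, query: Query) -> None:
        self.query, self.stored, self.query_runs = query, [], 0

    def on_insert(self, block: List[int]) -> None:
        self.query_runs += 1
        self.stored.extend(self.query(block))

    def read(self) -> List[int]:
        return list(self.stored)


positives: Query = lambda rows: [x for x in rows if x > 0]
source: List[int] = []
view, mview = View(source, positives), MaterializedView(positives)

for block in ([3, -1], [5]):  # two inserts
    source.extend(block)
    mview.on_insert(block)

results = [(view.read(), mview.read()) for _ in range(3)]  # three reads
print(results[0])                         # ([3, 5], [3, 5])
print(view.query_runs, mview.query_runs)  # 3 2
```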
Do Materialized Views transform data in real-time or on a schedule?
Materialized Views in ClickHouse transform data in real-time. Whenever new data is inserted into the source Data Pool, the Materialized View is automatically triggered to transform the new data and write the results to the destination Data Pool.
How much do Materialized Views cost?
Materialized Views have no cost of their own, but the rows they write to the destination Data Pool incur data write costs, just like writes to any other Data Pool. Likewise, destination Data Pools consume storage like any other Data Pool.
What happens if I delete a Materialized View?
If you delete a Materialized View in Propel, new data will stop being inserted into the destination Data Pool. The destination Data Pool associated with it will not be automatically deleted.
Can a Materialized View be modified?
In ClickHouse, Materialized Views cannot be directly modified. If you need to change the fields or the query, you would need to create a new Materialized View.
What happens if I update or delete data in the source Data Pool with the update or delete API?
Data deleted or updated with the Batch update or delete API will not trigger the Materialized View and will not be propagated to the destination Data Pool.
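The small Python sketch below shows why the destination can drift. Only inserts run the view's query, so a batch delete that rewrites the source never reaches the destination. The `insert` and `batch_delete` functions are hypothetical and model only the behavior stated above.

```python
from typing import Dict, List

Row = Dict[str, int]
source: List[Row] = []
destination: List[Row] = []


def insert(block: List[Row]) -> None:
    """Inserts trigger the view: transformed rows are written to the destination."""
    source.extend(block)
    destination.extend({"id": r["id"], "doubled": r["value"] * 2} for r in block)


def batch_delete(row_id: int) -> None:
    """Rewrites the source only; no Materialized View is triggered."""
    source[:] = [r for r in source if r["id"] != row_id]


insert([{"id": 1, "value": 10}, {"id": 2, "value": 20}])
batch_delete(1)
print([r["id"] for r in source])       # [2]
print([r["id"] for r in destination])  # [1, 2]: the deleted row is still here
```

If the destination needs to reflect such changes, one option is to apply the same update or delete to the destination Data Pool as well.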