Partition key

In this guide, you’ll learn how to select the optimal partition key for your table in Propel’s Serverless ClickHouse.

If your table is smaller than 10 GB, you don’t need to customize the partition key and can skip this guide. 🥳

Understanding the partition key

ClickHouse can easily handle tables with hundreds of millions of rows; however, for tables that exceed 10 GB in storage, it’s recommended to define an optional partition key that will organize the table’s rows into a few logical groups, or partitions. ClickHouse can then operate on the smaller partitions individually, instead of the entire table, speeding up data access and manipulation.

The number of partitions should not be too many, since accessing too many partitions in a single query can hurt performance or fail the query entirely. Ideally, INSERT and SELECT queries should access one or a few partitions at a time. For this reason, partition keys should be low cardinality, resulting in dozens up to hundreds of partitions, not thousands. For example, typical partition keys will partition by month, by date, or event type.

Diagram of a ClickHouse table. The table contains partitions, and each partition contains parts.

How to choose the right partition key for your table

In the following section, we will provide you with guidelines and considerations to keep in mind when selecting the most suitable partition key for your table.

First, let’s begin with some general guidelines that apply regardless of the data size and table engine:

A SELECT query should not touch more than a few dozen partitions.
The number of partitions in a table should be in the dozens or hundreds, not thousands.
A single INSERT should deliver data to one or a few partitions.

Small tables (smaller than 10 GB)

Partitioning is usually unnecessary for tables smaller than 10 gigabytes. Having no partition is better than having the wrong one.

So, if your table has 5 GB of data, for example, you can leave it empty and stop reading 🥳.

Append-only data in MergeTree tables

For larger MergeTree tables (greater than 10 GB), it’s crucial to select an appropriate partition key. As a general guideline, a single partition for MergeTree tables should be no more than 300 GB.

The following recommendations vary depending on the type of data: time series or incremental data.

Time series data

For time series data, you should choose a partition key based on your primary timestamp and data volume. Most of the time monthly partitioning is sufficient, and so Propel recommends this as a default, but if monthly data would exceed 300 GB, you should use weekly partitioning; etc.

Given a timestamp column, you can configure these partitioning schemes as follows:

Monthly: toYYYYMM(timestamp)

mutation {
  createDataPoolV2(
    input: {
      uniqueName: "events2"
      description: "Data Pool for events"
      timestamp: { columnName: "timestamp" }
      columns: [
        { columnName: "timestamp", type: TIMESTAMP, isNullable: true }
        { columnName: "customer_id", type: TIMESTAMP, isNullable: true }
        { columnName: "workspace_id", type: STRING, isNullable: true }
        { columnName: "event_type", type: STRING, isNullable: true }
        { columnName: "payload", type: JSON, isNullable: true }
      ]
      tableSettings: {
        engine: {
          mergeTree: {
            type: MERGE_TREE
          }
        }
        partitionBy: ["toYYYYMM(timestamp)"]
      }
    }
  ) {
    dataPool {
      id
      uniqueName
      description
      tableSettings {
        orderBy
      }
    }
  }
}

resource "propel_data_pool" "events2" {
  unique_name = "events2"
  description = "Data Pool for events"
  timestamp   = "timestamp"

  column {
    name     = "timestamp"
    type     = "TIMESTAMP"
    nullable = true
  }
  column {
    name     = "customer_id"
    type     = "TIMESTAMP"
    nullable = true
  }
  column {
    name     = "workspace_id"
    type     = "STRING"
    nullable = true
  }
  column {
    name     = "event_type"
    type     = "STRING"
    nullable = true
  }
  column {
    name     = "payload"
    type     = "JSON"
    nullable = true
  }

  table_settings {
    engine {
      merge_tree {
        type = "MERGE_TREE"
      }
    }
    partition_by = ["toYYYYMM(timestamp)"]
  }
}

Weekly: toStartOfWeek(timestamp)

mutation {
  createDataPoolV2(
    input: {
      uniqueName: "events2"
      description: "Data Pool for events"
      timestamp: { columnName: "timestamp" }
      columns: [
        { columnName: "timestamp", type: TIMESTAMP, isNullable: true }
        { columnName: "customer_id", type: TIMESTAMP, isNullable: true }
        { columnName: "workspace_id", type: STRING, isNullable: true }
        { columnName: "event_type", type: STRING, isNullable: true }
        { columnName: "payload", type: JSON, isNullable: true }
      ]
      tableSettings: {
        engine: {
          mergeTree: {
            type: MERGE_TREE
          }
        }
        partitionBy: ["toStartOfWeek(timestamp)"]
      }
    }
  ) {
    dataPool {
      id
      uniqueName
      description
      tableSettings {
        orderBy
      }
    }
  }
}

resource "propel_data_pool" "events2" {
  unique_name = "events2"
  description = "Data Pool for events"
  timestamp   = "timestamp"

  column {
    name     = "timestamp"
    type     = "TIMESTAMP"
    nullable = true
  }
  column {
    name     = "customer_id"
    type     = "TIMESTAMP"
    nullable = true
  }
  column {
    name     = "workspace_id"
    type     = "STRING"
    nullable = true
  }
  column {
    name     = "event_type"
    type     = "STRING"
    nullable = true
  }
  column {
    name     = "payload"
    type     = "JSON"
    nullable = true
  }

  table_settings {
    engine {
      merge_tree {
        type = "MERGE_TREE"
      }
    }
    partition_by = ["toStartOfWeek(timestamp)"]
  }
}

Daily: toDate(timestamp)

mutation {
  createDataPoolV2(
    input: {
      uniqueName: "events2"
      description: "Data Pool for events"
      timestamp: { columnName: "timestamp" }
      columns: [
        { columnName: "timestamp", type: TIMESTAMP, isNullable: true }
        { columnName: "customer_id", type: TIMESTAMP, isNullable: true }
        { columnName: "workspace_id", type: STRING, isNullable: true }
        { columnName: "event_type", type: STRING, isNullable: true }
        { columnName: "payload", type: JSON, isNullable: true }
      ]
      tableSettings: {
        engine: {
          mergeTree: {
            type: MERGE_TREE
          }
        }
        partitionBy: ["toDate(timestamp)"]
      }
    }
  ) {
    dataPool {
      id
      uniqueName
      description
      tableSettings {
        orderBy
      }
    }
  }
}

resource "propel_data_pool" "events2" {
  unique_name = "events2"
  description = "Data Pool for events"
  timestamp   = "timestamp"

  column {
    name     = "timestamp"
    type     = "TIMESTAMP"
    nullable = true
  }
  column {
    name     = "customer_id"
    type     = "TIMESTAMP"
    nullable = true
  }
  column {
    name     = "workspace_id"
    type     = "STRING"
    nullable = true
  }
  column {
    name     = "event_type"
    type     = "STRING"
    nullable = true
  }
  column {
    name     = "payload"
    type     = "JSON"
    nullable = true
  }

  table_settings {
    engine {
      merge_tree {
        type = "MERGE_TREE"
      }
    }
    partition_by = ["toDate(timestamp)"]
  }
}

Hourly: toStartOfHour(timestamp)

  mutation {
  createDataPoolV2(
    input: {
      uniqueName: "events2"
      description: "Data Pool for events"
      timestamp: { columnName: "timestamp" }
      columns: [
        { columnName: "timestamp", type: TIMESTAMP, isNullable: true }
        { columnName: "customer_id", type: TIMESTAMP, isNullable: true }
        { columnName: "workspace_id", type: STRING, isNullable: true }
        { columnName: "event_type", type: STRING, isNullable: true }
        { columnName: "payload", type: JSON, isNullable: true }
      ]
      tableSettings: {
        engine: {
          mergeTree: {
            type: MERGE_TREE
          }
        }
        partitionBy: ["toStartOfHour(timestamp)"]
      }
    }
  ) {
    dataPool {
      id
      uniqueName
      description
      tableSettings {
        orderBy
      }
    }
  }
}

resource "propel_data_pool" "events2" {
  unique_name = "events2"
  description = "Data Pool for events"
  timestamp   = "timestamp"

  column {
    name     = "timestamp"
    type     = "TIMESTAMP"
    nullable = true
  }
  column {
    name     = "customer_id"
    type     = "TIMESTAMP"
    nullable = true
  }
  column {
    name     = "workspace_id"
    type     = "STRING"
    nullable = true
  }
  column {
    name     = "event_type"
    type     = "STRING"
    nullable = true
  }
  column {
    name     = "payload"
    type     = "JSON"
    nullable = true
  }

  table_settings {
    engine {
      merge_tree {
        type = "MERGE_TREE"
      }
    }
    partition_by = ["toStartOfHour(timestamp)"]
  }
}

Incremental data

For sequential records with an increasing numeric column, you can divide the numeric column by a large number to assign it a partition. The size of the divisor should depend on how quickly the numeric column increases.

Given an increasing numeric column order_id, setting the partition key to intDiv(order_id, 1000000) will generate the following partitions:

Partition 1: order_id values 0–1,000,000
Partition 2: order_id values 1,000,000–2,000,000
etc.

mutation {
  createDataPoolV2(
    input: {
      uniqueName: "events2"
      description: "Data Pool for events"
      timestamp: { columnName: "timestamp" }
      columns: [
        { columnName: "timestamp", type: TIMESTAMP, isNullable: true }
        { columnName: "customer_id", type: TIMESTAMP, isNullable: true }
        { columnName: "workspace_id", type: STRING, isNullable: true }
        { columnName: "event_type", type: STRING, isNullable: true }
        { columnName: "payload", type: JSON, isNullable: true }
      ]
      tableSettings: {
        engine: {
          mergeTree: {
            type: MERGE_TREE
          }
        }
        partitionBy: ["intDiv(order_id, 1000000)"]
      }
    }
  ) {
    dataPool {
      id
      uniqueName
      description
      tableSettings {
        orderBy
      }
    }
  }
}

resource "propel_data_pool" "events2" {
  unique_name = "events2"
  description = "Data Pool for events"
  timestamp   = "timestamp"

  column {
    name     = "timestamp"
    type     = "TIMESTAMP"
    nullable = true
  }
  column {
    name     = "customer_id"
    type     = "TIMESTAMP"
    nullable = true
  }
  column {
    name     = "workspace_id"
    type     = "STRING"
    nullable = true
  }
  column {
    name     = "event_type"
    type     = "STRING"
    nullable = true
  }
  column {
    name     = "payload"
    type     = "JSON"
    nullable = true
  }

  table_settings {
    engine {
      merge_tree {
        type = "MERGE_TREE"
      }
    }
    partition_by = ["intDiv(order_id, 1000000)"]
  }
}

Mutable data in ReplacingMergeTree tables

For larger ReplacingMergeTree tables, we recommend partitions to be a bit smaller, up to 50 GB.

For instance, if you have a large “users” dimension table (greater than 10 GB) and queries always filter on a user_id UUID column, the recommended partition key would be xxHash64(user_id) % 10. This will generate up to 10 partitions, with user_ids consistently distributed throughout.

mutation {
  createDataPoolV2(
    input: {
      uniqueName: "events2"
      description: "Data Pool for events"
      timestamp: { columnName: "timestamp" }
      columns: [
        { columnName: "timestamp", type: TIMESTAMP, isNullable: true }
        { columnName: "customer_id", type: TIMESTAMP, isNullable: true }
        { columnName: "workspace_id", type: STRING, isNullable: true }
        { columnName: "event_type", type: STRING, isNullable: true }
        { columnName: "payload", type: JSON, isNullable: true }
      ]
      tableSettings: {
        engine: {
          mergeTree: {
            type: MERGE_TREE
          }
        }
        partitionBy: ["xxHash64(user_id) % 10"]
      }
    }
  ) {
    dataPool {
      id
      uniqueName
      description
      tableSettings {
        orderBy
      }
    }
  }
}

resource "propel_data_pool" "events2" {
  unique_name = "events2"
  description = "Data Pool for events"
  timestamp   = "timestamp"

  column {
    name     = "timestamp"
    type     = "TIMESTAMP"
    nullable = true
  }
  column {
    name     = "customer_id"
    type     = "TIMESTAMP"
    nullable = true
  }
  column {
    name     = "workspace_id"
    type     = "STRING"
    nullable = true
  }
  column {
    name     = "event_type"
    type     = "STRING"
    nullable = true
  }
  column {
    name     = "payload"
    type     = "JSON"
    nullable = true
  }

  table_settings {
    engine {
      merge_tree {
        type = "MERGE_TREE"
      }
    }
    partition_by = ["xxHash64(user_id) % 10"]
  }
}

Aggregations in Summing- or Aggregating-MergeTree tables

For SummingMergeTree and AggregatingMergeTree tables, we recommend partitions to be up to 50 GB; however, because these tables are often small, due to aggregation, you rarely need to partition.

If your table aggregates by time, follow the instructions for time series data above. If your table aggregates by, for example, user_id, follow the instructions for mutable data above.

Frequently asked questions

Can you change the partition key of a table?

What is a part in ClickHouse?

Are parts the same as partitions?

Get Started

Manage

Identity & Access

Resources

Enterprise

Understanding the partition key

How to choose the right partition key for your table

Small tables (smaller than 10 GB)

Append-only data in MergeTree tables

Time series data

Incremental data

Mutable data in ReplacingMergeTree tables

Aggregations in Summing- or Aggregating-MergeTree tables

Frequently asked questions

Get Started

Manage

Identity & Access

Resources

Enterprise

​Understanding the partition key

​How to choose the right partition key for your table

​Small tables (smaller than 10 GB)

​Append-only data in MergeTree tables

​Time series data

​Incremental data

​Mutable data in ReplacingMergeTree tables

​Aggregations in Summing- or Aggregating-MergeTree tables

​Frequently asked questions

Understanding the partition key

How to choose the right partition key for your table

Small tables (smaller than 10 GB)

Append-only data in MergeTree tables

Time series data

Incremental data

Mutable data in ReplacingMergeTree tables

Aggregations in Summing- or Aggregating-MergeTree tables

Frequently asked questions