Perform calculations over sets of rows related to the current row.
aggregate_function(column_name)
: The aggregate function to apply.PARTITION BY grouping_column
: Divides the result set into partitions to which the window function is applied independently.ORDER BY sorting_column
: Defines the logical order of rows within each partition.ROWS or RANGE
: Specifies the frame over which the function operates relative to the current row.WINDOW clause
: Allows you to name and reuse window definitions.Frame bound | Explanation | Example |
---|---|---|
BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING | Includes all rows in the partition | SUM(value) OVER (PARTITION BY group) |
No bound expression, no ORDER BY | Same as above, includes all rows in the partition | AVG(price) OVER (PARTITION BY category) |
BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW | From start of partition to current row | SUM(sales) OVER (ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) |
No bound expression, with ORDER BY | Same as above, from start of partition to current row | SUM(sales) OVER (ORDER BY date) |
BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING | From current row to end of partition | MAX(price) OVER (ORDER BY date ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) |
BETWEEN 1 PRECEDING AND CURRENT ROW | Sliding frame of current and previous row | AVG(value) OVER (ORDER BY time ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) |
BETWEEN 1 PRECEDING AND 1 FOLLOWING | Sliding frame of 3 rows | MEDIAN(score) OVER (ORDER BY id ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) |
BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING | From current row to end of partition | LAST_VALUE(price) OVER (ORDER BY date ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) |
ROW_NUMBER()
do not respect the frame and always operate on the entire partition.OVER
clause.
Returns:
A unique integer for each row within its partition. (UInt64
)
Example:
row_number()
assigns a unique rank to each taco based on its price in descending order. Note that even though “Classic Beef” and “Chicken Fiesta” have the same price, they receive different rank numbers.
Example with PARTITION BY:
row_number()
restarts the numbering for each taco category, ranking tacos by price within their respective categories.
rank()
and dense_rank()
, row_number()
always assigns unique, sequential numbers, even for rows with identical values in the ORDER BY
column.column_name
(Any): The column from which to select the first value.RESPECT NULLS
: Includes NULL values in the evaluation.IGNORE NULLS
: Skips NULL values (default behavior).first_value
identifies the most expensive taco (Supreme Taco) based on the descending order of prices.
first_value
is an alias for the any
function in ClickHouse.RESPECT NULLS
, NULL values are considered in determining the first value.RESPECT NULLS
clause.
Syntax:
column_name
: The column from which to retrieve the last value.RESPECT NULLS
: Includes NULL values in the evaluation.IGNORE NULLS
: Skips NULL values (default behavior).last_value
is used to find the cheapest taco from our taco menu. The window frame covers the entire dataset, so the last value (when ordered by price descending) is always the cheapest taco.
x
(Any data type): The column or expression to evaluate.offset
(UInt64
): The nth row to evaluate the current row against (1-based).nth_value
is used to find the third most expensive taco from a taco menu. The function returns NULL for the first two rows because there isn’t a third row yet in the ordered frame. From the third row onward, it returns ‘Gourmet Taco’, which is the taco name in the third row when ordered by price descending.
nth_value
function is particularly useful when you need to compare each row with a specific nth value within a sorted set, such as finding benchmark or threshold values in a dataset.ORDER BY
columns, they receive the same rank, and the next rank is skipped. This results in gaps in the ranking sequence.
For ranking without gaps, use the dense_rank()
function instead.
Syntax:
OVER
clause.
Returns:
A rank value for each row. [UInt64
]
Example:
rank()
function is often used for competitive rankings or when you need to identify distinct positions while preserving information about ties.ORDER BY
clause. If multiple rows have the same values, they receive the same rank, and the next rank is assigned to the next distinct value without gaps.
This function differs from rank()
in that it doesn’t leave gaps in the ranking sequence when there are ties.
Syntax:
OVER
clause.
Returns:
A ranking number for each row within its partition, without gaps. [UInt64
]
Example:
dense_rank()
is often used in scenarios where you want to assign rankings without gaps, such as in competition results or when creating tier systems based on certain criteria.column_name
(String
): The column to calculate the percent rank for.RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
.(rank - 1) / (total_rows - 1)
, where rank
is the row’s rank within the partition and total_rows
is the number of rows in the partition.percent_rank()
calculates the price percentile for each taco. The most expensive taco (Supreme Taco) has a percentile of 0, while the least expensive (Value Taco) has a percentile of 1. The other tacos are distributed between these extremes based on their prices.
lagInFrame
function is similar to the common lag
function found in databases like PostgreSQL or Snowflake, but it respects the window frame. This means it only considers rows within the defined frame, not the entire partition. To achieve equivalent behavior to the traditional lag
function use ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
in the OVER
clause.
Syntax:
x
: Column or expression to evaluate.offset
((U)Int*
, optional): Number of rows to look back. Default is 1.default
(optional): Value to return if the offset exceeds the window frame boundaries. Defaults to the column’s default value.x
from the row offset
rows before the current row within the ordered frame.
Example:
This example calculates the day-to-day change in closing price for a taco restaurant’s stock:
close
is the closing price for each day.previous_day_close
uses lagInFrame
to get the previous day’s closing price.price_change
calculates the absolute change in price.percent_change
calculates the percentage change in price.leadInFrame
function is similar to the common lead
function found in databases like PostgreSQL or Snowflake, but it respects the window frame. This means it only considers rows within the defined frame, not the entire partition. To achieve equivalent behavior to the traditional lead
function use ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
in the OVER
clause.
Syntax:
x
: Column or expression to evaluate.offset
(UInt*
, optional): Number of rows to look ahead. Default is 1.default
(optional): Value to return if the offset exceeds the window frame. Defaults to the column’s default value.x
evaluated at the row that is offset
rows after the current row within the ordered frame.
Example:
This example uses leadInFrame
to show the next year’s taco sales for each taco type:
next_year_sales
shows the sales for the following year for each taco type.NULL
appears when there’s no next year in the data for that taco type.