Aggregate function combinators
-If
The-If
suffix can be appended to the name of any aggregate function. This creates a conditional aggregate function that processes only the rows that meet a specified condition.
Syntax:
column
(any): The column to aggregate.condition
(UInt8
): A boolean expression that determines which rows to include in the aggregation.
total_tacos
counts all tacos.spicy_tacos
counts only the tacos whereis_spicy
is true (1).
Conditional aggregate functions allow you to calculate aggregates for multiple conditions simultaneously without using subqueries or JOINs, which can be more efficient.
-Array
The -Array suffix can be appended to any aggregate function. This modifies the function to work with array arguments instead of scalar values. Syntax:arr
(Array
): An array of values to aggregate.
sumArray
calculates the total number of tacos by summing all elements in the array.uniqArray
counts the number of unique toppings across all arrays.
When using -Array with multiple argument functions, all arguments must be arrays of equal length.
uniqArrayIf(arr, cond)
.
This combinator is particularly useful when working with nested data structures or when you need to perform aggregations on array columns without using arrayJoin
.
-Map
Appends the-Map
suffix to an aggregate function, allowing it to operate on Map
type arguments. This combinator aggregates values for each key in the map separately using the specified aggregate function.
Syntax:
map_column
(Map
): A column ofMap
type.
Map
containing the results of applying the aggregate function to each key’s values.
Example:
sumMap(sales)
calculates the total sales for each taco type within each hour.maxMap(sales)
finds the maximum sales for each taco type within each hour.
-Map
combinator is particularly useful for aggregating data stored in Map
columns without the need to unnest the map structure.
-SimpleState
Converts an aggregate function to return a SimpleAggregateFunction type instead of an AggregateFunction type. This is useful when working with AggregatingMergeTree tables. Syntax:x
: Parameters of the aggregate function.
sumSimpleState(taco_price)
returns the sum of the taco prices as a SimpleAggregateFunction.- The
toTypeName
function shows the exact type of the returned value.
-State
Returns an intermediate state of the aggregation rather than the final result. This allows you to compute partial results of an aggregation, which can then be combined or further processed later.- Useful when working with the
AggregatingMergeTree
table engine and functions likefinalizeAggregation
andrunningAccumulate
. - Can be combined with other combinators, but must come last (e.g.,
uniqIfState
).
x
: Arguments of the aggregate function.
AggregateFunction(...)
.
Example:
uniqState
returns the binary representation of the intermediate state for calculating unique taco types, which can be used for further aggregation or stored in an AggregatingMergeTree
table.
The
-State
combinator is particularly useful when you need to perform multi-stage aggregations or store partial aggregation results for later processing.-Merge
Returns the final aggregated result from anAggregateFunction(...)
type.
Syntax:
state
(AggregateFunction
): The intermediate aggregation state, typically of typeAggregateFunction
.
- We create a table
taco_order_states
to store intermediate states of taco order sums. - We insert two separate states for the same date.
- Using
sumMerge
, we combine these states to get the total number of tacos ordered on that date.
-Merge
combinator is particularly useful when working with pre-aggregated data or when performing distributed aggregations across multiple nodes.
The
-Merge
combinator can be applied to any aggregate function that supports intermediate states, not just sum
.-MergeState
Merges the intermediate aggregation states in the same way as the-Merge
combinator. However, it does not return the resulting value, but an intermediate aggregation state, similar to the -State
combinator.
Syntax:
state
(AggregateFunction
): The intermediate state of an aggregate function.
AggregateFunction
.
Example:
avgState
is used to create intermediate states of average taco prices for each taco type.avgMergeState
then merges these states, producing a new intermediate state that represents the combined data.
The result of
MergeState
is not the final aggregated value, but rather an intermediate state that can be further processed or stored. To obtain the final result, you would typically use the -Merge
combinator or the finalizeAggregation
function.-ForEach
Converts an aggregate function for tables into an aggregate function for arrays that aggregates the corresponding array items and returns an array of results. Syntax:array
(Array(T)
): An array of values to aggregate.
- We have three orders with counts for different taco types (beef, chicken, veggie).
sumForEach
adds up the counts for each taco type across all orders.- The result
[10, 9, 5]
represents the total number of beef, chicken, and veggie tacos ordered, respectively.
The -ForEach combinator can be used with various aggregate functions like
sum
, avg
, max
, min
, etc., allowing you to perform array-wise aggregations efficiently.-Distinct
Applies theDISTINCT
modifier to an aggregate function, causing it to only consider unique values in its calculations.
Syntax:
column
(any): The column or expression to aggregate.
countDistinct
counts the number of unique taco types ordered.avgDistinct
calculates the average price considering only unique prices.
sum
, avg
, count
, etc.
Using
Distinct
can significantly impact query performance, especially on large datasets, as it requires additional memory and processing to track unique values.-OrDefault
Changes the behavior of an aggregate function when there are no input values to aggregate. Syntax:x
: Parameters of the aggregate function.
- If there are values to aggregate: The result of the aggregate function.
- If there are no values to aggregate: The default value for the aggregate function’s return type.
avg(taco_price)
returnsnan
(Not a Number)avgOrDefault(taco_price)
returns0
(the default value for numeric types)
-OrDefault
with the -If
combinator. It returns the default value (0.00) because there are no spicy tacos in the subquery result.
-OrNull
Changes the behavior of an aggregate function by converting its result to the Nullable data type. If the aggregate function has no values to calculate, it returnsNULL
.
The -OrNull
combinator can be used with other combinators.
Syntax:
x
: Aggregate function parameters.
- The result of the aggregate function, converted to the Nullable data type.
NULL
, if there is nothing to aggregate.
sumOrNull
returns NULL
instead of throwing an error or returning 0.
The -OrNull
combinator can also be used with other combinators:
NULL
because the condition is_spicy
is not met for any rows.
The
-OrNull
combinator is particularly useful when you want to distinguish between “no data” (NULL
) and “zero result” (e.g., 0 for sum
, empty string for groupArray
) in your aggregate calculations.-Resample
Resamples data into intervals defined by a start, end, and step, and then applies the aggregate function to each interval. Syntax:start
(numeric): The starting value of the interval for the resampling key values. It marks the beginning of the range over which the data will be aggregatedend
(numeric): The ending value of the interval. It sets the limit up to which the data will be considered for aggregationstep
(numeric): The size of each subinterval within the specified range. The aggregate function is executed independently over each of these subintervals.resampling_key
(Column
): This is the column whose values are used to separate the data into intervals. It acts as the basis for dividing the data into the specified groupsaggFunction_params
(any):aggFunction
parameters.
aggFunction
results for each subinterval.
Example:
taco_groups
shows the tacos grouped by price range.sales_by_price_range
shows the total sales for each price range.
-Resample
combinator allows for flexible data analysis by price intervals without the need for complex subqueries or joins.
-ArgMin
Process only the rows that have the minimum value of an additional specified expression. Syntax:arg
(any type): Argument to return.val
(any type): Value to compare.
arg
corresponding to the minimum value of val
.
Example:
When combined with the sum function, it becomes sumArgMin which sums up the values in a column for the rows that have the minimum value in another specified column.
-ArgMax
Process only the rows that have the maximum value of an additional specified expression. Syntax:arg
(any type): Argument to return.val
(any type): Value to compare.
arg
corresponding to the maximum value of val
.
Example:
When combined with the sum function, it becomes sumArgMax which sums up the values in a column for the rows that have the maximum value in another specified column.