Aggregate function combinators
-If
The-If suffix can be appended to the name of any aggregate function. This creates a conditional aggregate function that processes only the rows that meet a specified condition.
Syntax:
column(any): The column to aggregate.condition(UInt8): A boolean expression that determines which rows to include in the aggregation.
total_tacoscounts all tacos.spicy_tacoscounts only the tacos whereis_spicyis true (1).
Conditional aggregate functions allow you to calculate aggregates for multiple conditions simultaneously without using subqueries or JOINs, which can be more efficient.
-Array
The -Array suffix can be appended to any aggregate function. This modifies the function to work with array arguments instead of scalar values. Syntax:arr(Array): An array of values to aggregate.
sumArraycalculates the total number of tacos by summing all elements in the array.uniqArraycounts the number of unique toppings across all arrays.
When using -Array with multiple argument functions, all arguments must be arrays of equal length.
uniqArrayIf(arr, cond).
This combinator is particularly useful when working with nested data structures or when you need to perform aggregations on array columns without using arrayJoin.
-Map
Appends the-Map suffix to an aggregate function, allowing it to operate on Map type arguments. This combinator aggregates values for each key in the map separately using the specified aggregate function.
Syntax:
map_column(Map): A column ofMaptype.
Map containing the results of applying the aggregate function to each key’s values.
Example:
sumMap(sales)calculates the total sales for each taco type within each hour.maxMap(sales)finds the maximum sales for each taco type within each hour.
-Map combinator is particularly useful for aggregating data stored in Map columns without the need to unnest the map structure.
-SimpleState
Converts an aggregate function to return a SimpleAggregateFunction type instead of an AggregateFunction type. This is useful when working with AggregatingMergeTree tables. Syntax:x: Parameters of the aggregate function.
sumSimpleState(taco_price)returns the sum of the taco prices as a SimpleAggregateFunction.- The
toTypeNamefunction shows the exact type of the returned value.
-State
Returns an intermediate state of the aggregation rather than the final result. This allows you to compute partial results of an aggregation, which can then be combined or further processed later.- Useful when working with the
AggregatingMergeTreetable engine and functions likefinalizeAggregationandrunningAccumulate. - Can be combined with other combinators, but must come last (e.g.,
uniqIfState).
x: Arguments of the aggregate function.
AggregateFunction(...).
Example:
uniqState returns the binary representation of the intermediate state for calculating unique taco types, which can be used for further aggregation or stored in an AggregatingMergeTree table.
The
-State combinator is particularly useful when you need to perform multi-stage aggregations or store partial aggregation results for later processing.-Merge
Returns the final aggregated result from anAggregateFunction(...) type.
Syntax:
state(AggregateFunction): The intermediate aggregation state, typically of typeAggregateFunction.
- We create a table
taco_order_statesto store intermediate states of taco order sums. - We insert two separate states for the same date.
- Using
sumMerge, we combine these states to get the total number of tacos ordered on that date.
-Merge combinator is particularly useful when working with pre-aggregated data or when performing distributed aggregations across multiple nodes.
The
-Merge combinator can be applied to any aggregate function that supports intermediate states, not just sum.-MergeState
Merges the intermediate aggregation states in the same way as the-Merge combinator. However, it does not return the resulting value, but an intermediate aggregation state, similar to the -State combinator.
Syntax:
state(AggregateFunction): The intermediate state of an aggregate function.
AggregateFunction.
Example:
avgStateis used to create intermediate states of average taco prices for each taco type.avgMergeStatethen merges these states, producing a new intermediate state that represents the combined data.
The result of
MergeState is not the final aggregated value, but rather an intermediate state that can be further processed or stored. To obtain the final result, you would typically use the -Merge combinator or the finalizeAggregation function.-ForEach
Converts an aggregate function for tables into an aggregate function for arrays that aggregates the corresponding array items and returns an array of results. Syntax:array(Array(T)): An array of values to aggregate.
- We have three orders with counts for different taco types (beef, chicken, veggie).
sumForEachadds up the counts for each taco type across all orders.- The result
[10, 9, 5]represents the total number of beef, chicken, and veggie tacos ordered, respectively.
The -ForEach combinator can be used with various aggregate functions like
sum, avg, max, min, etc., allowing you to perform array-wise aggregations efficiently.-Distinct
Applies theDISTINCT modifier to an aggregate function, causing it to only consider unique values in its calculations.
Syntax:
column(any): The column or expression to aggregate.
countDistinctcounts the number of unique taco types ordered.avgDistinctcalculates the average price considering only unique prices.
sum, avg, count, etc.
Using
Distinct can significantly impact query performance, especially on large datasets, as it requires additional memory and processing to track unique values.-OrDefault
Changes the behavior of an aggregate function when there are no input values to aggregate. Syntax:x: Parameters of the aggregate function.
- If there are values to aggregate: The result of the aggregate function.
- If there are no values to aggregate: The default value for the aggregate function’s return type.
avg(taco_price)returnsnan(Not a Number)avgOrDefault(taco_price)returns0(the default value for numeric types)
-OrDefault with the -If combinator. It returns the default value (0.00) because there are no spicy tacos in the subquery result.
-OrNull
Changes the behavior of an aggregate function by converting its result to the Nullable data type. If the aggregate function has no values to calculate, it returnsNULL.
The -OrNull combinator can be used with other combinators.
Syntax:
x: Aggregate function parameters.
- The result of the aggregate function, converted to the Nullable data type.
NULL, if there is nothing to aggregate.
sumOrNull returns NULL instead of throwing an error or returning 0.
The -OrNull combinator can also be used with other combinators:
NULL because the condition is_spicy is not met for any rows.
The
-OrNull combinator is particularly useful when you want to distinguish between “no data” (NULL) and “zero result” (e.g., 0 for sum, empty string for groupArray) in your aggregate calculations.-Resample
Resamples data into intervals defined by a start, end, and step, and then applies the aggregate function to each interval. Syntax:start(numeric): The starting value of the interval for the resampling key values. It marks the beginning of the range over which the data will be aggregatedend(numeric): The ending value of the interval. It sets the limit up to which the data will be considered for aggregationstep(numeric): The size of each subinterval within the specified range. The aggregate function is executed independently over each of these subintervals.resampling_key(Column): This is the column whose values are used to separate the data into intervals. It acts as the basis for dividing the data into the specified groupsaggFunction_params(any):aggFunctionparameters.
aggFunction results for each subinterval.
Example:
taco_groupsshows the tacos grouped by price range.sales_by_price_rangeshows the total sales for each price range.
-Resample combinator allows for flexible data analysis by price intervals without the need for complex subqueries or joins.
-ArgMin
Process only the rows that have the minimum value of an additional specified expression. Syntax:arg(any type): Argument to return.val(any type): Value to compare.
arg corresponding to the minimum value of val.
Example:
When combined with the sum function, it becomes sumArgMin which sums up the values in a column for the rows that have the minimum value in another specified column.
-ArgMax
Process only the rows that have the maximum value of an additional specified expression. Syntax:arg(any type): Argument to return.val(any type): Value to compare.
arg corresponding to the maximum value of val.
Example:
When combined with the sum function, it becomes sumArgMax which sums up the values in a column for the rows that have the maximum value in another specified column.