Array grouping functions

ClickHouse function reference

array_concat_agg

Concatenates arrays from all rows into a single array. Syntax:

array_concat_agg(array)

Alias:

groupArrayArray

Arguments:

array (Array): The input array to be concatenated.

Returns: An array containing all elements from all input arrays. Example:

SELECT
	array_concat_agg(taco_toppings) AS all_toppings
FROM
	(
		SELECT ['salsa', 'cheese'] AS taco_toppings
		UNION ALL
		SELECT ['lettuce', 'guacamole'] AS taco_toppings
		UNION ALL
		SELECT ['sour cream'] AS taco_toppings
	);

Result:

| all_toppings                                           |
|--------------------------------------------------------|
| ['salsa','cheese','lettuce','guacamole','sour cream']  |

In this example, array_concat_agg combines all taco_toppings arrays from different orders into a single array containing all unique toppings.

The function is case-insensitive, so ARRAY_CONCAT_AGG and array_concat_agg are equivalent.

This function is useful when you need to collect all elements from multiple arrays into a single array, which can be helpful for analysis or reporting purposes in taco-related data processing.

groupArrayArray

Concatenates arrays from all rows into a single array. Syntax:

groupArrayArray(arr)

Alias:

array_concat_agg

Arguments:

arr (Array): The input array to be concatenated.

Returns: An array containing all elements from all input arrays. Example:

SELECT
	groupArrayArray(taco_toppings) AS all_toppings
FROM
	(
		SELECT ['salsa', 'cheese'] AS taco_toppings
		UNION ALL
		SELECT ['lettuce', 'guacamole', 'onions'] AS taco_toppings
		UNION ALL
		SELECT ['cheese', 'sour cream'] AS taco_toppings
	) AS taco_orders;

Result:

| all_toppings                                           |
|--------------------------------------------------------|
| ['salsa', 'cheese', 'lettuce', 'guacamole', 'onions']  |

In this example, groupArrayArray concatenates all taco_toppings arrays from different orders into a single array containing all toppings used across all orders.

The order of elements in the resulting array is not guaranteed and may depend on the order of data processing.

groupArray

Creates an array of argument values. Values can be added to the array in any (indeterminate) order. Syntax:

groupArray(x)
groupArray(max_size)(x)

Alias:

array_agg

Arguments:

x (any data type): The values to be aggregated into an array.
max_size (optional): Limits the size of the resulting array to max_size elements.

Returns: An array containing all the values of x.

NULL values are automatically removed from the result.
The order of elements in the resulting array is not guaranteed unless used with ORDER BY in specific cases.
When max_size is specified, only the first max_size elements are included in the result.

Example:

SELECT
	taco_type,
	groupArray(topping) AS all_toppings
FROM
	(
		SELECT 'Carnitas' AS taco_type, 'salsa' AS topping
		UNION ALL
		SELECT 'Carnitas' AS taco_type, 'guacamole' AS topping
		UNION ALL
		SELECT 'Carnitas' AS taco_type, 'cheese' AS topping
		UNION ALL
		SELECT 'Al Pastor' AS taco_type, 'pineapple' AS topping
		UNION ALL
		SELECT 'Al Pastor' AS taco_type, 'onion' AS topping
		UNION ALL
		SELECT 'Al Pastor' AS taco_type, 'cilantro' AS topping
		UNION ALL
		SELECT 'Fish' AS taco_type, 'slaw' AS topping
		UNION ALL
		SELECT 'Fish' AS taco_type, 'lime' AS topping
		UNION ALL
		SELECT 'Fish' AS taco_type, 'sauce' AS topping
	) AS tacos
GROUP BY
	taco_type;

Result:

| taco_type | all_toppings                    |
|-----------|---------------------------------|
| Carnitas  | ['salsa','guacamole','cheese']  |
| Al Pastor | ['pineapple','onion','cilantro']|
| Fish      | ['slaw','lime','sauce']         |

In this example, groupArray collects all toppings for each taco type into an array. Example with max_size:

SELECT
	taco_type,
	groupArray(2)(topping) AS limited_toppings
FROM
	(
		SELECT 'Carnitas' AS taco_type, 'salsa' AS topping
		UNION ALL
		SELECT 'Carnitas' AS taco_type, 'guacamole' AS topping
		UNION ALL
		SELECT 'Carnitas' AS taco_type, 'cheese' AS topping
		UNION ALL
		SELECT 'Al Pastor' AS taco_type, 'pineapple' AS topping
		UNION ALL
		SELECT 'Al Pastor' AS taco_type, 'onion' AS topping
		UNION ALL
		SELECT 'Al Pastor' AS taco_type, 'cilantro' AS topping
		UNION ALL
		SELECT 'Fish' AS taco_type, 'slaw' AS topping
		UNION ALL
		SELECT 'Fish' AS taco_type, 'lime' AS topping
		UNION ALL
		SELECT 'Fish' AS taco_type, 'sauce' AS topping
	) AS taco_orders
GROUP BY
	taco_type;

Result:

| taco_type | limited_toppings      |
|-----------|-----------------------|
| Carnitas  | ['salsa','guacamole'] |
| Al Pastor | ['pineapple','onion'] |
| Fish      | ['slaw','lime']       |

Here, groupArray(2) limits the result to a maximum of 2 toppings per taco type. This function is useful for collecting multiple values into a single array column, which can be particularly handy for denormalization or preparing data for further array operations.

groupArrayInsertAt

Inserts a value into an array at the specified position. Syntax:

groupArrayInsertAt(default_x, size)(x, pos)

Arguments:

x (any data type): The value to be inserted.
pos (UInt32): The position at which to insert the value. Array indexing starts at 0.
default_x (optional): The default value for empty positions. Must match the data type of x.
size (UInt32, optional): The length of the resulting array. If specified, default_x must also be provided.

Returns: An array with the inserted values. Notes:

If multiple values are inserted at the same position in a single-threaded query, the first value is used.
In multi-threaded queries, the result for duplicate positions is non-deterministic.

Example:

SELECT
	groupArrayInsertAt('salsa')(taco_ingredient, ingredient_position) AS taco_recipe
FROM
	(
		SELECT 'tortilla' AS taco_ingredient, 0 AS ingredient_position
		UNION ALL
		SELECT 'beef' AS taco_ingredient, 2 AS ingredient_position
		UNION ALL
		SELECT 'cheese' AS taco_ingredient, 3 AS ingredient_position
	) AS taco_ingredients;

Result:

| taco_recipe                           |
|---------------------------------------|
| ['tortilla','salsa','beef','cheese']  |

In this example, we’re constructing a taco recipe by inserting ingredients at specific positions. ‘salsa’ is used as the default value for any empty positions. Example with size parameter:

SELECT
	groupArrayInsertAt('empty', 5)(taco_ingredient, ingredient_position) AS fixed_size_recipe
FROM
	(
		SELECT 'tortilla' AS taco_ingredient, 0 AS ingredient_position
		UNION ALL
		SELECT 'beans' AS taco_ingredient, 1 AS ingredient_position
		UNION ALL
		SELECT 'lettuce' AS taco_ingredient, 2 AS ingredient_position
		UNION ALL
		SELECT 'salsa' AS taco_ingredient, 4 AS ingredient_position
	) AS taco_ingredients;

Result:

| fixed_size_recipe                              |
|------------------------------------------------|
| ['tortilla','beans','lettuce','empty','salsa'] |

This example creates a fixed-size array of 5 elements, filling any unused positions with ‘empty’.

When using the size parameter, make sure to provide a default_x value to fill any unspecified positions.

groupArrayIntersect

Returns an intersection of given arrays (all items that are present in all input arrays). Syntax:

groupArrayIntersect(x)

Arguments:

x (Array): An array column or expression.

Returns: An array containing elements that are present in all input arrays. [Array] Example:

SELECT
	groupArrayIntersect(taco_toppings) AS common_toppings
FROM
	(
		SELECT ['cheese', 'lettuce', 'salsa'] AS taco_toppings
		UNION ALL
		SELECT ['cheese', 'beef', 'salsa'] AS taco_toppings
		UNION ALL
		SELECT ['cheese', 'chicken', 'salsa', 'guacamole'] AS taco_toppings
	);

Result:

| common_toppings     |
|---------------------|
| ['cheese','salsa']  |

In this example, we find the toppings that are common across different taco orders. The result shows that ‘cheese’ and ‘salsa’ are present in all taco orders.

This function is particularly useful when you need to find common elements across multiple arrays, such as identifying ingredients used in all recipes or features present in all product variants.

groupArrayLast

Creates an array of the last max_size values of the argument. Syntax:

groupArrayLast(max_size)(x)

Arguments:

max_size (UInt64, optional): The maximum number of elements in the resulting array. If not specified, there’s no limit on array size.
x: The value to aggregate.

Returns: An array of the last max_size values of x. Example:

SELECT
	groupArrayLast(2)(taco_id) AS last_two_tacos
FROM
	(
		SELECT 1 AS taco_id
		UNION ALL SELECT 2
		UNION ALL SELECT 3
		UNION ALL SELECT 4
		UNION ALL SELECT 5
		UNION ALL SELECT 6
		UNION ALL SELECT 7
		UNION ALL SELECT 8
		UNION ALL SELECT 9
		UNION ALL SELECT 10
	) AS taco_orders;

Result:

| last_two_tacos |
|----------------|
| [9,10]         |

In this example, groupArrayLast(2) returns an array of the last two taco IDs from the taco_orders table.

The order of elements in the resulting array depends on the order of data processing, which may be non-deterministic.
For a guaranteed order, use ORDER BY in a subquery, but be aware this may impact performance for large datasets.
groupArrayLast(1)(x) is equivalent to [anyLast(x)].

Comparison with groupArray: While groupArray collects elements from the beginning, groupArrayLast collects from the end:

SELECT
	groupArray(2)(taco_id) AS first_two_tacos,
	groupArrayLast(2)(taco_id) AS last_two_tacos
FROM
	(
		SELECT 1 AS taco_id
		UNION ALL SELECT 2
		UNION ALL SELECT 3
		UNION ALL SELECT 4
		UNION ALL SELECT 5
		UNION ALL SELECT 6
		UNION ALL SELECT 7
		UNION ALL SELECT 8
		UNION ALL SELECT 9
		UNION ALL SELECT 10
	) AS taco_orders;

Result:

| first_two_tacos | last_two_tacos |
|-----------------|----------------|
| [1,2]           | [9,10]         |

This function is useful when you need to keep track of the most recent values in a dataset, such as the latest taco orders or the most recent customer interactions.

groupArrayMovingAvg

Calculates the moving average of an array. Syntax:

groupArrayMovingAvg(numbers_for_summing)
groupArrayMovingAvg(window_size)(numbers_for_summing)

Arguments:

numbers_for_summing (numeric): An expression resulting in a numeric data type value.
window_size (UInt64, optional): Size of the calculation window. If not specified, the function uses a window size equal to the number of rows in the column.

Returns: An array of the same size and type as the input data.

The function uses rounding towards zero. It truncates decimal places insignificant for the resulting data type.

Example:

SELECT
  groupArrayMovingAvg(sales) AS avg_sales,
  groupArrayMovingAvg(revenue) AS avg_revenue,
  groupArrayMovingAvg(2)(sales) AS avg_sales_window2,
  groupArrayMovingAvg(2)(revenue) AS avg_revenue_window2
FROM
(
  SELECT 1 AS day, 100.5 AS sales, 150.75 AS revenue
  UNION ALL
  SELECT 2 AS day, 120.2 AS sales, 180.30 AS revenue
  UNION ALL
  SELECT 3 AS day, 95.8 AS sales, 143.70 AS revenue
  UNION ALL
  SELECT 4 AS day, 110.3 AS sales, 165.45 AS revenue
);

Result:

| avg_sales                           | avg_revenue                     | avg_sales_window2               | avg_revenue_window2              |
|-------------------------------------|---------------------------------|---------------------------------|----------------------------------|
| [100.5,110.35,105.5,106.7]          | [150.75,165.52,158.25,160.05]   | [100.5,110.35,108,103.05]       | [150.75,165.52,162,154.57]       |

In this example:

avg_sales and avg_revenue calculate the moving average using all previous values.
avg_sales_window2 and avg_revenue_window2 calculate the moving average using a window size of 2.

The function is useful for analyzing trends in time series data, such as daily taco sales or revenue figures.

groupArrayMovingSum

Calculates the moving sum of input values. Syntax:

groupArrayMovingSum(numbers_for_summing)
groupArrayMovingSum(window_size)(numbers_for_summing)

Arguments:

numbers_for_summing (numeric): An expression resulting in a numeric data type value.
window_size (numeric, optional): Size of the calculation window. If not specified, the function uses a window size equal to the number of rows in the column.

Returns: An array of the same size and type as the input data, containing the moving sums. Example: Let’s consider a table of taco sales:

SELECT
    groupArrayMovingSum(sales) AS total_sales,
    groupArrayMovingSum(2)(sales) AS two_day_sales
FROM
(
    SELECT '2023-05-01' AS day, 100 AS sales
    UNION ALL
    SELECT '2023-05-02' AS day, 150 AS sales
    UNION ALL
    SELECT '2023-05-03' AS day, 200 AS sales
    UNION ALL
    SELECT '2023-05-04' AS day, 180 AS sales
) AS taco_sales;

Result:

| total_sales         | two_day_sales       |
|---------------------|---------------------|
| [100,250,450,630]   | [100,250,350,380]   |

In this example:

total_sales shows the cumulative sum of taco sales for each day.
two_day_sales shows the sum of taco sales for the current day and the previous day (2-day moving sum).

This function is useful for analyzing trends and patterns in time-series data, such as daily taco sales or customer traffic in a taco restaurant.

groupArraySample

Creates an array of sample argument values. The size of the resulting array is limited to a specified maximum, with values selected randomly. Syntax:

groupArraySample(max_size[, seed])(x)

Arguments:

max_size (UInt64): Maximum size of the resulting array.
seed (UInt64, optional): Seed for the random number generator. Default: 123456.
x: The argument to sample (column name or expression).

Returns: An array of randomly selected x arguments. [Array] Example:

SELECT
	groupArraySample(3)(taco_filling) AS sample_fillings
FROM
	(
		SELECT 'beef' AS taco_filling
		UNION ALL
		SELECT 'chicken' AS taco_filling
		UNION ALL
		SELECT 'beans' AS taco_filling
		UNION ALL
		SELECT 'pork' AS taco_filling
		UNION ALL
		SELECT 'fish' AS taco_filling
	) AS taco_menu;

Result:

| sample_fillings            |
|----------------------------|
| ['beef','chicken','beans'] |

In this example, groupArraySample selects three random taco fillings. You can also use an expression as an argument:

SELECT
	groupArraySample(3)(concat('spicy-', taco_filling)) AS sample_spicy_fillings
FROM
	(
		SELECT 'beef' AS taco_filling
		UNION ALL
		SELECT 'chicken' AS taco_filling
		UNION ALL
		SELECT 'beans' AS taco_filling
		UNION ALL
		SELECT 'pork' AS taco_filling
		UNION ALL
		SELECT 'fish' AS taco_filling
	) AS taco_menu;

Result:

| sample_spicy_fillings                      |
|--------------------------------------------|
| ['spicy-pork','spicy-fish','spicy-shrimp'] |

This query creates an array of three randomly selected spicy taco fillings.

The function’s randomness can be controlled by specifying a seed value, which is useful for reproducibility in testing scenarios.

groupArraySorted

Returns an array with the first N items in ascending order. Syntax:

groupArraySorted(N)(column)

Arguments:

N (UInt8): The number of elements to return.
column (Any): The value to be sorted and grouped. Can be of type Integer, String, Float, or other generic types.

Returns: An array containing the first N items from the column, sorted in ascending order. Example: Get the first 5 taco names sorted alphabetically:

SELECT
	groupArraySorted(5)(taco_name) AS sorted_tacos
FROM
	(
		SELECT 'Carnitas' AS taco_name
		UNION ALL
		SELECT 'Al Pastor' AS taco_name
		UNION ALL
		SELECT 'Pescado' AS taco_name
		UNION ALL
		SELECT 'Chorizo' AS taco_name
		UNION ALL
		SELECT 'Barbacoa' AS taco_name
		UNION ALL
		SELECT 'Pollo' AS taco_name
	) AS taco_menu;

Result:

| sorted_tacos                                            |
|---------------------------------------------------------|
| ['Al Pastor','Barbacoa','Carnitas','Chorizo','Pescado'] |

In this example, groupArraySorted returns an array of the first 5 taco names, sorted in alphabetical order.

If the number of unique values in the column is less than N, the function will return all available values sorted.

groupUniqArray

Creates an array from unique argument values. Syntax:

groupUniqArray(x)

groupUniqArray(max_size)(x)

Arguments:

x (any data type): The input values.
max_size (UInt64, optional): The maximum number of elements in the resulting array.

Returns: An array of unique values. Example:

SELECT
	order_id,
	groupUniqArray(taco_type) AS unique_tacos
FROM
	(
		SELECT 1001 AS order_id, 'al pastor' AS taco_type
		UNION ALL
		SELECT 1001 AS order_id, 'carnitas' AS taco_type
		UNION ALL
		SELECT 1001 AS order_id, 'al pastor' AS taco_type
		UNION ALL
		SELECT 1002 AS order_id, 'pescado' AS taco_type
		UNION ALL
		SELECT 1002 AS order_id, 'pollo' AS taco_type
		UNION ALL
		SELECT 1002 AS order_id, 'veg' AS taco_type
		UNION ALL
		SELECT 1003 AS order_id, 'barbacoa' AS taco_type
		UNION ALL
		SELECT 1003 AS order_id, 'barbacoa' AS taco_type
	) AS taco_orders
GROUP BY
	order_id;

Result:

| order_id | unique_tacos                  |
|----------|-------------------------------|
| 1001     | ['al pastor', 'carnitas']     |
| 1002     | ['pescado', 'pollo', 'veg']   |
| 1003     | ['barbacoa']                  |

In this example, groupUniqArray creates an array of unique taco types for each order.

Memory consumption is similar to the uniqExact function.
When max_size is specified, the resulting array is limited to that number of elements. For example, groupUniqArray(1)(x) is equivalent to any(x).

This function is useful when you need to collect unique values into an array, such as gathering distinct product types per order or unique tags per user.

groupConcat

Concatenates strings from a group into a single string, with optional delimiter and limit. Syntax:

groupConcat([delimiter [, limit]])(expression)

Arguments:

expression (String): The column or expression outputting strings to concatenate.
delimiter (String, optional): The string used to separate concatenated values. Defaults to an empty string.
limit (UInt, optional): The maximum number of elements to concatenate. Excess elements are ignored.

Returns: A String containing the concatenated values.

If delimiter is specified without limit, it must be the first parameter.
If both delimiter and limit are specified, delimiter must precede limit.
If the group has no elements or only null elements, the result is a nullable string with a null value.

Example:

SELECT
	taco_order_id,
	groupConcat(', ')(topping) AS toppings,
	groupConcat(', ', 2)(topping) AS limited_toppings
FROM
	(
		SELECT 1 AS taco_order_id, 'salsa' AS topping
		UNION ALL
		SELECT 1 AS taco_order_id, 'cheese' AS topping
		UNION ALL
		SELECT 1 AS taco_order_id, 'onions' AS topping
		UNION ALL
		SELECT 2 AS taco_order_id, 'guacamole' AS topping
		UNION ALL
		SELECT 2 AS taco_order_id, 'lettuce' AS topping
		UNION ALL
		SELECT 3 AS taco_order_id, 'beef' AS topping
		UNION ALL
		SELECT 3 AS taco_order_id, 'sour cream' AS topping
	) AS taco_orders
GROUP BY
	taco_order_id;

Result:

| taco_order_id | toppings              | limited_toppings  |
|---------------|---------------------- |-------------------|
| 1             | salsa, cheese, onions | salsa, cheese     |
| 2             | guacamole, lettuce    | guacamole, lettuce|
| 3             | beef, sour cream      | beef, sour cream  |

In this example:

toppings shows all toppings for each order, separated by commas.
limited_toppings shows only the first two toppings for each order.

This function is useful for creating comma-separated lists of values from grouped rows, such as listing taco toppings for each order.

largestTriangleThreeBuckets

Applies the Largest-Triangle-Three-Buckets algorithm to downsample time series data for visualization. This function is particularly useful for reducing the number of points in a large dataset while preserving its visual characteristics. Syntax:

largestTriangleThreeBuckets(n)(x, y)

Alias:

lttb

Arguments:

x (Integer, Float, Decimal, Date, Date32, DateTime, DateTime64): The x-coordinate (typically a timestamp or sequential value).
y (Integer, Float, Decimal, Date, Date32, DateTime, DateTime64): The y-coordinate (typically the measured value).

Parameters:

n (UInt64): The number of points to return in the downsampled series.

Returns: An Array of Tuples, where each tuple contains two elements representing the x and y coordinates of a point in the downsampled series.

The function sorts the input data by the x-coordinate before applying the algorithm.
NaN values in the input series are ignored and excluded from the analysis.

Example:

SELECT
  largestTriangleThreeBuckets(3)(order_time, total_sales) AS downsampled_sales
FROM
(
  SELECT
      toDateTime('2023-05-05 10:00:00') AS order_time,
      150 AS total_sales
  UNION ALL
  SELECT
      toDateTime('2023-05-05 12:30:00') AS order_time,
      450 AS total_sales
  UNION ALL
  SELECT
      toDateTime('2023-05-05 18:00:00') AS order_time,
      800 AS total_sales
  UNION ALL
  SELECT
      toDateTime('2023-05-05 23:59:59') AS order_time,
      200 AS total_sales
) AS taco_sales
WHERE
  toDate(order_time) = '2023-05-05';

Result:

| downsampled_sales                                                                     |
|---------------------------------------------------------------------------------------|
| [('2023-05-05 10:00:00',150),('2023-05-05 18:00:00',800),('2023-05-05 23:59:59',200)] |

In this example, we downsample a day’s worth of taco sales data into 3 representative points. This can be useful for creating a quick visual summary of sales trends throughout the day without plotting every single data point. The algorithm divides the sorted series into buckets (in this case, 3) and finds the most representative point in each bucket, allowing for efficient visualization of large datasets while maintaining the overall shape of the data.

Introduction

Statements

Array grouping functions

ClickHouse function reference

array_concat_agg

groupArrayArray

groupArray

groupArrayInsertAt

groupArrayIntersect

groupArrayLast

groupArrayMovingAvg

groupArrayMovingSum

groupArraySample

groupArraySorted

groupUniqArray

groupConcat

largestTriangleThreeBuckets

Introduction

Statements

​ClickHouse function reference

​array_concat_agg

​groupArrayArray

​groupArray

​groupArrayInsertAt

​groupArrayIntersect

​groupArrayLast

​groupArrayMovingAvg

​groupArrayMovingSum

​groupArraySample

​groupArraySorted

​groupUniqArray

​groupConcat

​largestTriangleThreeBuckets

ClickHouse function reference

array_concat_agg

groupArrayArray

groupArray

groupArrayInsertAt

groupArrayIntersect

groupArrayLast

groupArrayMovingAvg

groupArrayMovingSum

groupArraySample

groupArraySorted

groupUniqArray

groupConcat

largestTriangleThreeBuckets