ClickHouse function reference
array_concat_agg
Concatenates arrays from all rows into a single array.
Syntax:
Alias:
Arguments:
array
(Array
): The input array to be concatenated.
Returns:
An array containing all elements from all input arrays.
Example:
SELECT
array_concat_agg(taco_toppings) AS all_toppings
FROM
(
SELECT ['salsa', 'cheese'] AS taco_toppings
UNION ALL
SELECT ['lettuce', 'guacamole'] AS taco_toppings
UNION ALL
SELECT ['sour cream'] AS taco_toppings
);
Result:
| all_toppings |
|--------------------------------------------------------|
| ['salsa','cheese','lettuce','guacamole','sour cream'] |
In this example, array_concat_agg
combines all taco_toppings
arrays from different orders into a single array containing all unique toppings.
The function is case-insensitive, so ARRAY_CONCAT_AGG
and array_concat_agg
are equivalent.
This function is useful when you need to collect all elements from multiple arrays into a single array, which can be helpful for analysis or reporting purposes in taco-related data processing.
groupArrayArray
Concatenates arrays from all rows into a single array.
Syntax:
Alias:
Arguments:
arr
(Array
): The input array to be concatenated.
Returns:
An array containing all elements from all input arrays.
Example:
SELECT
groupArrayArray(taco_toppings) AS all_toppings
FROM
(
SELECT ['salsa', 'cheese'] AS taco_toppings
UNION ALL
SELECT ['lettuce', 'guacamole', 'onions'] AS taco_toppings
UNION ALL
SELECT ['cheese', 'sour cream'] AS taco_toppings
) AS taco_orders;
Result:
| all_toppings |
|--------------------------------------------------------|
| ['salsa', 'cheese', 'lettuce', 'guacamole', 'onions'] |
In this example, groupArrayArray
concatenates all taco_toppings
arrays from different orders into a single array containing all toppings used across all orders.
The order of elements in the resulting array is not guaranteed and may depend on the order of data processing.
groupArray
Creates an array of argument values. Values can be added to the array in any (indeterminate) order.
Syntax:
groupArray(x)
groupArray(max_size)(x)
Alias:
Arguments:
x
(any data type): The values to be aggregated into an array.
max_size
(optional): Limits the size of the resulting array to max_size
elements.
Returns:
An array containing all the values of x
.
- NULL values are automatically removed from the result.
- The order of elements in the resulting array is not guaranteed unless used with ORDER BY in specific cases.
- When
max_size
is specified, only the first max_size
elements are included in the result.
Example:
SELECT
taco_type,
groupArray(topping) AS all_toppings
FROM
(
SELECT 'Carnitas' AS taco_type, 'salsa' AS topping
UNION ALL
SELECT 'Carnitas' AS taco_type, 'guacamole' AS topping
UNION ALL
SELECT 'Carnitas' AS taco_type, 'cheese' AS topping
UNION ALL
SELECT 'Al Pastor' AS taco_type, 'pineapple' AS topping
UNION ALL
SELECT 'Al Pastor' AS taco_type, 'onion' AS topping
UNION ALL
SELECT 'Al Pastor' AS taco_type, 'cilantro' AS topping
UNION ALL
SELECT 'Fish' AS taco_type, 'slaw' AS topping
UNION ALL
SELECT 'Fish' AS taco_type, 'lime' AS topping
UNION ALL
SELECT 'Fish' AS taco_type, 'sauce' AS topping
) AS tacos
GROUP BY
taco_type;
Result:
| taco_type | all_toppings |
|-----------|---------------------------------|
| Carnitas | ['salsa','guacamole','cheese'] |
| Al Pastor | ['pineapple','onion','cilantro']|
| Fish | ['slaw','lime','sauce'] |
In this example, groupArray
collects all toppings for each taco type into an array.
Example with max_size:
SELECT
taco_type,
groupArray(2)(topping) AS limited_toppings
FROM
(
SELECT 'Carnitas' AS taco_type, 'salsa' AS topping
UNION ALL
SELECT 'Carnitas' AS taco_type, 'guacamole' AS topping
UNION ALL
SELECT 'Carnitas' AS taco_type, 'cheese' AS topping
UNION ALL
SELECT 'Al Pastor' AS taco_type, 'pineapple' AS topping
UNION ALL
SELECT 'Al Pastor' AS taco_type, 'onion' AS topping
UNION ALL
SELECT 'Al Pastor' AS taco_type, 'cilantro' AS topping
UNION ALL
SELECT 'Fish' AS taco_type, 'slaw' AS topping
UNION ALL
SELECT 'Fish' AS taco_type, 'lime' AS topping
UNION ALL
SELECT 'Fish' AS taco_type, 'sauce' AS topping
) AS taco_orders
GROUP BY
taco_type;
Result:
| taco_type | limited_toppings |
|-----------|-----------------------|
| Carnitas | ['salsa','guacamole'] |
| Al Pastor | ['pineapple','onion'] |
| Fish | ['slaw','lime'] |
Here, groupArray(2)
limits the result to a maximum of 2 toppings per taco type.
This function is useful for collecting multiple values into a single array column, which can be particularly handy for denormalization or preparing data for further array operations.
groupArrayInsertAt
Inserts a value into an array at the specified position.
Syntax:
groupArrayInsertAt(default_x, size)(x, pos)
Arguments:
x
(any data type): The value to be inserted.
pos
(UInt32
): The position at which to insert the value. Array indexing starts at 0.
default_x
(optional): The default value for empty positions. Must match the data type of x
.
size
(UInt32
, optional): The length of the resulting array. If specified, default_x
must also be provided.
Returns:
An array with the inserted values.
Notes:
- If multiple values are inserted at the same position in a single-threaded query, the first value is used.
- In multi-threaded queries, the result for duplicate positions is non-deterministic.
Example:
SELECT
groupArrayInsertAt('salsa')(taco_ingredient, ingredient_position) AS taco_recipe
FROM
(
SELECT 'tortilla' AS taco_ingredient, 0 AS ingredient_position
UNION ALL
SELECT 'beef' AS taco_ingredient, 2 AS ingredient_position
UNION ALL
SELECT 'cheese' AS taco_ingredient, 3 AS ingredient_position
) AS taco_ingredients;
Result:
| taco_recipe |
|---------------------------------------|
| ['tortilla','salsa','beef','cheese'] |
In this example, we’re constructing a taco recipe by inserting ingredients at specific positions. ‘salsa’ is used as the default value for any empty positions.
Example with size parameter:
SELECT
groupArrayInsertAt('empty', 5)(taco_ingredient, ingredient_position) AS fixed_size_recipe
FROM
(
SELECT 'tortilla' AS taco_ingredient, 0 AS ingredient_position
UNION ALL
SELECT 'beans' AS taco_ingredient, 1 AS ingredient_position
UNION ALL
SELECT 'lettuce' AS taco_ingredient, 2 AS ingredient_position
UNION ALL
SELECT 'salsa' AS taco_ingredient, 4 AS ingredient_position
) AS taco_ingredients;
Result:
| fixed_size_recipe |
|------------------------------------------------|
| ['tortilla','beans','lettuce','empty','salsa'] |
This example creates a fixed-size array of 5 elements, filling any unused positions with ‘empty’.
When using the size
parameter, make sure to provide a default_x
value to fill any unspecified positions.
groupArrayIntersect
Returns an intersection of given arrays (all items that are present in all input arrays).
Syntax:
Arguments:
x
(Array
): An array column or expression.
Returns:
An array containing elements that are present in all input arrays. [Array
]
Example:
SELECT
groupArrayIntersect(taco_toppings) AS common_toppings
FROM
(
SELECT ['cheese', 'lettuce', 'salsa'] AS taco_toppings
UNION ALL
SELECT ['cheese', 'beef', 'salsa'] AS taco_toppings
UNION ALL
SELECT ['cheese', 'chicken', 'salsa', 'guacamole'] AS taco_toppings
);
Result:
| common_toppings |
|---------------------|
| ['cheese','salsa'] |
In this example, we find the toppings that are common across different taco orders. The result shows that ‘cheese’ and ‘salsa’ are present in all taco orders.
This function is particularly useful when you need to find common elements across multiple arrays, such as identifying ingredients used in all recipes or features present in all product variants.
groupArrayLast
Creates an array of the last max_size
values of the argument.
Syntax:
groupArrayLast(max_size)(x)
Arguments:
max_size
(UInt64
, optional): The maximum number of elements in the resulting array. If not specified, there’s no limit on array size.
x
: The value to aggregate.
Returns:
An array of the last max_size
values of x
.
Example:
SELECT
groupArrayLast(2)(taco_id) AS last_two_tacos
FROM
(
SELECT 1 AS taco_id
UNION ALL SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 4
UNION ALL SELECT 5
UNION ALL SELECT 6
UNION ALL SELECT 7
UNION ALL SELECT 8
UNION ALL SELECT 9
UNION ALL SELECT 10
) AS taco_orders;
Result:
| last_two_tacos |
|----------------|
| [9,10] |
In this example, groupArrayLast(2)
returns an array of the last two taco IDs from the taco_orders
table.
- The order of elements in the resulting array depends on the order of data processing, which may be non-deterministic.
- For a guaranteed order, use
ORDER BY
in a subquery, but be aware this may impact performance for large datasets.
groupArrayLast(1)(x)
is equivalent to [anyLast(x)]
.
Comparison with groupArray:
While groupArray
collects elements from the beginning, groupArrayLast
collects from the end:
SELECT
groupArray(2)(taco_id) AS first_two_tacos,
groupArrayLast(2)(taco_id) AS last_two_tacos
FROM
(
SELECT 1 AS taco_id
UNION ALL SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 4
UNION ALL SELECT 5
UNION ALL SELECT 6
UNION ALL SELECT 7
UNION ALL SELECT 8
UNION ALL SELECT 9
UNION ALL SELECT 10
) AS taco_orders;
Result:
| first_two_tacos | last_two_tacos |
|-----------------|----------------|
| [1,2] | [9,10] |
This function is useful when you need to keep track of the most recent values in a dataset, such as the latest taco orders or the most recent customer interactions.
groupArrayMovingAvg
Calculates the moving average of an array.
Syntax:
groupArrayMovingAvg(numbers_for_summing)
groupArrayMovingAvg(window_size)(numbers_for_summing)
Arguments:
numbers_for_summing
(numeric): An expression resulting in a numeric data type value.
window_size
(UInt64
, optional): Size of the calculation window. If not specified, the function uses a window size equal to the number of rows in the column.
Returns:
An array of the same size and type as the input data.
- The function uses rounding towards zero. It truncates decimal places insignificant for the resulting data type.
Example:
SELECT
groupArrayMovingAvg(sales) AS avg_sales,
groupArrayMovingAvg(revenue) AS avg_revenue,
groupArrayMovingAvg(2)(sales) AS avg_sales_window2,
groupArrayMovingAvg(2)(revenue) AS avg_revenue_window2
FROM
(
SELECT 1 AS day, 100.5 AS sales, 150.75 AS revenue
UNION ALL
SELECT 2 AS day, 120.2 AS sales, 180.30 AS revenue
UNION ALL
SELECT 3 AS day, 95.8 AS sales, 143.70 AS revenue
UNION ALL
SELECT 4 AS day, 110.3 AS sales, 165.45 AS revenue
);
Result:
| avg_sales | avg_revenue | avg_sales_window2 | avg_revenue_window2 |
|-------------------------------------|---------------------------------|---------------------------------|----------------------------------|
| [100.5,110.35,105.5,106.7] | [150.75,165.52,158.25,160.05] | [100.5,110.35,108,103.05] | [150.75,165.52,162,154.57] |
In this example:
avg_sales
and avg_revenue
calculate the moving average using all previous values.
avg_sales_window2
and avg_revenue_window2
calculate the moving average using a window size of 2.
The function is useful for analyzing trends in time series data, such as daily taco sales or revenue figures.
groupArrayMovingSum
Calculates the moving sum of input values.
Syntax:
groupArrayMovingSum(numbers_for_summing)
groupArrayMovingSum(window_size)(numbers_for_summing)
Arguments:
numbers_for_summing
(numeric): An expression resulting in a numeric data type value.
window_size
(numeric, optional): Size of the calculation window. If not specified, the function uses a window size equal to the number of rows in the column.
Returns:
An array of the same size and type as the input data, containing the moving sums.
Example:
Let’s consider a table of taco sales:
SELECT
groupArrayMovingSum(sales) AS total_sales,
groupArrayMovingSum(2)(sales) AS two_day_sales
FROM
(
SELECT '2023-05-01' AS day, 100 AS sales
UNION ALL
SELECT '2023-05-02' AS day, 150 AS sales
UNION ALL
SELECT '2023-05-03' AS day, 200 AS sales
UNION ALL
SELECT '2023-05-04' AS day, 180 AS sales
) AS taco_sales;
Result:
| total_sales | two_day_sales |
|---------------------|---------------------|
| [100,250,450,630] | [100,250,350,380] |
In this example:
total_sales
shows the cumulative sum of taco sales for each day.
two_day_sales
shows the sum of taco sales for the current day and the previous day (2-day moving sum).
This function is useful for analyzing trends and patterns in time-series data, such as daily taco sales or customer traffic in a taco restaurant.
groupArraySample
Creates an array of sample argument values. The size of the resulting array is limited to a specified maximum, with values selected randomly.
Syntax:
groupArraySample(max_size[, seed])(x)
Arguments:
max_size
(UInt64
): Maximum size of the resulting array.
seed
(UInt64
, optional): Seed for the random number generator. Default: 123456.
x
: The argument to sample (column name or expression).
Returns:
An array of randomly selected x
arguments. [Array
]
Example:
SELECT
groupArraySample(3)(taco_filling) AS sample_fillings
FROM
(
SELECT 'beef' AS taco_filling
UNION ALL
SELECT 'chicken' AS taco_filling
UNION ALL
SELECT 'beans' AS taco_filling
UNION ALL
SELECT 'pork' AS taco_filling
UNION ALL
SELECT 'fish' AS taco_filling
) AS taco_menu;
Result:
| sample_fillings |
|----------------------------|
| ['beef','chicken','beans'] |
In this example, groupArraySample
selects three random taco fillings.
You can also use an expression as an argument:
SELECT
groupArraySample(3)(concat('spicy-', taco_filling)) AS sample_spicy_fillings
FROM
(
SELECT 'beef' AS taco_filling
UNION ALL
SELECT 'chicken' AS taco_filling
UNION ALL
SELECT 'beans' AS taco_filling
UNION ALL
SELECT 'pork' AS taco_filling
UNION ALL
SELECT 'fish' AS taco_filling
) AS taco_menu;
Result:
| sample_spicy_fillings |
|--------------------------------------------|
| ['spicy-pork','spicy-fish','spicy-shrimp'] |
This query creates an array of three randomly selected spicy taco fillings.
The function’s randomness can be controlled by specifying a seed value, which is useful for reproducibility in testing scenarios.
groupArraySorted
Returns an array with the first N items in ascending order.
Syntax:
groupArraySorted(N)(column)
Arguments:
N
(UInt8
): The number of elements to return.
column
(Any): The value to be sorted and grouped. Can be of type Integer, String, Float, or other generic types.
Returns:
An array containing the first N items from the column, sorted in ascending order.
Example:
Get the first 5 taco names sorted alphabetically:
SELECT
groupArraySorted(5)(taco_name) AS sorted_tacos
FROM
(
SELECT 'Carnitas' AS taco_name
UNION ALL
SELECT 'Al Pastor' AS taco_name
UNION ALL
SELECT 'Pescado' AS taco_name
UNION ALL
SELECT 'Chorizo' AS taco_name
UNION ALL
SELECT 'Barbacoa' AS taco_name
UNION ALL
SELECT 'Pollo' AS taco_name
) AS taco_menu;
Result:
| sorted_tacos |
|---------------------------------------------------------|
| ['Al Pastor','Barbacoa','Carnitas','Chorizo','Pescado'] |
In this example, groupArraySorted
returns an array of the first 5 taco names, sorted in alphabetical order.
If the number of unique values in the column is less than N, the function will return all available values sorted.
groupUniqArray
Creates an array from unique argument values.
Syntax:
or
groupUniqArray(max_size)(x)
Arguments:
x
(any data type): The input values.
max_size
(UInt64
, optional): The maximum number of elements in the resulting array.
Returns:
An array of unique values.
Example:
SELECT
order_id,
groupUniqArray(taco_type) AS unique_tacos
FROM
(
SELECT 1001 AS order_id, 'al pastor' AS taco_type
UNION ALL
SELECT 1001 AS order_id, 'carnitas' AS taco_type
UNION ALL
SELECT 1001 AS order_id, 'al pastor' AS taco_type
UNION ALL
SELECT 1002 AS order_id, 'pescado' AS taco_type
UNION ALL
SELECT 1002 AS order_id, 'pollo' AS taco_type
UNION ALL
SELECT 1002 AS order_id, 'veg' AS taco_type
UNION ALL
SELECT 1003 AS order_id, 'barbacoa' AS taco_type
UNION ALL
SELECT 1003 AS order_id, 'barbacoa' AS taco_type
) AS taco_orders
GROUP BY
order_id;
Result:
| order_id | unique_tacos |
|----------|-------------------------------|
| 1001 | ['al pastor', 'carnitas'] |
| 1002 | ['pescado', 'pollo', 'veg'] |
| 1003 | ['barbacoa'] |
In this example, groupUniqArray
creates an array of unique taco types for each order.
- Memory consumption is similar to the
uniqExact
function.
- When
max_size
is specified, the resulting array is limited to that number of elements. For example, groupUniqArray(1)(x)
is equivalent to any(x)
.
This function is useful when you need to collect unique values into an array, such as gathering distinct product types per order or unique tags per user.
groupConcat
Concatenates strings from a group into a single string, with optional delimiter and limit.
Syntax:
groupConcat([delimiter [, limit]])(expression)
Arguments:
expression
(String
): The column or expression outputting strings to concatenate.
delimiter
(String
, optional): The string used to separate concatenated values. Defaults to an empty string.
limit
(UInt
, optional): The maximum number of elements to concatenate. Excess elements are ignored.
Returns:
A String
containing the concatenated values.
- If
delimiter
is specified without limit
, it must be the first parameter.
- If both
delimiter
and limit
are specified, delimiter
must precede limit
.
- If the group has no elements or only null elements, the result is a nullable string with a null value.
Example:
SELECT
taco_order_id,
groupConcat(', ')(topping) AS toppings,
groupConcat(', ', 2)(topping) AS limited_toppings
FROM
(
SELECT 1 AS taco_order_id, 'salsa' AS topping
UNION ALL
SELECT 1 AS taco_order_id, 'cheese' AS topping
UNION ALL
SELECT 1 AS taco_order_id, 'onions' AS topping
UNION ALL
SELECT 2 AS taco_order_id, 'guacamole' AS topping
UNION ALL
SELECT 2 AS taco_order_id, 'lettuce' AS topping
UNION ALL
SELECT 3 AS taco_order_id, 'beef' AS topping
UNION ALL
SELECT 3 AS taco_order_id, 'sour cream' AS topping
) AS taco_orders
GROUP BY
taco_order_id;
Result:
| taco_order_id | toppings | limited_toppings |
|---------------|---------------------- |-------------------|
| 1 | salsa, cheese, onions | salsa, cheese |
| 2 | guacamole, lettuce | guacamole, lettuce|
| 3 | beef, sour cream | beef, sour cream |
In this example:
toppings
shows all toppings for each order, separated by commas.
limited_toppings
shows only the first two toppings for each order.
This function is useful for creating comma-separated lists of values from grouped rows, such as listing taco toppings for each order.
largestTriangleThreeBuckets
Applies the Largest-Triangle-Three-Buckets algorithm to downsample time series data for visualization. This function is particularly useful for reducing the number of points in a large dataset while preserving its visual characteristics.
Syntax:
largestTriangleThreeBuckets(n)(x, y)
Alias:
Arguments:
x
(Integer
, Float
, Decimal
, Date
, Date32
, DateTime
, DateTime64
): The x-coordinate (typically a timestamp or sequential value).
y
(Integer
, Float
, Decimal
, Date
, Date32
, DateTime
, DateTime64
): The y-coordinate (typically the measured value).
Parameters:
n
(UInt64
): The number of points to return in the downsampled series.
Returns:
An Array
of Tuple
s, where each tuple contains two elements representing the x and y coordinates of a point in the downsampled series.
- The function sorts the input data by the x-coordinate before applying the algorithm.
- NaN values in the input series are ignored and excluded from the analysis.
Example:
SELECT
largestTriangleThreeBuckets(3)(order_time, total_sales) AS downsampled_sales
FROM
(
SELECT
toDateTime('2023-05-05 10:00:00') AS order_time,
150 AS total_sales
UNION ALL
SELECT
toDateTime('2023-05-05 12:30:00') AS order_time,
450 AS total_sales
UNION ALL
SELECT
toDateTime('2023-05-05 18:00:00') AS order_time,
800 AS total_sales
UNION ALL
SELECT
toDateTime('2023-05-05 23:59:59') AS order_time,
200 AS total_sales
) AS taco_sales
WHERE
toDate(order_time) = '2023-05-05';
Result:
| downsampled_sales |
|---------------------------------------------------------------------------------------|
| [('2023-05-05 10:00:00',150),('2023-05-05 18:00:00',800),('2023-05-05 23:59:59',200)] |
In this example, we downsample a day’s worth of taco sales data into 3 representative points. This can be useful for creating a quick visual summary of sales trends throughout the day without plotting every single data point.
The algorithm divides the sorted series into buckets (in this case, 3) and finds the most representative point in each bucket, allowing for efficient visualization of large datasets while maintaining the overall shape of the data.