arrayJoin function is unique in its behavior compared to other SQL functions in ClickHouse.
Purpose and behavior
- Transforms a single row with an array into multiple rows
- Each new row contains one element from the original array
- Other column values are duplicated for each new row
Basic usage
Example 1: Unpack an array into separate rows
-
Array creation:
['salsa', 'guacamole', 'cheese'] AS toppingscreates an array and aliases it astoppings. -
arrayJoinfunction:arrayJoin(toppings) AS toppingtakes thetoppingsarray and “unpacks” it. This is the key operation that transforms a single row with an array into multiple rows, each containing one element from the array. -
Additional columns:
'Taco'is a literal string value that will be repeated in each resulting row.toppingsis the original array, which will also be repeated in each row.
arrayJoin can be used to expand array data into individual rows, which is useful for various data analysis tasks, such as:
- Analyzing individual array elements
- Creating more granular data from aggregated array information
- Preparing data for joins or further processing that requires non-array formats
Example 2: Count orders with specific toppings
-
Subquery:
This creates a single row with an array column named
toppingscontaining three elements. -
arrayJoin(toppings): This function unpacks thetoppingsarray into separate rows. So now we have: -
WHEREclause:This filters the unpacked rows to include only ‘salsa’ and ‘guacamole’, excluding ‘cheese’. -
SELECT sum(1) AS taco_orders: This counts the number of rows that pass the WHERE condition.sum(1)is a common way to count rows in ClickHouse.
arrayJoin can be used to transform array data into individual rows, which can then be filtered and counted.
Example 3: Combine toppings and sauces
arrayJoin functions together to create combinations of taco toppings and sauces. Here’s what it does:
- It starts with two arrays:
taco_toppingsandsauces. - The
arrayJoinfunction is applied to both arrays, creating a row for each combination of topping and sauce. - The
sum(1)aggregation counts the occurrences of each combination. - The result is a table showing all possible combinations of toppings and sauces, with each combination appearing once (
taco_orders = 1).
Using multiple
arrayJoin functions with the same expression may not produce the expected results due to optimizations. In such cases, consider modifying the repeated array expression with extra operations that do not affect the join result, such as arrayJoin(arraySort(arr)) or arrayJoin(arrayConcat(arr, [])).Example 4: Create a Cartesian product of ingredients
arrayJoin function in ClickHouse. Let’s break it down:
-
The query starts with a subquery that creates an array of taco ingredients:
-
The main query then uses
arrayJointwice to create all possible pairs of ingredients: -
The first
arrayJoincreates a row for each ingredient in thefirst_ingredientcolumn. -
The second
arrayJoinis slightly modified:This modification is crucial. If we usedarrayJoin(taco_ingredients)directly, ClickHouse might optimize it away, resulting in an unexpected output. By usingarrayConcat(taco_ingredients, []), we force ClickHouse to re-evaluate the array for each row, ensuring we get all combinations. - The result is a Cartesian product of the ingredients with themselves. Each ingredient is paired with every other ingredient (including itself), resulting in 6 x 6 = 36 rows.
Example 5: Taco orders with fillings and sauces
Example 6: Alternative syntax using arrayZip
ARRAY JOIN and arrayZip with arrayJoin to combine multiple arrays. The query counts taco orders with specific combinations of fillings and sauces.
In Example 5, we use the ARRAY JOIN syntax to combine the fillings and sauces arrays. This creates a row for each corresponding pair of filling and sauce.
In Example 6, we achieve the same result using arrayZip to combine the arrays into tuples, then use arrayJoin to expand these tuples into individual rows.
Both methods produce the same result: a count of taco orders for each unique combination of filling and sauce.