Unpack arrays into separate rows.
arrayJoin
function is unique in its behavior compared to other SQL functions in ClickHouse.
['salsa', 'guacamole', 'cheese'] AS toppings
creates an array and aliases it as toppings
.
arrayJoin
function:
arrayJoin(toppings) AS topping
takes the toppings
array and “unpacks” it. This is the key operation that transforms a single row with an array into multiple rows, each containing one element from the array.
'Taco'
is a literal string value that will be repeated in each resulting row.toppings
is the original array, which will also be repeated in each row.arrayJoin
can be used to expand array data into individual rows, which is useful for various data analysis tasks, such as:
toppings
containing three elements.
arrayJoin(toppings)
:
This function unpacks the toppings
array into separate rows. So now we have:
WHERE
clause:
SELECT sum(1) AS taco_orders
:
This counts the number of rows that pass the WHERE condition. sum(1)
is a common way to count rows in ClickHouse.
arrayJoin
can be used to transform array data into individual rows, which can then be filtered and counted.
arrayJoin
functions together to create combinations of taco toppings and sauces. Here’s what it does:
taco_toppings
and sauces
.arrayJoin
function is applied to both arrays, creating a row for each combination of topping and sauce.sum(1)
aggregation counts the occurrences of each combination.taco_orders = 1
).arrayJoin
functions with the same expression may not produce the expected results due to optimizations. In such cases, consider modifying the repeated array expression with extra operations that do not affect the join result, such as arrayJoin(arraySort(arr))
or arrayJoin(arrayConcat(arr, []))
.arrayJoin
function in ClickHouse. Let’s break it down:
arrayJoin
twice to create all possible pairs of ingredients:
arrayJoin
creates a row for each ingredient in the first_ingredient
column.
arrayJoin
is slightly modified:
arrayJoin(taco_ingredients)
directly, ClickHouse might optimize it away, resulting in an unexpected output. By using arrayConcat(taco_ingredients, [])
, we force ClickHouse to re-evaluate the array for each row, ensuring we get all combinations.
ARRAY JOIN
and arrayZip
with arrayJoin
to combine multiple arrays. The query counts taco orders with specific combinations of fillings and sauces.
In Example 5, we use the ARRAY JOIN
syntax to combine the fillings
and sauces
arrays. This creates a row for each corresponding pair of filling and sauce.
In Example 6, we achieve the same result using arrayZip
to combine the arrays into tuples, then use arrayJoin
to expand these tuples into individual rows.
Both methods produce the same result: a count of taco orders for each unique combination of filling and sauce.