Funnel analysis in ClickHouse

Understanding user journeys better.

Funnel analysis in ClickHouse

Propel

Get started with Propel's Serverless ClickHouse forever-free plan today. Propel is the only Serverless ClickHouse with a true pay-per-query pricing and instant auto-scaling.

Funnel analysis is a technique used to understand how users progress through a series of steps, often in a sales or conversion process. The goal is to determine where users drop off, helping you identify bottlenecks and opportunities to optimize the flow. When it comes to large-scale user event data, ClickHouse provides robust tools to perform funnel analysis due to its speed, efficiency, and real-time capabilities. This post will show you how to do a funnel analysis in ClickHouse, focusing on a practical, real-world example.

How to perform funnel analysis in ClickHouse

The most effective way to perform funnel analysis in ClickHouse is by using the windowFunnel function. This function allows you to see how many users go through a predefined series of events within a specific time window. Let’s look at how to do this with an example.

Funnel analysis example

Imagine you run an online taco shop and want to understand how users move through the following funnel:

  1. Viewed Taco Menu
  2. Added Taco to Cart
  3. Purchased Taco

You have a table named taco_events that tracks user behavior, with the following columns:

  • user_id: The unique identifier for the user.
  • event_time: Timestamp of when the event occurred.
  • event_type: Type of event (e.g., view_menu, add_to_cart, purchase).

To analyze how many users complete all three steps of the funnel within a two-hour window, you can use the windowFunnel function as follows:

Creating the dataset

First, create the table to store the taco events:

CREATE TABLE taco_events (
  user_id UInt32,
  event_type String,
  event_time DateTime
) ENGINE = MergeTree()
ORDER BY (user_id, event_time);

Now let’s assume your table contains the following data:

| user_id | event_type | event_time           |
|---------|------------|----------------------|
| 1       | view_menu  | 2024-10-01 12:00:00  |
| 1       | add_to_cart| 2024-10-01 12:10:00  |
| 1       | purchase   | 2024-10-01 12:20:00  |
| 2       | view_menu  | 2024-10-01 13:00:00  |
| 2       | add_to_cart| 2024-10-01 13:30:00  |
| 3       | view_menu  | 2024-10-01 14:00:00  |
| 3       | add_to_cart| 2024-10-01 14:30:00  |
| 3       | purchase   | 2024-10-01 15:45:00  |
| 4       | view_menu  | 2024-10-01 16:00:00  |
| 4       | add_to_cart| 2024-10-01 16:20:00  |
| 5       | view_menu  | 2024-10-01 17:00:00  |
| 5       | add_to_cart| 2024-10-01 17:30:00  |
| 5       | purchase   | 2024-10-01 17:45:00  |
| 6       | view_menu  | 2024-10-01 18:00:00  |

Querying the funnel

Use the windowFunnel() function to analyze the funnel:

SELECT
  user_id,
  windowFunnel (7200) (
    event_time,
    event_type = 'view_menu',
    event_type = 'add_to_cart',
    event_type = 'purchase'
  ) AS funnel_stage
FROM
  taco_events
GROUP BY
  user_id;

Explanation:

  • windowFunnel(7200): This function looks at events within a 7200-second (2-hour) window to track the progression through the funnel.
  • event_time: The timestamp column used to order events.
  • event_type = 'view_menu', event_type = 'add_to_cart', event_type = 'purchase' Specifies the steps in the funnel: viewing the taco menu, adding a taco to the cart, and making a purchase.

The query will return the following:

| user_id | funnel_stage |
|---------|--------------|
| 1       | 3            |
| 2       | 2            |
| 3       | 2            |
| 4       | 2            |
| 5       | 3            |
| 6       | 1            |

Interpretation:

  • User 1: Completed all three stages (viewed menu, added to cart, purchased) within the 2-hour window, so the funnel_stage is 3.
  • User 2: Viewed the menu and added to the cart but did not purchase within the 2-hour window, so the funnel_stage is 2.
  • User 3: Viewed the menu and added to the cart, but the purchase occurred outside the 2-hour window, resulting in a funnel_stage of 2.
  • User 4: Viewed the menu and added to the cart but did not purchase within the 2-hour window, resulting in a funnel_stage of 2.
  • User 5: Completed all three stages (viewed menu, added to cart, purchased) within the 2-hour window, so the funnel_stage is 3.
  • User 6: Only viewed the menu, resulting in a funnel_stage of 1.

This result shows you which users completed each stage of the funnel and highlights where users are dropping off. In this case, Users 2, 3, and 4 added items to their carts but did not complete the purchase within the desired timeframe, and User 6 only viewed the menu.

Counting users per stage

To understand how many users are at each stage of the funnel, you can use the following query:

SELECT
  funnel_stage,
  COUNT(*) AS user_count
FROM
  (
    SELECT
      user_id,
      windowFunnel (7200) (
        event_time,
        event_type = 'view_menu',
        event_type = 'add_to_cart',
        event_type = 'purchase'
      ) AS funnel_stage
    FROM
      taco_events
    GROUP BY
      user_id
  ) AS funnel_results
GROUP BY
  funnel_stage;

Explanation:

  • This query first calculates the funnel_stage for each user in the subquery.
  • Then it counts how many users are at each funnel_stage.

Query results for users per stage:

| funnel_stage | user_count  |
|--------------|-------------|
| 1            | 1           |
| 2            | 3           |
| 3            | 2           |

When to use sequenceMatch vs windowFunnel

ClickHouse provides two powerful functions for analyzing user behavior in sequences of events: windowFunnel and sequenceMatch. While both can be used for similar purposes, each function is better suited for specific types of analysis.

  • Use windowFunnel when you need to track users through a fixed sequence of events within a specific time window. This function is great for scenarios where you are interested in understanding conversion funnels, such as tracking how many users move through different steps in a purchasing process, as we saw in our taco example. windowFunnel is optimized for scenarios where each step must happen in a specific order within a limited timeframe.
  • Use sequenceMatch when you need more flexibility in the pattern of events you are tracking. This function allows for more complex conditions and expressions, making it ideal if your analysis needs to account for optional steps, different possible event orders, or more nuanced patterns of user behavior. sequenceMatch is particularly useful for identifying more complex user journeys, such as detecting various behavioral patterns that may include loops or alternative paths.

For example, if you want to track users who may view the taco menu, optionally add tacos to their cart, and then purchase tacos, sequenceMatch would be a better choice. It allows you to capture multiple potential sequences, such as users who skip the cart step and purchase directly.

The choice between windowFunnel and sequenceMatch depends on your specific analytical needs. If your funnel involves a straightforward sequence with a clear time boundary, windowFunnel is your best option. However, if your analysis needs to capture flexible, non-linear paths or multiple optional steps, sequenceMatch is more suitable.

Conclusion

Funnel analysis in ClickHouse is a highly effective way to understand user behavior and optimize your processes. By leveraging windowFunnel, you can easily track user journeys like our taco example, identifying where drop-offs occur and improving the experience to maximize conversions. ClickHouse's efficiency allows you to perform large-scale, real-time funnel analysis, ensuring you always have the insights you need to your user events.

Get started with Propel's Serverless ClickHouse forever-free plan today. Propel is the only Serverless ClickHouse with a true pay-per-query pricing and instant auto-scaling. Contact us to learn more about our volume-based discounts. Visit our pricing page for details.

Related posts

What's new in ClickHouse version 24.10

This is some text inside of a div block.

Heading 1

Heading 2

Heading 3

Heading 4

Heading 5
Heading 6

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.

Block quote

Ordered list

  1. Item 1
  2. Item 2
  3. Item 3

Unordered list

  • Item A
  • Item B
  • Item C

Text link

Bold text

Emphasis

Superscript

Subscript

What's new in ClickHouse version 24.9

This is some text inside of a div block.

Heading 1

Heading 2

Heading 3

Heading 4

Heading 5
Heading 6

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.

Block quote

Ordered list

  1. Item 1
  2. Item 2
  3. Item 3

Unordered list

  • Item A
  • Item B
  • Item C

Text link

Bold text

Emphasis

Superscript

Subscript

Flattening DynamoDB JSON in ClickHouse

This is some text inside of a div block.

Heading 1

Heading 2

Heading 3

Heading 4

Heading 5
Heading 6

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.

Block quote

Ordered list

  1. Item 1
  2. Item 2
  3. Item 3

Unordered list

  • Item A
  • Item B
  • Item C

Text link

Bold text

Emphasis

Superscript

Subscript

Start shipping today

Deliver the analytics your customers have been asking for.