ClickHouse function reference
position
Returns the position (in bytes, starting from 1) of a substring in a string.
Syntax
position(haystack, needle[, start_pos])
Alias:
position(needle IN haystack)
Arguments
haystack
(String
): String to search in.
needle
(String
): Substring to search for.
start_pos
(UInt
, optional): Position (1-based) in haystack
to start the search.
Returns
- Starting position in bytes (counting from 1) if the substring was found. (
UInt64
)
- 0 if the substring was not found. (
UInt64
)
If needle
is empty, these rules apply:
- If no
start_pos
specified: return 1
- If
start_pos = 0
: return 1
- If
start_pos >= 1
and start_pos <= length(haystack) + 1
: return start_pos
- Otherwise: return 0
Example
SELECT
position('Carne asada tacos are delicious!', 'taco'),
position('Carne asada tacos are delicious!', 'burrito'),
position('Carne asada tacos are delicious!', 'a', 10);
Result:
| position('Carne asada tacos are delicious!', 'taco') | position('Carne asada tacos are delicious!', 'burrito') | position('Carne asada tacos are delicious!', 'a', 10) |
|------------------------------------------------------|---------------------------------------------------------|-------------------------------------------------------|
| 13 | 0 | 15 |
In this example:
- ‘taco’ is found at position 13
- ‘burrito’ is not found, so 0 is returned
- Starting from position 10, the next ‘a’ is found at position 15
Example with empty needle:
SELECT
position('Taco Tuesday', ''),
position('Taco Tuesday', '', 0),
position('Taco Tuesday', '', 1),
position('Taco Tuesday', '', 5),
position('Taco Tuesday', '', 13),
position('Taco Tuesday', '', 14);
Result:
| position('Taco Tuesday', '') | position('Taco Tuesday', '', 0) | position('Taco Tuesday', '', 1) | position('Taco Tuesday', '', 5) | position('Taco Tuesday', '', 13) | position('Taco Tuesday', '', 14) |
|------------------------------|---------------------------------|---------------------------------|---------------------------------|----------------------------------|----------------------------------|
| 1 | 1 | 1 | 5 | 13 | 0 |
This example demonstrates the behavior with an empty needle string and various start positions.
locate
Returns the position of a substring within a string.
Syntax:
locate(needle, haystack[, start_pos])
Arguments:
needle
(String
): Substring to search for.
haystack
(String
): String to search in.
start_pos
(UInt
, optional): Position to start the search from (1-based).
Returns:
- The position (in bytes) of the first occurrence of
needle
in haystack
, starting from 1.
- 0 if the substring is not found.
- If
needle
is empty, the function follows these rules:
- If no
start_pos
is specified: returns 1
- If
start_pos = 0
: returns 1
- If
start_pos >= 1
and start_pos <= length(haystack) + 1
: returns start_pos
- Otherwise: returns 0
Example:
SELECT
locate('salsa', 'spicy salsa verde'),
locate('guac', 'spicy salsa verde'),
locate('a', 'spicy salsa verde', 8)
Result:
| locate('salsa', 'spicy salsa verde') | locate('guac', 'spicy salsa verde') | locate('a', 'spicy salsa verde', 8) |
|--------------------------------------|-------------------------------------|-------------------------------------|
| 7 | 0 | 13 |
In this example:
- ‘salsa’ is found at position 7 in the string.
- ‘guac’ is not found, so it returns 0.
- ‘a’ is found at position 13 when starting the search from position 8.
The locate
function is case-sensitive. For case-insensitive searches, consider using locateCaseInsensitive
or locateCaseInsensitiveUTF8
functions.
positionCaseInsensitive
Returns the position (in bytes, starting from 1) of a substring in a string, ignoring case.
Syntax
positionCaseInsensitive(haystack, needle[, start_pos])
Arguments
haystack
(String
): String to search in.
needle
(String
): Substring to search for.
start_pos
(UInt
, optional): Position to start the search from (1-based).
Returns
- The position (in bytes) of the found substring, starting from 1.
- 0 if the substring was not found.
Example
SELECT
positionCaseInsensitive('Taco Tuesday', 'taco');
Result:
| positionCaseInsensitive('Taco Tuesday', 'taco') |
|-------------------------------------------------|
| 1 |
This function performs a case-insensitive search, so ‘taco’ matches ‘Taco’ at position 1.
Example with start_pos
SELECT
positionCaseInsensitive('Taco Tuesday Taco Thursday', 'taco', 7);
Result:
| positionCaseInsensitive('Taco Tuesday Taco Thursday', 'taco', 7) |
|------------------------------------------------------------------|
| 14 |
Here, the search starts from position 7, finding the second occurrence of ‘Taco’ at position 14.
The function follows the same rules for empty substrings as the position
function. If the needle
is an empty string, these rules apply:
- If no
start_pos
was specified: return 1
- If
start_pos = 0
: return 1
- If
start_pos >= 1
and start_pos <= length(haystack) + 1
: return start_pos
- Otherwise: return 0
This function is useful when you need to find a substring within a larger string without worrying about the case of the characters, such as searching for ingredient names in a recipe database regardless of capitalization.
positionUTF8
Returns the position (in Unicode code points, starting from 1) of a substring in a string, assuming both the substring and the string are UTF-8 encoded.
Syntax
positionUTF8(haystack, needle[, start_pos])
Arguments
haystack
(String
): The string to search in.
needle
(String
): The substring to search for.
start_pos
(UInt
, optional): The position (1-based) in haystack
at which the search starts.
Returns
- The starting position in Unicode code points (1-based) if the substring was found.
- 0 if the substring was not found.
Example
SELECT
positionUTF8('Jalapeño and Habanero Tacos', 'ñ');
Result:
| positionUTF8('Jalapeño and Habanero Tacos', 'ñ') |
|--------------------------------------------------|
| 6 |
In this example, positionUTF8
correctly counts the Unicode character ‘ñ’ as a single code point, returning its position as 6.
If the needle
substring is empty, these rules apply:
- If no
start_pos
was specified: return 1
- If
start_pos = 0
: return 1
- If
start_pos >= 1
and start_pos <= length(haystack) + 1
: return start_pos
- Otherwise: return 0
positionCaseInsensitiveUTF8
Returns the position (in Unicode code points) of a substring in a string, ignoring case and assuming UTF-8 encoding.
Syntax
positionCaseInsensitiveUTF8(haystack, needle[, start_pos])
Arguments
haystack
(String
): The string to search in.
needle
(String
): The substring to search for.
start_pos
(UInt
, optional): The position to start searching from (1-based).
Returns
- The position (in Unicode code points) of the first occurrence of
needle
in haystack
starting from start_pos
, if found.
- 0, if
needle
is not found.
Type: UInt64
Example
SELECT
positionCaseInsensitiveUTF8('Jalapeño and Habanero Tacos', 'ñO');
Result:
| positionCaseInsensitiveUTF8('Jalapeño and Habanero Tacos', 'ñO') |
|------------------------------------------------------------------|
| 6 |
This example finds the position of ‘ñO’ (case-insensitive) in the UTF-8 encoded string, returning 6 as the position of ‘ño’ in ‘Jalapeño’.
- The function is case-insensitive, so ‘A’ and ‘a’ are considered the same.
- The function counts positions in Unicode code points, not bytes.
- If
needle
is empty, the function returns 1.
- If
start_pos
is 0, it’s treated as 1.
- If
start_pos
is greater than the length of haystack
, the function returns 0.
This function is useful for searching within text that may contain non-ASCII characters and where case should be ignored, such as when dealing with user-generated content or multilingual data.
multiSearchAllPositions
Returns an array of positions (in bytes, starting from 1) of substrings in a string.
Syntax
multiSearchAllPositions(haystack, [needle1, needle2, ..., needleN])
Arguments
haystack
(String
): String to search in.
needle
(Array(String)
): Array of substrings to search for.
Returns:
- Array of positions.
- For each needle:
- If found, the position (in bytes from 1) of the first occurrence.
- If not found, 0.
Type: Array(UInt64)
Example
Query:
SELECT
multiSearchAllPositions(
'Carne asada tacos are delicious!',
['taco', 'salsa', 'guacamole']
) AS positions
Result:
| positions |
|---------------|
| [13,0,0] |
In this example:
- ‘taco’ is found at position 13
- ‘salsa’ and ‘guacamole’ are not found, so their positions are 0
The function supports up to 2^8 needles.
multiSearchAllPositionsCaseInsensitive
Returns an array of positions (in bytes, starting from 1) of all occurrences of the needle substrings in the haystack string, ignoring case.
Syntax
multiSearchAllPositionsCaseInsensitive(haystack, [needle1, needle2, ..., needleN])
Arguments
haystack
(String
): String in which to search.
needle
(Array(String)
): Array of substrings to search for.
Returns:
- Array of the same size as the
needle
array, where each element is:
- The position (in bytes) of the corresponding needle in haystack, if found.
- 0 if the substring was not found.
[Array(UInt64)
]
Example
SELECT
multiSearchAllPositionsCaseInsensitive(
'The quick brown fox jumps over the lazy dog',
['THE', 'dog', 'quick', 'TACO']
) AS positions
Result:
| positions |
|---------------|
| [1, 41, 5, 0] |
In this example:
- ‘THE’ is found at position 1 (case-insensitive match with ‘The’)
- ‘dog’ is found at position 41
- ‘quick’ is found at position 5
- ‘TACO’ is not found, so its position is 0
This function is useful for simultaneously searching for multiple substrings in a given text, ignoring case differences. It can be particularly handy in text analysis, log parsing, or when working with user-generated content where case sensitivity is not important.
The function supports up to 2^8 (256) needles.
multiSearchAllPositionsUTF8
Searches for multiple substrings in a UTF-8 encoded string and returns an array of positions.
Syntax
multiSearchAllPositionsUTF8(haystack, [needle1, needle2, ..., needleN])
Arguments
haystack
(String
): UTF-8 encoded string in which to search.
needle
(Array(String)
): Array of UTF-8 encoded substrings to search for.
Returns
- An array of the same size as the
needle
array, where each element is:
- The position (in bytes, starting from 1) of the corresponding
needle
if found.
- 0 if the substring was not found.
[Array(UInt64)
]
Example
SELECT
multiSearchAllPositionsUTF8(
'Spicy jalapeño and habanero tacos',
['jalapeño', 'habanero', 'salsa']
);
Result:
| multiSearchAllPositionsUTF8('Spicy jalapeño and habanero tacos', ['jalapeño', 'habanero', 'salsa']) |
|-----------------------------------------------------------------------------------------------------|
| [7,21,0] |
In this example:
- ‘jalapeño’ is found at position 7
- ‘habanero’ is found at position 21
- ‘salsa’ is not found, so its position is 0
- This function is UTF-8 aware, so it correctly handles multi-byte characters.
- The function supports up to 2^8 needles.
- If any non-constant
haystack
or needle
arguments are more than 32KB in size, the function will throw an exception.
multiSearchAllPositionsCaseInsensitiveUTF8
Returns an array of positions (in bytes, starting from 1) of all occurrences of multiple needle substrings in a haystack string. The search is case-insensitive and assumes UTF-8 encoding.
Syntax
multiSearchAllPositionsCaseInsensitiveUTF8(haystack, [needle1, needle2, ..., needleN])
Arguments
haystack
(String
): UTF-8 encoded string in which the search is performed.
needle
(Array(String)
): Array of UTF-8 encoded substrings to be searched.
Returns
- Array of arrays, where each inner array contains:
- The starting positions (in bytes, counting from 1) of the corresponding needle if found.
- 0 if the needle was not found.
Type: Array(Array(UInt64))
Example
SELECT
multiSearchAllPositionsCaseInsensitiveUTF8(
'Spicy Taco Tuesday',
['taco', 'SPICY', 'wednesday']
) AS positions
Result:
| positions |
|---------------|
| [[7],[1],[0]] |
In this example:
- ‘taco’ is found at position 7 (case-insensitive)
- ‘SPICY’ is found at position 1 (case-insensitive)
- ‘wednesday’ is not found, so 0 is returned
- This function supports up to 2^8 needles.
- The function is optimized for searching multiple substrings in a single pass through the haystack.
- It’s particularly useful for tasks like keyword matching or filtering based on multiple criteria.
multiSearchFirstPosition
Returns the position of the leftmost occurrence of any of the substrings in the haystack string.
Syntax
multiSearchFirstPosition(haystack, [needle1, needle2, ..., needleN])
Arguments
haystack
(String
): String in which to search.
needle
(Array(String)
): Array of substrings to search for.
Returns
- The position (in bytes) of the leftmost occurrence of any needle substring in the haystack string.
- 0 if none of the needles are found.
Type: UInt64
Example
SELECT
multiSearchFirstPosition(
'I love tacos and burritos',
['tacos', 'burritos', 'nachos']
) AS position
Result:
| position |
|----------|
| 8 |
In this example, ‘tacos’ is the leftmost occurrence among the search terms, starting at position 8 in the haystack string.
This function is case-sensitive. For case-insensitive searches, use multiSearchFirstPositionCaseInsensitive
.
multiSearchFirstPositionCaseInsensitive
Returns the position of the first occurrence of any of the specified substrings in the given string, ignoring case. The search is performed case-insensitively.
Syntax
multiSearchFirstPositionCaseInsensitive(haystack, [needle1, needle2, ..., needleN])
Arguments
haystack
(String
): The string to search in.
needle
(Array(String)
): Array of substrings to search for.
Returns
- The position (in bytes) of the first occurrence of any matched substring (1-based).
- 0 if none of the substrings are found.
Type: UInt64
Example
SELECT
multiSearchFirstPositionCaseInsensitive(
'Crunchy Taco Supreme with Hot Sauce',
['TACO', 'sauce', 'CHEESE']
) AS position
Result:
| position |
|----------|
| 9 |
In this example, ‘TACO’ is found first at position 9, ignoring case.
This function is useful for efficiently finding the first occurrence of multiple substrings in a given text, without being sensitive to letter case.
multiSearchFirstPositionUTF8
Returns the position of the first occurrence of any of the substrings in the string, assuming UTF-8 encoding.
Syntax
multiSearchFirstPositionUTF8(haystack, [needle1, needle2, ..., needleN])
Arguments
haystack
(String
): The UTF-8 encoded string to search in.
needle
(Array(String)
): An array of UTF-8 encoded substrings to search for.
Returns
- The position (in bytes) of the leftmost match of any of the needles within the haystack.
- 0 if none of the needles are found.
Type: UInt64
Example
SELECT
multiSearchFirstPositionUTF8(
'Spicy jalapeño and habanero tacos',
['jalapeño', 'habanero', 'salsa']
) AS position
Result:
| position |
|----------|
| 7 |
In this example, ‘jalapeño’ is found at position 7 (counting bytes, not characters) in the UTF-8 encoded string.
- This function is case-sensitive.
- If you need case-insensitive search, use
multiSearchFirstPositionCaseInsensitiveUTF8
instead.
- The function returns the position in bytes, not in characters, which is important for UTF-8 strings where characters can occupy multiple bytes.
- The function supports up to 2^8 needles in the array.
multiSearchFirstPositionCaseInsensitiveUTF8
Returns the position of the first occurrence of any of the substrings in the string, ignoring case and assuming UTF-8 encoding.
Syntax
multiSearchFirstPositionCaseInsensitiveUTF8(haystack, [needle1, needle2, ..., needleN])
Arguments
haystack
(String
): The UTF-8 encoded string to search in.
needle
(Array(String)
): Array of UTF-8 encoded substrings to search for.
Returns:
- The position (in bytes) of the leftmost match of any of the needles.
- 0 if none of the needles are found.
Type: UInt64
Example
SELECT
multiSearchFirstPositionCaseInsensitiveUTF8(
'Carnitas Tacos, Carne Asada Tacos, Pescado Tacos',
['TACO', 'BURRITO']
) AS position
Result:
| position |
|----------|
| 9 |
This function finds the first occurrence of either ‘TACO’ or ‘BURRITO’ (case-insensitive) in the UTF-8 encoded string of taco types. It returns 9, which is the byte position of the first ‘Tacos’ in ‘Carnitas Tacos’.
- The function is case-insensitive, so ‘TACO’ matches ‘Tacos’.
- The search is performed on UTF-8 encoded strings, so it correctly handles multi-byte characters.
- If none of the needles are found, the function returns 0.
- The function supports up to 2^8 needles in the array.
This function is useful for efficiently finding the first occurrence of multiple substrings in a larger string, especially when dealing with case-insensitive, UTF-8 encoded text.
multiSearchFirstIndex
Returns the index of the first matching needle string found in the haystack string.
Syntax
multiSearchFirstIndex(haystack, [needle1, needle2, ..., needleN])
Arguments
haystack
(String
): String to search in.
needle
(Array(String)
): Array of strings to search for.
Returns
- The index (1-based) of the first matching needle string.
- 0 if none of the needle strings are found.
Type: UInt8
Example
SELECT
multiSearchFirstIndex(
'Carne asada tacos are the best',
['tacos', 'burritos', 'quesadillas']
) AS position
Result:
| position |
|----------|
| 1 |
In this example, ‘tacos’ is the first needle found in the haystack, so the function returns 1 (the index of ‘tacos’ in the needle array).
- The search is case-sensitive.
- If you need case-insensitive search, use
multiSearchFirstIndexCaseInsensitive
instead.
- For UTF-8 encoded strings, use
multiSearchFirstIndexUTF8
.
- The function returns as soon as it finds the first match, which can be more efficient than searching for all occurrences.
multiSearchFirstIndexCaseInsensitive
Returns the index of the first matched needle in the haystack string, ignoring case. The index is 1-based, meaning the first position is 1, not 0.
Syntax
multiSearchFirstIndexCaseInsensitive(haystack, [needle1, needle2, ..., needleN])
Arguments
haystack
(String
): The string to search in.
needle
(Array(String)
): Array of substrings to search for.
Returns
- The index of the first matched needle (1-based).
- 0 if none of the needles are found.
Type: UInt8
Example
SELECT
multiSearchFirstIndexCaseInsensitive(
'Crunchy Taco Supreme', ['TACO', 'burrito', 'nacho']
) AS position
Result:
| position |
|----------|
| 1 |
In this example, ‘TACO’ is the first needle that matches (case-insensitively) in the haystack, and its index is 1.
multiSearchFirstIndexUTF8
Returns the index of the first found needle in the haystack string, assuming both are UTF-8 encoded. The index is 1-based, meaning the first position is 1, not 0.
Syntax
multiSearchFirstIndexUTF8(haystack, [needle1, needle2, ..., needleN])
Arguments
haystack
(String
): UTF-8 encoded string in which to search.
needle
(Array(String)
): Array of UTF-8 encoded substrings to search for.
Returns
- The index (starting from 1) of the leftmost found needle.
- 0 if none of the needles are found.
Type: UInt8
Example
SELECT
multiSearchFirstIndexUTF8(
'Spicy jalapeño and habanero tacos',
['jalapeño', 'habanero', 'salsa']
) AS position
Result:
| position |
|----------|
| 1 |
In this example, ‘jalapeño’ is the first needle found in the haystack, so the function returns 1.
multiSearchFirstIndexCaseInsensitiveUTF8
Returns the index of the first found needle from the array in the haystack string, ignoring case and assuming UTF-8 encoding.
Syntax
multiSearchFirstIndexCaseInsensitiveUTF8(haystack, [needle1, needle2, ..., needleN])
Arguments
haystack
(String
): UTF-8 encoded string in which to search.
needle
(Array(String)
): Array of UTF-8 encoded strings to search for.
Returns:
- Index (starting from 1) of the leftmost found needle.
- 0 if none of the needles are found.
Type: UInt8
Example
Query:
SELECT
multiSearchFirstIndexCaseInsensitiveUTF8(
'Spicy Taco Tuesday',
['TACO', 'burrito', 'NACHO']
) AS position
Result:
| position |
|----------|
| 1 |
In this example, ‘TACO’ (index 1) is found first in the haystack, ignoring case.
This function is useful for efficiently finding the first occurrence of multiple strings within a larger text, while ignoring case differences and properly handling UTF-8 encoded strings.
multiSearchAny
Returns 1 if at least one of the strings in the needles
array matches the haystack
string, and 0 otherwise.
Syntax
multiSearchAny(haystack, [needle1, needle2, ..., needleN])
Arguments
haystack
(String
): String in which to search.
needles
(Array(String)
): Array of substrings to search for.
Returns
- 1 if there was at least one match.
- 0 if there was no match.
Type: UInt8
Example
SELECT
multiSearchAny(
'Carne asada tacos are the best',
['burrito', 'taco', 'quesadilla']
) AS position
Result:
| position |
|----------|
| 1 |
In this example, the function returns 1 because the substring ‘taco’ is found in the haystack string.
multiSearchAnyCaseInsensitive
Returns 1 if at least one of the strings in the needles
array matches the haystack
string case-insensitively, and 0 otherwise.
Syntax
multiSearchAnyCaseInsensitive(haystack, [needle1, needle2, ..., needleN])
Arguments
haystack
(String
): String in which to search.
needles
(Array(String)
): Array of strings to search for.
Returns
- 1 if there was at least one case-insensitive match.
- 0 if there was no match.
Type: UInt8
Example
Query:
SELECT
multiSearchAnyCaseInsensitive(
'Crunchy Taco Supreme',
['TACO', 'burrito']
) AS position
Result:
| position |
|----------|
| 1 |
In this example, the function returns 1 because ‘TACO’ matches ‘Taco’ in the haystack string, ignoring case.
This function is useful for efficient case-insensitive searches of multiple substrings within a larger string. It’s particularly handy when dealing with user input or when case sensitivity is not important in your search criteria.
multiSearchAnyUTF8
Checks if any of the substrings from the needles
array are present in the haystack
string. Assumes both haystack
and needles
are UTF-8 encoded strings.
Syntax
multiSearchAnyUTF8(haystack, [needle1, needle2, ..., needleN])
Arguments
haystack
(String
): UTF-8 encoded string to search in.
needles
(Array(String)
): Array of UTF-8 encoded substrings to search for.
Returns
- 1 if at least one match is found.
- 0 if no matches are found.
Type: UInt8
Example
Query:
SELECT
multiSearchAnyUTF8(
'Carne asada tacos are the best',
['burrito', 'taco', 'quesadilla']
) AS position
Result:
| position |
|----------|
| 1 |
In this example, the function returns 1 because the substring ‘taco’ is found in the haystack string.
multiSearchAnyCaseInsensitiveUTF8
Checks if any of the UTF-8 encoded substrings are present in the given UTF-8 encoded string, ignoring case.
Syntax
multiSearchAnyCaseInsensitiveUTF8(haystack, [needle1, needle2, ..., needleN])
Arguments
haystack
(String
): UTF-8 encoded string in which to search.
needle
(Array(String)
): Array of UTF-8 encoded substrings to search for.
Returns
- 1 if any of the substrings are found (ignoring case).
- 0 if none of the substrings are found.
Type: UInt8
Example
Query:
SELECT
multiSearchAnyCaseInsensitiveUTF8(
'Crunchy Taco Supreme',
['SUPREME', 'burrito', 'nacho']
) AS position
Result:
| position |
|----------|
| 1 |
In this example, the function returns 1 because ‘SUPREME’ is found in the haystack ‘Crunchy Taco Supreme’, ignoring case.
This function is case-insensitive and works with UTF-8 encoded strings. It’s useful for checking the presence of multiple substrings in a given text, regardless of their case or Unicode representation.
match
Checks whether a string matches a regular expression pattern.
Syntax
Alias:
Arguments
haystack
(String
): The string to search in.
pattern
(String
): The regular expression pattern with re2 syntax.
Returns
1
if the string matches the pattern.
0
if the string does not match the pattern.
[UInt8
]
- The function uses UTF-8 encoding. The regular expression must not contain null bytes.
- Unlike re2’s default behavior,
.
matches line breaks. To disable this, prepend the pattern with (?-s)
.
- For simple substring searches, consider using
like
or position
functions for better performance.
Example
SELECT
match('Crunchy Taco Supreme', '.*Taco.*') AS has_taco,
match('Burrito Bowl', '.*Taco.*') AS no_taco;
Result:
| has_taco | no_taco |
|----------|---------|
| 1 | 0 |
This example checks if the menu items contain the word “Taco”.
Alias
The REGEXP
operator is an alias for match
:
multiMatchAllIndices
Returns an array of all indices that match the haystack string using multiple regular expression patterns.
Syntax
multiMatchAllIndices(haystack, [pattern1, pattern2, ..., patternN])
Arguments
haystack
(String
): String to search in.
pattern1, pattern2, ..., patternN
(Array(String)
): Regular expressions with re2 syntax.
Returns
- Array of indices (1-based) of patterns that match the haystack, in the order they appear in the haystack. (
Array(UInt64)
)
Example
SELECT
multiMatchAllIndices(
'The quick brown fox jumps over the lazy dog',
['quick', 'brown', 'lazy', 'taco']
) AS matched_indices;
Result:
| matched_indices |
|-----------------|
| [1,2,8] |
In this example:
- Index 1 corresponds to ‘quick’
- Index 2 corresponds to ‘brown’
- Index 8 corresponds to ‘lazy’
- ‘taco’ is not found, so it’s not included in the result
Extracts a fragment of a string using a regular expression. If the haystack does not match the pattern, an empty string is returned.
For regex without subpatterns, the function uses the fragment that matches the entire regex. Otherwise, it uses the fragment that matches the first subpattern.
Syntax:
extract(haystack, pattern)
Arguments:
haystack
(String
): String in which the search is performed.
pattern
(String
): Regular expression with re2 syntax.
Returns:
- The matched fragment, or an empty string if no match is found. [
String
]
Example:
SELECT
extract('Spicy beef taco: $5.99', '\\$\\d+\\.\\d+') AS price;
Result:
| price |
|-------|
| $5.99 |
In this example, extract
finds the price pattern in the taco description string.
If you only want to search for substrings in a string, consider using functions like position
or like
instead, as they are generally faster than regular expression-based functions.
Extracts all fragments of a string that match a regular expression pattern.
Syntax:
extractAll(haystack, pattern)
Arguments:
haystack
(String
) - The string to search in.
pattern
(String
) - The regular expression pattern to match.
Returns:
An array of strings containing all matches of the pattern in the haystack. If no matches are found, an empty array is returned. [Array(String)
]
Example:
SELECT
extractAll('I love tacos, burritos, and quesadillas', '\\w+dillas?') AS mexican_dishes;
Result:
| mexican_dishes |
|-----------------|
| ['quesadillas'] |
In this example, extractAll
finds all words ending in “dillas” or “dilla” in the given string about Mexican food.
The regular expression pattern uses \w+
to match one or more word characters, followed by “dillas” with an optional “s” at the end. The backslashes are escaped in the SQL string.
Matches all groups of the haystack string using the pattern regular expression. Returns an array of arrays, where the first array includes all fragments matching the first group, the second array - matching the second group, etc.
Syntax:
extractAllGroupsHorizontal(haystack, pattern)
Arguments:
haystack
(String
) — Input string.
pattern
(String
) — Regular expression with re2 syntax. Must contain groups, each group enclosed in parentheses. If pattern contains no groups, an exception is thrown.
Returns:
- Array of arrays of matches. (
Array
)
If haystack
does not match the pattern regex, an array of empty arrays is returned.
Example:
SELECT
extractAllGroupsHorizontal(
'Carne asada: $10, Al pastor: $12, Chorizo: $11',
'(\w+\s?\w+):\s?\$(\d+)'
) AS taco_info;
Result:
| taco_info |
|----------------------------------------------------------|
| [['Carne asada','Al pastor','Chorizo'],['10','12','11']] |
In this example:
- The first array contains all matched taco names:
['Carne asada','Al pastor','Chorizo']
- The second array contains all matched prices:
['10','12','11']
This function is useful for extracting structured data from strings, such as parsing menu items and prices from a text description.
Matches all groups of the haystack string using the pattern regular expression. Returns an array of arrays, where each array includes matching fragments from every group. Fragments are grouped in order of appearance in the haystack.
Syntax:
extractAllGroupsVertical(haystack, pattern)
Arguments:
haystack
(String
): Input string.
pattern
(String
): Regular expression with re2 syntax. Must contain groups, each group enclosed in parentheses. If pattern contains no groups, an exception is thrown.
Returns:
Array of arrays of matches. (Array
)
If haystack does not match the pattern regex, an empty array is returned.
Example:
SELECT
extractAllGroupsVertical(
'Taco Tuesday: $5 tacos, $3 guacamole, $2 salsa',
'(\$\d+)\s+(\w+)'
) AS taco_info;
Result:
| taco_info |
|-------------------------------------------------------------------------------------------|
| [['$5','tacos'],['$3','guacamole'],['$2','salsa']] |
In this example:
- The function extracts all price-item pairs from the taco special menu.
- Each inner array contains the price (first group) and the item name (second group).
- The results are grouped vertically, meaning each inner array represents a complete match of both groups.
This function is particularly useful when you need to extract and group multiple related pieces of information from a string using a regular expression with multiple capture groups.
like
Checks if a string matches a pattern.
Syntax
Alias:
haystack LIKE pattern
(operator)
Arguments
haystack
(String
): The string to search in.
pattern
(String
): The LIKE pattern to match against.
Returns
1
if the string matches the pattern, 0
otherwise. [UInt8
]
- The
LIKE
pattern can contain the following special characters:
%
matches any number of characters (including zero characters)
_
matches any single character
\
escapes special characters (%
, _
, and \
)
- Matching is based on UTF-8 encoding
- For patterns of the form
%needle%
, the function is as fast as the position
function
- Other patterns are internally converted to regular expressions
Example
SELECT
'Carne Asada' AS taco_type,
like(taco_type, '%Asada') AS ends_with_asada,
like(taco_type, 'Carne%') AS starts_with_carne,
like(taco_type, '%rne A%') AS contains_rne_a
Result:
| taco_type | ends_with_asada | starts_with_carne | contains_rne_a |
|-------------|-----------------|-------------------|----------------|
| Carne Asada | 1 | 1 | 1 |
In this example, we check if the taco type ‘Carne Asada’ matches various LIKE patterns.
To match against literal %
, _
, and \
characters, prepend them with a backslash: \%
, \_
, and \\
. Note that in SQL strings, backslashes must be escaped as well, so you would actually write \\%
, \\_
, and \\\\
.
notLike
Returns whether a string does not match a pattern. This function is the negation of the like
function.
Syntax:
notLike(haystack, pattern)
Alias:
haystack NOT LIKE pattern
(operator)
Arguments:
haystack
(String
): The string to search in.
pattern
(String
): The pattern to search for, which can contain the following wildcards:
%
— Matches any number of characters (including zero characters)
_
— Matches any single character
Returns:
1
if the string does not match the pattern.
0
if the string matches the pattern.
[UInt8
]
Example:
SELECT
notLike('Spicy Beef Taco', 'Spicy%') AS not_spicy,
notLike('Veggie Taco', 'Spicy%') AS not_spicy_veggie;
Result:
| not_spicy | not_spicy_veggie |
|-----------|------------------|
| 0 | 1 |
In this example:
'Spicy Beef Taco'
matches the pattern 'Spicy%'
, so notLike
returns 0
.
'Veggie Taco'
does not match the pattern 'Spicy%'
, so notLike
returns 1
.
This function is useful for filtering out rows that do not match a certain pattern.
The notLike
function is case-sensitive. For case-insensitive matching, use notILike
.
ilike
Performs a case-insensitive pattern matching operation.
Syntax:
Alias:
haystack ILIKE pattern
(operator)
Arguments:
haystack
(String
): The string to search in.
pattern
(String
): The LIKE pattern to match against.
Returns:
Returns 1 if the string matches the pattern, 0 otherwise. [UInt8
]
Example:
SELECT
taco_name,
ilike(taco_name, '%spicy%') AS is_spicy
FROM
taco_menu;
Result:
| taco_name | is_spicy |
|-------------------|----------|
| Spicy Beef Taco | 1 |
| Mild Chicken Taco | 0 |
| SPICY Pork Taco | 1 |
In this example, ilike
checks if each taco_name
contains the word “spicy” (case-insensitive). The function returns 1 for names containing “spicy” in any case, and 0 otherwise.
The ilike
function is similar to like
, but it ignores case when matching. It’s particularly useful when you need to perform case-insensitive string matching operations in your queries.
The ilike
function follows the same pattern matching rules as like
, where %
matches any number of characters and _
matches any single character.
notILike
Returns whether the string haystack
does not match the case-insensitive LIKE expression pattern
.
Syntax:
notILike(haystack, pattern)
Alias:
haystack NOT ILIKE pattern
(operator)
Arguments:
haystack
(String
): The string to search in.
pattern
(String
): The LIKE expression pattern to match against.
Returns:
1
if haystack
does not match pattern
, 0
otherwise. [UInt8
]
Example:
SELECT
notILike('Spicy Taco', '%taco%') AS result;
Result:
| result |
|--------|
| 0 |
In this example, notILike
returns 0
because ‘Spicy Taco’ does match the case-insensitive pattern ‘%taco%’.
- The
ILIKE
pattern matching is case-insensitive.
%
matches any number of characters (including zero characters).
_
matches any single character.
- To escape special characters (
%
, _
, \
), precede them with a backslash (\
).
- This function is the negation of the
ilike
function.
When working with literal backslashes in SQL strings, you may need to escape them twice: once for the SQL string literal, and once for the LIKE pattern. For example, to match a literal underscore, you might use '\\\_'
.
ngramDistance
Calculates the 4-gram distance between two strings. Returns a value between 0 and 1, where a smaller value indicates more similar strings.
Syntax
ngramDistance(haystack, needle)
Arguments
haystack
(String
) — First string for comparison.
needle
(String
) — Second string for comparison.
Returns:
- A value between 0 and 1 representing the similarity between the two strings. (
Float32
)
Details
The function counts the symmetric difference between two multisets of 4-grams and normalizes it by the sum of their cardinalities. A result closer to 0 indicates more similar strings.
This function will throw an exception if constant needle
or haystack
arguments are more than 32Kb in size. For non-constant arguments larger than 32Kb, the distance is always 1.
Example
SELECT
ngramDistance('Carne Asada Taco', 'Carnitas Taco');
Result:
| ngramDistance('Carne Asada Taco', 'Carnitas Taco') |
|----------------------------------------------------|
| 0.5555556 |
In this example, the 4-gram distance between “Carne Asada Taco” and “Carnitas Taco” is calculated. The result of 0.5555556 indicates a moderate level of similarity between the two taco names.
See Also
ngramDistanceCaseInsensitive
ngramDistanceUTF8
ngramDistanceCaseInsensitiveUTF8
ngramDistanceCaseInsensitive
Calculates the 4-gram distance between two strings, ignoring case. Returns a value between 0 and 1, where 0 indicates identical strings and 1 indicates completely different strings.
Syntax
ngramDistanceCaseInsensitive(haystack, needle)
Arguments
haystack
(String
): First string for comparison.
needle
(String
): Second string for comparison.
Returns
- The 4-gram distance between the strings, ignoring case. (
Float32
)
Example
SELECT
ngramDistanceCaseInsensitive('Spicy Beef Taco', 'spicy chicken taco');
Result:
| ngramDistanceCaseInsensitive('Spicy Beef Taco', 'spicy chicken taco') |
|-----------------------------------------------------------------------|
| 0.23529412 |
This example compares two taco descriptions, ignoring case. The result (0.23529412) indicates that the strings are quite similar, with the main difference being “Beef” vs “chicken”.
- The function is case-insensitive, so “Taco” and “taco” are treated as identical.
- If constant arguments are more than 32KB in size, an exception is thrown.
- For non-constant arguments larger than 32KB, the distance is always 1.
This function is useful for fuzzy string matching and comparing similar strings while ignoring differences in capitalization.
ngramDistanceUTF8
Calculates the 4-gram distance between two UTF-8 encoded strings.
Syntax:
ngramDistanceUTF8(haystack, needle)
Arguments:
haystack
(String
): First UTF-8 encoded string for comparison.
needle
(String
): Second UTF-8 encoded string for comparison.
Returns:
- A value between 0 and 1 representing the similarity between the two strings. The smaller the result, the more similar the strings are to each other. [
Float32
]
Example:
SELECT
ngramDistanceUTF8('Spicy Carnitas Taco', 'Crunchy Fish Taco');
Result:
| ngramDistanceUTF8('Spicy Carnitas Taco', 'Crunchy Fish Taco') |
|---------------------------------------------------------------|
| 0.6666667 |
This example calculates the 4-gram distance between two different taco names. The result of 0.6666667 indicates that the strings are somewhat different, but have some similarity.
Implementation details:
The function uses the 3-gram distance for UTF-8 strings. It’s not a perfectly fair n-gram distance. The function uses 2-byte hashes to hash n-grams and then calculates the symmetric difference between these hash tables – collisions may occur.
If constant haystack
or needle
arguments are more than 32KB in size, an exception will be thrown. For non-constant arguments exceeding 32KB, the distance is always 1.
This function is useful for fuzzy string matching and comparison, especially when dealing with UTF-8 encoded text such as taco menu items in different languages or with special characters.
ngramDistanceCaseInsensitiveUTF8
Calculates the 4-gram distance between two UTF-8 encoded strings, ignoring case. Returns a value between 0 and 1, where 0 indicates identical strings and 1 indicates completely different strings.
Syntax
ngramDistanceCaseInsensitiveUTF8(haystack, needle)
Arguments
haystack
(String
): First UTF-8 encoded string for comparison.
needle
(String
): Second UTF-8 encoded string for comparison.
Returns
- A value between 0 and 1 representing the similarity between the two strings. [
Float32
]
Example
SELECT
ngramDistanceCaseInsensitiveUTF8('Crunchy Taco', 'crunchy taco');
Result:
| ngramDistanceCaseInsensitiveUTF8('Crunchy Taco', 'crunchy taco') |
|------------------------------------------------------------------|
| 0 |
This example demonstrates that the function is case-insensitive, returning 0 for identical strings that differ only in case.
- This function uses the 3-gram distance for UTF-8 strings.
- The comparison is not a perfectly fair n-gram distance. It uses 2-byte hashes to hash n-grams and then calculates the symmetric difference between these hash tables, which may result in collisions.
- For case-insensitive UTF-8 comparison, it doesn’t use a fair
tolower
function. Instead, it zeroes the 5th bit (starting from zero) of each codepoint byte and the first bit of the zeroth byte if there are more than one byte. This works for Latin and most Cyrillic letters.
If you need to compare strings without considering UTF-8 encoding or case sensitivity, you can use the ngramDistance
function instead.
ngramSearch
Calculates the non-symmetric difference between a needle string and a haystack string. This function is useful for fuzzy string search.
Syntax
ngramSearch(haystack, needle)
Arguments
haystack
(String
): String in which the search is performed.
needle
(String
): String to be searched.
Returns
- A value between 0 and 1 representing the likelihood of the needle being in the haystack. The bigger the result, the more likely needle is in haystack. (
Float32
)
Example
SELECT
ngramSearch('Crunchy Taco Supreme', 'Supreme Taco');
Result:
| ngramSearch('Crunchy Taco Supreme', 'Supreme Taco') |
|-----------------------------------------------------|
| 0.6666667 |
In this example, we calculate the likelihood of finding ‘Supreme Taco’ in ‘Crunchy Taco Supreme’. The result of 0.6666667 indicates a high likelihood of a match.
The function calculates the number of n-grams from the needle minus the common number of n-grams normalized by the number of needle n-grams. It uses 4-grams by default.
ngramSearchCaseInsensitive
Calculates the non-symmetric difference between a needle string and a haystack string, ignoring case. Returns a value between 0 and 1, where a higher value indicates a higher likelihood of the needle being in the haystack.
This function is useful for fuzzy string searching.
Syntax
ngramSearchCaseInsensitive(haystack, needle)
Arguments
haystack
(String
): String in which to search.
needle
(String
): Substring to search for.
Returns:
- A value between 0 and 1. (
Float32
)
- Closer to 0: Strings are less similar.
- Closer to 1: Strings are more similar or likely to contain the needle.
Example
SELECT
ngramSearchCaseInsensitive('Crunchy Taco Supreme', 'taco');
Result:
| ngramSearchCaseInsensitive('Crunchy Taco Supreme', 'taco') |
|-----------------------------------------------------------|
| 1 |
This example shows that ‘taco’ is found in ‘Crunchy Taco Supreme’, ignoring the case difference.
- The function uses 4-grams for comparison.
- For UTF-8 strings, consider using
ngramSearchCaseInsensitiveUTF8
instead.
- If constant needle or haystack arguments are more than 32KB in size, an exception is thrown.
- For non-constant arguments larger than 32KB, the result is always 1.
See also: ngramSearch
, ngramSearchUTF8
, ngramSearchCaseInsensitiveUTF8
ngramSearchUTF8
Calculates the non-symmetric difference between a needle string and a haystack string, assuming both are UTF-8 encoded. This function is useful for fuzzy string searching.
Syntax
ngramSearchUTF8(haystack, needle)
Arguments
haystack
(String
): UTF-8 encoded string in which to search.
needle
(String
): UTF-8 encoded string to search for.
Returns
- A value between 0 and 1 representing the likelihood of the needle being in the haystack. The larger the result, the more likely the needle is present. (
Float32
)
Details
- This function uses 3-gram distances for UTF-8 strings.
- The implementation uses 2-byte hashes to hash n-grams and calculates the non-symmetric difference between these hash tables.
- Collisions may occur, which can affect the accuracy of the result.
Example
SELECT
ngramSearchUTF8('Spicy Jalapeño Taco', 'jalapeño');
Result:
| ngramSearchUTF8('Spicy Jalapeño Taco', 'jalapeño') |
|----------------------------------------------------|
| 0.75 |
In this example, the function returns a high value (0.75) indicating a strong likelihood that ‘jalapeño’ is present in the haystack string, despite the difference in case.
This function is more computationally intensive than exact string matching functions. Use it when fuzzy matching is required, such as for user input correction or similarity searches in text data.
ngramSearchCaseInsensitiveUTF8
Performs a case-insensitive n-gram search on UTF-8 encoded strings, calculating the likelihood of a needle being present in a haystack.
Syntax
ngramSearchCaseInsensitiveUTF8(haystack, needle)
Arguments
haystack
(String
): UTF-8 encoded string to search in.
needle
(String
): UTF-8 encoded substring to search for.
Returns
- A value between 0 and 1 representing the likelihood of the needle being in the haystack. (
Float32
)
The larger the result, the more likely the needle is present in the haystack.
Example
SELECT
ngramSearchCaseInsensitiveUTF8('Spicy Carnitas Tacos', 'TACO') AS search_result;
Result:
| search_result |
|---------------|
| 0.625 |
This example searches for ‘TACO’ in ‘Spicy Carnitas Tacos’, ignoring case. The result of 0.625 indicates a high likelihood of the word being present.
- This function uses 3-gram distances for UTF-8 strings.
- The case-insensitive comparison is not perfectly fair for all Unicode characters.
- For non-Latin scripts, the function may not provide accurate results in all cases.
This function is useful for fuzzy string searching in multilingual contexts where case sensitivity is not important.
countSubstrings
Returns the number of occurrences of a substring in a string.
Syntax
countSubstrings(haystack, needle[, start_pos])
Arguments
haystack
(String
): String in which to search.
needle
(String
): Substring to search for.
start_pos
(UInt
, optional): Position (1-based) in haystack
at which to start the search.
Returns
- The number of occurrences. (
UInt64
)
Examples
Count occurrences of “taco” in a string:
SELECT
countSubstrings('I love tacos. Tacos are great!', 'taco');
Result:
| countSubstrings('I love tacos. Tacos are great!', 'taco') |
|-----------------------------------------------------------|
| 2 |
Using the start_pos
argument:
SELECT
countSubstrings('Taco Tuesday: taco, taco, taco!', 'taco', 15);
Result:
| countSubstrings('Taco Tuesday: taco, taco, taco!', 'taco', 15) |
|-----------------------------------------------------------------|
| 3 |
countSubstringsCaseInsensitive
Returns how often a substring needle
occurs in a string haystack
, ignoring case.
Syntax
countSubstringsCaseInsensitive(haystack, needle[, start_pos])
Arguments
haystack
(String
): String in which the search is performed.
needle
(String
): Substring to be searched.
start_pos
(UInt
, optional): Position (1-based) in haystack
at which the search starts.
Returns
- The number of occurrences. (
UInt64
)
Examples
Query:
SELECT
countSubstringsCaseInsensitive('Crunchy Taco Supreme', 'taco');
Result:
| countSubstringsCaseInsensitive('Crunchy Taco Supreme', 'taco') |
|----------------------------------------------------------------|
| 1 |
Example with start_pos
argument:
Query:
SELECT
countSubstringsCaseInsensitive('taco_taco_burrito_taco', 'taco', 6);
Result:
| countSubstringsCaseInsensitive('taco_taco_burrito_taco', 'taco', 6) |
|----------------------------------------------------------------------|
| 2 |
In this example, the search starts from the 6th character, skipping the first occurrence of ‘taco’.
countSubstringsCaseInsensitiveUTF8
Returns how often a substring needle
occurs in a string haystack
. Ignores case and assumes that haystack
is a UTF-8 string.
Syntax
countSubstringsCaseInsensitiveUTF8(haystack, needle[, start_pos])
Arguments
haystack
(String
): UTF-8 string in which the search is performed.
needle
(String
): Substring to be searched.
start_pos
(UInt
, optional): Position (1-based) in haystack
at which the search starts.
Returns
- The number of occurrences. (
UInt64
)
Example
Query:
SELECT
countSubstringsCaseInsensitiveUTF8('Crunchy Taco, Soft Taco, Super Taco', 'taco');
Result:
| countSubstringsCaseInsensitiveUTF8('Crunchy Taco, Soft Taco, Super Taco', 'taco') |
|-----------------------------------------------------------------------------------|
| 3 |
Example with start_pos
argument:
Query:
SELECT
countSubstringsCaseInsensitiveUTF8('Crunchy Taco, Soft Taco, Super Taco', 'taco', 15);
Result:
| countSubstringsCaseInsensitiveUTF8('Crunchy Taco, Soft Taco, Super Taco', 'taco', 15) |
|---------------------------------------------------------------------------------------|
| 2 |
In this example, the function starts searching from the 15th character, which is after the first “Taco”, so it only counts the last two occurrences.
countMatches
Returns the number of regular expression matches for a pattern in a string.
Syntax:
countMatches(haystack, pattern)
Arguments:
haystack
(String
): The string to search in.
pattern
(String
): The regular expression with re2 syntax.
Returns:
- The number of matches. (
UInt64
)
Example:
SELECT
countMatches('Taco Tuesday: 2 tacos for $5, 3 tacos for $7', 'taco') AS match_count;
Result:
| match_count |
|-------------|
| 3 |
In this example, countMatches
finds three occurrences of the word “taco” (case-sensitive) in the given string about a Taco Tuesday special.
countMatchesCaseInsensitive
Returns the number of case-insensitive regular expression matches for a pattern in a string.
Syntax:
countMatchesCaseInsensitive(haystack, pattern)
Arguments:
haystack
(String
): The string to search in.
pattern
(String
): The regular expression with re2 syntax.
Returns:
- The number of matches. (
UInt64
)
Example:
SELECT
countMatchesCaseInsensitive('Crunchy Taco Supreme', 'taco');
Result:
| countMatchesCaseInsensitive('Crunchy Taco Supreme', 'taco') |
|-------------------------------------------------------------|
| 1 |
This function is similar to countMatches
, but it performs a case-insensitive search. It’s useful when you need to count occurrences of a pattern regardless of letter case, such as finding mentions of “taco” in menu item names.
Extracts the first string in a haystack that matches a regular expression pattern and corresponds to the specified regex group index.
Syntax:
regexpExtract(haystack, pattern[, index])
Alias:
- REGEXP_EXTRACT(haystack, pattern[, index])
Arguments:
haystack
(String
): String in which the regexp pattern will be matched.
pattern
(String
): Regular expression. Must be a constant string.
index
(UInt
or Int
, optional): An integer greater than or equal to 0, with a default value of 1. Represents which regex group to extract.
Returns:
- The extracted substring matching the specified group. (
String
)
- If
pattern
contains multiple regex groups, index
indicates which group to extract.
- An
index
of 0 means matching the entire regular expression.
Example:
SELECT
regexpExtract('Carne Asada Taco: $3.99, Fish Taco: $4.99', '(\w+)\s+Taco:\s+\$(\d+\.\d+)', 1) AS taco_type,
regexpExtract('Carne Asada Taco: $3.99, Fish Taco: $4.99', '(\w+)\s+Taco:\s+\$(\d+\.\d+)', 2) AS taco_price,
regexpExtract('Carne Asada Taco: $3.99, Fish Taco: $4.99', '(\w+)\s+Taco:\s+\$(\d+\.\d+)', 0) AS full_match;
Result:
| taco_type | taco_price | full_match |
|-------------|------------|-------------------------|
| Carne Asada | 3.99 | Carne Asada Taco: $3.99 |
In this example:
- The first
regexpExtract
call extracts the taco type (group 1).
- The second call extracts the price (group 2).
- The third call with index 0 returns the entire matched substring.
hasSubsequence
Returns whether a string contains a given subsequence.
Syntax
hasSubsequence(haystack, needle)
Arguments
haystack
(String
): String to search in.
needle
(String
): Subsequence to search for.
Returns
- 1 if
needle
is a subsequence of haystack
, 0 otherwise. [UInt8
]
A subsequence is a sequence that can be derived from the given string by deleting zero or more elements without changing the order of the remaining elements.
Example
SELECT
hasSubsequence('Taco Tuesday', 'aco');
Result:
| hasSubsequence('Taco Tuesday', 'aco') |
|---------------------------------------|
| 1 |
This example checks if ‘aco’ is a subsequence of ‘Taco Tuesday’. The result is 1 (true) because ‘aco’ can be derived from ‘Taco Tuesday’ by deleting some characters while maintaining the order of the remaining ones.
hasSubsequenceCaseInsensitive
Returns whether a string contains a given subsequence, ignoring case.
Syntax
hasSubsequenceCaseInsensitive(haystack, needle)
Arguments
haystack
(String
): The string to search in.
needle
(String
): The subsequence to search for.
Returns
1
if needle
is a subsequence of haystack
(ignoring case), 0
otherwise. [UInt8
]
Example
SELECT
hasSubsequenceCaseInsensitive('Crunchy Taco Supreme', 'taco');
Result:
| hasSubsequenceCaseInsensitive('Crunchy Taco Supreme', 'taco') |
|---------------------------------------------------------------|
| 1 |
This function is useful for case-insensitive substring matching, particularly when you need to check if a certain sequence of characters appears in a string, regardless of their case.
hasSubsequenceUTF8
Returns 1 if needle
is a subsequence of haystack
, or 0 otherwise. This function assumes haystack
and needle
are UTF-8 encoded strings. A subsequence is a sequence that can be derived from the given string by deleting zero or more elements without changing the order of the remaining elements.
Syntax
hasSubsequenceUTF8(haystack, needle)
Arguments
haystack
(String
): String in which the search is performed.
needle
(String
): Subsequence to be searched.
Returns
- 1, if
needle
is a subsequence of haystack
- 0, otherwise
Type: UInt8
Example
Query:
SELECT
hasSubsequenceUTF8('Taco Tuesday: Tasty Tacos!', 'Tasty Tacos');
Result:
| hasSubsequenceUTF8('Taco Tuesday: Tasty Tacos!', 'Tasty Tacos') |
|------------------------------------------------------------------|
| 1 |
This example checks if ‘Tasty Tacos’ is a subsequence of the given string, which it is, so the function returns 1.
hasSubsequenceCaseInsensitiveUTF8
Checks if a given substring is present as a case-insensitive subsequence in a UTF-8 encoded string.
Syntax
hasSubsequenceCaseInsensitiveUTF8(haystack, needle)
Arguments
haystack
(String
): The UTF-8 encoded string to search in.
needle
(String
): The UTF-8 encoded subsequence to search for.
Returns
1
if needle
is a case-insensitive subsequence of haystack
, 0
otherwise. [UInt8
]
Example
SELECT
hasSubsequenceCaseInsensitiveUTF8('Spicy Taco Tuesday', 'taco');
Result:
| hasSubsequenceCaseInsensitiveUTF8('Spicy Taco Tuesday', 'taco') |
|------------------------------------------------------------------|
| 1 |
This function is useful for case-insensitive substring matching in UTF-8 encoded text, such as finding menu items or ingredients in multilingual taco descriptions.
The function assumes both haystack
and needle
are valid UTF-8 encoded strings. If they are not, the behavior is undefined.
hasToken
Checks if a given token is present in a string.
Syntax
hasToken(haystack, token)
Arguments
haystack
(String
): String in which to search.
token
(String
): Token to search for. Must be a constant string.
Returns
- 1 if the token is present in the haystack.
- 0 if the token is not present.
Type: UInt8
Description
A token is defined as a maximal length substring between two non-alphanumeric ASCII characters (or the boundaries of the haystack string).
This function is optimized for performance and is supported by the tokenbf_v1
index specialization.
Example
SELECT
hasToken('Crunchy Taco Supreme', 'Taco');
Result:
| hasToken('Crunchy Taco Supreme', 'Taco') |
|------------------------------------------|
| 1 |
In this example, ‘Taco’ is found as a token within the ‘Crunchy Taco Supreme’ string, so the function returns 1.
hasTokenOrNull
Returns whether a given token is present in a string, with special handling for ill-formed tokens.
Syntax
hasTokenOrNull(haystack, token)
Arguments
haystack
(String
) — String in which to search.
token
(String
) — Token to search for. Must be a constant string.
Returns
1
if the token is present in the haystack.
0
if the token is not present in the haystack.
NULL
if the token is ill-formed.
[Nullable(UInt8)
]
Details
This function is similar to hasToken
, but returns NULL
instead of throwing an error when the token is ill-formed. A token is considered ill-formed if it contains non-alphanumeric ASCII characters.
The function is case-sensitive. For a case-insensitive version, use hasTokenCaseInsensitiveOrNull
.
This function is supported by the tokenbf_v1
index specialization.
Example
SELECT
hasTokenOrNull('Crunchy Taco Supreme', 'Taco');
Result:
| hasTokenOrNull('Crunchy Taco Supreme', 'Taco') |
|------------------------------------------------|
| 1 |
SELECT
hasTokenOrNull('Spicy Bean Burrito', 'Quesadilla');
Result:
| hasTokenOrNull('Spicy Bean Burrito', 'Quesadilla') |
|----------------------------------------------------|
| 0 |
SELECT
hasTokenOrNull('Nacho Cheese Doritos Locos Tacos', 'Doritos,Locos');
Result:
| hasTokenOrNull('Nacho Cheese Doritos Locos Tacos', 'Doritos,Locos') |
|---------------------------------------------------------------------|
| NULL |
In this last example, NULL
is returned because ‘Doritos,Locos’ is an ill-formed token due to the comma.
hasTokenCaseInsensitive
Returns 1 if a given token is present in a haystack, 0 otherwise. Ignores case.
Syntax:
hasTokenCaseInsensitive(haystack, token)
Arguments:
haystack
(String
): String in which the search is performed.
token
(String
): Maximal length substring between two non-alphanumeric ASCII characters (or boundaries of haystack).
Returns:
- 1 if the token is present in the haystack, 0 otherwise. (
UInt8
)
Implementation details:
Token must be a constant string. Supported by tokenbf_v1 index specialization.
Example:
SELECT
hasTokenCaseInsensitive('Crunchy Taco Supreme', 'taco');
Result:
| hasTokenCaseInsensitive('Crunchy Taco Supreme', 'taco')|
|--------------------------------------------------------|
| 1 |
In this example, the function returns 1 because ‘taco’ is present in the haystack ‘Crunchy Taco Supreme’, ignoring the case difference.
hasTokenCaseInsensitiveOrNull
Returns whether a given token is present in a string, ignoring case. Returns NULL
if the token is ill-formed.
Syntax
hasTokenCaseInsensitiveOrNull(haystack, token)
Arguments
haystack
(String
): String in which to search.
token
(String
): Token to search for. Must be a constant string.
Returns
1
if the token is present in the haystack (case-insensitive).
0
if the token is not present.
NULL
if the token is ill-formed.
Type: Nullable(UInt8)
- A token is defined as a maximal length substring between two non-alphanumeric ASCII characters (or the boundaries of the haystack).
- This function is supported by the
tokenbf_v1
index specialization.
Example
SELECT
hasTokenCaseInsensitiveOrNull('Crunchy Taco Supreme', 'taco');
Result:
| hasTokenCaseInsensitiveOrNull('Crunchy Taco Supreme', 'taco') |
|---------------------------------------------------------------|
| 1 |
SELECT
hasTokenCaseInsensitiveOrNull('Taco Bell Menu', 'burrito,taco');
Result:
| hasTokenCaseInsensitiveOrNull('Taco Bell Menu', 'burrito,taco') |
|------------------------------------------------------------------|
| NULL |
In this example, the second query returns NULL
because ‘burrito,taco’ is not a valid token (it contains a comma).