ClickHouse function reference
position
Returns the position (in bytes, starting from 1) of a substring in a string. Syntaxposition(needle IN haystack)
haystack
(String
): String to search in.needle
(String
): Substring to search for.start_pos
(UInt
, optional): Position (1-based) inhaystack
to start the search.
- Starting position in bytes (counting from 1) if the substring was found. (
UInt64
) - 0 if the substring was not found. (
UInt64
)
needle
is empty, these rules apply:
- If no
start_pos
specified: return 1 - If
start_pos = 0
: return 1 - If
start_pos >= 1
andstart_pos <= length(haystack) + 1
: returnstart_pos
- Otherwise: return 0
- ‘taco’ is found at position 13
- ‘burrito’ is not found, so 0 is returned
- Starting from position 10, the next ‘a’ is found at position 15
locate
Returns the position of a substring within a string. Syntax:needle
(String
): Substring to search for.haystack
(String
): String to search in.start_pos
(UInt
, optional): Position to start the search from (1-based).
- The position (in bytes) of the first occurrence of
needle
inhaystack
, starting from 1. - 0 if the substring is not found.
- If
needle
is empty, the function follows these rules:- If no
start_pos
is specified: returns 1 - If
start_pos = 0
: returns 1 - If
start_pos >= 1
andstart_pos <= length(haystack) + 1
: returnsstart_pos
- Otherwise: returns 0
- If no
- ‘salsa’ is found at position 7 in the string.
- ‘guac’ is not found, so it returns 0.
- ‘a’ is found at position 13 when starting the search from position 8.
The
locate
function is case-sensitive. For case-insensitive searches, consider using locateCaseInsensitive
or locateCaseInsensitiveUTF8
functions.positionCaseInsensitive
Returns the position (in bytes, starting from 1) of a substring in a string, ignoring case. Syntaxhaystack
(String
): String to search in.needle
(String
): Substring to search for.start_pos
(UInt
, optional): Position to start the search from (1-based).
- The position (in bytes) of the found substring, starting from 1.
- 0 if the substring was not found.
The function follows the same rules for empty substrings as the
position
function. If the needle
is an empty string, these rules apply:- If no
start_pos
was specified: return 1 - If
start_pos = 0
: return 1 - If
start_pos >= 1
andstart_pos <= length(haystack) + 1
: returnstart_pos
- Otherwise: return 0
positionUTF8
Returns the position (in Unicode code points, starting from 1) of a substring in a string, assuming both the substring and the string are UTF-8 encoded. Syntaxhaystack
(String
): The string to search in.needle
(String
): The substring to search for.start_pos
(UInt
, optional): The position (1-based) inhaystack
at which the search starts.
- The starting position in Unicode code points (1-based) if the substring was found.
- 0 if the substring was not found.
positionUTF8
correctly counts the Unicode character ‘ñ’ as a single code point, returning its position as 6.
If the
needle
substring is empty, these rules apply:- If no
start_pos
was specified: return 1 - If
start_pos = 0
: return 1 - If
start_pos >= 1
andstart_pos <= length(haystack) + 1
: returnstart_pos
- Otherwise: return 0
positionCaseInsensitiveUTF8
Returns the position (in Unicode code points) of a substring in a string, ignoring case and assuming UTF-8 encoding. Syntaxhaystack
(String
): The string to search in.needle
(String
): The substring to search for.start_pos
(UInt
, optional): The position to start searching from (1-based).
- The position (in Unicode code points) of the first occurrence of
needle
inhaystack
starting fromstart_pos
, if found. - 0, if
needle
is not found.
UInt64
Example
- The function is case-insensitive, so ‘A’ and ‘a’ are considered the same.
- The function counts positions in Unicode code points, not bytes.
- If
needle
is empty, the function returns 1. - If
start_pos
is 0, it’s treated as 1. - If
start_pos
is greater than the length ofhaystack
, the function returns 0.
multiSearchAllPositions
Returns an array of positions (in bytes, starting from 1) of substrings in a string. Syntaxhaystack
(String
): String to search in.needle
(Array(String)
): Array of substrings to search for.
- Array of positions.
- For each needle:
- If found, the position (in bytes from 1) of the first occurrence.
- If not found, 0.
Array(UInt64)
Example
Query:
- ‘taco’ is found at position 13
- ‘salsa’ and ‘guacamole’ are not found, so their positions are 0
The function supports up to 2^8 needles.
multiSearchAllPositionsCaseInsensitive
Returns an array of positions (in bytes, starting from 1) of all occurrences of the needle substrings in the haystack string, ignoring case. Syntaxhaystack
(String
): String in which to search.needle
(Array(String)
): Array of substrings to search for.
- Array of the same size as the
needle
array, where each element is:- The position (in bytes) of the corresponding needle in haystack, if found.
- 0 if the substring was not found.
Array(UInt64)
]
Example
- ‘THE’ is found at position 1 (case-insensitive match with ‘The’)
- ‘dog’ is found at position 41
- ‘quick’ is found at position 5
- ‘TACO’ is not found, so its position is 0
The function supports up to 2^8 (256) needles.
multiSearchAllPositionsUTF8
Searches for multiple substrings in a UTF-8 encoded string and returns an array of positions. Syntaxhaystack
(String
): UTF-8 encoded string in which to search.needle
(Array(String)
): Array of UTF-8 encoded substrings to search for.
- An array of the same size as the
needle
array, where each element is:- The position (in bytes, starting from 1) of the corresponding
needle
if found. - 0 if the substring was not found.
- The position (in bytes, starting from 1) of the corresponding
Array(UInt64)
]
Example
- ‘jalapeño’ is found at position 7
- ‘habanero’ is found at position 21
- ‘salsa’ is not found, so its position is 0
- This function is UTF-8 aware, so it correctly handles multi-byte characters.
- The function supports up to 2^8 needles.
- If any non-constant
haystack
orneedle
arguments are more than 32KB in size, the function will throw an exception.
multiSearchAllPositionsCaseInsensitiveUTF8
Returns an array of positions (in bytes, starting from 1) of all occurrences of multiple needle substrings in a haystack string. The search is case-insensitive and assumes UTF-8 encoding. Syntaxhaystack
(String
): UTF-8 encoded string in which the search is performed.needle
(Array(String)
): Array of UTF-8 encoded substrings to be searched.
- Array of arrays, where each inner array contains:
- The starting positions (in bytes, counting from 1) of the corresponding needle if found.
- 0 if the needle was not found.
Array(Array(UInt64))
Example
- ‘taco’ is found at position 7 (case-insensitive)
- ‘SPICY’ is found at position 1 (case-insensitive)
- ‘wednesday’ is not found, so 0 is returned
- This function supports up to 2^8 needles.
- The function is optimized for searching multiple substrings in a single pass through the haystack.
- It’s particularly useful for tasks like keyword matching or filtering based on multiple criteria.
multiSearchFirstPosition
Returns the position of the leftmost occurrence of any of the substrings in the haystack string. Syntaxhaystack
(String
): String in which to search.needle
(Array(String)
): Array of substrings to search for.
- The position (in bytes) of the leftmost occurrence of any needle substring in the haystack string.
- 0 if none of the needles are found.
UInt64
Example
This function is case-sensitive. For case-insensitive searches, use
multiSearchFirstPositionCaseInsensitive
.multiSearchFirstPositionCaseInsensitive
Returns the position of the first occurrence of any of the specified substrings in the given string, ignoring case. The search is performed case-insensitively. Syntaxhaystack
(String
): The string to search in.needle
(Array(String)
): Array of substrings to search for.
- The position (in bytes) of the first occurrence of any matched substring (1-based).
- 0 if none of the substrings are found.
UInt64
Example
This function is useful for efficiently finding the first occurrence of multiple substrings in a given text, without being sensitive to letter case.
multiSearchFirstPositionUTF8
Returns the position of the first occurrence of any of the substrings in the string, assuming UTF-8 encoding. Syntaxhaystack
(String
): The UTF-8 encoded string to search in.needle
(Array(String)
): An array of UTF-8 encoded substrings to search for.
- The position (in bytes) of the leftmost match of any of the needles within the haystack.
- 0 if none of the needles are found.
UInt64
Example
- This function is case-sensitive.
- If you need case-insensitive search, use
multiSearchFirstPositionCaseInsensitiveUTF8
instead. - The function returns the position in bytes, not in characters, which is important for UTF-8 strings where characters can occupy multiple bytes.
- The function supports up to 2^8 needles in the array.
multiSearchFirstPositionCaseInsensitiveUTF8
Returns the position of the first occurrence of any of the substrings in the string, ignoring case and assuming UTF-8 encoding. Syntaxhaystack
(String
): The UTF-8 encoded string to search in.needle
(Array(String)
): Array of UTF-8 encoded substrings to search for.
- The position (in bytes) of the leftmost match of any of the needles.
- 0 if none of the needles are found.
UInt64
Example
- The function is case-insensitive, so ‘TACO’ matches ‘Tacos’.
- The search is performed on UTF-8 encoded strings, so it correctly handles multi-byte characters.
- If none of the needles are found, the function returns 0.
- The function supports up to 2^8 needles in the array.
multiSearchFirstIndex
Returns the index of the first matching needle string found in the haystack string. Syntaxhaystack
(String
): String to search in.needle
(Array(String)
): Array of strings to search for.
- The index (1-based) of the first matching needle string.
- 0 if none of the needle strings are found.
UInt8
Example
- The search is case-sensitive.
- If you need case-insensitive search, use
multiSearchFirstIndexCaseInsensitive
instead. - For UTF-8 encoded strings, use
multiSearchFirstIndexUTF8
. - The function returns as soon as it finds the first match, which can be more efficient than searching for all occurrences.
multiSearchFirstIndexCaseInsensitive
Returns the index of the first matched needle in the haystack string, ignoring case. The index is 1-based, meaning the first position is 1, not 0. Syntaxhaystack
(String
): The string to search in.needle
(Array(String)
): Array of substrings to search for.
- The index of the first matched needle (1-based).
- 0 if none of the needles are found.
UInt8
Example
multiSearchFirstIndexUTF8
Returns the index of the first found needle in the haystack string, assuming both are UTF-8 encoded. The index is 1-based, meaning the first position is 1, not 0. Syntaxhaystack
(String
): UTF-8 encoded string in which to search.needle
(Array(String)
): Array of UTF-8 encoded substrings to search for.
- The index (starting from 1) of the leftmost found needle.
- 0 if none of the needles are found.
UInt8
Example
This function is case-sensitive. For a case-insensitive version, use
multiSearchFirstIndexCaseInsensitiveUTF8
.multiSearchFirstIndexCaseInsensitiveUTF8
Returns the index of the first found needle from the array in the haystack string, ignoring case and assuming UTF-8 encoding. Syntaxhaystack
(String
): UTF-8 encoded string in which to search.needle
(Array(String)
): Array of UTF-8 encoded strings to search for.
- Index (starting from 1) of the leftmost found needle.
- 0 if none of the needles are found.
UInt8
Example
Query:
This function is useful for efficiently finding the first occurrence of multiple strings within a larger text, while ignoring case differences and properly handling UTF-8 encoded strings.
multiSearchAny
Returns 1 if at least one of the strings in theneedles
array matches the haystack
string, and 0 otherwise.
Syntax
haystack
(String
): String in which to search.needles
(Array(String)
): Array of substrings to search for.
- 1 if there was at least one match.
- 0 if there was no match.
UInt8
Example
This function is case-sensitive. For case-insensitive searches, use
multiSearchAnyCaseInsensitive
. For UTF-8 encoded strings, use multiSearchAnyUTF8
or multiSearchAnyCaseInsensitiveUTF8
.multiSearchAnyCaseInsensitive
Returns 1 if at least one of the strings in theneedles
array matches the haystack
string case-insensitively, and 0 otherwise.
Syntax
haystack
(String
): String in which to search.needles
(Array(String)
): Array of strings to search for.
- 1 if there was at least one case-insensitive match.
- 0 if there was no match.
UInt8
Example
Query:
This function is useful for efficient case-insensitive searches of multiple substrings within a larger string. It’s particularly handy when dealing with user input or when case sensitivity is not important in your search criteria.
multiSearchAnyUTF8
Checks if any of the substrings from theneedles
array are present in the haystack
string. Assumes both haystack
and needles
are UTF-8 encoded strings.
Syntax
haystack
(String
): UTF-8 encoded string to search in.needles
(Array(String)
): Array of UTF-8 encoded substrings to search for.
- 1 if at least one match is found.
- 0 if no matches are found.
UInt8
Example
Query:
This function is case-sensitive. For a case-insensitive version, use
multiSearchAnyCaseInsensitiveUTF8
.multiSearchAnyCaseInsensitiveUTF8
Checks if any of the UTF-8 encoded substrings are present in the given UTF-8 encoded string, ignoring case. Syntaxhaystack
(String
): UTF-8 encoded string in which to search.needle
(Array(String)
): Array of UTF-8 encoded substrings to search for.
- 1 if any of the substrings are found (ignoring case).
- 0 if none of the substrings are found.
UInt8
Example
Query:
This function is case-insensitive and works with UTF-8 encoded strings. It’s useful for checking the presence of multiple substrings in a given text, regardless of their case or Unicode representation.
match
Checks whether a string matches a regular expression pattern. SyntaxREGEXP
haystack
(String
): The string to search in.pattern
(String
): The regular expression pattern with re2 syntax.
1
if the string matches the pattern.0
if the string does not match the pattern.
UInt8
]
- The function uses UTF-8 encoding. The regular expression must not contain null bytes.
- Unlike re2’s default behavior,
.
matches line breaks. To disable this, prepend the pattern with(?-s)
. - For simple substring searches, consider using
like
orposition
functions for better performance.
REGEXP
operator is an alias for match
:
multiMatchAllIndices
Returns an array of all indices that match the haystack string using multiple regular expression patterns. Syntaxhaystack
(String
): String to search in.pattern1, pattern2, ..., patternN
(Array(String)
): Regular expressions with re2 syntax.
- Array of indices (1-based) of patterns that match the haystack, in the order they appear in the haystack. (
Array(UInt64)
)
- Index 1 corresponds to ‘quick’
- Index 2 corresponds to ‘brown’
- Index 8 corresponds to ‘lazy’
- ‘taco’ is not found, so it’s not included in the result
extract
Extracts a fragment of a string using a regular expression. If the haystack does not match the pattern, an empty string is returned. For regex without subpatterns, the function uses the fragment that matches the entire regex. Otherwise, it uses the fragment that matches the first subpattern. Syntax:haystack
(String
): String in which the search is performed.pattern
(String
): Regular expression with re2 syntax.
- The matched fragment, or an empty string if no match is found. [
String
]
extract
finds the price pattern in the taco description string.
If you only want to search for substrings in a string, consider using functions like
position
or like
instead, as they are generally faster than regular expression-based functions.extractAll
Extracts all fragments of a string that match a regular expression pattern. Syntax:haystack
(String
) - The string to search in.pattern
(String
) - The regular expression pattern to match.
Array(String)
]
Example:
extractAll
finds all words ending in “dillas” or “dilla” in the given string about Mexican food.
The regular expression pattern uses
\w+
to match one or more word characters, followed by “dillas” with an optional “s” at the end. The backslashes are escaped in the SQL string.extractAllGroupsHorizontal
Matches all groups of the haystack string using the pattern regular expression. Returns an array of arrays, where the first array includes all fragments matching the first group, the second array - matching the second group, etc. Syntax:haystack
(String
) — Input string.pattern
(String
) — Regular expression with re2 syntax. Must contain groups, each group enclosed in parentheses. If pattern contains no groups, an exception is thrown.
- Array of arrays of matches. (
Array
)
If
haystack
does not match the pattern regex, an array of empty arrays is returned.- The first array contains all matched taco names:
['Carne asada','Al pastor','Chorizo']
- The second array contains all matched prices:
['10','12','11']
extractAllGroupsVertical
Matches all groups of the haystack string using the pattern regular expression. Returns an array of arrays, where each array includes matching fragments from every group. Fragments are grouped in order of appearance in the haystack. Syntax:haystack
(String
): Input string.pattern
(String
): Regular expression with re2 syntax. Must contain groups, each group enclosed in parentheses. If pattern contains no groups, an exception is thrown.
Array
)
If haystack does not match the pattern regex, an empty array is returned.
- The function extracts all price-item pairs from the taco special menu.
- Each inner array contains the price (first group) and the item name (second group).
- The results are grouped vertically, meaning each inner array represents a complete match of both groups.
like
Checks if a string matches a pattern. Syntaxhaystack LIKE pattern
(operator)
haystack
(String
): The string to search in.pattern
(String
): The LIKE pattern to match against.
1
if the string matches the pattern,0
otherwise. [UInt8
]
- The
LIKE
pattern can contain the following special characters:%
matches any number of characters (including zero characters)_
matches any single character\
escapes special characters (%
,_
, and\
)
- Matching is based on UTF-8 encoding
- For patterns of the form
%needle%
, the function is as fast as theposition
function - Other patterns are internally converted to regular expressions
To match against literal
%
, _
, and \
characters, prepend them with a backslash: \%
, \_
, and \\
. Note that in SQL strings, backslashes must be escaped as well, so you would actually write \\%
, \\_
, and \\\\
.notLike
Returns whether a string does not match a pattern. This function is the negation of thelike
function.
Syntax:
haystack NOT LIKE pattern
(operator)
haystack
(String
): The string to search in.pattern
(String
): The pattern to search for, which can contain the following wildcards:%
— Matches any number of characters (including zero characters)_
— Matches any single character
1
if the string does not match the pattern.0
if the string matches the pattern.
UInt8
]
Example:
'Spicy Beef Taco'
matches the pattern'Spicy%'
, sonotLike
returns0
.'Veggie Taco'
does not match the pattern'Spicy%'
, sonotLike
returns1
.
The
notLike
function is case-sensitive. For case-insensitive matching, use notILike
.ilike
Performs a case-insensitive pattern matching operation. Syntax:haystack ILIKE pattern
(operator)
haystack
(String
): The string to search in.pattern
(String
): The LIKE pattern to match against.
UInt8
]
Example:
ilike
checks if each taco_name
contains the word “spicy” (case-insensitive). The function returns 1 for names containing “spicy” in any case, and 0 otherwise.
The ilike
function is similar to like
, but it ignores case when matching. It’s particularly useful when you need to perform case-insensitive string matching operations in your queries.
The
ilike
function follows the same pattern matching rules as like
, where %
matches any number of characters and _
matches any single character.notILike
Returns whether the stringhaystack
does not match the case-insensitive LIKE expression pattern
.
Syntax:
haystack NOT ILIKE pattern
(operator)
haystack
(String
): The string to search in.pattern
(String
): The LIKE expression pattern to match against.
1
ifhaystack
does not matchpattern
,0
otherwise. [UInt8
]
notILike
returns 0
because ‘Spicy Taco’ does match the case-insensitive pattern ‘%taco%’.
- The
ILIKE
pattern matching is case-insensitive. %
matches any number of characters (including zero characters)._
matches any single character.- To escape special characters (
%
,_
,\
), precede them with a backslash (\
). - This function is the negation of the
ilike
function.
When working with literal backslashes in SQL strings, you may need to escape them twice: once for the SQL string literal, and once for the LIKE pattern. For example, to match a literal underscore, you might use
'\\\_'
.ngramDistance
Calculates the 4-gram distance between two strings. Returns a value between 0 and 1, where a smaller value indicates more similar strings. Syntaxhaystack
(String
) — First string for comparison.needle
(String
) — Second string for comparison.
- A value between 0 and 1 representing the similarity between the two strings. (
Float32
)
This function will throw an exception if constant
needle
or haystack
arguments are more than 32Kb in size. For non-constant arguments larger than 32Kb, the distance is always 1.ngramDistanceCaseInsensitive
ngramDistanceUTF8
ngramDistanceCaseInsensitiveUTF8
ngramDistanceCaseInsensitive
Calculates the 4-gram distance between two strings, ignoring case. Returns a value between 0 and 1, where 0 indicates identical strings and 1 indicates completely different strings. Syntaxhaystack
(String
): First string for comparison.needle
(String
): Second string for comparison.
- The 4-gram distance between the strings, ignoring case. (
Float32
)
- The function is case-insensitive, so “Taco” and “taco” are treated as identical.
- If constant arguments are more than 32KB in size, an exception is thrown.
- For non-constant arguments larger than 32KB, the distance is always 1.
ngramDistanceUTF8
Calculates the 4-gram distance between two UTF-8 encoded strings. Syntax:haystack
(String
): First UTF-8 encoded string for comparison.needle
(String
): Second UTF-8 encoded string for comparison.
- A value between 0 and 1 representing the similarity between the two strings. The smaller the result, the more similar the strings are to each other. [
Float32
]
If constant
haystack
or needle
arguments are more than 32KB in size, an exception will be thrown. For non-constant arguments exceeding 32KB, the distance is always 1.ngramDistanceCaseInsensitiveUTF8
Calculates the 4-gram distance between two UTF-8 encoded strings, ignoring case. Returns a value between 0 and 1, where 0 indicates identical strings and 1 indicates completely different strings. Syntaxhaystack
(String
): First UTF-8 encoded string for comparison.needle
(String
): Second UTF-8 encoded string for comparison.
- A value between 0 and 1 representing the similarity between the two strings. [
Float32
]
- This function uses the 3-gram distance for UTF-8 strings.
- The comparison is not a perfectly fair n-gram distance. It uses 2-byte hashes to hash n-grams and then calculates the symmetric difference between these hash tables, which may result in collisions.
- For case-insensitive UTF-8 comparison, it doesn’t use a fair
tolower
function. Instead, it zeroes the 5th bit (starting from zero) of each codepoint byte and the first bit of the zeroth byte if there are more than one byte. This works for Latin and most Cyrillic letters.
ngramDistance
function instead.
ngramSearch
Calculates the non-symmetric difference between a needle string and a haystack string. This function is useful for fuzzy string search. Syntaxhaystack
(String
): String in which the search is performed.needle
(String
): String to be searched.
- A value between 0 and 1 representing the likelihood of the needle being in the haystack. The bigger the result, the more likely needle is in haystack. (
Float32
)
The function calculates the number of n-grams from the needle minus the common number of n-grams normalized by the number of needle n-grams. It uses 4-grams by default.
This function is part of a family of ngram functions. Case-insensitive and UTF-8 variants are available:
ngramSearchCaseInsensitive
, ngramSearchUTF8
, and ngramSearchCaseInsensitiveUTF8
.ngramSearchCaseInsensitive
Calculates the non-symmetric difference between a needle string and a haystack string, ignoring case. Returns a value between 0 and 1, where a higher value indicates a higher likelihood of the needle being in the haystack. This function is useful for fuzzy string searching. Syntaxhaystack
(String
): String in which to search.needle
(String
): Substring to search for.
- A value between 0 and 1. (
Float32
)- Closer to 0: Strings are less similar.
- Closer to 1: Strings are more similar or likely to contain the needle.
- The function uses 4-grams for comparison.
- For UTF-8 strings, consider using
ngramSearchCaseInsensitiveUTF8
instead. - If constant needle or haystack arguments are more than 32KB in size, an exception is thrown.
- For non-constant arguments larger than 32KB, the result is always 1.
ngramSearch
, ngramSearchUTF8
, ngramSearchCaseInsensitiveUTF8
ngramSearchUTF8
Calculates the non-symmetric difference between a needle string and a haystack string, assuming both are UTF-8 encoded. This function is useful for fuzzy string searching. Syntaxhaystack
(String
): UTF-8 encoded string in which to search.needle
(String
): UTF-8 encoded string to search for.
- A value between 0 and 1 representing the likelihood of the needle being in the haystack. The larger the result, the more likely the needle is present. (
Float32
)
- This function uses 3-gram distances for UTF-8 strings.
- The implementation uses 2-byte hashes to hash n-grams and calculates the non-symmetric difference between these hash tables.
- Collisions may occur, which can affect the accuracy of the result.
This function is more computationally intensive than exact string matching functions. Use it when fuzzy matching is required, such as for user input correction or similarity searches in text data.
ngramSearchCaseInsensitiveUTF8
Performs a case-insensitive n-gram search on UTF-8 encoded strings, calculating the likelihood of a needle being present in a haystack. Syntaxhaystack
(String
): UTF-8 encoded string to search in.needle
(String
): UTF-8 encoded substring to search for.
- A value between 0 and 1 representing the likelihood of the needle being in the haystack. (
Float32
)
- This function uses 3-gram distances for UTF-8 strings.
- The case-insensitive comparison is not perfectly fair for all Unicode characters.
- For non-Latin scripts, the function may not provide accurate results in all cases.
countSubstrings
Returns the number of occurrences of a substring in a string. Syntaxhaystack
(String
): String in which to search.needle
(String
): Substring to search for.start_pos
(UInt
, optional): Position (1-based) inhaystack
at which to start the search.
- The number of occurrences. (
UInt64
)
start_pos
argument:
This function is case-sensitive. For case-insensitive counting, use
countSubstringsCaseInsensitive
.countSubstringsCaseInsensitive
Returns how often a substringneedle
occurs in a string haystack
, ignoring case.
Syntax
haystack
(String
): String in which the search is performed.needle
(String
): Substring to be searched.start_pos
(UInt
, optional): Position (1-based) inhaystack
at which the search starts.
- The number of occurrences. (
UInt64
)
start_pos
argument:
Query:
countSubstringsCaseInsensitiveUTF8
Returns how often a substringneedle
occurs in a string haystack
. Ignores case and assumes that haystack
is a UTF-8 string.
Syntax
haystack
(String
): UTF-8 string in which the search is performed.needle
(String
): Substring to be searched.start_pos
(UInt
, optional): Position (1-based) inhaystack
at which the search starts.
- The number of occurrences. (
UInt64
)
start_pos
argument:
Query:
countMatches
Returns the number of regular expression matches for a pattern in a string. Syntax:haystack
(String
): The string to search in.pattern
(String
): The regular expression with re2 syntax.
- The number of matches. (
UInt64
)
countMatches
finds three occurrences of the word “taco” (case-sensitive) in the given string about a Taco Tuesday special.
The function is case-sensitive. For case-insensitive matching, use
countMatchesCaseInsensitive
.countMatchesCaseInsensitive
Returns the number of case-insensitive regular expression matches for a pattern in a string. Syntax:haystack
(String
): The string to search in.pattern
(String
): The regular expression with re2 syntax.
- The number of matches. (
UInt64
)
countMatches
, but it performs a case-insensitive search. It’s useful when you need to count occurrences of a pattern regardless of letter case, such as finding mentions of “taco” in menu item names.
regexpExtract
Extracts the first string in a haystack that matches a regular expression pattern and corresponds to the specified regex group index. Syntax:- REGEXP_EXTRACT(haystack, pattern[, index])
haystack
(String
): String in which the regexp pattern will be matched.pattern
(String
): Regular expression. Must be a constant string.index
(UInt
orInt
, optional): An integer greater than or equal to 0, with a default value of 1. Represents which regex group to extract.
- The extracted substring matching the specified group. (
String
)
- If
pattern
contains multiple regex groups,index
indicates which group to extract. - An
index
of 0 means matching the entire regular expression.
- The first
regexpExtract
call extracts the taco type (group 1). - The second call extracts the price (group 2).
- The third call with index 0 returns the entire matched substring.
hasSubsequence
Returns whether a string contains a given subsequence. Syntaxhaystack
(String
): String to search in.needle
(String
): Subsequence to search for.
- 1 if
needle
is a subsequence ofhaystack
, 0 otherwise. [UInt8
]
This function is case-sensitive. For case-insensitive matching, use
hasSubsequenceCaseInsensitive
.hasSubsequenceCaseInsensitive
Returns whether a string contains a given subsequence, ignoring case. Syntaxhaystack
(String
): The string to search in.needle
(String
): The subsequence to search for.
1
ifneedle
is a subsequence ofhaystack
(ignoring case),0
otherwise. [UInt8
]
hasSubsequenceUTF8
Returns 1 ifneedle
is a subsequence of haystack
, or 0 otherwise. This function assumes haystack
and needle
are UTF-8 encoded strings. A subsequence is a sequence that can be derived from the given string by deleting zero or more elements without changing the order of the remaining elements.
Syntax
haystack
(String
): String in which the search is performed.needle
(String
): Subsequence to be searched.
- 1, if
needle
is a subsequence ofhaystack
- 0, otherwise
UInt8
Example
Query:
hasSubsequenceCaseInsensitiveUTF8
Checks if a given substring is present as a case-insensitive subsequence in a UTF-8 encoded string. Syntaxhaystack
(String
): The UTF-8 encoded string to search in.needle
(String
): The UTF-8 encoded subsequence to search for.
1
ifneedle
is a case-insensitive subsequence ofhaystack
,0
otherwise. [UInt8
]
The function assumes both
haystack
and needle
are valid UTF-8 encoded strings. If they are not, the behavior is undefined.hasToken
Checks if a given token is present in a string. Syntaxhaystack
(String
): String in which to search.token
(String
): Token to search for. Must be a constant string.
- 1 if the token is present in the haystack.
- 0 if the token is not present.
UInt8
Description
A token is defined as a maximal length substring between two non-alphanumeric ASCII characters (or the boundaries of the haystack string).
This function is optimized for performance and is supported by the tokenbf_v1
index specialization.
Example
The function is case-sensitive. Use
hasTokenCaseInsensitive
for case-insensitive matching.hasTokenOrNull
Returns whether a given token is present in a string, with special handling for ill-formed tokens. Syntaxhaystack
(String
) — String in which to search.token
(String
) — Token to search for. Must be a constant string.
1
if the token is present in the haystack.0
if the token is not present in the haystack.NULL
if the token is ill-formed.
Nullable(UInt8)
]
Details
This function is similar to hasToken
, but returns NULL
instead of throwing an error when the token is ill-formed. A token is considered ill-formed if it contains non-alphanumeric ASCII characters.
The function is case-sensitive. For a case-insensitive version, use hasTokenCaseInsensitiveOrNull
.
This function is supported by the tokenbf_v1
index specialization.
Example
NULL
is returned because ‘Doritos,Locos’ is an ill-formed token due to the comma.
hasTokenCaseInsensitive
Returns 1 if a given token is present in a haystack, 0 otherwise. Ignores case. Syntax:haystack
(String
): String in which the search is performed.token
(String
): Maximal length substring between two non-alphanumeric ASCII characters (or boundaries of haystack).
- 1 if the token is present in the haystack, 0 otherwise. (
UInt8
)
hasTokenCaseInsensitiveOrNull
Returns whether a given token is present in a string, ignoring case. ReturnsNULL
if the token is ill-formed.
Syntax
haystack
(String
): String in which to search.token
(String
): Token to search for. Must be a constant string.
1
if the token is present in the haystack (case-insensitive).0
if the token is not present.NULL
if the token is ill-formed.
Nullable(UInt8)
- A token is defined as a maximal length substring between two non-alphanumeric ASCII characters (or the boundaries of the haystack).
- This function is supported by the
tokenbf_v1
index specialization.
NULL
because ‘burrito,taco’ is not a valid token (it contains a comma).