8.9. String Functions and Operators
String Operators
The ||
operator performs concatenation.
String Functions
Note
These functions assume that the input strings contain valid UTF-8 encoded
Unicode code points. There are no explicit checks for valid UTF-8 and
the functions may return incorrect results on invalid UTF-8.
Invalid UTF-8 data can be corrected with from_utf8()
.
Additionally, the functions operate on Unicode code points and not user visible characters (or grapheme clusters). Some languages combine multiple code points into a single user-perceived character, the basic unit of a writing system for a language, but the functions will treat each code point as a separate unit.
The lower()
and upper()
functions do not perform
locale-sensitive, context-sensitive, or one-to-many mappings required for
some languages. Specifically, this will return incorrect results for
Lithuanian, Turkish and Azeri.
-
chr
(n) → varchar Returns the Unicode code point
n
as a single character string.
-
codepoint
(string) → integer Returns the Unicode code point of the only character of
string
.
-
concat
(string1, ..., stringN) → varchar Returns the concatenation of
string1
,string2
,...
,stringN
. This function provides the same functionality as the SQL-standard concatenation operator (||
).
-
hamming_distance
(string1, string2) → bigint Returns the Hamming distance of
string1
andstring2
, i.e. the number of positions at which the corresponding characters are different. Note that the two strings must have the same length.
-
length
(string) → bigint Returns the length of
string
in characters.
-
levenshtein_distance
(string1, string2) → bigint Returns the Levenshtein edit distance of
string1
andstring2
, i.e. the minimum number of single-character edits (insertions, deletions or substitutions) needed to changestring1
intostring2
.
-
lower
(string) → varchar Converts
string
to lowercase.
-
lpad
(string, size, padstring) → varchar Left pads
string
tosize
characters withpadstring
. Ifsize
is less than the length ofstring
, the result is truncated tosize
characters.size
must not be negative andpadstring
must be non-empty.
-
ltrim
(string) → varchar Removes leading whitespace from
string
.
-
replace
(string, search) → varchar Removes all instances of
search
fromstring
.
-
replace
(string, search, replace) → varchar Replaces all instances of
search
withreplace
instring
.If
search
is an empty string, insertsreplace
in front of every character and at the end of thestring
.
-
reverse
(string) → varchar Returns
string
with the characters in reverse order.
-
rpad
(string, size, padstring) → varchar Right pads
string
tosize
characters withpadstring
. Ifsize
is less than the length ofstring
, the result is truncated tosize
characters.size
must not be negative andpadstring
must be non-empty.
-
rtrim
(string) → varchar Removes trailing whitespace from
string
.
-
split
(string, delimiter) -> array(varchar) Splits
string
ondelimiter
and returns an array.
-
split
(string, delimiter, limit) -> array(varchar) Splits
string
ondelimiter
and returns an array of size at mostlimit
. The last element in the array always contain everything left in thestring
.limit
must be a positive number.
-
split_part
(string, delimiter, index) → varchar Splits
string
ondelimiter
and returns the fieldindex
. Field indexes start with1
. If the index is larger than than the number of fields, then null is returned.
-
split_to_map
(string, entryDelimiter, keyValueDelimiter) → map<varchar, varchar> Splits
string
byentryDelimiter
andkeyValueDelimiter
and returns a map.entryDelimiter
splitsstring
into key-value pairs.keyValueDelimiter
splits each pair into key and value. Note thatentryDelimiter
andkeyValueDelimiter
are interpreted literally, i.e., as full string matches.
-
split_to_map
(string, entryDelimiter, keyValueDelimiter, function(K, V1, V2, R)) → map<varchar, varchar> Splits
string
byentryDelimiter
andkeyValueDelimiter
and returns a map.entryDelimiter
splitsstring
into key-value pairs.keyValueDelimiter
splits each pair into key and value. Note thatentryDelimiter
andkeyValueDelimiter
are interpreted literally, i.e., as full string matches.function(K,V1,V2,R)
is invoked in cases of duplicate keys to resolve the value that should be in the map.SELECT(split_to_map(‘a:1;b:2;a:3’, ‘;’, ‘:’, (k, v1, v2) -> v1)); – {“a”: “1”, “b”: “2”} SELECT(split_to_map(‘a:1;b:2;a:3’, ‘;’, ‘:’, (k, v1, v2) -> CONCAT(v1, v2))); – {“a”: “13”, “b”: “2”}
-
split_to_multimap
(string, entryDelimiter, keyValueDelimiter) -> map(varchar, array(varchar)) Splits
string
byentryDelimiter
andkeyValueDelimiter
and returns a map containing an array of values for each unique key.entryDelimiter
splitsstring
into key-value pairs.keyValueDelimiter
splits each pair into key and value. The values for each key will be in the same order as they appeared instring
. Note thatentryDelimiter
andkeyValueDelimiter
are interpreted literally, i.e., as full string matches.
-
strpos
(string, substring) → bigint Returns the starting position of the first instance of
substring
instring
. Positions start with1
. If not found,0
is returned.
-
strpos
(string, substring, instance) → bigint Returns the position of the N-th
instance
ofsubstring
instring
.instance
must be a positive number. Positions start with1
. If not found,0
is returned.
-
strrpos
(string, substring) → bigint Returns the starting position of the last instance of
substring
instring
. Positions start with1
. If not found,0
is returned.
-
strrpos
(string, substring, instance) → bigint Returns the position of the N-th
instance
ofsubstring
instring
starting from the end of the string.instance
must be a positive number. Positions start with1
. If not found,0
is returned.
-
position
(substring IN string) → bigint Returns the starting position of the first instance of
substring
instring
. Positions start with1
. If not found,0
is returned.
-
substr
(string, start) → varchar Returns the rest of
string
from the starting positionstart
. Positions start with1
. A negative starting position is interpreted as being relative to the end of the string.
-
substr
(string, start, length) → varchar Returns a substring from
string
of lengthlength
from the starting positionstart
. Positions start with1
. A negative starting position is interpreted as being relative to the end of the string.
-
trim
(string) → varchar Removes leading and trailing whitespace from
string
.
-
upper
(string) → varchar Converts
string
to uppercase.
-
word_stem
(word) → varchar Returns the stem of
word
in the English language.
-
word_stem
(word, lang) → varchar Returns the stem of
word
in thelang
language.
Unicode Functions
-
normalize
(string) → varchar Transforms
string
with NFC normalization form.
-
normalize
(string, form) → varchar Transforms
string
with the specified normalization form.form
must be be one of the following keywords:Form Description NFD
Canonical Decomposition NFC
Canonical Decomposition, followed by Canonical Composition NFKD
Compatibility Decomposition NFKC
Compatibility Decomposition, followed by Canonical Composition Note
This SQL-standard function has special syntax and requires specifying
form
as a keyword, not as a string.
-
to_utf8
(string) → varbinary Encodes
string
into a UTF-8 varbinary representation.
-
from_utf8
(binary) → varchar Decodes a UTF-8 encoded string from
binary
. Invalid UTF-8 sequences are replaced with the Unicode replacement characterU+FFFD
.
-
from_utf8
(binary, replace) → varchar Decodes a UTF-8 encoded string from
binary
. Invalid UTF-8 sequences are replaced with replace. The replacement string replace must either be a single character or empty (in which case invalid characters are removed).