Thursday, August 07, 2025

The LENGTH() function - or - SQL is SQL so it's all interchangeable am-I-right?

Let's test the LENGTH() vs. CHAR_LENGTH() function on some emoji characters in MySQL and Postgres: 

mysql> SELECT x FROM emoji;
+--------------+
| x |
+--------------+
| 😄👋👍 |
+--------------+
1 row in set (0.02 sec) 

mysql> SELECT LENGTH(x) AS len, CHAR_LENGTH(x) AS char_len FROM emoji; +------+----------+ | len | char_len | +------+----------+ | 12 | 3 | +------+----------+ 1 row in set (0.02 sec) 

---

postgres=# SELECT x FROM emoji;
x
--------
😄👋👍
(1 row) 

postgres=# SELECT LENGTH(x) AS len, CHAR_LENGTH(x) AS char_len FROM emoji; len | char_len -----+---------- 3 | 3 (1 row) 

 

Expanded thoughts on the subject:

LENGTH() is the most common function we tend to reach for when we want the length of a string - typically we want number of characters and not bytes, at least in my experience. But LENGTH() gets number of bytes in MySQL, and on the other hand, number of characters in Postgres.

How could they be so different?!? Here's the thing: LENGTH() is not part of standard SQL, if that even matters since implementations deviate from it so much.

At least, I looked into the few documents freely available, and there IS no LENGTH() function in standard SQL. Rather, there are: CHARACTER_LENGTH() (aka CHAR_LENGTH()), OCTET_LENGTH(), and BIT_LENGTH()

If you test ONLY these standard functions (or "length expressions" as standard SQL puts it), you find the results do match between MySQL and Postgres:

mysql> SELECT CHAR_LENGTH(x), CHARACTER_LENGTH(x), BIT_LENGTH(x), OCTET_LENGTH(x) FROM emoji;
+----------------+---------------------+---------------+-----------------+
| CHAR_LENGTH(x) | CHARACTER_LENGTH(x) | BIT_LENGTH(x) | OCTET_LENGTH(x) |
+----------------+---------------------+---------------+-----------------+
|              3 |                   3 |            96 |              12 |
+----------------+---------------------+---------------+-----------------+
1 row in set (0.02 sec)


and:

postgres=# SELECT CHAR_LENGTH(x), CHARACTER_LENGTH(x), BIT_LENGTH(x), OCTET_LENGTH(x) FROM emoji;
 char_length | character_length | bit_length | octet_length
-------------+------------------+------------+--------------
           3 |                3 |         96 |           12
(1 row)


More fun facts:

Microsoft loves using its own special Microsofty style functions, for Microsofty consistency purposes, so it has no LENGTH(), only LEN(), just like the Len() function you might have used in Excel, or in Visual Basic in 1995. (See also DateDiff() and others.) 

  • Oracle has LENGTH() (number of chars in the string), and LENGTHB() (number of bytes in the string)
  • In Snowflake and Databricks/Spark SQL, LENGTH() returns number of chars.
  • So, it appears that MySQL is the odd one out.


This can be a particularly hairy gotcha if you are porting SQL queries from MySQL to another DBMS or vice versa. It may elude testing even, since most of our strings, generally speaking, tend not to include "special" characters such as Asian characters or emoji. But they do crop up on occasion. So those occasions may potentially break things. 






 

 

No comments: