MySQL Basic Data Types

Nov 08, 2023

Choosing the right data type for a column is important for designing a good MySQL table. In general, a simple and small type is the first choice since operations over a simple and small type cost fewer CPU resources and a small type also requires less space to store or cache. However, the chosen type must have a range big enough to cover all possible values generated from the application by now or in the near future.

This note is about basic data types in MySQL, namely number, string, datetime, I'll discuss its range, store requirement, precision and few best practices. Let's begin!

1 Numerical type

The numerical types in MySQL fall into one of four categories: integer, floating-point, decimal and bit.

Table 1 A simple summary for basic numerical types

1.1 Differences between floating-point type and DECIMAL

Floating-point type like FLOAT and DOUBLE store approximate numeric data values, while DECIMAL store exact numeric data values. Here is an illustrative example:


mysql> CREATE TABLE mytable(
    -> id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    -> point FLOAT,
    -> salary DECIMAL(10,4)
    -> );
Query OK, 0 rows affected (0.01 sec)

mysql> insert into mytable (point, salary) VALUE (0.1, 0.1);
Query OK, 1 row affected (0.01 sec)

mysql> select point, point = 0.1, point*10, salary, salary = 0.1, salary*10 from mytable where id = 1;
+-------+-------------+--------------------+--------+--------------+-----------+
| point | point = 0.1 | point*10           | salary | salary = 0.1 | salary*10 |
+-------+-------------+--------------------+--------+--------------+-----------+
|   0.1 |           0 | 1.0000000149011612 | 0.1000 |            1 |    1.0000 |
+-------+-------------+--------------------+--------+--------------+-----------+
1 row in set (0.00 sec)

Example 1 Differences between FLOAT and DECIMAL

The reason for the excepted results(point != 0.1 and point*10=1.0000000149011612) is that the point column is a FLOAT type which is based on IEEE floating-point standard^[3] and not designed to represent exact numeric values, when 0.1 is inserted, it's stored as a value very close to 0.1(maybe 0.100000001) but never the exact 0.1.

Even though the floating-point errors is really small but not negligible in a few situations for instance finance. Under that circumstances, DECIMAL is the right type to use but not the only type. In practice, people tends to use BIGINT to store the financial data in a small unit like cent or micro-cent.

1.2 Peculiar syntax of INT(M)

The following definition of id column causes some misunderstanding.


CREATE TABLE (
    id INT(8) NOT NULL AUTO_INCREMENT,
    name CHAR(8),
    salary DECIMAL(8),
);

Example 2 peculiar syntax of INT(M)

CHAR(8) specifies a string and its maximum character number is 8
DECIMAL(8) specifies an exact number and its total number of digits is 8
INT(8) specifies an integer and its display width is 8

In other words, M in INT(M) has nothing to do with its range or storage, INT(M) is always a 4-byte integer.

2 String

Table 2 A simple Summary of string types

There are two big categories of string types: CHAR and VARCHAR for relatively short string and TEXT for relatively long string. It's easy to choose an appropriate string type based on the length requirement. But the maximum length and storage requirement is sophisticated and depend on several factors^[1]. I'll explain it in details.

2.1 Difference between CHAR and VARCHAR

CHAR is fixed-length string, all you store is string data. By contrast, VARCHAR is variable-length, the actual length is stored before the string data.

For example, if we store the string sql and hello into a CHAR(8) and VARCHAR(8), their storage layout is as following:


            0 1 2 3 4 5 6 7 
CHAR(8)    |h|e|l|l|o| | | |  <-- 8 bytes with 3 right-padded spaces
CHAR(8)    |s|q|l| | | | | |  <-- 8 bytes with 5 right-padded spaces
VARCHAR(8) |5|h|e|l|l|0|  <-- 1+5=6 bytes
VARCHAR(8) |3|s|q|l|      <-- 1+3=4 bytes
            ^
            |
        string length

Example 3 Layout of sql and hello in CHAR(8) and VARCHAR(8)

Apart from this, trailing spaces are removed from CHAR columns upon retrieval.

2.2 Maximum length and storage requirement for CHAR and VARCHAR

The maximum length of a CHAR(M) column is M character. The storage required depends on M and the character set you use, to put it simple, the number of characters(length) and the number of bytes for a single character. The length can be any value 0 to 255, the number of bytes for a single character varies over different character sets and even different within the same character set(for instance: utf-8).

The M in VARCHAR(M) is not about character numbers but specifies the maximum bytes of the string. The maximum of M is subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used.

2.3 TEXT

The TEXT type is used to store large strings, the four TEXT types are: TINYTEXT, TEXT, MEDIUMTEXT, LONGTEXT. These four types only differ in the maximum length. The maximum length of TINYTEXT is 255=2⁸-1 and only requires 1 byte to store the length, the maximum length of TEXT is 65535=2¹⁶-1 and requires 2 bytes to store the length, and so on.

As you can see, LONGTEXT can store a string even exceeds aforementioned maximum row size 65535 due to the "external" storage area for a large string content, TEXT columns only contribute 9 to 12 bytes toward the row size limit.

Due to the possible large length of TEXT, there are some constrains for them. For instance, you must specify an index prefix length for indexes on TEXT; Only the first 1024(in default settings) bytes of the the column are used when sorting.

Due to the large length issue of TEXT, you also should take care when dealing with it. Because retrieve a row with large TEXT from disk and cache it is "expensive". Avoid it when possible, don't store large string in a database. Instead, write them to a separate object data store and use the table to track the location. Or you can put the LONGTEXT column to a different table other than the actively used table.

2.4 LONTTEXT limit

The maximum characters a LONGTEXT column can hold is subject to lots of factors: character set, the configured maximum packet size in the client/server protocol, available memory, constraints from different part of the whole system....

3 Datatime and more

Table 3 A simple summary for Data and time

A DATETIME type is good, but it doesn't contain any timezone information which is very likely to cause errors in an international circumstances. So you may choose the TIMESTAMP type since it's UTC based but the maximum supported time is roughly 2038. So in practice, you can just store a 64bit timestamp in a BIGINT type.

Other types that's not mentioned above:

Numeric bit-value: BIT
Raw binary string: BINARY, BINARY, BLOB
Spatial data type
JSON data type
...

4 Summary

Some Key takeaways from this notes:

FLOAT represents approximate value and DECIMAL represents exact value
M in int(M) specify its display width, not storage bytes
Storage for CHAR(M) is fixed-size M characters while storage for VARCHAR is variable
The limit of VARCHAR is subject to the row size limit 65535 bytes which is shared among all columns
Dealing with large TEXT is expensive, avoid it when possible

Reference

1. MySQL 8.0 Reference Manual: Data Type Storage Requirements
2. MySQL 8.0 Reference Manual: Date Types
3. Wikipeida. IEEE 754

Songziyu at Substack

Discussion about this post