MySQL Basic Data Types
Choosing the right data type for a column is important for designing a good MySQL table. In general, a simple and small type is the first choice since operations over a simple and small type cost fewer CPU resources and a small type also requires less space to store or cache. However, the chosen type must have a range big enough to cover all possible values generated from the application by now or in the near future.
This note is about basic data types in MySQL, namely number, string, datetime, I'll discuss its range, store requirement, precision and few best practices. Let's begin!
1 Numerical type
The numerical types in MySQL fall into one of four categories: integer, floating-point, decimal and bit.
Table 1 A simple summary for basic numerical types
1.1 Differences between floating-point type and DECIMAL
Floating-point type like FLOAT
and DOUBLE
store approximate numeric data values, while DECIMAL
store exact numeric data values. Here is an illustrative example:
mysql> CREATE TABLE mytable(
-> id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
-> point FLOAT,
-> salary DECIMAL(10,4)
-> );
Query OK, 0 rows affected (0.01 sec)
mysql> insert into mytable (point, salary) VALUE (0.1, 0.1);
Query OK, 1 row affected (0.01 sec)
mysql> select point, point = 0.1, point*10, salary, salary = 0.1, salary*10 from mytable where id = 1;
+-------+-------------+--------------------+--------+--------------+-----------+
| point | point = 0.1 | point*10 | salary | salary = 0.1 | salary*10 |
+-------+-------------+--------------------+--------+--------------+-----------+
| 0.1 | 0 | 1.0000000149011612 | 0.1000 | 1 | 1.0000 |
+-------+-------------+--------------------+--------+--------------+-----------+
1 row in set (0.00 sec)
Example 1 Differences between FLOAT
and DECIMAL
The reason for the excepted results(point != 0.1 and point*10=1.0000000149011612) is that the point column is a FLOAT
type which is based on IEEE floating-point standard[3] and not designed to represent exact numeric values, when 0.1 is inserted, it's stored as a value very close to 0.1(maybe 0.100000001) but never the exact 0.1.
Even though the floating-point errors is really small but not negligible in a few situations for instance finance. Under that circumstances, DECIMAL
is the right type to use but not the only type. In practice, people tends to use BIGINT
to store the financial data in a small unit like cent or micro-cent.
1.2 Peculiar syntax of INT(M)
The following definition of id column causes some misunderstanding.
CREATE TABLE (
id INT(8) NOT NULL AUTO_INCREMENT,
name CHAR(8),
salary DECIMAL(8),
);
Example 2 peculiar syntax of INT(M)
CHAR(8)
specifies a string and its maximum character number is 8DECIMAL(8)
specifies an exact number and its total number of digits is 8INT(8)
specifies an integer and its display width is 8
In other words, M
in INT(M)
has nothing to do with its range or storage, INT(M)
is always a 4-byte integer.
2 String
Table 2 A simple Summary of string types
There are two big categories of string types: CHAR
and VARCHAR
for relatively short string and TEXT
for relatively long string. It's easy to choose an appropriate string type based on the length requirement. But the maximum length and storage requirement is sophisticated and depend on several factors[1]. I'll explain it in details.
2.1 Difference between CHAR and VARCHAR
CHAR
is fixed-length string, all you store is string data. By contrast, VARCHAR
is variable-length, the actual length is stored before the string data.
For example, if we store the string sql
and hello
into a CHAR(8)
and VARCHAR(8)
, their storage layout is as following:
0 1 2 3 4 5 6 7
CHAR(8) |h|e|l|l|o| | | | <-- 8 bytes with 3 right-padded spaces
CHAR(8) |s|q|l| | | | | | <-- 8 bytes with 5 right-padded spaces
VARCHAR(8) |5|h|e|l|l|0| <-- 1+5=6 bytes
VARCHAR(8) |3|s|q|l| <-- 1+3=4 bytes
^
|
string length
Example 3 Layout of sql and hello in CHAR(8) and VARCHAR(8)
Apart from this, trailing spaces are removed from CHAR columns upon retrieval.
2.2 Maximum length and storage requirement for CHAR and VARCHAR
The maximum length of a CHAR(M)
column is M character. The storage required depends on M and the character set you use, to put it simple, the number of characters(length) and the number of bytes for a single character. The length can be any value 0 to 255, the number of bytes for a single character varies over different character sets and even different within the same character set(for instance: utf-8).
The M
in VARCHAR(M)
is not about character numbers but specifies the maximum bytes of the string. The maximum of M
is subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used.
2.3 TEXT
The TEXT
type is used to store large strings, the four TEXT
types are: TINYTEXT
, TEXT
, MEDIUMTEXT
, LONGTEXT
. These four types only differ in the maximum length. The maximum length of TINYTEXT
is 255=28-1 and only requires 1 byte to store the length, the maximum length of TEXT
is 65535=216-1 and requires 2 bytes to store the length, and so on.
As you can see, LONGTEXT
can store a string even exceeds aforementioned maximum row size 65535 due to the "external" storage area for a large string content, TEXT
columns only contribute 9 to 12 bytes toward the row size limit.
Due to the possible large length of TEXT
, there are some constrains for them. For instance, you must specify an index prefix length for indexes on TEXT
; Only the first 1024(in default settings) bytes of the the column are used when sorting.
Due to the large length issue of TEXT
, you also should take care when dealing with it. Because retrieve a row with large TEXT
from disk and cache it is "expensive". Avoid it when possible, don't store large string in a database. Instead, write them to a separate object data store and use the table to track the location. Or you can put the LONGTEXT
column to a different table other than the actively used table.
2.4 LONTTEXT limit
The maximum characters a LONGTEXT
column can hold is subject to lots of factors: character set, the configured maximum packet size in the client/server protocol, available memory, constraints from different part of the whole system....
3 Datatime and more
Table 3 A simple summary for Data and time
A DATETIME
type is good, but it doesn't contain any timezone information which is very likely to cause errors in an international circumstances. So you may choose the TIMESTAMP
type since it's UTC based but the maximum supported time is roughly 2038. So in practice, you can just store a 64bit timestamp in a BIGINT
type.
Other types that's not mentioned above:
Numeric bit-value:
BIT
Raw binary string:
BINARY
,BINARY
,BLOB
Spatial data type
JSON data type
...
4 Summary
Some Key takeaways from this notes:
FLOAT
represents approximate value andDECIMAL
represents exact valueM in
int(M)
specify its display width, not storage bytesStorage for
CHAR(M)
is fixed-size M characters while storage forVARCHAR
is variableThe limit of
VARCHAR
is subject to the row size limit 65535 bytes which is shared among all columnsDealing with large
TEXT
is expensive, avoid it when possible
Reference
1. MySQL 8.0 Reference Manual: Data Type Storage Requirements
2. MySQL 8.0 Reference Manual: Date Types
3. Wikipeida. IEEE 754