UUIDs in MySQL are really not random

Universally Unique Identifiers (UUIDs) are great. I love how you can tell the progress of a batch job just by looking at the current UUID. If it starts with 0..., the task is less than 1/16th done. If it starts with 7d.., we're almost halfway there. At ff... we are nearing the end. The fact that you can tell this rests on two principles: 1) you sort your jobs by their uuid and 2) UUIDs are random, as in, distributed uniformly.

However, last week, I noticed a strange thing: a clearly visible pattern in the uuid column of a database table. It should be impossible, but there it was. It looked like this:

> SELECT uuid FROM example ORDER BY id;
4f95de28-0fd1-48db-ad2e-34ecd169c483
4331cb9e-1d91-11e9-be2c-45923c63e8a2
4331cc4c-1d91-11e9-be2c-45923c63e8a2
4331ccec-1d91-11e9-be2c-45923c63e8a2
4331cd7e-1d91-11e9-be2c-45923c63e8a2
c7e2f124-f6ba-4434-843f-89958a7436ec
4331ce10-1d91-11e9-be2c-45923c63e8a2
4331ce9e-1d91-11e9-be2c-45923c63e8a2
4331cf28-1d91-11e9-be2c-45923c63e8a2
4331cfaf-1d91-11e9-be2c-45923c63e8a2
4331d017-1d91-11e9-be2c-45923c63e8a2
4331d0c6-1d91-11e9-be2c-45923c63e8a2
4331d139-1d91-11e9-be2c-45923c63e8a2
4331d1a7-1d91-11e9-be2c-45923c63e8a2
4331d20e-1d91-11e9-be2c-45923c63e8a2
4331d271-1d91-11e9-be2c-45923c63e8a2
4331d2d7-1d91-11e9-be2c-45923c63e8a2
3e18f8dd-b1d3-4e16-8a81-4bdceac91772
4331d33a-1d91-11e9-be2c-45923c63e8a2
f6b8658d-846b-4418-a79c-713db516203e
4331d3a8-1d91-11e9-be2c-45923c63e8a2
4331d40d-1d91-11e9-be2c-45923c63e8a2
4331d48e-1d91-11e9-be2c-45923c63e8a2
4331d4fe-1d91-11e9-be2c-45923c63e8a2
4331d567-1d91-11e9-be2c-45923c63e8a2
4331d5cb-1d91-11e9-be2c-45923c63e8a2
4331d630-1d91-11e9-be2c-45923c63e8a2
4331d691-1d91-11e9-be2c-45923c63e8a2
14a5cb4c-f336-4240-aff3-4ffcfa8d135f
4331d6f9-1d91-11e9-be2c-45923c63e8a2
4331d762-1d91-11e9-be2c-45923c63e8a2
4331d7c8-1d91-11e9-be2c-45923c63e8a2
4331d83d-1d91-11e9-be2c-45923c63e8a2



Hm, that is very weird. Did we maybe convert an old auto-incrementing integer column into UUIDs in a very stupid way? Did we maybe use UUID version 3 or 5? Did our library corrupt the first bits of the binary representation of the UUID? After a while, I remembered that we initialized this column like so:

UPDATE example SET uuid = UUID() WHERE uuid IS NULL;

I also remembered reading this in the MySQL documentation:
"Warning: Although UUID() values are intended to be unique, they are not necessarily unguessable or unpredictable. If unpredictability is required, UUID values should be generated some other way." 
If you are like me, you won't use UUID() for secrets after reading this (and I didn't!). If you are even more like me, you will think that this is like the difference between /dev/urandom and /dev/random and that the uniform distribution law still applies here. However, to my great surprise, the UUIDs generated by UUID() are not uniform at all! The result is that a significant portion of UUIDs in our database are not uniform:

> SELECT COUNT(*), LEFT(uuid, 1) FROM example GROUP BY LEFT(uuid, 1);
+----------+---------------+
| count(*) | LEFT(guid, 1) |
+----------+---------------+
| 1943     | 0 
            |
| 1871     | 1             |
| 1913     | 2             |
| 1843     | 3             |
| 3050     | 4             |
| 1943     | 5             |
| 1889     | 6             |
| 1866     | 7             |
| 1865     | 8             |
| 1903     | 9             |
| 1868     | a             |
| 1898     | b             |
| 1854     | c             |
| 1897     | d             |
| 1941     | e             |
| 1836     | f             |
+----------+---------------+


So, the lesson for today is: take warnings in documentation seriously. If you used UUID() for data that is supposed to be secret (like passwords), you have a serious problem, as these secrets can now be easily guessed.

Edit: read this follow-up post

Popular posts from this blog

"Security Is Our Top Priority" is BS

AI programming tools should be added to the Joel Test

The long long tail of AI applications