From time to time, I have had to go back to a legacy database and for one reason or another add a non-integer surrogate key. Sometimes this has come up when working on an old API that exposes a primary key in the URL (GET /account/123
). This generally can be bad for security, as a user could attempt to access other resources by incrementing the ID (GET /account/124
).
Generating a random string as a surrogate key in code can be easy enough, by using a UUID or a class like RandomStringUtils (careful, look up the Scunthorpe problem). But if you have lots of rows in a database that need to be updated, you will likely end up backfilling values in the database itself.
With Postgres 13, this is fairly simple:
select gen_random_uuid()::varchar
from some_table
UUID |
7fc2113c-d471-4138-8e8e-b8aba6bcf3fc |
b14bdeec-81eb-466f-b5b0-a2440e05b667 |
645aa4d2-4c3e-48ee-adb5-2d28f2c33ec9 |
MySQL's UUID algorithm is unfortunately a bit more predictable. It generates values that are sequential or near-sequential when run on the same database server at approximately the same time. Only the last characters of the first grouping will change. So, if you do a substring and select 6 characters, or only select the last 12 characters, you will end up with duplicate values. The same is true for the uuid_short
function - the values will be sequential.
Some possible solutions in MySQL is to either run the UUID through a hash like sha
, or use random_bytes
. Just be sure to chose enough bytes to provide room for sufficient randomness to reduce the chance of collisions, and so that the probability of sequential IDs are low (ie: AABBCC
, AABBCD
). With hex (0-9,A-F), 6 characters allows for 16 million unique combinations, 8 over 4 billion, and 10 characters over 1 trillion.
select UUID(),
uuid_short(),
substr(sha(UUID()) from 1 for 6),
hex(random_bytes(3))
from some_table
UUID | UUID_SHORT | SHA | RANDOM_BYTES |
---|---|---|---|
d54aced3-cfd3-11ec-afbc-0242ac130002 | 27717705796681743 | 2f18bd | 43E6A4 |
d54ad161-cfd3-11ec-afbc-0242ac130002 | 27717705796681744 | 02b27c | 7BD5CD |
d54ad645-cfd3-11ec-afbc-0242ac130002 | 27717705796681745 | 85620e | 95835B |
d54ad7d7-cfd3-11ec-afbc-0242ac130002 | 27717705796681746 | 0c2a7b | 26D02E |
Just keep in mind that relying on security by obscurity, by having a long identifier in a URL that would be hard to guess, is not the only solution you should rely on. Sensitive or protected resources should still be protected by good authentication and authorization.