The number can have leading zeros (e.g. 00123).
What data type do you use?
I am very interested in your opinion on this.
Imagine that you work at a company which sells widgets. Each widget has a unique serial number. The number is a fixed length, and can contain leading zeros. That is, the following are all valid identifiers:
00001
01010
12345
What data type would you use to store these data in a database?
This is one of those strong opinions, weakly held. I'm not sure there's a right answer to it. A quick survey on Twitter0 was inconclusive:
Here, I attempt to provide some of the opinions on both sides of the argument. Feel free to supply your own arguments.
Can you do maths on it? If not, it isn't a number. It's a text string that happens to only contain numbers.
For example, asking "what is the average of these three serial numbers?" is a meaningless question. "What is this serial number divided by two?" again - that has an answer with no semantic content.
What if the serial has some semantic content in it? For example even numbers were made in one factory, odds in another. Or prime numbers are demonstration units. It might be easier to grab information from a number type. But it's probably better to pull that data into separate fields.
If this field was for the item's price, it's is always going to be a number. Sure, it might be a float
or an int
- but it'll be a number. Same if this was the item's height, width, or weight. Those values can only be numbers.
But there's nothing inherently "numbery" about a serial number. At any point, the boss could recommend adding Greek letters to it.
Given we can't do maths on it, and the "number" doesn't have any semantic content, there's no need to artificially restrict our database to a "number" type.
The same is broadly true of any constraint. The maximum length could change. The validation rules might be updated. Modern databases need to be able to cope with changes in business requirements.
I don't know of any number type which stores leading zeros. Please enlighten me if I'm wrong. That means any data retrieved from the database has to be formatted before displaying it to the user.
That also means that searching the database requires data to be pre-formatted.
All data should go through validation and sanitation before being displayed to the end user. Numbers aren't special in that regard. Similarly, data should be carefully checked before being searched for.
We all know how Microsoft Excel will look at any number and try to interpret it as a date. It also has a tendency to strip leading zeros. And some formatters will automatically add thousands separators to any number they see. Keeping the data as text reduces the risk of this happening.
Excel will mangle anything that looks even vaguely like a number. Storing as text does not guarantee that it will be interpreted as text.
OK - I think that's the majority of the argument for not treating this as a number. What are the arguments on the other side?
If we allow text in this field, what happens if someone types a letter O
rather than a number 0
? Having this be an int
prevents these errors creeping in. Yes, have checks at the front end, but this provides defence in depth.
As above. Data should be checked by the application before submitting to the database. The database should be checking the data before storing it.
Suppose we want to get all widgets with a serial number >=
a specific ID. A number type is much better than string for those sorts of operations.
This again assumes that the IDs have some semantic content. For example, 00012
was manufactured after 00008
. This may or may not be the case. Such operations are best performed on a field like "Manufactured Date".
It is quicker and computationally cheaper to store integers rather than text. Searching is faster, disk space requirements are lower.
It isn't the 1970s. We're not paying per bit. Unless we're storing billions of rows, or working on constrained hardware, this isn't a practical concern for most users.
It looks like a number. A human, on encountering one of these IDs is going to assume it is a number. Anything which makes it harder for humans to understand is going to cause problems.
Things like phone numbers look like integers, but they aren't. Momentary human confusion is preferable to mangled or imprecise data.
I lean to the side that says that this is a string with specific constraints. Namely /^[0-9]{5}$/
(please supply your own regex meme).
When a user enters a new serial number, it should be checked that it meets the constraints. If it doesn't, refuse to submit it until the user has formatted it correctly. When submitted, the database should check against the constraints, and refuse to accept non-matching strings.
The speed of searching and sorting is not meaningfully degraded by storing as a string.
But, I'm very aware that I could be wrong. This is my strong opinion, but if you can supply a better argument, I will drop my weak grip on it.
Which has obvious limitations! ↩︎
Richard says:
@edent says:
Eric Andersen says:
Daniel says: