A Brief Discussion on Handling Database String Types
I've been writing .NET articles lately and felt like branching out to a different topic.
When dealing with database strings, there are three common approaches: distinguishing between NULL and empty strings, storing everything as NULL, and setting everything to NOT NULL while storing empty strings. Each approach has its own pros and cons.
Distinguishing between NULL and Empty Strings
NULL represents that the user has never entered any data, while an empty string represents that the user has entered data, but the value is empty. The advantage of this approach is that it provides clearer semantics.
However, because this approach lacks enforcement, it requires developers to be careful in their programming to ensure that NULL and empty strings are used in the correct contexts.
Storing Everything as NULL
This approach is commonly seen among Oracle users because Oracle databases automatically convert empty strings to NULL when stored. In this case, the Oracle database has a higher level of enforcement; even if a developer tries to store an empty string, the database will automatically convert it to NULL, ensuring consistency. In other database systems, however, you might need to use CHECK constraints to enforce this.
Setting Everything to NOT NULL and Storing Empty Strings
This approach is often seen in some older database applications. Its advantages are as follows:
- Because the column is set to
NOT NULL, it forces users to store only empty strings. - In some cases with older databases, index performance for empty strings might be better than for
NULL. Of course, how different databases handle this may vary, and I am not certain if database optimization strategies forNULLhave been updated in recent years. However, in current practice,NULLstill may not effectively utilize indexes in some scenarios. For example, this MSDN article, "Filtered index you created with IS NULL predicate is not used in SQL Server," mentions one such scenario. - Without an ORM (Object-Relational Mapping),
WHERE column = NULLin a SQL statement returnsfalse, and you must useWHERE column IS NULL. This requires special handling forNULLwhen writing code to automatically generate SQL; using empty strings avoids this issue. NULLcan produce unexpected results in some conditional operations and function processing. Storing empty strings can avoid these situations. For more on these unexpected cases, you can refer to the following articles by the great Darkthread:
The disadvantage is that when a Foreign Key is a string type, if there is no value, it must be stored as NULL and cannot be an empty string. This leads to inconsistency in how that column is handled compared to other string-type columns.
Personal Preference
Nowadays, I mostly use Entity Framework for database operations, so I don't have to manually write libraries and handle NULL logic within them like I did in the past. Nevertheless, considering that NULL can cause unexpected issues, when working on my own projects, or in cases where there are no specific rules set by the company or team, I prefer to set columns to NOT NULL and store empty strings, unless it involves a Foreign Key. For string-type Foreign Keys, I try to avoid them as much as possible. Of course, when using an Oracle database, I am still forced to store NULL.
Change Log
- 2024-07-12 Initial version created.
