A Brief Discussion on Handling Database String Types
I've been writing .NET articles lately and felt like branching out to a different topic.
When dealing with database strings, common practices can be divided into three categories: distinguishing between NULL and empty strings, storing everything as NULL, and setting everything to NOT NULL while storing empty strings. Each approach has its own advantages and disadvantages.
Distinguishing between NULL and Empty Strings
NULL represents that the user has never entered any data, while an empty string represents that the user has entered data, but the value is empty. The advantage of this approach is that it provides clearer semantics.
However, because this approach lacks strict enforcement, it requires developers to be careful in their programming to ensure that NULL and empty strings are used in the correct contexts.
Storing Everything as NULL
This approach is commonly seen among Oracle users because Oracle databases automatically convert empty strings to NULL when storing them. In this case, the Oracle database provides higher enforcement; even if a developer tries to store an empty string, the database automatically converts it to NULL, ensuring consistency. In other database systems, however, you might need to use CHECK constraints to enforce this.
Setting Everything to NOT NULL and Storing Empty Strings
This approach is often seen in some older database applications. Its advantages are as follows:
- Since the column is set to
NOT NULL, it forces users to store only empty strings. - In some cases with older databases, index performance for empty strings might be better than for
NULL. Of course, different databases handle this differently, and I am not certain if there have been updates to optimization strategies forNULLin recent years. However, in current practice,NULLmay still not effectively utilize indexes. For example, this MSDN article, "Filtered index you create with IS NULL predicate is not used in SQL Server," mentions one such scenario. - Without an ORM (Object-Relational Mapping),
WHERE column = NULLin a SQL statement returnsfalse, and you must useWHERE column IS NULL. This means that when writing code to automatically generate SQL, you need to handleNULLspecifically. Using empty strings avoids this issue. NULLcan produce unexpected results in some conditional operations and function processing. Storing empty strings can avoid these situations. For more on these unexpected cases, you can refer to the following articles by the expert Darkthread:
The disadvantage is that when a Foreign Key is a string type, if there is no value, it must be stored as NULL and cannot be an empty string. This leads to inconsistency in how that column is handled compared to other string columns.
Personal Preference
Nowadays, I mostly use Entity Framework to handle database operations, so I no longer need to hand-code libraries and handle NULL values within them as I did in the past. Nevertheless, considering that NULL can cause unexpected issues, when working on my own projects or in situations where there are no specific rules set by the company or team, I prefer to set columns to NOT NULL and store empty strings, unless a Foreign Key is involved. For string-type Foreign Keys, I try to avoid them as much as possible. Of course, when using an Oracle database, I am still forced to store NULL.
Change Log
- 2024-07-12 Initial document creation.