Skip to content
View Article Network

A Brief Discussion on Handling Database String Types

I've been writing .NET articles lately and felt like branching out to a different topic.

When dealing with database strings, common practices can be divided into three categories: distinguishing between NULL and empty strings, storing everything as NULL, and setting everything to NOT NULL while storing empty strings. Each approach has its own advantages and disadvantages.

Distinguishing between NULL and Empty Strings

NULL represents that the user has never entered any data, while an empty string represents that the user has entered data, but the value is empty. The advantage of this approach is that it provides clearer semantics.

However, because this approach lacks strict enforcement, it requires developers to be careful in their programming to ensure that NULL and empty strings are used in the correct contexts.

Storing Everything as NULL

This approach is commonly seen among Oracle users because Oracle databases automatically convert empty strings to NULL when storing them. In this case, the Oracle database provides higher enforcement; even if a developer tries to store an empty string, the database automatically converts it to NULL, ensuring consistency. In other database systems, however, you might need to use CHECK constraints to enforce this.

Setting Everything to NOT NULL and Storing Empty Strings

This approach is often seen in some older database applications. Its advantages are as follows:

  • Since the column is set to NOT NULL, it forces users to store only empty strings.
  • In some cases with older databases, index performance for empty strings might be better than for NULL. Of course, different databases handle this differently, and I am not certain if there have been updates to optimization strategies for NULL in recent years. However, in current practice, NULL may still not effectively utilize indexes. For example, this MSDN article, "Filtered index you create with IS NULL predicate is not used in SQL Server," mentions one such scenario.
  • Without an ORM (Object-Relational Mapping), WHERE column = NULL in a SQL statement returns false, and you must use WHERE column IS NULL. This means that when writing code to automatically generate SQL, you need to handle NULL specifically. Using empty strings avoids this issue.
  • NULL can produce unexpected results in some conditional operations and function processing. Storing empty strings can avoid these situations. For more on these unexpected cases, you can refer to the following articles by the expert Darkthread:

The disadvantage is that when a Foreign Key is a string type, if there is no value, it must be stored as NULL and cannot be an empty string. This leads to inconsistency in how that column is handled compared to other string columns.

Personal Preference

Nowadays, I mostly use Entity Framework to handle database operations, so I no longer need to hand-code libraries and handle NULL values within them as I did in the past. Nevertheless, considering that NULL can cause unexpected issues, when working on my own projects or in situations where there are no specific rules set by the company or team, I prefer to set columns to NOT NULL and store empty strings, unless a Foreign Key is involved. For string-type Foreign Keys, I try to avoid them as much as possible. Of course, when using an Oracle database, I am still forced to store NULL.

Change Log

  • 2024-07-12 Initial document creation.