Home
Arjen's Journal - 17th March 2008
Open Query: MySQL, Open Source & other ponderings

Arjen Lentz
Date: 2008-03-17 10:42
Subject: Dealing with NULLs
Security: Public

I frequently see a question fly past on why an expression like WHERE col = NULL does not come up with the desired result, even though it superficially looks perfectly sane. To address this, we can recap some high school maths, and at the same time finally find out that some things learnt there are actually very practical and useful later in life... I mean, who would have guessed that learning about sets and Venn diagrams was really giving you the fundamentals of relational databases?

So anyway, the proper form would be WHERE col IS NULL. NULLs can't be compared in the normal way, because NULL is not a value. NULL is the absence of a value. If you show this visually, the domain of valid values for a column lies inside the circle of a Venn diagram, while the NULL is outside. It's not a member of the domain.
There are also other functions for dealing with NULLs in queries, such as IFNULL(), COALESCE(), and so on.

One of Codd's rules for relational databases deals with NULLs:

Rule 3: Systematic treatment of null values:
The DBMS must allow each field to remain null (or empty). Specifically, it must support a representation of "missing information and inapplicable information" that is systematic, distinct from all regular values, and independent of data type.
It's good practise to always specify a column to be NOT NULL, unless you really need NULLs. With MySQL, there's a real practical reason for this: it's faster. Since NULL is not a value, it can't be stored in the regular space for that particular column type. Consider a TINYINT UNSIGNED, which in its 8 bits has a range of 0-255. How or where would you indicate NULL? So, for columns that can be NULL, MySQL uses a separate bit in the row, and looks that bit first to check for NULL. By specifying NOT NULL for the column, this saves doing the extra compare operation. And thus it's quicker. It may not seem like much, but all good little habits makes for fast apps.

Just for comparison, Oracle actually takes a shortcut with this by storing NULL the same as an empty string (""). This makes Oracle break Codd's rule#3 (it's not using a distinct and separate from all regular values), so going by that assessment it can't be a relational database. Haha funny. Wikipedia's page on this already correctly notes: "the rules are so strict that all popular relational database systems fail Codd's criteria in one way or another." So there you go.

IFNULL() is a useful function: SELECT IFNULL(col,"N/A") AS val
This will output N/A if col happens to be NULL.

COALESCE() is that one brilliant function you always forget the name of... it returns the first not-NULL:
SELECT COALESCE(col1,col2,col3) AS val
This will output the value of col2 if col1 happens to be NULL; col3 doesn't matter in this case as COALECSE() just returns the first match from the list (and col2 happens to not be NULL. You wouldn't believe how useful this function is! Remember it.

4 Comments | Post A Comment | Add to Memories | Tell a Friend | Link



browse
my journal
links
July 2008
High Performance MySQL (2nd ed.)
summary