Three-Predicate Logic, or How I Learned to Stop Worrying and Love NULL
A love letter to my favourite non-value
I really like this ‘SQL Iceberg’ meme.
As strange as it sounds, it has acted like a bit of a roadmap for me – do I know what everything on the image is? Can I at least explain the concepts? Answer: Not necessarily, even years later. It can’t be taken too seriously – it is a ‘funny’ (for certain definitions of ‘funny’) meme, and includes concepts that range from trivial to dialect-specific (ALLBALLS, for instance, being a PostgreSQL-specific timestamp alias for ’00:00:00’) to simple memes.
What has always struck me, though, is how often NULL appears. Explicitly right at the surface and in the darkest depths, with specific examples scattered throughout.
I don’t really understand why.
I mean, I do – I adore NULL. This is basically a sappy love letter to NULL. It is weird and behaves in a way that feels incongruous unless you dig into it a bit more. I’m here to convince you (I guess) that NULL is perfect just as it is, and any feelings otherwise are down to not totally grasping the three-predicate logic under which SQL operates.
Three-Predicate Logic
Almost everyone is familiar with two-predicate logic. Even if ‘Boolean logic’ elicits a blank stare, the concept is intuitive – yes and no, true and false. Two outcomes to any question – it is or it isn’t.
NULL is a third predicate all on its own. It isn’t ‘nothing’ as other languages apparent analogues imply – Python’s NoneType and Haskell’s Nil both explicitly describe nothingness. It isn’t an error – you don’t get NULL when you divide by zero, you get an error, and a NULL result looks fine to TRY/CATCH unless there is an explicitly non-NULL result expected.
NULL is ‘unknown.’ Not an error, just ‘I dunno’ – an explicitly ambiguous result. So, three predicate logic – True, False, and Unknown. When we include an explicitly unknown value in a logical expression, the answer is unknown – ‘1 > NULL’ doesn’t return TRUE or FALSE, it returns NULL.
This is fine. It might feel inconsistent that we get NULL if we try to evaluate 1+NULL, but a SUM aggregation over 1 and NULL returns 1 until we look at the console, which helpfully tells us that NULL values have been eliminated by the aggregation – not inconsistent, just explicitly excluded.
NULLs in CHECK Constraints are Truthy
So, while we’ve all tripped over accidentally excluding NULL results through JOIN or WHERE clauses, we hopefully understand why – if we’re filtering WHERE value > 1, the expression will return NULL – NULL isn’t greater than 1, and also isn’t less than 1, it is just NULL.
Looking back at the meme, we see two examples of this not holding – NULL being truthy in CHECK constraints, and NULLs being equal in DISTINCT, but unequal in UNIQUE. These aren’t, I hasten to add, the fault of NULL. NULL is beautiful and perfect just as it is. This comes down to the ‘success’ criteria of checks being made in each process.
‘Truthy’ in this case claims that NULL is treated as a ‘True’ value in CHECK constraints. This isn’t actually the case, but we can see why it looks like it is. Consider the following:
Inserting any numeric value will fail – an integer cannot be both less than 5 and greater than 10. No further explanation needed. However, inserting NULL will succeed.
Why? The numbers are, again, easy – 1<5 = True, 1>10 = False, True & False = False. No mystery. But, NULL < 5 = NULL, NULL > 10 = NULL, and NULL & NULL = NULL. This is because JOIN and WHERE clauses pass values where the clause evaluates to True, but CHECK constraints pass values where the clause evaluates to ‘Not False.’
‘Not False’ does not necessarily equal True. NULL is also ‘Not False.’
This makes the ANSI standard behaviour with UNIQUE consistent with other check constraints –NULL is neither IN nor NOT IN a set of values containing NULL. NULL != NULL, so inserting a second NULL value would not evaluate to False, and therefore pass the CHECK constraint. Note the mention of the ANSI standard. ANSI SQL thinks this is fine. PostgreSQL also thinks this is fine, but provides a NULLS NOT DISTINCT clause to alter this behaviour. SQL Server takes this NULLS NOT DISTINCT view, without giving an option to change it – you cannot insert multiple NULLs into a table with a UNIQUE constraint.
DISTINCT treats NULLs as indistinct because DISTINCT is a special snowflake. UNION is part of this club as well. I don’t have a good explanation as to why.
My best guess has to do with the console message when using an aggregate function on NULL values: “Warning: Null value is eliminated by an aggregate or other SET operation.” This doesn’t come up when using either DISTINCT or UNION, nor does any other message, but DISTINCT is functionally identical to GROUPing BY all columns in the query, so I suspect there is some behind the scenes wizardry in here that I need to learn a bit more about.
tl;dr
We often think of NULL as ‘nothing’ – empty columns are NULL, and many other languages have analogues that explicitly represent ‘nothing.’ In SQL, NULL isn’t nothing – it is unknown. It is hard to compare unknown to anything. Is unknown more than 10? I don’t know. Less than 100? I don’t know. Is this unknown equal to that unknown? I don’t know.
Accepting that we just don’t know is an important part of learning and of life. Embracing the fact that your favourite database will tell you that it just doesn’t know rather than making up an answer is an equally important part of learning SQL.
In SQL as in life, ‘I don’t know’ is a perfectly good answer. Expect it. Make three-predicate logic part of your daily life. Embrace the freedom that is NULL.
If you want to learn more about three-predicate logic and how SQL works generally, I would recommend Itzik Ben-Gan’s fantastic books ‘T-SQL Fundamentals’ and ‘T-SQL Querying.’
P.S: I know I didn’t touch on the last instance of NULL in the meme - ‘null’::jsonb IS NULL = False. ‘null’ is not NULL. One is a string, the other is a beautiful void.