Tuesday, November 11, 2014

Null values can cause unexpected results in SQL subquery

I was recently using a SQL query to examine data in my PostgreSQL 9.3 database, and the results were not what I expected. It took a while to figure out that this due to using a subquery (also known as an inner query or nested query) that involved NULL values.

I have a table users, like this:


And a table addresses, like this:

In addresses, the user_id column indicates the user -- if any -- to whom the address belongs. I want to find any users who don't have an address. In this example, that would be betty: her user_id, 4, doesn't appear anywhere in addresses. I thought this query would do the job:

select * from users where user_id not in
(select distinct user_id from addresses)

But that query returns no rows! Here's why. Note that the row in addresses with address_id = 5 has a NULL user_id. Therefore, the query is equivalent to:

select * from users where user_id not in (1, 2, 3, NULL)

This, in turn, is equivalent to:

select * from users where user_id <> 1 and user_id <> 2 and user_id <> 3
and user_id <> NULL

In SQL, no comparison to NULL is ever true when using the = or <> operator. So no row matches the query. (The IS operator is proper to use for comparisons to NULL.)

We can avoid this problem by excluding NULL values in our subquery:

select * from users where user_id not in
(select distinct user_id from addresses where user_id is not null)

Or by using COALESCE to convert any NULL values to something we can compare against an integer:

select * from users where user_id not in
(select distinct coalesce(user_id, -1) from addresses)

Either solution will cause the query to output betty's row of the addresses table, as desired.