ProgBlog: November 2014

I was recently using a SQL query to examine data in my PostgreSQL 9.3 database, and the results were not what I expected. It took a while to figure out that this due to using a subquery (also known as an inner query or nested query) that involved NULL values.

I have a table users, like this:

And a table addresses, like this:

In addresses, the user_id column indicates the user -- if any -- to whom the address belongs. I want to find any users who don't have an address. In this example, that would be betty: her user_id, 4, doesn't appear anywhere in addresses. I thought this query would do the job:

select * from users where user_id not in

(select distinct user_id from addresses)

But that query returns no rows! Here's why. Note that the row in addresses with address_id = 5 has a NULL user_id. Therefore, the query is equivalent to:

select * from users where user_id not in (1, 2, 3, NULL)

This, in turn, is equivalent to:

select * from users where user_id <> 1 and user_id <> 2 and user_id <> 3

and user_id <> NULL

In SQL, no comparison to NULL is ever true when using the = or <> operator. So no row matches the query. (The IS operator is proper to use for comparisons to NULL.)

We can avoid this problem by excluding NULL values in our subquery:

select * from users where user_id not in

(select distinct user_id from addresses where user_id is not null)

Or by using COALESCE to convert any NULL values to something we can compare against an integer:

select * from users where user_id not in

(select distinct coalesce(user_id, -1) from addresses)

Either solution will cause the query to output betty's row of the addresses table, as desired.

ProgBlog

Tuesday, November 11, 2014

Null values can cause unexpected results in SQL subquery