I have a table users, like this:
And a table addresses, like this:
In addresses, the user_id column indicates the user -- if any -- to whom the address belongs. I want to find any users who don't have an address. In this example, that would be betty: her user_id, 4, doesn't appear anywhere in addresses. I thought this query would do the job:
select * from users where user_id not in
(select distinct user_id from addresses)
But that query returns no rows! Here's why. Note that the row in addresses with address_id = 5 has a NULL user_id. Therefore, the query is equivalent to:
select * from users where user_id not in (1, 2, 3, NULL)
This, in turn, is equivalent to:
select * from users where user_id <> 1 and user_id <> 2 and user_id <> 3
and user_id <> NULL
In SQL, no comparison to NULL is ever true when using the = or <> operator. So no row matches the query. (The IS operator is proper to use for comparisons to NULL.)
We can avoid this problem by excluding NULL values in our subquery:
select * from users where user_id not in
(select distinct user_id from addresses where user_id is not null)
Or by using COALESCE to convert any NULL values to something we can compare against an integer:
select * from users where user_id not in
(select distinct coalesce(user_id, -1) from addresses)
Either solution will cause the query to output betty's row of the addresses table, as desired.