The short answer is that when you specify a column from TableB in the where clause this can happen because of how nulls are handled differently between the on and where clauses. To better understand the issue, let’s look at an example.
Let’s assume you have two tables that you are joining as shown below.
| TableB | ID | PartNumber | DateCreated | |
Assume TableA has 3 rows of data and TableB has 5.
TableA rows:
ID | PartNumber |
1 | P1234 |
2 | P5678 |
3 | P9012 |
TableB rows:
ID | PartNumber | DateCreated |
1 | P1234 | 2012-02-10 |
2 | P1234 | 2012-02-09 |
3 | P5678 | 2012-02-09 |
4 | P5678 | 2012-02-10 |
5 | P9876 | 2012-02-10 |
select * from TableA as A left outer join TableB as B on (A.PartNumber= B.PartNumber)
This would give the result
A.ID | A.PartNumber | B.ID | B.PartNumber | B.DateCreated |
1 | P1234 | 1 | P1234 | 2012-02-10 |
1 | P1234 | 2 | P1234 | 2012-02-09 |
2 | P5678 | 3 | P5678 | 2012-02-09 |
2 | P5678 | 4 | P5678 | 2012-02-10 |
3 | P9012 | null | null | null |
No problems so far. Everything looks good and it straight forward.
Now we decide that actually we should only be considering records in TableB that have a DateCreated = ‘2012-02-10’.
This is where it is easy to make a mistake.
The WRONG Way
select * from TableA as A left outer join TableB as B on (A.PartNumber= B.PartNumber)
where DateCreated = ‘2012-02-10’
This will work just like this inner join
select * from TableA as A inner join TableB as B on (A.PartNumber= B.PartNumber)
where DateCreated = ‘2012-02-10’
Both return:
A.ID | A.PartNumber | B.ID | B.PartNumber | B.DateCreated |
1 | P1234 | 1 | P1234 | 2012-02-10 |
2 | P5678 | 4 | P5678 | 2012-02-10 |
Why?
Think about what you have asked SQL Server to do. You have said select all rows from TableA and any matching ones in TableB, but you have also said only show records in TableB that have the DateCreated = ‘2012-02-10’. This means that you have told it to remove any rows where DateCreated is null. Note that one row has a null for DateCreated because there is no matching record in TableB for part number P9012.
The RIGHT Ways
There are at least three ways to solve this problem.
Solution 1: Move where clause into the on clause
select * from TableA as A left outer join TableB as B on (A.PartNumber= B.PartNumber and DateCreated = ‘2012-02-10’)
I like this one because it is easy to read, though the implications of moving DateCreated to the where clause isn’t quite as clear IMHO.
Solution 2: Use an embedded select statement
select * from TableA as A left outer join (select * from TableB where DateCreated = ‘2012-02-10’) as B on (A.PartNumber= B.PartNumber)
I like this one because it is clear what is being selected in TableB, but it is a bit messier reading IMHO.
Solution 3: Add check for null in where clause
select * from TableA as A inner join TableB as B on (A.PartNumber= B.PartNumber)
where (DateCreated = ‘2012-02-10’ or DateCreated is null)
This is my least favorite because it uses an OR in the where clause which can have a big negative impact on your performance and there is no need for it. This is particularly true if it was something like DateCreated = (select max(DateCreated from TableB)).
Conclusion
I suspect I may have created this very issue many times over the years and not even realized it when doing quick adhoc queries that I didn’t really validate my data very well. Interestingly the results may seem reasonable if you are looking at a lot of rows and are expecting most cases to have matching rows in TableA. The easiest way to tell if you have this problem with your query is to check the number of rows you are getting is the same as the table on the left (TableA in this case). If you have less rows then you may have this problem (or some other problem).