Here is the article.
It turns out "where exists" is faster, because as the explain plan reveals, a join performs a Merge Join step, while "where exists" can get away with a Merge Semi Join step. What do these steps mean? Let's turn to the documentation (Source):
As for Merge Semi Join, I can't find such a clear definition in the docs, but Dan Martensen's article (that I linked to above) mentions: "It’s worth noting Where Exists doesn’t need to return a value to the parent query in the SELECT. It’s WHERE clause compares parent to subquery ids using a boolean short circuit early exit."
merge join: Each relation is sorted on the join attributes before the join starts. Then the two relations are scanned in parallel, and matching rows are combined to form join rows. This kind of join is more attractive because each relation has to be scanned only once. The required sorting might be achieved either by an explicit sort step, or by scanning the relation in the proper order using an index on the join key.
So maybe that's a clue as for what goes on under the hood -- it just checks for the matching rows' existence then quits. No data retrieval required.
Perhaps I will soon check if MySQL behaves similarly, as my work will involve more MySQL soon!
No comments:
Post a Comment