From http://www.sql-server-performance.co...sis_tuning.asp
B) Can standard JOIN's be replaced with EXISTS statements?
If you need to join tables to ensure XYZ value exists or a specific set of
rows exists, then considering porting to EXISTS or NOT EXISTS statements.
At the same time is may also alleviate the need for DISTINCT clauses.
So is a Join really potentiall slower?
The Exists clause can be faster than a join because it just searches long
enough to find the first "Hit" as opposed to returning all rows that meet
the join criteria.
Greg Jackson
PDX, Oregon
|||Could you please elaborate? An Exists will meet all the rows that meet the
query criteria. Given that the criteria should return the same data as a
join, how could it be faster?
"pdxJaxon" <GregoryAJackson@.Hotmail.com> wrote in message
news:eZWWYB7aFHA.3848@.TK2MSFTNGP10.phx.gbl...
> The Exists clause can be faster than a join because it just searches long
> enough to find the first "Hit" as opposed to returning all rows that meet
> the join criteria.
>
> Greg Jackson
> PDX, Oregon
>
|||I say
SELECT * FROM Main
WHERE Exists(SELECT 1 FROM MyJoinTable a WHERE a.foo = main.foo)
then it will only check for the first hit.
still probably not making sense.
show us YOUR query and I can probably explain a bit better
GAJ
|||create table #Table1(T1C1 int identity (1,1), T1C2 char(10))
create table #Table2(T2C1 int, T2C2 char(10))
insert into #Table1 (T1C2) values ('T1C2')
insert into #Table1 (T1C2) values ('T1C2-2')
insert into #Table2 (T2C1,T2C2) values (1,'T1C2')
select t1.T1C1,T1C2
from #Table1 t1
inner join #Table2 t2 on t1.t1c1 = t2.t2c1
or
select t1.T1C1,T1C2
from #table1 t1
where exists
(select * from #Table2 t2
where t2.T2C1 = T1.T1C1)
"pdxJaxon" <GregoryAJackson@.Hotmail.com> wrote in message
news:%23fZBCd7aFHA.464@.TK2MSFTNGP15.phx.gbl...
>I say
> SELECT * FROM Main
> WHERE Exists(SELECT 1 FROM MyJoinTable a WHERE a.foo = main.foo)
> then it will only check for the first hit.
> still probably not making sense.
> show us YOUR query and I can probably explain a bit better
>
> GAJ
>
|||that's faster than a Join.
the where clause will return TRUE as soon as it finds a SINGLE record in
TABLE 2 that mathces Table1 (It will not bother returning more than 1 record
that matches to table 1)
A join will return ALL Records in T2 that match T1. IF there are millions of
records in each table, this would be exponentially faster.
Greg Jackson
PDX, Oregon
|||On Tue, 7 Jun 2005 15:11:21 -0700, ChrisR wrote:
>From http://www.sql-server-performance.co...sis_tuning.asp
>B) Can standard JOIN's be replaced with EXISTS statements?
>If you need to join tables to ensure XYZ value exists or a specific set of
>rows exists, then considering porting to EXISTS or NOT EXISTS statements.
>At the same time is may also alleviate the need for DISTINCT clauses.
>
>So is a Join really potentiall slower?
Hi Chris,
The question is actually irrelevant. A join is different: a query such
as
SELECT a.*
FROM a
WHERE EXISTS (SELECT * FROM b WHERE b.xx = a.xx)
will never return more than one copy of each row in a. Using a join
instead, such as this:
SELECT a.*
FROM a
JOIN b ON b.xx = a.xx
might return multiple copies of rows in a for values of xx that are not
unique in table b.
If no value of xx is ever duplicated in b, then the results will be the
same, but the EXISTS version might still be faster. Since SQL Server
doesn't know that there is only one row in b with a particular value of
xx, it still has to search the rest of the table after encoutering one;
if EXISTS is used, the search is aborted after the first match.
If SQL Server knows that values of b.xx are unique (because there is a
PRIMARY KEY or UNIQUE constraint on b.xx), then the performance MIGHT be
the same - the optimizer MIGHT decide to use the knowledge of this
constraint and createt the same execution plan it'll also use for the
EXISTS version.
I have never seen or heard of any situation where the JOIN would be
fafster than the "equivalent" EXISTS version. And I don't think I ever
will.
Best, Hugo
(Remove _NO_ and _SPAM_ to get my e-mail address)
|||Hi Hugo and Chris,
"Hugo Kornelis" wrote:
> On Tue, 7 Jun 2005 15:11:21 -0700, ChrisR wrote:
>
> Hi Chris,
> The question is actually irrelevant. A join is different: a query such
> as
> SELECT a.*
> FROM a
> WHERE EXISTS (SELECT * FROM b WHERE b.xx = a.xx)
> will never return more than one copy of each row in a. Using a join
> instead, such as this:
> SELECT a.*
> FROM a
> JOIN b ON b.xx = a.xx
> might return multiple copies of rows in a for values of xx that are not
> unique in table b.
> If no value of xx is ever duplicated in b, then the results will be the
> same, but the EXISTS version might still be faster. Since SQL Server
> doesn't know that there is only one row in b with a particular value of
> xx, it still has to search the rest of the table after encoutering one;
> if EXISTS is used, the search is aborted after the first match.
> If SQL Server knows that values of b.xx are unique (because there is a
> PRIMARY KEY or UNIQUE constraint on b.xx), then the performance MIGHT be
> the same - the optimizer MIGHT decide to use the knowledge of this
> constraint and createt the same execution plan it'll also use for the
> EXISTS version.
> I have never seen or heard of any situation where the JOIN would be
> fafster than the "equivalent" EXISTS version. And I don't think I ever
> will.
>
I think you have to try the statement both ways.
I'm not sure about EXISTS, but I have a fairly complex multi-way join that
is much faster with
a LEFT JOIN b on a.id = b.id ... WHERE b.id is null
than with
WHERE NOT EXISTS (select 1 from b where a.id = b.id)
Regards,
Jonathan
|||On Thu, 9 Jun 2005 12:52:03 -0700, Jonathan Levine wrote:
(snip)
>I'm not sure about EXISTS, but I have a fairly complex multi-way join that
>is much faster with
>a LEFT JOIN b on a.id = b.id ... WHERE b.id is null
>than with
>WHERE NOT EXISTS (select 1 from b where a.id = b.id)
Hi Jonathan,
Thanks for adding this. This is not the first time I witness or hear
reports of this either. It seems that the SQL Server development team
put most effort in the optimization of join strategies, since they are
used most often in typical queries.
Best, Hugo
(Remove _NO_ and _SPAM_ to get my e-mail address)
|||Also, be careful of the Optimizer Estimated Execution Plans. I especially
see this in comparisons between LEFT OUTER JOINS versus the use of EXISTS or
NOT EXISTS.
Even though the JOIN may produce a better ESTIMATED execution cost, it will
often create a HASH table for the JOIN, where the EXISTS solution will often
use a Nested Loop JOIN. The HASH will ALWAYS create temporary object, which
will have physical impact that the Optimizer does not take into account.
You're best bet would be to SET STATISTICS time and I/O on and run them both
ways, making sure to flush the Buffer Pool before each execution. This will
give you ACTUAL run information, not just ESTIMATED.
Sincerely,
Anthony Thomas
"Hugo Kornelis" <hugo@.pe_NO_rFact.in_SPAM_fo> wrote in message
news:mnfha11l8fukurbcocmi9rg30s47q0lj6j@.4ax.com...
On Thu, 9 Jun 2005 12:52:03 -0700, Jonathan Levine wrote:
(snip)
>I'm not sure about EXISTS, but I have a fairly complex multi-way join that
>is much faster with
>a LEFT JOIN b on a.id = b.id ... WHERE b.id is null
>than with
>WHERE NOT EXISTS (select 1 from b where a.id = b.id)
Hi Jonathan,
Thanks for adding this. This is not the first time I witness or hear
reports of this either. It seems that the SQL Server development team
put most effort in the optimization of join strategies, since they are
used most often in typical queries.
Best, Hugo
(Remove _NO_ and _SPAM_ to get my e-mail address)
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment