Friday, February 24, 2012

Is Full Text Struggling...

H there,
We have a query that is taking too long to run, which uses Full Text
(MSSQL2000).
The query below, when using this clause takes between 11 seconds and 10
minutes! QA thnks that the full text search will cost 70% of the query cost.
CONTAINS( Article_text, '("Food" AND "Supermarkets") OR ("CITIZEN CARD") OR
("CJD") OR ("E Coli") OR ("E-Coli") OR ("Food Additives") OR ("Food Safety")
OR ("Genetically Modified Foods") OR ("Kwik Save") OR ("Proof of age card")
OR ("Somerfield") OR ("Supermarkets") OR ("Wine Reviews") AND NOT ("Ahold")
AND NOT ("Beth Israel") AND NOT ("European equity preview") AND NOT
("European stocks may decline") AND NOT ("European stocks may rise") AND NOT
("mediaplex") AND NOT ("UK Stocks Factors")'))
The query below, when using this clause, takes only between 0.02 secs and 2
secs. QA thnks that the full text search will cost 50% of the query cost.
CONTAINS( Article_text, '("alcopops" AND "advertising") OR ("alcopops" AND
"culture") OR ("alcopops" AND "designated driver initiative") OR ("alcopops"
AND "drink driving") OR ("alcopops" AND "legislation") OR ("alcopops" AND
"price") OR ("alcopops" AND "pricing") OR ("alcopops" AND "underage
driving")')
Why is the second query instant and the first taking ages?
Sometimes, the longer queries such as the first one here can take *much*
longer to run (like, 5 minutes). We're hoping that we can get even the
queries with more expressions to run inside a few seconds. In fact, they did
when we had < 400,000 rows.
Is it possible that our full text indexing is just set up wrong, or that our
hardware isn't sufficient?
Some additional facts that may help...
The table (and FT index) only have 800,000 rows
We run 116 of these queries in a row, directly after each other.
We've cleared out all stop words, since we want to index on anything.
The server has 1GB RAM, single P4 processor, 8GB free space across 2 raid
disks.
Whilst the queries are running, I'm not seeing massive memory use.
Any help much appreciated. Please let me know if you need more info.
Tobes
I would suspect its all the search arguments and Boolean logic you have
which is causing the problems especially the AND NOTs.
Note that this
CONTAINS( Article_text, '("alcopops" AND "advertising") OR ("alcopops" AND
"culture") OR ("alcopops" AND "designated driver initiative") OR
("alcopops"
AND "drink driving") OR ("alcopops" AND "legislation") OR ("alcopops" AND
"price") OR ("alcopops" AND "pricing") OR ("alcopops" AND "underage
driving")')
is equivalent to the simpler
CONTAINS( Article_text, '"alcopops" AND ("advertising" OR "culture" OR
"designated driver initiative" OR "drink driving" OR "legislation" OR
"price" OR "pricing" OR "underage driving")')
Revisting the AND NOTs, basically the way this is processed is all matches
are returned for the first part
("Food" AND "Supermarkets") OR ("CITIZEN CARD") OR ("CJD") OR ("E Coli") OR
("E-Coli") OR ("Food Additives") OR ("Food Safety") OR ("Genetically
Modified Foods") OR ("Kwik Save") OR ("Proof of age card") OR
("Somerfield") OR ("Supermarkets") OR ("Wine Reviews")
and then you trim rows which contain
AND NOT ("Ahold") AND NOT ("Beth Israel") AND NOT ("European equity
preview") AND NOT ("European stocks may decline") AND NOT ("European stocks
may rise") AND NOT
("mediaplex") AND NOT ("UK Stocks Factors")'))
This trimming is very expensive.
One thing you might do is sp_fulltext_service 'resource_usage' to 5. This
might help slightly.
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
"Tobin Harris" <tobin@._do_not_spam_tobinharris.com> wrote in message
news:42af07ca$0$2588$da0feed9@.news.zen.co.uk...
> H there,
> We have a query that is taking too long to run, which uses Full Text
> (MSSQL2000).
> The query below, when using this clause takes between 11 seconds and 10
> minutes! QA thnks that the full text search will cost 70% of the query
cost.
> CONTAINS( Article_text, '("Food" AND "Supermarkets") OR ("CITIZEN CARD")
OR
> ("CJD") OR ("E Coli") OR ("E-Coli") OR ("Food Additives") OR ("Food
Safety")
> OR ("Genetically Modified Foods") OR ("Kwik Save") OR ("Proof of age
card")
> OR ("Somerfield") OR ("Supermarkets") OR ("Wine Reviews") AND NOT
("Ahold")
> AND NOT ("Beth Israel") AND NOT ("European equity preview") AND NOT
> ("European stocks may decline") AND NOT ("European stocks may rise") AND
NOT
> ("mediaplex") AND NOT ("UK Stocks Factors")'))
> The query below, when using this clause, takes only between 0.02 secs and
2
> secs. QA thnks that the full text search will cost 50% of the query cost.
> CONTAINS( Article_text, '("alcopops" AND "advertising") OR ("alcopops" AND
> "culture") OR ("alcopops" AND "designated driver initiative") OR
("alcopops"
> AND "drink driving") OR ("alcopops" AND "legislation") OR ("alcopops" AND
> "price") OR ("alcopops" AND "pricing") OR ("alcopops" AND "underage
> driving")')
> Why is the second query instant and the first taking ages?
> Sometimes, the longer queries such as the first one here can take *much*
> longer to run (like, 5 minutes). We're hoping that we can get even the
> queries with more expressions to run inside a few seconds. In fact, they
did
> when we had < 400,000 rows.
> Is it possible that our full text indexing is just set up wrong, or that
our
> hardware isn't sufficient?
> Some additional facts that may help...
> The table (and FT index) only have 800,000 rows
> We run 116 of these queries in a row, directly after each other.
> We've cleared out all stop words, since we want to index on anything.
> The server has 1GB RAM, single P4 processor, 8GB free space across 2 raid
> disks.
> Whilst the queries are running, I'm not seeing massive memory use.
> Any help much appreciated. Please let me know if you need more info.
> Tobes
>
>
|||Hi Hilary,
Thank you for the reply. Yes, the queries run significantly quicker without
the NOTs. I had tried the other boolen expressions in a more compact form,
but that made little difference unfortunately. I guess the full text search
engine may do it's own optimising to clean up our verbose queries!
In two weeks we'll be throwing more hardware at the program (doubling ram,
increasing disk capacity, and introducing dual Xeon processors), so
hopefully that will make the situation better.
Our main problem is that we want scalability. At the moment we have 116
"projects", each with their own queries that run one by one. These are
taking hours to run (some queries quick, some looooong!). Do you think we
may benefit from running more than one query in parrallel? For example, have
two processes executing 58 queries each?
Thanks again for your help.
Tobes
"Hilary Cotter" <hilary.cotter@.gmail.com> wrote in message
news:ex7e8bQcFHA.720@.TK2MSFTNGP15.phx.gbl...
>I would suspect its all the search arguments and Boolean logic you have
> which is causing the problems especially the AND NOTs.
> Note that this
> CONTAINS( Article_text, '("alcopops" AND "advertising") OR ("alcopops" AND
> "culture") OR ("alcopops" AND "designated driver initiative") OR
> ("alcopops"
> AND "drink driving") OR ("alcopops" AND "legislation") OR ("alcopops" AND
> "price") OR ("alcopops" AND "pricing") OR ("alcopops" AND "underage
> driving")')
> is equivalent to the simpler
> CONTAINS( Article_text, '"alcopops" AND ("advertising" OR "culture" OR
> "designated driver initiative" OR "drink driving" OR "legislation" OR
> "price" OR "pricing" OR "underage driving")')
> Revisting the AND NOTs, basically the way this is processed is all matches
> are returned for the first part
> ("Food" AND "Supermarkets") OR ("CITIZEN CARD") OR ("CJD") OR ("E Coli")
> OR
> ("E-Coli") OR ("Food Additives") OR ("Food Safety") OR ("Genetically
> Modified Foods") OR ("Kwik Save") OR ("Proof of age card") OR
> ("Somerfield") OR ("Supermarkets") OR ("Wine Reviews")
> and then you trim rows which contain
> AND NOT ("Ahold") AND NOT ("Beth Israel") AND NOT ("European equity
> preview") AND NOT ("European stocks may decline") AND NOT ("European
> stocks
> may rise") AND NOT
> ("mediaplex") AND NOT ("UK Stocks Factors")'))
> This trimming is very expensive.
> One thing you might do is sp_fulltext_service 'resource_usage' to 5. This
> might help slightly.
> --
> Hilary Cotter
> Looking for a SQL Server replication book?
> http://www.nwsu.com/0974973602.html
> Looking for a FAQ on Indexing Services/SQL FTS
> http://www.indexserverfaq.com
> "Tobin Harris" <tobin@._do_not_spam_tobinharris.com> wrote in message
> news:42af07ca$0$2588$da0feed9@.news.zen.co.uk...
> cost.
> OR
> Safety")
> card")
> ("Ahold")
> NOT
> 2
> ("alcopops"
> did
> our
>

No comments:

Post a Comment