Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Chriss Miller on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Extreamly Slow Search?

Status
Not open for further replies.

michaeljiang

Programmer
Joined
Aug 4, 2007
Messages
5
Location
US
I have the following 2 tables

table pictureinarticle{
'id' int(10) unsigned NOT NULL auto_increment,
'articleid' int(10) unsigned NOT NULL,
'pictureid' int(10) unsigned NOT NULL,
PRIMARY KEY ('id'),
KEY 'pictureid' ('picutreid', 'articleid')
}

table article{
'id' int(10) unsigned NOT NULL auto_increment,
'timestamp' timestamp(14) NOT NULL,
PRIMARY KEY ('id'),
KEY 'timestamp' ('timestamp')
}

'articleid' in table 'pictureinarticle' is PRIMARY KEY 'id' in 'article'.

Now, I want to search total number of pictures in all articles that are in a time range START and END (both of TIMESTAMP type). It is a simple search as follows:

select count(pictureid) from pictureinarticle where articleid in ( select id from article where timestamp between START and END);

PROBLEM:
- the above search took about 45 seconds to finish.

Data:
- Number of articles in [START, END] is around 260,000, returned from 'select id from article where timestamp between START and END'.
- Number of records in table 'pictureinarticle' is around 1,400,000.
- Number of records returned by above select is around 520,000.

Platform:
Linux w/ 2.8GHz Interl Xeon.

Analysis:
- select articles in [START, END] is fast, no more than 1 second: select id from article where timestamp between START and END.
-it is 'select count(a.pictureid) from pictureinarticle a, article b where a.articleid in (...)' that actually cost 110 seconds.
This seems to be due to a poor indexed table? But I did have 'articleid' indexed in table 'pictureinarticle'.

EXPLAIN above query output:

*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: pictureinarticle
type: index
possible_keys: NULL
key: pictureid
key_len: 7
ref: NULL
rows: 1342998
Extra: Using where; Using index
*************************** 2. row ***************************
id: 2
select_type: DEPENDENT SUBQUERY
table: article
type: unique_subquery
possible_keys: PRIMARY,timestamp
key: PRIMARY
key_len: 3
ref: func
rows: 1
Extra: Using where

So, a 45 seconds to get around 520,000 results is rather slow. What may cause this slow search problem?

Any thoughts? Thanks!
 
Hi

That must be slow.
Code:
[b]select[/b]
count(*)

[b]from[/b] pictureinarticle [b]as[/b] pa
[b]inner join[/b] article [b]as[/b] a [b]where[/b] a.id=pa.articleid

[b]where[/b] a.timestamp [b]between[/b] [green][i]START[/i][/green] [b]and[/b] [green][i]END[/i][/green]

Feherke.
 
[q]select
count(*)

from pictureinarticle as pa
inner join article as a on a.id=pa.articleid

where a.timestamp between START and END[/q]

Thanks! In fact, I've also tried this and the following:

select count(*) from pictureinarticle pa, article a
where pa.articleid=a.id
and
a.timestamp between START and END

Both took similar amount of time, around 38 seconds, definitely better than the one posted in question. So, I guess that slow query is due to a all table scan to check a "IN" set operation. But a explicit "JOIN" does not improve much (well, yes, a 15% faster). Can we do better than this?
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top