Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations TouchToneTommy on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Index use and abuse...

Status
Not open for further replies.

marklenel

Technical User
Aug 29, 2003
46
GB
I wonder if you can help me with this one? I have two tables as follows:-


Firstly VP_JOURNEYS, about 1.5 million records, indexed on (Start, End):

CREATE TABLE VP_Journeys
(PersonID MEDIUMINT UNSIGNED NOT NULL,
Start SMALLINT UNSIGNED NOT NULL,
End SMALLINT UNSIGNED NOT NULL);
ALTER TABLE VP_Journeys ADD INDEX (Start, End);


...and RUN_JOURNEYS, about 30 thousand records, uniquely indexed on (Start, End):

CREATE TABLE Run_Journeys
(Start SMALLINT UNSIGNED NOT NULL,
End SMALLINT UNSIGNED NOT NULL,
GrossOTS DECIMAL(20,15) UNSIGNED NOT NULL,
NetOTS DECIMAL(20,15) UNSIGNED NOT NULL,
NonCoverage DECIMAL(20,15) UNSIGNED NOT NULL,
Invisibility DECIMAL(20,15) UNSIGNED NOT NULL);
ALTER TABLE Run_Journeys ADD PRIMARY KEY (Start, End);


These tables are used to populate a new table:-

INSERT INTO Run_CampaignAudience
SELECT
VP_Journeys.PersonID,
SUM(Run_Journeys.GrossOTS) AS GrossOTS,
SUM(Run_Journeys.NetOTS) AS NetOTS,
SUM(IFNULL(LN(Run_Journeys.NonCoverage),-999)) AS LnNonC,
SUM(IFNULL(LN(Run_Journeys.Invisibility),-999)) AS LnInvisibility
FROM VP_Journeys INNER JOIN Run_Journeys
ON (VP_Journeys.Start = Run_Journeys.Start)
AND (VP_Journeys.End = Run_Journeys.End)
WHERE Run_Journeys.GrossOTS IS NOT NULL
GROUP BY VP_Journeys.PersonID;


The query is running very slowly, about 2½ minutes. Using EXPLAIN on the SELECT portion of this statement suggests that it is not properly using the index in the Run_Journeys table.

Run_Journeys
possible_keys:- PRIMARY
key:- NULL
rows:- 30924
Extra:- Using where; Using temporary; Using filesort

VP_Journeys
possible_keys:- Start
key:- Start
rows:- 41
Extra:- Using where

Any clues or explanation as to why it appears not to be using the Run_Journeys index properly would be more than welcome. I'm reading Paul DuBois' book which has an *excellent* section on query optimisation, and working through a number of suggestions there (playing with KEY_BUFFER_SIZE, playing with alternate column formats, etc), but nothing has yet had the significant effect I'm hoping for.

Current unresolved thoughts are:-
? It's not just the INSERT statement (although this might need work), as the SELECT statement on its own takes a goodly amount of time
? Why is it still only using the Start column from VP_Journeys (or is that just the name of the index?)
? The query is returning more than 30% of the tables, indeed it would return near 100% of the run_journey table
 
Maybe the reason MySQL doesn't use the Run_Journeys index for your query is because it reckons it will be returning so many records that it would be faster not to use the index. However, you can force the use of a particular index by using the USE INDEX qualifier (see p. 888 of DuBois). For example:
[tt]
INSERT INTO Run_CampaignAudience
SELECT
VP_Journeys.PersonID,
SUM(Run_Journeys.GrossOTS) AS GrossOTS,
SUM(Run_Journeys.NetOTS) AS NetOTS,
SUM(IFNULL(LN(Run_Journeys.NonCoverage),-999)) AS LnNonC,
SUM(IFNULL(LN(Run_Journeys.Invisibility),-999)) AS LnInvisibility
FROM
VP_Journeys
INNER JOIN Run_Journeys USE INDEX PRIMARY
ON
(VP_Journeys.Start = Run_Journeys.Start)
AND (VP_Journeys.End = Run_Journeys.End)
WHERE Run_Journeys.GrossOTS IS NOT NULL
GROUP BY VP_Journeys.PersonID;
[/tt]

Since you did not give the VP_Journeys index a name, it uses the name of the first field used, in this case Start. If you wanted to name the index, you could have used:
[tt] ... ADD INDEX StartEnd (Start,End);[/tt]


-----
ALTER world DROP injustice, ADD peace;
 
Many thanks, Tony - you seem to be a regular troubleshooter on most of these posts! Unfortunately the query optimizer remained stubborn and refused to use the Run_Journeys index. Not to worry.

However, I have found that transferring most of these 'run-time' tables to HEAP format speeds things up significantly.

Which leads me on to my next question, for which I may do another post - is there way of measuring the table size for a MEMORY/HEAP table? It would be useful to allocate an appropriate max_heap_table_size function.

Cheers,
Mark
 
To find out the data size for a table (including a HEAP table), you can do:
[tt] SHOW TABLE STATUS like 'tblname'[/tt]
The field Data_length contains the size of the table (not including indexes).

-----
ALTER world DROP injustice, ADD peace;
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top