I wonder if you can help me with this one? I have two tables as follows:-
Firstly VP_JOURNEYS, about 1.5 million records, indexed on (Start, End):
CREATE TABLE VP_Journeys
(PersonID MEDIUMINT UNSIGNED NOT NULL,
Start SMALLINT UNSIGNED NOT NULL,
End SMALLINT UNSIGNED NOT NULL);
ALTER TABLE VP_Journeys ADD INDEX (Start, End);
...and RUN_JOURNEYS, about 30 thousand records, uniquely indexed on (Start, End):
CREATE TABLE Run_Journeys
(Start SMALLINT UNSIGNED NOT NULL,
End SMALLINT UNSIGNED NOT NULL,
GrossOTS DECIMAL(20,15) UNSIGNED NOT NULL,
NetOTS DECIMAL(20,15) UNSIGNED NOT NULL,
NonCoverage DECIMAL(20,15) UNSIGNED NOT NULL,
Invisibility DECIMAL(20,15) UNSIGNED NOT NULL);
ALTER TABLE Run_Journeys ADD PRIMARY KEY (Start, End);
These tables are used to populate a new table:-
INSERT INTO Run_CampaignAudience
SELECT
VP_Journeys.PersonID,
SUM(Run_Journeys.GrossOTS) AS GrossOTS,
SUM(Run_Journeys.NetOTS) AS NetOTS,
SUM(IFNULL(LN(Run_Journeys.NonCoverage),-999)) AS LnNonC,
SUM(IFNULL(LN(Run_Journeys.Invisibility),-999)) AS LnInvisibility
FROM VP_Journeys INNER JOIN Run_Journeys
ON (VP_Journeys.Start = Run_Journeys.Start)
AND (VP_Journeys.End = Run_Journeys.End)
WHERE Run_Journeys.GrossOTS IS NOT NULL
GROUP BY VP_Journeys.PersonID;
The query is running very slowly, about 2½ minutes. Using EXPLAIN on the SELECT portion of this statement suggests that it is not properly using the index in the Run_Journeys table.
Run_Journeys
possible_keys:- PRIMARY
key:- NULL
rows:- 30924
Extra:- Using where; Using temporary; Using filesort
VP_Journeys
possible_keys:- Start
key:- Start
rows:- 41
Extra:- Using where
Any clues or explanation as to why it appears not to be using the Run_Journeys index properly would be more than welcome. I'm reading Paul DuBois' book which has an *excellent* section on query optimisation, and working through a number of suggestions there (playing with KEY_BUFFER_SIZE, playing with alternate column formats, etc), but nothing has yet had the significant effect I'm hoping for.
Current unresolved thoughts are:-
? It's not just the INSERT statement (although this might need work), as the SELECT statement on its own takes a goodly amount of time
? Why is it still only using the Start column from VP_Journeys (or is that just the name of the index?)
? The query is returning more than 30% of the tables, indeed it would return near 100% of the run_journey table
Firstly VP_JOURNEYS, about 1.5 million records, indexed on (Start, End):
CREATE TABLE VP_Journeys
(PersonID MEDIUMINT UNSIGNED NOT NULL,
Start SMALLINT UNSIGNED NOT NULL,
End SMALLINT UNSIGNED NOT NULL);
ALTER TABLE VP_Journeys ADD INDEX (Start, End);
...and RUN_JOURNEYS, about 30 thousand records, uniquely indexed on (Start, End):
CREATE TABLE Run_Journeys
(Start SMALLINT UNSIGNED NOT NULL,
End SMALLINT UNSIGNED NOT NULL,
GrossOTS DECIMAL(20,15) UNSIGNED NOT NULL,
NetOTS DECIMAL(20,15) UNSIGNED NOT NULL,
NonCoverage DECIMAL(20,15) UNSIGNED NOT NULL,
Invisibility DECIMAL(20,15) UNSIGNED NOT NULL);
ALTER TABLE Run_Journeys ADD PRIMARY KEY (Start, End);
These tables are used to populate a new table:-
INSERT INTO Run_CampaignAudience
SELECT
VP_Journeys.PersonID,
SUM(Run_Journeys.GrossOTS) AS GrossOTS,
SUM(Run_Journeys.NetOTS) AS NetOTS,
SUM(IFNULL(LN(Run_Journeys.NonCoverage),-999)) AS LnNonC,
SUM(IFNULL(LN(Run_Journeys.Invisibility),-999)) AS LnInvisibility
FROM VP_Journeys INNER JOIN Run_Journeys
ON (VP_Journeys.Start = Run_Journeys.Start)
AND (VP_Journeys.End = Run_Journeys.End)
WHERE Run_Journeys.GrossOTS IS NOT NULL
GROUP BY VP_Journeys.PersonID;
The query is running very slowly, about 2½ minutes. Using EXPLAIN on the SELECT portion of this statement suggests that it is not properly using the index in the Run_Journeys table.
Run_Journeys
possible_keys:- PRIMARY
key:- NULL
rows:- 30924
Extra:- Using where; Using temporary; Using filesort
VP_Journeys
possible_keys:- Start
key:- Start
rows:- 41
Extra:- Using where
Any clues or explanation as to why it appears not to be using the Run_Journeys index properly would be more than welcome. I'm reading Paul DuBois' book which has an *excellent* section on query optimisation, and working through a number of suggestions there (playing with KEY_BUFFER_SIZE, playing with alternate column formats, etc), but nothing has yet had the significant effect I'm hoping for.
Current unresolved thoughts are:-
? It's not just the INSERT statement (although this might need work), as the SELECT statement on its own takes a goodly amount of time
? Why is it still only using the Start column from VP_Journeys (or is that just the name of the index?)
? The query is returning more than 30% of the tables, indeed it would return near 100% of the run_journey table