I have two tables GCR_CONTACTS and GCR_EADDRESSES , which are linked on contact_id.
I need to select the first 500000 contact_id's of those contacts where there are duplicates on first name & email address (they can be different case, and have extra spaces).
I have written this query:
Contact_id is the primary key of GCR_CONTACTS.
Counts are:
GCR_CONTACTS : 10 million records
GCR_EADDRESSES : 18 million records
GCR_EADDRESSES where primary_eaddress_flag = 'Y' and eaddress_type = 'EMAIL' : 5 million records
Explain Plan (apologies for rubbish formatting):
Operation Object Name Rows Bytes Cost
SELECT STATEMENT Hint=CHOOSE 500 K 36 G
COUNT STOPKEY
FILTER
HASH JOIN 885 K 28 M 92265
TABLE ACCESS FULL GCR_CONTACTS 9 M 100 M 48082
VIEW index$_join$_003 17 M 384 M 31520
HASH JOIN 885 K 28 M 92265
INDEX FAST FULL SCAN GCR_EADDRS_CONTACT_ID_FK_I 17 M 384 M 4875
INDEX FAST FULL SCAN GCR_EADDRS_EADDRESS_I 17 M 384 M 4875
FILTER
SORT GROUP BY 886 36 K 73723
HASH JOIN 17 K 726 K 73644
VIEW index$_join$_006 1 M 51 M 21077
HASH JOIN 1 M 51 M
INDEX RANGE SCAN GCR_EADDR_FUNC2_I 1 M 51 M 9262.439453125
INDEX FAST FULL SCAN GCR_EADDRS_EADDRESS_I 1 M 51 M 9262.439453125
TABLE ACCESS FULL GCR_CONTACTS 9 M 100 M 48082
There are indexes on contact_id,eaddress_type and primary_eaddress_flag.
The query is taking far too long to exexcute, any ideas as to how to change/tune it?
I need to select the first 500000 contact_id's of those contacts where there are duplicates on first name & email address (they can be different case, and have extra spaces).
I have written this query:
Code:
select * from
(select c.contact_id,'same first_name||email'
from cccadm.gcr_contacts c, cccadm.gcr_eaddresses e
where c.contact_id = e.contact_id
and exists
(select 'x' from
(select upper(trim(first_name))||upper(trim(eaddress)) as first_name_email
from cccadm.gcr_contacts c, cccadm.gcr_eaddresses e
where e.contact_id = c.contact_id
and e.eaddress_type ='EMAIL' and primary_eaddress_flag = 'Y'
group by upper(trim(first_name))||upper(trim(eaddress))
having count(distinct c.contact_id) > 1) z
where z.first_name_email = upper(trim(c.first_name))||upper(trim(e.eaddress))))
where rownum <= 500000
Contact_id is the primary key of GCR_CONTACTS.
Counts are:
GCR_CONTACTS : 10 million records
GCR_EADDRESSES : 18 million records
GCR_EADDRESSES where primary_eaddress_flag = 'Y' and eaddress_type = 'EMAIL' : 5 million records
Explain Plan (apologies for rubbish formatting):
Operation Object Name Rows Bytes Cost
SELECT STATEMENT Hint=CHOOSE 500 K 36 G
COUNT STOPKEY
FILTER
HASH JOIN 885 K 28 M 92265
TABLE ACCESS FULL GCR_CONTACTS 9 M 100 M 48082
VIEW index$_join$_003 17 M 384 M 31520
HASH JOIN 885 K 28 M 92265
INDEX FAST FULL SCAN GCR_EADDRS_CONTACT_ID_FK_I 17 M 384 M 4875
INDEX FAST FULL SCAN GCR_EADDRS_EADDRESS_I 17 M 384 M 4875
FILTER
SORT GROUP BY 886 36 K 73723
HASH JOIN 17 K 726 K 73644
VIEW index$_join$_006 1 M 51 M 21077
HASH JOIN 1 M 51 M
INDEX RANGE SCAN GCR_EADDR_FUNC2_I 1 M 51 M 9262.439453125
INDEX FAST FULL SCAN GCR_EADDRS_EADDRESS_I 1 M 51 M 9262.439453125
TABLE ACCESS FULL GCR_CONTACTS 9 M 100 M 48082
There are indexes on contact_id,eaddress_type and primary_eaddress_flag.
The query is taking far too long to exexcute, any ideas as to how to change/tune it?