Help with tuning of a query 1

johnnybee · Jul 21, 2004

I have two tables GCR_CONTACTS and GCR_EADDRESSES , which are linked on contact_id.

I need to select the first 500000 contact_id's of those contacts where there are duplicates on first name & email address (they can be different case, and have extra spaces).

I have written this query:

Code:

select * from
(select c.contact_id,'same first_name||email'
from cccadm.gcr_contacts c, cccadm.gcr_eaddresses e 
where c.contact_id = e.contact_id
and exists
(select 'x' from
(select upper(trim(first_name))||upper(trim(eaddress)) as first_name_email 
from cccadm.gcr_contacts c, cccadm.gcr_eaddresses e 
where e.contact_id = c.contact_id
and e.eaddress_type ='EMAIL' and primary_eaddress_flag = 'Y'
group by upper(trim(first_name))||upper(trim(eaddress))
having count(distinct c.contact_id) > 1) z
where z.first_name_email = upper(trim(c.first_name))||upper(trim(e.eaddress))))
where rownum <= 500000

Contact_id is the primary key of GCR_CONTACTS.

Counts are:

GCR_CONTACTS : 10 million records
GCR_EADDRESSES : 18 million records
GCR_EADDRESSES where primary_eaddress_flag = 'Y' and eaddress_type = 'EMAIL' : 5 million records

Explain Plan (apologies for rubbish formatting):

Operation Object Name Rows Bytes Cost

SELECT STATEMENT Hint=CHOOSE 500 K 36 G
COUNT STOPKEY
FILTER
HASH JOIN 885 K 28 M 92265
TABLE ACCESS FULL GCR_CONTACTS 9 M 100 M 48082
VIEW index$_join$_003 17 M 384 M 31520
HASH JOIN 885 K 28 M 92265
INDEX FAST FULL SCAN GCR_EADDRS_CONTACT_ID_FK_I 17 M 384 M 4875
INDEX FAST FULL SCAN GCR_EADDRS_EADDRESS_I 17 M 384 M 4875
FILTER
SORT GROUP BY 886 36 K 73723
HASH JOIN 17 K 726 K 73644
VIEW index$_join$_006 1 M 51 M 21077
HASH JOIN 1 M 51 M
INDEX RANGE SCAN GCR_EADDR_FUNC2_I 1 M 51 M 9262.439453125
INDEX FAST FULL SCAN GCR_EADDRS_EADDRESS_I 1 M 51 M 9262.439453125
TABLE ACCESS FULL GCR_CONTACTS 9 M 100 M 48082

There are indexes on contact_id,eaddress_type and primary_eaddress_flag.

The query is taking far too long to exexcute, any ideas as to how to change/tune it?

LKBrwnDBA · Jul 21, 2004

Did you try something more simple, kinda like:

Code:

select contact_id, first_name_email, count(*)
  from (
select c.contact_id
     , upper(trim(first_name))||upper(trim(eaddress)) 
        as first_name_email
  from cccadm.gcr_contacts c, cccadm.gcr_eaddresses e 
 where c.contact_id = e.contact_id
   and eaddress_type ='EMAIL' 
   and primary_eaddress_flag = 'Y')
 group by contact_id, first_name_email
having count(*) > 1 
/

Check the explain plan and see if this answers your requirement.

----------------------------------------------------------------------------
The person who says it can't be done should not interrupt the person doing it. -- Chinese proverb

LKBrwnDBA · Jul 21, 2004

Maybe still more concise:

Code:

select * from (
select c.contact_id
     , upper(trim(first_name))||upper(trim(eaddress)) 
        as first_name_email
     , count(*)   
  from cccadm.gcr_contacts c, cccadm.gcr_eaddresses e 
 where c.contact_id = e.contact_id
   and eaddress_type ='EMAIL' 
   and primary_eaddress_flag = 'Y'
 group by contact_id, first_name_email
having count(*) > 1 )
where rownum <= 500000
/

HTH

----------------------------------------------------------------------------
The person who says it can't be done should not interrupt the person doing it. -- Chinese proverb

LKBrwnDBA · Jul 21, 2004

Maybe still more concise:

Code:

select * from (
select c.contact_id
     , upper(trim(first_name))||upper(trim(eaddress)) 
        as first_name_email
     , count(*)   
  from cccadm.gcr_contacts c, cccadm.gcr_eaddresses e 
 where c.contact_id = e.contact_id
   and eaddress_type ='EMAIL' 
   and primary_eaddress_flag = 'Y'
 group by contact_id, 
     , upper(trim(first_name))||upper(trim(eaddress)) 
having count(*) > 1 )
where rownum <= 500000
/

HTH

----------------------------------------------------------------------------
The person who says it can't be done should not interrupt the person doing it. -- Chinese proverb

johnnybee · Jul 21, 2004

Thanks for the reply, however, this does not quite solve the problem.

What I am after is those with duplicate first_name & email address, regardless of contact_id.

As you are grouping by both contact_id and first_name_email, your query will only pick out duplicates within contact_id.

Which I think is why my query ended up more complicated.

e.g. For this data...

1,'anemail@hotmail.com'
2,'anemail@hotmail.com'

I would need both contact_id's 1 & 2 returned.

LKBrwnDBA · Jul 21, 2004

Sorry, clicking on any of these does not work, only opens e-mail 'compose' window:

1,'anemail@hotmail.com'
2,'anemail@hotmail.com'

I would need both contact_id's 1 & 2 returned.

----------------------------------------------------------------------------
The person who says it can't be done should not interrupt the person doing it. -- Chinese proverb

LKBrwnDBA · Jul 21, 2004

Ooops, thought I had to click on the example...

----------------------------------------------------------------------------
The person who says it can't be done should not interrupt the person doing it. -- Chinese proverb

LKBrwnDBA · Jul 21, 2004

If it's a one time deal it may work faster if you do it in two parts:

a) Create an indexed table:

Code:

create table tmp_email_dups
(first_name_email 
,cnt
,constraint pk_tmp_email_dups primary key(first_name_email)
)
organization index pcttheshold 40 including cnt
overflow tablespace users nologging
as
select upper(trim(first_name))||upper(trim(eaddress)) 
          as first_name_email
     , count(*) cnt
  from cccadm.gcr_contacts c, cccadm.gcr_eaddresses e 
 where c.contact_id = e.contact_id
   and eaddress_type ='EMAIL' 
   and primary_eaddress_flag = 'Y'
 group by upper(trim(first_name))||upper(trim(eaddress)) 
having count(*) > 1 
/

b) Join the tables:

Code:

select * from 
  from cccadm.gcr_contacts c, cccadm.gcr_eaddresses e 
     , tmp_email_dups z
 where c.contact_id = e.contact_id
   and z.first_name_email = upper(trim(c.first_name))
                          ||upper(trim(e.eaddress))
   and rownum <= 500000
/

----------------------------------------------------------------------------
The person who says it can't be done should not interrupt the person doing it. -- Chinese proverb

johnnybee · Jul 22, 2004

Thats done the trick. Thank you and have a star

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Help with tuning of a query 1

johnnybee

Programmer

LKBrwnDBA

MIS

LKBrwnDBA

MIS

LKBrwnDBA

MIS

johnnybee

Programmer

LKBrwnDBA

MIS

LKBrwnDBA

MIS

LKBrwnDBA

MIS

johnnybee

Programmer

Similar threads

Part and Inventory Search

Sponsor