Generic delete of duplicate Recs

Schweiger · Aug 14, 2002

Based on the suggestions of jimstarr in a related topic I wrote a procedure to genericaly delete duplicate Records in any table. Say, I don't want to have to know what the table contains and I don't want to analyse the indexes and so on. I just want to delete duplicate records...
My code is:

Code:

&& Deletes all duplicate Records out of the open table
&& Returns true on success else false
LOCAL AllFields,nCnt,nMax,cTableName

IF .NOT. (ALIAS() == &quot;&quot;)
	USE DBF() EXCL
	AllFields = &quot;&quot;
	nMax = FLDCOUNT()
	nCnt = 1
	DO WHILE nCnt <= nMax
		AllFields = AllFields + FIELDS(nCnt) + &quot;,&quot;	
		nCnt = nCnt + 1
	ENDDO
	IF .NOT. (AllFields == &quot;&quot;)
		AllFields = SUBSTR(AllFields,1,LEN(AllFields)-1)
	ENDIF
	
	cTableName = ALIAS()
	SELECT * FROM (cTableName) INTO CURSOR mycursor GROUP BY &AllFields
	SELECT (cTableName)
	ZAP
	APPEND FROM DBF('mycursor')
	USE DBF()
	RETURN .T.
ELSE
	RETURN .F.
ENDIF

This works fine with small tables but with mulitple fields the SELECT-Statement leads to a critical failure and vfp crashs. Think the number of allowed fields in the group by clause is restricted... Does anybody know a solution for a generic version?

thanks in advance for any suggestions
Andreas

danceman · Aug 14, 2002

I think you have a misconception about finding duplicates.

this will only find duplicates if all the fields are duplicated. most cases are to find duplicates of only certain field(s).

select col1,col3, count(col1)as dup1 ,count(col3) as dup2;
from mytable
group by col1, col3
having dup1 > 1 and/or dup2 > 1

the col1 and col3 and the count is on the fields that you are looking for duplicates. Attitude is Everything

Schweiger · Aug 14, 2002

My problem is, that I don't know which fields make up the unique key of the tables in advance. So I would have to analyse the tables to find out which fields make a record unique. But this is exactly what I don't want, because it is time consuming and the solution has only to work till the new solution gets to work. Because the whole database-concept is bad (tables with 251 fields, redundancy, no documentation, ...) I don't want to analyse it at all. But I have to delete duplicate Records of about 130 tables I don't know exactly what they contain at all...
So I thought the best would be a generic routine I can call for each table and I wouldn't have to care about any content of the tables, and I could be really sure, that I don't delete a record that isn't really exactly the same.
I agree that it isn't good programming style but I would like a quick and dirty solution.
What I've found out till now is, that my code works with 199 fields but not with 200...

thank you all the same danceman
Andreas

danceman · Aug 14, 2002

to find a duplicate record across all fields

select * ,count(*) as dup from mytable having dup > 1

will provide a list of records that all fields are duplicated.

don't remember it, but I believe there is a function that returns the indexes on a table. Attitude is Everything

Schweiger · Aug 14, 2002

I just get one single line with

Code:

select * ,count(*) as dup from mytable having dup > 1

where dup shows up with the number of records in the table.
Perhaps I missunderstand the use of having...

Andreas

danceman · Aug 14, 2002

temp = afields('mytable')

ctemp = ''
for i = 1 to temp
ctemp = ctemp + alltrim(i)+','
next

in eiather the for loop or afterwards remove the trailing comma

select * ,count(*) as dup from mytable group by &ctemp having dup > 1
Attitude is Everything

baltman · Aug 14, 2002

Danceman is on the right path... you want to concatenate every field into 'one' and then do a select distinct statement. I've never tried this with 200+ fields though...

You'd have to treat numbers as:
alltr(str(NumberField,FieldWidth,DecimalCount)))

And Dates as:
alltr(dtoc(DateField))

And Text Fields as:
alltr(TextField)

e.g.
sele dist alltr(str(Number1,FieldWidth,DecimalCount)))+;
alltr(dtoc(Date1))+;
alltr(TextField),* group by 1 from TableName;
into table NoDups

This can be created using afields() in a 'for x = 1 to afields(temparray)' loop with if statements based on:
field name(temparray(x,1))
field type (temparray(x,2))
length (temparray(x,3))
decimals (temparray(x,4))

Brian

Schweiger · Aug 19, 2002

Neither suggestion works... I allways get stuck in a general failure... So I've decided to use my code to get a subset of possible duplicates and to fill them in a temporary table. And on this table I run the routine again on the rest of the fields from 200 further. As I insert the recno of the records in my temporary table I'm able to delete the duplicate records in my original table afterwards.
Thank you all the same for the suggestions

Andreas

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Generic delete of duplicate Recs

Schweiger

Programmer

danceman

Programmer

Schweiger

Programmer

danceman

Programmer

Schweiger

Programmer

danceman

Programmer

baltman

Technical User

Schweiger

Programmer

Similar threads

Part and Inventory Search

Sponsor