
Filtering out Duplicates from a Dataset

ICTECH (Technical User)

Hi All..

I've been searching the web for a way to filter out duplicate records in a dataset, with no luck. Can anyone help?

 
The easiest way to do it would be to change your query so that duplicate records aren't returned.

The other option would be to loop through the data table and build either a second table or a hash table, depending on whether you are only checking for a duplicate key value or for an entire duplicate row.
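
For the loop-and-hash-table route, a rough sketch might look something like this (the table and column names here are placeholders, not from your app):

Code:
'Rough sketch only - sourceTable and "KeyField" are placeholders;
'substitute your own table and key column.
Dim seenKeys As New Hashtable()
Dim filteredTable As DataTable = sourceTable.Clone() 'same schema, no rows

For Each row As DataRow In sourceTable.Rows
    Dim key As String = row("KeyField").ToString()
    If Not seenKeys.ContainsKey(key) Then
        seenKeys.Add(key, Nothing)
        filteredTable.ImportRow(row) 'keep only the first occurrence of each key
    End If
Next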

-Rick

----------------------

[monkey] I believe in killer coding ninja monkeys.[monkey]
 
Hi ThatRickGuy...

The fun part is that the dataset is not being filled by a query but by a scanner, so duplicates could be scanned. Hmm.. I guess it would have to check for duplicate key values and, if they are present, remove any additional records with the same data...
 
By scanner I'm assuming you mean a bar code reader. I sat here scratching my head wondering how you would store numerous scanned images/documents in a dataset.

If the data table is having a row added each time an item is scanned, you could check whether that row already exists prior to adding it. Use the DataTable.Select method to find any rows that match the row you are about to add.
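
A minimal sketch of that idea, using "PName" and SName.Text as placeholder column and textbox names:

Code:
'Minimal sketch - "PName" and SName.Text are placeholders;
'adjust to your own column/control names.
Dim matches() As DataRow = DataSet1.Tables(0).Select("PName = '" & SName.Text.Replace("'", "''") & "'")

If matches.Length = 0 Then
    'no matching row yet, so it is safe to add
    Dim r As DataRow = DataSet1.Tables(0).NewRow()
    r("PName") = SName.Text
    DataSet1.Tables(0).Rows.Add(r)
End If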

Or you could set the key field of your data table to be a primary key, then put the add-row code in a Try block and catch the primary key exception that would be thrown when a duplicate key is entered. That would probably be more efficient in the long run.
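
A rough sketch of that primary key + Try/Catch idea (again, "PName" and SName.Text are placeholder names; ConstraintException lives in System.Data):

Code:
'Rough sketch - set the primary key once, e.g. right after the table is built.
Dim table As DataTable = DataSet1.Tables(0)
table.PrimaryKey = New DataColumn() {table.Columns("PName")}

Try
    Dim r As DataRow = table.NewRow()
    r("PName") = SName.Text
    table.Rows.Add(r) 'throws if a row with this key already exists
Catch ex As System.Data.ConstraintException
    'a row with this key is already in the table - ignore the duplicate
End Try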

-Rick

----------------------

[monkey] I believe in killer coding ninja monkeys.[monkey]
 
Hi ThatRickGuy...

Thanks, I'll try the Try/Catch approach. Will let you know how it goes...

 
Hi ThatRickGuy..

Well, I tried the Try/Catch and that didn't work. An error message started popping up saying that the entry was already in the table..

Do you know of any function / routine that would go through the dataset and remove any duplicate entries?

Thanks....
 
Well, I tried the Try/Catch and that didn't work. An error message started popping up saying that the entry was already in the table..

Post your try/catch code. That is exactly what should be happening; you just don't need to display the exception.

-Rick

----------------------

[monkey] I believe in killer coding ninja monkeys.[monkey]
 
I would think that using a dataset to watch for duplicates would be considered excessive use of force. Why not just create a string of the key values and use instr() to look for duplicates?
-Karl

[red] Cursors, triggers, user-defined functions and dynamic SQL are an axis of evil![/red]
[green]Life's uncertain...eat dessert first...www.deerfieldbakery.com[/green]
 
I would think that using a dataset to watch for duplicates would be considered excessive use of force. Why not just create a string of the key values and use instr() to look for duplicates?

It's a matter of percentages. Let's say this scanning deal is similar to something like Regis or some company that inventories large retail stores/warehouses. A single person could easily scan 10,000 items in a shift. If the likelihood of a duplicate was low, say 10%, then proactive duplicate prevention (a hashtable/string list of keys) would run 90% of the time when it wasn't needed. It would also have the overhead of 9,000 unique keys in memory. Whereas the passive prevention would use no more memory and would only run when a duplicate add was attempted.

But an exception is slower than the equivalent code for the most part (i.e. checking whether something is Nothing is faster than catching the exception raised by trying to use something that is Nothing). So if you were only looking at a smaller sample (say 100 items) and your duplicate rate was higher (say 90%), then it would likely be more efficient to use a proactive method. Although it would still have the memory overhead, the smaller count of scans would mean that it wouldn't get as large. And since the duplicate rate is so high, the extra 10% of the time the proactive search runs when it isn't needed would likely cost less than the extra time taken by the exception handling.

Personally, in this case, I think exception handling will provide for a smoother, more elegant piece of code.

-Rick

----------------------

[monkey] I believe in killer coding ninja monkeys.[monkey]
 
I wouldn't consider the exception-handling methodology a more efficient or elegant technique (rather a pedantic one), even when the duplication rate is low. The absence of an exception doesn't mean that no code was run...it was simply run by another process.
The other, more important issues to consider are:
[ul][li]What is the effect/implication of two scans of the same item?[/li]
[li]Is there a multi-user (scanner) issue regarding duplicates? This could indeed make the exception-handling approach with a persisted dataset a necessity.[/li][/ul]

Creating a non-persisted dataset will always be more resource intensive (memory and CPU cycles) than building your own string. However, should memory become an issue, I would then consider a simple old-fashioned ISAM-style file with 1-byte records that uses the key value as the index to locate and mark the byte.
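
If it came to that, a rough sketch of the flag-file idea (this assumes the scanned key is a reasonably small integer that can be used directly as a byte offset; scannedKey and the file name are placeholders):

Code:
'Rough sketch only - scannedKey stands in for whatever value the scanner returns.
Dim keyOffset As Long = CLng(scannedKey)
Dim fs As New IO.FileStream("seen.dat", IO.FileMode.OpenOrCreate)

fs.Seek(keyOffset, IO.SeekOrigin.Begin)
If fs.ReadByte() <> 1 Then
    'first time this key has been seen: flag the byte and process the item
    fs.Seek(keyOffset, IO.SeekOrigin.Begin)
    fs.WriteByte(1)
End If
fs.Close()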
-Karl



[red] Cursors, triggers, user-defined functions and dynamic SQL are an axis of evil![/red]
[green]Life's uncertain...eat dessert first...www.deerfieldbakery.com[/green]
 
Hi ThatRickGuy & donutman...

The application is single user, but the user could (and in most cases probably will) scan duplicate items. What I was thinking was to scan the dataset after the user had finished scanning and take out any duplicates that may be present before passing the dataset off to be processed.

Memory in the machine isn't a concern at this time.

ThatRickGuy, here is the "try" code from my app..
Code:
Dim r As DataRow
Try
 r = DataSet1.Tables(0).NewRow()
 r("PName") = SName.Text
 DataSet1.Tables(0).Rows.Add(r)
Catch
 'empty catch - any exception should just be swallowed here
End Try

The error from the Try was not being caught by any code I placed after the Catch statement. It was an unhandled exception, because the product name was already in the dataset, and the app stopped..
 
Interesting. The duplicate key exception should be thrown at the DataSet1.Tables(0).Rows.Add(r) line, which should be caught in that try/catch block. Why it would be unhandled I don't know.

donut's plan should work if you can't get that unhandled exception to behave.

Inside your loop, use String.IndexOf() to determine whether the key field is already in a comma-delimited string (i.e. If sListOfKeys.IndexOf(KeyValue) >= 0).

If it's not: add the key field to the comma-delimited string and add the row to the table.
If it is: don't add the row.

-Rick

----------------------

[monkey] I believe in killer coding ninja monkeys.[monkey]
 
Hi ThatRickGuy..

Do you have an example? I'm having a blonde moment and can't see how to work it in... :(
 
Assuming SName.text is the primary key of the data table
Code:
Dim sListOfKeys As String = ","  'start with a delimiter so IndexOf works on an empty list
Dim KeyValue As String
Dim r As DataRow

...

KeyValue = SName.Text

'wrap the key in commas so "12" doesn't match "123"
If sListOfKeys.IndexOf("," & KeyValue & ",") = -1 Then
 'the key was not found in the string
 sListOfKeys &= KeyValue & ","

 r = DataSet1.Tables(0).NewRow()
 r("PName") = KeyValue
 DataSet1.Tables(0).Rows.Add(r)
Else
 'SName was already entered, do not add
End If

...

-Rick

----------------------

[monkey] I believe in killer coding ninja monkeys.[monkey]
 
Hey ThatRickGuy...

Well, it looks like it's working.. Thank you very much...

 