Remove/Delete an old record that duplicates the latest

TheLazyPig · Jul 6, 2020

Hi!

I wanted to remove/delete an old record that duplicates the latest. The only difference are the amount and asof date.

Code:

SELECT * FROM region WHERE rg = 'NCR' AND LEFT(DTOS(asof),6) = '202005' ;
UNION ;
SELECT * FROM region WHERE rg = 'NCR' AND LEFT(DTOS(asof),6) = '201908' ;

And I get this result...

I wanted to remove/delete asof that are from August 2019.
So that asof May 2020 is only the existing record for region NCR.
There will be records that are not matched with the latest. It should not be deleted.

Thank you! [smile]

Olaf Doschke · Jul 6, 2020

Looks like your records don't have a primary key, an id. Anything that updates data (and deletion is just a special update), like an updatable view or any data access layer of an application framework will ask you to have a primary key to be able to address records. Just a first observation. Doesn't mean it's impossible.

I would be cautious, though. Top rows look like you always have more recent records for the same employees, but what about some that have no May 2020 record, as they left the company in between? You don't want to keep their "latest" data - in this context of just picking these two arbitrary months?

A simple way of keeping all latest net pays is to first find all those records grouped by empno with max(asof) and use them as reference:

Code:

Create Cursor crsNetpay (id int autoinc, empno I, netpay Y, asof D)
Insert into crsNetpay (empno, netpay, asof) values (1234, 34000, Date(2019,8,31))
Insert into crsNetpay (empno, netpay, asof) values (1234, 46000, Date(2019,12,31))
Insert into crsNetpay (empno, netpay, asof) values (1234, 58000, Date(2020,3,31))
Insert into crsNetpay (empno, netpay, asof) values (1234, 70000, Date(2020,3,31))
Insert into crsNetpay (empno, netpay, asof) values (1234, 78000, Date(2020,5,31))

Insert into crsNetpay (empno, netpay, asof) values (5678, 44000, Date(2019,8,31))
Insert into crsNetpay (empno, netpay, asof) values (5678, 56000, Date(2019,12,31))
Insert into crsNetpay (empno, netpay, asof) values (5678, 68000, Date(2020,3,31))
Insert into crsNetpay (empno, netpay, asof) values (5678, 80000, Date(2020,3,31))
Insert into crsNetpay (empno, netpay, asof) values (5678, 88000, Date(2020,5,31))

* your query:
SELECT * FROM crsNetpay WHERE LEFT(DTOS(asof),6) = '202005' ;
UNION ;
SELECT * FROM crsNetpay WHERE LEFT(DTOS(asof),6) = '201908' ; 
into cursor Andwhatnow

* a glimpse on the data you want to keep (latest per empno):
Select crsNetPay.* from crsNetPay ;
inner join (Select empno, Max(asof) as maxasof from crsNetPay group by empno) subquery;
on subquery.empno = crsNetPay.empno;
where crsNetPay.asof=subquery.maxasof;
into cursor tokeep

* And now delete anything else:
Delete from crsNetPay where id not in;
(Select crsNetPay.id from crsNetPay ;
inner join (Select empno, Max(asof) maxasof from crsNetPay group by empno) subquery;
on subquery.empno = crsNetPay.empno;
where crsNetPay.asof=subquery.maxasof)

* for comparison and check plan=result:
Select tokeep
Browse name o1 nowait

* the data now showing deletion marks for all deleted rows
Select crsNetPay
Go top
Set Deleted off
Browse name o2 Nowait

o1.top = 0
o1.height = 4*o1.rowheight+Sysmetric(9)
o2.top = o1.height

1. As you don't have an id, it gets much more complicated to use something like where id not in, you have to match a lot of fields. And you can't do where(field3, field5) not in Select field3, fields from...), so you need to invent some identifier of records.
2. This now needs to be limited to the two dates, if you don't want to purge all past records. Also, you want to group more complex than just by empno, including region, etc.
3. VFP is capable of such subqueries. But that mostly only came into VFP9. Better have VFP9, I guess you do.

Bye, Olaf.

Olaf Doschke Software Engineering

https://www.doschke.name

Olaf Doschke · Jul 6, 2020

The real question is: Why do you keep an old record in the same table? You can always update netpay and only have one record. You can always copy over old records into a history table if you need to conserve this data for a certain time.

Bye, Olaf.

Olaf Doschke Software Engineering

https://www.doschke.name

TheLazyPig · Jul 6, 2020

I don't know why but other region only have two records per empno while NCR have multiple. This project was passed by me without even teaching the process so I had to learn it myself. [sad]

Olaf Doschke · Jul 6, 2020

You take it personal, dont't take it personal. This is criticism of the situation and data, in this case, and goes to anyone it concerns. And I lay it down clearly: Tables without a key are a no go.

The reasoning can be as simple as this one region using an outdated version of the software.

Don't just mend symptoms.

Bye, Olaf.

Olaf Doschke Software Engineering

https://www.doschke.name

TheLazyPig · Jul 6, 2020

The exe file that I'm running suppose to migrate data from foxpro to oracle. So there's an exe file that runs the process but an error occurred because of duplicate records. It want me to fix it first before running again the program. I only encounter this problem when region NCR is being process. I already checked the data they gave me but there's no duplicate. I think the .prg I run before I run the exe file is the problem or not. [sadeyes]

Olaf Doschke · Jul 6, 2020

I can't decide that from here, but how should duplicates slip in with new netpay amounts, you won't invent random data.

If original data has no duplicates and data just before migration into Oracle has, then check out where that starts.
introduce a monitoring query finding such duplicates as

Code:

Select rg,div,sta,empno from region group by rg,div,sta,empno having count(*)>1

If there is any such record you know previous steps introduced it.

Is it really in the dbfs you insert after your own preprocessing or is it already in Oracle? Do you get duplicates in Oracle because you don't update the Oracle data but always only insert new rows?

Bye, Olaf.

Olaf Doschke Software Engineering

https://www.doschke.name

AndrewMozley · Jul 6, 2020

Hello TLP

One possible method would be to create a cursor from your ‘Region’ table, and order that cursor by Employee + Date DESCENDING (taking the employee number as a character (C) field.)

Then work through the cursor, deleting all but the latest record for each employee from your original table.

Something like this :

Code:

SELECT *, RECNO() AS nRecno FROM Region INTO CURSOR cRegion
INDEX ON Empno + DTOS(Asof) DESCENDING TAG Idx1
lEmpno = "zzzz"
SCAN
   IF cRegion.Empno = lEmpno
      SELECT Region
      LOCATE FOR RECNO() = cRegion.nRecno
      DELETE
      ENDIF
   lEmpno = cRegion.Empno
   ENDSCAN

SELECT Region
*  SET DELETED OFF  -  in case you want to see all the records.
BROWSE

Olaf Doschke · Jul 6, 2020

Hello Andrew, TLP,

that code may be useful, as you can't use my query without an ID column.

Just a remark: You can use RECNO() to have a key value for the rows, which helps in general, not only with my subquery, as you see also in Andrews' scan loop. But as Andrew did you better first read all data including RECNO() and materialize it as in a materialized view, if you know that concept from Oracle. Because in complex queries with joins using RECNO(() is just a recipe for chaos, I spare you the details why the VFP SQL engine is your enemy in this, even when you try to be precise with alias names. It's okay to use it that way in a simple SELECT *, RECNO() FROM sometable, though. So you can get the data with an ID value.

That ar least works for one such session. Never be sure a RECNO() is the same for the same record in the long term, though. When you purge deleted rows from a DBF permanently, then you PACK it, and at that time RECNO()s are newly assigned.

It's surely a valid key during such processing of a full DBF exclusively for the Oracle ETL process, so you can LOCATE RECNO()=nRecno as Andrew does, because RECNO()s won't even change when you delete rows from the regions.dbf, until you not PACK the region.DBF.

And another thing:

I don't know but assume you know Oracle's SQL dialect better, then I'd always insert all rows into a separate Oracle temp table or staging table of a staging database for new/changed data, insert the DBF data fully and then decide from within the Oracle world, whether rows of the staging table should be updating existing Oracle data or be inserted into the Oracle table you want them to end in. It makes VFP just upload its data and then handle it from there with all tools available within Oracöe or whatever RDBMS.

I know Oracle offers a MERGE statement, that turns to be an update when a matching record is found or an insert if it's not matched and thus gets a new id. But from the staged data. So the system you import from will be consistent in the next data import step.

The best solution will be to introduce IDs, though.

Bye, Olaf.

Olaf Doschke Software Engineering

https://www.doschke.name

Olaf Doschke · Jul 6, 2020

Just to see for yourself, how sensitive queries with RECNO() are and can go wrong.
Especially using the explicit alias names:

Code:

Open Database (Home()+"\Samples\Northwind\northwind.dbc")
Use orders
Select *, Recno("orders") as nrecno from orders Into Cursor Allrecno1

Close tables all
Select *, Recno("orders") as nrecno from orders Into Cursor CorrectRencno

Browsing both results you'll see even though the queries are the same, when there is a workarea orders RECNO("orders") will be the record number within that workarea and it stays constant, the SQL engine will use it's own workers and will have to use another workarea name, which you never get to know.

That's why even in a simple single-table query you better use RECNO() without specifying an alias name for the table. And I think I can spare examples of multiple tables used all in well-intentioned precisely specified RECNO("aliasname") calls, you don't get the RECNO() you actually want.

If you really get DBFs for this import process, you'll be free to add a column, and then the easier way than a full table query creating a cursor is to ALTER TABLE and add an ID int column and then UPDATE thedbf SET id=RECNO() to initialize it to have a key for further processing.

Bye, Olaf

Olaf Doschke Software Engineering

https://www.doschke.name

TheLazyPig · Jul 8, 2020

Thank you for the replies.

What I did to region table...
1) Count to get Distinct rg,div,sta,empno having count > 1 and those that are count = 1
2) Then I used the Distinct rg,div,sta,empno to get their max asof date
3) I still got duplicate record because of asof date are the same while netpay amounts are different
4) Matched the duplicate record to oracle db to get the latest netpay amount (but what I did is I used the unmatch)
5) From Distinct table NOT IN unmatch. I save to new dbf file.
6) Deleted all rg NCR from region table
7) APPEND matched record to region table
8) Count if there's duplicate

I haven't try running the .exe to see if there's still duplicate.

Before deleting rg NCR I already have a backup of the region table.

Olaf Doschke · Jul 8, 2020

I can't see whether that's okay or not. Too little known about the overall process. There are so many possible detail problems like different netpay data for the same date, netpay not strictly rising, when there's a correction or payback, etc.

Am I correct in the assumption you do a repeated import into Oracle? Then the usual strategy is first update and only insert where an update did nothing. In MySQL this would involve knowing the effected rowcount of an update to react, or use MERGE, as said, after adding all dbf data into a staging database/table.

If you start on existing Oracle data you could pull all latest netpay Oracle knows and purge all data for each single region/employee, whatever combination/grouping to be outdated for this update process. And that raises the question (once more), why all data is used for the ETL process when you know last sync date and even if not, only want latest data on Oracle, then you can start only fetching latest data from the DBFs and disregard anything else.

Bye, Olaf.

Olaf Doschke Software Engineering

https://www.doschke.name

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Remove/Delete an old record that duplicates the latest

TheLazyPig

Programmer

Olaf Doschke

Programmer

Olaf Doschke

Programmer

TheLazyPig

Programmer

Olaf Doschke

Programmer

TheLazyPig

Programmer

Olaf Doschke

Programmer

AndrewMozley

Programmer

Olaf Doschke

Programmer

Olaf Doschke

Programmer

TheLazyPig

Programmer

Olaf Doschke

Programmer

Similar threads

Part and Inventory Search

Sponsor