Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Unmerged single records into multiple records

Status
Not open for further replies.

peac3

Technical User
Jan 17, 2009
226
0
0
AU
Hello C# gurus,

I am trying to split some of the data from one row into multiple rows with conditions.
in the data set, the original value of ActionType could be Changed, Added, or deleted.

The conditions of change are:
If ActionType = Changed, we must split it into 2 rows: ActionType = Deleted and then ActionType = Added. The Deleted row will have IdentityCode = FromIdentityCode and the Added row will have IdentityCode = ToIdentityCode
If ActionType = Added, we don't need to split the row, replace IdentityCode = ToIdentityCode.
If ActionType = Deleted, we don't need to split the row, replace IdentityCode = FromIdentityCode.
Here are the examples:

from the original dataset is as below:
Code:
RowNumber ParentID ChildID   ActionType      Date           FromIdentityCode ToIdentityCode
1         ABC        123     Change            2024-05-03              T         P
2         BCD        234     Deleted           2024-05-05              T         NULL
3         CDE        345     Change            2024-05-07              P         T
4         DEF        456     Added             2024-05-08              NULL      P

The expected result as per below:
Code:
RowNumber ParentID ChildID   IdentityCode ActionType      Date
1         ABC        123     T            Deleted    2024-05-01 
2         ABC        123     P            Added      2024-05-03 
3         BCD        234     T            Deleted    2024-05-05 
4         CDE        345     P            Deleted    2024-05-06
5         CDE        345     T            Added      2024-05-07 
6         DEF        456     P            Added      2024-05-08
Any idea on how to achieve this?

Thank you so much in advance,
 
First I have tried to do it quickly - with text files in python using pandas:

Input text file:
peac3_data.txt
Code:
RowNumber ParentID ChildID   ActionType      Date           FromIdentityCode ToIdentityCode
1         ABC        123     Change            2024-05-03              T         P
2         BCD        234     Deleted           2024-05-05              T         NULL
3         CDE        345     Change            2024-05-07              P         T
4         DEF        456     Added             2024-05-08              NULL      P

Code in python using pandas (tried it with Python 3.11.9, it does not run with Python version < 3.10 because of the Match-Case statement):

peac3.py
Code:
import pandas as pd

# read text file into input data frame
df_inp = pd.read_csv("peac3_data.txt", sep="\s+")
print("Input Data Frame:")
print(df_inp)

# create empty output data frame
df_out = pd.DataFrame(columns = [
    "RowNumber","ParentID","ChildID","IdentityCode","ActionType","Date"])
#print(df_out)

idx_out = 0
# using itertuples
for row in df_inp.itertuples():
    match row.ActionType:
        case "Added":
            df_out.loc[idx_out] = [
                idx_out + 1, row.ParentID, row.ChildID,
                row.ToIdentityCode, row.ActionType, row.Date]
            idx_out += 1
        case "Deleted":
            df_out.loc[idx_out] = [
                idx_out + 1, row.ParentID, row.ChildID,
                row.FromIdentityCode, row.ActionType, row.Date]
            idx_out += 1
        case "Change": # append 2 rows:
            df_out.loc[idx_out] = [
                idx_out + 1, row.ParentID, row.ChildID,
                row.FromIdentityCode, "Deleted", row.Date]
            idx_out += 1
            df_out.loc[idx_out] = [
                idx_out + 1, row.ParentID, row.ChildID,
                row.ToIdentityCode, "Added", row.Date]
            idx_out += 1
# end loop

print("Output Data Frame:")
print(df_out)

# write data frame into csv file
df_out.to_csv("peac3_data_out.csv", index=None, sep=";")

# write data frame into text file as formatted string
df_out.to_string("peac3_data_out.txt", index=None)

Running the code / Output:
Code:
$ python peac3.py
Input Data Frame:
   RowNumber ParentID  ChildID ActionType        Date FromIdentityCode ToIdentityCode
0          1      ABC      123     Change  2024-05-03                T              P
1          2      BCD      234    Deleted  2024-05-05                T            NaN
2          3      CDE      345     Change  2024-05-07                P              T
3          4      DEF      456      Added  2024-05-08              NaN              P
Output Data Frame:
   RowNumber ParentID  ChildID IdentityCode ActionType        Date
0          1      ABC      123            T    Deleted  2024-05-03
1          2      ABC      123            P      Added  2024-05-03
2          3      BCD      234            T    Deleted  2024-05-05
3          4      CDE      345            P    Deleted  2024-05-07
4          5      CDE      345            T      Added  2024-05-07
5          6      DEF      456            P      Added  2024-05-08
The script also produces 2 text files:
peac3_data_out.csv
peac3_data_out.txt

Similar could be done in C#
 
Hi peac3,

Here is the C# code:
Code:
var csv_input = "c:\\00_mikrom\\Work\\C#\\peac3_data.txt";
var reader = new StreamReader(csv_input);

var idx_out = 0;
var line_format = "{0,9}\t{1}\t{2}\t\t{3}\t{4}\t\t{5}";
var line_out = "";
var header = true;
while (!reader.EndOfStream)
{
    var line = reader.ReadLine();
    //Console.WriteLine(line);
    var values = line.Split(" ", StringSplitOptions.RemoveEmptyEntries);

    if (header)
    {
        line_out = "RowNumber  ParentID ChildID  IdentityCode       ActionType      Date";
        Console.WriteLine(line_out);
        header = false;
    }

    switch (values[3])
    {
        case "Added":
            line_out = string.Format(line_format, idx_out + 1, 
                values[1], values[2], values[6], values[3], values[4]);
            Console.WriteLine(line_out);
            idx_out++;
            break;
        case "Deleted":
            line_out = string.Format(line_format, idx_out + 1,
                values[1], values[2], values[5], values[3], values[4]);
            Console.WriteLine(line_out);
            idx_out++;
            break;
        case "Change": //append 2 rows:
            line_out = string.Format(line_format, idx_out + 1,
                values[1], values[2], values[5], "Deleted", values[4]);
            Console.WriteLine(line_out);
            idx_out++;
            line_out = string.Format(line_format, idx_out + 1, 
                values[1], values[2], values[6], "Added", values[4]);
            Console.WriteLine(line_out);
            idx_out++;
            break;
    }
}

Output:
Code:
RowNumber  ParentID ChildID  IdentityCode       ActionType      Date
        1       ABC     123             T       Deleted         2024-05-03
        2       ABC     123             P       Added           2024-05-03
        3       BCD     234             T       Deleted         2024-05-05
        4       CDE     345             P       Deleted         2024-05-07
        5       CDE     345             T       Added           2024-05-07
        6       DEF     456             P       Added           2024-05-08
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top