Reading through cursors for multiple blocks of data 1

stapleman527 · Dec 17, 2009

I have a project I am working on where I have a table full of client data that some clients have multiple records in, updates and such. I need to filter the data down to one record per client.

The way I need to do this is to take each group of records for a client and order them by their creation date and then starting with the first one populate all the blocks of data that I can. By blocks of data I mean for example the Name block consists of First_Name, Last_Name, Middle_Name, Name_Suffix, Name_Prefix. If the first row doesn't have this information then I want to take it from the next row, if it doesn't have it then the next, etc.If I get though all of the rows and none of them have it then I will just leave it blank.

I am new to sql server but have used Oracle in the past procedurally like this, although not very much. I figured I could just group by the ssn and count each client. If there is only one then I would just move the entire row to the filtered table, if there were multiple then I would have to read them into a cursor and go through each block of data and to find the first one that has it. Below is my pseudo code for what I was thinking. But my experience is in languages like java so I wanted to know what the best way to do this would be.

Pseudo Code:
Select SSNs into a cursor with a count
For Each Row{
if Count = 1
insert the row into filtered table
else
read all records for that ssn into another cursor
For Each Block{
For Each Row{
Read in all pieces of data for block
if data isnt empty
break
}
//variables will be left with data from the first record that has it.
}
Insert data from all blocks into filtered table
}

This seems very inefficient because you would have to read through the second cursor once for each block of data. But I don't know a better way to do it in SQL, or really if that is possible in SQL.

I am using SQL Server 2005 if that makes a difference.

Thanks,

Stapleman527

markros · Dec 17, 2009

I have an idea, but one question - do we want to take non null information from multiple rows, say, we have a record that has First_Name, but the Last_Name is in the next record. Do we want to combine the information from multiple records to fill all information or we want to find a record with most information filled?

Anyway, idea for now
SQL Server 2005 and up solution:

Code:

;with cte_Data as (select SSN, First_Name, Last_Name, Middle_Name, Name_Suffix, Name_Prefix, 
row_number() over (partition by SSN order by Case when First_Name IS NULL then 99 else 1 end, Created_Date) as RowNum from myTable)

select * from cte_Date where RowNum = 1

stapleman527 · Dec 17, 2009

No, once we find a record that has some of the information in the block we will take whatever that row has and not supplement it with any other data.

markros · Dec 17, 2009

If this is the case, then my idea should work for you as just fine.

markros · Dec 18, 2009

Are you now all set? Did this idea work for you?

stapleman527 · Dec 18, 2009

I am not exactly sure what this is doing...in the over clause the case is when First_Name is NULL what does the 99 and 1 do for you?

I ran it and I am getting data on most of them, but I want to be clear on how it works since I will have to be doing it with other blocks of data.

markros · Dec 18, 2009

The idea of using 99 and 1 is to sort (99 goes after 1). So, first I order by (if the First Name IS NULL (you can check Last Name instead if it makes more sense)) I assign them 99, so they would be sorted last, then if it's not NULL, I assign 1.

The second order by Date.

So, if finds first non null Name in the set ordered by the date.

Hopefully now it's clear for you and you can also mark the helpful response.

stapleman527 · Dec 18, 2009

This is very helpful. Just one more question. This seems to work for one block of data, but how would I do it to get the data from the other blocks? Lets say for example that I get the name information from row 1 in a set, but I need to get the address info from the 3rd row. I see how I can get each block individually but then combining them to form a complete client record.

markros · Dec 18, 2009

That may be slightly more tricky - looks like you may want to join with itself. Use the same idea for the second CTE but use row_numbering for Address instead of names and join two CTE1 (get data from each of it) based on SSN.

Hopefully it's clear enough for you to work it out - you can post back what you'll come to.

gmmastros · Dec 18, 2009

Just for kicks... try this:

Code:

Select ssn,
       Min(NullIf(First_Name, ''))  As First_Name,
       Min(NullIf(Last_Name, ''))   As Last_Name,
       Min(NullIf(Middle_Name, '')) As Middle_Name,
       Min(NullIf(Name_Suffix, '')) As Name_Suffix,
       Min(NullIf(Name_Prefix, '')) As Name_Prefix
From   YourTableName
Group By ssn

There may be some subtle problems that cause this query to "not work" for your situation. Let me explain.

If a SSN appears only once in your dataset, you will get that one row and whatever data appears in it. If there are multiple rows, things could get a little weird. But it may not be a problem for you either.

For example, suppose you have this data.

[tt]
SSN First_Name Last_Name
1 Fred
1 Flintston
1 Fredrick Flintsone
1 Flintstone
2 Betty MaidenName
2 Betty Rubble
[/tt]

The query above will return:

[tt]
1 Fred Flintsone
2 Betty MaidenName
[/tt]

So, you see... if any data appears in any column for a ssn, you will get something. But, when the data is different among the rows, the above query will return the value that is alphabetically less than all other values. For the 'Betty' data, it returned MaidenName (instead of Rubble) because MaidenName is alphabetically less than Rubble.

Of course, it's possible that this problem does not occur in your data either. I suggest you give this query a try and let us know if it works for you.

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom

markros · Dec 18, 2009

Hi George,

I thought about this solution, but the requirement is to get the information based on the created date (as I understood). So, I would assume that first record having non null data chronologically is good (and that's what my query is doing).

stapleman527 · Dec 18, 2009

You are correct markros, the data should be selected chronologically. I will work on joining the different data sets and see what I come up with and post it back. Thanks for all of your help.

stapleman527 · Dec 22, 2009

Just wanted to give an update of what I found to work just in case anyone else stumbled upon this and wanted to use it. I started with what markros originally suggested and added to the order by statement and joined subsequent blocks.

In the order by statement I concatenated all the fields I was looking at together to check if they were all NULL not just the one. This way if any field had a value I would pull that one. I also added an account_number to differentiate when open_dates collided. Then to join them together I used tax_id, which isn't in any of the blocks but it represents a unique identifier for each client. Below is the code for three blocks of data, there are more but that would get really long. Two of the data blocks are only one field but to use the same logic of selecting the correct record.

Code:

;with client_name (Tax_ID, Client_Type, Entity_Name, First_Name, Last_Name, Middle_Name, Name_Suffix, Name_Prefix, RowNum)
	as (select Tax_ID, Client_Type, Entity_Name, First_Name, Last_Name, Middle_Name, Name_Suffix, Name_Prefix, 
		row_number() over (partition by Tax_ID order by Case when 
		Client_Type + Entity_Name + First_Name + Last_Name + Middle_Name + Name_Suffix + Name_Prefix IS NULL then 99 else 1 end, 
		Date_Open, Account_Number) as RowNum from z_client),
  client_birth_date (Tax_ID, Birth_Date, RowNum)
	as (select Tax_ID, Birth_Date, 
		row_number() over (partition by Tax_ID order by Case when Birth_Date IS NULL then 99 else 1 end, 
    Date_Open, Account_Number) as RowNum from z_client),
  client_email (Tax_ID, Email_Address, RowNum)
    as (select Tax_ID, Email_Address, 
		row_number() over (partition by Tax_ID order by Case when Email_Address IS NULL then 99 else 1 end, 
    Date_Open, Account_Number) as RowNum from z_client)

select Client_Type, Entity_Name, First_Name, Last_Name, Middle_Name, Name_Suffix, Name_Prefix, Birth_Date, Email_Address
from client_name cn 
inner join client_birth_date cbd on cn.tax_id = cbd.tax_id 
inner join client_email ce on cn.tax_id = ce.tax_id
where cbd.RowNum = 1 and cn.RowNum = 1 and ce.RowNum = 1;

If anyone sees any flaws in this method please let me know. Thanks to everyone for their help.

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Reading through cursors for multiple blocks of data 1

stapleman527

Programmer

markros

Programmer

stapleman527

Programmer

markros

Programmer

markros

Programmer

stapleman527

Programmer

markros

Programmer

stapleman527

Programmer

markros

Programmer

gmmastros

Programmer

markros

Programmer

stapleman527

Programmer

stapleman527

Programmer

Similar threads

Part and Inventory Search

Sponsor