Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Rhinorhino on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Episode 3

Status
Not open for further replies.

drymnfr

Programmer
Joined
May 21, 2010
Messages
7
Location
US
I have the following table tblPurchases and structure:
Code:
tblPurchasesID	CustomerID	DatePurchased
1				177405    	4/1/2005
2				177405    	7/23/2005
3				177405    	7/30/2005
4				177405    	1/24/2006
5				177405    	6/30/2006
6				177405    	7/24/2006
7				177405    	7/29/2006
8				177405    	2/20/2007
9				177405    	3/30/2007
10				177405    	4/13/2007
11				177405    	7/10/2007
12				177405    	10/2/2007
13				177405    	11/14/2007
14				177405    	12/12/2007
15				220391    	1/19/2005
16				220391    	2/2/2005
17				220391    	2/25/2005
18				220391    	3/18/2005
19				220391    	7/5/2005
20				220391    	6/7/2006
21				220391    	7/11/2006
22				220391    	8/11/2006
23				220391    	9/25/2006
26				220391    	4/2/2007
27				220391    	5/23/2007
28				220391    	8/10/2007
29				220391    	8/24/2007
I would like to be able to do the following:

create Field - Episode
A gap of 180 or more days in between consecutive purchase dates is considered the start of a new episode.

Whereby my output file would look like this:
Code:
tblPurchasesID	CustomerID	DatePurchased	Interval	Episode
1				177405    	4/1/2005		0			1
2				177405    	7/23/2005		113			1
3				177405    	7/30/2005		7			1
4				177405    	1/24/2006		178			1
5				177405    	6/30/2006		157			1
6				177405    	7/24/2006		24			1
7				177405    	7/29/2006		5			1
8				177405    	2/20/2007		206			2
9				177405    	3/30/2007		38			2
10				177405    	4/13/2007		14			2
11				177405    	7/10/2007		88			2
12				177405    	10/2/2007		84			2
13				177405    	11/14/2007		43			2
14				177405    	12/12/2007		28			2
15				220391    	1/19/2005		0			1
16				220391    	2/2/2005		14			1
17				220391    	2/25/2005		23			1
18				220391    	3/18/2005		21			1
19				220391    	7/5/2005		109			1
20				220391    	6/7/2006		337			2
21				220391    	7/11/2006		34			2
22				220391    	8/11/2006		31			2
23				220391    	9/25/2006		45			2
26				220391    	4/2/2007		189			3
27				220391    	5/23/2007		51			3
28				220391    	8/10/2007		79			3
29				220391    	8/24/2007		14			3

Thanks in advance for any help on this.
 
use a self join, with id = id + 1, then you can just use datediff to work out the interval and case for the episode...

what I mean by self join is:

select *
from tbl a
inner join tbl b on a.id = b.id + 1



--------------------
Procrastinate Now!
 
Something tells me we are helping you with homework...

Simi
 
Thank you to both of you.

Homework?? If you consider working from home using remote login, as homework , then yes. Thanks again, with your help in this and the other thread.
 
As suggested, I did the following:

select a.tblPurchasesID, a.CustomerID, a.DatePurchased as StartDate, b.DatePurchased as EndDate, Datediff(dd,b.DatePurchased,a.DatePurchased) as Interval,
case when Datediff(dd,b.DatePurchased,a.DatePurchased) > 179 THEN
2
ELSE 1
END AS Episode
from tblPurchases as a
left outer join tblPurchases as b on a.tblPurchasesID = b.tblPurchasesID + 1 and a.CustomerID=b.CustomerID

I get the following output, wherby I am getting close to what I really want(as my OP shows above).
tblPurchasesID Customerid StartDate EndDate Interval Episode
1 177405 2005-04-01 NULL NULL 1
2 177405 2005-07-23 2005-04-01 113 1
3 177405 2005-07-30 2005-07-23 7 1
4 177405 2006-01-24 2005-07-30 178 1
5 177405 2006-06-30 2006-01-24 157 1
6 177405 2006-07-24 2006-06-30 24 1
7 177405 2006-07-29 2006-07-24 5 1
8 177405 2007-02-20 2006-07-29 206 2
9 177405 2007-03-30 2007-02-20 38 1
10 177405 2007-04-13 2007-03-30 14 1
11 177405 2007-07-10 2007-04-13 88 1
12 177405 2007-10-02 2007-07-10 84 1
13 177405 2007-11-14 2007-10-02 43 1
14 177405 2007-12-12 2007-11-14 28 1
15 220391 2005-01-19 NULL NULL 1
16 220391 2005-02-02 2005-01-19 14 1
17 220391 2005-02-25 2005-02-02 23 1
18 220391 2005-03-18 2005-02-25 21 1
19 220391 2005-07-05 2005-03-18 109 1
20 220391 2006-06-07 2005-07-05 337 2
21 220391 2006-07-11 2006-06-07 34 1
22 220391 2006-08-11 2006-07-11 31 1
23 220391 2006-09-25 2006-08-11 45 1
24 220391 2007-04-02 2006-09-25 189 2
25 220391 2007-05-23 2007-04-02 51 1
26 220391 2007-08-10 2007-05-23 79 1
27 220391 2007-08-24 2007-08-10 14 1

PurchaseID 8 thru 14 for customer 177405 should all be episode 2. Similarly PurchaseID 20 thru 23 for customer 220391 should all be episode 2 and PurchaseID 24 thru 27 should all be episode 3

Can someone help me finish this? Thanks in advance.
 
No, according to your case statement the output is correct...
Code:
case 
    WHEN DATEDIFF(dd,b.DatePurchased,a.DatePurchased) > 179 THEN 2
    ELSE 1

'PurchaseID 8 thru 14 for customer 177405 should all be episode 2'
Only for PurchaseID 8 does the DATEDIFF evaluates as > 179

'PurchaseID 20 thru 23 for customer 220391 should all be episode 2'
Only for PurchaseID 20 does the DATEDIFF evaluates as > 179

'PurchaseID 24 thru 27 should all be episode 3'
Your case statement doesn't evaluate anything to output 3

Rhys

"The trouble with having an open mind, of course, is that people will insist on coming along and trying to put things in it"
Terry Pratchett
 
have you considered using more than 1 when step in your case statement?

case
when 1 then 'a'
when 2 then 'b'
when 3 then 'c'
else 'x'
end

--------------------
Procrastinate Now!
 
Is it not the case, that with an increasing episode count (or ordinal rank), one is going to require a cursor, triangular join or quirky update (Jeff Moden's method) to set the episode value correctly?

soi là, soi carré
 
or a very long case statement...

if episodes get over 5, maybe build a temp table or something...

--------------------
Procrastinate Now!
 
Thanks to everyone for your input.

drlex, you used the right words "increasing episode count". I currently do have a cursor doing this for me. I thought the experts here may suggest a more efficient and faster way of doing this instead of using cursors. I am just a beginner.

Thanks again.
 
drymnfr,
You're welcome; thank you for accolade. If speed is of concern, I urge you to look at Jeff's article on running counts and the ways one can perform this. It has helped me, although I don't fully comprehend his method. [happy]

Whilst I don't have a direct link, it is on under articles and is entitled
"Solving the "Running Total" & "Ordinal Rank" Problems (Rewritten)". Author is Jeff Moden and it;s dated 2009/11/10 (I can't tell whether that's US or UK date format!)



soi là, soi carré
 
drlex, thank you for your input. I spent some time reading up Jeff Moden's article and a few other examples of "running totals".

I had over 50 million records to process and using cursors was going to take a few days for me to get the results I was expecting.

The following is part of what I ended up doing:

DECLARE @Episode int
SET @Episode=1
UPDATE TableName
SET @Episode = Episode =
CASE
WHEN (Interval > 179)
THEN @Episode + 1
WHEN (Interval = 0)
THEN 1
ELSE @Episode
END

Thanks again, for your help.
 
drymnfr

Great; can you now explain it to me?
(just kidding!)

Hate to have to check 50 million rows - are you just going to sample a few and check them against expectation?

Happy Friday*!


(*time-zone dependent)

soi là, soi carré
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top