INTELLIGENT WORK FORUMS
FOR COMPUTER PROFESSIONALS

Log In

Come Join Us!

Are you a
Computer / IT professional?
Join Tek-Tips Forums!
  • Talk With Other Members
  • Be Notified Of Responses
    To Your Posts
  • Keyword Search
  • One-Click Access To Your
    Favorite Forums
  • Automated Signatures
    On Your Posts
  • Best Of All, It's Free!

*Tek-Tips's functionality depends on members receiving e-mail. By joining you are opting in to receive e-mail.

Posting Guidelines

Promoting, selling, recruiting, coursework and thesis posting is forbidden.

Jobs

Curious question

Curious question

(OP)
Background first...

I had a table with fields such as

CODE

PERSON_ID    V
0001         A1
0001         A2
0001         A3
0002         A1
0002         B1
0003         A1
0003         A3
0003         B1

The desired output was

CODE

PERSON_ID    V
0001         (A1,A2,A3)
0002         (A1,B1)
0003         (A1,A3,B1)

I searched and found a number of solutions available for v 8.2 and higher using various XML related functions.  We have 8.1.5 and sure enough the functions didn't appear to work right.

Since the output was going to Microsoft Access I chose to just let the application concatenate the values.  Quick, efficient, and works well.

Now, the question...

On one of the websites I visited, a poster was adamant that data manipulation like this should be handled by the client application, not the db engine.

I've never heard anything like this before and wanted to see what others think - especially DBA's.  Should I be concerned with amount of work I put on the db engine versus the amount I put on the client application?  If yes, what's a good rule of thumb?

RE: Curious question

"what's a good rule of thumb?" Don't dim the datacenter lights. . .<g>

If something really is a one-time shot (unfortunately, many OTS live forever), i believe it is less important to ensure the most efficient execution. If a query is to be run over and over (especially if multi-user, online) it is important to conserve resources. A bit more developer time is worth the investment.

Something to keep in mind is that a query that requires an excessive amount of system resources hurts the environment, not just the individual query.

I don't know any specific numbers to use to measure. If you work with your dba(s), they may have some more specific guidance.

RE: Curious question

I've always been told as a rule of thumb that sql should never be used to edit data.

Philippe

RE: Curious question

Larry, in the past I have had similar issues and using a little VBA in Access was MUCH more efficient than trying to work out a solution in the DB. Anything not set-based should be frowned upon and in your case you want a programming solution..   

Ties Blom
 
 

RE: Curious question

(OP)
Thanks for the responses.  Good to hear how others approach this question of balance.

RE: Curious question

This is solution. How I promissed.

CODE

with Person_in (person_id, v) as
(
select varchar('0001', 50), varchar('A1', 50)
from sysibm.sysdummy1
union all
select '0001', 'A2'
from sysibm.sysdummy1
union all
select '0001', 'A3'
from sysibm.sysdummy1
union all
select '0002', 'A1'
from sysibm.sysdummy1
union all
select '0002', 'B1'
from sysibm.sysdummy1
union all
select '0003', 'A1'
from sysibm.sysdummy1
union all
select '0003', 'A3'
from sysibm.sysdummy1
union all
select '0003', 'B1'
from sysibm.sysdummy1
)
,
Person_Seq (person_id, v, seq, Mseq) as
(
select person_id, v, seq, Mseq
from Person_in i1, table
(select count(*) + 1 seq from Person_in i2
  where i1.person_id || i1.v > i2.person_id || i2.v ) ss
, table
(select count(*) + 1 Mseq from Person_in) mm
)
,
Person_Final (person_id, v, seq, Mseq) as
(
select person_id, varchar('(' || v || ')', 2000), 1, Mseq
  from Person_Seq where seq = 1
Union All
select f1.person_id, replace(f1.v, ')', ', ' || p1.v || ')'),
       f1.seq + 1, f1.Mseq
  from Person_Seq p1, Person_Final f1
where p1.seq       = f1.seq + 1
  and p1.person_id = f1.person_id
  and f1.seq + 1  <= f1.Mseq
Union All
select p1.person_id, '(' || p1.v || ')',
       f1.seq + 1, f1.Mseq
  from Person_Seq p1, Person_Final f1
where p1.seq        = f1.seq + 1
  and p1.person_id <> f1.person_id
  and f1.seq + 1   <= f1.Mseq
)
,
Person_Result (person_id, v) as
(select  person_id, v from Person_Final f2  
   where f2.seq = (select max(f3.seq) from Person_Final f3
                    where f3.person_id = f2.person_id      )
)
select * from Person_Result
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 
Result:
 
PERSON_ID V
0001    (A1, A2, A3)
0002    (A1, B1)
0003    (A1, A3, B1)


Lenny

RE: Curious question

Lenny, this is a solution for the exact set. It would require going over the code again and again for different sets. A generic solution would involve some kind of programming don't you think?

Ties Blom
 
 

RE: Curious question

Hi Ties Blom !

How I know this is most generic solution for this kind of task.

If you have duplicates for this two columns, or Nulls, it could make solution a bit different, but you have to think in same way.

Lenny

RE: Curious question




Huh?

Skip,

glassesJust traded in my old subtlety...
for a NUANCE!tongue

RE: Curious question

In case if we have DB2 V9, we can use function rowno over() and Order by in Person_Seq  step.

If we have duplicates and Nulls we have to add Person_dst and change Person_Seq step.
See bellow:


CODE

with Person_in (person_id, v) as
(
select varchar('0001', 50), varchar('A1', 50)
from sysibm.sysdummy1
union all
select '0001', 'A2'
from sysibm.sysdummy1
union all
select '0001', 'A3'
from sysibm.sysdummy1
union all
select '0002', 'A1'
from sysibm.sysdummy1
union all
select '0002', 'B1'
from sysibm.sysdummy1
union all
select '0003', 'A1'
from sysibm.sysdummy1
union all
select '0003', 'A3'
from sysibm.sysdummy1
union all
select '0003', 'B1'
from sysibm.sysdummy1
)
,
Person_dst (person_id, v) as
(select distinct
person_id, ifnull(v, ' ')
from Person_in
)
,
Person_Seq (person_id, v, seq, Mseq) as
(
select person_id, v, seq, Mseq
from Person_dst i1, table
(select count(*) + 1 seq from Person_dst i2
  where i1.person_id || i1.v > i2.person_id || i2.v ) ss
, table
(select count(*) + 1 Mseq from Person_in) mm
)

,
Person_Final (person_id, v, seq, Mseq) as
(
select person_id, varchar('(' || v || ')', 2000), 1, Mseq
  from Person_Seq where seq = 1
Union All
select f1.person_id, replace(f1.v, ')', ', ' || p1.v || ')'),
       f1.seq + 1, f1.Mseq
  from Person_Seq p1, Person_Final f1
where p1.seq       = f1.seq + 1
  and p1.person_id = f1.person_id
  and f1.seq + 1  <= f1.Mseq
Union All
select p1.person_id, '(' || p1.v || ')',
       f1.seq + 1, f1.Mseq
  from Person_Seq p1, Person_Final f1
where p1.seq        = f1.seq + 1
  and p1.person_id <> f1.person_id
  and f1.seq + 1   <= f1.Mseq
)
,
Person_Result (person_id, v) as
(select  person_id, v from Person_Final f2  
   where f2.seq = (select max(f3.seq) from Person_Final f3
                    where f3.person_id = f2.person_id      )
)
select * from Person_Result

Lenny

RE: Curious question

Correction:

CODE

with Person_in (person_id, v) as
(
select varchar('0001', 50), varchar('A1', 50)
from sysibm.sysdummy1
union all
select '0001', 'A2'
from sysibm.sysdummy1
union all
select '0001', 'A3'
from sysibm.sysdummy1
union all
select '0002', 'A1'
from sysibm.sysdummy1
union all
select '0002', 'B1'
from sysibm.sysdummy1
union all
select '0003', 'A1'
from sysibm.sysdummy1
union all
select '0003', 'A3'
from sysibm.sysdummy1
union all
select '0003', 'B1'
from sysibm.sysdummy1
)
,
Person_dst (person_id, v) as
(select distinct
person_id, ifnull(v, ' ')
from Person_in
)
,
Person_Seq (person_id, v, seq, Mseq) as
(
select person_id, v, seq, Mseq
from Person_dst i1, table
(select count(*) + 1 seq from Person_dst i2
  where i1.person_id || i1.v > i2.person_id || i2.v ) ss
, table
(select count(*) + 1 Mseq from Person_dst) mm
)
,
Person_Final (person_id, v, seq, Mseq) as
(
select person_id, varchar('(' || v || ')', 2000), 1, Mseq
  from Person_Seq where seq = 1
Union All
select f1.person_id, replace(f1.v, ')', ', ' || p1.v || ')'),
       f1.seq + 1, f1.Mseq
  from Person_Seq p1, Person_Final f1
where p1.seq       = f1.seq + 1
  and p1.person_id = f1.person_id
  and f1.seq + 1  <= f1.Mseq
Union All
select p1.person_id, '(' || p1.v || ')',
       f1.seq + 1, f1.Mseq
  from Person_Seq p1, Person_Final f1
where p1.seq        = f1.seq + 1
  and p1.person_id <> f1.person_id
  and f1.seq + 1   <= f1.Mseq
)
,
Person_Result (person_id, v) as
(select  person_id, v from Person_Final f2  
   where f2.seq = (select max(f3.seq) from Person_Final f3
                    where f3.person_id = f2.person_id      )
)
select * from Person_Result

Lenny

RE: Curious question

(OP)
Lenny,

Interesting option for solving the problem, but the question still stands.  Is it better to use the db engine's resources for data display manipulation like this, or is it better to let the client application manipulate the data display?

I do like that you provided the link and a workable solution in DB2.  Your suggestion deserves a star.

Thanks,
Larry

RE: Curious question

Larry,
This is a very difficult question to answer as it depends on each different scenario. I'm sure that there are cases, dependent on data and table structure, where it is more efficient to use the DB engine, and other times when it is better to use a 3rd party programming language.

I'm going to stick my neck out though and plump for programming languages. As good as Lenny's SQL is, in a business environment, it is a non starter as it is just too complicated to support. Making amendments to it or trying to find an error in it would be a nightmare.

Far better to have a proper language that easily sets out, with comments if necessary, what it is trying to do and efficiently goes about it task. Even if there is little performance difference between a complicated piece of SQL and a proper language, I would head down the language path because the unseen cost of maintaining large complicated swathes of SQL would eventually add up.

Marc

RE: Curious question

Quote:

I'm going to stick my neck out though and plump for programming languages. As good as Lenny's SQL is, in a business environment, it is a non starter as it is just too complicated to support. Making amendments to it or trying to find an error in it would be a nightmare.
Well said. IMHO, it is a technical hobbyist approach rather than a professional IT approach. Given that most work is maintenance, care must be taken that much later (when the author has moved on), the code is maintainable.

Quote:

I would head down the language path because the unseen cost of maintaining large complicated swathes of SQL would eventually add up.
And/or reach a point where the "next requirement" could not be met with "only" SQL. . . Then watch the "fun" when someone must explain to management that the "really neat trick" has to be completely re-done.

RE: Curious question

Hi Larry !

Any client application manipulate the data faster then DB engine.

But anyway you have to declare cursor inside of the application and fetch rows one by one.

Therefore you can't make program without DB engine.

But if you want to show the result immediately, you have no choice.

Anyway, you have to know your real table would be involve only in the first step, then release, same like in an application program.

Lenny

 

RE: Curious question

I'm impressed of the Lenny's solution, but it doesn't work for me. When I try it, I get this error:

CODE

SQL State: 42999
Vendor Code: -255
Message: [SQL0255] Function not supported for query. Cause . . . . . :   The reason code is 5: -- Code 1 -- Scalar subselects and lateral correlation from a nested table expression are not allowed with distributed files. -- Code 2 -- Error occurred while using a temporary distributed file. -- Code 3 -- EXCEPT or INTERSECT not supported for this query. -- Code 4 -- A sequence reference is not supported with distributed files. -- Code 5 -- A recursive common table expression is not supported for this query. -- Code 6 -- An OLAP function is not supported for this query. -- Code 7 -- ORDER OF is not supported for this query. -- Code 8 -- Scalar fullselect is not supported for this query. -- Code 9 -- A distributed file is being processed in a multi-thread capable job or this is not the initial thread. Recovery  . . . :   A list of corrective actions follow: -- If code 1, change the query so it does not use scalar subselects or correlation from a nested table expression. -- If code 2, see the previous messages for more information. -- If code 3, remove EXCEPT or INTERSECT from the query. -- If code 4, remove the sequence reference from the query. -- If code 5, remove the recursive common table expression from the query. -- If code 6, remove the OLAP function from the query. -- If code 7, remove the ORDER OF from the query. -- If code 8, respecify the query without a fullselect. -- If code 9, do not use multiple threads to run the query.
I'm on DB2 UDB on IBM iSeries (OS version: V5R4), where recursion in SQL should be supported too, so I don't understand this error. Maybe differs DB2 on iSeries from DB2 on other systems?

I don't know how to figure out an error in Lenny's big SQL-query, but I can solve the above task quickly using an embedded SQL in an programming languages.

First I created the table

CODE

CREATE TABLE IBPDDB.PERSONS (
  PERSON_ID DECIMAL(4, 0) DEFAULT NULL,
  V CHAR(30) CCSID 870 DEFAULT NULL)
then I filled it with the above data and then I have written this short REXX-program which does the task:
persons.rexx

CODE

/* Global Environment is EXECSQL*/
address 'EXECSQL'

/* Declare cursor for select statement */
EXECSQL,
  'DECLARE C1 CURSOR FOR',
  ' SELECT PERSON_ID, V FROM PERSONS'

/* Open Cursor */
EXECSQL,
  'OPEN C1'

/* Process records */
persons. = ''
do while (SQLCODE = 0)
  EXECSQL,
         'FETCH C1 INTO :pid, :val'

  /* If EOF, stop the loop and don't print out the fetched values */
  if SQLCODE = 100 then leave

  /* reformat pid from ' 0001.' to '0001' */
  pid = substr(strip(pid),1,length(strip(pid))-1)
  /* Add values to persons-stem */
  if persons.pid = '' then do  /* add first value */
    persons.pid = strip(val)
    persons.pids = persons.pids ||' '|| pid
  end
  else do /* append next value to the existing values */
    persons.pid = persons.pid || ', ' || strip(val)
  end
end

/* Print the resulting stem, i.e. the report */
say PERSON_ID||'    V'
do i=1 to words(persons.pids)
  pid = word(persons.pids, i)
  say pid  '       ('||persons.pid||')'
end

/* Close Cursor */
EXECSQL,
   'CLOSE C1'

/* End Pgm */
exit
After running the REXX member with

CODE

STRREXPRC SRCMBR(PERSONS)
I get the resulting report

CODE

PERSON_ID    V                  
0001        (A1, A2, A3)        
0002        (A1, B1)            
0003        (A1, A3, B1)        

I used in the example REXX, because it's historically oldest scripting language I know and it's natively on iSeries, but I could use compiled language like C, COBOL or RPG too.
Howevever, my favorites for such jobs are scripting languages like Perl, Python, Ruby,... These are more powerful than REXX, so coding would be simpler and would make more fun as in the example above.
IMHO, an source written in an programming language is better maintanable then long SQL-statements.
 

RE: Curious question

There are numerous limitations when using common table expressions in iSeries.

I remember reading a document explaining the limitations, but unable to find it now

Quote:


I'm on DB2 UDB on IBM iSeries (OS version: V5R4), where recursion in SQL should be supported too, so I don't understand this error. Maybe differs DB2 on iSeries from DB2 on other systems?

For db2 resoruces visit www.db2click.com
More DB2 questions answered at www.dbforums.com/f8 & http://www.idug.org/user/UserLogin.asp

RE: Curious question

(OP)
Thanks everyone for posting.  I was hoping to see a discussion.  The example I posted was just to get the discussion going, I wasn't looking for a solution to a specific problem.  Nevertheless, I'm glad to see Lenny's proposed solution got the discussion going and I find it useful, hope others do too.

Mikrom's post was an excellent example of Marc and papadba's posts.  Trying to understand the sql error statement is hard enough, trying to understand how the sql & data caused the error is a real drain on developer resources.  However, the REXX routine looks pretty straight forward, easy to understand, and would be much easier to troubleshoot errors.  That's much like the approach I took, just different programming language/platform.

I still think Lenny's sql has merit and glad he posted.  I think there's benefit in learning new methods, even if they're not used.  Never know when something might be the right tool for the job.

Thanks again for the lively discussion!

Larry

RE: Curious question

Any problem has dozens solutions.

Programmer has to find the best way to solve problem.

One more solution.

If you know depth for person (in our case could be 3) you can use very simple query:


CODE

Select p1.person_id,
'(' || p1.v   
|| ifnull(', ' || p2.v, '')
|| ifnull(', ' || p3.v, '') || ')' as V
from
persons p1 left join persons p2
On p1.person_id = p2.person_id
   and p2.v > p1.v
left join persons p3
On p2.person_id = p3.person_id
   and p3.v > p2.v
order by p1.person_id

Lenny Khiger, ADSPA&VP

RE: Curious question

Number of LEFT JOIN equal Maximum(depth) - 1

Lenny

RE: Curious question

CODE

Select p1.person_id,
Min('(' || p1.v   
    || ifnull(', ' || p2.v, '')
    || ifnull(', ' || p3.v, '') || ')') as V
from persons p1 left join persons p2
On p1.person_id = p2.person_id    
and p2.v > p1.v
left join persons p3
On p2.person_id = p3.person_id    
and p3.v > p2.v
Group by p1.person_id
Order by p1.person_id

Lenny

RE: Curious question

This query tested by me today.
Same result, online, cursor dosn't need.
You can use it, if you want.


CODE

with Persons (person_id, v) as
(
select varchar('0001', 50), varchar('A1', 50)
from sysibm.sysdummy1
union all
select '0001', 'A3'
from sysibm.sysdummy1
union all
select '0002', 'A1'
from sysibm.sysdummy1
union all
select '0003', 'A1'
from sysibm.sysdummy1
union all
select '0003', 'A3'
from sysibm.sysdummy1
union all
select '0003', 'B1'
from sysibm.sysdummy1
union all
select '0021', 'C5'
from sysibm.sysdummy1
union all
select '0021', 'C7'
from sysibm.sysdummy1
union all
select '0001', 'A2'
from sysibm.sysdummy1
union all
select '0017', nullif('', '')
from sysibm.sysdummy1
union all
select '0002', 'B1'
from sysibm.sysdummy1
)
Select p1.person_id,
Nullif(Min('(' || ifnull(p1.v, '')         || ifnull(', ' || p2.v, '')
               || ifnull(', ' || p3.v, '') || ifnull(', ' || p4.v, '') || ')'), '()') as V
from persons p1 left join persons p2
On p2.person_id = p1.person_id    
and p2.v > p1.v
left join persons p3
On  p3.person_id = p2.person_id    
and p3.v > p2.v
left join persons p4
On  p4.person_id = p3.person_id    
and p4.v > p3.v
Group by p1.person_id
Order by p1.person_id

Result:
 

Quote:

PERSON_ID        V
0001              (A1, A2, A3)
0002              (A1, B1)
0003              (A1, A3, B1)
0017
0021              (C5, C7)

Lenny

RE: Curious question

Hi Lenny,
The both queries you posted last work on my iSeries too.

Red Flag This Post

Please let us know here why this post is inappropriate. Reasons such as off-topic, duplicates, flames, illegal, vulgar, or students posting their homework.

Red Flag Submitted

Thank you for helping keep Tek-Tips Forums free from inappropriate posts.
The Tek-Tips staff will check this out and take appropriate action.

Reply To This Thread

Posting in the Tek-Tips forums is a member-only feature.

Click Here to join Tek-Tips and talk with other members!

Resources

Close Box

Join Tek-Tips® Today!

Join your peers on the Internet's largest technical computer professional community.
It's easy to join and it's free.

Here's Why Members Love Tek-Tips Forums:

Register now while it's still free!

Already a member? Close this window and log in.

Join Us             Close