Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Wanet Telecoms Ltd on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Using collections instead of individual cursors 1

Status
Not open for further replies.

Welshbird

IS-IT--Management
Jul 14, 2000
7,378
DE
Hi guys - its me again!

I'm trying to write a procedure that goes and calculates a 'score' based on records in two different tables.

At the moment, whilst it does do what I need, it seems to be lots of code, and I've only performed the calculations on two fields at the moment rather than the 13 from each table that I need to use, but I was just testing the process.

I understand that I might be able to use collections to do this, but I just can't get my head around the syntax to do it at the moment.

This is the code I have written so far, it as I say it does appear to work:

Code:
CREATE OR REPLACE procedure test_more_calc authid current_user
as
  CURSOR C_TVF IS
   SELECT destcode FROM test_tvf_fuzz;
  CURSOR C_SURG is
   select surgcount from test_tvf_fuzz;
  CURSOR C_MILES1 is
   select miles1 from test_tvf_fuzz;
  Cursor C_RSURG is
   select surgcount from test_rx_fuzz;
  CURSOR C_RMILES1 is
   select miles1 from test_rx_fuzz;
  v_tvf number;
  v_surg float;
  v_rsurg float;
  v_miles1 float;
  v_rmiles1 float;
  v_score float;
  v_score1 float;
BEGIN
   OPEN C_TVF;
   OPEN C_SURG;
   OPEN C_RSURG;
   OPEN C_MILES1;
   OPEN C_RMILES1;
-- temp loop is a function which counts how many
-- times I think I need to loop
   for loop_index in 1..temp_loop
loop
   FETCH C_TVF into v_tvf;
   fetch C_SURG into v_surg;
   Fetch C_RSURG into v_rsurg;
   Fetch C_MILES1 into v_miles1;
   Fetch C_RMILES1 into v_rmiles1;
-- Max_Surg_Count is a function to calculate that figure
-- and so is max_miles
   v_score:=  1-(abs(v_surg - v_rsurg)/max_surg_count);
   v_score1:= 1-(abs(v_miles1 - v_rmiles1)/max_miles);
   DBMS_OUTPUT.put_line ('Record is now ' || v_tvf || ' and score is '|| v_score || ' and ' || v_score1);
END LOOP;
   CLOSE C_TVF;
   CLOSE C_SURG;
   CLOSE C_RSURG;
   CLOSE C_MILES1;
   CLOSE C_RMILES1;
END;
/

I'm just thinking that it will become a little difficult to follow (and probably not as efficient!) to end up with 26 cursors so I'd really appreciate any help.



Fee.

"The question should be, is it worth trying to do, not can it be done"


 
willif,

Can you post the script to create both the tables in question, and some representative data (say a dozen rows from each table).

Then please state what your requirement is. That way we can all see what you're trying to achieve and maybe come up with an innovative solution. If you ask for comments on cursors, in isolation, and with no information about their underlying tables, it's kind of hard to help.

Once you've done this, I've got to put my money where my mouth is, and deliver the goods.

I look forward to an exemplary posting. Please see Mufasa's comments on my thread about oracle having an autonum field for further explicit guidance.

Regards

Tharg

Grinding away at things Oracular
 
Willif, your procedure selects a field from an arbitrary record from test_tvf_fuzz 5 times and then calculates some formula based on that fields (in general belonging to different rows unless that table contains only 1). Is that your business task? Quite strange...
Most probably you need something like
Code:
select 'Record is now ' || C_TVF 
  || ' and score is '
  || (1-(abs(C_SURG - C_RSURG)/max_surg_count)) 
  || ' and ' || (1-(abs(C_MILES1 - C_RMILES1)/max_miles))
  from test_tvf_fuzz

Thargtheslayer, could you explain how that student code relates to the lack of autonumbering in Oracle?

Regards, Dima
 
OK, I'm misunderstanding more about Oracle than I thought then!

The two tables have an identical structure, which at the moment has an identifier, and then 13 fields which are calculated characterstics. One table has about 13,000 rows, and the other has about 1,000. The over all purpose is to find the record in the smaller table which 'best fits' each record in the larger table, so my aim was to take one record in the big table and loop through the smaller table, output the total score each time, and then output to a table. I'll just take the highest score then for each record. (I should add in a thing that says 'this is the masimum score possible - stop looking, but thats for later).

Ok, table structure is as follows for both tables:

Code:
CREATE TABLE Big_table (
  desteta   CHAR(1)      NULL,
  destcode  NUMBER(10,0) NOT NULL,
  itemcount NUMBER       NULL,
  code1     NUMBER       NULL,
  count1    NUMBER       NULL,
  type1     CHAR(3)      NULL,
  miles1    NUMBER       NULL,
  code2     NUMBER       NULL,
  count2    NUMBER       NULL,
  miles2    NUMBER       NULL,
  type2     CHAR(3)      NULL,
  code3     NUMBER       NULL,
  count3    NUMBER       NULL,
  miles3    NUMBER       NULL,
  type3     CHAR(3)      NULL,
  code4     NUMBER       NULL,
  count4    NUMBER       NULL,
  miles4    NUMBER       NULL,
  type4     CHAR(3)      NULL
)
  PCTUSED    0
/

and that goes for both tables.

the fields used for the calculations are the itemcount, and then the 'miles, type and count' which are numbered.
Type is a text field, so no calculation - if it matched its a 1, if not then a 0.

Test Data looks like this:
Code:
DESTETA,DESTCODE,ITEMCOUNT,CODE1,COUNT1,TYPE1,MILES1,CODE2,COUNT2,MILES2,TYPE2,CODE3,COUNT3,MILES3,TYPE3,CODE4,COUNT4,MILES4,TYPE4
F,5700,29,5653,5,SUR,0.236375148,5645,2,0.424693632,SUR,5648,8,0.449118689,SUR,5647,4,0.462613324,SUR
F,1822,96,13591,11,SUR,0.253736831,13594,8,0.510542829,SUR,13593,9,0.553311348,SUR,106842,5,0.593347546,SUR
F,2243,59,2241,3,SUR,0.05676633,104085,9,1.050327098,SUR,104086,1,1.050327098,SUR,96861,4,1.312430574,DSP
F,3212,25,3214,8,SUR,0,88882,11,1.520781454,SUR,88881,9,1.608265915,SUR,21592,5,3.328947929,SUR
F,3652,12,611281,7,SUR,0.107176037,611293,6,0.34344388,SUR,611283,6,0.54891658,SUR,5847863,1,3.23832405,SUR

I really appreciate your help with this guys!


Fee.

"The question should be, is it worth trying to do, not can it be done"


 
I think you're misunderstanding about SQL, not Oracle. Why do you need pl/sql instead of just joining that tables in single query???

Regards, Dima
 
Because I don't have any fields I can join them on. I have to work out the 'best fit' based on a set of calcuations on thirteen fields. In some cases that may be an exact match, but usually it won't.

Any other suggestions of finding the 'closest' match based on weighted scores would be greatly appreciated though!

(The only other way I have found so far is to extract all of the data and use a third party matching software like DatSetV, but this means it cannot be automated, and that's not an option for this project)

Fee.

"The question should be, is it worth trying to do, not can it be done"


 
Are you joking? No fields to join? Why do you need them? And to read 3 fields from a table you need 3 separate select statements, that, I suppose you don't know, may retun field values from different rows?

Again, this question is NOT ORACLE RELATED at all.

Code:
select 'Record is now ' || tvf.destcode
  || ' and score is '
  || (1-(abs(tvf.surgcount - rx.surgcount)/max_surg_count))
  || ' and ' || (1-(abs(tvf.miles1 - rx.miles1)/max_miles))
  from test_tvf_fuzz tvf, test_rx_fuzz rx

Regards, Dima
 
I'm misunderstanding here then again, or possibly not explaining properly.

I have two tables. There are NO fields which are able to be joined between them. In fact, the whole purpose ofthe exersise is to find the MOST SIMILAR record by using that calculation. (maybe thread helps explain).

My aim was to build a loop which would take a record from one table, and using the calculation work out a score against each other record in the other table, and output this; then I can use this to select the best possible fit, as this would be the one with the largest score.

I didn't think I was really misunderstanding data - I still don'! I've been working with it for too long for that - I just am trying to perform a 'best fit' rather than a join.

Hope thats clearer.

All I really wanted to know here, was the following:

Can I build a record set of 13 items which are all different data types, and then in a PL/SQL loop can I perform a calculation of each entitiy within this and output the score?



Fee.

"The question should be, is it worth trying to do, not can it be done"


 
willif,

I have made the mistake of not answering your original question [blush]. You first posted about accessing loads of data without having to use 13 cursors, and wanted a collection based solution.

I believe that there is no join column available between the two tables, because of the fuzzy logic nature of the task and that this has to be based on calculation.

I have therefore created the stub posted below, which gets the data into collections, readby for hacking and slaying. Note that this requires 9i2 or above to work, but since this is a 10g forum, it should be ok.

Code:
CREATE TABLE fuzzy_match_big (
  desteta   CHAR(1)      NULL,
  destcode  NUMBER(10,0) NOT NULL,
  itemcount NUMBER       NULL,
  code1     NUMBER       NULL,
  count1    NUMBER       NULL,
  type1     CHAR(3)      NULL,
  miles1    NUMBER       NULL,
  code2     NUMBER       NULL,
  count2    NUMBER       NULL,
  miles2    NUMBER       NULL,
  type2     CHAR(3)      NULL,
  code3     NUMBER       NULL,
  count3    NUMBER       NULL,
  miles3    NUMBER       NULL,
  type3     CHAR(3)      NULL,
  code4     NUMBER       NULL,
  count4    NUMBER       NULL,
  miles4    NUMBER       NULL,
  type4     CHAR(3)      NULL
);

CREATE TABLE fuzzy_match_small
 (
  desteta   CHAR(1)      NULL,
  destcode  NUMBER(10,0) NOT NULL,
  itemcount NUMBER       NULL,
  code1     NUMBER       NULL,
  count1    NUMBER       NULL,
  type1     CHAR(3)      NULL,
  miles1    NUMBER       NULL,
  code2     NUMBER       NULL,
  count2    NUMBER       NULL,
  miles2    NUMBER       NULL,
  type2     CHAR(3)      NULL,
  code3     NUMBER       NULL,
  count3    NUMBER       NULL,
  miles3    NUMBER       NULL,
  type3     CHAR(3)      NULL,
  code4     NUMBER       NULL,
  count4    NUMBER       NULL,
  miles4    NUMBER       NULL,
  type4     CHAR(3)      NULL
);

CREATE OR REPLACE PROCEDURE match_by_fuzzy_logic

IS

TYPE user_tab_big IS TABLE OF fuzzy_match_big%ROWTYPE INDEX BY BINARY_INTEGER;
--The little table is of exactly the same type as the big one, but if it's
--guaranteed never to change, it will, therefore:-
TYPE user_tab_small IS TABLE OF fuzzy_match_small%ROWTYPE INDEX BY BINARY_INTEGER;

--Declare a collection of the types defined above
big_table_coll user_tab_big;
small_table_coll user_tab_small;


BEGIN

/*
N.B. if the big table gets really big, this will totally trash the UGA
so consult with the DBA before getting carried away.  If needed, use the
limit clause of the bulk collect.  Also, consider the use of
DBMS_SESSION.FREE_UNUSED_SERVER_MEMORY
*/
SELECT *
  BULK COLLECT INTO big_table_coll
  FROM fuzzy_match_big;
  
SELECT *
  BULK COLLECT INTO small_table_coll
  FROM fuzzy_match_small;
  

FOR big_counter IN big_table_coll.FIRST .. big_table_coll.LAST
LOOP
	FOR small_counter IN small_table_coll.FIRST .. small_table_coll.LAST	
		LOOP
			--Within this loop, you can reference and cross-calculate to
			--your hearts content.  Use the dot notation to reference fields
			IF big_table_coll(big_counter).desteta = small_table_coll(small_counter).desteta
			THEN
				DBMS_OUTPUT.PUT_LINE('The dest eta matches, hooray!');
			END IF;
			--Do your fuzzy matching calculations here
		END LOOP;
END LOOP;

EXCEPTION

WHEN OTHERS THEN
  NULL;
  --Invoke your exception handler here.  Dump the null statement.

END match_by_fuzzy_logic;

Please note my caveat about memory usage. Is this what you had in mind?

Regards

Tharg

Grinding away at things Oracular
 
Thank you so much for that!

I'll go away and paly with that and let you kwno how I get on.

I really do appreciate your help with this, (and sorry for not explaining my problem with more clarity to start with!



Fee.

"The question should be, is it worth trying to do, not can it be done"


 
Quick question with regards to memory - the 'big' table is likely to be between 12,500 and 13,000 records, and at the moment the smaller table has about 1,500 records, but this will grow to possibly 5,000 at some point in the future.

I'm not sure - but I would suspect that does not constitute 'very big', so shouldn't have a huge memory issue?



Fee.

"The question should be, is it worth trying to do, not can it be done"


 
willif,

with that number of records, and from the size of each record (which you posted, so I have clear info - thanks) I don't see it being a problem. I had in mind possibly hundreds of thousands or even millions of records.

I've only guesstimated, but even if each record consumed a kilobyte, that's still only 13 and a bit Megabytes worth of storage. Last time I checked, my Oracle server has a couple of gigabytes of RAM, hence my assertion that it shouldn't be a problem.

If your server hasn't got enough memory to cope with that then send the S.A. down to PC world to get some RAM chips. A corporate sized server should eat that for breakfast.

Regards

Tharg

Grinding away at things Oracular
 
Um,

This happens:

Code:
Warning: Procedure created with compilation errors.

SQL> show errors
Errors for PROCEDURE MATCH:

29/13    PL/SQL: Statement ignored
29/86    PLS-00302: component 'DESTETA' must be declared
SQL>

I thought that using %rowtype used the same column definitions as the table, so am now confused....

Fee.

"The question should be, is it worth trying to do, not can it be done"


 
SORRY - Ignore last post.

Was being stoopid!

(Still not out of the woods, but I can at least see them for the trees thanks to your help Tharg! - Have a lovely purple star(

Fee.

"The question should be, is it worth trying to do, not can it be done"


 
ok, back to basics ...

so you have two tables test_tvf_fuzz and test_rx_fuzz

from these you want to play with 5 fields ...

test_tvf_fuzz.destcode
test_tvf_fuzz.surgcount
test_tvf_fuzz.miles1
test_rx_fuzz.surgcount
test_rx_fuzz.miles1

you want to compute 2 variables :

v_score:= 1-(abs(v_surg - v_rsurg)/max_surg_count);
v_score1:= 1-(abs(v_miles1 - v_rmiles1)/max_miles);

question 1 :
what is the business critera for selecting the initial data from test_tvf_fuzz ?

question 2 :
am i correct in assuming that the data from the test_tvf_fuzz table MAY come from 3 different records

question 3 :
once we have the test_tvf_fuzz data, then we need to comapre it against data from the second table test_rx_fuzz. what is the business rule ?

thinking ...
this sounds like a job for an outer and inner cursor ...
possibly the outer loop will select the test_tvf_fuzz data
but how do you control the outer loop iterations ... business rule ?

and what are the business rules for matching data in the second cursor ...

one of the problems with these boards is that we come looking for technical answers BUT we forget to supply the plain english business requirements (me too on occasion !)

if WE cannot put our business requirements into plain english then it should ring some little alarm bells ...

i also fully understand that you may be looking for quick answers under pressure (been ther, done that, ugh ...) BUT sometimes we have to stand back from the tech side of things and try and see the high-level requirements

believe me, it does help - not only the readers but ourselves ... if we cant explain it in plain english, doesnt thet sometimes tell us something ?

sometimes, we can come up with simple solutions by taking a step back from the tech side and reappraising the business requirements

so willif, take the challenge and explain your business requirements in short logical steps and lets see if we can come up with some useful suggestions for you ...

funny enough, i still use the basic mantra of analysis that i learned at college = 3 stages of refinement :

1. business requirements (users understand)
2. technical requirements (users can be helped to understand)
3. technical specification (technical implementation)

please forgive me if i am teaching you how to suck eggs but we genuinely want to help as others have helped me

good luck ...



regards, david - no such thing as problems - only solutions.
 
I've solved it now to be honest, but I do have business rules that I can follow.

I have two tables: each list an entity, and 13 calculated qualities about that entity.

The purpose of the procedure is to find the entity from the one table which is most similar to the entiity in the other table. I have a calculation based on all of the 13 fields in order to give a score as to its similarity. I then need to know the best score, so I can say that this is the most similar entity. I'm basically trying to find a best fit from a panel so I can use the known behavoiur of the panel and extrapolate this to entities for which I do not know the behaviour.

So, my finished procedure (which has just finished running for the first time and passed QA hurrah!) uses two collections and three nested loops. I was advised not to use cursors as it was thought it would be slower (and when my tech department 'advise' its not generally an option to ignore them!)

It had only two loops initially, and then I realised I could use one of the 13 calcuated characteristics to 'filter' which records would be compared to each other, and therefore speed it up and also improve its accuracy.

So, I now have an outer loop, which selects filtered records into each collection, then a loop which takes the first record from the test_tvf_table, and then a loop which performs the calculation for this entity against all of the the filtered records from the test_rx_fuzz table. I output the entities and the maximum score which was reached, and fall out of the inside loop as soon as I hit the maximum score which could be gained.

If more than one record would give me the same maximum sscore I really don't mind which one gets used, so thats not an issue to deal with.

This is just the final step in some fairly horrible analysis, but I don't have someone else to give me business rules as this is a methodology I have invented myself.

But, after LOTS of help from people on this forum, the process all works now, and the first productionised data is chundering out of the end of the process, and thankfully someone has already subscribed to the finished product.

Its actually difficult for me to explain any more about it than that without upsetting the company confidentiality rules.

I appreciate your thoughts though. and yes, you are certainly right that if I can't explain what I need to acheive in plain english I have very little chance of achieving it. My issue has been that my compnay won't allow me to explain the whole process, so I've been trying to ask little questions to help me along the way.

I think the one thing I've gained from it all (apart from a functioning process which validated REALLY well!) is the understanding from the board that a PL/SQL course really would be money well spent!

Fee

[red] Nadolig Llawen a Blwyddyn Newydd Dda [/red]
 
Fee,

off thread, what does your signature mean?

Tharg

Grinding away at things Oracular
 
It means Merry Christmas and a happy new year, in Welsh of course...

Fee

[red] Nadolig Llawen a Blwyddyn Newydd Dda [/red]
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top