Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations wOOdy-Soft on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Index on CLOB Column and Upper case search??? 1

Status
Not open for further replies.

amozhi

Programmer
Feb 28, 2003
5
IN
Hi,
  I use Oracle 8i.  I have a CLOB column on which i need to search.
The CLOB column contain data mostly in the range of 2000-4000 characters but it can go up to 20000 characters(on very few cases).
  Firstly i want to know how i can do a case insensitive search. I found out we cant  use lower() or upper () on CLOB columns.
 Secondly, how do i create index on the CLOB column as this is not possible with the create index cmd.   If i need to use intermedia for that, how do i find out intermedia is running or not in my server .
  Also i came to know that we need some server  (ctxsrv) to be running for the  online update of index when we insert any new row. How do i find that is running also.
 Lastly do i need to store the CLOB enabled in row or disabled (While creating the table in the test environment i did not set it) which is faster? why?.
Please Advice,
Arun
 
Hi

Last Q:

Do not use in-line (ENABLE STORAGE IN ROW) if your LOB data change in size ex. short ->long or long->short or many rows are deleted. You get to much defragmentation in your table (and in the table’s table space). Use a LOB data segment with a CHUNK size optimal to the LOB length, network traffic and your database block size.

Then Oracle read a data block – it has to read the whole block – and drop columns not used in statements in memory. If LOB column are not used often, then you get a slower system, if using FULL table scanning (often) – it is not wish to use in-line LOBS.

As we all say – it depends!


Regards
Allan
 
Hi

You create an INDEX with the CREATE table ….LOB…

CREATE TABLE xxxxx ( YYYY CLOB, ZZZZ BLOB ) TABLESPACE table_data
LOB ( ZZZZ, YYYY )
STORE AS ( TABLESPACE lob_space DISABLE STORAGE IN ROW
STORAGE( INITIAL 128K NEXT 128K PCTINCREASE 0 )
CHUNK 8192 PCTVERSION 20 NOCACHE NOLOGGING
INDEX ( tablespace index_data STORAGE ( INITIAL 128K NEXT 128K ) )
);



The index LOB segment is used to access the values of the LOB


Regards
Allan
 
Hi Allan,
Thanks for your help. I have the table already created ( I cannot drop and recreate it). Don't we have anything like Create Index separately for LOB datatypes. This was the case why i asked about Inter media....

Also if we are storing data with disable storage in row, for every access of that row (Say i am going to read all the contents of the row), won't the performance will be little slower, becoz oracle has to get the lob locator and do the read separately . Or is my understanding wrong?.
Actually it is my first time i am doing anything with Large Objects.

Regards,
Arun
 
Hi

Use the ALTER TABLE MOVE command to change LOB information like this:

ALTER TABLE xxxxxx MOVE TABLESPACE table_data
LOB ( column_LOB, column_LOB )
STORE AS ( TABLESPACE lob_space DISABLE STORAGE IN ROW
STORAGE( INITIAL 128K NEXT 128K PCTINCREASE 0 )
CHUNK 8192 PCTVERSION 20 NOCACHE NOLOGGING
INDEX ( tablespace index_data STORAGE ( INITIAL 128K NEXT 128K ) )
);

You can move the table insight the same tablespace!

Oracle translates the table and you got index on lobs.

It is correct - the performance on select LOB data on a row with outline LOBS instead of inline LOBS are slower than inline LOBS – have to make extra IO.

BUT you do have to consider “all the other statements” that happens with this table.

Ex. if the data on the LOBS are change often – then use outline LOBS, because LOB data segment are NOT in rollback segment – and you get a better performance.

Disable storage in row improves performance when other columns are frequently read without the LOB value” – Oracle only reads the locators.

Enable storage in row improves performance when small LOBS frequently read with rows but hurts performance of full table scans that do not access LOB values

Then you do need LOB values -> save a IO if inline LOBS
Then you do not need LOB values -> save a unnecessary I/O and memory space with outline LOBS


Regards
Allan
 
Allan,
Thanks very much for clearing me up. One more query (actually i had asked the same in the first part of my long list of queries :) )..
How can i do a query like this on a CLOB column

select * from table_name where upper(column_name) like 'aBc%'. I found there are no dbms_lob.upper kinds present. Remember i use 8i.
Regards,
Arun


 
Ho,
I am sorry. It must have been like

select * from table_name where upper(column_name) like upper('aBc%').

Regards,
Arun
 
Hi

Well – if you used 9i – UPPER and other String functions, it works – Oracle has made the necessary improvements from 8i to 9i.

But you do not have unlimited resource of service and support money and all the time needed to upgrade each year – just like all others – but consider the extra work and code problems instead of upgrading to 9i.

We also have to keep in mind that performance is important. So I would do the following:

1. Create a procedure where I select ROWID into a table – called FoundRelevantRows.

2. Then use the FoundRelevantRows with the table.

3. Select relevant rows with ROWID with or without FOR UPDATE (lock LOB objects during updates)

Ex.1:
-----
drop procedure RelevantRows;
drop table aol;
drop table RelevantRows_Tmp;


create table aol( aolID integer primary key,
Text CLOB,
len integer ) tablespace table_data
lob (text) store as (tablespace lob_space disable storage in row
storage( initial 128k next 128k pctincrease 0)
chunk 8192 pctversion 20 nocache nologging
index (tablespace index_data storage( initial 128k next 128k ) )
);

create index aol_len on aol(len);


CREATE GLOBAL TEMPORARY TABLE RelevantRows_tmp( aol_rowid rowid) on commit delete rows;


create or replace procedure RelevantRows( SearchText IN VARCHAR2 )
IS
v_SearchText varchar2(4000);
BEGIN
v_searchText := UPPER( SearchText );

delete from RelevantRows_tmp;


insert into RelevantRows_tmp
select rowid from aol
where len between 1 and 4001
and UPPER( dbms_lob.SubStr( aol.text, 4000, 1 ) ) like v_SearchText;

insert into relevantRows_tmp
select rowid from aol
where len between 4001 and 8000
and UPPER( dbms_lob.SubStr( aol.text, 4000, 4001 ) ) like v_SearchText;

insert into relevantRows_tmp
select rowid from aol
where len between 8001 and 12000
and UPPER( dbms_lob.SubStr( aol.text, 4000, 8001 ) ) like v_SearchText;

insert into relevantRows_tmp
select rowid from aol
where len between 12001 and 16000
and UPPER( dbms_lob.SubStr( aol.text, 4000, 12001 ) ) like v_SearchText;

insert into relevantRows_tmp
select rowid from aol
where len between 16001 and 20000
and UPPER( dbms_lob.SubStr( aol.text, 4000, 16001 ) ) like v_SearchText;

END;
/



insert into aol values(1, Rpad('aaa', 4000 ), NULL );
insert into aol values(2, Rpad('aca', 4000 ), NULL );
update aol
set len = dbms_lob.getlength( text );

declare
a clob;
begin
select text into a from aol where aolid = 1 for update;

dbms_lob.append(a, a);

update aol
set text = a, len = dbms_lob.getlength( text )
where aolid = 1;
end;
/

commit;

select aolid, len from aol;


/*
Let's rock
*/



execute RelevantRows( 'aca%' );

select aolId, text from aol
where rowid in (select aol_rowid from RelevantRows_Tmp)
for update;

rollback;

execute RelevantRows( 'aaa%' );
select aolId, text from aol
where rowid in (select aol_rowid from RelevantRows_Tmp);


The problem is that we can’t create a function based index on LOBS. We then have to make FULL table scan each time we query the LOB columns.

Ex.2:
-----
OR you have to create several Search_varchar2 columns on the table and make indexes on each of them, like this:

create table aol( aolID integer primary key,
Text CLOB,
Search_1_4000 varchar2(4000),
Search_4001_8000 varchar2(4000),
Search_8001_12000 varchar2(4000),
Search_12001_16000 varchar2(4000),
Search_16001_20000 varchar2(4000)
) tablespace table_data
lob (text) store as (tablespace lob_space disable storage in row
storage( initial 128k next 128k pctincrease 0)
chunk 8192 pctversion 20 nocache nologging
index (tablespace index_data storage( initial 128k next 128k ) )
);

During insert you insert UPPER( DBMS_LOB.SubStr( Text, 4000, X ) ) on each Search varchar2 line and use them directly in your code. It is fast – but you use up space.

Conclusion:
-------------
Use above example – but you got full table scan always but save disk space
Use describe example no 2 – but you use disk space but avoids full table scan
Upgrade to 9i – and you can use UPPER in your code – but remember you still got problems because the implementation of LOBS is not the same as for varchar2.


Regards
Allan
Icq: 346225948
 
Hi

I see - that I have not explained "full table scan" - it is NOT a true full table scan - but full table scan insight the interval of betweens because I created a LENGTH column with index so Oracle makes a RANGE SCAN on the index, but it is almost a full table scan if you got 80% in one interval.
Regards
Allan
Icq: 346225948
 
Allan,

Thank you very much for the help.

Regards,
Arun
 
I must be missing something, but I can't see how aoleuro's approach saves you any time (As written, I don't think it works properly either - but that's probably just typing mistakes).

As I understand it, the RelevantRows procedure makes a series of passes through the table picking out rows that match. You then use the results returned to do your actual search (very quickly). But surely, this way you'll do (within the RelevantRows procedure)

A full table scan looking at characters 1 to 4000 of all CLOBs

Another (admittedly quicker) pass through the table looking at characters 4001 to 8000 of CLOBS which are that long

Yet another for characters 8001 to 12000

and so on, until we've checked all the characters we think we need (incidentally, what happens if the CLOB contains the match string starting at character 3998 and finishing at character 4002?).

Better (in my view) to write a function which does the match test for a single row and returns a true/false. Sure, it'll do a full table scan, but it probably always would even if we were dealing with VARCHAR2s. Something like this

Code:
FUNCTION clob_like (p_CLOB   IN OUT CLOB,
                    p_search IN     VARCHAR2) RETURN VARCHAR2 IS
BEGIN
   -- PL/SQL can cope with varchars up to 32K in size
   -- if we needed to deal with longer CLOBs we'd need
   -- to loop through checking 32K at a time
   IF UPPER(DBMS_LOB.SUBSTR(p_CLOB, 1, 32767)) LIKE UPPER(p_search) THEN
      RETURN 'Y';
   ELSE
      RETURN 'N';
   END IF;
END;

Note that I return a VARCHAR2 because SQL can't cope with booleans (doh!). You can then do somthing like

Code:
SELECT id
FROM   my_table
WHERE  clob_like(my_table.clob_column,'%HELLO%') = 'Y'

I haven't tested this, but think it should work. Maybe I'll have a play and see if I can come up with something more robust. -- Chris Hunt
Extra Connections Ltd
 
Hi Chris

Thanks - and it is true – my code has a bug in the WHERE clause (I show later) and while looking in the SubStr.

You can’t do what you are trying in your code.

Oracle 8i does not accept varchar2 values above 4000 – you get an ORA message. I tried it before writing my code.

You have to split data like I do.

So your function also need to make several select statements and if you do not know the LENGTH of the LOB you are making several full table scans.

To avoid near 4000 I just check near 4000 in the next search.

If re-writing my code, it must look like this:

create or replace procedure RelevantRows( SearchText IN VARCHAR2 )
IS
v_SearchText varchar2(4000);
v_Len integer;
v_pos_start integer;
v_pos_end integer;
BEGIN
v_searchText := UPPER( SearchText );
v_len := length(v_searchtext);

delete from RelevantRows_tmp;

v_pos_start := 1;
insert into RelevantRows_tmp
select rowid from aol
where len is not null
and UPPER( dbms_lob.SubStr( aol.text, 4000, v_pos_start ) ) like v_SearchText;

v_pos_start := 4001 - v_len;
insert into relevantRows_tmp
select rowid from aol
where len > 4000
and UPPER( dbms_lob.SubStr( aol.text, 4000, v_pos_start ) ) like v_SearchText;

v_pos_start := 8001 - v_len * 2;
insert into relevantRows_tmp
select rowid from aol
where len > 8000
and UPPER( dbms_lob.SubStr( aol.text, 4000, v_pos_start ) ) like v_SearchText;

v_pos_start := 12001 - v_len * 3;
insert into relevantRows_tmp
select rowid from aol
where len > 12000
and UPPER( dbms_lob.SubStr( aol.text, 4000, v_pos_start ) ) like v_SearchText;

v_pos_start := 16001 - v_len * 4;
insert into relevantRows_tmp
select rowid from aol
where len > 16000
and UPPER( dbms_lob.SubStr( aol.text, 4000, v_pos_start ) ) like v_SearchText;

END;
/


Why using a procedure than a function? Well it is my experience that then you found relevant rows we normal want to do something in not only one statement – but in several statements. First we show data to the user – then we change something and then we do some other statements and so on.

As I wrote before – the hard problem is to get the relevant rows – then we can do anything with this rows. Remember that if you want to change a LOB then you got to make a FOR UPDATE otherwise Oracle denies.

I am not sure if I understand you correctly – but you can copy-and-past the example code (just remember to change tablespace names) and it runs.

I have tested the code against an Oracle 8i R3 EE.

And I have tested the new code.


Regards
Allan
Icq: 346225948
 
Hi again

I just remember that I forgot to change procedure with last change - to busy working :)

Please note the use of BIND variables - all statements are the same. Oracle does not make a parse it uses already knowing optimizer hints.

create or replace procedure RelevantRows( SearchText IN VARCHAR2 )
IS
v_SearchText varchar2(4000);
v_Len integer;
v_pos_start integer;
v_pos_end integer;
BEGIN
v_searchText := UPPER( SearchText );

delete from RelevantRows_tmp;


v_len := 0;
v_pos_start := 1;
insert into RelevantRows_tmp
select rowid from aol
where len > v_len
and UPPER( dbms_lob.SubStr( aol.text, 4000, v_pos_start ) ) like v_SearchText;

v_len := 4000;
v_pos_start := 4001 - v_len;
insert into relevantRows_tmp
select rowid from aol
where len > v_len
and UPPER( dbms_lob.SubStr( aol.text, 4000, v_pos_start ) ) like v_SearchText;

v_len := 8001 - v_len;
v_pos_start := 8001 - v_len * 2;
insert into relevantRows_tmp
select rowid from aol
where len > v_len
and UPPER( dbms_lob.SubStr( aol.text, 4000, v_pos_start ) ) like v_SearchText;

v_len := 12001 - v_len * 2;
v_pos_start := 12001 - v_len * 3;
insert into relevantRows_tmp
select rowid from aol
where len > v_len
and UPPER( dbms_lob.SubStr( aol.text, 4000, v_pos_start ) ) like v_SearchText;

v_len := 16001 - v_len * 3;
v_pos_start := 16001 - v_len * 4;
insert into relevantRows_tmp
select rowid from aol
where len > v_len
and UPPER( dbms_lob.SubStr( aol.text, 4000, v_pos_start ) ) like v_SearchText;

v_len := 20001 - v_len * 4;
v_pos_start := 20001 - v_len * 5;
insert into relevantRows_tmp
select rowid from aol
where len > v_len
and UPPER( dbms_lob.SubStr( aol.text, 4000, v_pos_start ) ) like v_SearchText;


END;
/
Regards
Allan
Icq: 346225948
 
Sorry Allan, perhaps I didn't express myself clearly enough. In 8i PL/SQL can cope with VARCHAR2 variables (explicit or implicit) up to 32767 characters in length. SQL columns and SELECTed results cannot be longer than 4000 characters. Try this and you'll see:

Code:
DECLARE
   dummy VARCHAR2(32767);
BEGIN
   NULL;
END;

It compiles happily. Try to change the maximum length to 32768 and it fails.

Anyway, maybe this isn't true on older versions - it's not really all that relevant as long as you are aware of the constraints on your particular box. I've written and tested (more-or-less) an "industrial strength" version of my function that finds matches in CLOBs of any length:

Code:
CREATE OR REPLACE FUNCTION clob_like (p_CLOB   IN CLOB,
                                      p_search IN VARCHAR2) RETURN VARCHAR2 IS

   -- PL/SQL (as opposed to SQL) can cope with varchar2s up to 32K (on 8i).
   -- This may not be the case on older versions, in which case set the
   -- following constant to 4000
   c_maxvarchar2 CONSTANT NUMBER := 32767;

   l_offset      NUMBER := 1;
   l_length      NUMBER;
   l_nextbite    NUMBER;
   l_answer      VARCHAR2(1) := 'N'; 
BEGIN
   -- Determine the length of the CLOB
   l_length := DBMS_LOB.GETLENGTH(p_CLOB);
   
   -- The offset between one bite and the next is less than the size of each
   -- bite. Bites need to overlap by (length of search string - 1) to allow
   -- for an occurrence of the string spanning the join.
   l_nextbite := c_maxvarchar2 + 1 - length(p_search);

   -- Look through the CLOB as a series of VARCHAR2 bites
   WHILE l_offset <= l_length LOOP

      IF UPPER(DBMS_LOB.SUBSTR(p_CLOB, c_maxvarchar2, l_offset)) LIKE UPPER(p_search) THEN
         l_answer := 'Y';
         EXIT;  -- A match has been found, so bang out of the loop
      END IF;
      
      -- Find the start of the next bite. 
      l_offset := l_offset + l_nextbite;
   END LOOP;

   RETURN l_answer;
END;

I'm not sure I agree with your view on multiple actions generally occurring to result sets. It may be the case, it may not, depending on circumstances. If that series of operations includes any COMMITs, you'll have the potential problem of multiple users' result sets being saved into relevantrows_tmp and conflicting with eachother. If you really need the relevantrows_tmp table to do a series of SELECTs, you could populate it with my function thus:

INSERT INTO relevantrows_tmp
SELECT rowid from aol where clob_like(aol.text,'whatever');

I think it would often be more trouble than it's worth.

In summary, whilst your solution works,I think my solution has a number of advantages:

[ul][li]It works for CLOBs of any length

[li]It works for any CLOB from any table, you have to write a version of RelevantRows for each one.

[li]It uses the larger PL/SQL flavour of VARCHAR2s, and hence requires fewer comparison operations.

[li]It only requires one pass through the table, not several.

[li]Queries can be done in one statement, not two.

[li]It allows the calling process to make use of other (perhaps indexed) columns: &quot;select title from plays where clob_like(text,'%falstaff%') and author = 'SHAKESPEARE'&quot;

[li]It doesn't require the a temporary table (and thus doesn't have to worry about multiple users issuing queries at the same time)

[li]It doesn't require an extra column in the source table, nor the overhead of keeping it up to date.

[li]I think it's a more self-contained, elegant solution - but then I would, wouldn't I?
[/ul]

I've not tangled much with CLOBs. It's been interesting, but it took me ages to remember that DBMS_LOB.SUBSTR() has it's arguments in a different order to SUBSTR() (doh!). Why DID they do that?
-- Chris Hunt
Extra Connections Ltd
 
Hi Chris


Well – is these forums not great!

It shows that there is lot of ways to make the world running.

I only got time to make some small comments on your reply.

1. It is a GLOBAL TEMPORARY table I created – “for your eyes only” – It is an Oracle 8i implementation of session only temporary data. None can see your data.

2. The ON COMMIT DELETE ROWS with create table mean that Oracle auto deletes session data on each COMMIT or ROLLBACK and not on ending the session.

3. Each time your function is called in a select statement – then it makes a full table scan. Because you call the statement one time, it does not mean that Oracle just fix the data for you. So if I want to use lets say 3 select then you do full table scan 3 times. Then I use a ROWID table – then I make a full table scan one time – and the other times it is directly. It is correct that your code is only one statement – I have to generate the ROWID table and then query on the table. But as you also said – it depends of the job been done.

4. Oracle reads LOB data in CHUNK. And always in CHUNK size defined on the create table. The CHUNK size must be optimal based on database block size, network traffic and indeed the number of concurrent users running similar statements. My point is, that even if you specifies that Oracle should read 1…32767 – then Oracle read a CHUNK size. The different between read 1..4000 or read 1...32767 is not that different – but it depends of CHUNK size.

5. Then your function is used – the data leaves the Oracle SQL engine and goes to the PL/SQL engine – moving CHUNK size of data. Then I use an insert into ….select…. then it is still in the SQL engine. Then tuning – it is always important to stay insight SQL engine.

6. Now I did only make the code example so Arun could use it as an example for his solution – then creating a more general solution, then I would also use a FOR or WHILE so it was possible to work with LOB to max size (4GB).

Of cause you think your solution is more elegant – it looks very nice – I agree – and again I find your reply good. But I am concerned about PERFORMANCE.

Regards
Allan
Icq: 346225948
 
Hmmm.... I'd not come across Global Temporary tables before. Handy aren't they? Though I haven't used temp tables much since PL/SQL was made available in Version 6(?). The performance hit involved in using PL/SQL is real, but often not significant.

I want to clarify what you mean by &quot;Each time your function is called in a select statement – then it makes a full table scan&quot;. A SQL statement that only includes a call (or several calls) to my function will be a full table scan - it can't use any index on the CLOB column. If it contains other joins in the WHERE clause it will use indexes where available. It won't do a full table scan &quot;each time the function is called&quot; - i.e. for each row of the table - because the function doesn't scan any tables at all. I simply don't recognise your scenario of needing to select the same dataset repeatedly, maybe we're just working in different problem areas.

I'm not sure your SQL statements will stay inside the SQL engine. Surely DBMS_LOB.SUBSTR is a PL/SQL object?

Using PL/SQL instead of SQL is a typical newbie mistake - e.g. using a PL/SQL loop where a single UPDATE statement would do the same job quicker - but it's not always the wrong choice. If you have a dogmatic attachment to the SQL engine over the PL/SQL one you can end up going to such extremes to stay inside it that you actually take up more time.

Performance is an important consideration, but not the only one, and often not the most important one. Issues such as clarity, maintainability and reusability are just as important and often more so. Obviously one should avoid inefficient code where possible, but adopting an overly complicated approach to shave fractions of a second off the response time is rarely worth it. -- Chris Hunt
Extra Connections Ltd
 
Hi Chris

Someday - somewhere - somehow - we have to make a test.

This forum is great - but we have to test the statements we put in the forum - so we do not create more problems.

I have not testet the case - and at the moment I can't test it - I do not have needed data.

Perhaps someday I will make a test.

Thanks for your reply
Regards
Allan
Icq: 346225948
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top