Smart questions
Smart answers
Smart people
Join Tek-Tips Forums
INTELLIGENT WORK FORUMS
FOR COMPUTER PROFESSIONALS

Member Login




Remember Me
Forgot Password?
Join Us!

Come Join Us!

Are you a
Computer / IT professional?
Join Tek-Tips now!
  • Talk With Other Members
  • Be Notified Of Responses
    To Your Posts
  • Keyword Search
  • One-Click Access To Your
    Favorite Forums
  • Automated Signatures
    On Your Posts
  • Best Of All, It's Free!

Join Tek-Tips
*Tek-Tips's functionality depends on members receiving e-mail. By joining you are opting in to receive e-mail.

Posting Guidelines

Promoting, selling, recruiting, coursework and thesis posting is forbidden.
Jobs from Indeed

Link To This Forum!

Partner Button
Add Stickiness To Your Site By Linking To This Professionally Managed Technical Forum.
Just copy and paste the
code below into your site.

ddrillich (TechnicalUser) (OP)
20 Aug 07 16:27
Good Day,

We have a DB defined to be encoded as Latin-1. We would like to convert it to al32utf8. However, we are starting to realize that even though it's defined as Latin-1, we have within it characters from different encodings. For example, cp1252.

Any ideas how to proceed with this task?

Regards,
Dan
ddrillich (TechnicalUser) (OP)
24 Aug 07 9:02
Good Day,

Our DBA ran an Oracle conversion utility that attempted to convert a sample DB. The conversion was defined as ISO-8859-1 to UTF8. During this conversion, Oracle reported about thousands of characters as exceptions.


Here is part of the report -

User  : IKMAPP
Table : IKS_BINARYDOCUMENT
Column: IKS_FILENAME
Type  : VARCHAR2(1000)
Number of Exceptions         : 5         
Max Post Conversion Data Size: 137       

ROWID              Exception Type      Size Cell Data(first 30 bytes)     
------------------ ------------------ ----- ------------------------------
AAABKvAAIAAAAfYAAB lossy conversion         Rollout instructions for Lindi
AAABKvAAnAAABcnAAI lossy conversion         Project Manager – 040003588.pd
AAABKvAAnAAABcnAAL lossy conversion         Business Infrastructure Techno
AAABKvAAnAAABcoAAA lossy conversion         Business Continuity Relationsh
AAABKvAAnAAABcoAAB lossy conversion         Infrastructure and Controls Im
------------------ ------------------ ----- ------------------------------

User  : IKMAPP
Table : IKS_BINARYDOCUMENT
Column: IKS_MD5CONTENTHASH
Type  : VARCHAR2(1000)
Number of Exceptions         : 1568      
Max Post Conversion Data Size: 31        

ROWID              Exception Type      Size Cell Data(first 30 bytes)     
------------------ ------------------ ----- ------------------------------
AAABKvAAIAAAAdqAAD lossy conversion         ¹]]’Ì .    èå^>[Å@a              
AAABKvAAIAAAAehAAB lossy conversion         &¨´¦é÷P2È1i’In"              
AAABKvAAIAAAAgDAAG lossy conversion         nÁ "ø°&®sa’mûÜ0              
AAABKvAAIAAAAgmAAD lossy conversion         ’= y»ý é*Õ$¤0³             


What does it mean - 'lossy conversion' and how do I find the exact character that caused the exception?

Regards,
Dan
ddrillich (TechnicalUser) (OP)
24 Aug 07 12:46
I picked up one of the ROWIDs and ran -

select count(*) from IKMAPP.IKS_BINARYDOCUMENT where ROWID = chartorowid('AAABKvAAnAAABcnAAI');

Got back -

ERROR at line 1:
ORA-01410: invalid ROWID

Maybe the conversion changed the ROWIDs...
ddrillich (TechnicalUser) (OP)
27 Aug 07 21:26
Dan,

Things got a bit clearer when I pointed to the right instance winky smile

We realized two things using the CSSCAN utility -

1) That most of the exceptions are due to binary data stored as varchar2.

2) The rest of the exceptions seem to be cp1252 related. Meaning, cp1252 characters reached our DB, even though it's defined to be Latin-1.

We found the values using queries such -

select ascii(substr(IKS_FILENAME,31,1))  from IKMAPP.IKS_BINARYDOCUMENT
where rowidtochar(ROWID) = 'AAABKvAAIAAAAfYAAB';

ASCII(SUBSTR(IKS_FILENAME,31,1))
--------------------------------
                             146


Any suggestions how to proceed? One thing that I wonder about is whether we can redefine the DB to have the cp1252 character set. After all, cp1252 is a Microsoft extension of Latin-1.

Regards,
Dan
SantaMufasa (TechnicalUser)
1 Sep 07 0:10
Dan,

As I have mentioned in several other character-set-conversion threads, the only way that I have ever found success in converting an existing database is to:

1) create a new, completely separate database that has al32utf8 character-set definition.
2) do user exports from your Latin-1 database.
3) import into the al32utf8 database the dump files from the Latin-1 database.

Let us know your findings.

santaMufasa
(aka Dave of Sandy, Utah, USA)
[I provide low-cost, remote Database Administration services: www.dasages.com]

ddrillich (TechnicalUser) (OP)
3 Sep 07 21:27
Dave,

Thank you for the information!

One of the issues we have is that even though the character set of the DB is defined as Latin-1, we have within it characters from the cp1252 character set and we don't know how to deal with these characters.

It seems that the root cause of our problem is explained at -
http://www.stanford.edu/dept/itss/docs/oracle/10g/server.101/b10749/ch3globenv.htm#sthref205

 
In our case, we have the following setting -
NLS_LANG -  AMERICAN_AMERICA.WE8ISO8859P1
NLS_CHARACTERSET - WE8ISO8859P1

Meaning, both are Latin-1. The document says the following about this case - When these settings are the same, Oracle assumes that the data being sent or received is encoded in the same character set as the database character set, so validation or conversion is not guaranteed.

Any way out of this situation?

Regards,
Dan

Reply To This Thread

Posting in the Tek-Tips forums is a member-only feature.

Click Here to join Tek-Tips and talk with other members!

Back To Forum

Close Box

Join Tek-Tips® Today!

Join your peers on the Internet's largest technical computer professional community.
It's easy to join and it's free.

Here's Why Members Love Tek-Tips Forums:

Register now while it's still free!

Already a member? Close this window and log in.

Join Us             Close