Windows Word document special characters turn unreadable in Oracle

Status
Not open for further replies.

greenpee

Programmer
Aug 3, 2004
39
US
Hi,
I have a form that collects input from website posts and inserts it into an Oracle database. Some of the posts are copied and pasted from .doc files, and when I check these posts in SQL, special characters like '’' do not show up. When I spool the data out and open it in vi, the character shows as '^R'. When I view it in a web browser, it turns out to be ''. Could someone help?
 
Greenpee,

The nature of MS-Word (and other document processing software) is to embed "special characters" (i.e. high-ASCII and other "beyond-text" control characters) in the document to govern formatting. If, during transfer or storage, any of these "non-text" character codes changes by even one bit, then the integrity of the document's formatting is at risk.

Your task, should you decide to accept it, is to identify the appropriate storage mechanism to preserve document integrity. Your first task is to decide whether your business need is to store documents "intact" or to store values harvested from the documents. If you must store documents "intact", then you will probably need some type of "raw" or "CLOB" data type. (CLOBs are as problematic as LONGs in Oracle 8i, but in Oracle 9i, CLOBs behave just like VARCHAR2.)

I don't expect that I have given you any resolutions to your problem here, but it furthers the discussion toward reaching an eventual resolution for you. So, please tell us more about your processing requirements.

[santa]Mufasa
(aka Dave of Sandy, Utah, USA)
@ 18:19 (02Dec04) UTC (aka "GMT" and "Zulu"),
@ 11:19 (02Dec04) Mountain Time
 
Thanks Santa,
We just need to "store values harvested from the documents", and I used VARCHAR2 to store them. We don't need to store documents "intact". I can replace some of the special characters in the web application before they get into the database, but this would not fix the old data already in the database. By the way, this was never a problem before we installed a new Oracle database on a new machine.
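As a rough illustration of replacing special characters in the web application before they reach the database, here is a minimal Python sketch. The mapping table and the function name `to_ascii_punctuation` are hypothetical and not from this thread; the list of characters is an assumption about what Word typically substitutes and is not exhaustive.

```python
# Hypothetical mapping from Word's typographic punctuation to plain ASCII.
# Extend it with whatever characters actually show up in the posts.
SMART_TO_ASCII = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark (curly apostrophe)
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
}

# str.maketrans accepts a dict of single characters to replacement strings.
_TABLE = str.maketrans(SMART_TO_ASCII)


def to_ascii_punctuation(text: str) -> str:
    """Return text with the mapped typographic characters replaced by ASCII."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    sample = "It\u2019s a \u201cquoted\u201d phrase\u2026"
    print(to_ascii_punctuation(sample))
```

This only covers new input. Rows already stored would need a separate clean-up once it is known which byte values they actually contain.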
 
Hi greenpee,

Word is kind to us by introducing non-ascii7 characters even when we type just ascii7 characters. For example, Word quite often converts the dash character to em-dash.

Not only do they introduce non-ascii7 characters, they also do it via their own proprietary encoding, which is called cp1252. The following page shows how it differs from the 'rest of the world' standard ISO 8859-1 -
Greenpee, as you said, it is a best practice to scan every character that is a candidate to reach your database.
btw, which encoding is your DB set to?
Can you find out what these bytes hold? It should be a value between 0 and 255.
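
To make the byte question concrete, here is a small Python sketch of one possible explanation for the '^R' seen in vi. It assumes, without confirmation from this thread, that the curly apostrophe arrived as the cp1252 byte 0x92 and that something on the new machine (for example a 7-bit database character set) dropped the top bit. The helper names are hypothetical.

```python
def strip_high_bit(value: int) -> int:
    """Simulate a 7-bit channel: keep only the low 7 bits of a byte."""
    return value & 0x7F


def caret_notation(value: int) -> str:
    """Render a control byte (0x00-0x1F) the way vi does, e.g. 0x12 as '^R'."""
    if not 0 <= value < 0x20:
        raise ValueError("not a C0 control byte")
    return "^" + chr(value + 0x40)


# The curly apostrophe U+2019 is a single byte in cp1252.
apostrophe = "\u2019".encode("cp1252")[0]
print(hex(apostrophe))

# The same byte read as ISO 8859-1 is an invisible C1 control character.
print(repr(bytes([apostrophe]).decode("latin-1")))

# With the top bit removed, the byte becomes a control character.
stripped = strip_high_bit(apostrophe)
print(hex(stripped), caret_notation(stripped))
```

If the stored value really is 18 (0x12), that would be consistent with the top bit being lost, but only the actual byte values in the table can confirm it.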

Regards,
Dan
 