Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations wOOdy-Soft on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Latin Characters All of a Sudden 1

Status
Not open for further replies.

jdubbish

Programmer
Dec 9, 2004
70
US
I have a site that allows users to input text to a SQL DB that will be diplayed on various pages throughout the site. All of sudden, and for seemingly no reason, some special characters, such as hyphens (-), single quotes ('), and double-quotes ("), among others, have been replaced by latin characters.

This has been happening on nearly all of my tables, including tables that have not been updated since this has started happening.

Here's an example:

TBL1 has fldQUOTE of type ntext. The value of one record used to display on the Web site using ASP to output the value as the following:

start text from DB -->
I am so impressed by NAME & staff I have decided to spread the word in Australia & re-sell all there great software & products.
<-- end text from DB

However, it is now displayed as follows:

start text from DB -->
I am so impressed by James Leichter & Aptora staff I have decided to spread the word in Australia & re – sell all there great software & products.
<-- end text from DB

Notice that the hyphen in "re-sell" was replaced with "re – sell". When I look at the value directly using SQL Management Studio it displays as "re – sell".

For details on how this text is handled on the site consider the following scenario:

1. User inputs text into a form textarea on a .asp page.
2. The textarea value is inserted into a new record in the SQL Table).
3. Another .asp page queries the record in the table and then displays the value.

Again, the text displayed above displayed perfectly fine on my site for years, and just within the past couple of months has been displayed with these new characters.

Please let me know if you have any ideas as to the cause of this.
 
What is the collation setting of the database?

"NOTHING is more important in a database than integrity." ESquared
 
what are your datatypes? If you have gone from nvarchar to varchar you could easily have characters it can't interpret properly.

"NOTHING is more important in a database than integrity." ESquared
 
Office has autocorrect to turn the humble hyphen into an n-dash when the value is typed or perhaps manipulated in some way in a user form. The n-dash may be unicode, in which case nvarchar may be necessary. Or what SQLSister said about the column's data type changing...

[COLOR=black #e0e0e0]For SQL and technical ideas, visit my blog, Squared Thoughts.

The best part about anything that has cheese is the cheese.[/color]
 
As mentioned in my original post the fields are of data type NTEXT.

The most bizarre thing about this is that these fields have not changed, been updated, or otherwise altered. It's almost like some sort of update ran against my entire DB.

Am I not handling the input of these special characters properly? It seems that SQL should be able to handle inputting and outputting hyphens, quotes, and other special characters.

It seems like I'm missing a small portion of the data processing, and that is the portion causing this problem.

Let me know what you guys think, and thank a ton for your help!
 
Hmmm..... Odd. I won't pretend to know specifically what is causing this problem. I would suggest that you first take a look here:

If I had to guess, I would suggest that this is NOT a database issue at all, but more of an ASP issue.

Something probably changed on the ASP pages. In the head section, there is a 'meta' tag to determine the content type. Also, you could try adding Server.HTMLEncode to the pages to see if that will help.

I'm no ASP expert, but I was playing around a little and noticed this little gem....

First of all, nText field store unicode data. The unicode code for the greek Omega character is 937. So, if you open Query Anaylyzer and run this....

Code:
Select NCHAR(937)

You will see the OMEGA character in your display. (You may need to press CTRL-T before you see it).

So... great. We can see unicode character in QA. Helpful, but not what you really need. You need this to display in a web browser (using ASP). Well, let's experiment a little, shall we?

I create a new ASP document on my server named test.asp. It looks like this...

Code:
<html xmlns="[URL unfurl="true"]http://www.w3.org/1999/xhtml">[/URL]
	<head>
		<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
	</head>
	<body>
		<% 
			
			Response.Write(chrw(937))
			Response.Write("<br/>")
			Response.Write([!]Server.HTMLEncode([/!]chrw(937)[!])[/!])
		%>
	</body>
</html>

First, realize the CHRW is the equivalent of the nchar function in T-SQL. Essentially, it returns a unicode character. Next, notice how I write another line, but add Server.HTMLEncode. When you view this page in a browser, you will see on the first line something that looks like an O or 0. However, on the second line, you will see the greek letter omega.

I encourage you to try adding Server.HTMLEncode to your ASP page. I suspect this will solve your problem. Another solution may have something to do with the content-type meta 'stuff'. This is not my area of expertise, but I know there are some pretty sharp ASP guys that like to dabble in SQL Server, so there is a chance that someone will be able to fill in some of the gaps here.

You may also have better luck if you post this question in the ASP forum.

Hope this helps.

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
jdubbish,

Have you had a chance to read my previous post? Was it helpful? I'm just curious to know if you found a solution yet.

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Wait... I had non-unicode high ascii characters such as [&eacute;] show up as two characters in my web application just recently. And I realize right this second that it could have something to do with the character set I did (or didn't) define in the page's metadata. So unnecessary character translation occurred based on the implied or browser-chosen character set, and unicode was chosen which got displayed as two single-byte characters in a select control instead of one dual-byte character...

So take a look at your web page's character sets.
 
gmmastros,

I did review post, and the information given is very useful for troubleshooting my issue further.

However, I've had bigger fish to fry in the past few days, so I have not had the chance to test and confirm anything.

Thanks to everyone for the suggestions, it is very useful!
 
Well, after a bit of research and testing I've found that the problem is indeed an issue with ASP's handling of Unicode characters.

I am still not sure EXACTLY what caused this to change on my site, but after reviewing numerous site backups from the past year I believe this to be a problem with a function that I've changed in which handled a small variety of unicode characters when inserting and getting them from the DB.

The pain of this issue is that since my code was not initially written to proplerly handle unicode (nor were the various third-party controls I've implemented) I have to update tons of scripts to handle it.

Better late than never I suppose.

Thanks gain to all for you help, it has added a good amount of knowledge to my experience bank.
 
If you find anything that might be helpful, about about posting about it in the ASP forum and putting a link to that here?
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top