Major problem, please help

Status
Not open for further replies.

Oostwijk

Technical User
Oct 19, 2003
82
NL
I use LWP::Simple to fetch a certain page on the internet.

The page contains:
<html>
&#8226; geavanceerd zoeken &#8226; voorkeuren &#8226; taalhulpmiddelen het web doorzoeken zoeken in <br><br>5 records -><br>the code
</html>

As you can see, &#8226; occurs in the code.
I wrote a script that searches the stored page for a user-defined character or word using the grep command.
It works fine, but it gives me trouble when the user wants to search for the digits 8, 2 or 6.
When I view the fetched page in a browser, the &#8226; code is rendered as a dot, so the digits 8, 2 and 6
should not be found by my script. They are found, though, because my code looks at the stored HTML source.
(I only want to match the user-defined characters/words that are visible when the page is rendered as HTML.)

Is there a way to let the script read the fetched page as rendered HTML, or do I need to store the fetched page in another way?

I've tried to encode the page, but that didn't do the job, since it escapes special characters with a backslash. In the case of
&#8226; only the & would be escaped.


How can I work around this?
 
Greetings to our (Dutch?) neighbour

What is this diamond character, and how is it stored in the actual web page?

--Paul
 
Hi there,
(you're right, I'm Dutch ;) )

Well, this is the actual HTML page:
<html>
&#8226; geavanceerd zoeken &#8226; voorkeuren &#8226; taalhulpmiddelen het web doorzoeken zoeken in <br><br>5 records -><br>the code
</html>

I hope you can help me.
 
Is the page UTF-8 encoded? I would imagine so.

If so, you can put 'use utf8;' (lowercase) at the top of your code, so that Perl treats UTF-8 characters written in your script as characters instead of turning them into gibberish.

That will allow you to search for the character by its code. You just need to find out what the code is for that character; &#8226; is the decimal reference for U+2022 (BULLET). We do this for our French and Japanese sites all the time.
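
As a rough illustration of finding the code for a character, the sketch below takes the numeric reference from the page, works out its code point, and searches for that character in a string. The variable names ($ref, $label, $char, $text) are made up for this example, and it only handles a decimal reference of the form &#NNNN;.

```perl
use strict;
use warnings;

# The numeric character reference exactly as it appears in the fetched page.
my $ref = '&#8226;';

# Pull the decimal number out of the reference.
my ($decimal) = $ref =~ /^&#(\d+);$/
    or die "not a decimal character reference: $ref\n";

# The same number in the usual U+XXXX notation.
my $label = sprintf 'U+%04X', $decimal;

# The actual character that the reference stands for.
my $char = chr $decimal;

# Sample text that already contains the real character (written as an escape).
my $text = "\x{2022} voorkeuren";

print "$ref is $label\n";
print "character found in text\n" if index($text, $char) >= 0;
```

Once the code point is known, the character can be written in a pattern as \x{2022} without pasting the literal character into the script.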

Check out the UTF-8 modules on CPAN.
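
For the original question (matching only what is visible in the rendered page), one possible approach is to turn the stored source into plain text before searching it. The following is only a minimal sketch: visible_text and term_is_visible are hypothetical helper names, the tag stripping is naive, and only numeric references such as &#8226; are decoded. For named entities like &amp;, the decode_entities function from the HTML::Entities module on CPAN is the more complete option.

```perl
use strict;
use warnings;

# Reduce raw HTML to roughly the text a browser would display.
sub visible_text {
    my ($html) = @_;
    $html =~ s/<[^>]*>/ /g;                            # drop tags (naive)
    $html =~ s/&#[xX]([0-9a-fA-F]+);/chr(hex($1))/ge;  # hexadecimal references
    $html =~ s/&#(\d+);/chr($1)/ge;                    # decimal references
    return $html;
}

# Return 1 if the literal search term occurs in the visible text, else 0.
sub term_is_visible {
    my ($html, $term) = @_;
    return index(visible_text($html), $term) >= 0 ? 1 : 0;
}

# Shortened version of the page from the question.
my $page = '<html>&#8226; voorkeuren <br><br>5 records -><br>the code</html>';

for my $term ('8', '5', 'voorkeuren') {
    printf "%-10s %s\n", $term,
        term_is_visible($page, $term) ? 'found' : 'not found';
}
```

With this page, the digit 8 is no longer matched because &#8226; has become a single bullet character, while the 5 in "5 records" is still found.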

 