Unicode filenames and Perl on Windows

shadow67 (Technical User, US), Mar 6, 2006
Hi,

I am writing a Perl script that loops over my directories on a Windows XP system, and some of the directories have non-ASCII names (accented letters and some Japanese text).

This seems to break many things in Perl, and I was wondering whether there are any workarounds.

One problem I am having is just traversing directories. When I do an opendir / readdir, I cannot "chdir" to the value that is returned. For example, readdir will return a string of non-ASCII characters for a Japanese directory, but chdir does not accept that string.

I have done a lot of research on this on the web, and from what I can tell, the system calls that Perl uses are doing some sort of translation on the characters.
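To pin down which entries are affected, a small diagnostic along these lines might help. It is only a sketch using core modules: the helper name `unreachable_entries` is made up for illustration, and it assumes that an affected name is one that readdir hands back but that can no longer be found when the same string is passed back to the filesystem.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the entries of $dir that readdir reports but
# that cannot be found again when the returned name is handed back to the
# filesystem (a sign that the name was altered on the way out).
sub unreachable_entries {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @entries = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;

    # lstat fails when the name does not resolve to a real directory entry.
    return grep {
        my $path = File::Spec->catfile($dir, $_);
        !lstat($path);
    } @entries;
}

# Directory to scan; defaults to the current directory.
my $root = @ARGV ? $ARGV[0] : '.';
print "Not reachable: $_\n" for unreachable_entries($root);
```

Run against the top-level directory, it should print only the names that chdir, open and rename would also reject, which gives a concrete list of problem names to test with.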

Is there any way to make the system not do any translation? I have tried changing my default "language" for non-Unicode programs, and it does change the results, but I just end up with a different set of "broken directories".

Anything that would allow the value returned by "readdir" to be passed into "chdir" for all types of filenames would make me very happy.

Some of what I read on the internet talks about older versions of Perl supporting a "-c" option that more or less did what I am looking for. Would tracking down and downloading an old version fix my problem?

Thanks,
Ron Miller

P.S. Fixing readdir / chdir would fix most of my problems, but it would also be nice if I could open files with non-ASCII filenames, and use "rename" as well.
 
Have you looked into globbing? File::Find or File::Find::Rule?

perl -c is for syntax checking only. It doesn't run the script, it just checks the syntax.

Could you give us a few sample directory names to play around with?

Paul
------------------------------------
Spend an hour a week on CPAN, helps cure all known programming ailments ;-)
 
Hello Paul,

Thanks for the quick reply!

You are right about -c ... I had meant -C, which in older versions apparently made Perl use the "wide" functions for its interactions with the filesystem. That behaviour was later removed, apparently because it caused some inconsistent behaviour with other Unicode support.

As far as an example goes, here is one. I am writing something that processes the files that I have in my iTunes directory. One of the subdirectories is called "Café Quijano". When I do a readdir() on the directory handle I opened on the iTunes directory, it returns the name as "Cafe Quijano" (with the é changed into an e).

Note that my default language (from Control Panel --> Regional & Language Options --> Advanced) is "Japanese", so I believe that Windows sees the é character as something that does not exist in the Japanese character set and replaces it with the best approximation (an 'e').

I have tried changing my default language to "English", and if I do that, then Perl does pass back the correct filename for this example, but then all of my Japanese filenames come back with the characters replaced by question marks.

This leads me to believe that the problem is that Perl uses the non-wide system calls, and Windows translates characters for those calls.
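The code-page side of this can be checked without touching the filesystem. The following is a rough sketch with the core Encode module, assuming that "English" corresponds to code page 1252 and "Japanese" to code page 932; the helper name `representable` and the Japanese sample name are made up for illustration. Note that Encode does not imitate the Windows "best fit" substitution (é becoming e), it only reports whether a name can be converted without loss.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: true if every character of $name can be converted
# to $codepage without loss.
sub representable {
    my ($name, $codepage) = @_;
    my $copy = $name;    # encode() with a CHECK argument may modify its input
    return eval { encode($codepage, $copy, Encode::FB_CROAK); 1 } ? 1 : 0;
}

my @samples = (
    [ 'accented' => "Caf\x{e9} Quijano" ],    # the directory name from this post
    [ 'japanese' => "\x{97F3}\x{697D}"  ],    # made-up sample (the word for "music")
);

for my $sample (@samples) {
    my ($label, $name) = @$sample;
    for my $codepage (qw(cp1252 cp932)) {
        printf "%-9s %-7s %s\n", $label, $codepage,
            representable($name, $codepage) ? 'ok' : 'lossy';
    }
}
```

If neither code page can hold both names, then no single "language for non-Unicode programs" setting will make every directory come back intact, which would match what I am seeing.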

As I write this, I am thinking more and more that there is nothing I can do about this unless I get one of the older versions of Perl with the -C flag.

However, it would be nice if someone out there had a simpler way.

Thanks,
Ron Miller
 
One more thing ... thanks for the suggestion, but unfortunately File::Find also fails to process the directories that have non-ASCII characters in their names. I believe it walks the directory structure using opendir/readdir, so it has the same limitations.

Ron Miller
 
Multiple passes, perhaps? Just clutching at straws ;-)

Paul
 
What about running two passes under different users? You could use SRVANY to set each one up as a service, so you wouldn't have to log on interactively, and then merge the two sets of results.

Where they match, all is good. Where they don't, there would have to be some test to see which pass is right for each file.

The straws have had it ...

Paul
 
Hi Paul,

Thanks for the tips.

I actually ended up reverting to ActiveState Perl 5.6, and with the -C switch, it does exactly what I needed.

Installing new CPAN modules for ActiveState is a real pain, but other than that minor gripe I am now happy.

Thanks for your time!

Ron
 