How to tell a file type in Perl?

Status
Not open for further replies.

lcs01

I know that on Unix/Linux we can use the 'file' command to tell file types, e.g.

Code:
% file azip.zip b.txt.gz a.txt
application/x-zip
application/octet-stream
text/plain; charset=us-ascii

I'd think there must be a Perl library that can do the same. However, looking at the File::Basename module, I did not find what I want.

Any advice? Thanks.
 
You could always use a shell command in your Perl program:
Code:
$type = `file azip.zip`;
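
If you go the shell route, one possible refinement is to bypass the shell and pass the filename as a separate argument, so that spaces or odd characters in the name are not interpreted. This is only a sketch: mime_type_via_file is a made-up helper name, and it assumes a 'file' binary that understands --brief and --mime-type (recent Linux versions do; older or other Unix versions may not).

```perl
use strict;
use warnings;

# Hypothetical helper: ask the system 'file' command for a MIME type.
# Returns undef if the command cannot be run or prints nothing.
sub mime_type_via_file {
    my ($path) = @_;

    # List-form pipe open: no shell is involved, so $path is passed verbatim.
    open(my $fh, '-|', 'file', '--brief', '--mime-type', '--', $path)
        or return;
    my $out = <$fh>;
    close $fh;

    return unless defined $out;
    chomp $out;
    return $out;
}

my $name = 'azip.zip';    # filename taken from the question above
my $type = mime_type_via_file($name);
print defined $type ? "$name: $type\n" : "$name: could not run 'file'\n";
```

The result still depends on whatever 'file' is installed, so this is not portable to platforms without it.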
 
The file test operators -T and -B are meant to test whether a file is text or binary e.g.

if (-T $SOMEFILE) {
    print "$SOMEFILE is text\n";
}

but I have found the results to be quite unreliable and prefer to use the unix file command if on that platform.
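
For what it's worth, -T and -B are heuristics: Perl looks at the first block of the file, treats any zero byte as a sign of binary data, and otherwise guesses from the proportion of odd characters. An empty file passes both tests. A small sketch to see this on throwaway files (write_temp and classify are names I made up for the example):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write some bytes to a temp file and return its name.
sub write_temp {
    my ($bytes) = @_;
    my ($fh, $name) = tempfile(UNLINK => 1);
    binmode $fh;
    print {$fh} $bytes;
    close $fh or die "close failed: $!";
    return $name;
}

# Hypothetical helper: label a file using only the built-in file tests.
sub classify {
    my ($path) = @_;
    return 'missing' unless -e $path;
    return 'empty'   if -z $path;    # empty files pass both -T and -B
    return 'text'    if -T $path;
    return 'binary'  if -B $path;
    return 'unknown';
}

my $text_file   = write_temp("just some plain ASCII text\n" x 10);
my $binary_file = write_temp("\x1f\x8b\x08\x00" . ("\x00" x 64));

print "text sample:   ", classify($text_file),   "\n";
print "binary sample: ", classify($binary_file), "\n";
```

Note that this only separates "looks like text" from "looks like binary"; it says nothing about whether the binary is a zip, a gzip, or anything else.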
 
Thank you all for your help, and special thanks to rharsh!

I installed the File::Type module and wrote a small test script:

Code:
#! /usr/local/bin/perl

use File::Type;

my $ft = File::Type->new();

my $srce1 = "sourcefile.gz";
my $type1 = $ft->checktype_filename($srce1);
print "\$type1 = $type1\n";

my $srce2 = "sourcefile.zip"; # a WinZip file
my $type2 = $ft->checktype_filename($srce2);
print "\$type2 = $type2\n";

And here is the output:

Code:
% ./test.pl
$type1 = application/x-gzip
$type2 = application/zip
 
A follow-up question:

At the site recommended by rharsh, I first downloaded Filesys::Type module (which also requires Module::Pluggable module). However, it seems to me that fstype() defined in Filesys::Type module did not work as expected.

I then downloaded the File::Type module, which works fine as I stated before. Now, my first question is:

Does the File::Type module depend on the Filesys::Type module? I took a quick look at the documentation and did not find any dependency. But I would like confirmation from some experts here. Thanks.

BTW, here is my small test script using the Filesys::Type module:

Code:
#! /usr/local/bin/perl

use Filesys::Type qw(fstype);

my $srce = "sourcefile.gz";
my $type = fstype($srce);
print "\$type = #$type#\n";

And the output:

Code:
% ./tt.pl
$type = ##

Did I do something wrong?

Again, thank you for your help!
 
The script needs the path to the file so the file can be opened and read, not just a file name. That's probably why your example above is not working.
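
One way to rule out path problems is to turn the name into an absolute path and check that the file exists before handing it to any module. A minimal sketch, where resolve_path is a hypothetical helper and the base directory argument is optional (it defaults to the current working directory):

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: make $name absolute (relative to $base_dir if given)
# and return it only if something actually exists at that path.
sub resolve_path {
    my ($name, $base_dir) = @_;
    my $path = File::Spec->rel2abs($name, $base_dir);
    return $path if -e $path;
    return;
}

my $srce = 'sourcefile.gz';    # filename from the post above
my $path = resolve_path($srce);
print defined $path
    ? "Found: $path\n"
    : "File '$srce' not found from the current directory\n";
```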


The only dependency File::Type appears to have is IO::File, which is a core module.

What File::Type does is read the first 16K of a file and look for an identifying pattern within the file data, for example:

Code:
  if ($data =~ m[^\x31\xbe\x00\x00]) {
    return q{application/msword};
  }
  if ($data =~ m[^PO\^Q\`]) {
    return q{application/msword};
  }
  if (length $data > 1) {
    $substr = substr($data, 1, 1024);
    if (defined $substr && $substr =~ m[^WPC]) {
      return q{text/vnd.wordperfect};
    }
  }

It goes through a rather long list like this to "ID" a file's content type. So it may not return something for all files, but it seems to have all the common file types covered and then some.
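
To make the idea concrete, here is a stripped-down sketch of the same technique. It is not File::Type's actual code: sniff_type and the three-entry signature table are made up for illustration, and the fallback to application/octet-stream is my own choice.

```perl
use strict;
use warnings;

# Illustrative signature table: pattern anchored at the start of the data,
# paired with the MIME type to report when it matches.
my @SIGNATURES = (
    [ qr/\A\x1f\x8b/,   'application/x-gzip' ],    # gzip magic bytes
    [ qr/\APK\x03\x04/, 'application/zip'    ],    # zip local file header
    [ qr/\A%PDF-/,      'application/pdf'    ],    # PDF header
);

# Hypothetical helper: read up to 16K and return the first matching type.
# Returns undef if the file cannot be opened.
sub sniff_type {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or return;
    my $data = '';
    read($fh, $data, 16 * 1024);
    close $fh;

    for my $sig (@SIGNATURES) {
        my ($pattern, $mime) = @$sig;
        return $mime if $data =~ $pattern;
    }
    return 'application/octet-stream';    # nothing matched
}

for my $file (@ARGV) {
    my $type = sniff_type($file);
    print "$file: ", (defined $type ? $type : 'unreadable'), "\n";
}
```

A real module needs far more signatures than this, plus checks at offsets other than zero, which is why the list inside File::Type is so long.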



- Kevin, perl coder unexceptional!
 
Thank you, Kevin!

In addition, my previous code using Filesys::Type did not work due to a typo in the source filename. Sorry about that.

However, this time, I did a few more tests with care. It appears to me that Filesys::Type still does not do the job well.

Following is a test I did:

Code:
[% 1069] => ls -l
total 7940
-rw-r--r--  1 nobody nobody 8086302 Dec  1 14:11 f1.zip
-rw-r--r--  1 nobody nobody      41 Dec  1 14:20 f2.txt.gz
-rwxr-xr-x  1 nobody nobody     384 Dec  1 14:29 useFilesysType.pl*
-rwxr-xr-x  1 nobody nobody     433 Dec  1 14:30 useFileType.pl*

[% 1070] => file f1.zip f2.txt.gz
f1.zip:    Zip archive data, at least v2.0 to extract
f2.txt.gz: gzip compressed data, was "f2.txt", from Unix

[% 1071] => cat useFilesysType.pl

#! /usr/local/bin/perl

use Filesys::Type qw(fstype);

print "\n";
my $srce1 = "f1.zip";
if(-e $srce1) {
  my $type1 = fstype($srce1);
  print "\$type1 = #$type1#\n";
}
else {
  print "File '$srce1' not found!!\n";
}

my $srce2 = "f2.txt.gz";
if(-e $srce2) {
  my $type2 = fstype($srce2);
  print "\$type2 = #$type2#\n";
}
else {
  print "File '$srce2' not found!!\n";
}

[% 1072] => cat useFileType.pl

#! /usr/local/bin/perl

use File::Type;

my $ft = File::Type->new();

print "\n";
my $srce1 = "f1.zip";
if(-e $srce1) {
  my $type1 = $ft->checktype_filename($srce1);
  print "\$type1 = #$type1#\n";
}
else {
  print "File '$srce1' not found!!\n";
}

my $srce2 = "f2.txt.gz";
if(-e $srce2) {
  my $type2 = $ft->checktype_filename($srce2);
  print "\$type2 = #$type2#\n";
}
else {
  print "File '$srce2' not found!!\n";
}

[% 1073] => ./useFilesysType.pl

$type1 = #ext3#
$type2 = #ext3#

[% 1074] => ./useFileType.pl

$type1 = #application/zip#
$type2 = #application/x-gzip#

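As you can see here, the Filesys::Type module returns the same value, ext3, for both the WinZip file and the gzip file. Since ext3 is the name of a filesystem, fstype() appears to report the type of the filesystem the files live on rather than the type of the files themselves, so it does not do what I was looking for.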
 
It does seem to make some distinction between the two. But Perl is at a total disadvantage for this type of thing compared to the operating system. I would not expect the module to be as accurate as the operating system Perl is running on.

- Kevin, perl coder unexceptional!
 