Determine encoding of text

(OP)
Is there a way to determine the encoding of text in a PCL file? Let's say I have two PCL files. The first one prints 'Hello World!' and its encoding is ASCII. The second file prints 'I love pie!' and its encoding is EBCDIC.

 (esc) *p99XHello World!
 (esc) *p99XÉ@"—‰...Z

Is there a way to determine which file has the EBCDIC text and which file has the ASCII text?

~ Thanks

RE: Determine encoding of text

Look at it with an ASCII editor, which will display the files much as you have shown above.  If you can read it, it is ASCII.  If not, and you know the only alternative is EBCDIC, then that is what it is.

To verify it, you need to use an editor that can display EBCDIC, or hex.  For hex, you need an EBCDIC character chart to translate the hex values to their EBCDIC characters.  Note that in EBCDIC the alphabet is not contiguous, but has other characters between I & J, as well as between R & S.  The lower-case alphabet has the same gaps.

Note also that (esc)*p99X is a PCL command to set the horizontal cursor position to 99 units, which usually would be about 1/3".

RE: Determine encoding of text

In the PCL file, there is probably a Symbol Set definition that when applied will render the text in its original form.
 

Jim Asman
http://www.spectracolorservices.com

RE: Determine encoding of text

(OP)
The PCL commands above were used as an example. I'm looking for a programmatic solution to determine the encoding of a PCL file. The application I'm working on currently parses and processes ASCII and EBCDIC files. After the file has been manipulated, it's written back to the file system as ASCII if it was originally an EBCDIC file. Currently the application looks at the first page for a certain word/phrase that should appear in all documents. If it can't be read, the file is assumed to be EBCDIC. I was wondering how printers/viewers are able to distinguish between the two encodings. I really don't want the parser portion of the application to be aware of any business logic, as that would limit the use cases of the parser.

RE: Determine encoding of text

That is what the symbol set is for in a PCL file. As I stated previously, you should find a symbol set specification in the file that gives the character mappings.

 

Jim Asman
http://www.spectracolorservices.com
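
As a rough illustration of looking for a symbol set specification, the sketch below scans a PCL byte stream for primary symbol set selections of the form (esc)(#ID, such as (esc)(8U for Roman-8 or (esc)(0N for ISO 8859-1. The function name and the sample stream are made up for this example, and a plain regular-expression scan like this can produce false matches inside binary data such as font downloads, so treat it as a starting point only.

```python
import re

# ESC ( <digits> <uppercase letter> is the form of a primary symbol set
# selection, e.g. ESC(8U (Roman-8) or ESC(0N (ISO 8859-1 Latin 1).
# Terminator X is excluded because ESC(#X selects a font by ID instead.
SYMBOL_SET = re.compile(rb"\x1b\((\d+)([A-WYZ])")


def find_symbol_sets(pcl: bytes) -> list[str]:
    """Return the symbol set IDs (such as '8U') in the order they appear."""
    return [(m.group(1) + m.group(2)).decode("ascii")
            for m in SYMBOL_SET.finditer(pcl)]


# Hypothetical sample stream: reset, Roman-8, some text, font ID select, Latin 1.
sample = b"\x1bE\x1b(8U\x1b*p99XHello World!\x1b(3X\x1b(0N"
print(find_symbol_sets(sample))
```

If no selection is found, or the ID is not one of the standard sets, the text is probably relying on a downloaded font or a user-defined symbol set.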

RE: Determine encoding of text

Your sample does not appear to make logical sense:

(a) The string 'I love pie!' is 11 characters long;

(b) The string 'É@"—‰...Z' is only 9 characters long.

(c) Assuming that the quoted string 'É@"—‰...Z' is what is displayed  on a device which assumes an extended-ASCII-based encoding (such as 'Windows ANSI', or 'ISO 8859-1 Latin-1'), this would be the (hexadecimal) character codes:
C9 40 22 97 89 2E 2E 2E 5A

which is unlikely to be plain text (upper-case and lower-case alphabetic characters, and simple punctuation) encoded in any of the various 'standard' EBCDIC encodings; closest would be 'I .pi...!' where the '.' characters represent non-graphic characters (0x40 being the EBCDIC space).

(d) Perhaps much more likely is that the text is encoded using the 'obfuscation' techniques associated with downloaded soft fonts, so  recovering the 'plain text' is impossible (or, at least, very difficult, without a very good knowledge of the downloaded soft font).

(e) ... or it could be, as Jim Asman suggests, using an obscure, or user-defined, symbol set (the HP PCL name for 'coded character set').

Without analysing the whole PCL file, it is impossible to be sure just how the text is encoded.  
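
Point (c) can be checked with a few lines of Python. This is only a sketch: it assumes code page 037 (Python's `cp037` codec) as the 'standard' EBCDIC encoding, and `show` is a helper name made up for this example. It decodes the observed bytes, and also prints the bytes that 'I love pie!' would produce if it really were encoded in that code page.

```python
def show(data: bytes, codec: str) -> str:
    """Decode with the given codec and replace non-printable characters with '.'."""
    text = data.decode(codec, errors="replace")
    return "".join(ch if ch.isprintable() else "." for ch in text)


# The byte values listed in point (c).
observed = bytes.fromhex("C9 40 22 97 89 2E 2E 2E 5A")
print(show(observed, "cp037"))

# What the claimed text would look like as cp037 bytes, for comparison.
expected = "I love pie!".encode("cp037")
print(expected.hex(" ").upper())
```

Comparing the two printed lines shows how many of the expected byte values are missing from the quoted sample.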

RE: Determine encoding of text

... and if (because of private data) you don't want to post samples of your PCL files here for analysis, you can analyse them yourself using the 'PRN File Analyse' tool in the PCL Paraphernalia application (which you can obtain via  http://www.pclparaphernalia.eu ).

RE: Determine encoding of text

(OP)
I've attached two small files. I replaced all of the original text with asterisks for security reasons. I used asterisk-ebcdic.pcl as the input file. I'm wondering if there is a way to scan the input file programmatically and determine its encoding. Currently the parser has to be told the encoding of the file. Here is pseudocode of what I'm doing.


CODE

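  // The caller currently has to tell the parser which encoding the file uses
  def isEbcdic = true
  new PCLParser(file, isEbcdic).eachCommand { cmd ->
     if (cmd.isText()) {
         // Decode the EBCDIC bytes (cp037 is the charset name) and
         // re-encode them in the platform default charset
         cmd.setData(new String(cmd.data, "cp037").bytes)
     }
  }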

RE: Determine encoding of text

HP does not have an EBCDIC symbol set. So, most files that start off as EBCDIC are run through some type of protocol conversion, and the EBCDIC is mapped to a download font in a custom symbol set.  This is a file we use in our product to map those characters.

    32     64
    à     68
    ¢     74
    .     75
    <     76
    (     77
    +     78
    |     79
    &     80
    é     81
    è     84
    !     90
    $     91
    *     92
    )     93
    \     94
    ¬     95
    -     96
    /     97
    Ñ    105
    ,    107
    %    108
    _    109
    >    110
    ?    111
    :    122
    #    123
    @    124
    '    125
    =    126
    "    127
;
; US ASCII
;
    a    129
    b    130
    c    131
    d    132
    e    133
    f    134
    g    135
    h    136
    i    137
    °    144
    j    145
    k    146
    l    147
    m    148
    n    149
    o    150
    p    151
    q    152
    r    153
    s    162
    t    163
    u    164
    v    165
    w    166
    x    167
    y    168
    z    169
    ]    181
    `    185
    A    193
    B    194
    C    195
    D    196
    E    197
    F    198
    G    199
    H    200
    I    201
    ô    203
    J    209
    K    210
    L    211
    M    212
    N    213
    O    214
    P    215
    Q    216
    R    217
    S    226
    T    227
    U    228
    V    229
    W    230
    X    231
    Y    232
    Z    233
    0    240
    1    241
    2    242
    3    243
    4    244
    5    245
    6    246
    7    247
    8    248
    9    249

However, there are many other ways to get from EBCDIC to ASCII PCL.  So, trying to solve these types of problems without a sample file is painful.  You should generate a mock-up file for analysis.
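
To show how a mapping file like the one listed above could be applied, here is a minimal sketch. The file layout is inferred from the listing (a character, then its decimal code, with lines starting with ';' treated as comments, and a multi-digit left column such as 32 read as a character code), which is an assumption about the product's format. The names `load_mapping` and `translate` are invented for this example, and only a few rows are included.

```python
# A few rows in the same layout as the mapping file listed above.
SAMPLE_MAP = """\
    32     64
    !     90
;
; US ASCII
;
    a    129
    e    133
    i    137
    l    147
    o    150
    p    151
    v    165
    I    201
"""


def load_mapping(text: str) -> dict[int, str]:
    """Return {code: character} from 'character  code' lines."""
    table = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(";"):
            continue  # blank line or comment
        glyph, code = stripped.rsplit(None, 1)
        # Assumption: a multi-digit left column is a decimal character code.
        char = chr(int(glyph)) if len(glyph) > 1 and glyph.isdigit() else glyph
        table[int(code)] = char
    return table


def translate(data: bytes, table: dict[int, str], unknown: str = "?") -> str:
    """Map each byte through the table; unmapped bytes become `unknown`."""
    return "".join(table.get(b, unknown) for b in data)


mapping = load_mapping(SAMPLE_MAP)
data = bytes([0xC9, 0x40, 0x93, 0x96, 0xA5, 0x85, 0x40, 0x97, 0x89, 0x85, 0x5A])
print(translate(data, mapping))
```

Keeping the table in a data file rather than in code means a different custom symbol set only needs a different mapping file.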

RE: Determine encoding of text

It appears as though you replaced all the printable text in the files with asterisks.  Are you just trying to make it as difficult as possible for someone to help you?

Aside from that, your ASCII file has a partial, temporary download font bound to the undefined default symbol set.

The EBCDIC file has a partial, temporary download font bound to a custom symbol set.  The mapping file that I provided shows you that a "B" is remapped to cell 194 in the ISO 8859/1 Latin I (E1) character set, and so on.

So, you're in luck: the characters are not "scrambled".  The EBCDIC characters are just using a custom symbol set because HP does not have one for EBCDIC.

If you change printer drivers, fonts, point sizes ... you could be back in the soup.

RE: Determine encoding of text

As pcltools has already advised:

(a) The characters to be printed (in both the 'ASCII' and 'EBCDIC' samples) are using a custom 'symbol set' which is effectively defined by the characters downloaded in the custom (bitmap) soft font download.

(b) So the character mapping from the original source documents to the values used in the PCL files is effectively defined by the process that generates the downloaded soft font files.

(c) With the data characters in your (doctored) samples replaced by asterisks (ASCII sample) or backslash (EBCDIC sample), it is difficult to see what characters are used - and you'd need the original documents to work out the mapping.

Attached are analyses of your two .pcl files  
 

RE: Determine encoding of text

... and using the two soft fonts to (attempt to) print all characters (range 0x20 - 0xff) appears to show the following mappings between code-point (given as a hexadecimal value) and the ASCII character:

ASCII font:

CODE

0x2a *
0x2f /
0x30 0
0x31 1
0x32 2
0x34 4
0x37 7
0x38 8
0x39 9
0x3a :
0x41 A
0x42 B
0x43 C
0x44 D
0x45 E
0x46 F
0x47 G
0x49 I
0x4a J
0x4c L
0x4d M
0x4e N
0x4f O
0x50 P
0x51 Q
0x52 R
0x53 S
0x54 T
0x55 U
0x56 V
0x57 W
0x58 X
0x59 Y

EBCDIC font:

CODE

0x5c *
0x61 /
0x7a :
0xc1 A
0xc2 B
0xc3 C
0xc4 D
0xc5 E
0xc6 F
0xc7 G
0xc9 I
0xd1 J
0xd3 L
0xd4 M
0xd5 N
0xd6 O
0xd7 P
0xd8 Q
0xd9 R
0xe2 S
0xe3 T
0xe4 U
0xe5 V
0xe6 W
0xe7 X
0xe8 Y
0xf0 0
0xf1 1
0xf2 2
0xf4 4
0xf7 7
0xf8 8
0xf9 9

Note that (in both cases) some of the alphabetic characters and digits do not appear to be defined.
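
One way to sanity-check listings like these is to test whether each code point is what a known code page would assign to the printed character. The sketch below does that for a subset of the EBCDIC font listing, assuming code page 037 (Python's `cp037` codec) as the reference. The helper name `mismatches` is made up for this example.

```python
# A subset of the EBCDIC soft font listing above: code point -> printed character.
EBCDIC_FONT = {
    0x5C: "*", 0x61: "/", 0x7A: ":",
    0xC1: "A", 0xC9: "I", 0xD1: "J", 0xD9: "R", 0xE2: "S", 0xE8: "Y",
    0xF0: "0", 0xF9: "9",
}


def mismatches(font: dict[int, str], codec: str) -> list[int]:
    """Return the code points whose character does not encode to that code point."""
    return [code for code, ch in font.items() if ch.encode(codec) != bytes([code])]


print(mismatches(EBCDIC_FONT, "cp037"))  # code points that disagree with cp037
print(mismatches(EBCDIC_FONT, "ascii"))  # code points that disagree with ASCII
```

An empty list for one codec and a full list for the other suggests which encoding the font's code points follow, at least for the characters that are defined.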

RE: Determine encoding of text

So to return to your original question:

>> Is there a way to determine which file has the EBCDIC text and which file has the ASCII text?

The answer is 'not very easily', since you'd have to be able to interpret the downloaded soft fonts (and with a custom symbol set you're in the realm of working with 'shapes', rather than defined mappings).

Of course, if the two soft fonts (one for ASCII, the other for EBCDIC) were always the same, you could perhaps recognise which one was in use by the 'signature' of its header.
... but it seems unlikely that they WILL always be the same for each file, since the sample ones you've provided don't include all the alphabetic characters or digits - although perhaps the header may always be the same?

Note that the fonts are the old format-0 bitmap fonts, which may, or may not, be supported on modern LaserJet devices.
... and use of a 'unit of measure' of 300 PCL units-per-inch perhaps indicates the age of the generated PCL.
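
The header 'signature' idea could be sketched as follows, assuming the fonts are downloaded with the font header command (esc))s#W, where # is the number of header bytes that follow. The function name is invented for this example, and whether the headers really stay constant between files is exactly the open question raised above.

```python
import hashlib
import re

# ESC ) s <length> W introduces a downloaded font header of <length> bytes.
FONT_HEADER = re.compile(rb"\x1b\)s(\d+)W")


def font_header_signatures(pcl: bytes) -> list[str]:
    """Return a SHA-256 digest for each downloaded font header in the stream."""
    digests = []
    pos = 0
    while (m := FONT_HEADER.search(pcl, pos)) is not None:
        length = int(m.group(1))
        header = pcl[m.end():m.end() + length]
        digests.append(hashlib.sha256(header).hexdigest())
        # Skip the binary header so it is not scanned for escape sequences.
        pos = m.end() + length
    return digests


# Two tiny synthetic streams whose (made-up) headers differ in one byte.
stream_a = b"\x1bE\x1b)s4W\x00\x04\x00\x01"
stream_b = b"\x1bE\x1b)s4W\x00\x04\x00\x02"
print(font_header_signatures(stream_a) == font_header_signatures(stream_b))
```

The digests from known ASCII and EBCDIC samples could then be kept in a lookup table, with any unrecognised digest reported rather than guessed at.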

RE: Determine encoding of text

... and attached are more verbose analyses of your .pcl files, showing the character shapes associated with each downloaded soft font (bitmap) character.

RE: Determine encoding of text

Any feed-back?

RE: Determine encoding of text

(OP)
I wasn't aware that HP didn't have a concept of EBCDIC. The information provided above was useful.

You mentioned previously that the fonts used in the file are format-0 bitmap. Do you mind elaborating a bit on that? Is it possible the PCL in this file is PCL4?

RE: Determine encoding of text

>>  fonts used in the file are format-0 bitmap. Do you mind elaborating a bit on that?

There are a number of PCL soft font formats:

0 - original bitmap format; now deprecated; "not recommended for LaserJet 4 and later printers".

10 - Intellifont Bound scalable
11 - Intellifont Unbound scalable

Intellifont format has fallen out of favour, and may not be supported on modern devices.

15 - TrueType scalable (bound and unbound)
16 - Universal: as TrueType scalable (but capable of 'large font' support).

20 - Resolution-specified bitmap; replaced format 0 fonts.


>> Is it possible the PCL in this file is PCL4?

Possibly, although, as PCL5 is backwards compatible, it is difficult to say from your small sample.
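
For anyone wanting to check the format of a downloaded font themselves, the sketch below reads the format number from the start of a font header (the bytes that follow the (esc))s#W command). It assumes the header begins with a 2-byte big-endian descriptor size followed by a 1-byte header format. The function name and the sample headers are made up for this example.

```python
import struct

# Format numbers and descriptions as listed above.
FORMATS = {
    0: "original bitmap",
    10: "Intellifont bound scalable",
    11: "Intellifont unbound scalable",
    15: "TrueType scalable",
    16: "Universal",
    20: "resolution-specified bitmap",
}


def header_format(header: bytes) -> str:
    """Name the format of a font header."""
    # Assumed layout: descriptor size (UI16, big-endian), then header format (UB).
    _size, fmt = struct.unpack_from(">HB", header)
    return FORMATS.get(fmt, f"unknown format {fmt}")


# Synthetic 4-byte header prefixes: descriptor size 64, formats 0 and 20.
print(header_format(bytes([0, 64, 0, 0])))
print(header_format(bytes([0, 64, 20, 0])))
```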

RE: Determine encoding of text

(OP)
Thanks, I'm interested in knowing more about some of this. Do you know where I could learn more about the changes from PCL4 to PCL5? I'd also be interested in knowing what other PCL is now deprecated. Is this documented somewhere?

RE: Determine encoding of text

... I forgot to mention that format 16 fonts can be used to define bitmap fonts, as well as TrueType scalable.
