modified on 6 November 2007 at 01:04

ASCII 85 Filter

From GNUpdf

PDF Reference 3.3.2 ASCII85Decode Filter:

The ASCII85Decode filter decodes data that has been encoded in ASCII base-85 encoding and produces binary data. The following 
paragraphs describe the process for encoding binary data in ASCII base-85; the ASCII85Decode filter reverses this process.

The ASCII base-85 encoding uses the characters ! through u and the character z, with the 2-character sequence ~> as its EOD marker. The 
ASCII85Decode filter ignores all white-space characters. Any other characters, and any character sequences that represent impossible 
combinations in the ASCII base-85 encoding, cause an error.

Specifically, ASCII base-85 encoding produces 5 ASCII characters for every 4 bytes of binary data. Each group of 4 binary input bytes, 
`(b1 b2 b3 b4)`, is converted to a group of 5 output bytes, `(c1 c2 c3 c4 c5)`, using the relation:

(b1 * 256^3) + (b2 * 256^2) + (b3 * 256^1) + b4 =

(c1 * 85^4) + (c2 * 85^3) + (c3 * 85^2) + (c4 * 85^1) + c5

In other words, 4 bytes of binary data are interpreted as a base-256 number and then converted to a base-85 number. The five bytes of the 
base-85 number are then converted to ASCII characters by adding 33 (the ASCII code for the character !) to each. The resulting encoded data 
contains only printable ASCII characters with codes in the range 33 (!) to 117 (u). As a special case, if all five bytes are 0, they are 
represented by the character with code 122 (z) instead of by five exclamation points (!!!!!).
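
As an illustration of the conversion just described, here is a minimal Python sketch that encodes one complete group of 4 bytes. The function name `encode_group` is hypothetical and is not taken from the PDF Reference or from GNUpdf; the logic follows the relation above.

```python
def encode_group(group: bytes) -> str:
    """Encode exactly 4 bytes as 5 ASCII base-85 characters, or as 'z'.

    Hypothetical helper written for illustration only.
    """
    if len(group) != 4:
        raise ValueError("expected exactly 4 bytes")

    # Interpret the 4 bytes as one base-256 number; b1 is most significant.
    value = int.from_bytes(group, "big")

    # Special case: all five base-85 digits are 0.
    if value == 0:
        return "z"

    # Extract the base-85 digits. divmod yields c5 first, then c4, ..., c1.
    digits = []
    for _ in range(5):
        value, digit = divmod(value, 85)
        digits.append(digit)

    # Reverse to get c1..c5 and add 33 (the code for '!') to each digit.
    return "".join(chr(d + 33) for d in reversed(digits))


print(encode_group(b"Man "))              # 9jqo^
print(encode_group(b"\xff\xff\xff\xff"))  # s8W-!
print(encode_group(b"\x00\x00\x00\x00"))  # z
```

The largest group value, 2^32 - 1, encodes to `s8W-!`, so no valid group starts with a character above `s`.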

If the length of the binary data to be encoded is not a multiple of 4 bytes, the last, partial group of 4 is used to produce a last, partial 
group of 5 output characters. Given n (1, 2 or 3) bytes of binary data, the encoder first appends 4 - n zero bytes to make a complete group 
of 4. It then encodes this group in the usual way, but without applying the special z case. Finally, it writes only the first n + 1 characters 
of the resulting group of 5. These characters are immediately followed by the ~> EOD marker.
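
The handling of the final partial group can be sketched as a complete encoder. This is a minimal Python sketch, not the GNUpdf implementation; the name `ascii85_encode` is hypothetical, and the output is returned as a single string without line breaks, which is a simplification.

```python
def ascii85_encode(data: bytes) -> str:
    """Encode bytes in ASCII base-85 and append the ~> EOD marker.

    Hypothetical function written for illustration only.
    """
    out = []
    for i in range(0, len(data), 4):
        chunk = data[i:i + 4]
        n = len(chunk)  # 4 for a full group, 1 to 3 for the last partial group

        # Pad a partial group with 4 - n zero bytes to make a complete group.
        value = int.from_bytes(chunk + b"\x00" * (4 - n), "big")

        # The z shortcut applies only to complete groups of 4 zero bytes.
        if n == 4 and value == 0:
            out.append("z")
            continue

        # Convert to five base-85 digits (least significant first).
        digits = []
        for _ in range(5):
            value, digit = divmod(value, 85)
            digits.append(digit)
        chars = "".join(chr(d + 33) for d in reversed(digits))

        # A partial group of n bytes contributes only its first n + 1 characters.
        out.append(chars if n == 4 else chars[:n + 1])

    return "".join(out) + "~>"


print(ascii85_encode(b"Man "))   # 9jqo^~>
print(ascii85_encode(b"."))      # /c~>
print(ascii85_encode(b"\x00"))   # !!~>
```

Note that a single zero byte encodes to `!!` rather than `z`, because the special case is not applied to partial groups.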

The following conditions (which never occur in a correctly encoded byte sequence) cause errors during decoding:

   * The value represented by a group of 5 characters is greater than 2^32 - 1.
   * A z character occurs in the middle of a group.
   * A final partial group contains only one character.
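
A decoder that checks these conditions might look like the following minimal Python sketch. Several details are assumptions rather than statements from the text above: the name `ascii85_decode` is hypothetical, the white-space set is the one the PDF Reference defines for PDF syntax (NUL, tab, line feed, form feed, carriage return, space), input that ends without `~>` is rejected, and a final partial group is padded with `u` characters before conversion, which is a common way to undo the encoder's truncation.

```python
PDF_WHITESPACE = "\x00\t\n\x0c\r "  # assumed set: NUL, HT, LF, FF, CR, SP


def ascii85_decode(text: str) -> bytes:
    """Decode ASCII base-85 text terminated by ~> (illustrative sketch)."""

    def group_value(digits):
        # Evaluate c1*85^4 + c2*85^3 + c3*85^2 + c4*85 + c5.
        value = 0
        for d in digits:
            value = value * 85 + d
        if value > 0xFFFFFFFF:
            raise ValueError("group value exceeds 2^32 - 1")
        return value

    out = bytearray()
    group = []  # base-85 digits collected for the current group
    i = 0
    while not text.startswith("~>", i):
        if i >= len(text):
            # Assumption: a missing EOD marker is treated as an error.
            raise ValueError("missing ~> EOD marker")
        ch = text[i]
        i += 1
        if ch in PDF_WHITESPACE:
            continue  # white space is ignored
        if ch == "z":
            if group:
                raise ValueError("z in the middle of a group")
            out += b"\x00\x00\x00\x00"
        elif "!" <= ch <= "u":
            group.append(ord(ch) - 33)
            if len(group) == 5:
                out += group_value(group).to_bytes(4, "big")
                group = []
        else:
            raise ValueError(f"invalid character {ch!r}")

    if group:  # final partial group of n + 1 characters yields n bytes
        if len(group) == 1:
            raise ValueError("final partial group has only one character")
        n = len(group) - 1
        # Pad with the highest digit (84, i.e. 'u') and keep the first n bytes.
        padded = group + [84] * (5 - len(group))
        out += group_value(padded).to_bytes(4, "big")[:n]
    return bytes(out)


print(ascii85_decode("9jqo^~>"))   # b'Man '
print(ascii85_decode("/c~>"))      # b'.'
```

Padding with `u` rather than `!` matters: the encoder discarded the low-order digits, so padding with the smallest digit could round the recovered bytes down by one.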