Base85 (ASCII85) Encoder/Decoder

Encodes and decodes data using Base85 (ASCII85) encoding.

Base85 (ASCII85) Encoding

Base85, also known as ASCII85, is a binary-to-text encoding scheme that represents binary data using a set of 85 printable ASCII characters. It is used where binary data needs a compact text representation: it expands the data by 25% (5 characters for every 4 bytes), compared with about 33% for Base64 (4 characters for every 3 bytes). Base85 is used in formats such as Adobe PostScript and PDF, where binary data such as images or fonts needs to be embedded in text-based files.

How Base85 (ASCII85) Encoding Works

Base85 encoding works by breaking the input binary data into 4-byte chunks (32 bits), then encoding each chunk as a group of 5 ASCII characters. Five characters are enough because 85^5 = 4,437,053,125 is larger than 2^32 = 4,294,967,296, so every 32-bit value has a five-digit base-85 representation. This is more compact than Base64, which encodes each 3-byte chunk as 4 characters.
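
The size difference can be checked with a short sketch. It uses Python's standard base64 module as a stand-in for any Base85 implementation; the 12-byte sample input is an arbitrary choice for illustration.

```python
import base64

# Five base-85 digits cover every 32-bit value; four digits would not.
assert 85**5 >= 2**32
assert 85**4 < 2**32

# 12 bytes is a multiple of both 3 and 4, so neither encoding needs padding.
# No 4-byte chunk here is all zeros, so the 'z' shortcut is not triggered.
data = bytes(range(12))

a85_len = len(base64.a85encode(data))   # 3 chunks * 5 characters
b64_len = len(base64.b64encode(data))   # 4 groups * 4 characters

print(a85_len, b64_len)  # 15 16
```

The gap grows with input size: every 12 input bytes cost 15 characters in Base85 and 16 in Base64.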

Encoding Process:

  1. Input Breakdown: The input binary data is split into groups of 4 bytes (32 bits). If the final group has fewer than 4 bytes, it is padded with zero bytes to form a complete chunk; after encoding, one output character is dropped for each padding byte, so a final group of n bytes yields n + 1 characters.
  2. Integer Representation: Each 4-byte chunk is treated as a 32-bit big-endian integer.
  3. Base85 Encoding: The 32-bit integer is written as five base-85 digits (each in the range 0–84), most significant digit first, and 33 is added to each digit to obtain an ASCII character. The Base85 character set therefore spans the ASCII range 33–117 ("!" through "u").
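
The three steps above can be sketched as a small function. This is a minimal illustration rather than a reference implementation: the name ascii85_encode is hypothetical, and the sketch leaves out optional features of the Adobe variant such as the "z" shortcut for all-zero chunks and the <~ ~> delimiters.

```python
import base64


def ascii85_encode(data: bytes) -> str:
    """Encode bytes as Base85 text (no 'z' shortcut, no <~ ~> framing)."""
    out = []
    for i in range(0, len(data), 4):
        chunk = data[i:i + 4]
        pad = 4 - len(chunk)                      # 0 except for a short final chunk
        value = int.from_bytes(chunk + b"\x00" * pad, "big")

        # Extract five base-85 digits, least significant first.
        digits = []
        for _ in range(5):
            value, rem = divmod(value, 85)
            digits.append(chr(rem + 33))          # shift into ASCII 33..117

        group = "".join(reversed(digits))         # most significant digit first
        out.append(group[:5 - pad])               # drop one character per padding byte
    return "".join(out)


# Cross-check against the standard library on input without all-zero chunks.
sample = b"Hello world!"
assert ascii85_encode(sample) == base64.a85encode(sample).decode("ascii")
```

Building the digits with divmod and reversing them keeps the loop simple; a production encoder would typically precompute the alphabet and process chunks in bulk.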

Example:

Let’s consider an example where the binary data is a simple string: "Hello".

  1. Convert "Hello" to its ASCII byte representation:

    • H = 72, e = 101, l = 108, l = 108, o = 111
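  2. Group the bytes into 4-byte chunks (32 bits):

    • Chunk 1: [72, 101, 108, 108]
    • Chunk 2: [111], padded with three zero bytes to [111, 0, 0, 0]
  3. Encode each chunk using the Base85 algorithm:

    • Chunk 1 is the integer 1,214,606,444, whose base-85 digits are 23, 22, 66, 52, 49. Adding 33 to each digit gives the characters 87cUR.
    • Chunk 2 encodes to DZBb;, and the last three characters are dropped to account for the three padding bytes, leaving DZ.

The output is the Base85-encoded string 87cURDZ.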
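
The arithmetic for this example can be reproduced step by step. The sketch below works through both chunks by hand and then compares the result with Python's base64.a85encode; the variable names are illustrative only.

```python
import base64

data = b"Hello"                               # bytes 72, 101, 108, 108, 111

# First chunk: "Hell" as a big-endian 32-bit integer (1214606444).
value = int.from_bytes(data[:4], "big")
digits = []
for _ in range(5):
    value, rem = divmod(value, 85)
    digits.insert(0, rem)                     # most significant digit ends up first
first_group = "".join(chr(d + 33) for d in digits)

# Second chunk: "o" padded with three zero bytes, then trimmed to 1 + 1 characters.
value = int.from_bytes(data[4:] + b"\x00" * 3, "big")
tail = []
for _ in range(5):
    value, rem = divmod(value, 85)
    tail.insert(0, chr(rem + 33))
last_group = "".join(tail)[:2]

encoded = first_group + last_group
print(digits, first_group, last_group, encoded)
# [23, 22, 66, 52, 49] 87cUR DZ 87cURDZ

# Cross-check against the standard library.
assert encoded == base64.a85encode(data).decode("ascii")
```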

Applications of Base85 Encoding

Base85 encoding is commonly used in situations where binary data needs to be stored or transmitted in a text-based format, and compactness is important. Some typical use cases include:

  • PDF Files: Base85 encoding is used in PDF files to encode binary data such as images, fonts, or embedded files.
  • PostScript: In PostScript documents, Base85 is used to encode binary data that must be embedded within text-based files.
  • Data Compression: Base85 is sometimes applied to the output of a compression algorithm so that the compressed binary data can be carried as compact, printable text.
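
As an illustration of the compression use case, the sketch below compresses a payload with zlib and wraps the result in Base85 using Python's standard library. The payload is made up for the example, and the adobe=True option adds the <~ and ~> delimiters used by Adobe's variant; a real PDF or PostScript producer may apply its filters differently.

```python
import base64
import zlib

payload = b"example binary payload " * 20     # hypothetical data to embed

compressed = zlib.compress(payload)
encoded = base64.a85encode(compressed, adobe=True)   # framed as <~ ... ~>

# The reverse path: strip the framing, decode, then decompress.
restored = zlib.decompress(base64.a85decode(encoded, adobe=True))
assert restored == payload

print(len(payload), len(compressed), len(encoded))
```

The encoded form contains only printable ASCII characters, so it can sit inside a text-based file without escaping.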

Decoding

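Decoding Base85 is the reverse process. The encoded string is parsed in groups of 5 characters: 33 is subtracted from each character code to recover the base-85 digits, the digits are combined into a 32-bit integer, and the integer is written out as 4 bytes. If the final group has fewer than 5 characters, it is padded with "u" (the highest digit) before decoding, and one decoded byte is dropped for each padding character, which restores the original binary data.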

Example:

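For the Base85-encoded group 9jqo^, the decoding process reverses the encoding steps: the characters map to the base-85 digits 24, 73, 80, 78, 61, which combine into the integer 1,298,230,816, that is, the four bytes 77, 97, 110, 32 ("Man "). A longer string such as 9jqo^Fv> would be rejected by a strict decoder for this alphabet, because "v" (ASCII 118) lies outside the 33–117 range.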
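
The decoding steps can be sketched in the same way as encoding. The function name ascii85_decode is hypothetical, and the sketch assumes input without whitespace, the "z" shortcut, or <~ ~> delimiters. It decodes the leading group 9jqo^ and rejects any character outside ASCII 33–117.

```python
def ascii85_decode(text: str) -> bytes:
    """Decode Base85 text (no 'z' shortcut, whitespace, or <~ ~> framing)."""
    out = bytearray()
    for i in range(0, len(text), 5):
        group = text[i:i + 5]
        pad = 5 - len(group)                  # 0 except for a short final group
        group += "u" * pad                    # pad with the highest digit

        value = 0
        for ch in group:
            code = ord(ch)
            if not 33 <= code <= 117:
                raise ValueError(f"character {ch!r} is outside the Base85 alphabet")
            value = value * 85 + (code - 33)
        if value > 0xFFFFFFFF:
            raise ValueError(f"group {group!r} does not fit in 32 bits")

        out += value.to_bytes(4, "big")[:4 - pad]   # drop one byte per padding character
    return bytes(out)


print(ascii85_decode("9jqo^"))     # b'Man '
print(ascii85_decode("87cURDZ"))   # b'Hello'
```

Padding with "u" rather than "!" matters: it guarantees that the truncated bytes round correctly, so the kept high-order bytes match the original input.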

Key Points

  • Base85 encoding is more compact than Base64 because it encodes 4 bytes of data into 5 ASCII characters, rather than 3 bytes into 4 characters.
  • It is used in specialized formats such as PDF and PostScript to embed binary data efficiently in text-based documents.
  • The Base85 character set includes 85 ASCII characters (from ASCII 33 to 117), which ensures compatibility with systems that may not support extended ASCII characters.
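
The character set described in the last point can be generated directly from the ASCII range. This is a small sketch; the ALPHABET name is chosen for illustration.

```python
# Digit value d (0..84) maps to the character with ASCII code d + 33.
ALPHABET = "".join(chr(code) for code in range(33, 118))

print(len(ALPHABET))               # 85
print(ALPHABET[0], ALPHABET[-1])   # ! u

# Every character is printable 7-bit ASCII.
assert ALPHABET.isascii() and ALPHABET.isprintable()
```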

In summary, Base85 encoding offers a space-efficient way to represent binary data as printable text, making it well suited to scenarios where compactness matters, such as embedding binary data in text-based documents and files.