The Base45 Data Encoding

When using QR or Aztec codes, a different encoding scheme is needed than the already established base 64, base 32, and base 16 encoding schemes described in RFC 4648. Base 45 differs from those schemes in its key table and in that padding with '=' is not required.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Encoded data is to be interpreted as described in RFC 4648 with the exception that a different alphabet is selected.

A 45-character subset of US-ASCII is used: the 45 characters that can be used in a QR or Aztec code. For comparison, Base 64 encodes 3 bytes in 4 characters; Base 45 encodes 2 bytes in 3 characters. The two bytes [A, B] are turned into [C, D, E] where (A*256) + B = (C*45*45) + (D*45) + E. When encoding, the values C, D, and E are looked up in Table 1 to produce a three-character string; decoding performs the reverse lookup. If the number of octets is not divisible by two, the last remaining byte is represented by two characters.

Table 1: The Base 45 Alphabet

   Value Encoding  Value Encoding  Value Encoding  Value Encoding
      00 0            12 C            24 O            36 Space
      01 1            13 D            25 P            37 $
      02 2            14 E            26 Q            38 %
      03 3            15 F            27 R            39 *
      04 4            16 G            28 S            40 +
      05 5            17 H            29 T            41 -
      06 6            18 I            30 U            42 .
      07 7            19 J            31 V            43 /
      08 8            20 K            32 W            44 :
      09 9            21 L            33 X
      10 A            22 M            34 Y
      11 B            23 N            35 Z
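The encoding rule and Table 1 can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the names BASE45_ALPHABET and base45_encode are hypothetical, and the layout of the trailing single byte (A = (C*45) + D, most significant value first) is an assumption made by analogy with the three-character case, since the text above does not spell it out.

```python
# Table 1 as a string: the index of each character is its value (0..44).
BASE45_ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"


def base45_encode(data: bytes) -> str:
    """Encode bytes with the most-significant-value-first order described above."""
    out = []
    # Each full pair [A, B] becomes three characters [C, D, E].
    for i in range(0, len(data) - 1, 2):
        n = data[i] * 256 + data[i + 1]
        c, rest = divmod(n, 45 * 45)
        d, e = divmod(rest, 45)
        out.append(BASE45_ALPHABET[c] + BASE45_ALPHABET[d] + BASE45_ALPHABET[e])
    # A trailing single byte becomes two characters.
    # Assumed layout (not stated in the text): A = (C*45) + D.
    if len(data) % 2 == 1:
        c, d = divmod(data[-1], 45)
        out.append(BASE45_ALPHABET[c] + BASE45_ALPHABET[d])
    return "".join(out)
```

For instance, base45_encode(b"AB") takes the bytes [65, 66], forms 16706 = (8*45*45) + (11*45) + 11, and returns "8BB".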

A series of bytes is split into groups of two. Each such 16-bit value is turned into a series of three values by successive division and reduction modulo 45. The values are in turn looked up in Table 1.

Example: The string "Hello!" is the byte sequence [72, 101, 108, 108, 111, 33]. Taken as 16-bit values, this is [18533, 27756, 28449]. Expressing each value in base 45 gives [[9, 6, 38], [13, 31, 36], [14, 2, 9]]; for instance, 18533 = (9*45*45) + (6*45) + 38. Looking up these values in Table 1 gives the encoded string "96%DV E29".
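To check the arithmetic of such an example by hand, the intermediate values can be traced step by step. The sketch below is illustrative only; ALPHABET and trace_pairs are hypothetical names, and only full byte pairs are handled. It prints the 16-bit value, the three base-45 values [C, D, E], and the resulting characters for each pair.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"  # Table 1


def trace_pairs(data: bytes):
    """Return (16-bit value, [C, D, E], three characters) for each full byte pair."""
    steps = []
    for i in range(0, len(data) - 1, 2):
        n = data[i] * 256 + data[i + 1]
        # E is n modulo 45, D is the next base-45 digit, C the most significant.
        c, rest = divmod(n, 45 * 45)
        d, e = divmod(rest, 45)
        steps.append((n, [c, d, e], ALPHABET[c] + ALPHABET[d] + ALPHABET[e]))
    return steps


for value, digits, chars in trace_pairs(b"Hello!"):
    print(value, digits, repr(chars))
```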

There are no considerations for IANA in this document.

When implementing encoding and decoding, it is important to take great care that buffer overflows, or anything similar, do not take place. This includes, of course, the modulo 45 calculations and the lookups in the table of characters. A decoder must also be robust against arbitrary input, including proper handling of the NUL character (ASCII 0). In particular, Base 64 (for example) pads the string so that the encoding has the correct number of characters. Base 45 does not do this, i.e., Base 45 does not include padding. Because of this, special care must be taken when an odd number of octets is encoded, which results not in N*3 characters but in (N-1)*3+2 characters in the encoded string, and likewise when decoding a string whose number of characters is not divisible by 3.
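A defensive decoder along these lines might look like the following Python sketch. It is not a normative implementation: the names base45_decode, BASE45_ALPHABET, and BASE45_VALUES are hypothetical, the two-character tail is assumed to encode one byte as (C*45) + D, and the rejection of out-of-range groups (above 65535 or 255) is an assumption about what robust handling should mean, since the text does not prescribe specific checks.

```python
BASE45_ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"  # Table 1
BASE45_VALUES = {ch: v for v, ch in enumerate(BASE45_ALPHABET)}


def base45_decode(text: str) -> bytes:
    """Decode a Base 45 string, raising ValueError on malformed input."""
    # Without padding, valid lengths are 3*k (whole pairs) or 3*k + 2
    # (whole pairs plus one trailing byte); 3*k + 1 cannot occur.
    if len(text) % 3 == 1:
        raise ValueError("invalid Base 45 length")
    try:
        values = [BASE45_VALUES[ch] for ch in text]
    except KeyError as exc:
        # Covers NUL, lowercase letters, and anything else outside Table 1.
        raise ValueError("character outside the Base 45 alphabet") from exc

    out = bytearray()
    for i in range(0, len(values), 3):
        chunk = values[i:i + 3]
        if len(chunk) == 3:
            n = chunk[0] * 45 * 45 + chunk[1] * 45 + chunk[2]
            if n > 0xFFFF:  # assumed check: three characters must fit 16 bits
                raise ValueError("three-character group exceeds 16 bits")
            out += n.to_bytes(2, "big")
        else:
            n = chunk[0] * 45 + chunk[1]  # assumed layout of the two-character tail
            if n > 0xFF:  # assumed check: two characters must fit 8 bits
                raise ValueError("two-character group exceeds 8 bits")
            out.append(n)
    return bytes(out)
```

The range checks matter because three characters can represent values up to 91124 and two characters values up to 2024, both larger than the 16-bit and 8-bit quantities they are meant to carry.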

The authors thank everyone who has worked with Base64 over the years, which has proven that the implementations are stable.