<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc PUBLIC "-//IETF//DTD RFC 2629//EN" "https://xml2rfc.tools.ietf.org/authoring/rfc2629.dtd" [
  <!ENTITY rfc2119 SYSTEM "https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml">
  <!ENTITY rfc5646 SYSTEM "https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.5646.xml">
]>
<?xml-stylesheet type='text/xsl' href='rfc2629.xsl' ?>

<?rfc strict="yes" ?>
<?rfc toc="yes"?>
<?rfc tocdepth="4"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes" ?>
<?rfc compact="yes" ?>
<?rfc subcompact="no" ?>

<rfc category="info" docName="draft-msporny-base58-00"
     ipr="trust200902" submissionType="independent">
  <front>
    <title abbrev="Base58 Encoding">The Base58 Encoding Scheme</title>

    <author fullname="Manu Sporny" initials="M.S." surname="Sporny">
      <organization>Digital Bazaar</organization>
      <address>
        <email>msporny@digitalbazaar.com</email>
      </address>
    </author>

    <!-- Meta-data Declarations -->
    <date month="November" year="2019" />
    <area>General</area>
    <workgroup>Internet Engineering Task Force</workgroup>
    <keyword>base</keyword>
    <keyword>encoding</keyword>
    <keyword>base58</keyword>
    <keyword>bitcoin</keyword>
    <abstract>
      <t>
This document specifies the base 58 encoding scheme, including an introduction
to the benefits of the approach, the encoding and decoding algorithm,
alternative alphabets, and security considerations.
      </t>
    </abstract>
  </front>

  <middle>
    <section title="Introduction">
      <t>
When trasmitting data, it can be useful to encode the data in a way that
survives lower fidelity transmission mechanisms. For example, encoding data
using a human alphabet, or encoding the data in a way that a human being can
visually confirm the encoded data while reducing the opportunity to make a
mistake. The Base58 encoding scheme is similar to the Base64 encoding scheme in
that it can translate any binary data to a text string. It is different from
Base64 in that the conversion alphabet has been carefully picked to work well
in environments where a human being, such as a developer or other human being,
might need to work with the information.
      </t>
      <t>
Base58 is designed with a number of usability characteristics in mind that
Base64 does not consider. First, similar looking letters are omitted such as 0
(zero), O (capital o), I (capital i) and l (lower case L). Doing so eliminates
the possibility of a human being mistaking similar characters for a different
character. Second, the non-alphanumeric characters + (plus), = (equals), and /
(slash) are omitted to make it possible to use Base58 values in all modern file
systems and URL schemes without the need for translation. Third, by using only
alphanumeric characters, easy double-click or double tap selection is possible
in modern computer interfaces. Fourth, social messaging systems do not line
break on alphanumeric strings making it easier to e-mail or message Base58
values for debugging or sharing purposes. Fifth, unlike Base64, there is no
padding making many Base58 values smaller (on average) or the same size as
Base64 values for values up to 64 bytes, and less than 2% larger for values
larger than 256 bytes. Finally, Base64 has eleven encoding variations that lead
to miscommunication among developers on which variety of Base64 to use. This
specification asserts that there is just one simple encoding mechanism for
Base58, making implementations and developer interactions simpler.
      </t>
      <t>
While Base58 does have a number of beneficial usability features, it is not
always a good choice for an encoding format. For example, when encoding large
amounts of data, it is 2% less efficient than base64. Developers might avoid
Base58 if a 2% increase in efficiency over large data sets is desired.
      </t>

      <section title="Requirements Language">
        <t>
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119.
        </t>
      </section>
    </section>

    <?rfc needLines="8" ?>

    <section anchor="alphabet" title="The Base58 Alphabet">
      <t>
The Base58 alphabet consists of the following characters:
      </t>

      <figure>
        <artwork>123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz</artwork>
      </figure>

      <t>
Each byte value from 0 to 57 maps to the alphabet above in the following way:
      </t>

      <texttable anchor="alphabet-table" title="Base58 Mapping Table">
        <ttcol align="center">Byte</ttcol>
        <ttcol align="center">Character</ttcol>
        <ttcol align="center">Byte</ttcol>
        <ttcol align="center">Character</ttcol>
        <ttcol align="center">Byte</ttcol>
        <ttcol align="center">Character</ttcol>
        <ttcol align="center">Byte</ttcol>
        <ttcol align="center">Character</ttcol>

        <c>0</c><c>1</c> <c>1</c><c>2</c> <c>2</c><c>3</c> <c>3</c><c>4</c>
        <c>4</c><c>5</c> <c>5</c><c>6</c> <c>6</c><c>7</c> <c>7</c><c>8</c>
        <c>8</c><c>9</c> <c>9</c><c>A</c> <c>10</c><c>B</c> <c>11</c><c>C</c>
        <c>12</c><c>D</c> <c>13</c><c>E</c> <c>14</c><c>F</c> <c>15</c><c>G</c>
        <c>16</c><c>H</c> <c>17</c><c>J</c> <c>18</c><c>K</c> <c>19</c><c>L</c>
        <c>20</c><c>M</c> <c>21</c><c>N</c> <c>22</c><c>P</c> <c>23</c><c>Q</c>
        <c>24</c><c>R</c> <c>25</c><c>S</c> <c>26</c><c>T</c> <c>27</c><c>U</c>
        <c>28</c><c>V</c> <c>29</c><c>W</c> <c>30</c><c>X</c> <c>31</c><c>Y</c>
        <c>32</c><c>Z</c> <c>33</c><c>a</c> <c>34</c><c>b</c> <c>35</c><c>c</c>
        <c>36</c><c>d</c> <c>37</c><c>e</c> <c>38</c><c>f</c> <c>39</c><c>g</c>
        <c>40</c><c>h</c> <c>41</c><c>i</c> <c>42</c><c>j</c> <c>43</c><c>k</c>
        <c>44</c><c>m</c> <c>45</c><c>n</c> <c>46</c><c>o</c> <c>47</c><c>p</c>
        <c>48</c><c>q</c> <c>49</c><c>r</c> <c>50</c><c>s</c> <c>51</c><c>t</c>
        <c>52</c><c>u</c> <c>53</c><c>v</c> <c>54</c><c>w</c> <c>55</c><c>x</c>
        <c>56</c><c>y</c> <c>57</c><c>z</c> <c></c><c></c> <c></c><c></c>
      </texttable>

      <t>
Other application-specific alphabets for Base58, such as the Ripple alphabet
and the Flickr alphabet exist. Those alphabets, while valid in their own
application spaces, are not valid encoding formats for this specification and
MUST NOT be used. Supporting more than one Base58 encoding alphabet would
harm interoperability.
      </t>

    </section>

    <section anchor="encode" title="The Base58 Encoding Algorithm">
      <t>
To encode an array of bytes to a Base58 encoded value, run the following
algorithm. All mathematical operations MUST be performed using integer
arithmetic. Start by initializing a 'zero_counter' to zero (0x0), an
'encoding_flag' to zero (0x0), a 'b58_bytes' array, a 'b58_encoding' array,  and
a 'carry' value to zero (0x0). For each byte in the array of bytes and
while 'carry' does not equal zero (0x0) after the first iteration:

      <list style="numbers">
        <t>
If 'encoding_flag' is not set, and if the byte is a zero (0x0), increment
the value of 'zero_counter'. If the value is not zero (0x0),
set 'encoding_flag' to true (0x1).
        </t>
        <t>
If 'encoding_flag' is set, multiply the current byte value by 256 and add it  to
'carry'.
        </t>
        <t>
Set the corresponding byte value in 'b58_bytes' to the value of 'carry' modulus
58.
        </t>
        <t>
Set 'carry' to the value of 'carry' divided by 58.
        </t>
      </list>
    </t>

      <t>
Once the 'b58_bytes' array has been constructed, generate the final
'b58_encoding' using the following algorithm. Set the first 'zero_counter'
bytes in 'b58_encoding' to '1'. Then, for every byte in 'b58_array', map the
byte value using the Base58 alphabet in the previous section to its
corresponding character in 'b58_encoding'. Return 'b58_encoding' as the
Base58 representation of the input array of bytes.
      </t>

    </section>

    <section anchor="decode" title="The Base58 Decoding Algorithm">
      <t>
To decode a Base58 encoded array of bytes to a decoded array of bytes, run
the following algorithm. All mathematical operations MUST be performed using
integer arithmetic. Start by initializing a 'raw_bytes' array, and a
'carry' value to zero (0x0). For each input byte in the array of input bytes:

      <list style="numbers">
        <t>
Set 'carry' to the byte value associated with the input byte character. If
a mapping does not exist, return an error code.
        </t>
        <t>
While 'carry' does not equal zero and there are input bytes remaining:
        <list style="numbers">
          <t>
Multiply the input byte value by 58 and add it to 'carry'.
          </t>
          <t>
Set the output byte value to 'carry' modulus 256.
          </t>
          <t>
Set 'carry' to the value of 'carry' divided by 256.
          </t>
        </list>
        </t>
        <t>
Set the corresponding byte value in 'raw_bytes' to the value of 'carry' modulus
58.
        </t>
        <t>
Set 'carry' to the value of 'carry' divided by 58.
        </t>
      </list>
      </t>

    </section>

    <section anchor="examples" title="Test Vectors">
      <t>
The following examples can be used as test vectors for the algorithms in
this specification:
      </t>

      <t>
The Base58 encoded value for "Hello World!" is:
        <figure>
          <artwork>2NEpo7TZRRrLZSi2U</artwork>
        </figure>
      </t>

      <t>
The Base58 encoded value for "The quick brown fox jumps over the lazy dog." is:
        <figure>
          <artwork>USm3fpXnKG5EUBx2ndxBDMPVciP5hGey2Jh4NDv6gmeo1LkMeiKrLJUUBk6Z</artwork>
        </figure>
      </t>

      <t>
The Base58 encoded value for 0x0000287fb4cd is:
        <figure>
          <artwork>111233QC4</artwork>
        </figure>
      </t>

    </section>


    <section anchor="Acknowledgements" title="Acknowledgements">
      <t>
Thanks to Satoshi Nakamoto for inventing the Base58 encoding format and the
Bitcoin community for popularizing its usage.
      </t>
    </section>

    <section anchor="Security" title="Security Considerations">
      <t>
      </t>
    </section>
  </middle>

  <back>
  </back>
</rfc>
