<?xml version="1.0" encoding="US-ASCII" ?>
<!DOCTYPE rfc SYSTEM "http://xml.resource.org/authoring/rfc2629.dtd" [
    <!ENTITY rfc2119 PUBLIC ''
    'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml'>
    <!ENTITY rfc3629 PUBLIC ''
    'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3629.xml'>
    <!ENTITY rfc4343 PUBLIC ''
    'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4343.xml'>
    <!ENTITY rfc4406 PUBLIC ''
    'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4406.xml'>
    <!ENTITY rfc4408 PUBLIC ''
    'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4408.xml'>
    <!ENTITY rfc4592 PUBLIC ''
    'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4592.xml'>
    <!ENTITY rfc5198 PUBLIC ''
    'http://xml.resource.org/public/rfc/bibxml/reference.RFC.5198.xml'>
    <!ENTITY rfcSMTP PUBLIC ''
    'http://xml.resource.org/public/rfc/bibxml3/reference.I-D.klensin-rfc2821bis.xml'>
    <!ENTITY rfcUTF8 PUBLIC ''
    'http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-eai-smtpext.xml'>
    <!ENTITY rfcHEAD PUBLIC ''
    'http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-eai-utf8headers.xml'>
    <!ENTITY rfcUDSN PUBLIC ''
    'http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-eai-dsn.xml'>
]>
<?xml-stylesheet type="text/xsl" href="rfc2629.xslt" ?>

<!-- no I-D - >
    <?rfc private="SPF draft" ?>
    <?rfc header="Sender Policy Framework" ?>
    <?rfc footer="draft-ellermann-spf-eai-01" ?>
<! - no I-D -->

<?rfc toc="yes" ?>
<?rfc compact="yes" ?>
<?rfc subcompact="no" ?>
<?rfc strict="yes" ?>
<?rfc symrefs="yes" ?>
<?rfc rfcprocack="yes" ?>

<rfc ipr="full3978" docName="draft-ellermann-spf-eai-01" category="exp">
<front>
    <title abbrev="SPF Email Address Internationalization">
        Sender Policy Framework: Email Address Internationalization
    </title>
    <author initials="F." surname="Ellermann" fullname="Frank Ellermann">
        <organization>xyzzy</organization>
        <address>
            <postal>
                <street></street>
                <city>Hamburg</city>
                <region>Germany</region>
            </postal>
            <email>hmdmhdfmhdjmzdtjmzdtzktdkztdjz@gmail.com</email>
            <uri>http://purl.net/xyzzy/</uri>
        </address>
    </author>
    <date />
    <abstract><t>
        UTF8SMTP is an extension of SMTP (Simple Mail Transfer Protocol)
        allowing the use of UTF-8 in the SMTP envelope for EAI (Email
        Address Internationalization) and message headers. This memo
        discusses the consequences for SPF (Sender Policy Framework).
    </t></abstract>

    <note title="Editorial note"><t>
        This note, the <xref target="iana">IANA considerations</xref>,
        and the <xref target="history">document history</xref> should
        be removed before publication. The draft can be discussed on
        the SPF-Discuss mailing list.
    </t></note>
</front>

<middle>
    <section title="Introduction"><t>
        The key words "MUST NOT" and "SHOULD" in this memo are to be
        interpreted as described in <xref target="RFC2119" />.  UTF-8
        is specified in <xref target="RFC3629" />.
    </t><t>
        Readers should be familiar with SMTP as specified in
        <xref target="I-D.klensin-rfc2821bis" />
        and the SPF terminology in <xref target="RFC4408" />.  An MTA
        is a Mail Transfer Agent, e.g., an SMTP relay.
    </t><t>
        For an EAI (Email Address Internationalization) overview see
        <xref target="RFC4592" />.
    </t></section>

    <section title="Background"><t>
        SMTP as specified in <xref target="I-D.klensin-rfc2821bis" />
        supports only ASCII addresses and LDH (letter, digit, hyphen)
        domain labels. The letters are ASCII letters;
        LDH-labels are also known as A-labels in the context of IDN
        (Internationalized Domain Names) and <xref target="IDNAbis" />.

    </t><section title="SPF HELO identity"><t>
        In SMTP sessions, the server response to an SMTP EHLO command
        from the client can indicate supported SMTP extensions.
        <xref target="I-D.ietf-eai-smtpext" /> specifies the UTF8SMTP
        extension.
    </t><t>
        The SMTP client can accept an offered UTF8SMTP extension by
        using one of the specified features, notably by the use of UTF-8
        in mailbox addresses of SMTP commands, by the use of alternative
        ASCII addresses in these commands, or by the use of UTF-8 in
        the message header for addresses and other purposes, i.e. by
        sending a "message/global" instead of a "message/rfc822" as
        specified in <xref target="I-D.ietf-eai-utf8headers" />.
    </t><t>
        Because UTF8SMTP support is indicated in the response to an
        EHLO command it cannot be used after HELO, and the SPF HELO
        identity is not affected by EAI:  The domain in a HELO
        or EHLO command consists of ordinary LDH-labels, or it is a
        domain literal.  For an empty reverse path, as it is used in
        non-delivery reports and other auto-replies, SPF fabricates a
        MAIL FROM identity based on the HELO identity with a
        case-insensitive local part "postmaster"; this scenario is also not
        affected by EAI.
    </t><t>
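        The handling of an empty reverse path described above can be
        sketched as follows.  This is only an illustration; the
        function name and the use of an empty string for the empty
        reverse path are assumptions of this example and are not
        taken from <xref target="RFC4408" />.
        <figure><artwork><![CDATA[
```python
def mail_from_identity(reverse_path, helo_domain):
    """Return the identity that SPF checks for MAIL FROM.

    Hypothetical helper for illustration: an empty reverse path
    ("" here) yields postmaster@<HELO domain>.
    """
    if reverse_path == "":
        return "postmaster@" + helo_domain
    return reverse_path


print(mail_from_identity("", "mx.example.org"))
```
]]></artwork></figure>
    </t><t>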
        A domain consisting of LDH-labels including IDN A-labels
        beginning with "xn--" is an ordinary LDH-domain as far as
        DNS (Domain Name System), SPF, and UTF8SMTP are concerned.
        Apart from HELO and EHLO the only relevant SMTP command for
        SPF is the MAIL FROM command with the reverse path containing
        the envelope sender address (if it is not empty, see above).
        When the derived MAIL FROM identity is an ordinary address
        SPF can handle it as specified in <xref target="RFC4408" />.

    </t></section><section title="EAI MAIL FROM identity"><t>
        The interesting UTF8SMTP cases for SPF contain non-ASCII
        UTF-8 characters in the local part (left hand side) or the
        domain part (right hand side) of the MAIL FROM identity.
        Domain labels containing non-ASCII UTF-8 characters are also
        known as U-labels in IDN.
    </t><t>
        SPF checks typically make sense only at the "border MTA", and
        this is normally an MX (Mail eXchanger) of the receiver
        talking with a sender.  An MTA wishing to check the SPF sender
        policy against the IP of the sender fetches the sender policy
        for the domain in the HELO or MAIL FROM identity as DNS client
        of a server for the alleged sender. The SPF terminology might
        be confusing:  The border MTA is the SMTP server, but for the
        purpose of checking a sender policy it is the SPF (or rather
        DNS) client of a name server for the alleged sender with a
        given HELO or MAIL FROM identity.
    </t><t>
        An MTA could "downgrade" EAI MAIL FROM addresses using an
        optional alternative address given as a UTF8SMTP MAIL parameter.
        Where that happens the resulting new MAIL FROM address is an
        ordinary <xref target="I-D.klensin-rfc2821bis" />
        reverse path and can be handled as usual.
    </t><t>
        Setting aside the ordinary cases noted above, an SPF client
        confronted with an EAI address in the MAIL FROM identity is
        generally an MTA supporting UTF8SMTP.  Such an MTA is supposed
        to know how to transform U-labels into the corresponding
        A-labels, e.g., because it might need to send a non-delivery
        report to the envelope sender address later; see
        <xref target="I-D.ietf-eai-dsn" />.
    </t><t>
        For agents attempting post-SMTP SPF checks this might not be
        the case, and unsurprisingly attempts to fetch the sender
        policy of a domain with U-labels "as is" will fail with SPF
        result NONE.  Arguably this is a broken setup:  The border MTA
        should not offer and accept UTF8SMTP mails if critical parts
        behind it (not limited to the mailbox of the receiver) do not
        support EAI.
    </t></section><section title="Local parts" anchor="havoc"><t>
        At this point the remaining SPF clients are supposed to know
        how to transform U-labels into A-labels in order to fetch the
        SPF policy of the alleged sender.  SPF implementors and
        publishers of SPF sender policies should note that only the
        domain part of the MAIL FROM identity is transformed from
        U-labels into A-labels.  The local part MUST NOT be transformed;
        it is used "as is" in the construction of a &lt;target-name&gt;
        by SPF macro expansion involving local part macros.
    </t><t>
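        The split between a transformed domain part and an
        untransformed local part can be sketched as follows.  This is
        a minimal illustration under stated assumptions:  Python's
        built-in "idna" codec (which implements the older IDNA2003
        rules) stands in for whatever U-label to A-label conversion
        an MTA actually uses, and the function name is hypothetical.
        <figure><artwork><![CDATA[
```python
def spf_identity(mail_from):
    """Split an EAI MAIL FROM address for SPF purposes.

    Assumption: Python's built-in "idna" codec (IDNA2003) stands
    in for the U-label to A-label conversion used by an MTA.
    """
    # Split at the last "@"; a quoted local part may contain "@".
    local_part, _, domain = mail_from.rpartition("@")
    # Only the domain is converted; the local part stays "as is".
    a_domain = domain.encode("idna").decode("ascii")
    return local_part, a_domain


# U+00FC is written as an escape to keep this memo in ASCII.
local, domain = spf_identity("j\u00fcrgen@b\u00fccher.example")
print(domain)
```
]]></artwork></figure>
    </t><t>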
        SPF allows all octets in labels of a &lt;target-name&gt; excluding
        dots, which are supposed to separate labels.  Sender policies can
        directly talk about any &lt;domain-spec&gt; with labels separated
        by dots, where each label consists of 1 to 63 visible ASCII
        characters except "%" introducing macros.  The macros "%%" and
        "%_" in a &lt;domain-spec&gt; expand into "%" or space in the
        &lt;target-name&gt;, respectively.
    </t><t>
        Normally sender policies do not use such slightly odd labels, and
        the most extravagant case is "_" (underscore) in a label that is
        intentionally not an LDH-label.  Nevertheless implementations have to
        support such oddities, because they are needed in the case of a
        &lt;target-name&gt; derived from a &lt;domain-spec&gt; using the
        local part macro.
    </t><t>
        In SMTP, local parts can have two forms, &lt;Dot-string&gt; or
        &lt;Quoted-string&gt;.  A &lt;Dot-string&gt; consists of dot
        separated &lt;Atom&gt;s, and each &lt;Atom&gt; consists of one or
        more &lt;atext&gt; characters.  Please note that an &lt;Atom&gt;
        is neither the same as an LDH-label nor the same as a domain
        label, e.g., &lt;Atom&gt;s can be longer than 63 octets.
    </t><t>
        By definition there are no leading, trailing, or adjacent dots
        in a &lt;Dot-string&gt;. <xref target="I-D.klensin-rfc2821bis" />
        and its predecessor recommend avoiding the &lt;Quoted-string&gt;
        form of a local part.  Current SPF implementations are known to
        strip the quotes from a &lt;Quoted-string&gt; for the purpose of
        determining a &lt;target-name&gt; derived from a
        &lt;domain-spec&gt; using the local part macro. This can result
        in an invalid &lt;target-name&gt; with leading, trailing, or
        adjacent dots, e.g., for a mail address "do..ts"@example.org.
    </t><t>
        Publishers of sender policies using the local part macro SHOULD
        make sure that the used pieces of valid local parts in their
        domain can be parsed into non-empty domain labels; one way to
        achieve this is to avoid &lt;Quoted-string&gt;.  The effect of
        a &lt;Quoted-string&gt; local part is not clearly specified in
        <xref target="RFC4408" />.  In theory DNS supports any octet,
        even "embedded" dots within a label.  In practice current SPF
        implementations cannot handle embedded dots, and it is far from
        clear that quoted pairs introduced by a "\" (backslash) in a
        &lt;Quoted-string&gt; are interpreted as specified in
        <xref target="I-D.klensin-rfc2821bis" /> section 4.1.2.
    </t><t>
        Publishers of sender policies using the local part macro SHOULD
        make sure that the used pieces of valid local parts in their
        domain result in 1 to 63 octets per dot separated domain label
        as mentioned in <xref target="RFC4408" /> section 8.1.  Please
        note that the truncation of longer labels after macro expansion
        is not clearly specified:  SPF implementations could truncate
        longer labels left to right or right to left; they could also
        ignore affected directives, or treat this case as an error.
    </t><t>
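        The label problems described above (empty labels caused by
        adjacent dots, and labels longer than 63 octets) can be
        detected with a sketch like the following.  The expander is a
        deliberately simplified, hypothetical stand-in that handles
        only the "%{l}" macro and strips quotes in the way current
        implementations are reported to do; it only reports problems
        and does not decide how an implementation should react.
        <figure><artwork><![CDATA[
```python
def expand_local(domain_spec, local_part):
    """Expand "%{l}" only; a hypothetical, simplified expander."""
    # Mimic implementations that strip the quotes of a
    # <Quoted-string> before using the local part.
    quoted = local_part[:1] == '"' and local_part[-1:] == '"'
    if quoted and len(local_part) >= 2:
        local_part = local_part[1:-1]
    return domain_spec.replace("%{l}", local_part)


def label_problems(target_name):
    """List labels that are empty or longer than 63 octets."""
    problems = []
    for label in target_name.split("."):
        octets = len(label.encode("utf-8"))
        if octets == 0:
            problems.append("empty label")
        elif octets > 63:
            problems.append("label longer than 63 octets")
    return problems


name = expand_local("%{l}._spf.example.org", '"do..ts"')
print(name, label_problems(name))
```
]]></artwork></figure>
    </t><t>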
        Publishers of sender policies using the local part macro need
        to be aware that ASCII letters in the used pieces of valid local
        parts in their domain are in essence treated as case-insensitive
        by DNS as explained in <xref target="RFC4343" />.
    </t><t>
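        The effect of DNS case insensitivity on local parts can be
        sketched as follows.  The comparison function is a
        hypothetical illustration of the rule in
        <xref target="RFC4343" /> that only ASCII letters are folded;
        it is not taken from any SPF implementation.
        <figure><artwork><![CDATA[
```python
def dns_equal(label_a, label_b):
    """Compare two labels with ASCII-only case folding."""
    # bytes.lower() changes only the octets for "A" to "Z";
    # octets of non-ASCII UTF-8 characters are left untouched.
    a = label_a.encode("utf-8").lower()
    b = label_b.encode("utf-8").lower()
    return a == b


# ASCII letters collide, U+00C4 and U+00E4 do not.
print(dns_equal("Ann", "aNN"))
print(dns_equal("\u00c4nne", "\u00e4nne"))
```
]]></artwork></figure>
    </t><t>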
        UTF8SMTP extends &lt;atext&gt; by &lt;UTF8-non-ascii&gt;, and
        it also permits &lt;UTF8-non-ascii&gt; in quoted strings.  As
        far as SPF is concerned &lt;UTF8-non-ascii&gt; can result in
        non-ASCII octets in a &lt;target-name&gt;, working "as is" in
        DNS labels with caveats similar to those noted above with
        respect to the length of labels and case sensitivity, plus
        the normalization caveat discussed below.
    </t><t>
        <xref target="I-D.ietf-eai-utf8headers" /> suggests restricting
        the use of UTF-8 in EAI addresses to Normalization Form C (NFC)
        as recommended in <xref target="RFC5198" />.  Publishers of
        sender policies using the local part macro need to be aware
        that SPF implementations treat local parts "as is".  Mapping
        different forms of an EAI local part to one mailbox at their
        border MTAs has no effect on different forms of EAI local parts
        in DNS queries.  A straightforward strategy to avoid potential
        issues with respect to SPF is to use local part macros only in
        non-critical explanations and maybe for logging, if at all.
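    </t><t>
        The normalization issue can be illustrated with a short
        sketch.  The example local part and the helper name are
        purely illustrative assumptions:  two forms of a local part
        that are equal after NFC normalization still produce
        different octets, and therefore different DNS queries, when
        used "as is".
        <figure><artwork><![CDATA[
```python
import unicodedata

# Two spellings of the same visible local part; escapes keep
# this memo in ASCII.  The names are purely illustrative.
composed = "ren\u00e9"      # U+00E9, already in NFC
decomposed = "rene\u0301"   # "e" followed by U+0301


def label_octets(local_part):
    """Octets an "as is" local part contributes to a query."""
    return local_part.encode("utf-8")


same_after_nfc = unicodedata.normalize("NFC", decomposed) == composed
same_in_dns = label_octets(composed) == label_octets(decomposed)
print(same_after_nfc, same_in_dns)
```
]]></artwork></figure>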
    </t></section></section>

    <section title="Considerations for policy publishers" anchor="spf"><t>
        Policy publishers should know that this memo does not update
        <xref target="RFC4408" />; in theory EAI is compatible with
        SPF.  It is not possible to use U-labels in sender policies
        directly; they have to be transformed into the corresponding
        A-labels.  Likewise U-labels in UTF8SMTP MAIL FROM addresses
        are transformed into A-labels for the purposes of SPF by
        implementations supporting EAI.
    </t><t>
        SPF implementations not supporting U-labels in MAIL FROM
        identities will return NONE instead of the intended result,
        e.g., PASS or FAIL.  UTF8SMTP senders wishing to avoid this
        problem could transform MAIL FROM U-labels into A-labels on
        their side.  They could also hope that spammers forging MAIL
        FROM identities will not abuse IDN U-labels in the near
        future, and that most SPF implementations will be updated
        before this changes.  Unfortunately experience has shown that
        spammers learn faster than lazy users.
    </t><t>
        MAIL FROM identities using only A-labels, with or without
        UTF-8 in the local part, work "as is" for the purposes of
        SPF.  HELO identities either consist of A-labels, are domain
        literals and irrelevant for SPF, or are syntactically malformed
        as far as UTF8SMTP and SPF are concerned.  SPF does not
        specify how receivers should handle SMTP syntax errors.
    </t><t>
        UTF8SMTP allows senders to specify an optional alternative
        address in the traditional syntax.  Receivers are free to
        check SPF based on the alternative address in addition to,
        or instead of, the EAI address.  Obviously a
        sender policy for the alternative address should permit the
        same sending IPs as the sender policy for the EAI address,
        and one simple way to achieve this is to use corresponding
        A-labels in an alternative address yielding one SPF sender
        policy for both addresses.  Please note that this is not
        required by UTF8SMTP; it permits the use of unrelated domains
        with different policies.  Clearly if some IPs permitted for
        one address fail for the other address, or vice versa, the
        sender will have problems if the affected IPs are actually
        used to send mails.
    </t></section>

    <section title="Received-SPF"><t>
        The Received-SPF header field in <xref target="RFC4408" />
        section 7 is a "message/rfc822" trace header field.
    </t><t>
        UTF8SMTP transports a "message/global" as specified in
        <xref target="I-D.ietf-eai-utf8headers" /> permitting the
        use of UTF-8 in header fields.  For Received-SPF this is
        necessary to record a UTF8SMTP &lt;envelope-from&gt;.
    </t><t>
        UTF-8 might also be needed in comments and other parts of
        this header field in conjunction with UTF8SMTP.  See
        <xref target="I-D.ietf-eai-utf8headers" /> for the
        corresponding syntax modifications.
    </t><t>
        SPF implementations could check the EAI MAIL FROM
        <spanx style="strong">and</spanx> an alternative address
        (if given).  In this case SPF implementations SHOULD
        record both results.  An exception could be to omit a
        "less interesting" (e.g., equivalent, NONE, or NEUTRAL)
        result.  Receivers could also adopt the strategy to check
        the second of two addresses only if the result of their
        first check is "not helpful" (e.g., NONE or NEUTRAL).
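    </t><t>
        The last strategy can be sketched as follows.  This is only
        an illustration:  "check" is a hypothetical callable that
        maps an address to a lower-case SPF result string, and the
        function and variable names are assumptions of this example.
        <figure><artwork><![CDATA[
```python
NOT_HELPFUL = {"none", "neutral"}


def check_identities(check, eai_from, alt_from=None):
    """Check the EAI address, then maybe the alternative one.

    "check" is a hypothetical callable mapping an address to a
    lower-case SPF result string such as "pass" or "none".
    """
    results = [(eai_from, check(eai_from))]
    # Consult the alternative address only if the first
    # result is "not helpful" in the sense used above.
    if alt_from is not None and results[0][1] in NOT_HELPFUL:
        results.append((alt_from, check(alt_from)))
    return results
```
]]></artwork></figure>
        Returning every checked address together with its result
        leaves it to the caller to record both results.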
    </t></section>

    <section title="Acknowledgements"><t>
        Various folks on the SPF discuss and devel mailing lists worked
        on the <xref target="RFC4408" /> errata and the SPF test suite.
        All obscure issues with local parts discussed in this memo are
        based on their prior work and indirectly related discussions
        on the EAI and SMTP mailing lists.
    </t><t>
        Thanks to
        Alessandro Vesely,
        Boyd Lynn Gerber,
        Julian Mehnle,
        Scott Kitterman,
        Stuart D. Gathman,
        and Wayne Schlitt for their encouragement or feedback.
    </t></section>

    <section title="Security Considerations"><t>
        When UTF8SMTP senders use a different domain in the optional
        alternative MAIL FROM address, and the corresponding sender
        policy is also different, it is hard to predict which policy
        will be checked, if any, depending on the route to the
        receiver and other factors.  Different results can be hard
        to diagnose, e.g., if a mail from the same sender to the
        same receiver sometimes results in PASS and at other times
        in FAIL.  One proposal to mitigate this problem is discussed
        in <xref target="spf" />.
    </t><t>
        Not all SPF implementations will already support U-labels as
        explained in this memo.  Senders could transform U-labels in
        MAIL FROM commands to A-labels on their side where this is a
        problem.
    </t><t>
        Using the SPF local part macro in conjunction with EAI is not
        intuitive, because local parts are not transformed to A-labels.
        This is not a new problem, but in conjunction with EAI it is
        more likely.  The details with respect to label lengths, quoted
        strings, adjacent dots, and quoted pairs are explained in
        <xref target="havoc" />.  Not using quoted strings in local
        parts is recommended in <xref target="I-D.klensin-rfc2821bis" />.
    </t><t>
        A new UTF8SMTP problem is the use of local part macros for the
        construction of "per user" policies, when different variations
        of a UTF-8 local part correspond to one user mailbox.  One way
        to address this problem is to avoid this use of the local part
        macro as discussed in <xref target="havoc" />.
    </t><t>
        UTF8SMTP servers can be forced to send non-delivery reports
        to forged envelope sender addresses, if some receiver mailboxes
        can handle EAI mails, while others can't, and the server has no
        way to "downgrade" mails to traditional receivers.  Hopefully a
        future SMTP extension will allow a kind of "selective reject"
        mechanism.  Publishing SPF PASS or FAIL policies, and rejecting
        FAIL at the border MTA, would eliminate this problem.  Similarly
        non-delivery reports after a PASS cannot hit innocent bystanders.
    </t><t>
        Evaluating PRA (Purported Responsible Address) policies as
        specified in <xref target="RFC4406" /> with SPF or vice versa
        can cause havoc, as the algorithms are semantically different
        even when the policies are otherwise syntactically identical.
        This known problem is discussed in <xref target="RFC4408" />.
    </t></section>

    <section title="IANA Considerations" anchor="iana"><t>
        Keep up the good work, nothing to do here.
    </t></section>
</middle>

<back>
    <references title="Normative References">
        &rfc2119;
        &rfc3629;
        &rfc4408;
        &rfcSMTP;
        &rfcUTF8;
        &rfcHEAD;
    </references>
    <references title="Informative References">
        &rfc4343;
        &rfc4592;
        &rfc4406;
        &rfc5198;
        &rfcUDSN;

        <reference anchor='IDNAbis' target='http://tools.ietf.org/wg/idnabis'>
        <front>
            <title>Internationalized Domain Names in Applications (Revised)</title>
            <author><organization> IETF </organization></author>
            <date month="April" year='2008' />
        </front>
        </reference>
    </references>
    <section title="Document History" anchor="history"><t>
        Changes in version 01:
    </t><t>
        <list style="symbols"><t>
            Extended the local part discussion in <xref target="havoc" />
            significantly to cover known gaps in the SPF specification.
            Added potential issues with non-normalized local parts.
        </t><t>
            Added a Received-SPF section using the now "Last Called"
            <xref target="I-D.ietf-eai-utf8headers" />.  Added references
            to <xref target="RFC4343" />, <xref target="RFC4592" />,
            <xref target="RFC5198" />, and the approved
            <xref target="I-D.ietf-eai-dsn" />.
        </t><t>
            All referenced drafts are hopefully near to approval.
        </t><t>
            Removed the discussion of two vs. five SPF "subtypes"; this
            belongs in another draft.  Kept the caveat about using an
            algorithm designed for subtype x on a policy with subtype y
            (or v.v.), as this could cause hard-to-debug mail losses.
        </t></list>
    </t><t>
        Initial version:
    </t><t>
        <list style="symbols"><t>
            In a short discussion on the EAI list Harald Alvestrand and
            John Klensin encouraged collecting SPF EAI considerations
            in a separate memo.
        </t></list>
    </t></section>
</back></rfc>
