Sender Policy Framework: Email Address Internationalization

Sender Policy Framework: Email Address Internationalization xyzzy

Hamburg Germany hmdmhdfmhdjmzdtjmzdtzktdkztdjz@gmail.com http://purl.net/xyzzy/

UTF8SMTP is an extension of SMTP (Simple Mail Transfer Protocol) allowing the use of UTF-8 in the SMTP envelope for EAI (Email Address Internationalization) and message headers. This memo discusses the consequences for SPF (Sender Policy Framework). This note, the IANA considerations, and the document history should be removed before publication. The draft can be discussed on the SPF-Discuss mailing list.

The keywords "MUST NOT" and "SHOULD" in this memo are to be interpreted as described in RFC 2119. UTF-8 is specified in RFC 3629. Readers should be familiar with SMTP as specified in and the SPF terminology in RFC 4408. An MTA is a Mail Transfer Agent, e.g., an SMTP relay. For an EAI (Email Address Internationalization) overview see .

SMTP as specified in supports only ASCII addresses and LDH (letter, digit, hyphen) domain labels. The letters are ASCII letters; LDH-labels are also known as A-labels in the context of IDN (Internationalization of Domain Names) and .

In SMTP sessions, the server response to an EHLO command from the client can indicate supported SMTP extensions. specifies the UTF8SMTP extension. The SMTP client can accept an offered UTF8SMTP extension by using one of the specified features, notably by the use of UTF-8 in mailbox addresses of SMTP commands, by the use of alternative ASCII addresses in these commands, or by the use of UTF-8 in the message header for addresses and other purposes, i.e., by sending a "message/global" instead of a "message/rfc822" as specified in .

Because UTF8SMTP support is indicated in the response to an EHLO command, it cannot be used after HELO, and the SPF HELO identity is not affected by EAI: the domain in a HELO or EHLO command consists of ordinary LDH-labels, or it is a domain literal. For an empty reverse path, as used in non-delivery reports and other auto-replies, SPF fabricates a MAIL FROM identity based on the HELO identity with a case-insensitive local part "postmaster"; this scenario is also not affected by EAI. A domain consisting of LDH-labels, including IDN A-labels beginning with "xn--", is an ordinary LDH-domain as far as DNS (Domain Name System), SPF, and UTF8SMTP are concerned.

Apart from HELO and EHLO, the only SMTP command relevant for SPF is the MAIL FROM command with the reverse path containing the envelope sender address (if it is not empty, see above). When the derived MAIL FROM identity is an ordinary address, SPF can handle it as specified in .
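The rule for an empty reverse path can be sketched as follows. This is only an illustration under simplifying assumptions, not a complete SPF implementation; the function name `mail_from_identity` and its parameters are hypothetical and do not appear in any specification.

```python
def mail_from_identity(reverse_path: str, helo_domain: str) -> str:
    """Return the identity an SPF check would use for MAIL FROM.

    Hypothetical helper: an empty reverse path, as used in non-delivery
    reports, is replaced by "postmaster" at the HELO domain.
    """
    path = reverse_path.strip()
    # Drop the angle brackets of the SMTP path syntax, if present.
    if path.startswith("<") and path.endswith(">"):
        path = path[1:-1]
    if path == "":
        # Fabricated identity; the HELO domain consists of LDH-labels,
        # so EAI does not affect this case.
        return "postmaster@" + helo_domain
    return path


if __name__ == "__main__":
    print(mail_from_identity("<>", "mx.example.org"))
    print(mail_from_identity("<user@example.org>", "mx.example.org"))
```

Domain literals in HELO and source routes in the reverse path are deliberately ignored in this sketch.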

The interesting UTF8SMTP cases for SPF contain non-ASCII UTF-8 characters in the local part (left hand side) or the domain part (right hand side) of the MAIL FROM identity. Domain labels containing non-ASCII UTF-8 characters are also known as U-labels in IDN.

SPF checks typically only make sense at the "border MTA", which is normally an MX (Mail eXchanger) of the receiver talking with a sender. An MTA wishing to check the SPF sender policy against the IP of the sender fetches the sender policy for the domain in the HELO or MAIL FROM identity, acting as DNS client of a server for the alleged sender. The SPF terminology might be confusing: the border MTA is the SMTP server, but for the purpose of checking a sender policy it is the SPF (or rather DNS) client of a name server for the alleged sender with a given HELO or MAIL FROM identity.

An MTA could "downgrade" EAI MAIL FROM addresses using an optional alternative address given as UTF8SMTP MAIL-parameter. Where that happens, the resulting new MAIL FROM address is an ordinary reverse path and can be handled as usual.

Skipping all ordinary cases as noted above, the SPF client confronted with an EAI address in the MAIL FROM identity is generally an MTA supporting UTF8SMTP, and is supposed to know how to transform U-labels into corresponding A-labels, e.g., because it might need to send a non-delivery report to the envelope sender address later; see . For agents trying post-SMTP SPF checks this might not be the case, and unsurprisingly attempts to fetch the sender policy of a domain with U-labels "as is" will fail with SPF result NONE. Arguably this is a broken setup: the border MTA should not offer UTF8SMTP and accept UTF8SMTP mails if critical parts behind it, not limited to the mailbox of the receiver, do not support EAI.
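The transformation of U-labels into A-labels before fetching a sender policy might look like the following sketch. It assumes Python's standard "idna" codec, which implements the original IDNA rules rather than the revised IDN work referenced by this memo; the function name `to_spf_domain` and the sample address are hypothetical.

```python
from typing import Tuple


def to_spf_domain(mail_from: str) -> Tuple[str, str]:
    """Split an address; convert only the domain part to A-labels."""
    # The domain follows the last "@". A quoted local part may itself
    # contain "@", so the split is done from the right.
    local_part, at, domain = mail_from.rpartition("@")
    if not at:
        raise ValueError("no domain part in address")
    # Assumption: the stdlib "idna" codec is good enough for a sketch.
    # Labels that are already ASCII pass through unchanged.
    a_label_domain = domain.encode("idna").decode("ascii")
    # The local part is returned untouched, UTF-8 included.
    return local_part, a_label_domain


if __name__ == "__main__":
    print(to_spf_domain("pelé@bücher.example"))
```

The sender policy would then be fetched for the returned domain, while the untouched local part remains available for macro expansion.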

At this point, working top down, the remaining SPF clients are supposed to know how to transform U-labels into A-labels and to fetch the SPF policy of the alleged sender. SPF implementors and publishers of SPF sender policies should note that only the domain part of the MAIL FROM identity is transformed from U-labels into A-labels. The local part MUST NOT be transformed; it is used "as is" in the construction of a <target-name> by SPF macro expansion involving local part macros.

SPF allows all octets in labels of a <target-name> excluding dots, which are supposed to separate labels. Sender policies can directly talk about any <domain-spec> with labels separated by dots, where each label consists of 1 to 63 visible ASCII characters except "%" introducing macros. The macros "%%" and "%_" in a <domain-spec> expand into "%" or space in the <target-name>, respectively. Normally sender policies do not use such slightly odd labels, and the most extravagant case is "_" (underscore) in a label that is intentionally no LDH-label. Nevertheless, implementations have to support such oddities, because they are needed in the case of a <target-name> derived from a <domain-spec> using the local part macro.

In SMTP, local parts can have two forms, <Dot-string> or <Quoted-string>. A <Dot-string> consists of dot-separated <Atom>s; each <Atom> consists of one or more <atext> characters. Please note that an <Atom> is not the same as an LDH-label, and it is also not the same as a domain label, e.g., <Atom>s can be longer than 63 octets. By definition there are no leading, trailing, or adjacent dots in a <Dot-string>. and its predecessor recommend avoiding the <Quoted-string> form of a local part.

Current SPF implementations are known to strip the quotes from a <Quoted-string> for the purpose of determining a <target-name> derived from a <domain-spec> using the local part macro. This can result in an invalid <target-name> with leading, trailing, or adjacent dots, e.g., for a mail address "do..ts"@example.org. Publishers of sender policies using the local part macro SHOULD make sure that the used pieces of valid local parts in their domain can be parsed into non-empty domain labels; one way to achieve this is to avoid <Quoted-string>.

The effect of a <Quoted-string> local part is not clearly specified in . In theory DNS supports any octet, even "embedded" dots within a label. In practice current SPF implementations cannot handle embedded dots, and it is far from clear that quoted pairs introduced by a "\" (backslash) in a <Quoted-string> are interpreted as specified in section 4.1.2. Publishers of sender policies using the local part macro SHOULD make sure that the used pieces of valid local parts in their domain result in 1 to 63 octets per dot-separated domain label as mentioned in section 8.1. Please note that the truncation of longer labels after macro expansion is not clearly specified: SPF implementations could truncate longer labels left to right or right to left, they could also ignore affected directives, or treat this case as an error.

Publishers of sender policies using the local part macro need to be aware that ASCII letters in the used pieces of valid local parts in their domain are in essence treated as case-insensitive by DNS as explained in .

UTF8SMTP extends <atext> by <UTF8-non-ascii>, and it also permits <UTF8-non-ascii> in quoted strings. As far as SPF is concerned, <UTF8-non-ascii> can result in non-ASCII octets in a <target-name>, working "as is" in DNS labels with similar caveats as noted above with respect to the length of labels, case sensitivity, and normalization. suggests restricting the use of UTF-8 in EAI addresses to Normalization Form C (NFC) as recommended in . Publishers of sender policies using the local part macro need to be aware that SPF implementations treat local parts "as is". Mapping different forms of an EAI local part to one mailbox at their border MTAs has no effect on different forms of EAI local parts in DNS queries.

A straightforward strategy to avoid potential issues with respect to SPF is to use local part macros only in non-critical explanations and maybe for logging, if at all.
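The label problems described in this section can be made concrete with a small sketch. It is heavily simplified: it expands only the plain local part macro "%{l}", mimics the quote stripping attributed above to current implementations, and ignores all other SPF macros, transformers, and delimiters. The function names and the sample policy domain `_spf.example.org` are hypothetical.

```python
from typing import List


def expand_local_part_macro(domain_spec: str, local_part: str) -> str:
    """Expand "%{l}" only; a real expansion handles many more macros."""
    # Mimic the quote stripping for a <Quoted-string> local part.
    if len(local_part) >= 2 and local_part[0] == '"' and local_part[-1] == '"':
        local_part = local_part[1:-1]
    return domain_spec.replace("%{l}", local_part)


def label_problems(target_name: str) -> List[str]:
    """List labels that are empty or longer than 63 octets."""
    problems = []
    for label in target_name.split("."):
        size = len(label.encode("utf-8"))  # DNS limits count octets
        if size == 0:
            problems.append("empty label")
        elif size > 63:
            problems.append("label longer than 63 octets")
    return problems


if __name__ == "__main__":
    target = expand_local_part_macro("%{l}._spf.example.org", '"do..ts"')
    print(target, label_problems(target))
```

A publisher could run a check of this kind over the local parts actually in use before deciding to rely on the local part macro in a sender policy. Note that the length check counts UTF-8 octets, not characters, which matters for EAI local parts.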

Policy publishers should know that this memo does not update ; in theory EAI is compatible with SPF. It is not possible to use U-labels in sender policies directly; they have to be transformed into the corresponding A-labels. Likewise U-labels in UTF8SMTP MAIL FROM addresses are transformed into A-labels for the purposes of SPF by implementations supporting EAI.

SPF implementations not supporting U-labels in MAIL FROM identities will return NONE instead of the intended result, e.g., PASS or FAIL. UTF8SMTP senders wishing to avoid this problem could transform MAIL FROM U-labels into A-labels on their side. They could also hope that spammers forging MAIL FROM identities will not abuse IDN U-labels in the near future, and that most SPF implementations will be updated before this changes. Unfortunately experience has shown that spammers learn faster than lazy users.

MAIL FROM identities using only A-labels, with or without UTF-8 in the local part, work "as is" for the purposes of SPF. HELO identities either consist of A-labels, or are domain literals and irrelevant for SPF, or are syntactically malformed as far as UTF8SMTP and SPF are concerned. SPF does not specify how receivers should handle SMTP syntax errors.

UTF8SMTP allows specifying an optional alternative address in the traditional syntax. Receivers are free to check SPF also or only based on the alternative address. Obviously a sender policy for the alternative address should permit the same sending IPs as the sender policy for the EAI address, and one simple way to achieve this is to use corresponding A-labels in an alternative address, yielding one SPF sender policy for both addresses. Please note that this is not required by UTF8SMTP; it permits the use of unrelated domains with different policies. Clearly, if some IPs permitted for one address fail for the other address, or vice versa, the sender will have problems if the affected IPs are actually used to send mails.

The Received-SPF header field in section 7 is a "message/rfc822" trace header field. UTF8SMTP transports a "message/global" as specified in , permitting the use of UTF-8 in header fields. For Received-SPF this is necessary to record a UTF8SMTP <envelope-from>. UTF-8 might also be needed in comments and other parts of this header field in conjunction with UTF8SMTP. See for the corresponding syntax modifications.

SPF implementations could check the EAI MAIL FROM and an alternative address (if given). In this case SPF implementations SHOULD record both results. An exception could be to omit a "less interesting" (e.g., equivalent, NONE, or NEUTRAL) result. Receivers could also adopt the strategy of checking the second of two addresses only if the result of their first check is "not helpful" (e.g., NONE or NEUTRAL).
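The strategy of checking the second address only after an unhelpful first result could be sketched as below. The `check` callable stands in for a real SPF evaluation and is an assumption of this example, as are the function name `check_identities` and the lower-case result strings.

```python
from typing import Callable, List, Optional, Tuple

# Results treated as "not helpful" in this sketch.
UNHELPFUL = {"none", "neutral"}


def check_identities(
    check: Callable[[str], str],
    eai_address: str,
    alternative: Optional[str] = None,
) -> List[Tuple[str, str]]:
    """Return (address, result) pairs to record, e.g., in Received-SPF."""
    results = [(eai_address, check(eai_address))]
    # Check the alternative address only if the first result does not
    # help and an alternative address was given.
    if alternative is not None and results[0][1] in UNHELPFUL:
        results.append((alternative, check(alternative)))
    return results


if __name__ == "__main__":
    # Stand-in for real SPF results, for illustration only.
    fake = {"pelé@bücher.example": "none", "pele@example.org": "pass"}
    print(check_identities(fake.get, "pelé@bücher.example", "pele@example.org"))
```

Every pair in the returned list corresponds to one result that the receiver would record.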

Various folks on the SPF discuss and devel mailing lists worked on the errata and the SPF test suite. All obscure issues with local parts discussed in this memo are based on their prior work and indirectly related discussions on the EAI and SMTP mailing lists. Thanks to Alessandro Vesely, Boyd Lynn Gerber, Julian Mehnle, Scott Kitterman, Stuart D. Gathman, and Wayne Schlitt for their encouragement or feedback.

When UTF8SMTP senders use a different domain in the optional alternative MAIL FROM address, and the corresponding sender policy is also different, it is hard to predict which policy will be checked, if any, depending on the route to the receiver and other factors. Different results can be hard to diagnose, e.g., if a mail from the same sender to the same receiver sometimes results in PASS and at other times in FAIL. One proposal to mitigate this problem is discussed in .

Not all SPF implementations will already support U-labels as explained in this memo. Senders could transform U-labels in MAIL FROM commands to A-labels on their side where this is a problem.

Using the SPF local part macro in conjunction with EAI is not intuitive: local parts are not transformed to A-labels. This is no new problem, but in conjunction with EAI it is more likely. The details with respect to label lengths, quoted strings, adjacent dots, and quoted pairs are explained in . Not using quoted strings in local parts is recommended in .

A new UTF8SMTP problem is the use of local part macros for the construction of "per user" policies, when different variations of a UTF-8 local part correspond to one user mailbox. One way to address this problem is to avoid this use of the local part macro as discussed in .

UTF8SMTP servers can be forced to send non-delivery reports to forged envelope sender addresses, if some receiver mailboxes can handle EAI mails, while others cannot, and the server has no way to "downgrade" mails to traditional receivers. Hopefully a future SMTP extension will allow a kind of "selective reject" mechanism. Publishing SPF PASS or FAIL policies, and rejecting FAIL at the border MTA, would eliminate this problem. Similarly non-delivery reports after a PASS cannot hit innocent bystanders.

Evaluating PRA (Purported Responsible Address) policies as specified in with SPF or vice versa can cause havoc, as the algorithms are semantically different even when the policies are otherwise syntactically identical. This known problem is discussed in .
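The "per user" policy problem with different variations of one UTF-8 local part can be illustrated with Python's standard `unicodedata` module. The sample local part and the helper name `label_octets` are hypothetical; the sketch only shows that two visually identical local parts yield different octets in a DNS query unless both are normalized first.

```python
import unicodedata


def label_octets(local_part: str) -> bytes:
    """Octets that would appear "as is" in a DNS label."""
    return local_part.encode("utf-8")


# Two variations of one local part that look identical when displayed.
composed = "pel\u00e9"      # "é" as a single code point (NFC form)
decomposed = "pele\u0301"   # "e" followed by a combining acute accent

if __name__ == "__main__":
    # Used "as is", the two forms produce different DNS labels.
    print(label_octets(composed), label_octets(decomposed))
    # After normalization to NFC both forms are the same string.
    same = unicodedata.normalize("NFC", decomposed) == composed
    print(same)
```

A border MTA that maps both forms to one mailbox would still see two different policy lookups, which is why avoiding local part macros for "per user" policies is the simpler choice.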

Keep up the good work, nothing to do here.

&rfc2119; &rfc3629; &rfc4408; &rfcSMTP; &rfcUTF8; &rfcHEAD; &rfc4343; &rfc4592; &rfc4406; &rfc5198; &rfcUDSN; Internationalized Domain Names in Applications (Revised) IETF

Changes in version 01: Extended the local part discussion in significantly to cover known gaps in the SPF specification. Added potential issues with non-normalized local parts. Added a Received-SPF section using the now "Last Called" . Added references to , , , and the approved . All referenced drafts are hopefully near approval. Removed the discussion of two vs. five SPF "subtypes"; this belongs in another draft. Kept the caveat about using an algorithm designed for subtype x on a policy with subtype y (or vice versa), as this could cause hard-to-debug mail losses.

Initial version: In a short discussion on the EAI list Harald Alvestrand and John Klensin encouraged collecting SPF EAI considerations in a separate memo.