<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">

<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<?rfc iprnotified="no" ?>
<?rfc sortrefs="yes"?>
<?rfc strict="yes"?>
<?rfc symrefs="yes"?>
<?rfc toc="yes"?>
<?rfc tocdepth="3"?>
<?rfc rfcedstyle="yes"?>

<rfc category="std" ipr="trust200902" docName="draft-ietf-precis-7564bis-02" obsoletes="7564">

<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>

  <front>
    <title abbrev="PRECIS Framework">PRECIS Framework: Preparation, Enforcement, and Comparison of Internationalized Strings in Application Protocols</title>

    <author initials="P." surname="Saint-Andre" fullname="Peter Saint-Andre">
      <organization>Filament</organization>
      <address>
        <email>peter@filament.com</email>
        <uri>https://filament.com/</uri>
      </address>
    </author>

    <author initials="M." surname="Blanchet" fullname="Marc Blanchet">
      <organization>Viagenie</organization>
      <address>
        <postal>
          <street>246 Aberdeen</street>
          <city>Quebec</city>
          <region>QC</region>
          <code>G1R 2E1</code>
          <country>Canada</country>
        </postal>
        <email>Marc.Blanchet@viagenie.ca</email>
        <uri>http://www.viagenie.ca/</uri>
      </address>
    </author>

    <date/>

    <keyword>internationalization</keyword>
    <keyword>i18n</keyword>
    <keyword>Stringprep</keyword>

    <abstract>
      <t>Application protocols using Unicode characters in protocol strings
      need to properly handle such strings in order to enforce
      internationalization rules for strings placed in various protocol slots
      (such as addresses and identifiers) and to perform valid comparison
      operations (e.g., for purposes of authentication or authorization).
      This document defines a framework enabling application protocols to
      perform the preparation, enforcement, and comparison of
      internationalized strings ("PRECIS") in a way that depends on the
      properties of Unicode characters and thus is agile with respect to
      versions of Unicode.  As a result, this framework provides a more
      sustainable approach to the handling of internationalized strings than
      the previous framework, known as Stringprep (RFC 3454).  This document
      obsoletes RFC 7564.</t>
    </abstract>

  </front>

  <middle>

    <section title="Introduction" anchor='intro'>
      <t>Application protocols using Unicode characters <xref
      target='Unicode'/> in protocol strings need to properly handle such
      strings in order to enforce internationalization rules for strings
      placed in various protocol slots (such as addresses and identifiers) and
      to perform valid comparison operations (e.g., for purposes of
      authentication or authorization).  This document defines a framework
      enabling application protocols to perform the preparation, enforcement,
      and comparison of internationalized strings ("PRECIS") in a way that
      depends on the properties of Unicode characters and thus is agile with
      respect to versions of Unicode.</t>

      <t>As described in the PRECIS problem statement <xref
      target='RFC6885'/>, many IETF protocols have used the Stringprep
      framework <xref target='RFC3454'/> as the basis for preparing,
      enforcing, and comparing protocol strings that contain Unicode
      characters, especially characters outside the ASCII range <xref
      target='RFC20'/>.  The Stringprep framework was developed during work on
      the original technology for internationalized domain names (IDNs), here
      called "IDNA2003" <xref target='RFC3490'/>, and Nameprep <xref
      target="RFC3491"/> was the Stringprep profile for IDNs.  At the time,
      Stringprep was designed as a general framework so that other application
      protocols could define their own Stringprep profiles.  Indeed, a number
      of application protocols defined such profiles.</t>

      <t>After the publication of <xref target='RFC3454'/> in 2002, several
      significant issues arose with the use of Stringprep in the IDN case, as
      documented in the IAB's recommendations regarding IDNs <xref
      target='RFC4690'/> (most significantly, Stringprep was tied to Unicode
      version 3.2).  Therefore, the newer IDNA specifications, here called
      "IDNA2008" (<xref target='RFC5890'/>, <xref target='RFC5891'/>, <xref
      target='RFC5892'/>, <xref target='RFC5893'/>, <xref target='RFC5894'/>),
      no longer use Stringprep and Nameprep.  This migration away from
      Stringprep for IDNs prompted other "customers" of Stringprep to consider
      new approaches to the preparation, enforcement, and comparison of
      internationalized strings, as described in <xref target='RFC6885'/>.</t>

      <t>This document defines a framework for a post-Stringprep approach to
      the preparation, enforcement, and comparison of internationalized
      strings in application protocols, based on several principles:</t>

      <t>
        <list style='numbers'>
          <t>Define a small set of string classes that specify the Unicode
          characters (i.e., specific "code points") appropriate for common
          application protocol constructs.</t>

          <t>Define each PRECIS string class in terms of Unicode code points
          and their properties so that an algorithm can be used to determine
          whether each code point or character category is (a)&nbsp;valid,
          (b) allowed in certain contexts, (c) disallowed, or
          (d)&nbsp;unassigned.</t>

          <t>Use an "inclusion model" such that a string class consists only
          of code points that are explicitly allowed, with the result that any
          code point not explicitly allowed is forbidden.</t>

          <t>Enable application protocols to define profiles of the PRECIS
          string classes if necessary (addressing matters such as width
          mapping, case mapping, Unicode normalization, and directionality)
          but strongly discourage the multiplication of profiles beyond
          necessity in order to avoid violations of the "Principle of Least
          Astonishment".</t>

        </list>
      </t>
      <t>It is expected that this framework will yield the following
      benefits:</t>

      <t>
        <list style="symbols">
          <t>Application protocols will be agile with regard to Unicode
          versions.</t>

          <t>Implementers will be able to share code point tables and software
          code across application protocols, most likely by means of software
          libraries.</t>

          <t>End users will be able to acquire more accurate expectations
          about the characters that are acceptable in various contexts.  Given
          this more uniform set of string classes, it is also expected that
          copy/paste operations between software implementing different
          application protocols will be more predictable and coherent.</t>

        </list>
      </t>
      <t>Whereas the string classes define the "baseline" code points for a
      range of applications, profiling enables application protocols to apply
      the string classes in ways that are appropriate for common constructs
      such as usernames <xref target='RFC7613'/>, opaque
      strings such as passwords <xref target='RFC7613'/>,
      and nicknames <xref target='RFC7700'/>.  Profiles are
      responsible for defining the handling of right-to-left characters as
      well as various mapping operations of the kind also discussed for IDNs
      in <xref target='RFC5895'/>, such as case preservation or lowercasing,
      Unicode normalization, mapping of certain characters to other characters
      or to nothing, and mapping of fullwidth and halfwidth characters.</t>

      <t>When an application applies a profile of a PRECIS string class, it
      transforms an input string (which might or might not be conforming) into
      an output string that definitively conforms to the profile.  In
      particular, this document focuses on the resulting ability to achieve
      the following objectives:</t>

      <t>
        <list style='letters'>
          <t>Enforcing all the rules of a profile for a single output
          string (e.g., to determine if a string can be included in a protocol
          slot, communicated to another entity within a protocol, stored in a
          retrieval system, etc.).</t>

          <t>Comparing two output strings to determine if they are equivalent,
          typically through octet-for-octet matching to test for
          "bit&nbhy;string identity" (e.g., to make an access decision for
          purposes of authentication or authorization as further described
          in <xref target='RFC6943'/>).</t>

        </list>
      </t>
      <t>The opportunity to define profiles naturally introduces the
      possibility of a proliferation of profiles, thus potentially undermining
      the benefits of common code and violating user expectations.  See <xref
      target='profiles'/> for a discussion of this important topic.</t>

      <t>In addition, it is extremely important for protocol designers and
      application developers to understand that the transformation of an input
      string to an output string is rarely reversible.  As one relatively
      simple example, case mapping would transform an input string of
      "StPeter" to "stpeter", and information about the capitalization of the
      first and third characters would be lost.  Similar considerations apply
      to other forms of mapping and normalization.</t>
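
      <t>A minimal sketch of this loss of information follows.  It is
      written in Python purely for illustration and assumes that Python's
      str.lower() is an acceptable stand-in for the case mapping a profile
      might specify; the function name case_map is hypothetical.</t>

      <figure><artwork><![CDATA[
```python
def case_map(s: str) -> str:
    # Illustrative stand-in for a profile's case mapping rule.
    return s.lower()

inputs = ["StPeter", "stpeter", "STPETER"]

# Distinct input strings collapse to a single output string.
outputs = {case_map(s) for s in inputs}
print(outputs)
```
]]></artwork></figure>

      <t>Because several distinct inputs yield the same output, the original
      capitalization cannot be recovered from the output string alone.</t>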

      <t>Although this framework is similar to IDNA2008 and includes by
      reference some of the character categories defined in <xref
      target='RFC5892'/>, it defines additional character categories to meet
      the needs of common application protocols other than DNS.</t>

      <t>The character categories and calculation rules defined under
      Sections&nbsp;<xref target="PropertyCalculation" format="counter"/>
      and <xref target="categories" format="counter" /> are normative and
      apply to all Unicode code points.  The code point table that
      results from applying the character categories and calculation
      rules to the latest version of Unicode can be found in an IANA
      registry.</t>

    </section>

    <section title="Terminology" anchor="terms">
      <t>Many important terms used in this document are defined in <xref
      target='RFC5890'/>, <xref target='RFC6365'/>, <xref target='RFC6885'/>,
      and <xref target='Unicode'/>.  The terms "left-to-right" (LTR) and
      "right-to-left" (RTL) are defined in Unicode Standard Annex #9 <xref
      target='UAX9'/>.</t>

      <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
      "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
      "OPTIONAL" in this document are to be interpreted as described in <xref
      target='RFC2119'/>.</t>

    </section>

  <section title="Preparation, Enforcement, and Comparison" anchor="precis">
    <t>This document distinguishes between three different actions that an
    entity can take with regard to a string:</t>

    <t>
      <list style='symbols'>
        <t>Enforcement entails applying all of the rules specified for a
        particular string class or profile thereof to an individual string,
        for the purpose of determining if the string can be used in a given
        protocol slot.</t>

        <t>Comparison entails applying all of the rules specified for a
        particular string class or profile thereof to two separate strings,
        for the purpose of determining if the two strings are equivalent.</t>

        <t>Preparation primarily entails ensuring that the characters in an
        individual string are allowed by the underlying PRECIS string
        class, and sometimes also entails applying one or more of the rules 
        specified for a particular string class or profile thereof.
        Preparation can be appropriate for constrained devices that can
        to some extent restrict the characters in a string to a limited
        repertoire but that do not have the processing power or onboard 
        memory to perform operations such as Unicode normalization. 
        However, preparation does not ensure that an input string conforms
        to all of the rules for a string class or profile thereof.

        <list style='empty'><t>Note: The term "preparation" as used in
        this specification and related documents has a much more limited 
        scope than it did in Stringprep; it essentially refers to a kind
        of preprocessing of an input string, not the actual operations 
        that apply internationalization rules to produce an output string
        (here termed "enforcement") or to compare two output strings (here
        termed "comparison").</t></list>

        </t>

      </list>
    </t>
    <t>In most cases, authoritative entities such as servers are responsible
    for enforcement, whereas subsidiary entities such as clients are
    responsible only for preparation.  The rationale for this distinction is
    that clients might not have the facilities (in terms of device memory and
    processing power) to enforce all the rules regarding internationalized
    strings (such as width mapping and Unicode normalization), although they
    can more easily limit the repertoire of characters they offer to an end
    user.  By contrast, it is assumed that a server would have more capacity
    to enforce the rules, and in any case acts as an authority regarding
    allowable strings in protocol slots such as addresses and endpoint
    identifiers.  In addition, a client cannot necessarily be trusted to
    properly generate such strings, especially for security-sensitive contexts
    such as authentication and authorization.</t>
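
    <t>The division of labor among the three actions can be sketched as
    follows.  This is an illustrative toy rather than a conforming
    implementation: the function names are hypothetical, the repertoire
    check (str.isalnum()) is only a stand-in for a real PRECIS string
    class, and the choice of lowercasing plus NFC is an assumption made
    for the example.</t>

    <figure><artwork><![CDATA[
```python
import unicodedata

def prepare(s: str) -> str:
    # Preparation: only check that each character belongs to the
    # allowed repertoire (toy stand-in: letters and digits).
    for ch in s:
        if not ch.isalnum():
            raise ValueError(f"disallowed U+{ord(ch):04X}")
    return s

def enforce(s: str) -> str:
    # Enforcement: apply all of the toy profile's rules (case
    # mapping, normalization), then check the repertoire.
    s = unicodedata.normalize("NFC", s.lower())
    return prepare(s)

def compare(a: str, b: str) -> bool:
    # Comparison: enforce both strings, then match the results
    # octet for octet.
    return enforce(a).encode("utf-8") == enforce(b).encode("utf-8")

print(compare("StPeter", "stpeter"))
print(compare("Caf\u00e9", "cafe\u0301"))
```
]]></artwork></figure>

    <t>In this sketch, a constrained client could call only prepare(),
    leaving the costlier normalization step in enforce() to the
    server.</t>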

  </section>

    <section title="String Classes" anchor='classes'>

      <section title="Overview" anchor='classes-overview'>
        <t>Starting in 2010, various "customers" of Stringprep began to
        discuss the need to define a post-Stringprep approach to the
        preparation and comparison of internationalized strings other than
        IDNs.  This community analyzed the existing Stringprep profiles and
        also weighed the costs and benefits of defining a relatively small set
        of Unicode characters that would minimize the potential for user
        confusion caused by visually similar characters (and thus be
        relatively "safe") vs. defining a much larger set of Unicode
        characters that would maximize the potential for user creativity (and
        thus be relatively "expressive").  As a result, the community
        concluded that most existing uses could be addressed by two string
        classes:</t>

        <t>
          <list style='hanging'>
            <t hangText="IdentifierClass:">a sequence of letters, numbers, and
            some symbols that is used to identify or address a network entity
            such as a user account, a venue (e.g., a chatroom), an information
            source (e.g., a data feed), or a collection of data (e.g., a
            file); the intent is that this class will minimize user confusion
            in a wide variety of application protocols, with the result that
            safety has been prioritized over expressiveness for this
            class.</t>

            <t hangText="FreeformClass:">a sequence of letters, numbers,
            symbols, spaces, and other characters that is used for free-form
            strings, including passwords as well as display elements such as
            human-friendly nicknames for devices or for participants in a
            chatroom; the intent is that this class will allow nearly any
            Unicode character, with the result that expressiveness has been
            prioritized over safety for this class.  Note well that protocol
            designers, application developers, service providers, and end
            users might not understand or be able to enter all of the
            characters that can be included in the FreeformClass -- see <xref
            target='security-freeformclass'/> for details.</t>

          </list>
        </t>
        <t>Future specifications might define additional PRECIS string
        classes, such as a class that falls somewhere between the
        IdentifierClass and the FreeformClass.  At this time, it is not clear
        how useful such a class would be.  In any case, because application
        developers are able to define profiles of PRECIS string classes, a
        protocol needing a construct between the IdentifierClass and the
        FreeformClass could define a restricted profile of the FreeformClass
        if needed.</t> 

        <t>The following subsections discuss the IdentifierClass and
        FreeformClass in more detail, with reference to the dimensions
        described in Section 5 of <xref target='RFC6885'/>.  Each string
        class is defined by the following behavioral rules:</t>

        <t>
          <list style='hanging'>
            <t hangText='Valid:'>Defines which code points are treated as
            valid for the string.</t>

            <t hangText='Contextual Rule Required:'>Defines which code points
            are treated as allowed only if the requirements of a contextual
            rule are met (i.e., either CONTEXTJ or CONTEXTO).</t>

            <t hangText='Disallowed:'>Defines which code points need to be
            excluded from the string.</t>

            <t hangText='Unassigned:'>Defines application behavior in the
            presence of code points that are unknown (i.e., not yet
            designated) for the version of Unicode used by the
            application.</t>

          </list>
        </t>
        <t>This document defines the valid, contextual rule required,
        disallowed, and unassigned rules for the IdentifierClass and
        FreeformClass.  As described under <xref target='profiles'/>, profiles
        of these string classes are responsible for defining the width
        mapping, additional mappings, case mapping, normalization, and
        directionality rules.</t>
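
        <t>One way to picture the four behavioral rules is as a function
        from code points to one of four values, with a string accepted only
        if every code point is valid or satisfies its contextual rule.  The
        sketch below assumes hypothetical callables classify() and
        context_ok() supplied by an implementation; the toy classifier
        covers only a handful of code points and is not the calculation
        defined by this document.</t>

        <figure><artwork><![CDATA[
```python
from enum import Enum

class Rule(Enum):
    VALID = "valid"
    CONTEXT = "contextual rule required"
    DISALLOWED = "disallowed"
    UNASSIGNED = "unassigned"

def string_allowed(s, classify, context_ok):
    # classify(cp) returns a Rule; context_ok(s, i) returns a bool.
    for i, ch in enumerate(s):
        rule = classify(ord(ch))
        if rule is Rule.VALID:
            continue
        if rule is Rule.CONTEXT and context_ok(s, i):
            continue
        # Disallowed, unassigned, or contextual rule not met.
        return False
    return True

def toy_classify(cp):
    # Toy classifier: printable ASCII is valid, the two join
    # controls need context, everything else is rejected.
    if 0x21 <= cp <= 0x7E:
        return Rule.VALID
    if cp in (0x200C, 0x200D):
        return Rule.CONTEXT
    return Rule.DISALLOWED

print(string_allowed("abc", toy_classify, lambda s, i: False))
print(string_allowed("a b", toy_classify, lambda s, i: False))
```
]]></artwork></figure>

        <t>The final "return False" reflects the inclusion model: a code
        point that is not explicitly allowed is rejected.</t>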

      </section>

      <section title="IdentifierClass" anchor="classes-id">
        <t>Most application technologies need strings that can be used to
        refer to, include, or communicate protocol strings like usernames,
        filenames, data feed identifiers, and chatroom names.  We group such
        strings into a class called "IdentifierClass" having the following
        features.</t>

        <section title='Valid' anchor='classes-id-valid'>
          <t>
            <list style='symbols'>
              <t>Code points traditionally used as letters and numbers in
              writing systems, i.e., the LetterDigits ("A") category first
              defined in <xref target='RFC5892'/> and listed here under <xref
              target='A'/>.</t>

              <t>Code points in the range U+0021 through U+007E, i.e., the
              (printable) ASCII7 ("K") category defined under <xref target='K'/>.
              These code points are "grandfathered" into PRECIS and thus are
              valid even if they would otherwise be disallowed according to
              the property-based rules specified under <xref target='classes-id-disallowed'/>.</t>

            </list>
          </t>
          <t><list style='empty'><t>Note: Although the PRECIS IdentifierClass
          reuses the LetterDigits category from IDNA2008, the range of
          characters allowed in the IdentifierClass is wider than the range of
          characters allowed in IDNA2008.  The main reason is that IDNA2008
          applies the Unstable category before the LetterDigits category, thus
          disallowing uppercase characters, whereas the IdentifierClass does
          not apply the Unstable category.</t></list></t>

        </section>
        <section title='Contextual Rule Required' anchor='classes-id-contextual'>
          <t>
            <list style='symbols'>
              <t>A number of characters from the Exceptions ("F") category
              defined under <xref target='F'/>, which gives the full
              list.</t>

              <t>Joining characters, i.e., the JoinControl ("H") category
              defined under <xref target='H'/>.</t>

            </list>
          </t>
        </section>
        <section title='Disallowed' anchor='classes-id-disallowed'>
          <t>
            <list style='symbols'>
              <t>Old Hangul Jamo characters, i.e., the OldHangulJamo ("I")
              category defined under <xref target='I'/>.</t>

              <t>Control characters, i.e., the Controls ("L") category defined
              under <xref target='L'/>.</t>

              <t>Ignorable characters, i.e., the PrecisIgnorableProperties
              ("M") category defined under <xref target='M'/>.</t>

              <t>Space characters, i.e., the Spaces ("N") category defined
              under <xref target='N'/>.</t>

              <t>Symbol characters, i.e., the Symbols ("O") category defined
              under <xref target='O'/>.</t>

              <t>Punctuation characters, i.e., the Punctuation ("P") category
              defined under <xref target='P'/>.</t>

              <t>Any character that has a compatibility equivalent, i.e., the
              HasCompat ("Q") category defined under <xref target='Q'/>.
              These code points are disallowed even if they would otherwise be
              valid according to the property-based rules specified under
              <xref target='classes-id-valid'/>.</t>

              <t>Letters and digits other than the "traditional" letters and
              digits allowed in IDNs, i.e., the OtherLetterDigits ("R")
              category defined under <xref target='R'/>.</t>

            </list>
          </t>
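
          <t>Several of the categories listed above can be approximated
          from Unicode character properties.  The rough sketch below uses
          Python's unicodedata module and makes three assumptions:
          Controls, Spaces, Symbols, and Punctuation correspond to the
          General_Category values Cc, Zs, S*, and P*; HasCompat means that
          NFKC normalization changes the character; and the function name
          disallowed_reason is hypothetical.  The sketch deliberately
          ignores the ASCII7 exception, the remaining categories, and the
          order in which the rules are applied, so it is not a substitute
          for the property calculation defined by this document.</t>

          <figure><artwork><![CDATA[
```python
import unicodedata

def disallowed_reason(ch):
    # Return an (assumed) category name, or None if no rule in
    # this simplified sketch matches.
    cat = unicodedata.category(ch)
    if cat == "Cc":
        return "Controls"
    if cat == "Zs":
        return "Spaces"
    if cat.startswith("S"):
        return "Symbols"
    if cat.startswith("P"):
        return "Punctuation"
    if unicodedata.normalize("NFKC", ch) != ch:
        return "HasCompat"
    return None

for ch in ["\u0085", "\u2003", "\u20ac", "\u00bf", "\uff10", "a"]:
    print(f"U+{ord(ch):04X}", disallowed_reason(ch))
```
]]></artwork></figure>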
        </section>
        <section title='Unassigned' anchor='classes-id-unassigned'>
          <t>Any code points that are not yet designated in the Unicode
          character set are considered unassigned for purposes of the
          IdentifierClass, and such code points are to be treated as
          disallowed.  See <xref target='J'/>.</t>
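
          <t>Whether a code point is unassigned depends on the version of
          Unicode known to the implementation.  As a hedged illustration,
          the Python sketch below treats a code point as unassigned when
          its General_Category is Cn and it is not a noncharacter; that
          test is an assumption for this example, the helper names are
          hypothetical, and the result reflects whatever Unicode version
          the Python runtime ships (unicodedata.unidata_version).</t>

          <figure><artwork><![CDATA[
```python
import unicodedata

def is_noncharacter(cp: int) -> bool:
    # U+FDD0..U+FDEF plus the last two code points of each plane.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

def is_unassigned(cp: int) -> bool:
    # Cn covers unassigned code points and noncharacters, so the
    # noncharacters are excluded explicitly.
    if is_noncharacter(cp):
        return False
    return unicodedata.category(chr(cp)) == "Cn"

print(unicodedata.unidata_version)
print(is_unassigned(0x0041), is_unassigned(0x50000))
```
]]></artwork></figure>

          <t>Because unassigned code points are treated as disallowed, a
          string containing a code point assigned only in a newer version
          of Unicode would be rejected by an implementation built against
          an older version.</t>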

        </section>
        <section title='Examples' anchor='classes-id-examples'>
          <t>As described in the Introduction to this document, the string
          classes do not handle all issues related to string preparation and
          comparison (such as case mapping); instead, such issues are handled
          at the level of profiles.  Examples for profiles of the
          IdentifierClass can be found in <xref target='RFC7613'/>
          (the UsernameCaseMapped and UsernameCasePreserved profiles).</t>

        </section>
      </section>

      <section title="FreeformClass" anchor="classes-free">
        <t>Some application technologies need strings that can be used in a
        free-form way, e.g., as a password in an authentication exchange (see
        <xref target='RFC7613'/>) or a nickname in a
        chatroom (see <xref target='RFC7700'/>).  We group
        such strings into a class called "FreeformClass" having the following
        features.</t>

        <t><list style='empty'><t>Security Warning: As mentioned, the
        FreeformClass prioritizes expressiveness over safety; <xref
        target='security-freeformclass'/> describes some of the security
        hazards involved with using or profiling the
        FreeformClass.</t></list></t>

        <t><list style='empty'><t>Security Warning: Consult <xref
        target='security-passwords'/> for relevant security considerations
        when strings conforming to the FreeformClass, or a profile thereof,
        are used as passwords.</t></list></t>

        <section title='Valid' anchor='classes-free-valid'>
          <t>
            <list style='symbols'>
              <t>Traditional letters and numbers, i.e., the LetterDigits ("A")
              category first defined in <xref target='RFC5892'/> and listed
              here under <xref target='A'/>.</t>

              <t>Letters and digits other than the "traditional" letters and
              digits allowed in IDNs, i.e., the OtherLetterDigits ("R")
              category defined under <xref target='R'/>.</t>

              <t>Code points in the range U+0021 through U+007E, i.e., the
              (printable) ASCII7 ("K") category defined under <xref
              target='K'/>.</t>

              <t>Any character that has a compatibility equivalent, i.e., the
              HasCompat ("Q") category defined under <xref target='Q'/>.</t>

              <t>Space characters, i.e., the Spaces ("N") category defined
              under <xref target='N'/>.</t>

              <t>Symbol characters, i.e., the Symbols ("O") category defined
              under <xref target='O'/>.</t>

              <t>Punctuation characters, i.e., the Punctuation ("P") category
              defined under <xref target='P'/>.</t>

            </list>
          </t>
        </section>
        <section title='Contextual Rule Required' anchor='classes-free-contextual'>
          <t>
            <list style='symbols'>
              <t>A number of characters from the Exceptions ("F") category
              defined under <xref target='F'/>, which gives the full
              list.</t>

              <t>Joining characters, i.e., the JoinControl ("H") category
              defined under <xref target='H'/>.</t>

            </list>
          </t>
        </section>
        <section title='Disallowed' anchor='classes-free-disallowed'>
          <t>
            <list style='symbols'>
              <t>Old Hangul Jamo characters, i.e., the OldHangulJamo ("I")
              category defined under <xref target='I'/>.</t>

              <t>Control characters, i.e., the Controls ("L") category defined
              under <xref target='L'/>.</t>

              <t>Ignorable characters, i.e., the PrecisIgnorableProperties
              ("M") category defined under <xref target='M'/>.</t>

            </list>
          </t>
        </section>
        <section title='Unassigned' anchor='classes-free-unassigned'>
          <t>Any code points that are not yet designated in the Unicode
          character set are considered unassigned for purposes of the
          FreeformClass, and such code points are to be treated as
          disallowed.</t>

        </section>
        <section title='Examples' anchor='classes-free-examples'>
          <t>As described in the Introduction to this document, the string
          classes do not handle all issues related to string preparation and
          comparison (such as case mapping); instead, such issues are handled
          at the level of profiles.  Examples for profiles of the
          FreeformClass can be found in <xref target='RFC7613'/>
          (the OpaqueString profile) and <xref target='RFC7700'/>
          (the Nickname profile).</t>

        </section>
      </section>
    </section>

    <section title="Profiles" anchor="profiles">

        <t>This framework document defines the valid,
        contextual-rule-required, disallowed, and unassigned rules for the
        IdentifierClass and the FreeformClass.  A profile of a PRECIS string
        class MUST define the width mapping, additional mappings (if any),
        case mapping, normalization, and directionality rules.  A profile MAY
        also restrict the allowable characters above and beyond the definition
        of the relevant PRECIS string class (but MUST NOT add as valid any
        code points that are disallowed by the relevant PRECIS string class).
        These matters are discussed in the following subsections.</t>

        <t>Profiles of the PRECIS string classes are registered with the IANA
        as described under <xref target='iana-profiles'/>.  Profile names use
        the following convention: they are of the form "Profilename of
        BaseClass", where the "Profilename" string is a differentiator and
        "BaseClass" is the name of the PRECIS string class being profiled; for
        example, the profile of the FreeformClass used for opaque strings
        such as passwords is the OpaqueString profile <xref
        target='RFC7613'/>.</t>
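
        <t>To make the shape of a profile concrete, the following sketch
        models a profile as a record holding one callable per rule.
        Everything here is hypothetical: the class and field names, the
        "ToyCaseMapped" profile (which is not a registered profile), and
        the order in which the rules are applied, which is an assumption
        of this example.  The check against the base string class is
        omitted.</t>

        <figure><artwork><![CDATA[
```python
import unicodedata
from dataclasses import dataclass
from typing import Callable

Mapping = Callable[[str], str]

@dataclass(frozen=True)
class Profile:
    name: str
    base_class: str  # "IdentifierClass" or "FreeformClass"
    width_map: Mapping
    additional_map: Mapping
    case_map: Mapping
    normalize: Mapping
    direction_ok: Callable[[str], bool]

    def apply_rules(self, s: str) -> str:
        # Assumed order: width, additional, case, normalization,
        # then the directionality check.
        for rule in (self.width_map, self.additional_map,
                     self.case_map, self.normalize):
            s = rule(s)
        if not self.direction_ok(s):
            raise ValueError("directionality rule violated")
        return s

toy = Profile(
    name="ToyCaseMapped",
    base_class="IdentifierClass",
    width_map=lambda s: s,
    additional_map=lambda s: s,
    case_map=str.lower,
    normalize=lambda s: unicodedata.normalize("NFC", s),
    direction_ok=lambda s: True,
)

print(toy.apply_rules("StPeter"))
```
]]></artwork></figure>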


      <section title="Profiles Must Not Be Multiplied beyond Necessity" anchor="profiles-proliferation">
        <t>The risk of profile proliferation is significant because having too
        many profiles will result in different behavior across various
        applications, thus violating what is known in user interface design as
        the "Principle of Least Astonishment".</t>

        <t>Indeed, we already have too many profiles.  Ideally we would have
        at most two or three profiles.  Unfortunately, numerous application
        protocols exist with their own quirks regarding protocol strings.
        Domain names, email addresses, instant messaging addresses, chatroom
        nicknames, filenames, authentication identifiers, passwords, and other
        strings are already out there in the wild and need to be supported in
        existing application protocols such as DNS, SMTP, the
        Extensible Messaging and Presence Protocol (XMPP),
        Internet Relay Chat (IRC), NFS, the Internet Small Computer System
        Interface (iSCSI), the Extensible Authentication Protocol (EAP),
        and the Simple Authentication and Security Layer (SASL), among
        others.</t>

        <t>Nevertheless, profiles must not be multiplied beyond necessity.</t>

        <t>To help prevent profile proliferation, this document recommends
        sensible defaults for the various options offered to profile creators
        (such as width mapping and Unicode normalization).  In addition, the
        guidelines for designated experts provided under <xref
        target='guidelines'/> are meant to encourage a high level of due
        diligence regarding new profiles.</t>

      </section>

      <section title="Rules" anchor="profiles-rules">

        <section title="Width Mapping Rule" anchor="profiles-principles-width">
          <t>The width mapping rule of a profile specifies whether width
          mapping is performed on the characters of a string, and how the
          mapping is done.  Typically, such mapping consists of mapping
          fullwidth and halfwidth characters, i.e., code points with a
          Decomposition Type of Wide or Narrow, to their decomposition
          mappings; as an example, FULLWIDTH DIGIT ZERO (U+FF10) would be
          mapped to DIGIT ZERO (U+0030).</t>

          <t>The normalization form specified by a profile (see below) has an
          impact on the need for width mapping.  Because width mapping is
          performed as a part of compatibility decomposition, a profile
          employing either normalization form KD (NFKD) or normalization form
          KC (NFKC) does not need to specify width mapping.  However, if
          Unicode normalization form C (NFC) is used (as is recommended) then
          the profile needs to specify whether to apply width mapping; in this
          case, width mapping is in general RECOMMENDED because allowing
          fullwidth and halfwidth characters to remain unmapped to their
          compatibility variants would violate the "Principle of Least
          Astonishment".  For more information about the concept of width in
          East Asian scripts within Unicode, see Unicode Standard Annex #11
          <xref target='UAX11'/>.</t>
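
          <t>As an illustration of the mapping described above, the
          following Python sketch replaces each code point whose
          decomposition type is wide or narrow with its decomposition
          mapping, using the decomposition() function of Python's
          unicodedata module.  The function name width_map is
          hypothetical, and the sketch is not a normative definition of
          the rule.</t>

          <figure><artwork><![CDATA[
```python
import unicodedata

def width_map(s: str) -> str:
    out = []
    for ch in s:
        # For U+FF10 this returns the string "<wide> 0030".
        decomp = unicodedata.decomposition(ch)
        if decomp.startswith(("<wide>", "<narrow>")):
            hexes = decomp.split()[1:]
            out.append("".join(chr(int(h, 16)) for h in hexes))
        else:
            out.append(ch)
    return "".join(out)

fullwidth_zero = "\uff10"
print(ascii(width_map(fullwidth_zero)))
print(ascii(unicodedata.normalize("NFC", fullwidth_zero)))
print(ascii(unicodedata.normalize("NFKC", fullwidth_zero)))
```
]]></artwork></figure>

          <t>The last two lines show why the choice of normalization form
          matters: NFC leaves FULLWIDTH DIGIT ZERO untouched, whereas NFKC
          already maps it to DIGIT ZERO.</t>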

        </section>

        <section title="Additional Mapping Rule" anchor="profiles-principles-additional">
          <t>The additional mapping rule of a profile specifies whether
          additional mappings are performed on the characters of a string, such
          as:</t>

          <t>
            <list>
              <t>Mapping of delimiter characters (such as '@', ':', '/', '+',
              and '-').</t>

              <t>Mapping of special characters (e.g., non-ASCII space
              characters to ASCII space or control characters to nothing).</t>

            </list>
          </t>
          <t>The PRECIS mappings document <xref
          target='RFC7790'/> describes such mappings in more
          detail.</t>
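
          <t>As a non-normative illustration of the second kind of mapping,
          the following Python sketch maps every space separator to ASCII
          space and maps control characters to nothing.  Whether a given
          profile applies either mapping is a decision for that profile; the
          function name additional_map and the use of the General_Category
          values Zs and Cc to identify spaces and controls are assumptions
          made for this example.</t>

          <figure>
            <artwork><![CDATA[
```python
import unicodedata


def additional_map(s: str) -> str:
    """Map non-ASCII spaces to U+0020 and map control characters to nothing."""
    out = []
    for ch in s:
        category = unicodedata.category(ch)
        if category == "Zs":
            out.append(" ")   # any space separator becomes ASCII space
        elif category == "Cc":
            continue          # control characters are dropped
        else:
            out.append(ch)
    return "".join(out)
```
]]></artwork>
          </figure>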

        </section>

        <section title="Case Mapping Rule" anchor="profiles-principles-case">
          <t>The case mapping rule of a profile specifies whether case mapping
          (instead of case preservation) is performed on the characters of a
          string, and how the mapping is applied (e.g., mapping uppercase and
          titlecase characters to their lowercase equivalents).</t>

          <t>If case mapping is desired (instead of case preservation), it is
          RECOMMENDED to use the Unicode toLower() operation defined in the
          Unicode Standard <xref target='Unicode'/>.  In contrast to the Unicode
          CaseFold() operation, the toLower() operation is less likely to violate 
          the "Principle of Least Astonishment", especially when an application 
          merely wishes to convert uppercase and titlecase code points to the 
          lowercase equivalents while preserving lowercase code points.  Although
          the CaseFold() operation can be appropriate when an application needs 
          to compare two strings (such as in search operations), in general few
          application developers and even fewer users understand its implications,
          so toLower() is almost always the safer choice.</t>
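
          <t>The difference between the two operations can be illustrated
          with the following non-normative Python sketch.  It assumes that
          the Python str.lower() and str.casefold() methods are acceptable
          stand-ins for the Unicode toLower() and CaseFold() operations; the
          wrapper names to_lower and case_fold are illustrative only.</t>

          <figure>
            <artwork><![CDATA[
```python
def to_lower(s: str) -> str:
    """Stand-in for toLower(): lowercase mapping that keeps lowercase input."""
    return s.lower()


def case_fold(s: str) -> str:
    """Stand-in for CaseFold(): folding intended for caseless comparison."""
    return s.casefold()


# LATIN SMALL LETTER SHARP S (U+00DF) is already lowercase.
sharp_s_lower = to_lower("Stra\u00dfe")   # "stra\u00dfe": U+00DF is preserved
sharp_s_fold = case_fold("Stra\u00dfe")   # "strasse": U+00DF becomes "ss"
```
]]></artwork>
          </figure>

          <t>In this sketch, case folding rewrites a code point that was
          already lowercase, which is the kind of result that users are
          unlikely to expect.</t>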

          <t><list style='empty'><t>Note: Neither toLower() nor CaseFold() is
          designed to handle various localization issues (such as so-called
          "dotless i" in several Turkic languages).  The PRECIS mappings
          document <xref target='RFC7790'/> describes these
          issues in greater detail and defines a "local case mapping" method
          that handles some locale-dependent and context-dependent
          mappings.</t></list></t>

          <t>In order to maximize entropy and minimize the potential for false
          positives, it is NOT RECOMMENDED for application protocols to map
          uppercase and titlecase code points to their lowercase equivalents
          when strings conforming to the FreeformClass, or a profile thereof,
          are used in passwords; instead, it is RECOMMENDED to preserve the
          case of all code points contained in such strings and then perform
          case-sensitive comparison.  See also the related discussion
          in <xref target="security-passwords"/> and
          in <xref target='RFC7613'/>.</t>

        </section>

        <section title="Normalization Rule" anchor="profiles-principles-normalization">
          <t>The normalization rule of a profile specifies which Unicode
          normalization form (D, KD, C, or KC) is to be applied (see Unicode
          Standard Annex #15 <xref target='UAX15'/> for background
          information).</t>

          <t>In accordance with <xref target='RFC5198'/>, normalization form C
          (NFC) is RECOMMENDED.</t>
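
          <t>As a non-normative illustration, the four normalization forms
          can be applied in Python as sketched below.  The wrapper name
          normalize and its default argument are assumptions made for this
          example; the underlying call is the standard
          unicodedata.normalize() function.</t>

          <figure>
            <artwork><![CDATA[
```python
import unicodedata


def normalize(s: str, form: str = "NFC") -> str:
    """Apply one of the four Unicode normalization forms (NFC by default)."""
    if form not in {"NFC", "NFD", "NFKC", "NFKD"}:
        raise ValueError("unknown normalization form: " + form)
    return unicodedata.normalize(form, s)
```
]]></artwork>
          </figure>

          <t>For example, under NFC the sequence U+0065 U+0301 is composed
          into U+00E9, whereas FULLWIDTH DIGIT ZERO (U+FF10) is left
          unchanged; only the compatibility forms (NFKC and NFKD) map the
          latter to DIGIT ZERO (U+0030).</t>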

        </section>

        <section title="Directionality Rule" anchor="profiles-principles-directionality">
          <t>The directionality rule of a profile specifies how to treat
          strings containing what are often called "right-to-left" (RTL)
          characters (see Unicode Standard Annex #9 <xref target='UAX9'/>).
          RTL characters come from scripts that are normally written from
          right to left, and Unicode considers such characters to have
          inherent right-to-left directionality.  Some strings containing RTL
          characters also contain "left-to-right" (LTR) characters, such as
          numerals, as well as characters without directional properties.
          Consequently, such strings are known as "bidirectional strings".</t>
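
          <t>As a non-normative illustration, the following Python sketch
          detects whether a string contains RTL characters by inspecting the
          Bidi_Class property of each code point.  Treating the values R, AL,
          and AN as "right-to-left" follows the definition of an RTL label in
          <xref target='RFC5893'/>; the names contains_rtl and bidi_classes
          are illustrative only, and the sketch does not implement the
          "Bidi Rule" or the Unicode bidirectional algorithm.</t>

          <figure>
            <artwork><![CDATA[
```python
import unicodedata

# Bidi classes that make a string "RTL" in the sense of RFC 5893.
RTL_BIDI_CLASSES = {"R", "AL", "AN"}


def contains_rtl(s: str) -> bool:
    """Return True if any character has a right-to-left Bidi_Class value."""
    return any(unicodedata.bidirectional(ch) in RTL_BIDI_CLASSES for ch in s)


def bidi_classes(s: str) -> set:
    """Return the set of Bidi_Class values that occur in the string."""
    return {unicodedata.bidirectional(ch) for ch in s}
```
]]></artwork>
          </figure>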

          <t>Presenting bidirectional strings in different layout systems
          (e.g., a user interface that is configured to handle primarily an
          RTL script vs. an interface that is configured to handle primarily
          an LTR script) can yield display results that, while predictable to
          those who understand the display rules, are counter-intuitive to
          casual users.  In particular, the same bidirectional string (in
          PRECIS terms) might not be presented in the same way to users of
          those different layout systems, even though the presentation is
          consistent within any particular layout system.  In some
          applications, these presentation differences might be considered
          problematic and thus the application designers might wish to
          restrict the use of bidirectional strings by specifying a
          directionality rule.  In other applications, these presentation
          differences might not be considered problematic (this especially
          tends to be true of more "free-form" strings) and thus no
          directionality rule is needed.</t> 

          <t>The PRECIS framework does not directly address how to deal with
          bidirectional strings across all string classes and profiles, and
          does not define any new directionality rules, since at present there
          is no widely accepted and implemented solution for the safe display
          of arbitrary bidirectional strings beyond the Unicode bidirectional
          algorithm <xref target='UAX9'/>.  Although rules for management and
          display of bidirectional strings have been defined for domain name
          labels and similar identifiers through the "Bidi Rule" specified in
          the IDNA2008 specification on right-to-left scripts <xref
          target='RFC5893'/>, those rules are quite restrictive and are not
          necessarily applicable to all bidirectional strings.</t>

          <t>The authors of a PRECIS profile might believe that they need to
          define a new directionality rule of their own.  Because of the
          complexity of the issues involved, such a belief is almost always
          misguided, even if the authors have done a great deal of careful
          research into the challenges of displaying bidirectional strings.
          This document strongly suggests that profile authors who are
          thinking about defining a new directionality rule think again, and
          instead consider using the "Bidi Rule" <xref target='RFC5893'/> (for
          profiles based on the IdentifierClass) or following the Unicode
          bidirectional algorithm <xref target='UAX9'/> (for profiles based on
          the FreeformClass or in situations where the IdentifierClass is not
          appropriate).</t>

        </section>

      </section>

      <section title="A Note about Spaces" anchor="profiles-space">
        <t>With regard to the IdentifierClass, the consensus of the PRECIS
        Working Group was that spaces are problematic for many reasons,
        including the following:</t>

        <t>
          <list style='symbols'>
            <t>Many Unicode characters are confusable with ASCII space.</t>

            <t>Even if non-ASCII space characters are mapped to ASCII space
            (U+0020), space characters are often not rendered in user
            interfaces, leading to the possibility that a human user might
            consider a string containing spaces to be equivalent to the same
            string without spaces.</t>

            <t>In some locales, some devices are known to generate a character
            other than ASCII space (such as ZERO WIDTH JOINER, U+200D) when a
            user performs an action like hitting the space bar on a
            keyboard.</t>

          </list>
        </t>
        <t>One consequence of disallowing space characters in the
        IdentifierClass might be to effectively discourage their use within
        identifiers created in newer application protocols; given the
        challenges involved with properly handling space characters
        (especially non-ASCII space characters) in identifiers and other
        protocol strings, the PRECIS Working Group considered this to be a
        feature, not a bug.</t>

        <t>However, the FreeformClass does allow spaces, which enables
        application protocols to define profiles of the FreeformClass that are
        more flexible than any profiles of the IdentifierClass.  In addition,
        as explained in <xref target="apps-constructs"/>, application
        protocols can also define application-layer constructs containing
        spaces.</t>

      </section>
    </section>

    <section title="Applications" anchor="apps">
      <section title="How to Use PRECIS in Applications" anchor="apps-howto">
        <t>Although PRECIS has been designed with applications in mind,
        internationalization is not suddenly made easy through the use of
        PRECIS.  Application developers still need to give some thought to how
        they will use the PRECIS string classes, or profiles thereof, in their
        applications.  This section provides some guidelines to application
        developers (and to expert reviewers of application protocol
        specifications).</t>

        <t>
          <list style='symbols'>
            <t>Don't define your own profile unless absolutely necessary (see
            <xref target="profiles-proliferation"/>).  Existing profiles have
            been designed for wide reuse.  It is highly likely that an existing
            profile will meet your needs, especially given the ability to
            specify further excluded characters (<xref
            target='apps-exclusion'/>) and to build application-layer
            constructs (see <xref target='apps-constructs'/>).</t>

            <t>Do specify:
              <list style='symbols'>
                <t>Exactly which entities are responsible for preparation,
                enforcement, and comparison of internationalized strings
                (e.g., servers or clients).</t>

                <t>Exactly when those entities need to complete their tasks
                (e.g., a server might need to enforce the rules of a profile
                before allowing a client to gain network access).</t>

                <t>Exactly which protocol slots need to be checked against
                which profiles (e.g., checking the address of a message's
                intended recipient against the UsernameCaseMapped profile
                <xref target='RFC7613'/> of the
                IdentifierClass, or checking the password of a user against
                the OpaqueString profile <xref
                target='RFC7613'/> of the
                FreeformClass).</t>

              </list>
            See <xref target='RFC7613'/> and <xref target='RFC7622'/> for definitions of these matters for several applications.</t>
          </list>
        </t>
      </section>
      <section title="Further Excluded Characters" anchor="apps-exclusion">
        <t>An application protocol that uses a profile MAY specify particular
        code points that are not allowed in relevant slots within that
        application protocol, above and beyond those excluded by the string
        class or profile.</t>

        <t>That is, an application protocol MAY do either of the
        following:</t>

        <t>
          <list style='numbers'>
            <t>Exclude specific code points that are allowed by the relevant
            string class.</t>

            <t>Exclude characters matching certain Unicode properties (e.g.,
            math symbols) that are included in the relevant PRECIS string
            class.</t>

          </list>
        </t>
        <t>As a result of such exclusions, code points that are defined as
        valid for the PRECIS string class or profile will be defined as
        disallowed for the relevant protocol slot.</t>
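
        <t>As a non-normative illustration, the two kinds of exclusion can be
        layered on top of a profile as in the following Python sketch.  The
        function allowed_in_slot and its parameters are hypothetical; in
        particular, profile_allows stands for whatever per-character check
        the underlying string class or profile provides, and the sketch
        approximates "Unicode properties" by the General_Category value.</t>

        <figure>
          <artwork><![CDATA[
```python
import unicodedata
from typing import Callable, FrozenSet


def allowed_in_slot(
    s: str,
    profile_allows: Callable[[str], bool],
    excluded_code_points: FrozenSet[int] = frozenset(),
    excluded_categories: FrozenSet[str] = frozenset(),
) -> bool:
    """Check a string against a profile plus application-level exclusions."""
    for ch in s:
        if not profile_allows(ch):
            return False  # already excluded by the string class or profile
        if ord(ch) in excluded_code_points:
            return False  # excluded as a specific code point
        if unicodedata.category(ch) in excluded_categories:
            return False  # excluded by Unicode property (General_Category)
    return True
```
]]></artwork>
        </figure>

        <t>For instance, an application protocol that needs to exclude math
        symbols from a slot could pass the General_Category value "Sm" in
        excluded_categories without defining a new profile.</t>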

        <t>Typically, such exclusions are defined for the purpose of
        backward compatibility with legacy formats within an application
        protocol.  These are defined for application protocols, not profiles,
        in order to prevent multiplication of profiles beyond necessity (see
        <xref target='profiles-proliferation'/>).</t>

      </section>

      <section title="Building Application-Layer Constructs" anchor="apps-constructs">
        <t>Sometimes, an application-layer construct does not map in a
        straightforward manner to one of the base string classes or a profile
        thereof.  Consider, for example, the "simple user name" construct in
        the Simple Authentication and Security Layer (SASL) <xref
        target='RFC4422'/>.  Depending on the deployment, a simple user name
        might take the form of a user's full name (e.g., the user's personal
        name followed by a space and then the user's family name).  Such a
        simple user name cannot be defined as an instance of the
        IdentifierClass or a profile thereof, since space characters are not
        allowed in the IdentifierClass; however, it could be defined using a
        space-separated sequence of IdentifierClass instances, as in the
        following ABNF <xref target='RFC5234'/> from <xref
        target='RFC7613'/>:</t>

        <figure>
          <artwork><![CDATA[
   username   = userpart *(1*SP userpart)
   userpart   = 1*(idbyte)
                ;
                ; an "idbyte" is a byte used to represent a 
                ; UTF-8 encoded Unicode code point that can be
                ; contained in a string that conforms to the 
                ; PRECIS "IdentifierClass"
                ;
          ]]></artwork>
        </figure>
        <t>Similar techniques could be used to define many application-layer
        constructs, say of the form "user@domain" or "/path/to/file".</t>
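
        <t>As a non-normative illustration, the ABNF above can be handled
        along the lines of the following Python sketch.  The function
        parse_username is hypothetical, and the is_identifier_class argument
        stands for an implementation of the relevant IdentifierClass profile
        that is assumed to exist elsewhere.  The sketch operates on decoded
        code points rather than on the UTF-8 bytes described by the
        ABNF.</t>

        <figure>
          <artwork><![CDATA[
```python
import re
from typing import Callable, List


def parse_username(username: str,
                   is_identifier_class: Callable[[str], bool]) -> List[str]:
    """Split a username into userparts per the ABNF and validate each part."""
    # 1*SP: one or more ASCII spaces separate consecutive userparts.
    userparts = re.split(" +", username)
    for part in userparts:
        # An empty part means a leading or trailing space or an empty string.
        if not part or not is_identifier_class(part):
            raise ValueError("invalid userpart: " + repr(part))
    return userparts
```
]]></artwork>
        </figure>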

      </section>
    </section>

    <section title="Order of Operations" anchor="order">
      <t>To ensure proper comparison, the rules specified for a particular
      string class or profile MUST be applied in the following order:</t>

      <t>
        <list style='numbers'>
          <t>Width Mapping Rule</t>
          <t>Additional Mapping Rule</t>
          <t>Case Mapping Rule</t>
          <t>Normalization Rule</t>
          <t>Directionality Rule</t>
          <t>Behavioral rules for determining whether a code point is valid,
          allowed under a contextual rule, disallowed, or unassigned</t>

        </list>
      </t>
      <t>As already described, the width mapping, additional mapping, case
      mapping, normalization, and directionality rules are specified for each
      profile, whereas the behavioral rules are specified for each string
      class.  Some of the logic behind this order is provided under <xref
      target='profiles-principles-width'/> (see also the PRECIS mappings
      document <xref target='RFC7790'/>).</t>
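
      <t>As a non-normative illustration, the order of operations can be
      sketched in Python as follows.  The function enforce and its parameter
      names are hypothetical.  The sketch assumes that the first four rules
      are supplied as string-to-string mappings and that the directionality
      and behavioral rules are supplied as checks that accept or reject the
      resulting string; a rule that a profile does not use is simply
      omitted.</t>

      <figure>
        <artwork><![CDATA[
```python
from typing import Callable, Optional

Mapping = Optional[Callable[[str], str]]
Check = Optional[Callable[[str], bool]]


def enforce(s: str, *, width: Mapping = None, additional: Mapping = None,
            case: Mapping = None, normalization: Mapping = None,
            directionality: Check = None, behavioral: Check = None) -> str:
    """Apply the rules of a profile in the mandated order."""
    # Steps 1-4 transform the string; a rule left as None is not applied.
    for mapping in (width, additional, case, normalization):
        if mapping is not None:
            s = mapping(s)
    # Steps 5-6 accept or reject the transformed string.
    for check in (directionality, behavioral):
        if check is not None and not check(s):
            raise ValueError("string does not conform to the profile")
    return s
```
]]></artwork>
      </figure>

      <t>Because the order is fixed inside the function, callers cannot
      accidentally apply, say, normalization before case mapping, regardless
      of the order in which they supply the rules.</t>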

    </section>

    <section anchor='PropertyCalculation' title="Code Point Properties">
      <t>In order to implement the string classes described above, this
      document does the following:</t>

      <t>
        <list style='numbers'>
          <t>Reviews and classifies the collections of code points in the
          Unicode character set by examining various code point
          properties.</t>

          <t>Defines an algorithm for determining a derived property value,
          which can vary depending on the string class being used by the
          relevant application protocol.</t>

         </list>
       </t>
      <t>This document is not intended to specify precisely how derived
      property values are to be applied in protocol strings.  That information
      is the responsibility of the protocol specification that uses or
      profiles a PRECIS string class from this document.  The value of the
      property is to be interpreted as follows.</t> 

      <t>
        <list style="hanging">
          <t hangText='PROTOCOL VALID'>Those code points that are allowed to
          be used in any PRECIS string class (currently, IdentifierClass and
          FreeformClass).  The abbreviated term "PVALID" is used to refer to
          this value in the remainder of this document.</t>

          <t hangText='SPECIFIC CLASS PROTOCOL VALID'>Those code points that
          are allowed to be used in specific string classes.  In the remainder
          of this document, the abbreviated term *_PVAL is used, where * = (ID
          | FREE), i.e., either "FREE_PVAL" or "ID_PVAL".  In practice, the
          derived property ID_PVAL is not used in this specification, since
          every ID_PVAL code point is PVALID.</t>

          <t hangText='CONTEXTUAL RULE REQUIRED'>Some characteristics of the
          character, such as its being invisible in certain contexts or
          problematic in others, require that it not be used in labels unless
          specific other characters or properties are present.  As in
          IDNA2008, there are two subdivisions of CONTEXTUAL RULE REQUIRED --
          the first for Join_controls (called "CONTEXTJ") and the second for
          other characters (called "CONTEXTO").  A character with the derived
          property value CONTEXTJ or CONTEXTO MUST NOT be used unless an
          appropriate rule has been established and the context of the
          character is consistent with that rule.  The most notable of the
          CONTEXTUAL RULE REQUIRED characters are the Join Control characters
          U+200D ZERO WIDTH JOINER and U+200C ZERO WIDTH NON&nbhy;JOINER, which
          have a derived property value of CONTEXTJ.  See Appendix A of <xref
          target='RFC5892'/> for more information.</t>

          <t hangText='DISALLOWED'>Those code points that are not permitted in
          any PRECIS string class.</t>

          <t hangText='SPECIFIC CLASS DISALLOWED'>Those code points that are
          not to be included in one of the string classes but that might be
          permitted in others.  In the remainder of this document, the
          abbreviated term *_DIS is used, where * = (ID | FREE), i.e., either
          "FREE_DIS" or "ID_DIS".  In practice, the derived property FREE_DIS
          is not used in this specification, since every FREE_DIS code point
          is DISALLOWED.</t>

          <t hangText='UNASSIGNED'>Those code points that are not designated
          (i.e., are unassigned) in the Unicode Standard.</t>

        </list>
      </t>
      <t>The algorithm to calculate the value of the derived property is as
      follows (implementations MUST NOT modify the order of operations within
      this algorithm, since doing so would cause inconsistent results across
      implementations):</t>

      <figure>
        <artwork>
If .cp. .in. Exceptions Then Exceptions(cp);
Else If .cp. .in. BackwardCompatible Then BackwardCompatible(cp);
Else If .cp. .in. Unassigned Then UNASSIGNED;
Else If .cp. .in. ASCII7 Then PVALID;
Else If .cp. .in. JoinControl Then CONTEXTJ;
Else If .cp. .in. OldHangulJamo Then DISALLOWED;
Else If .cp. .in. PrecisIgnorableProperties Then DISALLOWED;
Else If .cp. .in. Controls Then DISALLOWED;
Else If .cp. .in. HasCompat Then ID_DIS or FREE_PVAL;
Else If .cp. .in. LetterDigits Then PVALID;
Else If .cp. .in. OtherLetterDigits Then ID_DIS or FREE_PVAL;
Else If .cp. .in. Spaces Then ID_DIS or FREE_PVAL;
Else If .cp. .in. Symbols Then ID_DIS or FREE_PVAL;
Else If .cp. .in. Punctuation Then ID_DIS or FREE_PVAL;
Else DISALLOWED;
        </artwork>
      </figure>
      <t>The value of the derived property calculated can depend on the string
      class; for example, if an identifier used in an application protocol is
      defined as profiling the PRECIS IdentifierClass then a space character
      such as U+0020 would be assigned to ID_DIS, whereas if an identifier is
      defined as profiling the PRECIS FreeformClass then the character would
      be assigned to FREE_PVAL.  For the sake of brevity, the designation
      "FREE_PVAL" is used herein, instead of the longer designation "ID_DIS or
      FREE_PVAL".  In practice, the derived properties ID_PVAL and FREE_DIS
      are not used in this specification, since every ID_PVAL code point is
      PVALID and every FREE_DIS code point is DISALLOWED.</t>

      <t>Use of the name of a rule (such as "Exceptions") implies the set of
      code points that the rule defines, whereas the same name as a function
      call (such as "Exceptions(cp)") implies the value that the code point
      has in the Exceptions table.</t>
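
      <t>As a non-normative illustration, the algorithm can be sketched in
      Python as follows.  The function derived_property and the STEPS table
      are hypothetical names.  The sketch assumes that the Exceptions and
      BackwardCompatible tables are supplied as dictionaries from code point
      to value and that membership in each remaining category is supplied as
      a predicate; it does not itself define any of those categories.</t>

      <figure>
        <artwork><![CDATA[
```python
from typing import Callable, Dict

# Ordered (category, value) steps; None marks "ID_DIS or FREE_PVAL".
STEPS = [
    ("Unassigned", "UNASSIGNED"),
    ("ASCII7", "PVALID"),
    ("JoinControl", "CONTEXTJ"),
    ("OldHangulJamo", "DISALLOWED"),
    ("PrecisIgnorableProperties", "DISALLOWED"),
    ("Controls", "DISALLOWED"),
    ("HasCompat", None),
    ("LetterDigits", "PVALID"),
    ("OtherLetterDigits", None),
    ("Spaces", None),
    ("Symbols", None),
    ("Punctuation", None),
]


def derived_property(cp: int, string_class: str,
                     exceptions: Dict[int, str],
                     backward_compatible: Dict[int, str],
                     in_category: Dict[str, Callable[[int], bool]]) -> str:
    """Return the derived property value of a code point for a string class."""
    if cp in exceptions:
        return exceptions[cp]
    if cp in backward_compatible:
        return backward_compatible[cp]
    # Resolve "ID_DIS or FREE_PVAL" according to the string class in use.
    class_specific = ("ID_DIS" if string_class == "IdentifierClass"
                      else "FREE_PVAL")
    for name, value in STEPS:
        if in_category[name](cp):
            return class_specific if value is None else value
    return "DISALLOWED"
```
]]></artwork>
      </figure>

      <t>Keeping the steps in a single ordered table makes it harder for an
      implementation to reorder the tests inadvertently.</t>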

      <t>The mechanisms described here allow determination of the value of the
      property for future versions of Unicode (including characters added
      after Unicode 5.2 or 7.0 depending on the category, since some
      categories mentioned in this document are simply pointers to IDNA2008
      and therefore were defined at the time of Unicode 5.2).  Changes in
      Unicode properties that do not affect the outcome of this process
      therefore do not affect this framework.  For example, a character can
      have its Unicode General_Category value change from So to Sm, or
      from Lo to Ll, without affecting the algorithm results.  Moreover, even
      if a property change did alter the derived value of a code point, the
      <xref target="G">BackwardCompatible list</xref> can be adjusted to keep
      the results stable.</t>

    </section>

    <section anchor="categories" title="Category Definitions Used to Calculate Derived Property">
      <t>The derived property obtains its value based on a two-step
      procedure:</t>

      <t>
        <list style='numbers'>
          <t>Characters are placed in one or more character categories either
          (1) based on core properties defined by the Unicode Standard or (2)
          by treating the code point as an exception and addressing the code
          point based on its code point value.  These categories are not
          mutually exclusive.</t>

          <t>Set operations are used with these categories to determine the
          values for a property specific to a given string class. These
          operations are specified under <xref target="PropertyCalculation"
          />.</t>

        </list>
      </t>
      <t><list style='empty'><t>Note: Unicode property names and property
      value names might have short abbreviations, such as "gc" for the
      General_Category property and "Ll" for the Lowercase_Letter property
      value of the gc property.</t></list></t>

     <t>In the following specification of character categories, the operation
     that returns the value of a particular Unicode character property for a
     code point is designated by using the formal name of that property (from
     the Unicode PropertyAliases.txt file <xref target="PropertyAliases"/>)
     followed by "(cp)" for "code point".  For example, the value of the
     General_Category property for a code point is indicated by
     General_Category(cp).</t>

      <t>The first ten categories (A-J) shown below were previously defined
      for IDNA2008 and are referenced from <xref target='RFC5892'/> to ease
      the understanding of how PRECIS handles various characters.  Some of
      these categories are reused in PRECIS, and some of them are not; however,
      the lettering of categories is retained to prevent overlap and to ease
      implementation of both IDNA2008 and PRECIS in a single software
      application.  The next eight categories (K-R) are specific to
      PRECIS.</t>

        <section anchor="A" title="LetterDigits (A)">
          <t>This category is defined in Section 2.1 of <xref
          target='RFC5892'/> and is included by reference for use in
          PRECIS.</t>

        </section>
        <section anchor="B" title="Unstable (B)">
          <t>This category is defined in Section 2.2 of <xref
          target='RFC5892'/>.  However, it is not used in PRECIS.</t>

        </section>
        <section anchor="C" title="IgnorableProperties (C)">
          <t>This category is defined in Section 2.3 of <xref
          target='RFC5892'/>.  However, it is not used in PRECIS.</t>

          <t>Note: See the PrecisIgnorableProperties ("M") category below
          for a more inclusive category used in PRECIS identifiers.</t>

        </section>
        <section anchor="D" title="IgnorableBlocks (D)">
          <t>This category is defined in Section 2.4 of <xref
          target='RFC5892'/>.  However, it is not used in PRECIS.</t>

        </section>
        <section anchor="E" title="LDH (E)">
          <t>This category is defined in Section 2.5 of <xref
          target='RFC5892'/>.  However, it is not used in PRECIS.</t>

          <t>Note: See the ASCII7 ("K") category below for a more
          inclusive category used in PRECIS identifiers.</t>

        </section>
        <section anchor="F" title="Exceptions (F)">
          <t>This category is defined in Section 2.6 of <xref
          target='RFC5892'/> and is included by reference for use in
          PRECIS.</t>

        </section>
        <section anchor="G" title="BackwardCompatible (G)">
          <t>This category is defined in Section 2.7 of <xref
          target='RFC5892'/> and is included by reference for use in
          PRECIS.</t>

          <t>Note: Management of this category is handled via the processes
          specified in <xref target='RFC5892'/>.  At the time of this writing
          (and also at the time that RFC 5892 was published), this category
          consisted of the empty set; however, that is subject to change as
          described in RFC&nbsp;5892.</t>

        </section>
        <section anchor="H" title="JoinControl (H)">
          <t>This category is defined in Section 2.8 of <xref
          target='RFC5892'/> and is included by reference for use in
          PRECIS.</t>

        </section>
        <section anchor="I" title="OldHangulJamo (I)">
          <t>This category is defined in Section 2.9 of <xref
          target='RFC5892'/> and is included by reference for use in
          PRECIS.</t>

        </section>
        <section anchor="J" title="Unassigned (J)">
          <t>This category is defined in Section 2.10 of <xref
          target='RFC5892'/> and is included by reference for use in
          PRECIS.</t>

        </section>
        <section anchor="K" title="ASCII7 (K)">
          <t>This PRECIS-specific category consists of all printable,
          non-space characters from the 7-bit ASCII range.  By applying this
          category, the algorithm specified under <xref
          target='PropertyCalculation'/> exempts these characters from other
          rules that might be applied during PRECIS processing, on the
          assumption that these code points are in such wide use that
          disallowing them would be counter-productive.</t>

          <figure>
            <artwork>
K: cp is in {0021..007E}
            </artwork>
          </figure>
        </section>
        <section anchor="L" title="Controls (L)">
          <t>This PRECIS-specific category consists of all control
          characters.</t>

          <figure>
            <artwork>
L: Control(cp) = True
            </artwork>
          </figure>
        </section>
        <section anchor="M" title="PrecisIgnorableProperties (M)">
          <t>This PRECIS-specific category is used to group code points that
          are discouraged from use in PRECIS string classes.</t>

          <figure>
            <artwork>
M: Default_Ignorable_Code_Point(cp) = True or
   Noncharacter_Code_Point(cp) = True
            </artwork>
          </figure>
          <t>The definition for Default_Ignorable_Code_Point can be found in
          the DerivedCoreProperties.txt file <xref
          target="DerivedCoreProperties"/>.</t>

        </section>
        <section anchor="N" title="Spaces (N)">
          <t>This PRECIS-specific category is used to group code points that
          are space characters.</t>

          <figure>
            <artwork>
N: General_Category(cp) is in {Zs}
            </artwork>
          </figure>
        </section>
        <section anchor="O" title="Symbols (O)">
          <t>This PRECIS-specific category is used to group code points that
          are symbols.</t>

          <figure>
            <artwork>
O: General_Category(cp) is in {Sm, Sc, Sk, So}
            </artwork>
          </figure>
        </section>
        <section anchor="P" title="Punctuation (P)">
          <t>This PRECIS-specific category is used to group code points that
          are punctuation characters.</t>

          <figure>
            <artwork>
P: General_Category(cp) is in {Pc, Pd, Ps, Pe, Pi, Pf, Po}
            </artwork>
          </figure>
        </section>
        <section anchor="Q" title="HasCompat (Q)">
          <t>This PRECIS-specific category is used to group code points that
          have compatibility equivalents as explained in the Unicode Standard.</t>

          <figure>
            <artwork>
Q: toNFKC(cp) != cp
            </artwork>
          </figure>
          <t>The toNFKC() operation returns the code point in normalization
          form KC.  For more information, see Section 5 of Unicode Standard
          Annex #15 <xref target='UAX15'/>.</t>

        </section>
        <section anchor="R" title="OtherLetterDigits (R)">
          <t>This PRECIS-specific category is used to group code points that
          are letters and digits other than the "traditional" letters and
          digits grouped under the LetterDigits (A) class (see <xref
          target='A'/>).</t>

          <figure>
            <artwork>
R: General_Category(cp) is in {Lt, Nl, No, Me}
            </artwork>
          </figure>
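
          <t>As a non-normative illustration, the PRECIS-specific categories
          that are defined purely in terms of code point ranges, the
          General_Category property, or NFKC normalization (K, L, N, O, P, Q,
          and R) can be sketched in Python as follows.  The function names
          are hypothetical, and the treatment of "Control" as the
          General_Category value Cc is an assumption.  Category M is omitted
          because the Python "unicodedata" module does not expose the
          properties on which it depends.</t>

          <figure>
            <artwork><![CDATA[
```python
import unicodedata


def gc(cp: int) -> str:
    """Return the General_Category abbreviation of a code point."""
    return unicodedata.category(chr(cp))


def is_ascii7(cp: int) -> bool:              # K
    return 0x21 <= cp <= 0x7E


def is_control(cp: int) -> bool:             # L (assumes Control means Cc)
    return gc(cp) == "Cc"


def is_space(cp: int) -> bool:               # N
    return gc(cp) == "Zs"


def is_symbol(cp: int) -> bool:              # O
    return gc(cp) in {"Sm", "Sc", "Sk", "So"}


def is_punctuation(cp: int) -> bool:         # P
    return gc(cp) in {"Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po"}


def has_compat(cp: int) -> bool:             # Q: toNFKC(cp) != cp
    return unicodedata.normalize("NFKC", chr(cp)) != chr(cp)


def is_other_letter_digit(cp: int) -> bool:  # R
    return gc(cp) in {"Lt", "Nl", "No", "Me"}
```
]]></artwork>
          </figure>

          <t>Because these categories are not mutually exclusive (for
          example, PLUS SIGN, U+002B, is in both ASCII7 and Symbols), the
          derived property value depends on the order in which the categories
          are tested.</t>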
        </section>
      </section>

    <section title='Guidelines for Designated Experts' anchor='guidelines'>
      <t>Experience with internationalization in application protocols has
      shown that protocol designers and application developers usually do not
      understand the subtleties and tradeoffs involved with
      internationalization and that they need considerable guidance in making
      reasonable decisions with regard to the options before them.</t>

      <t>Therefore:</t>
      <t>
        <list style='symbols'>
          <t>Protocol designers are strongly encouraged to question the
          assumption that they need to define new profiles, since existing
          profiles are designed for wide reuse (see <xref target='profiles'/>
          for further discussion).</t>

          <t>Those who persist in defining new profiles are strongly
          encouraged to provide a clear and strong justification for doing so,
          and to publish a stable specification that provides all of the
          information described under <xref target='iana-profiles'/>.</t>

          <t>The designated experts for profile registration requests ought to
          seek answers to all of the questions provided under <xref
          target='iana-profiles'/> and to encourage applicants to provide a
          stable specification documenting the profile (even though the
          registration policy for PRECIS profiles is Expert Review and a
          stable specification is not strictly required).</t>

          <t>Developers of applications that use PRECIS are strongly
          encouraged to apply the guidelines provided under <xref
          target='apps'/> and to seek out the advice of the designated experts
          or other knowledgeable individuals in doing so.</t>

          <t>All parties are strongly encouraged to help prevent the
          multiplication of profiles beyond necessity, as described under
          <xref target='profiles-proliferation'/>, and to use PRECIS in ways
          that will minimize user confusion and insecure application
          behavior.</t>

        </list>
      </t>
      <t>Internationalization can be difficult and contentious; designated
      experts, profile registrants, and application developers are strongly
      encouraged to work together in a spirit of good faith and mutual
      understanding to achieve rough consensus on profile registration
      requests and the use of PRECIS in particular applications.  They are
      also encouraged to bring additional expertise into the discussion if
      that would be helpful in adding perspective or otherwise resolving
      issues.</t>

    </section>

    <section title="IANA Considerations" anchor='iana'>

      <section anchor="iana-derived" title="PRECIS Derived Property Value Registry">
        <t>IANA has created and now maintains the "PRECIS Derived Property
        Value" registry, which records the derived properties for each
        version of Unicode released starting with version 7.0.  The derived
        property value is to be calculated in cooperation with a designated
        expert <xref target='RFC5226'/> according to the rules specified under
        Sections&nbsp;<xref target="PropertyCalculation" format="counter" />
        and <xref target="categories" format="counter" />.</t>

        <t>The IESG is to be notified if backward-incompatible changes to the
        table of derived properties are discovered or if other problems arise
        during the process of creating the table of derived property values or
        during expert review.  Changes to the rules defined under
        Sections&nbsp;<xref target="PropertyCalculation" format="counter" />
        and <xref target="categories" format="counter" />
        require IETF Review.</t>

      </section>

      <section anchor="iana-classes" title="PRECIS Base Classes Registry">
        <t>IANA has created the "PRECIS Base Classes" registry.
        In accordance with <xref target='RFC5226'/>, the registration policy
        is "RFC Required".</t>

        <t>The registration template is as follows:</t>
        <t>
          <list style='hanging'>
            <t hangText='Base Class:'>[the name of the PRECIS string
            class]</t>

            <t hangText='Description:'>[a brief description of the PRECIS
            string class and its intended use, e.g., "A sequence of letters,
            numbers, and symbols that is used to identify or address a network
            entity."]</t>

            <t hangText='Specification:'>[the RFC number]</t>
          </list>
        </t>
        <t>The initial registrations are as follows:</t>
        <figure>
          <artwork>
Base Class: FreeformClass.
Description: A sequence of letters, numbers, symbols, spaces, and
      other code points that is used for free-form strings.
Specification: Section 4.3 of RFC 7564.

Base Class: IdentifierClass.
Description: A sequence of letters, numbers, and symbols that is 
      used to identify or address a network entity.
Specification: Section 4.2 of RFC 7564.
          </artwork>
        </figure>
      </section>

      <section anchor="iana-profiles" title="PRECIS Profiles Registry">
        <t>IANA has created the "PRECIS Profiles" registry to identify
        profiles that use the PRECIS string classes.  In accordance with
        <xref target='RFC5226'/>,
        the registration policy is "Expert Review".  This policy was chosen in
        order to ease the burden of registration while ensuring that
        "customers" of PRECIS receive appropriate guidance regarding the
        sometimes complex and subtle internationalization issues related to
        profiles of PRECIS string classes.</t>

        <t>The registration template is as follows:</t>
        <t>
          <list style='hanging'>
            <t hangText='Name:'>[the name of the profile]</t>
            <t hangText='Base Class:'>[which PRECIS string class is being
            profiled]</t>

            <t hangText='Applicability:'>[the specific protocol elements to
            which this profile applies, e.g., "Localparts in XMPP
            addresses."]</t>

            <t hangText='Replaces:'>[the Stringprep profile that this PRECIS
            profile replaces, if any]</t>

            <t hangText='Width Mapping Rule:'>[the behavioral rule for
            handling of width, e.g., "Map fullwidth and halfwidth characters
            to their compatibility variants."]</t>

            <t hangText='Additional Mapping Rule:'>[any additional mappings
            that are required or recommended, e.g., "Map non-ASCII space
            characters to ASCII space."]</t>

            <t hangText='Case Mapping Rule:'>[the behavioral rule for handling
            of case, e.g., "apply the Unicode toLower() operation"]</t>

            <t hangText='Normalization Rule:'>[which Unicode normalization
            form is applied, e.g., "NFC"]</t>

            <t hangText='Directionality Rule:'>[the behavioral rule for
            handling of right-to-left code points, e.g., "The 'Bidi Rule'
            defined in RFC 5893 applies."]</t>

            <t hangText='Enforcement:'>[which entities enforce the rules, and
            when that enforcement occurs during protocol operations]</t>

            <t hangText='Specification:'>[a pointer to relevant documentation,
            such as an RFC or Internet-Draft]</t>

          </list>
        </t>
        <t>In order to request a review, the registrant shall send a completed
        template to the precis@ietf.org list or its designated successor.</t>

        <t>Factors to focus on while defining profiles and reviewing profile
        registrations include the following:</t>

        <t>
          <list style='symbols'>
            <t>Would an existing PRECIS string class or profile solve the
            problem? If not, why not? (See <xref
            target='profiles-proliferation'/> for related considerations.)</t>

            <t>Is the problem being addressed by this profile well defined?</t>
            <t>Does the specification define what kinds of applications are
            involved and the protocol elements to which this profile
            applies?</t>

            <t>Is the profile clearly defined?</t>
            <t>Is the profile based on an appropriate dividing line between
            user interface (culture, context, intent, locale, device
            limitations, etc.) and the use of conformant strings in protocol
            elements?</t>

            <t>Are the width mapping, case mapping, additional mappings,
            normalization, and directionality rules appropriate for the
            intended use?</t>

            <t>Does the profile explain which entities enforce the rules, and
            when such enforcement occurs during protocol operations?</t>

            <t>Does the profile reduce the degree to which human users could
            be surprised or confused by application behavior (the "Principle
            of Least Astonishment")?</t>

            <t>Does the profile introduce any new security concerns such as
            those described under <xref target='security'/> of this document
            (e.g., false positives for authentication or authorization)?</t>

          </list>
        </t>
      </section>

    </section>

    <section title="Security Considerations" anchor='security'>
      <section title="General Issues" anchor='security-gen'>
        <t>If input strings that appear "the same" to users are
        programmatically considered to be distinct in different systems, or if
        input strings that appear distinct to users are programmatically
        considered to be "the same" in different systems, then users can be
        confused.  Such confusion can have security implications, such as the
        false positives and false negatives discussed in <xref
        target='RFC6943'/>.  One starting goal of work on the PRECIS framework
        was to limit the number of times that users are confused (consistent
        with the "Principle of Least Astonishment").  Unfortunately, this goal
        has been difficult to achieve given the large number of application
        protocols already in existence.  Despite these difficulties, profiles
        should not be multiplied beyond necessity (see <xref
        target='profiles-proliferation'/>).  In particular, application
        protocol designers should think long and hard before defining a new
        profile instead of using one that has already been defined, and if
        they decide to define a new profile then they should clearly explain
        their reasons for doing so.</t>

        <t>The security of applications that use this framework can depend in
        part on the proper preparation, enforcement, and comparison of
        internationalized strings.  For example, such strings can be used to
        make authentication and authorization decisions, and the security of
        an application could be compromised if an entity providing a given
        string is connected to the wrong account or online resource based on
        different interpretations of the string (again, see <xref
        target='RFC6943'/>).</t>

        <t>Specifications of application protocols that use this framework are
        strongly encouraged to describe how internationalized strings are used
        in the protocol, including the security implications of any false
        positives and false negatives that might result from various
        enforcement and comparison operations.  For some helpful guidelines,
        refer to <xref target='RFC6943'/>, <xref target='RFC5890'/>, <xref
        target='UTR36'/>, and <xref target='UTS39'/>.</t>

      </section>
      <section title="Use of the IdentifierClass" anchor='security-identifierclass'>
        <t>Strings that conform to the IdentifierClass and any profile thereof
        are intended to be relatively safe for use in a broad range of
        applications, primarily because they include only letters, digits, and
        "grandfathered" non-space characters from the ASCII range; thus, they
        exclude spaces, characters with compatibility equivalents, and almost
        all symbols and punctuation marks.  However, because such strings can
        still include so-called confusable characters (see <xref
        target='security-confusables'/>), protocol designers and implementers
        are encouraged to pay close attention to the security considerations
        described elsewhere in this document.</t>

      </section>
      <section title="Use of the FreeformClass" anchor='security-freeformclass'>
        <t>Strings that conform to the FreeformClass and many profiles thereof
        can include virtually any Unicode character.  This makes the
        FreeformClass quite expressive, but also problematic from the
        perspective of possible user confusion.  Protocol designers are hereby
        warned that the FreeformClass contains code points they might not
        understand, and are encouraged to profile the IdentifierClass wherever
        feasible; however, if an application protocol requires more code
        points than are allowed by the IdentifierClass, protocol designers are
        encouraged to define a profile of the FreeformClass that restricts the
        allowable code points as tightly as possible.  (The PRECIS Working
        Group considered the option of allowing "superclasses" as well as
        profiles of PRECIS string classes, but decided against allowing
        superclasses to reduce the likelihood of security and interoperability
        problems.)</t>

      </section>
      <section title="Local Character Set Issues" anchor='security-charset'>
        <t>When systems use local character sets other than ASCII and Unicode,
        this specification leaves the problem of converting between the local
        character set and Unicode up to the application or local system.  If
        different applications (or different versions of one application)
        implement different rules for conversions among coded character sets,
        they could interpret the same name differently and contact different
        application servers or other network entities.  This problem is not
        solved by security protocols, such as Transport Layer Security (TLS)
        <xref target='RFC5246'/> and the Simple Authentication and Security
        Layer (SASL) <xref target='RFC4422'/>, that do not take local
        character sets into account.</t>
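
        <t>As a minimal illustration of this risk, the following Python sketch
        decodes the same octet under two different local character sets and
        obtains two different Unicode code points.  The octet value, the two
        legacy encodings, and the helper name "describe" are arbitrary
        choices made for illustration; they are not drawn from this
        specification.</t>

        <figure>
          <artwork><![CDATA[
```python
# Illustrative only: the same octet decoded under two different local
# character sets yields two different Unicode code points.
raw = b"\xe9"  # a single octet as it might arrive from a legacy system

as_latin1 = raw.decode("latin-1")  # ISO 8859-1 interpretation
as_cp437 = raw.decode("cp437")     # IBM PC code page 437 interpretation


def describe(ch):
    """Return the code point of a one-character string as U+XXXX."""
    return "U+%04X" % ord(ch)


# The two results differ, so two applications that disagree about the
# local character set would see two different names.
print(describe(as_latin1), describe(as_cp437))
print(as_latin1 == as_cp437)
```
]]></artwork>
        </figure>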

      </section>
      <section title="Visually Similar Characters" anchor='security-confusables'>
        <t>Some characters are visually similar and thus can cause confusion
        among humans.  Such characters are often called "confusable
        characters" or "confusables".</t>

        <t>The problem of confusable characters is not necessarily caused by
        the use of Unicode code points outside the ASCII range.  For example,
        in some presentations and to some individuals the string "ju1iet"
        (spelled with DIGIT ONE, U+0031, as the third character) might appear
        to be the same as "juliet" (spelled with LATIN SMALL LETTER L,
        U+006C), especially on casual visual inspection.  This phenomenon is
        sometimes called "typejacking".</t>

        <t>However, the problem is made more serious by introducing the full
        range of Unicode code points into protocol strings.  For example, the
        characters U+13DA U+13A2 U+13B5 U+13AC U+13A2 U+13AC U+13D2 from the
        Cherokee block look similar to the ASCII characters "STPETER" as they
        might appear when presented using a "creative" font family.</t>
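
        <t>The Cherokee example can be examined with a short Python sketch
        such as the following.  It is offered only as an illustration, and
        the helper name "code_points" is hypothetical.  The sketch shows
        that a code point comparison treats the two strings as distinct,
        however similar they might look when rendered.</t>

        <figure>
          <artwork><![CDATA[
```python
import unicodedata

# The Cherokee sequence quoted in the text and its ASCII look-alike.
cherokee = "\u13DA\u13A2\u13B5\u13AC\u13A2\u13AC\u13D2"
ascii_text = "STPETER"


def code_points(s):
    """List the code points of a string in U+XXXX notation."""
    return ["U+%04X" % ord(ch) for ch in s]


# A code point comparison sees two different strings of equal length.
print(cherokee == ascii_text)
print(len(cherokee) == len(ascii_text))

# Print each Cherokee code point with its Unicode character name.
for ch in cherokee:
    print("U+%04X" % ord(ch), unicodedata.name(ch))
```
]]></artwork>
        </figure>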

        <t>In some examples of confusable characters, it is unlikely that the
        average human could tell the difference between the real string and
        the fake string.  (Indeed, there is no programmatic way to distinguish
        with full certainty which is the fake string and which is the real
        string; in some contexts, the string formed of Cherokee characters
        might be the real string and the string formed of ASCII characters
        might be the fake string.)  Because PRECIS-compliant strings can
        contain almost any properly encoded Unicode code point, it can be
        relatively easy to fake or mimic some strings in systems that use the
        PRECIS framework.  The fact that some strings are easily confused
        introduces security vulnerabilities of the kind that have also plagued
        the World Wide Web, specifically the phenomenon known as phishing.</t>

        <t>Despite the fact that some specific suggestions about
        identification and handling of confusable characters appear in the
        Unicode Security Considerations <xref target='UTR36'/> and the Unicode
        Security Mechanisms <xref target='UTS39'/>, it is also true (as noted
        in <xref target='RFC5890'/>) that "there are no comprehensive
        technical solutions to the problems of confusable characters."
        Because it is impossible to map visually similar characters without a
        great deal of context (such as knowing the font families used), the
        PRECIS framework does nothing to map similar-looking characters
        together, nor does it prohibit some characters because they look like
        others.</t>

        <t>Nevertheless, specifications for application protocols that use
        this framework are strongly encouraged to describe how confusable
        characters can be abused to compromise the security of systems that
        use the protocol in question, along with any protocol-specific
        suggestions for overcoming those threats.  In particular, software
        implementations and service deployments that use PRECIS-based
        technologies are strongly encouraged to define and implement
        consistent policies regarding the registration, storage, and
        presentation of visually similar characters.  The following
        recommendations are appropriate:</t>

        <t>
          <list style='numbers'>
            <t>An application service SHOULD define a policy that specifies
            the scripts or blocks of characters that the service will allow to
            be registered (e.g., in an account name) or stored (e.g., in a
            filename).  Such a policy SHOULD be informed by the languages and
            scripts that are used to write registered account names; in
            particular, to reduce confusion, the service SHOULD forbid
            registration or storage of strings that contain characters from
            more than one script and SHOULD restrict registrations to
            characters drawn from a very small number of scripts (e.g.,
            scripts that are well understood by the administrators of the
            service, to improve manageability).</t>

            <t>User-oriented application software SHOULD define a policy that
            specifies how internationalized strings will be presented to a
            human user.  Because every human user of such software has a
            preferred language or a small set of preferred languages, the
            software SHOULD gather that information either explicitly from the
            user or implicitly via the operating system of the user's device.
            Furthermore, because most languages are typically represented by a
            single script or a small set of scripts, and because most scripts
            are typically contained in one or more blocks of characters, the
            software SHOULD warn the user when presenting a string that mixes
            characters from more than one script or block, or that uses
            characters outside the normal range of the user's preferred
            language(s).  (Such a recommendation is not intended to discourage
            communication across different communities of language users;
            instead, it recognizes the existence of such communities and
            encourages due caution when presenting unfamiliar scripts or
            characters to human users.)</t>

          </list>
        </t>
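        <t>The mixed-script check suggested in the recommendations above
        might be sketched in Python as follows.  This is a rough heuristic
        under a stated assumption: the Python standard library does not
        expose the Unicode Script property, so the sketch uses the first
        word of each letter's character name as a stand-in for its script.
        The function names "rough_scripts" and "should_warn" are
        hypothetical, and a production implementation would more likely
        rely on the Script property data described in <xref
        target='UTS39'/>.</t>

        <figure>
          <artwork><![CDATA[
```python
import unicodedata


def rough_scripts(s):
    """Approximate the set of scripts used by the letters in a string.

    ASSUMPTION: the first word of a character's Unicode name (e.g.,
    "LATIN", "CYRILLIC") stands in for its script.  This is a
    heuristic, not an implementation of the Unicode Script property.
    """
    scripts = set()
    for ch in s:
        # Only letters are considered; digits and symbols are skipped.
        if unicodedata.category(ch).startswith("L"):
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def should_warn(s):
    """Return True if the letters come from more than one script."""
    return len(rough_scripts(s)) > 1


print(should_warn("juliet"))       # all Latin letters
print(should_warn("juli\u0435t"))  # contains CYRILLIC SMALL LETTER IE
```
]]></artwork>
        </figure>

        <t>Such a check only flags a string for extra caution; as noted above,
        it cannot determine which of two similar-looking strings is the
        "real" one.</t>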
        <t>The challenges inherent in supporting the full range of Unicode
        code points have in the past led some to hope for a way to
        programmatically negotiate more restrictive ranges based on locale,
        script, or other relevant factors; to tag the locale associated with a
        particular string; etc.  As a general-purpose internationalization
        technology, the PRECIS framework does not include such mechanisms.</t>

      </section>
      <section title="Security of Passwords" anchor='security-passwords'>
        <t>Two goals of passwords are to maximize the amount of entropy and to
        minimize the potential for false positives.  These goals can be
        achieved in part by allowing a wide range of code points and by
        ensuring that passwords are not compared too loosely (e.g., by
        treating code points that differ only in case as equivalent).
        Therefore, it is NOT RECOMMENDED for
        application protocols to profile the FreeformClass for use in
        passwords in a way that removes entire categories (e.g., by
        disallowing symbols or punctuation).  Furthermore, it is NOT
        RECOMMENDED for application protocols to map uppercase and titlecase
        code points to their lowercase equivalents in such strings; instead,
        it is RECOMMENDED to preserve the case of all code points contained in
        such strings and to compare them in a case-sensitive manner.</t>

        <t>That said, software implementers need to be aware that there exist
        tradeoffs between entropy and usability.  For example, allowing a user
        to establish a password containing "uncommon" code points might make
        it difficult for the user to access a service when using an unfamiliar
        or constrained input device.</t>

        <t>Some application protocols use passwords directly, whereas others
        reuse technologies that themselves process passwords (one example of
        such a technology is the Simple Authentication and Security Layer
        <xref target='RFC4422'/>).  Moreover, passwords are often carried by a
        sequence of protocols with backend authentication systems or data
        storage systems such as RADIUS <xref target='RFC2865'/> and the
        Lightweight Directory Access Protocol (LDAP) <xref
        target='RFC4510'/>.  Developers of application protocols are
        encouraged to look into reusing existing profiles for passwords
        (such as those described in <xref target='RFC7613'/>) instead of
        defining new ones, so that end-user expectations about passwords are
        consistent no matter which application protocol is used.</t>

        <t>In protocols that provide passwords as input to a cryptographic
        algorithm such as a hash function, the client will need to perform
        proper preparation of the password before applying the algorithm,
        since the password is not available to the server in plaintext
        form.</t>
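
        <t>A minimal Python sketch of client-side preparation before hashing
        follows.  It rests on several assumptions that are not part of this
        specification: preparation is reduced to NFC normalization with no
        case mapping, SHA-256 stands in for whatever algorithm the
        application protocol actually specifies, and the function names
        "prepare_password" and "password_digest" are hypothetical.  A real
        profile would define the complete set of rules.</t>

        <figure>
          <artwork><![CDATA[
```python
import hashlib
import unicodedata


def prepare_password(password):
    """Prepare a password before it is fed to a hash function.

    ASSUMPTION: preparation is only normalization to NFC, and case is
    preserved; a real profile would define the full set of rules.
    """
    return unicodedata.normalize("NFC", password)


def password_digest(password):
    """Hash the prepared password (SHA-256 is an arbitrary choice)."""
    prepared = prepare_password(password)
    return hashlib.sha256(prepared.encode("utf-8")).hexdigest()


composed = "caf\u00e9"     # LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"  # "e" followed by COMBINING ACUTE ACCENT

# Equivalent inputs produce the same digest once prepared.
print(password_digest(composed) == password_digest(decomposed))

# Case is preserved, so these two passwords remain distinct.
print(password_digest("Secret") == password_digest("secret"))
```
]]></artwork>
        </figure>

        <t>Without the preparation step, the composed and decomposed inputs
        would hash to different values, and the server would have no way to
        reconcile them.</t>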

        <t>Further discussion of password handling can be found in <xref
        target='RFC7613'/>.</t>

      </section>
    </section>

    <section title="Interoperability Considerations" anchor='interop'>
      <section title="Encoding" anchor='interop-encoding'>
        <t>Although strings that are consumed in PRECIS-based application
        protocols are often encoded using UTF-8 <xref target='RFC3629'/>, the
        exact encoding is a matter for the application protocol that uses
        PRECIS, not for the PRECIS framework.</t>

      </section>
      <section title="Character Sets" anchor='interop-characters'>
        <t>It is known that some existing systems are unable to support the
        full Unicode character set, or even any characters outside the ASCII
        range.  If two (or more) applications need to interoperate when
        exchanging data (e.g., for the purpose of authenticating a username or
        password), they will naturally need to have in common at least one
        coded character set (as defined by <xref target='RFC6365'/>).
        Establishing such a baseline is a matter for the application protocol
        that uses PRECIS, not for the PRECIS framework.</t>

      </section>
      <section title="Unicode Versions" anchor='interop-unicode'>
        <t>Changes to the properties of Unicode code points can occur as the
        Unicode Standard is modified from time to time.  For example, three
        code points underwent changes in their GeneralCategory between Unicode
        5.2 (current at the time IDNA2008 was originally published) and
        Unicode 6.0, as described in <xref target='RFC6452'/>.  Implementers
        need to be aware that the treatment of these characters differs
        depending on which version of Unicode is available on the system that
        is using IDNA2008 or PRECIS.  Other such differences might arise
        between the version of Unicode current at the time of this writing
        (7.0) and future versions.</t>
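
        <t>An implementer can check which version of the Unicode Character
        Database a given runtime provides, and how it classifies a particular
        code point, with a sketch such as the following Python fragment.  The
        helper name "general_category" is hypothetical, and U+19DA is used
        here on the understanding that it is one of the code points
        discussed in <xref target='RFC6452'/>.  No expected output is shown
        because the result depends on the Unicode version of the system
        running the code.</t>

        <figure>
          <artwork><![CDATA[
```python
import unicodedata


def general_category(code_point):
    """Return the General_Category of a code point in this runtime."""
    return unicodedata.category(chr(code_point))


# Version of the Unicode Character Database bundled with this runtime.
print(unicodedata.unidata_version)

# The category reported for U+19DA depends on the Unicode version that
# is available on the system.
print(general_category(0x19DA))
```
]]></artwork>
        </figure>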

      </section>
      <section title="Potential Changes to Handling of Certain Unicode Code Points" anchor='interop-handling'>
        <t>As part of the review of Unicode 7.0 for IDNA, a question was
        raised about a newly added code point that led to a re-analysis of the
        normalization rules used by IDNA and inherited by this document (<xref
        target='profiles-principles-normalization'/>).  Some of the general
        issues are described in <xref target='IAB-Statement'/> and pursued in
        more detail in <xref
        target='IDNA-Unicode'/>.</t>

        <t>At the time of writing, these issues have yet to be settled.
        However, implementers need to be aware that this specification is
        likely to be updated in the future to address these issues.  The
        potential changes include the following:</t>

        <t>
          <list style='symbols'>
            <t>The range of characters in the LetterDigits category
            (Sections&nbsp;<xref target='classes-id-valid' format="counter"/>
            and <xref target='A' format="counter"/>) might be
            narrowed.</t>

            <t>Some characters with special properties that are now allowed
            might be excluded.</t>

            <t>More "Additional Mapping Rules" (<xref
            target='profiles-principles-additional'/>) might be defined.</t>

            <t>Alternative normalization methods might be added.</t>
          </list>
        </t>
        <t>Nevertheless, implementations and deployments that follow the
        advice given in this specification are unlikely to encounter
        significant problems as a consequence of these issues or potential
        changes.  Specifically, that advice is to use the more restrictive
        IdentifierClass whenever possible or, if using the FreeformClass, to
        allow only a restricted set of characters, particularly avoiding
        characters whose implications are not well understood.</t>

      </section>
    </section>

  </middle>

  <back>

    <references title="Normative References">

    <reference  anchor='RFC20' target='http://www.rfc-editor.org/info/rfc20'>
    <front>
    <title>ASCII format for network interchange</title>
    <author initials='V.G.' surname='Cerf' fullname='V.G. Cerf'><organization /></author>
    <date year='1969' month='October' />
    </front>
    <seriesInfo name='STD' value='80'/>
    <seriesInfo name='RFC' value='20'/>
    <seriesInfo name='DOI' value='10.17487/RFC0020'/>
    <format type='ASCII, PDF' octets='18504, 197096'/>
    </reference>

    <reference  anchor='RFC2119' target='http://www.rfc-editor.org/info/rfc2119'>
    <front>
    <title>Key words for use in RFCs to Indicate Requirement Levels</title>
    <author initials='S.' surname='Bradner' fullname='S. Bradner'><organization /></author>
    <date year='1997' month='March' />
    </front>
    <seriesInfo name='BCP' value='14'/>
    <seriesInfo name='RFC' value='2119'/>
    <seriesInfo name='DOI' value='10.17487/RFC2119'/>
    <format type='ASCII' octets='4723'/>
    </reference>
    <reference  anchor='RFC5198' target='http://www.rfc-editor.org/info/rfc5198'>
    <front>
    <title>Unicode Format for Network Interchange</title>
    <author initials='J.' surname='Klensin' fullname='J. Klensin'><organization /></author>
    <author initials='M.' surname='Padlipsky' fullname='M. Padlipsky'><organization /></author>
    <date year='2008' month='March' />
    </front>
    <seriesInfo name='RFC' value='5198'/>
    <seriesInfo name='DOI' value='10.17487/RFC5198'/>
    <format type='ASCII' octets='45708'/>
    </reference>

    <reference  anchor='RFC6365' target='http://www.rfc-editor.org/info/rfc6365'>
    <front>
    <title>Terminology Used in Internationalization in the IETF</title>
    <author initials='P.' surname='Hoffman' fullname='P. Hoffman'><organization /></author>
    <author initials='J.' surname='Klensin' fullname='J. Klensin'><organization /></author>
    <date year='2011' month='September' />
    </front>
    <seriesInfo name='BCP' value='166'/>
    <seriesInfo name='RFC' value='6365'/>
    <seriesInfo name='DOI' value='10.17487/RFC6365'/>
    <format type='ASCII' octets='103155'/>
    </reference>

<reference anchor="Unicode" target="http://www.unicode.org/versions/latest/">
  <front>
    <title>The Unicode Standard</title>
    <author>
      <organization>The Unicode Consortium</organization>
    </author>
<!--    <date year="2015-present" /> -->
  <date/>
  </front>
</reference>

    </references>
    <references title="Informative References">

  <reference anchor="DerivedCoreProperties"
 target="http://www.unicode.org/Public/UCD/latest/ucd/DerivedCoreProperties.txt">
    <front>
      <title>DerivedCoreProperties-7.0.0.txt</title>
      <author>
        <organization>The Unicode Consortium</organization>
      </author>
      <date year="2014" month="February"/>
    </front>
    <seriesInfo name='Unicode Character' value='Database' />
  </reference>

  <reference anchor="PropertyAliases"
 target="http://www.unicode.org/Public/UCD/latest/ucd/PropertyAliases.txt">
    <front>
      <title>PropertyAliases-7.0.0.txt</title>
      <author>
        <organization>The Unicode Consortium</organization>
      </author>
      <date year="2013" month="November"/>
    </front>
    <seriesInfo name='Unicode Character' value='Database' />
  </reference>

<reference anchor="IAB-Statement" target='https://www.iab.org/documents/correspondence-reports-documents/2015-2/iab-statement-on-identifiers-and-unicode-7-0-0/'>
  <front>
    <title>IAB Statement on Identifiers and Unicode 7.0.0</title>
    <author>
      <organization>Internet Architecture Board</organization>
    </author>
    <date month="February" year="2015" />
  </front>
</reference>

<reference  anchor='RFC7564' target='http://www.rfc-editor.org/info/rfc7564'>
<front>
<title>PRECIS Framework: Preparation, Enforcement, and Comparison of Internationalized Strings in Application Protocols</title>
<author initials='P.' surname='Saint-Andre' fullname='P. Saint-Andre'><organization /></author>
<author initials='M.' surname='Blanchet' fullname='M. Blanchet'><organization /></author>
<date year='2015' month='May' />
<abstract><t>Application protocols using Unicode characters in protocol strings need to properly handle such strings in order to enforce internationalization rules for strings placed in various protocol slots (such as addresses and identifiers) and to perform valid comparison operations (e.g., for purposes of authentication or authorization).  This document defines a framework enabling application protocols to perform the preparation, enforcement, and comparison of internationalized strings (&quot;PRECIS&quot;) in a way that depends on the properties of Unicode characters and thus is agile with respect to versions of Unicode.  As a result, this framework provides a more sustainable approach to the handling of internationalized strings than the previous framework, known as Stringprep (RFC 3454).  This document obsoletes RFC 3454.</t></abstract>
</front>
<seriesInfo name='RFC' value='7564'/>
<seriesInfo name='DOI' value='10.17487/RFC7564'/>
</reference>

<reference  anchor='RFC7613' target='http://www.rfc-editor.org/info/rfc7613'>
<front>
<title>Preparation, Enforcement, and Comparison of Internationalized Strings Representing Usernames and Passwords</title>
<author initials='P.' surname='Saint-Andre' fullname='P. Saint-Andre'><organization /></author>
<author initials='A.' surname='Melnikov' fullname='A. Melnikov'><organization /></author>
<date year='2015' month='August' />
<abstract><t>This document describes updated methods for handling Unicode strings representing usernames and passwords.  The previous approach was known as SASLprep (RFC 4013) and was based on stringprep (RFC 3454). The methods specified in this document provide a more sustainable approach to the handling of internationalized usernames and passwords.  The preparation, enforcement, and comparison of internationalized strings (PRECIS) framework, RFC 7564, obsoletes RFC 3454, and this document obsoletes RFC 4013.</t></abstract>
</front>
<seriesInfo name='RFC' value='7613'/>
<seriesInfo name='DOI' value='10.17487/RFC7613'/>
</reference>

<reference  anchor='RFC7622' target='http://www.rfc-editor.org/info/rfc7622'>
<front>
<title>Extensible Messaging and Presence Protocol (XMPP): Address Format</title>
<author initials='P.' surname='Saint-Andre' fullname='P. Saint-Andre'><organization /></author>
<date year='2015' month='September' />
<abstract><t>This document defines the address format for the Extensible Messaging and Presence Protocol (XMPP), including support for code points outside the ASCII range.  This document obsoletes RFC 6122.</t></abstract>
</front>
<seriesInfo name='RFC' value='7622'/>
<seriesInfo name='DOI' value='10.17487/RFC7622'/>
</reference>

<reference  anchor='RFC7700' target='http://www.rfc-editor.org/info/rfc7700'>
<front>
<title>Preparation, Enforcement, and Comparison of Internationalized Strings Representing Nicknames</title>
<author initials='P.' surname='Saint-Andre' fullname='P. Saint-Andre'><organization /></author>
<date year='2015' month='December' />
<abstract><t>This document describes methods for handling Unicode strings representing memorable, human-friendly names (called &quot;nicknames&quot;, &quot;display names&quot;, or &quot;petnames&quot;) for people, devices, accounts, websites, and other entities.</t></abstract>
</front>
<seriesInfo name='RFC' value='7700'/>
<seriesInfo name='DOI' value='10.17487/RFC7700'/>
</reference>

<reference  anchor='RFC7790' target='http://www.rfc-editor.org/info/rfc7790'>
<front>
<title>Mapping Characters for Classes of the Preparation, Enforcement, and Comparison of Internationalized Strings (PRECIS)</title>
<author initials='Y.' surname='Yoneya' fullname='Y. Yoneya'><organization /></author>
<author initials='T.' surname='Nemoto' fullname='T. Nemoto'><organization /></author>
<date year='2016' month='February' />
<abstract><t>The framework for the preparation, enforcement, and comparison of internationalized strings (PRECIS) defines several classes of strings for use in application protocols.  Because many protocols perform case-sensitive or case-insensitive string comparison, it is necessary to define methods for case mapping.  In addition, both the Internationalized Domain Names in Applications (IDNA) and the PRECIS problem statement describe mappings for internationalized strings that are not limited to case, but include width mapping and mapping of delimiters and other special characters that can be taken into consideration.  This document provides guidelines for designers of PRECIS profiles and describes several mappings that can be applied between receiving user input and passing permitted code points to internationalized protocols.  In particular, this document describes both locale-dependent and context-depending case mappings as well as additional mappings for delimiters and special characters.</t></abstract>
</front>
<seriesInfo name='RFC' value='7790'/>
<seriesInfo name='DOI' value='10.17487/RFC7790'/>
</reference>

<!-- draft-klensin-idna-5892upd-unicode70 (I-D Exists) -->
<reference anchor='IDNA-Unicode'>
<front>
<title>IDNA Update for Unicode 7.0.0</title>
<author initials='J' surname='Klensin' fullname='John Klensin'>
    <organization />
</author>
<author initials='P' surname='Faltstrom' fullname='Patrik Faltstrom'>
    <organization />
</author>
<date month='March' year='2015' />
</front>
<seriesInfo name='Work in Progress,' value='draft-klensin-idna-5892upd-unicode70-04' />
</reference>

    <reference  anchor='RFC2865' target='http://www.rfc-editor.org/info/rfc2865'>
    <front>
    <title>Remote Authentication Dial In User Service (RADIUS)</title>
    <author initials='C.' surname='Rigney' fullname='C. Rigney'><organization /></author>
    <author initials='S.' surname='Willens' fullname='S. Willens'><organization /></author>
    <author initials='A.' surname='Rubens' fullname='A. Rubens'><organization /></author>
    <author initials='W.' surname='Simpson' fullname='W. Simpson'><organization /></author>
    <date year='2000' month='June' />
    </front>
    <seriesInfo name='RFC' value='2865'/>
    <seriesInfo name='DOI' value='10.17487/RFC2865'/>
    <format type='ASCII' octets='146456'/>
    </reference>

    <reference  anchor='RFC3454' target='http://www.rfc-editor.org/info/rfc3454'>
    <front>
    <title>Preparation of Internationalized Strings (&quot;stringprep&quot;)</title>
    <author initials='P.' surname='Hoffman' fullname='P. Hoffman'><organization /></author>
    <author initials='M.' surname='Blanchet' fullname='M. Blanchet'><organization /></author>
    <date year='2002' month='December' />
    </front>
    <seriesInfo name='RFC' value='3454'/>
    <seriesInfo name='DOI' value='10.17487/RFC3454'/>
    <format type='ASCII' octets='138684'/>
    </reference>

    <reference  anchor='RFC3490' target='http://www.rfc-editor.org/info/rfc3490'>
    <front>
    <title>Internationalizing Domain Names in Applications (IDNA)</title>
    <author initials='P.' surname='Faltstrom' fullname='P. Faltstrom'><organization /></author>
    <author initials='P.' surname='Hoffman' fullname='P. Hoffman'><organization /></author>
    <author initials='A.' surname='Costello' fullname='A. Costello'><organization /></author>
    <date year='2003' month='March' />
    </front>
    <seriesInfo name='RFC' value='3490'/>
    <seriesInfo name='DOI' value='10.17487/RFC3490'/>
    <format type='ASCII' octets='51943'/>
    </reference>

    <reference  anchor='RFC3491' target='http://www.rfc-editor.org/info/rfc3491'>
    <front>
    <title>Nameprep: A Stringprep Profile for Internationalized Domain Names (IDN)</title>
    <author initials='P.' surname='Hoffman' fullname='P. Hoffman'><organization /></author>
    <author initials='M.' surname='Blanchet' fullname='M. Blanchet'><organization /></author>
    <date year='2003' month='March' />
    </front>
    <seriesInfo name='RFC' value='3491'/>
    <seriesInfo name='DOI' value='10.17487/RFC3491'/>
    <format type='ASCII' octets='10316'/>
    </reference>

    <reference  anchor='RFC3629' target='http://www.rfc-editor.org/info/rfc3629'>
    <front>
    <title>UTF-8, a transformation format of ISO 10646</title>
    <author initials='F.' surname='Yergeau' fullname='F. Yergeau'><organization /></author>
    <date year='2003' month='November' />
    </front>
    <seriesInfo name='STD' value='63'/>
    <seriesInfo name='RFC' value='3629'/>
    <seriesInfo name='DOI' value='10.17487/RFC3629'/>
    <format type='ASCII' octets='33856'/>
    </reference>

    <reference  anchor='RFC4422' target='http://www.rfc-editor.org/info/rfc4422'>
    <front>
    <title>Simple Authentication and Security Layer (SASL)</title>
    <author initials='A.' surname='Melnikov' fullname='A. Melnikov' role='editor'><organization /></author>
    <author initials='K.' surname='Zeilenga' fullname='K. Zeilenga' role='editor'><organization /></author>
    <date year='2006' month='June' />
    </front>
    <seriesInfo name='RFC' value='4422'/>
    <seriesInfo name='DOI' value='10.17487/RFC4422'/>
    <format type='ASCII' octets='73206'/>
    </reference>

    <reference  anchor='RFC4510' target='http://www.rfc-editor.org/info/rfc4510'>
    <front>
    <title>Lightweight Directory Access Protocol (LDAP): Technical Specification Road Map</title>
    <author initials='K.' surname='Zeilenga' fullname='K. Zeilenga' role='editor'><organization /></author>
    <date year='2006' month='June' />
    </front>
    <seriesInfo name='RFC' value='4510'/>
    <seriesInfo name='DOI' value='10.17487/RFC4510'/>
    <format type='ASCII' octets='12354'/>
    </reference>

    <reference  anchor='RFC4690' target='http://www.rfc-editor.org/info/rfc4690'>
    <front>
    <title>Review and Recommendations for Internationalized Domain Names (IDNs)</title>
    <author initials='J.' surname='Klensin' fullname='J. Klensin'><organization /></author>
    <author initials='P.' surname='Faltstrom' fullname='P. Faltstrom'><organization /></author>
    <author initials='C.' surname='Karp' fullname='C. Karp'><organization /></author>
    <author><organization>IAB</organization></author>
    <date year='2006' month='September' />
    </front>
    <seriesInfo name='RFC' value='4690'/>
    <seriesInfo name='DOI' value='10.17487/RFC4690'/>
    <format type='ASCII' octets='100929'/>
    </reference>

    <reference  anchor='RFC5226' target='http://www.rfc-editor.org/info/rfc5226'>
    <front>
    <title>Guidelines for Writing an IANA Considerations Section in RFCs</title>
    <author initials='T.' surname='Narten' fullname='T. Narten'><organization /></author>
    <author initials='H.' surname='Alvestrand' fullname='H. Alvestrand'><organization /></author>
    <date year='2008' month='May' />
    </front>
    <seriesInfo name='BCP' value='26'/>
    <seriesInfo name='RFC' value='5226'/>
    <seriesInfo name='DOI' value='10.17487/RFC5226'/>
    <format type='ASCII' octets='66160'/>
    </reference>

    <reference  anchor='RFC5234' target='http://www.rfc-editor.org/info/rfc5234'>
    <front>
    <title>Augmented BNF for Syntax Specifications: ABNF</title>
    <author initials='D.' surname='Crocker' fullname='D. Crocker' role='editor'><organization /></author>
    <author initials='P.' surname='Overell' fullname='P. Overell'><organization /></author>
    <date year='2008' month='January' />
    </front>
    <seriesInfo name='STD' value='68'/>
    <seriesInfo name='RFC' value='5234'/>
    <seriesInfo name='DOI' value='10.17487/RFC5234'/>
    <format type='ASCII' octets='26359'/>
    </reference>

    <reference  anchor='RFC5246' target='http://www.rfc-editor.org/info/rfc5246'>
    <front>
    <title>The Transport Layer Security (TLS) Protocol Version 1.2</title>
    <author initials='T.' surname='Dierks' fullname='T. Dierks'><organization /></author>
    <author initials='E.' surname='Rescorla' fullname='E. Rescorla'><organization /></author>
    <date year='2008' month='August' />
    </front>
    <seriesInfo name='RFC' value='5246'/>
    <seriesInfo name='DOI' value='10.17487/RFC5246'/>
    <format type='ASCII' octets='222395'/>
    </reference>

    <reference  anchor='RFC5890' target='http://www.rfc-editor.org/info/rfc5890'>
    <front>
    <title>Internationalized Domain Names for Applications (IDNA): Definitions and Document Framework</title>
    <author initials='J.' surname='Klensin' fullname='J. Klensin'><organization /></author>
    <date year='2010' month='August' />
    </front>
    <seriesInfo name='RFC' value='5890'/>
    <seriesInfo name='DOI' value='10.17487/RFC5890'/>
    <format type='ASCII' octets='54245'/>
    </reference>

    <reference  anchor='RFC5891' target='http://www.rfc-editor.org/info/rfc5891'>
    <front>
    <title>Internationalized Domain Names in Applications (IDNA): Protocol</title>
    <author initials='J.' surname='Klensin' fullname='J. Klensin'><organization /></author>
    <date year='2010' month='August' />
    </front>
    <seriesInfo name='RFC' value='5891'/>
    <seriesInfo name='DOI' value='10.17487/RFC5891'/>
    <format type='ASCII' octets='38105'/>
    </reference>

    <reference  anchor='RFC5892' target='http://www.rfc-editor.org/info/rfc5892'>
    <front>
    <title>The Unicode Code Points and Internationalized Domain Names for Applications (IDNA)</title>
    <author initials='P.' surname='Faltstrom' fullname='P. Faltstrom' role='editor'><organization /></author>
    <date year='2010' month='August' />
    </front>
    <seriesInfo name='RFC' value='5892'/>
    <seriesInfo name='DOI' value='10.17487/RFC5892'/>
    <format type='ASCII' octets='187370'/>
    </reference>

    <reference  anchor='RFC5893' target='http://www.rfc-editor.org/info/rfc5893'>
    <front>
    <title>Right-to-Left Scripts for Internationalized Domain Names for Applications (IDNA)</title>
    <author initials='H.' surname='Alvestrand' fullname='H. Alvestrand' role='editor'><organization /></author>
    <author initials='C.' surname='Karp' fullname='C. Karp'><organization /></author>
    <date year='2010' month='August' />
    </front>
    <seriesInfo name='RFC' value='5893'/>
    <seriesInfo name='DOI' value='10.17487/RFC5893'/>
    <format type='ASCII' octets='38870'/>
    </reference>

    <reference  anchor='RFC5894' target='http://www.rfc-editor.org/info/rfc5894'>
    <front>
    <title>Internationalized Domain Names for Applications (IDNA): Background, Explanation, and Rationale</title>
    <author initials='J.' surname='Klensin' fullname='J. Klensin'><organization /></author>
    <date year='2010' month='August' />
    </front>
    <seriesInfo name='RFC' value='5894'/>
    <seriesInfo name='DOI' value='10.17487/RFC5894'/>
    <format type='ASCII' octets='115174'/>
    </reference>

    <reference  anchor='RFC5895' target='http://www.rfc-editor.org/info/rfc5895'>
    <front>
    <title>Mapping Characters for Internationalized Domain Names in Applications (IDNA) 2008</title>
    <author initials='P.' surname='Resnick' fullname='P. Resnick'><organization /></author>
    <author initials='P.' surname='Hoffman' fullname='P. Hoffman'><organization /></author>
    <date year='2010' month='September' />
    </front>
    <seriesInfo name='RFC' value='5895'/>
    <seriesInfo name='DOI' value='10.17487/RFC5895'/>
    <format type='ASCII' octets='16556'/>
    </reference>

    <reference  anchor='RFC6452' target='http://www.rfc-editor.org/info/rfc6452'>
    <front>
    <title>The Unicode Code Points and Internationalized Domain Names for Applications (IDNA) - Unicode 6.0</title>
    <author initials='P.' surname='Faltstrom' fullname='P. Faltstrom' role='editor'><organization /></author>
    <author initials='P.' surname='Hoffman' fullname='P. Hoffman' role='editor'><organization /></author>
    <date year='2011' month='November' />
    </front>
    <seriesInfo name='RFC' value='6452'/>
    <seriesInfo name='DOI' value='10.17487/RFC6452'/>
    <format type='ASCII' octets='6817'/>
    </reference>

    <reference  anchor='RFC6885' target='http://www.rfc-editor.org/info/rfc6885'>
    <front>
    <title>Stringprep Revision and Problem Statement for the Preparation and Comparison of Internationalized Strings (PRECIS)</title>
    <author initials='M.' surname='Blanchet' fullname='M. Blanchet'><organization /></author>
    <author initials='A.' surname='Sullivan' fullname='A. Sullivan'><organization /></author>
    <date year='2013' month='March' />
    </front>
    <seriesInfo name='RFC' value='6885'/>
    <seriesInfo name='DOI' value='10.17487/RFC6885'/>
    <format type='ASCII' octets='72167'/>
    </reference>

    <reference  anchor='RFC6943' target='http://www.rfc-editor.org/info/rfc6943'>
    <front>
    <title>Issues in Identifier Comparison for Security Purposes</title>
    <author initials='D.' surname='Thaler' fullname='D. Thaler' role='editor'><organization /></author>
    <date year='2013' month='May' />
    </front>
    <seriesInfo name='RFC' value='6943'/>
    <seriesInfo name='DOI' value='10.17487/RFC6943'/>
    <format type='ASCII' octets='62676'/>
    </reference>

<reference anchor="UAX9" target='http://unicode.org/reports/tr9/'>
  <front>
    <title>Unicode Bidirectional Algorithm</title>
    <author>
      <organization>Unicode Standard Annex #9</organization>
    </author>
<!--    <date month="June" year="2014" /> -->
  <date/>
  </front>
  <seriesInfo name='edited by Mark Davis, Aharon Lanin, and Andrew Glass.'
     value='An integral part of The Unicode Standard'/>
</reference>

<reference anchor="UAX11" target='http://unicode.org/reports/tr11/'>
  <front>
    <title>East Asian Width</title>
    <author>
      <organization>Unicode Standard Annex #11</organization>
    </author>
<!--    <date month="June" year="2014" /> -->
  <date/>
  </front>
 <seriesInfo name='edited by Ken Lunde.' value='An integral part of The Unicode Standard'/>
</reference>

<reference anchor="UAX15" target='http://unicode.org/reports/tr15/'>
  <front>
    <title>Unicode Normalization Forms</title>
    <author>
      <organization>Unicode Standard Annex #15</organization>
    </author>
<!--    <date month="June" year="2014" /> -->
  <date/>
  </front>
 <seriesInfo name='edited by Mark Davis and Ken Whistler.' value='An integral part of The Unicode Standard'/>
</reference>

<reference anchor="UTR36" target='http://unicode.org/reports/tr36/'>
  <front>
    <title>Unicode Security Considerations</title>
    <author>
      <organization>Unicode Technical Report #36</organization>
    </author>
<!--    <date month="June" year="2014" /> -->
  <date/>
  </front>
 <seriesInfo name='by Mark Davis' value='and Michel Suignard'/>
</reference>

<reference anchor="UTS39" target='http://unicode.org/reports/tr39/'>
  <front>
    <title>Unicode Security Mechanisms</title>
    <author>
      <organization>Unicode Technical Standard #39</organization>
    </author>
<!--    <date month="June" year="2014" /> -->
  <date/>
  </front>
 <seriesInfo name='edited by Mark Davis' value='and Michel Suignard'/>
</reference>

<reference anchor="Err4568" target="http://www.rfc-editor.org">
     <front>
       <title>Erratum ID 4568</title>
       <author>
         <organization>RFC Errata</organization>
       </author>
       <date/>
     </front>
     <seriesInfo name="RFC" value="7564" />
   </reference>

    </references>

    <section title="Differences from RFC 7564" anchor="diffs">
      <t>This document makes the following changes relative to <xref target='RFC7564'/>.</t>
      <t>
        <list style='symbols'>
          <t>Recommended the Unicode toLower() operation over the Unicode toCaseFold() operation.</t>
          <t>Clarified the meaning of "preparation".</t>
        </list>
      </t>
      <t>See <xref target='RFC7564'/> for a description of the differences from <xref target='RFC3454'/>.</t>
    </section>

    <section title="Acknowledgements">
      <t>Thanks to Martin Duerst, John Klensin, Christian Schudt, and Sam Whited for their feedback.  Thanks also to Sam Whited for submitting <xref target="Err4568"/>.</t>
      <t>See <xref target='RFC7564'/> for acknowledgements related to the specification that this document supersedes.</t>

      <t>Some algorithms and textual descriptions have been borrowed from
      <xref target='RFC5892'/>.  Some text regarding security has been
      borrowed from <xref target='RFC5890'/>, <xref target='RFC7613'/>, and 
      <xref target='RFC7622'/>.</t>

    </section>

  </back>
</rfc>
