<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="rfc2629.xslt" ?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<rfc ipr="trust200902" category="std" docName="draft-ietf-ipfix-text-adt-08.txt">
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<?rfc toc="yes"?>
<?rfc symrefs="yes"?>

<front>
    <title abbrev="IPFIX Text Types">
      Textual Representation of IPFIX Abstract Data Types
    </title>
    <author initials="B." surname="Trammell" fullname="Brian Trammell">
      <organization abbrev="ETH Zurich">
      Swiss Federal Institute of Technology Zurich
      </organization>
      <address>
        <postal>
          <street>Gloriastrasse 35</street>
          <city>8092 Zurich</city>
          <country>Switzerland</country>
        </postal>
        <phone>+41 44 632 70 13</phone>
        <email>ietf@trammell.ch</email>
      </address>
    </author>

    <date year="2014" month="August" day="4"/>
    <area>Operations</area>
    <workgroup>IPFIX Working Group</workgroup>
    <abstract>

      <t>This document defines UTF-8 representations for IPFIX abstract data
      types, to support interoperable usage of the IPFIX Information Elements
      with protocols based on textual encodings.</t>

    </abstract>
  </front>

  <middle>

    <section anchor="sec-intro" title="Introduction">        

      <t>The IPFIX Information Model<xref target="RFC7012"/> provides a set 
      of abstract data types for  the <xref target="IANA-IPFIX">IANA IPFIX Information
      Element Registry</xref>, which contains a rich set of Information Elements for
      description of information about network entities and network traffic
      data, and abstract data types for these Information Elements. The <xref
      target="RFC7011">IPFIX Protocol Specification</xref>, in turn, defines a 
      big-endian binary encoding for these abstract data types suitable for use 
      with the IPFIX Protocol.</t>

      <t>However, present and future operations and management protocols and
      applications may use textual encodings, and generic framing and
      structure, as in <xref target="RFC7159">JSON</xref> or XML. A 
      definition of canonical textual encodings
      for the IPFIX abstract data types would allow this set of Information
      Elements to be used for such applications, and for these applications to
      interoperate with IPFIX applications at the Information Element
      definition level.</t>

      <t>Note that templating or other mechanisms for data description for
      such applications and protocols are application-specific, and therefore
      out of scope for this document: only Information Element identification
      and value representation are defined here.</t>

      <t>In most cases where a textual representation will be used, an explicit 
      tradeoff is made for human readability or manipulability over compactness; 
      this assumption is used in defining standard representations of IPFIX 
      ADTs.</t>

    </section>

    <section title="Terminology">

      <t>Capitalized terms defined in the <xref target="RFC7011">IPFIX Protocol Specification</xref> and the <xref target="RFC7012">IPFIX Information Model</xref> are used in this document as defined in those documents. The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in <xref target="RFC2119"/>. In addition, this document defines the following terminology for its own use:</t>
      
       <t><list style="hanging">

        <t hangText="Enclosing Context"><vspace/>A textual representation of
        Information Element values is applied to use the IPFIX Information Model within
        some existing textual format (e.g. <xref target="W3C-XML">XML</xref>, 
        <xref target="RFC7159">JSON</xref>). This outer format is
        referred to as the Enclosing Context within this document. Enclosing
        Contexts define escaping and quoting rules for represented values.</t>
          
      </list></t>

    </section>

    <section title="Identifying Information Elements">

      <t>The <xref target="IANA-IPFIX">IPFIX Information Element
      Registry</xref> defines a set of Information Elements numbered by
      Information Element Identifiers and named for human-readability. These
      Information Element Identifiers are meant for use with the IPFIX
      protocol, and have little meaning when applying the IPFIX Information
      Element Registry to textual representations.</t>

      <t>Instead, applications using textual representations of Information
      Elements use Information Element names to identify them; see <xref
      target="sec-examples"/> for examples illustrating this principle.</t>

    </section>

    <section title="Data Type Encodings">

      <t>Each subsection of this section defines a textual encoding for the
      abstract data types defined in <xref
      target="RFC7012"/>. This section
      uses ABNF, including the Core Rules in
      Appendix B of <xref target="RFC5234"/>, to describe the format 
      of textual representations of IPFIX
      abstract data types.</t>

      <t>If future documents update <xref target="RFC7012"/> to add new 
      Abstract Data Types to the IPFIX Information Model, and those Abstract
      Data Types are generally useful, this document will also need to be
      updated in order to define textual encodings for those Abstract
      Data Types.</t>

      <section title="octetArray" anchor="sec-octetarray">

        <t>If the Enclosing Context defines a representation for binary
        objects, that representation SHOULD be used.</t>
        
        <t>Otherwise, since the goal of textual representation of Information
        Elements is human-readability over compactness, the values of Information
        Elements of the octetArray data type are represented as a string of
        pairs of hexadecimal digits, one pair per byte, in the order the bytes
        would appear on the wire were the octetArray encoded directly in IPFIX
        per <xref target="RFC7011"/>. Whitespace
        may occur between any pair of digits to assist in human readability of
        the string, but is not necessary. In ABNF:</t>
        
        <t>hex-octet = 2HEXDIG</t>
        
        <t>octetarray = hex-octet *([WSP] hex-octet)</t>
        
      </section>

      <section title="unsigned8, unsigned16, unsigned32, and unsigned64">

        <t>If the Enclosing Context defines a representation for unsigned
        integers, that representation SHOULD be used.</t>

        <t>In the special case that the unsigned Information Element has
        identifier semantics, and refers to a set of codepoints, either in an
        external registry, a sub-registry, or directly in the description of
        the Information Element, then the name or short description for that
        codepoint as a string MAY be used to improve readability. </t>
          
        <t>Otherwise, the values of Information Elements of an unsigned
        integer type may be represented either as unprefixed base-10 (decimal)
        strings, as base-16 (hexadecimal) strings prefixed by "0x", or as 
        base-2 (binary) strings prefixed by "0b". In ABNF:</t>

        <t>unsigned = 1*DIGIT / "0x" 1*HEXDIG / "0b" 1*BIT </t>

        <t>Leading zeroes are allowed in any representation, and do not signify
        base-8 (octal) representation. Binary representation is intended for use with 
        Information Elements with flag semantics, but can be used in any case.</t>

        <t>The encoded value MUST be in range for the corresponding abstract
        data type or Information Element. Out of range values are
        interpreted as clipped to the implicit range for the Information
        Element as defined by the abstract data type, or to the explicit range
        of the Information Element if defined. Minimum and maximum values for
        abstract data types are shown in <xref target="tab-unsigned-range"/>
        below.</t>

        <texttable anchor="tab-unsigned-range" align="center"
          title="Ranges for unsigned abstract data types (in decimal)">
          <ttcol align="right">type</ttcol>
          <ttcol align="right">minimum</ttcol>
          <ttcol align="right">maximum</ttcol>
          <c>unsigned8</c>  <c>0</c> <c>255</c>
          <c>unsigned16</c> <c>0</c> <c>65536</c>
          <c>unsigned32</c> <c>0</c> <c>4294967295</c>
          <c>unsigned64</c> <c>0</c> <c>18446744073709551615</c>
        </texttable>
        
      </section>

      <section title="signed8, signed16, signed32, and signed64">

        <t>If the Enclosing Context defines a representation for signed
        integers, that representation SHOULD be used.</t>

        <t>Otherwise, the values of Information Elements of signed integer
        types are represented as optionally-prefixed base-10 (decimal)
        strings. In ABNF:</t>

        <t>sign = "+" / "-"</t>

        <t>signed = [sign] 1*DIGIT</t>

        <t>If the sign is omitted, it is assumed to be positive. Leading zeroes
        are allowed, and do not signify base-8 (octal) encoding. The representation 
        "-0" is explicitly allowed, and is equal to zero.</t>

        <t>The encoded value MUST be in range for the corresponding abstract
        data type or Information Element. Out of range values are to be
        interpreted as clipped to the implicit range for the Information
        Element as defined by the abstract data type, or to the explicit range
        of the Information Element if defined. Minimum and maximum values for
        abstract data types are shown in <xref target="tab-signed-range"/>
        below.</t>

        <texttable anchor="tab-signed-range" align="center"
          title="Ranges for signed abstract data types (in decimal)">
          <ttcol align="right">type</ttcol>
          <ttcol align="right">minimum</ttcol>
          <ttcol align="right">maximum</ttcol>
          <c>signed8</c>  <c>-128</c> <c>+127</c>
          <c>signed16</c> <c>-32768</c> <c>+32767</c>
          <c>signed32</c> <c>-2147483648</c> <c>+2147483647</c>
          <c>signed64</c> <c>-9223372036854775808</c> <c>+9223372036854775807</c>
        </texttable>

      </section>

      <section title="float32 and float64">

        <t>If the Enclosing Context defines a representation for floating
        point numbers, that representation SHOULD be used.</t>

        <t>Otherwise, the values of Information Elements of float32 or 
          float64 types are represented as optionally sign-prefixed, 
          optionally base-10 exponent-suffixed, floating point decimal 
          numbers, as in <xref target="IEEE.754.2008"/>. The special
          strings "NaN", "+inf", and "-inf" represent not a number, 
          positive infinity and negative infinity, respectively.</t>

          <t>In ABNF:</t>

         <t>sign = "+" / "-"</t>

         <t>exponent = "e" [sign] 1*3DIGIT</t>

         <t>right-decimal = "." 1*DIGIT</t>

         <t>mantissa = 1*DIGIT [right-decimal]</t>

         <t>num = [sign] mantissa [exponent]</t>

         <t>naninf = "NaN" / (sign "inf")</t>

         <t>float = num / naninf </t>

         <t>The expressed value is ( mantissa * 10 ^ exponent ). If the sign is
         omitted, it is assumed to be positive. If the exponent is omitted, it
         is assumed to be zero. Leading zeroes may appear in the mantissa
         and/or the exponent. Values MUST be within range for 
         <xref target="IEEE.754.2008"/> single or double precision 
         numbers; finite values outside the approprate range are to be interpreted as
         clamped to be within the range. Note that no more than three digits are 
         required or allowed for exponents in this encoding due to these ranges.</t>

         <t>Note that, since this representation is meant for human readability,
          writers MAY sacrifice precision to use a more human-readable representation
          of a given value, at the expense of the ability to recover the exact bit
          pattern at the reader. Therefore, decoders MUST NOT assume that the represented 
          values are exactly compararable for equality. </t>

<!--         
         <t>Minimum and maximum values for abstract data types are taken from
         , and shown in <xref target="tab-float-range"/> below.</t>
         
         <texttable anchor="tab-float-range" align="center"
           title="Ranges for floating-point abstract data types">
           <ttcol align="right">type</ttcol>
           <ttcol align="right">minimum nonzero abs(x)</ttcol>
           <ttcol align="right">maximum abs(x)</ttcol>
           <c>float32</c> <c>5.877e-39</c> <c>3.403e38</c>
           <c>float64</c> <c>1.1125e-308</c> <c>+1.798e308</c>
         </texttable>
    -->     

      </section>

      <section title="boolean">

        <t>If the Enclosing Context defines a representation for boolean
        values, that representation SHOULD be represented.</t>

        <t>Otherwise, a true boolean value is represented by the
        literal string "true", and a false boolean value with the literal string "false".
        In ABNF:</t>

        <t>boolean-true = "true"</t>
        
        <t>boolean-false = "false"</t>
        
        <t>boolean = boolean-true / boolean-false</t>

      </section>

      <section title="macAddress">

        <t>MAC addresses are represented as IEEE 802 MAC-48 addresses,
        hexadecimal bytes, most significant byte first, separated by colons. In
        ABNF:</t>

        <t>hex-octet = 2HEXDIG</t>

        <t>macaddress = hex-octet 5( ":" hex-octet )</t>

      </section>

      <section title="string">

        <t>As Information Elements of the string type are simply Unicode 
        strings (encoded as UTF-8 when appearing in Data Sets in IPFIX 
        Messages <xref target="RFC7011"/>), they are represented directly, 
        using the Unicode encoding rules and quoting and escaping rules of the 
        Enclosing Context.</t>

        <t>If the Enclosing Context cannot natively represent Unicode characters, 
        the escaping facility provided by the Enclosing Context MUST be used for non-
        representable characters. Additionally, strings containing characters reserved in
        the Enclosing Context (e.g. control characters, markup characters, quotes) 
        MUST be escaped or quoted according to the rules of the Enclosing Context.</t>

        <t>It is presumed that the Enclosing Context has sufficient restrictions on 
        the use of Unicode to prevent the unsafe use of non-printing and control characters. As there is no accepted solution for the processing and safe display of mixed-direction strings, mixed-direction strings should be avoided using this encoding. Note also that, since this draft presents no additional requirements for the normalization of Unicode strings, care must be taken when comparing strings using this encoding; direct byte-pattern comparisons are not sufficient for determining whether two strings are equivalent. See <xref target="RFC6885"/> and <xref target="I-D.ietf-precis-framework"/> for more on possible unexpected results and related risks in comparing Unicode strings.</t>

      </section>

      <section title="The dateTime ADTs">

        <t>Timestamp abstract data types are represented generally as in <xref
        target="RFC3339"/>, with two important differences. First, all IPFIX
        timestamps are expressed in terms of UTC, so textual representations of
        these Information Elements are explicitly in UTC as well. Time zone
        offsets are therefore not required or supported. Second, there are four
        timestamp abstract data types, separated by the precision which they
        can express. Fractional seconds are omitted in dateTimeSeconds,
        expressed in milliseconds in dateTimeMilliseconds, and so on.</t>
        
        <t>In ABNF, taken from <xref target="RFC3339"/> and modified:</t>

        <figure><artwork>
date-fullyear   = 4DIGIT
date-month      = 2DIGIT  ; 01-12
date-mday       = 2DIGIT  ; 01-28, 01-29, 01-30, 01-31
time-hour       = 2DIGIT  ; 00-23
time-minute     = 2DIGIT  ; 00-59
time-second     = 2DIGIT  ; 00-58, 00-59, 00-60
time-msec       = "." 3DIGIT
time-usec       = "." 6DIGIT
time-nsec       = "." 9DIGIT
full-date       = date-fullyear "-" date-month "-" date-mday
integer-time    = time-hour ":" time-minute ":" time-second

datetimeseconds      = full-date "T" integer-time
datetimemilliseconds = full-date "T" integer-time "." time-msec
datetimemicroseconds = full-date "T" integer-time "." time-usec
datetimenanoseconds  = full-date "T" integer-time "." time-nsec
        </artwork></figure>
 
      </section>

      <section title="ipv4Address">

        <t>IP version 4 addresses are represented in dotted-quad format,
        most-significant-byte first, as it would in a Uniform Resource
        Identifier <xref target="RFC3986"/>; the ABNF for an IPv4 address is
        taken from <xref target="RFC3986"/> and reproduced below:</t>
        
        <figure><artwork>
dec-octet   = DIGIT                 ; 0-9
            / %x31-39 DIGIT         ; 10-99
            / "1" 2DIGIT            ; 100-199
            / "2" %x30-34 DIGIT     ; 200-249
            / "25" %x30-35          ; 250-255

ipv4address = dec-octet 3( "." dec-octet )
        </artwork></figure>

      </section>

      <section title="ipv6Address">

        <t>IP version 6 addresses are represented as in section 2.2 of <xref
        target="RFC4291"/>, as updated by section 4 of <xref
        target="RFC5952"/>. The ABNF for an IPv6 address is taken from <xref
        target="RFC3986"/> and reproduced below, using the ipv4address production from the previous section:</t>

        <figure><artwork>
ls32        = ( h16 ":" h16 ) / ipv4address
            ; least-significant 32 bits of address
h16         = 1*4HEXDIG
            ; 16 bits of address represented in hexadecimal
            ; zeroes to suppressed as in RFC 5952

ipv6address =                            6( h16 ":" ) ls32
            /                       "::" 5( h16 ":" ) ls32
            / [               h16 ] "::" 4( h16 ":" ) ls32
            / [     h16 ":"   h16 ] "::" 3( h16 ":" ) ls32
            / [ *2( h16 ":" ) h16 ] "::" 2( h16 ":" ) ls32
            / [ *3( h16 ":" ) h16 ] "::"    h16 ":"   ls32
            / [ *4( h16 ":" ) h16 ] "::"              ls32
            / [ *5( h16 ":" ) h16 ] "::"              h16
            / [ *6( h16 ":" ) h16 ] "::"
        </artwork></figure>
        
      </section>

      <section title="basicList, subTemplateList, and subTemplateMultiList">

        <t>These abstract data types, defined for <xref target="RFC6313">IPFIX
        Structured Data</xref>, do not represent actual data types; they are
        instead designed to provide a mechanism by which complex structure can
        be represented in IPFIX below the template level. It is assumed that
        protocols using textual Information Element representation will provide
        their own structure. Therefore, Information Elements of these Data
        Types MUST NOT be used in textual representations.</t>

      </section>

    </section>

    <section title="Security Considerations">

      <t>The security considerations for the IPFIX Protocol <xref target="RFC7011"/> 
      apply. </t>

      <t>Implementations of decoders of Information Element values using these 
      representations must take care to correctly handle invalid input, but
      the encodings presented here are not special in that respect.</t>

      <t>The encoding specified in this document, and representations that may be built upon it, are specifically not intended for the storage of data. However, since storage of data in the format in which it is exchanged is a very common practice, and the ubiquity of tools for indexing and searching text significantly increases the ease of searching and the risk of privacy-sensitive data being accidentally indexed or searched, the privacy considerations in Section 11.8 of <xref target="RFC7011"/> are especially important to observe when storing data using the encoding specified in this document that was derived from the measurement of network traffic.</t>
      
      <t>When using representations based on this encoding to transmit or store network traffic data, consider omitting especially privacy-sensitive values by not representing the columns or keys containing those values, as in black-marker anonymization as discussed in Section 4 of <xref target="RFC6235"/>. Other anonymization techinques described in <xref target="RFC6235"/> may also be useful in these situations.</t>

      <t>The encodings for all Abstract Data Types other than 'string' are defined 
        in such a way as to be representable in the US-ASCII character set, and 
        therefore should be unproblematic for all Enclosing Contexts. However, the 
        'string' Abstract Data Type may be vulnerable to problems with ill-formed 
        UTF-8 strings as discussed in section 6.1.6 of <xref target="RFC7011"/>; 
        see <xref target="UTF8-EXPLOIT"/> for background.</t>
      
    </section>
    
    <section title="IANA Considerations">

      <t>This document has no considerations for IANA.</t>

    </section>
    
    <section title="Acknowledgments">
      <t>Thanks to Paul Aitken, Benoit Claise, Andrew Feren, Juergen Quittek, 
      David Black, and the IESG for the review and comments. Thanks to Dave Thaler 
      and Stephan Neuhaus for discussions which improved the floating-point 
      representation section. This work is materially supported by the European 
      Union Seventh Framework Programme under grant agreement 318627 mPlane.</t>
    </section>

  </middle>
  
  <back>
    
    <references title="Normative References">
      <?rfc include="reference.RFC.2119" ?> 
      <?rfc include="reference.RFC.3339" ?> 
      <?rfc include="reference.RFC.3986" ?>      
      <?rfc include="reference.RFC.4291" ?>
      <?rfc include="reference.RFC.5234" ?>
      <?rfc include="reference.RFC.5952" ?>
      <?rfc include="reference.RFC.7011" ?>      

    </references>
    
    <references title="Informative References">
      <?rfc include="reference.RFC.6235" ?>
      <?rfc include="reference.RFC.6313" ?>
      <?rfc include="reference.RFC.6885" ?>
      <?rfc include="reference.RFC.7012" ?>
      <?rfc include="reference.RFC.7013" ?>
      <?rfc include="reference.RFC.7159" ?>      
      <?rfc include="reference.I-D.ietf-precis-framework" ?>
      <reference anchor='W3C-XML'>
        <front>
          <title>W3C Recommendation: Extensible Markup Language (XML) 1.0 (Fifth Edition)</title>
          <author surname="Bray" initials="T."/>
          <author surname="Paoli" initials="J."/>
          <author surname="Sperberg-McQueen" initials="C.M."/>
          <author surname="Maler" initials="E."/>
          <author surname="Yergeau" initials="F."/>
          <date month="September" year="2008"/>
        </front>
      </reference>
      <reference anchor='IEEE.754.2008'>
        <front>
          <title>Standard for Floating-Point Arithmetic (IEEE Standard 754)</title>
          <author surname="Instute of Electrical and Electronic Engineers"/>
          <date month="August" year="2008"/>
        </front>
      </reference>
      <reference anchor='IANA-IPFIX'>
        <front>
          <title>IP Flow Information Export Information Elements (http://www.iana.org/assignments/ipfix/ipfix.xml)</title>
          <author surname="Internet Assigned Numbers Authority"/>
          <date month="November" year="2012"/>
        </front>
      </reference>
      <reference anchor='UTF8-EXPLOIT'>
        <front>
          <title>Unicode Technical Report #36: Unicode Security Considerations</title>
          <author surname="Davis" initials="M."/>
          <author surname="Suignard" initials="M."/>
          <date month="July" year="2012"/>
        </front>
      </reference>
    </references>
    
    <section anchor="sec-examples" title="Example">
      
      <t>In this section, we examine an IPFIX Template and a Data Record
      defined by that Template, and show how that Data Record would be
      represented in JSON according to the specification in this document. Note
      that this is specifically NOT a recommendation for a particular
      representation, merely an illustration of the encodings in this
      document; the quoting and formatting in the example are JSON-specific.</t>

      <t><xref target="fig-tmpl"/> shows a Template in IESpec format as defined in section 10.1 of <xref target="RFC7013"/>; a corresponding JSON Object representing a record defined by this template in the text format specified in this document is shown in <xref target="fig-json"/>.</t>

      <figure title="Sample flow template in IESpec format" anchor="fig-tmpl">
      <artwork><![CDATA[
      flowStartMilliseconds(152)<dateTimeMilliseconds>[8]
      flowEndMilliseconds(153)<dateTimeMilliseconds>[8]
      octetDeltaCount(1)<unsigned64>[4]
      packetDeltaCount(2)<unsigned64>[4]
      sourceIPv6Address(27)<ipv6Address>[16]{key}
      destinationIPv6Address(28)<ipv6Address>[16]{key}
      sourceTransportPort(7)<unsigned16>[2]{key}
      destinationTransportPort(11)<unsigned16>[2]{key}
      protocolIdentifier(4)<unsigned8>[1]{key}
      tcpControlBits(6)<unsigned8>[1]
      flowEndReason(136)<unsigned8>[1]
      ]]></artwork></figure>

      <figure title="JSON object containing sample flow" anchor="fig-json">
      <artwork><![CDATA[
        {
            "flowStartMilliseconds": "2012-11-05T18:31:01.135",
            "flowEndMilliseconds": "2012-11-05T18:31:02.880",
            "octetDeltaCount": 195383,
            "packetDeltaCount": 88,
            "sourceIPv6Address": "2001:db8:c:1337::2",
            "destinationIPv6Address": "2001:db8:c:1337::3",
            "sourceTransportPort": 80,
            "destinationTransportPort": 32991,
            "protocolIdentifier": "tcp",
            "tcpControlBits": 19,
            "flowEndReason": 3
        }
      ]]></artwork></figure>

      
    </section>
    
  </back>
  
</rfc>