<?xml version="1.0" encoding="UTF-8"?>
<!-- This template is for creating an Internet Draft using xml2rfc,
     which is available here: http://xml.resource.org. -->
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!-- One method to get references from the online citation libraries.
     There has to be one entity for each item to be referenced. 
     An alternate method (rfc include) is described in the references. -->

<!ENTITY RFC2119 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY RFC4506 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4506.xml">
<!ENTITY RFC5662 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5662.xml">
<!ENTITY RFC5666 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5666.xml">
]>
<?xml-stylesheet type="text/xsl" href="rfc2629.xslt" ?>
<?rfc strict="yes" ?>
<?rfc toc="yes"?>
<?rfc tocdepth="2"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes" ?>
<?rfc compact="yes" ?>
<?rfc subcompact="no" ?>
<rfc ipr="trust200902" 
     category="std"
     docName="draft-dnoveck-nfsv4-rpcrdma-xcharext-02">
  <front>
    <title abbrev="RPC-over-RDMA Transport Characteristics">
      RPC-over-RDMA Extension to Manage Transport Characteristics
    </title>
    <author initials="D." surname="Noveck" fullname="David Noveck">
      <organization abbrev="HPE">
        Hewlett Packard Enterprise
      </organization>
      <address>
        <postal>
          <street>165 Dascomb Road</street> 
          <city>Andover</city>
          <region>MA</region>
          <code>01810</code>
          <country>USA</country>
        </postal>
        <phone>+1 781-572-8038</phone>
        <email>davenoveck@gmail.com</email>
      </address>
    </author>
    <date year="2016"/>

    <area>Transport</area>
    <workgroup>Network File System Version 4</workgroup>
    <abstract>
      <t>
        This document specifies an extension to RPC-over-RDMA Version Two. 
        The extension enables endpoints of an RPC-over-RDMA connection 
        to exchange information which can be used to optimize message transfer.
      </t>
    </abstract>
  </front>
  <middle>
    <section title="Preliminaries" anchor="PRELIM">	
      <section title="Requirements Language" anchor="INTRO-req">
        <t>
          The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", 
          "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", 
          "MAY", and "OPTIONAL" in this document are to be interpreted 
          as described in <xref target="RFC2119"/>.  
        </t>
      </section>
      <section title="Introduction" anchor="PRELIM-intro">
        <t>
          This document specifies an extension to RPC-over-RDMA Version Two.
          It allows each participating endpoint on a single connection to 
          communicate various characteristics of its implementation, 
          to request changes in characteristics of the other endpoint, 
          and to effect changes and notify the other endpoint of changes 
          to these characteristics during operation.
        </t>	
        <t>
          The extension described herein specifies OPTIONAL message 
          header types to implement this mechanism. The means by which 
          the implementation support status of these OPTIONAL types 
          is ascertained is described in <xref target="rpcrdmav2"/>.
        </t>	
        <t>
          Although this document specifies the new OPTIONAL message header
          types that implement these functions, it does not describe the 
          precise means by which support for them is ascertained.  That 
          is more appropriately specified by the document defining a 
          version of RPC-over-RDMA which supports protocol extension.
        </t>	
        <t>
          This document is currently written to conform to the extension
          model for RPC-over-RDMA Version Two as described in
          <xref target="rpcrdmav2"/>. 	
        </t>	
      </section>

      <section title="Role Terminology" anchor="PRELIM-term">
        <t>
          A number of different terms are used regarding the roles of the
          two participants in an RPC-over-RDMA connection.  Some of these
          roles last for the duration of a connection while others vary 
          from request to request or from
          message to message.
        </t>	        
        <t>
          The roles of the client and server are fixed for the lifetime of
          the connection, with the client defined as the endpoint which 
          initiated the connection. 	
        </t>	
        <t>
          The roles of requester and responder often parallel those of
          client and server, although this is not always the case.  
          Most requests are made in the forward direction, in which the client
          is the requester and the server is the responder.  However,
          backward-direction requests are possible, in which case the server
          is the requester and the client is the responder.  As a result, 
          clients and servers may both act as requesters and responders.
        </t>	
        <t>
          The roles of sender and receiver vary from message to message.  With 
          regard to the messages described in this document, both the
          client and the server can act as sender and receiver.  With 
          regard to messages used to transfer RPC requests and replies,
          the requester sends requests and receives replies while the
          responder receives requests and sends replies.  	
        </t>	
      </section>
    </section>

    <section title="Transport Characteristics" anchor="CHAR">
      <section title="Characteristics Model" anchor="CHAR-model">
        <t>
          An initial set of receiver and sender characteristics is 
          specified in this document. An extensible approach is used, 
          allowing new characteristics to be defined in future standards 
          track documents.
        </t>
        <t> 
          Such characteristics are specified using:
        <list style="symbols">
          <t>
            A code identifying the particular transport characteristic being
            specified.  
          </t>
          <t>
            A nominally opaque array which contains within it the XDR 
            encoding of
            the specific characteristic indicated by the associated code.
          </t>
        </list>
        </t>
        <t>
          The following XDR types are used by operations that deal with 
          transport characteristics:
        <figure align="left">
          <artwork xml:space="preserve" align="left">
&lt;CODE BEGINS&gt;
 
typedef uint32          xcharid;

struct xcharval {
        xcharid         xchv_which;
        opaque          xchv_data&lt;&gt;;
};

typedef xcharval        xcharspec&lt;&gt;;

typedef uint32          xcharsubset&lt;&gt;;

&lt;CODE ENDS&gt;
          </artwork>
        </figure> 
        </t>
        <t>
          An xcharid specifies a particular transport characteristic.
          In order to allow easier XDR extension of the set 
          of characteristics by concatenating XDR files, 
          specific characteristics
          are defined as const values rather than as elements in an enum.
        </t>
        <t>
          An xcharval specifies a value of a particular transport 
          characteristic
          with the particular characteristic identified by xchv_which, while
          the associated value of that characteristic is contained within
          xchv_data.
        </t>
        <t>
          While xchv_data is defined as opaque within the XDR, the contents
          are interpreted using the XDR typedef associated with the
          characteristic specified by xchv_which.  The receiver of a message
          containing an xcharval MUST report an XDR error if 
          the length of xchv_data is such that it extends beyond the 
          bounds of the message transferred.
        </t>
        <t>
          In cases in which the xcharid specified by xchv_which is understood
          by the receiver, the receiver also MUST report an XDR error
          if either of
          the following occur:  
        <list style="symbols">
          <t>
            The nominally opaque data within xchv_data is not valid
            when interpreted using the characteristic-associated typedef.
          </t>
          <t>
            The length of xchv_data is insufficient to contain the data
            represented by the characteristic-associated typedef.
          </t>
        </list>
        </t>
        <t>
          Note that no error is to be reported if xchv_which is unknown
          to the receiver.
          In that case, that xcharval is not processed and processing 
          continues using the next xcharval, if any.
        </t>
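        <t>
          As an illustration only, the receiver-side rules above might 
          be sketched in Python as follows.  The XdrError exception, the 
          helper names, and the decoder table are hypothetical and are 
          not part of this protocol.  The sketch assumes standard XDR 
          encoding as in RFC 4506 (big-endian four-octet units, with 
          opaque data padded to a multiple of four octets) and ignores 
          any octets in xchv_data beyond those its typedef requires.
        </t>
        <figure align="left">
          <artwork xml:space="preserve" align="left"><![CDATA[
```python
import struct

# Characteristic codes defined in this document.
XCHAR_RBSIZ, XCHAR_REQREMINV, XCHAR_BRS = 1, 2, 3


class XdrError(Exception):
    """Hypothetical stand-in for 'report an XDR error'."""


def _u32(buf, off):
    """Read one big-endian uint32, failing if it is truncated."""
    if off + 4 > len(buf):
        raise XdrError("truncated uint32")
    return struct.unpack_from(">I", buf, off)[0], off + 4


def _dec_uint32(data):
    if len(data) < 4:
        raise XdrError("xchv_data too short for its typedef")
    return struct.unpack_from(">I", data)[0]


def _dec_bool(data):
    value = _dec_uint32(data)
    if value not in (0, 1):
        raise XdrError("invalid bool")
    return bool(value)


def _dec_bkreqsup(data):
    value = _dec_uint32(data)
    if value not in (0, 1, 2):
        raise XdrError("invalid enum bkreqsup")
    return value


DECODERS = {
    XCHAR_RBSIZ: _dec_uint32,
    XCHAR_REQREMINV: _dec_bool,
    XCHAR_BRS: _dec_bkreqsup,
}


def decode_xcharspec(buf, off=0):
    """Return ({xcharid: value}, next offset) for known ids only."""
    count, off = _u32(buf, off)
    values = {}
    for _ in range(count):
        which, off = _u32(buf, off)
        length, off = _u32(buf, off)
        padded = (length + 3) & ~3  # XDR pads opaque data
        if off + padded > len(buf):
            raise XdrError("xchv_data extends beyond the message")
        data = buf[off:off + length]
        off += padded
        decoder = DECODERS.get(which)
        if decoder is not None:  # unknown xchv_which: skip, no error
            values[which] = decoder(data)
    return values, off
```
]]></artwork>
        </figure>
        <t>
          In this sketch an xcharval with an unrecognized xchv_which is 
          stepped over using its length alone, so decoding continues 
          with the next xcharval, while a malformed value for a 
          recognized characteristic raises the error.
        </t>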
        <t>
          An xcharspec specifies a set of transport characteristics.  No
          particular ordering of the xcharvals within it is imposed. 
        </t>
        <t>
          An xcharsubset identifies a subset of the characteristics in a 
          previously specified xcharspec.  Each bit in the mask denotes
          a particular element in a previously specified xcharspec.  If
          a particular xcharval is at position N in the array, then bit
          number N mod 32 in word N div 32 specifies whether that particular
          xcharval is included in the defined subset.  Words beyond the last
          one specified are treated as containing zero.    
        </t>
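        <t>
          The mapping between array positions and mask bits might be 
          sketched in Python as shown here.  The function names are 
          hypothetical, and the sketch assumes that bit number 0 is the 
          least significant bit of each word, which this document does 
          not state explicitly.
        </t>
        <figure align="left">
          <artwork xml:space="preserve" align="left"><![CDATA[
```python
def make_subset(positions):
    """Build an xcharsubset (list of uint32 words) from positions."""
    words = []
    for n in positions:
        word, bit = divmod(n, 32)  # word N div 32, bit N mod 32
        while len(words) <= word:
            words.append(0)
        words[word] |= 1 << bit    # assumes bit 0 is the LSB
    return words


def in_subset(subset, n):
    """Report whether the xcharval at position n is in the subset."""
    word, bit = divmod(n, 32)
    if word >= len(subset):
        return False               # missing words count as zero
    return bool((subset[word] >> bit) & 1)
```
]]></artwork>
        </figure>
        <t>
          For example, under that assumption make_subset([0, 33]) yields 
          the two words 1 and 2, and a query for any position of 64 or 
          above against that subset reports that the position is not 
          included.
        </t>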
        <t>
          xcharsubsets are useful in a number of contexts:
        <list style="symbols">
          <t>
            In an initial specification of transport characteristics,
            they allow the sender to specify which of those 
            characteristics are not expected to change later and, by 
            implication, which remain subject to later change.
          </t>
          <t>
            In responding to a request to modify a set of transport 
            characteristics, they allow the responding endpoint to specify 
            the subsets of those characteristics for which the change has 
            been performed, has been rejected, or has been accepted for 
            later completion, with notification of that change to be 
            done asynchronously. 
          </t>
        </list>
        </t>
      </section>
      <section title="Transport Characteristics Groups" anchor="CHAR-groups">
        <t>
          Transport characteristics are divided into a number of groups:
        <list style="symbols">
          <t>
            An initial set of transport characteristics defined in this
            document.  See <xref target="INIT" /> for the complete list.
          </t>
          <t>
            Additional transport characteristics defined in future standards
            track documents as specified in <xref target="EXT-addl" />. 
          </t>
          <t>
            Experimental transport characteristics being explored preparatory 
            to being considered for standards track definition.  See the 
            description in 
            <xref target="EXT-exp" />. 
          </t>
        </list>
        </t>
      </section>
      <section title="Operations Related to Transport Characteristics" 
               anchor="CHAR-ops">
        <t>
          There are a number of operations defined in <xref target="OPS" />
          which are used to communicate and manage transport 
          characteristics.
        </t>
        <t>
           Prime among these is ROPT_INITXCH (defined in 
           <xref target="OPS-init" />), which serves as a means by which 
           an endpoint's transport characteristics may be presented to 
           its peer, typically upon establishing a connection. 
        </t>
        <t>
           In addition, there are a set of related operations concerned with
           requesting, effecting and reporting changes in transport 
           characteristics:
        <list style="symbols">
          <t>
            ROPT_REQXCH (defined in <xref target="OPS-req" />), which serves 
            as a way for an endpoint to request that a peer change
            the value of a set of transport characteristics.
          </t>
          <t>
            ROPT_RESPXCH (defined in <xref target="OPS-resp" />) is used to
            report on the disposition of each of the individual transport
            characteristic changes requested in a previous 
            ROPT_REQXCH.
          </t>
          <t>
            ROPT_UPDXCH (defined in <xref target="OPS-upd" />) is used to
            report a change in a transport characteristic.  This may be 
            a change requested by a previous ROPT_REQXCH or an 
            unsolicited one that no peer requested.
          </t>
        </list>
        </t>
        <t>
          Unlike many other operation types, the above are not used to
          effect transfer of RPC requests but are internal one-way 
          information transfers.  However, a ROPT_REQXCH and the corresponding
          ROPT_RESPXCH do constitute an RPC-like remote call.  The other 
          operations are not part of a remote call transaction, although
          one or more asynchronous ROPT_UPDXCH operations may result from 
          a ROPT_REQXCH.
        </t>
      </section>
    </section>
    <section title="Initial Transport Characteristics" anchor="INIT">
      <t>
        Although the set of transport characteristics is subject to later 
        extension, an initial set of transport characteristics is defined
        below in <xref target="chtab"/>.
      </t>
      <t>
        In that table, the columns contain the following information:
        <list style="symbols">
          <t>
            The column labeled "characteristic" identifies the transport 
            characteristic described by the current row.
          </t>
          <t>
            The column labeled "code" specifies the xcharid value
            used to identify this characteristic.
          </t>
          <t>
            The column labeled "XDR type" gives the XDR type of the data used 
            to communicate the value of this characteristic.  This data 
            type overlays the nominally opaque field xchv_data in an
            xcharval. 
          </t>
          <t>
            The column labeled "default" gives the default value for the
            characteristic which is to be assumed by those who do not
            receive, or are unable to interpret, information about the
            actual value of the characteristic.
          </t>
          <t>
            The column labeled "section" indicates the section (within this
            document) that explains the semantics and use of this transport 
            characteristic.
          </t>
        </list>
      </t>
      <texttable align="left" style="full" anchor="chtab">
        <ttcol>
          characteristic
        </ttcol>
        <ttcol>
          code
        </ttcol>
        <ttcol>
          XDR type
        </ttcol>
        <ttcol>
          default
        </ttcol>
        <ttcol>
          section
        </ttcol>
        <c>Receive Buffer Size</c>
        <c>1</c>
        <c>uint32</c>
        <c>4096</c>
        <c><xref target="INIT-rbs" format="counter"/></c>
        <c>Requester Remote Invalidation</c>
        <c>2</c>
        <c>bool</c>
        <c>false</c>
        <c><xref target="INIT-rqri" format="counter"/></c>
        <c>Backward Request Support</c>
        <c>3</c>
        <c>enum bkreqsup</c>
        <c>BKREQSUP_INLINE</c>
        <c><xref target="INIT-brs" format="counter"/></c>
      </texttable>
      <t>
        Note that there is no explicit indication regarding whether 
        a particular characteristic can change or whether a change
        in the value may be requested (see <xref target="OPS-req"/>).
        Such matters are not addressed by the protocol definition.  
        A partner implementation can always request a change, but 
        peers MAY reject a request to change a characteristic for 
        any reason, including being unable or unwilling to effect 
        the requested change.  
      </t>
      <t>
        Either of the following will result in effective rejection 
        of requests to change specific characteristics:
      <list style="symbols">
        <t>
          If an endpoint does not wish to accept requests to 
          change particular characteristics, it may reject such requests
          as described in <xref target="OPS-resp" />. 
        </t>
        <t>
          If an endpoint does not support the ROPT_REQXCH operation, the 
          effect would be the same as if every request to change a 
          set of characteristics were rejected.
        </t>
      </list>
      </t>
      <t>
        With regard to unrequested changes in
        transport characteristics, it is the responsibility 
        of the implementation
        making the change to do so in a fashion that does not 
        interfere with
        the other partner's continued correct operation (see 
        <xref target="INIT-rbs"/>). 
      </t>
      <section title="Receive Buffer Size" 
               anchor="INIT-rbs">
        <t>
          The Receive Buffer Size specifies the minimum size, in octets, 
          of pre-posted
          receive buffers.  It is the responsibility of the participant 
          sending this value to ensure that its pre-posted receives are 
          at least
          the size specified, allowing the participant receiving this value 
          to send messages that are of this size.
        </t>
        <t>
        <figure align="left">
          <artwork xml:space="preserve" align="left">
&lt;CODE BEGINS&gt;

const uint32    XCHAR_RBSIZ = 1;
typedef uint32  xchrbsiz;

&lt;CODE ENDS&gt;
          </artwork>
        </figure> 
        </t>
        <t>
          The sender may use its knowledge of the receiver's buffer size 
          to determine
          when the message to be sent will fit in the pre-posted receive
          buffers that the receiver has set up.  In particular: 
        <list style="symbols">
          <t>
            Requesters may use the value to determine when it is necessary
            to provide a Position-Zero read chunk when sending a request.
          </t>
          <t>
            Requesters may use the value to determine when it is necessary
            to provide a Reply chunk when sending a request, based on the
            maximum possible size of the reply.
          </t>
          <t>
            Responders may use the value to determine when it is necessary,
            given the actual size of the reply, to actually use a Reply chunk
            provided by the requester.
          </t>
        </list>
        </t>
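        <t>
          These decisions might be sketched in Python as follows.  The 
          function names are hypothetical, and the sizes passed in are 
          assumed to be total message sizes, including the transport 
          header.  The sketch also reads the rules above as comparing 
          each message against the Receive Buffer Size of the endpoint 
          that will receive it; that reading is an assumption.
        </t>
        <figure align="left">
          <artwork xml:space="preserve" align="left"><![CDATA[
```python
DEFAULT_RBSIZ = 4096  # default Receive Buffer Size in this document


def fits_inline(message_size, receiver_rbsiz=DEFAULT_RBSIZ):
    """True if a message fits in one pre-posted receive buffer."""
    return message_size <= receiver_rbsiz


def requester_plan(call_size, max_reply_size,
                   responder_rbsiz=DEFAULT_RBSIZ,
                   requester_rbsiz=DEFAULT_RBSIZ):
    """Decide which chunks a requester provides with a request."""
    return {
        # The call itself is too large for the responder's buffers.
        "position_zero_read_chunk":
            not fits_inline(call_size, responder_rbsiz),
        # The largest possible reply would not fit in our buffers.
        "reply_chunk":
            not fits_inline(max_reply_size, requester_rbsiz),
    }


def responder_uses_reply_chunk(reply_size,
                               requester_rbsiz=DEFAULT_RBSIZ):
    """Decide, from the actual reply size, whether to use the chunk."""
    return not fits_inline(reply_size, requester_rbsiz)
```
]]></artwork>
        </figure>
        <t>
          When a peer has not reported a value, the sketch falls back to 
          the default of 4096 octets given in <xref target="chtab"/>.
        </t>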
        <t>
          Because there may be pre-posted receives with buffer sizes that
          reflect earlier values of the buffer size characteristic, changing
          this characteristic poses special difficulties:
        <list style="symbols">
          <t>
            When the size is being raised, the partner should not be informed
            of the change until all pending receives using the older value 
            have been eliminated.
          </t>
          <t>
            The size should not be reduced until the partner is aware of the
            need to reduce the size of future sends to conform to this
            reduced value.  To ensure this, such a change should only 
            occur in response to an explicit request by the other endpoint
            (see <xref target="OPS-req"/>).  The participant making the 
            request should use that
            lower size as the send size limit until the request is rejected
            (see <xref target="OPS-resp"/>) or an update to a size
            larger than the requested value becomes
            effective and the requested change is no longer pending
            (see <xref target="OPS-upd"/>).            
          </t>
        </list>
        </t>
      </section>
      <section title="Requester Remote Invalidation" anchor="INIT-rqri"> 
        <t>
          The Requester Remote Invalidation characteristic indicates that 
          the current endpoint, when in the role of a requester, is 
          prepared for the responder to use RDMA Send With Invalidate 
          when replying to an RPC-over-RDMA request containing non-empty 
          chunk lists.
        </t>
        <t>
          As RPC-over-RDMA is currently used, memory registrations exposed 
          to peers are not established by the server and explicit RDMA 
          operations are not done to satisfy backward direction requests.  
          This makes it unlikely that servers will present non-default 
          values of the XCHAR_REQREMINV characteristic or that clients will 
          take note of that value when presented by servers.
        </t>
        <figure align="left">
          <artwork xml:space="preserve" align="left">
&lt;CODE BEGINS&gt;

const uint32    XCHAR_REQREMINV = 2;
typedef bool    xchrreqrem;

&lt;CODE ENDS&gt;
          </artwork>
        </figure> 
      
        <t>
          When the Requester Remote Invalidation characteristic is set 
          to false, a responder MUST use Send to convey RPC reply 
          messages to the requester. When the Requester Remote 
          Invalidation characteristic is set to true, a responder MAY 
          use Send With Invalidate instead of Send to convey RPC replies 
          to the requester.
        </t>
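        <t>
          A responder's choice might be sketched in Python as follows.  
          The function name, its parameters, and the returned strings 
          are hypothetical.  Restricting Send With Invalidate to 
          requests that carried non-empty chunk lists follows the 
          description at the start of this section, and treating the 
          responder's own preference as a separate input reflects the 
          fact that use of Send With Invalidate is only permitted, 
          never required.
        </t>
        <figure align="left">
          <artwork xml:space="preserve" align="left"><![CDATA[
```python
def choose_reply_op(requester_reminv=False,
                    request_has_chunks=False,
                    responder_prefers_invalidate=True):
    """Pick the RDMA operation used to convey an RPC reply.

    requester_reminv is the value of XCHAR_REQREMINV reported by
    the requester; it defaults to false when never reported.
    """
    if not requester_reminv:
        return "Send"              # MUST use Send
    if request_has_chunks and responder_prefers_invalidate:
        return "Send With Invalidate"  # MAY, not required
    return "Send"
```
]]></artwork>
        </figure>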
        <t>
          The value of the Requester Remote Invalidation characteristic 
          is not likely to change from the value reported by ROPT_INITXCH 
          (see <xref target="OPS-init" />).
        </t>
      </section>
      <section title="Backward Request Support" 
               anchor="INIT-brs">
        <t>
          The value of this characteristic is used to indicate a client
          implementation's readiness to accept and process messages that are 
          part of backward-direction RPC requests.
           
        </t>
        <t>
        <figure align="left">
          <artwork xml:space="preserve" align="left">
&lt;CODE BEGINS&gt;

enum bkreqsup {
        BKREQSUP_NONE    = 0,
        BKREQSUP_INLINE  = 1,
        BKREQSUP_GENL    = 2
};

const uint32    XCHAR_BRS = 3;
typedef bkreqsup xchrbrs;                

&lt;CODE ENDS&gt;
          </artwork>
        </figure> 
        </t>
        <t>
          Multiple levels of support are distinguished:
        <list style="symbols">
          <t>
            The value BKREQSUP_NONE indicates that receipt of 
            backward-direction requests and replies is not supported.         
          </t>
          <t>

            The value BKREQSUP_INLINE indicates that receipt of 
            backward-direction requests or replies is 
            only supported using inline messages and that use of
            explicit RDMA operations for backward direction requests
            or responses is not supported. 
          </t>
          <t>
            The value BKREQSUP_GENL indicates that receipt of backward-direction 
            requests or replies is supported in the same
            ways that forward-direction requests or replies typically are.
          </t>
        </list>
        </t>
        <t>
          The support level of servers can be inferred from the backward-
          direction requests that they issue, assuming that issuing a 
          request implicitly indicates support for receiving the 
          corresponding reply.  On this basis, support for receiving
          inline replies can be assumed when requests without read
          chunks, write chunks, or Reply chunks are issued, while 
          requests with any of these elements allow the client to assume
          that general support for backward-direction replies is 
          present on the server.
        </t>
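        <t>
          The inference described above might be sketched in Python as 
          follows.  The function names and the representation of an 
          observed request as three booleans are hypothetical.  Taking 
          the highest level seen across several requests is also an 
          assumption of the sketch.
        </t>
        <figure align="left">
          <artwork xml:space="preserve" align="left"><![CDATA[
```python
# Values of enum bkreqsup as defined in this document.
BKREQSUP_NONE, BKREQSUP_INLINE, BKREQSUP_GENL = 0, 1, 2


def inferred_level(has_read, has_write, has_reply_chunk):
    """Infer reply-receipt support from one backward request."""
    if has_read or has_write or has_reply_chunk:
        return BKREQSUP_GENL    # request used chunks
    return BKREQSUP_INLINE      # request was purely inline


def server_level(observed_requests):
    """Highest level inferred so far, or None if nothing was seen."""
    levels = [inferred_level(*req) for req in observed_requests]
    return max(levels) if levels else None
```
]]></artwork>
        </figure>
        <t>
          Until the server has issued at least one backward-direction 
          request, the sketch returns None, since nothing can yet be 
          inferred.
        </t>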
      </section>
    </section>
    <section title="New Operations" anchor="OPS">
      <t>
        The proposed new operations are set forth in <xref target="optab"/>
        below. In that table, the columns contain the following
        information:
      <list style="symbols">
        <t>
          The column labeled "operation" specifies the particular operation. 
        </t>
        <t>
          The column labeled "code" specifies the value of opttype for this
          operation.
        </t>
        <t>
          The column labeled "XDR type" gives the XDR type of the data 
          structure 
          used to describe the information in this new message type.  
          This data overlays the nominally opaque field optinfo in an
          RDMA_OPTIONAL message.
        </t>
        <t>
          The column labeled "msg" indicates whether this operation is
          followed (or not) by an RPC message payload.
        </t>
        <t>
          The column labeled "section" indicates the section (within this
          document) that explains the semantics and use of this optional 
          operation.
 
        </t>
      </list>
      </t>
      <texttable align="left" style="full" anchor="optab">
        <ttcol>
          operation
        </ttcol>
        <ttcol>
          code
        </ttcol>
        <ttcol>
          XDR type
        </ttcol>
        <ttcol>
          msg
        </ttcol>
        <ttcol>
          section
        </ttcol>
        <c>Specify Initial Characteristics</c>
        <c>1</c>
        <c>optinfo_initxch</c>
        <c>No</c>
        <c><xref target="OPS-init" format="counter"/></c>
        <c>Request Characteristic Modification</c>
        <c>2</c>
        <c>optinfo_reqxch</c>
        <c>No</c>
        <c><xref target="OPS-req" format="counter"/></c>
        <c>Respond to Modification Request</c>
        <c>3</c>
        <c>optinfo_respxch</c>
        <c>No</c>
        <c><xref target="OPS-resp" format="counter"/></c>
        <c>Report Updated Characteristics</c>
        <c>4</c>
        <c>optinfo_updxch</c>
        <c>No</c>
        <c><xref target="OPS-upd" format="counter"/></c>
      </texttable>
      <t>
        Support for all of the operations above is OPTIONAL.  RPC-over-RDMA 
        Version Two implementations that receive an operation that is not
        supported MUST
        respond with an RDMA_ERROR message with an error code of
        RDMA_ERR_INVAL_OPTION as specified in <xref target="rpcrdmav2"/>.
      </t>
      <t>
        The only operation support requirements are as follows:
        <list style="symbols">
          <t>
            Implementations which send ROPT_REQXCH messages must support
            ROPT_RESPXCH and ROPT_UPDXCH messages.
          </t>
          <t>
            Implementations which support ROPT_RESPXCH or ROPT_UPDXCH messages
            must also support ROPT_INITXCH messages.
          </t>
        </list>
      </t>
      <section title="ROPT_INITXCH: Specify Initial Characteristics" 
               anchor="OPS-init">
        <t>
          The ROPT_INITXCH message type allows an RPC-over-RDMA participant,
          whether client or server, to indicate to its partner relevant
          transport characteristics that the partner might need to be 
          aware of.
        </t>
        <t>
          The message definition for this operation is as follows: 
        <figure align="left">
          <artwork xml:space="preserve" align="left">
&lt;CODE BEGINS&gt;

const uint32     ROPT_INITXCH = 1;

struct optinfo_initxch {
        xcharspec       optixch_start;
        xcharsubset     optixch_nochg;
};        
 
&lt;CODE ENDS&gt;
          </artwork>
        </figure> 
        </t>
        <t>
          All relevant transport characteristics that the sender is aware of
          should be included in optixch_start. Since support of this request 
          is OPTIONAL, and since each of the characteristics is OPTIONAL 
          as well,
          the sender cannot assume that the receiver will necessarily take 
          note of these characteristics and so the sender 
          should be prepared for cases in
          which the partner continues to assume that the default value for a 
          particular characteristic is still in effect.   
        </t>
        <t> 
          The subset of transport characteristics specified by optixch_nochg
          is not expected to change during the lifetime of the connection.
        </t>
        <t>
          Generally, a participant will send a ROPT_INITXCH message as the
          first message after a connection is established.  Given that fact,
          the sender should make sure that the message can be received by
          partners who use the default Receive Buffer Size. The connection's 
          initial receive buffer size is typically 1KB, but it depends 
          on the initial connection state of the RPC-over-RDMA version 
          in use. See <xref target="rpcrdmav2" /> for details.
        </t>
        <t>
          Those receiving a ROPT_INITXCH may encounter characteristics that 
          they do not support or are unaware of.  In such cases, these
          characteristics are simply ignored without any error response 
          being generated.
        </t>
      </section>
      <section title="ROPT_REQXCH: Request Modification of Characteristics" 
               anchor="OPS-req">
        <t>
          The ROPT_REQXCH message type allows an RPC-over-RDMA participant,
          whether client or server, to request of its partner that relevant 
          transport characteristics be changed.
        </t>
        <t>
          The partner need not change the characteristics as requested by the
          sender but if it does support the message type, it will generate
          a ROPT_RESPXCH message, indicating the disposition of the request. 
        </t>
        <t>
          The message definition for this operation is as follows: 
        <figure align="left">
          <artwork xml:space="preserve" align="left">
&lt;CODE BEGINS&gt;

const uint32     ROPT_REQXCH = 2;

struct optinfo_reqxch {
        xcharspec       optreqxch_want;
};        
 
&lt;CODE ENDS&gt;
          </artwork>
        </figure> 
        </t>
        <t>
          The xcharspec optreqxch_want is a set of transport characteristics 
          together with the desired values requested by the sender.
        </t>
      </section>
      <section title="ROPT_RESPXCH: Respond to Request to Modify Transport Characteristics" 
               anchor="OPS-resp">
        <t>
          The ROPT_RESPXCH message type allows an RPC-over-RDMA participant
          to respond to a request to change characteristics by its partner,
          indicating how the request was dealt with. 
        </t>
        <t>
          The message definition for this operation is as follows: 
        <figure align="left">
          <artwork xml:space="preserve" align="left">
&lt;CODE BEGINS&gt;

const uint32     ROPT_RESPXCH = 3;

struct optinfo_respxch {
        xcharsubset     optrespxch_done;
        xcharsubset     optrespxch_rej;
        xcharsubset     optrespxch_pend;                        
};        
 
&lt;CODE ENDS&gt;
          </artwork>
        </figure> 
        </t>
        <t>
          The rdma_xid field of this message must match that used in the 
          ROPT_REQXCH message to which this message is responding.
        </t>
        <t>
          The optrespxch_done field indicates which of the requested transport
          characteristic changes have been immediately effected.  For each
          such characteristic, the receiver is entitled to conclude that the 
          requested change has been made and that future transmissions may be 
          made based on the new value.
        </t>
        <t>
          The optrespxch_rej field indicates which of the requested transport
          characteristic changes have been rejected by the sender of this
          message.  A change may be rejected for any of the following reasons:
        <list style="symbols">
          <t>
            The particular characteristic specified is not known or supported
            by the receiver of the ROPT_REQXCH message.
          </t>
          <t>
            The implementation receiving the ROPT_REQXCH message does not
            support modification of this characteristic.
          </t>
          <t>
            The implementation receiving the ROPT_REQXCH message has
            rejected the modification for another reason.
          </t>
        </list>
        </t>
        <t>
          The optrespxch_pend field indicates which of the requested transport
          characteristic modifications remain pending, since they were neither 
          rejected nor effected immediately.  The receiver can expect each
          such modification to be resolved by a later ROPT_UPDXCH message,
          although there is no way to determine when this will happen.  For
          each characteristic bit set in this field, one or more ROPT_UPDXCH
          messages can be expected, the last of which will have the
          optupdxch_pendclr flag set.
        </t>
        <t>
          The subsets of characteristics specified by optrespxch_done,
          optrespxch_rej, and optrespxch_pend should not overlap and, when
          ORed together, should cover the entire set of characteristics
          specified by optreqxch_want in the corresponding request. 
        </t>
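        <t>
          As an illustration only, the consistency rule above can be
          checked mechanically.  The following Python sketch is not part
          of the protocol.  It assumes that each xcharsubset, and the set
          of characteristics named in optreqxch_want, has already been
          decoded into a single integer bit mask; the function name
          respxch_is_consistent is hypothetical.
        <figure align="left">
          <artwork xml:space="preserve" align="left"><![CDATA[
```python
def respxch_is_consistent(want_mask, done, rej, pend):
    """Check a ROPT_RESPXCH reply against the request it answers.

    All arguments are integer bit masks (an assumption of this
    sketch): want_mask covers the characteristics named in
    optreqxch_want; done, rej and pend are the three reply subsets.
    """
    # No characteristic may be reported in more than one subset.
    no_overlap = (
        (done & rej) == 0
        and (done & pend) == 0
        and (rej & pend) == 0
    )
    # Together the subsets must cover exactly what was requested.
    covers_request = (done | rej | pend) == want_mask
    return no_overlap and covers_request
```
]]></artwork>
        </figure> 
        </t>
        <t>
          A receiver might treat a reply that fails such a check as a
          protocol error, although this document does not prescribe a
          specific reaction.
        </t>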
      </section>
      <section title="ROPT_UPDXCH: Update Transport Characteristics" 
               anchor="OPS-upd">
        <t>
          The ROPT_UPDXCH message type allows an RPC-over-RDMA participant
          to notify the other participant that a change to the transport 
          characteristics has occurred.
        </t>
        <t>
          This may be because:
        <list style="symbols">
          <t>
            A change requested by a ROPT_REQXCH message has, after some 
            delay, been effected.
          </t>
          <t>
            The sender has decided, independently, to modify the transport 
            characteristic and is notifying the receiver of this change.
          </t>
        </list>
        </t>
        <t>
          Note that there is no way to tie a message reporting a change
          to the specific request that asked for the change.  In
          particular, the rdma_xid field in this message is independent
          of that for any earlier ROPT_REQXCH message. 
        </t>
        <t>
          The message definition for this operation is as follows: 
        <figure align="left">
          <artwork xml:space="preserve" align="left">

&lt;CODE BEGINS&gt;
 
const uint32     ROPT_UPDXCH = 4;

struct optinfo_updxch {
        xcharval        optupdxch_now;
        bool            optupdxch_pendclr;
};        

&lt;CODE ENDS&gt;

          </artwork>
        </figure> 
        </t>
        <t>
          optupdxch_now defines the new characteristic value to be used.
        </t>
        <t>
          optupdxch_pendclr, if true, indicates that a previous request to
          update the characteristic specified by optupdxch_now.xchv_which is
          no longer to be considered pending.  This may be set true even
          if the characteristic value is not changed from the previous 
          value.
        </t>
        <t>
          Some instances of ROPT_UPDXCH are the result of a previous 
          ROPT_REQXCH while others are unsolicited.  This 
          distinction relates to the setting of optupdxch_pendclr as
          follows:  
        <list style="symbols">
          <t>
            If a characteristic update is unsolicited, then optupdxch_pendclr 
            will always be false.
          </t>
          <t>
            If a characteristic update is prompted by a previous ROPT_REQXCH
            and optupdxch_pendclr is true, then the current message indicates
            the (asynchronous) completion of that previous change request.
          <vspace blankLines="1" /> 
            In this case the disposition of the change request can be
            determined using optupdxch_now.  If the value is that requested
            by the associated ROPT_REQXCH then the request was successful,
            while if the value is unchanged from the original value, the
            change can be considered rejected. 
          <vspace blankLines="1" /> 
            In cases in which the characteristic has a range of values,
            intermediate values are possible, indicating a partial 
            satisfaction of the original request.
          </t>
          <t>
            If a characteristic update is prompted by a previous ROPT_REQXCH
            and optupdxch_pendclr is false, then the current message does not
            indicate completion of a previous change request.
          <vspace blankLines="1" /> 
            In such cases, the characteristic value indicates the current
            value of the characteristic, which the receiver is entitled to
            rely upon, just as would have been the case if the change had
            been unsolicited.
          <vspace blankLines="1" /> 
            Nevertheless, the change request is still active and will remain
            so until a ROPT_UPDXCH with optupdxch_pendclr set to true is
            received.
          </t>
        </list>
        </t>
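        <t>
          The cases above can be summarized by a small receiver-side
          sketch.  The Python below is illustrative only and makes several
          assumptions: characteristic values have already been decoded
          into comparable Python values, the class XcharTracker and its
          method names are hypothetical, and note_pending is called only
          for characteristics reported in optrespxch_pend.
        <figure align="left">
          <artwork xml:space="preserve" align="left"><![CDATA[
```python
class XcharTracker:
    """Tracks the partner's characteristics and pending requests."""

    def __init__(self, initial):
        self.current = dict(initial)  # xcharid -> current value
        self.pending = {}             # xcharid -> (original, requested)

    def note_pending(self, which, requested):
        # Remember the value in effect when the request went pending.
        self.pending[which] = (self.current.get(which), requested)

    def on_updxch(self, which, value, pendclr):
        """Apply optupdxch_now and classify the message."""
        # The reported value can always be relied upon.
        self.current[which] = value
        if not pendclr:
            # Unsolicited, or an interim update for a pending request.
            return "update"
        entry = self.pending.pop(which, None)
        if entry is None:
            # pendclr set with nothing pending is not expected.
            return "unexpected"
        original, requested = entry
        if value == requested:
            return "granted"
        if value == original:
            return "rejected"
        return "partial"
```
]]></artwork>
        </figure> 
        </t>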
      </section>
    </section>
    <section title="XDR" anchor="XDR">

      <t>
        This section contains an XDR <xref target="RFC4506"/>  description 
        of the proposed extension.
      </t>
      <t>
       This description is provided in a way that makes it simple to 
       extract into ready-to-use form.  The reader can apply the 
       following shell script to this document to produce a machine-readable 
       XDR description of the extension, which can be combined with the
       XDR for the base protocol to produce an XDR description that
       covers both the base protocol and the optional extensions.
     <figure align="left">
       <artwork xml:space="preserve" align="left">

&lt;CODE BEGINS&gt;

#!/bin/sh
grep '^ *///' | sed 's?^ */// ??' | sed 's?^ *///??'

&lt;CODE ENDS&gt;

       </artwork>
      </figure>
      </t>
      <t>
        That is, if the above script is stored in a file called 
        "extract.sh" and this document is in a file called "ext.txt" then 
        the reader can do the following to extract an XDR description file
        for this extension: 
      <figure align="left">
        <artwork xml:space="preserve" align="left">

&lt;CODE BEGINS&gt;

sh extract.sh &lt; ext.txt &gt; charext.x

&lt;CODE ENDS&gt;

        </artwork>
      </figure>
      </t>
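      <t>
        Where a POSIX shell is not available, a similar extraction can
        be approximated in Python.  The sketch below is illustrative
        only.  It assumes that extractable lines begin with optional
        spaces followed by three slashes and at most one separating
        space; the function name extract_xdr is hypothetical.
      <figure align="left">
        <artwork xml:space="preserve" align="left"><![CDATA[
```python
import re

# Optional leading spaces, the "///" marker, at most one space.
_MARKER = re.compile(r"^ */// ?(.*)$")


def extract_xdr(lines):
    """Return the XDR text carried by '///'-marked lines."""
    extracted = []
    for line in lines:
        match = _MARKER.match(line.rstrip("\n"))
        if match:
            # Keep only what follows the marker; drop other lines.
            extracted.append(match.group(1))
    return extracted
```
]]></artwork>
      </figure>
      </t>
      <t>
        For example, passing the lines of "ext.txt" to extract_xdr and
        writing the result, one entry per line, to "charext.x" would
        play the same role as the shell command above.
      </t>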
      <section title="Code Component License" toc="default">
        <t>
          Code components extracted from this document must include 
          the following license text.  When the extracted XDR code is 
          combined with other complementary XDR code which itself has 
          an identical license, only a single copy of the license text 
          need be preserved.  
        <figure align="left">
          <artwork xml:space="preserve" align="left">

&lt;CODE BEGINS&gt;

/// /*
///  * Copyright (c) 2010, 2016 IETF Trust and the persons
///  * identified as authors of the code.  All rights reserved.
///  *
///  * The author of the code is: D. Noveck.
///  *
///  * Redistribution and use in source and binary forms, with
///  * or without modification, are permitted provided that the
///  * following conditions are met:
///  *
///  * - Redistributions of source code must retain the above
///  *   copyright notice, this list of conditions and the
///  *   following disclaimer.
///  *
///  * - Redistributions in binary form must reproduce the above
///  *   copyright notice, this list of conditions and the
///  *   following disclaimer in the documentation and/or other
///  *   materials provided with the distribution.
///  *
///  * - Neither the name of Internet Society, IETF or IETF
///  *   Trust, nor the names of specific contributors, may be
///  *   used to endorse or promote products derived from this
///  *   software without specific prior written permission.
///  *
///  *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS
///  *   AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED
///  *   WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
///  *   IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
///  *   FOR A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO
///  *   EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
///  *   LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
///  *   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
///  *   NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
///  *   SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
///  *   INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
///  *   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
///  *   OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
///  *   IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
///  *   ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
///  */

&lt;CODE ENDS&gt;

          </artwork>
        </figure> 
        </t>
      </section>
      <section title="XDR Proper for Extension">
        <t>
        <figure align="left">
          <artwork xml:space="preserve" align="left">

&lt;CODE BEGINS&gt;

///
////*
/// * Basic transport characteristic types
/// */ 
///typedef uint32          xcharid;
///
///struct xcharval {
///        xcharid         xchv_which;
///        opaque          xchv_data&lt;&gt;;
///};
///
///typedef xcharval        xcharspec&lt;&gt;;
///
///typedef uint32          xcharsubset&lt;&gt;;
///
////*
/// * Transport characteristic codes
/// */
///const uint32    XCHAR_RBSIZ = 1;
///const uint32    XCHAR_REQREMINV = 2;
///const uint32    XCHAR_BRS = 3;
///
////*
/// * Other transport characteristic types
/// */
///enum bkreqsup {
///        BKREQSUP_NONE    = 0,
///        BKREQSUP_INLINE  = 1,
///        BKREQSUP_GENL    = 2
///};
///
////*
/// * Transport characteristic typedefs
/// */
///typedef uint32   xchrbsiz;
///typedef bool     xchrreqrem;
///typedef bkreqsup xchrbrs;                
///
////*
/// * Optional operation codes
/// */
///const uint32     ROPT_INITXCH = 1;
///const uint32     ROPT_REQXCH = 2;
///const uint32     ROPT_RESPXCH = 3;
///const uint32     ROPT_UPDXCH = 4;
///
////*
/// * Optional operation message structures
/// */
///struct optinfo_initxch {
///        xcharspec       optixch_start;
///        xcharsubset     optixch_nochg;
///};       
///
///struct optinfo_reqxch {
///        xcharspec       optreqxch_want;
///};        
///
///struct optinfo_respxch {
///        xcharsubset     optrespxch_done;
///        xcharsubset     optrespxch_rej;
///        xcharsubset     optrespxch_pend;                        
///};        
///
///struct optinfo_updxch {
///        xcharval        optupdxch_now;
///        bool            optupdxch_pendclr;
///};        

&lt;CODE ENDS&gt;
          </artwork>
        </figure> 
        </t>
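        <t>
          To illustrate how these definitions appear on the wire, the
          following Python sketch encodes an xcharval and an xcharspec
          using the XDR rules of <xref target="RFC4506"/>.  It is not a
          reference implementation.  It assumes that xchv_data carries
          the XDR encoding of the characteristic's own type (here
          xchrbsiz, a uint32); the helper names and the value 4096 are
          chosen only for illustration.
        <figure align="left">
          <artwork xml:space="preserve" align="left"><![CDATA[
```python
import struct

XCHAR_RBSIZ = 1  # characteristic code from the XDR above


def xdr_uint32(value):
    # XDR unsigned integers are 4 bytes, big-endian.
    return struct.pack(">I", value)


def xdr_opaque(data):
    # Variable-length opaque: length, data, zero padding to 4 bytes.
    padding = (4 - len(data) % 4) % 4
    return xdr_uint32(len(data)) + data + b"\x00" * padding


def encode_xcharval(which, data):
    # struct xcharval { xcharid xchv_which; opaque xchv_data<>; }
    return xdr_uint32(which) + xdr_opaque(data)


def encode_xcharspec(values):
    # xcharspec is a variable-length array of xcharval.
    body = b"".join(encode_xcharval(w, d) for w, d in values)
    return xdr_uint32(len(values)) + body


# Assumed payload: a Receive buffer size of 4096 as an XDR uint32.
rbsiz = encode_xcharval(XCHAR_RBSIZ, xdr_uint32(4096))
```
]]></artwork>
        </figure> 
        </t>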
      </section>
    </section>
    <section title="Extensibility" anchor="EXT">
      <section title="Additional Characteristics" anchor="EXT-addl">
        <t>
          The set of transport characteristics is designed to be
          extensible.  As a result, once new characteristics are defined in 
          standards track documents, the operations defined in this
          document may reference these new transport characteristics,
          as well as the ones described in this document.
        </t>
        <t>
          A standards track document defining a new transport characteristic 
          should include the following information, paralleling that
          provided in this document for the transport characteristics
          defined herein.
        <list style="symbols">
          <t>
            The xcharid value used to identify this characteristic.
          </t>
          <t>
            The XDR typedef specifying the form in which the characteristic
            value is communicated. 
          </t>
          <t>
            A description of the transport characteristic that is communicated
            by the sender of ROPT_INITXCH and ROPT_UPDXCH and requested by
            the sender of ROPT_REQXCH. 
          </t>
          <t>
            An explanation of how this knowledge could be used by the
            participant receiving this information.
          </t>
          <t>
            The rules governing possible changes to the value of this
            characteristic. 
          </t>
        </list>
        </t>
        <t>
          The definition of transport characteristic structures is such as
          to make it easy to assign unique values.  There is no requirement 
          that a contiguous set of values be used, and implementations
          should not rely on all such values being small integers.  A unique
          value should be selected when the defining document is first
          published as an Internet-Draft.  When the document becomes a
          standards track document, the working group should ensure that: 
        <list style="symbols">
          <t>
            The xcharids specified in the document do not conflict with
            those currently assigned or in use by other pending working group
            documents defining transport characteristics.
          </t>
          <t>
            The xcharids specified in the document do not conflict with the
            range reserved for experimental use, as defined in 
            <xref target="EXT-exp"/>.
          </t>
        </list>
        </t>
        <t>
          Documents defining new characteristics fall into a number of 
          categories.
        <list style="symbols">
          <t>
            Those defining new characteristics and explaining (only) how they 
            affect use of existing message types.
          </t>
          <t>
            Those defining new OPTIONAL message types and new characteristics
            applicable to the operation of those new message types.
          </t>
          <t>
            Those defining new OPTIONAL message types and new characteristics
            applicable both to new and existing message types.
          </t>
        </list>
        </t>
        <t>
          When additional transport characteristics are proposed,
          the review of the associated standards track document should deal 
          with possible security issues raised by those new transport 
          characteristics. 
        </t>
      </section>
      <section title="Experimental Characteristics" anchor="EXT-exp">
        <t>
          Given the design of the transport characteristics data 
          structure, it is possible to use the operations to implement
          experimental, possibly unpublished, transport characteristics.
        </t>
        <t>
          xcharids in the range from 4,294,967,040 to 4,294,967,295 are
          reserved for experimental use and these values should not be
          assigned to new characteristics in standards track documents.
        </t>
        <t>
          When values in this range are used, there is no guarantee of
          successful interoperation among independent implementations. 
        </t>
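        <t>
          As a convenience, the reserved range corresponds to the 256
          values from 0xFFFFFF00 through 0xFFFFFFFF.  The following
          Python sketch shows one way an implementation might recognize
          such values; the constant and function names are hypothetical.
        <figure align="left">
          <artwork xml:space="preserve" align="left"><![CDATA[
```python
# Bounds of the experimental xcharid range given in this section.
XCHARID_EXP_FIRST = 4294967040  # 0xFFFFFF00
XCHARID_EXP_LAST = 4294967295   # 0xFFFFFFFF


def is_experimental_xcharid(xcharid):
    """True if xcharid lies in the range reserved for experiments."""
    return XCHARID_EXP_FIRST <= xcharid <= XCHARID_EXP_LAST
```
]]></artwork>
        </figure> 
        </t>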
      </section>
    </section>
    <section title="Security Considerations" anchor="SEC">
      <t>
        Like other fields that appear in each RPC-over-RDMA header, 
        characteristic information is sent in the clear on the fabric 
        with no integrity protection, making it vulnerable to 
        man-in-the-middle attacks.
      </t>
      <t>
        For example, if a man-in-the-middle were to change the value of 
        the Receive buffer size or the Requester Remote Invalidation boolean, 
        it could reduce connection performance or trigger loss of connection. 
        Repeated connection loss can impact performance or even prevent a 
        new connection from being established. The recourse is to deploy
        on a private network or to use link-layer encryption.
      </t>
    </section>
    <section title="IANA Considerations" anchor="IANA">
      <t>
        This document does not require any actions by IANA.
      </t>
    </section>
  </middle>
  <back>
    <references title="Normative References">
      &RFC2119;
      &RFC4506;
      <reference anchor="rfc5666bis" 
                 target="http://www.ietf.org/id/draft-ietf-nfsv4-rfc5666bis-07.txt">
        <front>
          <title>
            Remote Direct Memory Access Transport for Remote Procedure Call
          </title>

          <author initials="C." surname="Lever" role="editor">
            <organization>Oracle</organization>
          </author>
          <author initials="W." surname="Simpson">
            <organization>DayDreamer</organization>
          </author>
          <author initials="T." surname="Talpey">
            <organization>Microsoft</organization>
          </author>
          <date month="May" year="2016" />
        </front>
        <annotation>
          Work in progress.
        </annotation>
      </reference>
      <reference anchor="bidir" 
                 target="http://www.ietf.org/id/draft-ietf-nfsv4-rpcrdma-bidirection-02.txt">
        <front>
          <title>
            Size-Limited Bi-directional Remote Procedure Call 
            On Remote Direct Memory Access Transports
          </title>

          <author initials="C." surname="Lever">
            <organization>Oracle</organization>
          </author>
          <date month="April" year="2016" />
        </front>
        <annotation>
          Work in progress.
        </annotation>
      </reference>
      <reference anchor="rpcrdmav2" 
                 target="http://www.ietf.org/id/draft-cel-nfsv4-rpcrdma-version-two-01.txt">
        <front>
          <title>
            RPC-over-RDMA Version Two
          </title>

          <author initials="C." surname="Lever" role="editor">
            <organization>Oracle</organization>
          </author>
          <author initials="D." surname="Noveck">
            <organization>Hewlett Packard Enterprise</organization>
          </author>
          <date month="June" year="2016" />
        </front>
        <annotation>
          Work in progress.
        </annotation>
      </reference>
    </references>

    <references title="Informative References">
      &RFC5662;
      &RFC5666;
    </references>
    <section title="Acknowledgments" anchor="ACK">
      <t>
        The author gratefully acknowledges the work of Brent Callaghan and
        Tom Talpey producing the original RPC-over-RDMA Version One 
        specification <xref target="RFC5666" /> and also Tom's work in
        helping to clarify that specification. 
      </t>
      <t>
        The author also wishes to thank Chuck Lever for his work resurrecting 
        NFS support for RDMA in <xref target="rfc5666bis"/> and for his
        helpful review of and suggestions for this document. 
      </t>
      <t>
        The extract.sh shell script and formatting conventions were first
        described by the authors of the NFSv4.1 XDR specification 
        <xref target="RFC5662"/>.
      </t>
    </section>
  </back>
</rfc>
