<?xml version="1.0" encoding="US-ASCII"?>
<!-- This template is for creating an Internet Draft using xml2rfc,
     which is available here: http://xml.resource.org. -->

<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!-- One method to get references from the online citation libraries.
     There has to be one entity for each item to be referenced. 
     An alternate method (rfc include) is described in the references. -->

<!ENTITY RFC4656 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4656.xml">
<!ENTITY RFC5357 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5357.xml">
<!ENTITY RFC6812 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.6812.xml">
<!ENTITY RFC3954 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3954.xml">
<!ENTITY RFC4148 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4148.xml">
<!ENTITY RFC7297 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.7297.xml">
]>

<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<!-- used by XSLT processors -->
<!-- For a complete list and description of processing instructions (PIs), 
     please see http://xml.resource.org/authoring/README.html. -->
<!-- Below are generally applicable Processing Instructions (PIs) that most I-Ds might want to use.
     (Here they are set differently than their defaults in xml2rfc v1.32) -->
<?rfc strict="yes" ?>
<!-- give errors regarding ID-nits and DTD validation -->
<!-- control the table of contents (ToC) -->
<?rfc toc="yes"?>
<!-- generate a ToC -->
<?rfc tocdepth="4"?>
<!-- the number of levels of subsections in ToC. default: 3 -->
<!-- control references -->
<?rfc symrefs="yes"?>
<!-- use symbolic references tags, i.e, [RFC2119] instead of [1] -->
<?rfc sortrefs="yes" ?>
<!-- sort the reference entries alphabetically -->
<!-- control vertical white space 
     (using these PIs as follows is recommended by the RFC Editor) -->
<?rfc compact="yes" ?>
<!-- do not start each main section on a new page -->
<?rfc subcompact="no" ?>
<!-- keep one blank line between list items -->
<!-- end of list of popular I-D processing instructions -->
<rfc category="info" docName="draft-irtf-nmrg-autonomic-sla-violation-detection-04" ipr="trust200902">
  <!-- category values: std, bcp, info, exp, and historic
     ipr values: full3667, noModification3667, noDerivatives3667
     you can add the attributes updates="NNNN" and obsoletes="NNNN" 
     they will automatically be output with "(if approved)" -->

  <!-- ***** FRONT MATTER ***** -->

  <front>
    <!-- The abbreviated title is used in the page header - it is only necessary if the 
         full title is longer than 39 characters -->

    <title abbrev="AN Use Case Detection of SLA Violations">Autonomic Networking Use Case for Distributed Detection of SLA Violations</title>

    <!-- add 'role="editor"' below for the editors if appropriate -->

    <!-- Another author who claims to be an editor -->

    <author fullname="Jeferson Campos Nobre" initials="J.C." surname="Nobre">
      <organization>University of Vale do Rio dos Sinos</organization>

      <address>
        <postal>
          <street></street>

          <!-- Reorder these if your country does things differently -->

          <city>Porto Alegre</city>

          <region></region>

          <code></code>

          <country>Brazil</country>
        </postal>

        <phone></phone>

        <email>jcnobre@unisinos.br</email>

        <!-- uri and facsimile elements may also be added -->
      </address>
    </author>


    <author fullname="Lisandro Zambenedetti Granville" initials="L.Z." surname="Granville">
      <organization>Federal University of Rio Grande do Sul</organization>

      <address>
        <postal>
          <street></street>

          <!-- Reorder these if your country does things differently -->

          <city>Porto Alegre</city>

          <region></region>

          <code></code>

          <country>Brazil</country>
        </postal>

        <phone></phone>

        <email>granville@inf.ufrgs.br</email>

        <!-- uri and facsimile elements may also be added -->
      </address>
    </author>

    <author fullname="Alexander Clemm" initials="A." surname="Clemm">
      <organization>Sympotech</organization>

      <address>
        <postal>
          <street></street>

          <!-- Reorder these if your country does things differently -->

          <city>Los Gatos</city>

          <region></region>

          <code></code>

          <country>USA</country>
        </postal>

        <phone></phone>

        <email>alex@sympotech.com</email>

        <!-- uri and facsimile elements may also be added -->
      </address>
    </author>

    <author fullname="Alberto Gonzalez Prieto" initials="A.G." surname="Prieto">
      <organization>Cisco Systems</organization>

      <address>
        <postal>
          <street></street>

          <!-- Reorder these if your country does things differently -->

          <city>San Jose</city>

          <region></region>

          <code></code>

          <country>USA</country>
        </postal>

        <phone></phone>

        <email>albertgo@cisco.com</email>

        <!-- uri and facsimile elements may also be added -->
      </address>
    </author>


    <date month="October" year="2016" />

    <!-- If the month and year are both specified and are the current ones, xml2rfc will fill 
         in the current day for you. If only the current year is specified, xml2rfc will fill 
	 in the current day and month for you. If the year is not the current one, it is 
	 necessary to specify at least a month (xml2rfc assumes day="1" if not specified for the 
	 purpose of calculating the expiry date).  With drafts it is normally sufficient to 
	 specify just the year. -->

    <!-- Meta-data Declarations -->

    <area>General</area>

    <workgroup>Network Management Research Group</workgroup>

    <!-- WG name at the upperleft corner of the doc,
         IETF is fine for individual submissions.  
	 If this element is not present, the default is "Network Working Group",
         which is used by the RFC Editor as a nod to the history of the IETF. -->

    <keyword>Autonomic Networking</keyword>
    <keyword>SLA</keyword>
    <keyword>P2P</keyword>

    <!-- Keywords will be incorporated into HTML output
         files in a meta tag but they have no effect on text or nroff
         output. If you submit your draft to the RFC Editor, the
         keywords will be used for the search engine. -->

    <abstract>
      <t>This document describes a use case for autonomic networking in distributed detection of Service Level Agreement (SLA) violations.  It is one of a series of use cases intended to illustrate requirements for autonomic networking.</t>
    </abstract>
  </front>

  <middle>
    <section title="Introduction">
      <t>The Internet has grown dramatically in size, capacity, and accessibility in recent years. The communication requirements of distributed services and applications running on top of the Internet have become increasingly demanding. Examples include real-time interactive video and financial trading. Providing such services involves stringent requirements in terms of acceptable latency, loss, or jitter. Those requirements lead to the articulation of Service Level Objectives (SLOs) that are to be met. Those SLOs become part of Service Level Agreements (SLAs) that articulate a contract between the provider and the consumer of a service. To fulfill a service, it must be ensured that the SLOs are met. Examples of service fulfillment clauses can be found in <xref target="RFC7297" />. Violations of SLOs can be associated with significant financial loss, which can be divided into two types. First, there is the loss incurred by the service users (e.g., the trader whose orders are not executed in a timely manner). Second, there is the loss incurred by the service provider in terms of penalties for not meeting the service levels and loss of revenue due to reduced customer satisfaction. Thus, the service level requirements of critical network services have become a key concern for network administrators. To ensure that SLAs are not being violated, service levels need to be constantly monitored at the network infrastructure layer. To that end, network measurements must take place.</t>

<t>Network measurements are performed using either active or passive measurement techniques. In passive measurements, production traffic is observed. Network conditions are checked in a non-intrusive way because no monitoring traffic is created by the measurement process itself. In the context of the IP Flow Information EXport (IPFIX) WG, several documents were produced to define passive measurement mechanisms (e.g., flow records specification <xref target="RFC3954" />). Active measurement, on the other hand, is intrusive because it injects synthetic traffic into the network to measure the network performance. The IP Performance Metrics (IPPM) WG produced documents that describe active measurement mechanisms, such as the One-Way Active Measurement Protocol (OWAMP) <xref target="RFC4656" />, the Two-Way Active Measurement Protocol (TWAMP) <xref target="RFC5357" />, and the Cisco Service-Level Assurance Protocol (Cisco SLA Protocol) <xref target="RFC6812" />. In addition, there are some mechanisms that do not fit into either the active or the passive category, such as Performance and Diagnostic Metrics Destination Option (PDM) techniques <xref target="draft-ietf-ippm-6man-pdm-option" />.</t>

<t>Active measurement mechanisms offer a high level of control over what to measure and how to measure it. They also do not require inspecting production traffic. Because of this, they usually offer better accuracy and privacy than passive measurement mechanisms. Traffic encryption and regulations that limit the amount of payload inspection that can occur are non-issues. Furthermore, active measurement mechanisms are able to detect end-to-end network performance problems in a fine-grained way (e.g., by simulating the traffic that must be handled considering specific SLOs). As a result, active measurements are often preferred over passive measurements for SLA monitoring. Measurement probes must be hosted in network devices and measurement sessions must be activated to compute the current network metrics (e.g., considering those described in <xref target="RFC4148" />). This activation should be dynamic in order to follow changes in network conditions, such as those related to routes being added or new customer demands.</t>

<t>The activation of active measurement sessions (hosted in senders and responders, considering the architecture described by Cisco <xref target="RFC6812" />) is expensive in terms of resource consumption, e.g., CPU cycles and memory footprint, and monitoring functions compete for resources with other functions, including routing and switching. In addition, the activated sessions increase the network load because of the injected traffic. The resources required and the traffic generated by the active measurement sessions are a function of the number of measured network destinations, i.e., the more destinations are measured, the more resources and traffic are needed to deploy the sessions. Thus, better monitoring coverage requires deploying more sessions, which in turn increases resource consumption. Conversely, observing just a small subset of all network flows can lead to insufficient coverage. Hence, deciding how to place measurement probes becomes an important management activity, so that the maximum benefit in terms of service level monitoring is obtained with a limited amount of measurement overhead.</t>
    </section>
	
	<section title="Definitions and Acronyms">
<!--	<list style="symbols"> -->
	<t>Active Measurements: Techniques to measure service levels that involve generating and observing synthetic test traffic</t>
	<t>Passive Measurements: Techniques to measure service levels based on observation of production traffic</t>
	<t>SLA: Service Level Agreement</t>
	<t>SLO: Service Level Objective</t>
	<t>P2P: Peer-to-Peer</t>
<!--	</list> -->
	</section>

    <section title="Current Approaches">

<t>The current best practice in feasible deployments of active measurement solutions, when distributing the available measurement sessions across the network, consists of relying entirely on the expertise of the human administrator to infer the best locations at which to activate such sessions. This is done in several steps. First, it is necessary to collect traffic information in order to grasp the traffic matrix. Then, the administrator uses this information to infer the best destinations for measurement sessions. After that, the administrator activates sessions on the chosen subset of destinations, considering the available resources. This practice, however, does not scale well because it is labor-intensive and error-prone for the administrator to compute which sessions should be activated given the set of critical flows that need to be measured. Even worse, this practice completely fails in networks whose critical flows are short-lived and dynamic in terms of the network paths they traverse, as in modern cloud environments. This is because fast reactions are necessary to reconfigure the sessions, and administrators are not fast enough at computing and activating the new set of required sessions every time the network traffic pattern changes. Finally, the current active measurement practice usually covers only a fraction of the network flows that should be observed, which invariably leads to the damaging consequence of undetected SLA violations.</t>

    </section>

    <section title="Problem Statement">

<t>The problem to solve involves automating the placement of active measurement probes in the most effective manner possible. Specifically, assuming a bounded resource budget that is available for measurements, the problem becomes how to place those measurement probes such that the likelihood of detecting service level violations is maximized, and how to subsequently perform the required configurations. The method should be embeddable as management software inside network devices that controls the deployment of active measurement mechanisms. The method shall furthermore be dynamic and able to adapt to changing network conditions.</t>
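
<t>As a minimal sketch of the placement problem stated above, the following Python fragment greedily selects measurement sessions under a fixed resource budget. The names Candidate and select_sessions, the per-session cost, and the violation-likelihood score are illustrative assumptions and not part of any specified mechanism. A greedy score-per-cost heuristic is only one possible strategy and is not guaranteed to be optimal.</t>

<figure><artwork><![CDATA[
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    destination: str  # hypothetical destination identifier
    cost: float       # resource units consumed by one session
    score: float      # assumed likelihood of seeing a violation


def select_sessions(candidates, budget):
    """Greedily pick sessions by score per unit of cost."""
    # Ignore candidates without a positive cost (assumption).
    usable = [c for c in candidates if c.cost > 0]
    ranked = sorted(usable, key=lambda c: c.score / c.cost,
                    reverse=True)
    chosen, remaining = [], budget
    for cand in ranked:
        # Activate a session only if it still fits the budget.
        if cand.cost <= remaining:
            chosen.append(cand)
            remaining -= cand.cost
    return chosen
```
]]></artwork></figure>

<t>In such a sketch, the selection would be re-run whenever scores or the available budget change, which is one simple way to follow changing network conditions.</t>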

    </section>

    <section title="Benefits of an Autonomic Solution">
<t>The use case considered here is the distributed autonomic detection of SLA violations. The use of Autonomic Networking (AN) properties can help such detection through an efficient activation of measurement sessions <xref target="P2PBNM-Nobre-2012" />. The problem to be solved by AN in the present use case is how to steer the process of measurement session activation by a complete solution that sets all necessary parameters for this activation to operate efficiently, reliably, and securely, without requiring human intervention while still allowing for human input.</t>

<t>We advocate for embedding Peer-to-Peer (P2P) technology in network devices in order to improve the measurement session activation decisions using autonomic control loops. The provisioning of the P2P management overlay should be transparent for the network administrator. It would be possible to control the measurement session activation using local data and logic and to share measurement results among different network devices. </t> 

<t>An autonomic solution for the distributed detection of SLA violations can provide several benefits. First, efficiency: this solution could optimize resource consumption and avoid resource starvation on the network devices. In practice, the solution should maximize the benefits of SLA monitoring (i.e., maximize the likelihood of SLA violations being detected) while operating within a given resource budget. This optimization comes from different sources: taking into account past measurement results, taking into account other observations (such as observations of link utilization and passive measurements, where available), sharing measurement results between network devices, better efficiency in the probe activation decisions, etc. Second, effectiveness: the number of detected SLA violations could be increased. This increase is related to better coverage of the network. Third, the solution could decrease the time necessary to detect SLA violations. The adaptivity features of an autonomic loop could capture network dynamics faster than a human administrator. Finally, the solution could help to reduce the workload of human administrators or, at least, avoid the need for them to perform operational tasks.</t> 

    </section>

    <section title="Intended User and Administrator Experience">
<t>The autonomic solution should not require human intervention in the distributed detection of SLA violations. In addition, it could enable the control of SLA monitoring by less experienced human administrators. However, some information may be provided by the human administrator. For example, the human administrator may provide the SLOs regarding the SLA being monitored.</t>

<t>The configuration and bootstrapping of network devices using the autonomic solution should require minimal effort from the human administrator. It would probably be necessary only to provide the address of a device that is already using the solution, and the devices themselves could then exchange configuration data.</t>

    </section>

    <section title="Analysis of Parameters and Information Involved">

<t>The active measurement model assumes that a typical infrastructure will have multiple network segments and Autonomous Systems (ASs), and a reasonably large number of routers and hosts. It also considers that multiple SLOs can be in place at a given time. Since interoperability in a heterogeneous network is a goal, features found in different active measurement mechanisms (e.g., OWAMP, TWAMP, and IPSLA) and programmability interfaces (e.g., Cisco's EEM and onePK) could be used for the implementation. The autonomic solution should include and/or reference specific algorithms, protocols, metrics, and technologies for the implementation of distributed detection of SLA violations as a whole.</t>

    <section title="Device Based Self-Knowledge and Decisions">

<t>Each device has self-knowledge about the local SLA monitoring. This could be in the form of historical measurement data and SLOs. In addition, the devices would have algorithms that could decide which probes should be activated at a given time. The choice of which algorithm is better for a specific situation would also be autonomic.</t>
      </section>

    <section title="Interaction with other devices">
<t>Network devices should share information about service level measurement results. This information can speed up the detection of SLA violations and increase the number of detected SLA violations. In any case, it is necessary to ensure that the results from remote devices have local relevance.</t>

<t>The definition of the network devices that exchange measurement data, i.e., management peers, creates a new topology. Different approaches could be used to define this topology (e.g., correlated peers <xref target="P2PBNM-Nobre-2012" />). To bootstrap peer selection, each device should use its known endpoint neighbors (e.g., from its FIB and RIB tables) as the initial seed to obtain possible peers.</t>
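
<t>The peer bootstrapping and relevance check described above could be sketched as follows. This is a hypothetical illustration only: the function names, the use of the overlap of measured destinations (Jaccard index) as a stand-in for local relevance, and the 0.5 threshold are assumptions, and they are not the correlated-peers method of <xref target="P2PBNM-Nobre-2012" />.</t>

<figure><artwork><![CDATA[
```python
def seed_peers(fib_neighbors, rib_neighbors):
    """Initial candidates: neighbors known from FIB and RIB."""
    return sorted(set(fib_neighbors) | set(rib_neighbors))


def relevance(local_dests, remote_dests):
    """Jaccard overlap of measured destinations (0.0 to 1.0)."""
    local, remote = set(local_dests), set(remote_dests)
    union = local | remote
    if not union:
        return 0.0
    return len(local & remote) / len(union)


def relevant_peers(local_dests, peer_dests, threshold=0.5):
    """Keep peers whose measured destinations overlap enough."""
    return [peer for peer, dests in sorted(peer_dests.items())
            if relevance(local_dests, dests) >= threshold]
```
]]></artwork></figure>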
    </section>
    </section>

    <section title="Comparison with current solutions">
        <t>There is no standardized solution for distributed autonomic detection of SLA violations. Current solutions are restricted to ad hoc scripts running on a per-node basis to automate some of the administrator's actions. There are some proposals for passive probe activation (e.g., DECON and CSAMP), but without a focus on autonomic features. It is also worth mentioning a proposal from Barford et al. to detect and localize links that cause anomalies along a network path.</t>
    </section>

    <section title="Related IETF Work">
        <t>The following paragraphs discuss related IETF work and are provided for reference. This section is not exhaustive; rather, it provides an overview of the various initiatives and how they relate to autonomic distributed detection of SLA violations.</t>

        <t><list style="numbers">
          <t>[LMAP]: The Large-Scale Measurement of Broadband Performance Working Group aims to produce standards for performance management. Since its mechanisms also consist of deploying measurement probes, the autonomic solution could be relevant for LMAP, especially considering SLA violation screening. In addition, a solution that decreases the workload of human administrators in service providers is probably highly desirable.</t>

          <t>[IPFIX]: IP Flow Information EXport (IPFIX) aims at standardizing the export of IP flow information (i.e., netflows). IPFIX uses measurement probes (i.e., metering exporters) to gather flow data. In this context, the autonomic solution for the activation of active measurement probes could possibly be extended to also address passive measurement probes. In addition, flow information could be used in the decision making for probe activation.</t>

          <t>[ALTO]: The Application Layer Traffic Optimization Working Group aims to provide topological information at a higher abstraction layer, which can be based upon network policy, and with application-relevant service functions located in it. Its work could be leveraged for the definition of the topology of the network devices that exchange measurement data.</t>
        </list></t>
    </section>

    <section anchor="Acknowledgements" title="Acknowledgements">
      <t>We wish to acknowledge the helpful contributions, comments, and suggestions that were received from Mohamed Boucadair, Bruno Klauser, Eric Voit, and Hanlin Fang.</t>
    </section>

    <!-- Possibly a 'Contributors' section ... -->

    <section anchor="IANA" title="IANA Considerations">
      <t>This memo includes no request to IANA.</t>
    </section>

    <section anchor="Security" title="Security Considerations">
      <t>The bootstrapping of a new device follows the approach proposed by the ANIMA WG <xref target="draft-anima-boot" />; thus, in order to exchange data, a device should register first. This registration could be performed by a "Registrar" device or by a cloud service provided by the organization to facilitate autonomic mechanisms. The new device sends its own credentials to the Registrar and, after successful authentication, receives domain information to enable subsequent enrolment in the domain. The Registrar sends all required information: a device name, a domain name, and some parameters for the operation. Measurement data exchanged among devices should be signed and encrypted, since these data could carry sensitive information about network infrastructures.</t>

      <t>Some attacks should be considered when analyzing the security of the autonomic solution. Denial-of-service (DoS) attacks could be performed if the solution is tampered with so that it activates more local probes than the available resources allow. In addition, results could be forged by a device (the attacker) so that this device is considered a peer of a specific device (the target). This could be done to gain information about a network.</t>
    </section>
  </middle>

  <!--  *****BACK MATTER ***** -->

  <back>
    <!-- References split into informative and normative -->

    <!-- There are 2 ways to insert reference entries from the citation libraries:
     1. define an ENTITY at the top, and use "ampersand character"RFC2629; here (as shown)
     2. simply use a PI "less than character"?rfc include="reference.RFC.2119.xml"?> here
        (for I-Ds: include="reference.I-D.narten-iana-considerations-rfc2434bis.xml")

     Both are cited textually in the same manner: by using xref elements.
     If you use the PI option, xml2rfc will, by default, try to find included files in the same
     directory as the including file. You can also define the XML_LIBRARY environment variable
     with a value containing a set of directories to search.  These can be either in the local
     filing system or remote ones accessed by http (http://domain/dir/... ).-->

    <!--     <references title="Normative References"> -->
      
    <!--         </references> -->

    <references title="Normative References">
      <!-- Here we use entities that we defined at the beginning. -->

      &RFC4656;
      &RFC5357;
      &RFC6812;
      &RFC7297;
 
      <reference anchor="P2PBNM-Nobre-2012"
                 target="http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6379997">
        <front>
          <title>Decentralized Detection of SLA Violations Using P2P Technology, 8th International Conference Network and Service Management (CNSM)</title>
          <author initials="J.C." surname="Nobre">
            <organization>Federal University of Rio Grande do Sul</organization>
          </author>
          <author initials="L.Z." surname="Granville">
            <organization>Federal University of Rio Grande do Sul</organization>
          </author>
          <author initials="A." surname="Clemm">
            <organization>Cisco Systems</organization>
          </author>
          <author initials="A.G." surname="Prieto">
            <organization>Cisco Systems</organization>
          </author>
          <date year="2012" />
        </front>
      </reference>

<reference anchor='draft-anima-boot'>
<front>
<title abbrev='draft-anima-boot'>draft-ietf-anima-bootstrapping-keyinfra</title>

<author initials='M.' surname='Pritikin'>
<organization>Cisco Systems</organization>
</author>
 
<author initials='M.' surname='Richardson'>
<organization>SSW</organization>
</author>

<author initials='M.' surname='Behringer'>
<organization>Cisco Systems</organization>
</author>

<author initials='S.' surname='Bjarnason'>
<organization>Cisco Systems</organization>
</author>
                                    
<date year='2016' month='June' day='30' />

<abstract><t> This document specifies automated bootstrapping of a key infrastructure (BSKI) using vendor installed IEEE 802.1AR manufacturing installed certificates, in combination with a vendor based service on the Internet. Before being authenticated, a new device has only link-local connectivity, and does not require a routable address.  When a vendor provides an Internet based service, devices can be forced to join only specific domains but in limited/disconnected networks or legacy environments we describe a variety of options that allow bootstrapping to proceed.</t></abstract>
</front>

<seriesInfo name="Internet-Draft" value="draft-ietf-anima-bootstrapping-keyinfra-03"/>
<format type="TXT" target="https://www.ietf.org/id/draft-ietf-anima-bootstrapping-keyinfra-03.txt"/>

</reference>

<reference anchor='draft-ietf-ippm-6man-pdm-option'>
<front>
<title abbrev='draft pdm'>draft-ietf-ippm-6man-pdm-option</title>

<author initials='N.' surname='Elkins'>
<organization>Inside Products</organization>
</author>

<author initials='R.' surname='Hamilton'>
<organization>Chemical Abstracts Service</organization>
</author>

<author initials='M.' surname='Ackermann'>
<organization>BCBS Michigan</organization>
</author>
                                    
<date year='2016' month='September' day='23' />

<abstract><t> To assess performance problems, measurements based on optional sequence numbers and timing may be embedded in each packet. Such measurements may be interpreted in real-time or after the fact. An implementation of the existing IPv6 Destination Options extension header, the Performance and Diagnostic Metrics (PDM) Destination Options extension header as well as the field limits, calculations, and usage of the PDM in measurement are included in this document. </t></abstract>
</front>

<seriesInfo name="Internet-Draft" value="draft-ietf-ippm-6man-pdm-option-06"/>
<format type="TXT" target="https://tools.ietf.org/id/draft-ietf-ippm-6man-pdm-option-06.txt"/>

</reference>

</references>

 <references title="Informative References">
      <!-- Here we use entities that we defined at the beginning. -->

      &RFC4148;
      &RFC3954;

</references>

</back>
</rfc>
