Diet-ESP: a flexible and compressed format for IPsec/ESP

Diet-ESP: a flexible and compressed format for IPsec/ESP Ericsson

8400 boulevard Decarie Montreal, QC H4P 2N2 Canada daniel.migault@ericsson.com

LMU Munich

Oettingenstr. 67 80538 München Bavaria Germany guggemos@mnm-team.org

Universitaet Bremen TZI

Postfach 330440 Bremen D-28359 Germany +49-421-218-63921 cabo@tzi.org

INTERNET 6lo This document defines Diet-ESP that adapts IPsec/ESP for IoT. Diet-ESP compresses fields of the Standard ESP. The compression is defined by profiles based ROHC and ROHCoverIPsec as well as parameters mentioned in the Diet-ESP Context agreed between the two Diet-ESP peers for example using IKEv2.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in .

The IPsec/ESP is represented in . It was designed to: 1) provide general purposes security protocol with high level of security, 2) favor interoperability between implementations and 3) scale on large infrastructures. In order to match these goals, ESP format favor mandatory fields with fixed sizes that are designed considering extreme or worst case scenarios. This results in a kind of "unique" packet format common to all considered scenarios using ESP. On the other hand ESP ends up carrying "unnecessary" or "larger than required" fields. This cost of additional bytes were considered as negligible versus interoperability, making ESP very successful over the years. With IoT, requirements become slightly different. For most devices, like sensors, sending extra bytes directly impacts the battery and so the life time of the sensor. As a result, IoT may look at reducing the number of bytes sent on the wire. This document describes Diet-ESP which compresses fields of the Standard ESP. The compression is defined by profiles based ROHC and ROHCoverIPsec as well as parameters mentioned in the Diet-ESP Context agreed between the two Diet-ESP peers for example using IKEv2

This document describes how to compress ESP fields sent on the wire. Concerned fields are those of the ESP Header and the ESP Trailer. Compression of the Payload Data, including the Initialization Vector (IV) or the Integrity Check Value (ICV) are out of scope of the document. The compression mechanisms defined in this document are based on ROHC , and ROHCoverIPsec , , . ROHC defines mechanisms to compress/decompress fields of an IP packet. These compressors are placed between the MAC layer and the IP layer. In the case of ESP, ROHC can be used to compress/decompress SPI, SN. However, ROHC cannot be used to compress encrypted fields like Padding, Pad Length, Next Header -- and later the Clear Text Data before encryption. In fact, at the MAC layer, these fields are encrypted and their encrypted value is used generate the ICV. As a result, compression an ESP packet at the MAC layer requires to decrypt the packet to be able to compress fields like Clear Text Data, Pad Length in order to be able to eventually remove the Padding Field. Similarly, decompressing a compress ESP packet at the MAC layer would require to decrypt the received packet, decompress the packet the Clear Text Data as well as the other ESP fields, before forwarding the ESP packet to the IP stack. Note that in some case decompression is not feasible. Consider for example an ESP implementation that generates a random Padding. If this field is removed by the compressor, it can hardly be recovered by the decompressor. Using a different Padding field would result is ESP rejecting the packet as the ICV check will not succeed. As a result ROHC cannot be used alone. On the other hand, ROHCoverIPsec makes compression possible before the ESP payload is encrypted, and so the Clear Text Data can be compressed, but not the ESP related fields like Padding, Pad Length and Next Header. ROHC and ROHCoverIPsec have been designed for bandwidth optimization, but not necessarily for constraint devices. As a result, defining ROHC and ROHCoverIPsec profiles is not sufficient to fulfill the complete set of Diet-ESP requirements listed in . In fact Diet-ESP MUST result in an light implementation that does not require implementation of the full ROHC and ROHCoverIPsec frameworks. In order to achieve ESP field compression, this document describes the Diet-ESP Context. This context contains all necessary parameters to compress an ESP packet. This Diet-ESP Context can be provided as input to proceed to Diet-ESP compression / decompression. This document uses the ROHC and ROHCoverIPsec framework to compress the ESP packet. The advantage of using ROHC and ROHCoverIPsec is that compression behavior follows a standardized compression framework. On the other hand, ROHC and ROHCoverIPsec frameworks are used in a stand alone mode, which means no ROHC communications between compressor and decompressor are considered. This enables specific and lighter implementations to perform Diet-ESP compression without implementing the ROHC or ROHCoverIPsec frameworks. All Diet-ESP implementations only have to agree on the Diet-ESP Context to become inter-operable. The remaining of the document is as follows. described the Diet-ESP Context. describes how the parameters of the Diet-ESP Context are used by the ROHC and ROHCoverIPsec framework to compress the ESP packet. This requires definition of new profiles and extensions. Finally, and provides a security and privacy analysis on Diet-ESP over standard ESP. Informational material have been added into the Appendix section. illustrates how an minimal Diet-ESP implementation may be used in IoT devices. lists the differences between Diet-ESP and Standard ESP. describes the interactions of Diet-ESP interacts with other compression protocols such as 6lowPAN and ROHC compression for other protocols than ESP. Finally, describes how Diet-ESP matches the requirements for Diet-ESP .

This document uses the following terminology: Internet of Things Last Significant Bytes The necessary alignment for IPv4 (32 bits) resp. IPv6 (64 bits) designates the original data that are carried by ESP. carries the encrypted Data Payload including cryptographic material like the IV and the ESP Trailer.

The Diet-ESP context provides the necessary parameters for the compressor and decompressor to perform the appropriated compression and decompression of the ESP packet. the different parameters of the Diet-ESP Context. Context Field Name Overview ALIGN Necessary Alignment for the specific device. SPI_SIZE Size in bytes of the SPI field in the sent packet. SN_SIZE Size in byte of the SN field in the sent packet. NH Presence of the Next Header field in the ESP Trailer. PAD Presence of the Pad Length field present in the ESP trailer. Alignment is the minimum alignment accepted by the hardware. Constrains may come from various reasons. Hardware may have some specific requirements, but also operating systems. For most servers CPU and OS have been designed with 32 bit or 64 bit alignment. As a result, IP headers have been standardized with 32 bits (resp. 64 bits in IPv6) alignment for each IP extension header. ESP is one of these extension headers with an Header (SPI and SN) of 64 bits and the Trailer (NH, PL, PAD) of (2 + PL) bytes. Since the trailer is part of the ESP extension header, it must provide the necessary padding for a correct alignment of the NH field to 32 (resp. 64) bits. The alignment may also be relevant if Block-Ciphers like AES-CBC needs an aligned payload to perform the encryption. To inter-oprate with the standard ESP, IP alignment must be 32 bit for IPv4 and 64 bit for IPv6. Diet-ESP reduces the ALIGN value from 32 bits for IPv4 or 64 bits for IPv6 to 8, 16, 32 and 64 bit alignment. Motivations to do so is to remove the Padding and other mandatory fields of the ESP packet. Then, most IoT embeds small 8 or 16 bit CPUs. Finally, even though ESP is an extension header, it is often the last extension header of a header-only IP packet. The ESP header is only read by the real receiver and is uninteresting for other devices like routers, placed between the to peers. As a result, there seems no real impact on the system if ESP extension header is not aligned. Note that the benefices of ALIGN also depends on the used cryptographic mode. More specifically AES-CTR has a 8 bit block whereas AES-CBC has a 128 bit block. As a result the use of AES-CBC with small Clear Text Data results in large encrypted Data with embedded padding. In other words, the alignment for one packet is always MAX(CIPHER_BLOCK_SIZE, ALIGN). ESP Security Policy Index is 4 byte long to identify the SAD-entry for incoming traffic. To interoperate with the standard ESP, the SPI_SIZE must be of 4 bytes. Diet-ESP omits, leaves unchanged, or reduces the SPI sent on the wire to the 0, 1, 2, 3 or 4 LSB. Compression only impacts the data sent on the wire and therefore SHOULD only deal with 4 byte decompressed SPIs in the SAD. This allows systems to send and receive multiple SPI_SIZE with different hosts. Decompressing the SPI at the receiver may involve IP addresses (see ). Compressing the SPI has significant security impacts as detailed in . It should be guided by 1) the number of simultaneous inbound SA the device is expected to handle and 2) reliability of the IP addresses in order to identify the proper SA for incoming packets. More specifically, a sensor with a single connection to a Security Gateway, may bind incoming packets to the proper SA based only in its IP addresses. In that case, the SPI may not be necessary. Other scenarios may consider using the SPI to index the SAs or may consider having multiple ESP channels with the same host from a single host. In that case one may choose a reduced length for the SPI. Note also that the value 0 for the SPI is not allowed to be sent on the wire as described in . ESP Sequence Number is 32 bit and extended SN is 64 bit long and used for anti-replay protection. To interoperate with standard ESP the SN_SIZE must be of 4 bytes. Diet-ESP omits, leaves unchanged or reduces SN sent on the wire to 0, 1, 2, 3 or 4 LSB. Decompressing the SN at the receiver is guided by a linear extrapolation of the expected received Sequence Number and the LSB-SN sent on the wire. To avoid packet overhead, this configuration is stored within the SA, whereas it remains valid during its lifetime. Therefore an implementer should consider the LSB window such that two consecutive received SN should not present a difference of more than the LSB window. In some cases, the received SN may increase by a high number e.g. using the time as the SN or because of a high number of packet loss. See for the related security considerations. Note that SN and SPI MUST be aligned to a multiple of the Alignment value (ALIGN). Next Header in ESP is used to identify the first header inside the ESP payload. To interoperate with standard ESP, the Next Header must be indicated and present. Diet-ESP is able to remove the Next Header field from the ESP-Trailer. Removing the Next Header is possible only if the underlying protocol can be derived from the Traffic Selector (TS) within the Security Association (SA). More specifically, the Next Header indicates whether the encrypted ESP payload is an IP packet, a UDP packet, a TCP packet or no next header. The NH can only be removed if this has been explicitly specified in the SA or if the device has a single application. Note that removing the Next Header impacts how encryption is performed. For example, the use of AES-CBC mode requires the last block to be padded, reaching a 128 bit alignment. In this case removing the Next Header increases the padding by the Next Header length, which is 8 bits. In this case, removing the Next Header provides few advantages, as it does not reduce the ESP packet length. With AES-CBC, the only advantage of removing the Next Header would be for data with the last block of 15 bytes. In that case, ESP pads with 15 modulo 16 bytes, sets the 1 byte pad length field to 15 and add the one byte Next Header field. This leads to 15 + 15 + 1 + 1 = 32 bytes to be sent. On the other hand, removing the Next Header would require only the concatenation of the pad length byte with a 0 value, which leads to 16 bytes to be sent. Other modes like AES-CTR do not have block alignment requirements, so the only alignment constraint comes from the device hardware alignment (ALIGN). Suppose A designates the alignment constraint from OS, hardware, encryption, packet format...). A is fixed and consider then any data of length k * A + A - 1 bytes with k an integer. Sending this data using ESP takes advantage of removing the Next Header as it reduces the number of bytes to be sent by A over the traditional ESP. As a result, for 8 bit alignment hardware (A = 1) removing the Next Header always prevent an unnecessary byte to be sent. With ESP, all packets have a Pad Length field. This field is usually present because ESP requires IP alignment which is ensured with padding. In order to interoperate with the standard ESP, the Pad Length must be indicated and be present. Diet-ESP considers removing the Padding and the Pad Length field. If PAD is present, then it is computed according to ALIGN. In fact, some devices might use an 8 bits alignment, in which case padding is not necessary. Similarly, sensors may send application data of fixed length matching the alignment. Note that alignment may be required by the device (8-bit, 16-bit, or more generally 32-bit), but it may also be required by the encryption block size (AES-CBC uses 128 bit blocks). With ESP these scenarios would result in an unnecessary Pad Length field always set to zero. Diet-ESP considers those case with no padding, and thus the Pad Length field can be omitted. Some additional parameters may be added to the Diet-ESP Context. Such parameters are defined in other documents, like the compression of protocol headers inside the encrypted ESP payload.

This section defines Diet-ESP on the top of the ROHC and ROHCoverIPsec framework. presents and explains the choice of these frameworks. This section is informational and its only goal is to position our work toward ROHC and ROHCoverIPsec.

ROHC enables the compression of different protocols of all layers. It is designed as a framework, where protocol compression is defined as profile. Each profile is defined for a specific layer, and ROHC compression in defines profiles for the following protocols: uncompressed, UDP/IP, ESP/IP and RTP/UDP/IP. The compression occurs between the IP and the MAC layer, and so remains independent of an eventual IP alignment. The general idea of ROHC is to classify the different protocol fields. According to the classification, they can either completely and always be omitted, omitted only after the fields has been sent once and registered by the receiver or partly sent and be regenerated by the receiver. For example, a static field value may be negotiated out of band (for example IP version) and thus not be sent at all. In some cases, the value is not negotiated out of band and is carried in the first packet (for example SPI, UDP ports). As a result, the first packet is usually not so highly compressed with ROHC. Finally, some variable fields (for example Sequence Number) can be represented partially by their Last Significant Bits (LSB) and regenerated by the receiver. The main issue encountered with ESP and ROHC is that ESP may contain encrypted data which makes compression between the IP and MAC layer complex to achieve. Therefore ROHC defines different compression of the ESP protocol (see ), so compression of the Clear Text Data can occur before the ESP encryption. Regular ROHC can compress the ESP header. If the packet is not encrypted, the rate of compression is extremely high as the whole packet including padding can be compressed in the regular ROHC stack, too. For encrypted payload ROHC defines ROHCoverIPsec (, , ) to compress the ESP payload before it is going to be encrypted. This leads to a second ESP stack, where another ROHC compressor (resp. decompressor) works (see ). Excluding the first packet which initializes the ROHC context, this makes ESP compression highly efficient.

The first drawback for ROHC and ROHCoverIPsec is, that it leads to two ROHC compression layers (ROHCoverIPsec before ESP encryption and ROHC before the MAC layer) in addition to two ESP implementations (Standard ESP and ROHCoverIPsec ESP). Both frameworks are quite complex and require a lot of resources which does not fit IoT requirements. Then ROHCoverIPsec also limits the compression of the ESP protocol, according to the IP restrictions. Padding remains necessary as IPsec is part of the IP stack which requires a 32 bits (resp. 64 bits for IPv6) aligned packet. This makes compression quite inefficient when small amount of data are sent. Note that mechanism to compress encrypted fields may be possible with ROHC only and without ROHCoverIPsec. Such mechanisms may be possible for fields like the Next Header or the Padding and Pad Length when the data sent is of fixed size. As the sizes of the fields are known the compressor may simply remove these fields. However, even in this case, it almost doubles the amount of computation on the receiver's side. In fact, the ROHC compressor would almost decompress and re-encrypt the compressed ESP payload before forwarding it to the IP stack. In addition, since the receiver has to re-encrypt the decompressed information before integrity of the packet can be checked, one can easily construct a DoS attack. Flooding the receiver with invalid packet causes the receiver to perform the complex encryption and authentication algorithm for each packet.

This section defines how the compression of all ESP fields is performed within the ROHC and ROHCoverIPsec frameworks. More especially fields that are in the ESP Header (i.e. the SPI and the SN) and the ICV are compressed by the ROHC framework. The other fields, that is to say those of the ESP Trailer, are compressed by the ROHCoverIPsec framework. Diet-ESP fits in the ROHC and the ROHCoverIPsec in a very specific way. Diet-ESP does not need any ROHC signaling between the peers. More specifically, ROHC Initialization and Refresh (IR), or ROHC IR-DYN or ROHC Feed back packet are not considered with Diet-ESP. The first reason is that fields are either STATIC or PATTERN and their value or profile is defined through the Diet-ESP Context agreement. This agreement is out of scope of ROHC, it is expected to be agreed by other protocols like IKEv2 and thus is considered as an out-of band agreement by the peers. Then, the profiles are applied for each Security Association that is unidirectional. In fact an IKEv2 negotiation results in two unidirectional SA. As a result, each SA the packets are sent in one direction only, which corresponds to the Unidirectional mode -- U-mode of ROHC. Diet-ESP only exchanges compressed data. How the compression / decompression occurs is defined by the Diet-ESP Context. Once the Diet-ESP Context has been agreed, both peers are in a Second Order (SO) State and exchange only compressed data. Diet-ESP only compresses ESP packets, it may include inner packet compression, but Diet-ESP does not make any assumption on the IP compression. This is made in order to make Diet-ESP interoperable with multiple IP compression protocols. Diet-ESP compresses partially STATIC fields as they are used as indexes by the receiver, and may not completely be removed.

The ROHC header field classifications are defined in Appendix A.1 of and Appendix A of . Field ROHC class Framework Encoding Method Diet-ESP Context Parameters SPI STATIC-DEF ROHC LSB SPI_SIZE SN PATTERN ROHC LSB SN_SIZE Padding PATTERN ROHCoverIPsec Removed PAD, ALIGN Pad Length PATTERN ROHCoverIPsec Removed PAD, ALIGN Next Header STATIC-DEF ROHCoverIPsec Removed NH The SPI indexes the SA, is negotiated by the two peers (e.g. via IKEv2 or manually) and remains the same during the session. Therefore, as defined in Appendix A.6 of this field is classified as STATIC-DEF. The compressed SPI consists in the SPI_SIZE LSB of the negotiated 32 bit SPI, and SPI_SIZE is provided by the Diet-ESP Context. The SN is used for anti-replay protection and is modified in every packet. In default cases, the ESP Sequence Number will be incremented by one for each packet sent. Therefore, as defined in Appendix A.6 of this field is classified as PATTERN. The compressed SN consists in the SN_SIZE LSB of the 32 bit or 64 bit SN, and SN_SIZE is provided by the Diet-ESP Context. Padding is used for alignment purposes and is computed on a per-packet basis. Therefore it is classified as PATTERN. The compressed Padding is defined by PAD and ALIGN provided by the the Diet-ESP Context. If PAD is set the Padding and Pad Length fields are removed. If PAD is unset, Padding is computed according to the ALIGN and the padding length is indicated in the PAD Length field. Pad Length indicates the length of the Padding field and is computed on a per-packet basis. Therefore it is classified as PATTERN. See Padding for the compressed Pad Length. The Next Header indicates the next layer in the inner ESP Payload. To be compressed the Next Header MUST remain the same during the session. This means that it MUST have been negotiated (e.g. by IKEv2) and can be derived from the Traffic Selectors. If this condition is met and the Next Header compression is requested by the peers with NH set in the Diet-ESP Context, then the Next Header field MUST be removed.

There are no IANA consideration for this document.

This section lists security considerations related to the Diet-ESP protocol. The Security Parameter Index (SPI) is used by the receiver to index the Security Association that contains appropriated cryptographic material. If the SPI is not found, the packet is rejected as no further checks can be performed. In Diet-ESP, the value of the SPI is not reduced, but compressed why the SPI value may not be fully provided between the compressor and the decompressor. On the other hand, its uncompressed value is provided to the ESP-procession and no weakness is introduced to ESP itself. On an implementation perspective, it is strongly recommended that decompression is deterministic. Compression and decompression adds some additional treatment to the ESP packet, which might be used by an attacker. In order to minimize the load associated to decompression, decompression is expected to be deterministic. The incoming compressed SPI with the associated IP addresses should output a single and unique uncompressed SPI value. If n uncompressed SPI values have to be considered, then the receiver could end in n signature checks which may be used by an attacker for a DoS attack. On a privacy perspective, until Diet-ESP is not deployed outside the scope of IoT and small devices, the use of a compressed SPI may provide an indication that one of the endpoint is a sensor. Such information may be used, for example, to evaluate the number of appliances deployed, or - in addition with other information, such as the time interval, the geographic location - be used to derive the type of data transmitted. The Sequence Number (SN) is used as an anti-replay attack mechanism. Compression and decompression of the SN is already part of the standard ESP namely the Extended Sequence Number (ESN). The SN in a standard ESP packet is 32 bit long, whether Diet-ESP enables to reduce it to 0 bytes and the main limitation to the compression a deterministic decompression. SN compression consists in indicating the least significant bits of the uncompressed SN on the wire. The size of the compressed SN must consider the maximum reordering index such that the probability that a later sent packet arrives before an earlier one. In addition the size of SN should also consider maximum consecutive packets lost during transmission. In the case of ESP, this number is set to 2^32 which is, in most real world case, largely over-provisioned. When the compression of the SN is not appropriately provisioned, the most significant bit value may be desynchronized between the sending and receiving parties. Although IKEv2 provides some re-synchronization mechanisms, in case of IoT the de-synchronization will most likely result in a renegotiation and thus DoS possibilities. Note that IoT communication may also use some external parameters, i.e. other than the compressed SN, to define whether a packet be considered or not and eventually derive the SN. One such scenario may be the use of time windows. Suppose a device is expected to send some information every hour or every week. In this case, for example, the SN may be compressed to zero bytes. Instead the SN may be derived by incrementing the SN every hour after the last received valid packet. Considering the time the packet is received make it possible to consider the time derivation of the sensor clock. Note also that the anti-replay mechanism needs to define the size of the anti-replay window. provides guidance to set the window size and are similar to those used to define the size of the compressed SN Padding, Pad Length and Next Header are fields stored inside the encrypted payload. They are part of the ESP payload so that ESP can be seen as an IP option. IP extension headers must have 32 bit Byte-Alignment in IPv4 [43] and 64 bit Byte-Alignment in IPv6 [6]. The main motivation for the alignment is the improvement of packet processing on 32 or 64 bit processors. As a result, compression of these static fields does not impact the security of Diet-ESP compared to the one provided by ESP.

Until Diet-ESP is not deployed outside the scope of IoT and small devices, the use of a compressed SPI may provide an indication that one of the endpoint is a sensor. Such information may be used, for example, to evaluate the number of appliances deployed, or - in addition with other information, such as the time interval, the geographic location - be used to derive the type of data transmitted. If incremented for each ESP packet, the SN may leak some information like the amount of transmitted data or the age of the sensor. The age of the sensor may be correlated with the software used and the potential bugs. On the other hand, re-keying will re-initialize the SN, but the cost of a re-keying may not be negligible and thus, frequent re-keying can be considered. In addition to the re-key operation, the SN may be generated in order to reduce the accuracy of the information leaked. In fact, the SN does not have to be incremented by one for each packet it just has to be an increasing function. Using a function such as a clock may prevent characterizing the age or the use of the sensor. Note that the use of such function may also impact the compression efficiency and result in larger compressed SN.

The current work on Diet-ESP results from exchange and cooperation between Orange, Ludwig-Maximilians-Universitaet Munich, Universite Pierre et Marie Curie. We thank Daniel Palomares and Carsten Bormann for their useful remarks, comments and guidances on the design. We thank Sylvain Killian for implementing an open source Diet-ESP on Contiki and testing it on the FIT IoT-LAB funded by the French Ministry of Higher Education and Research. We thank the IoT-Lab Team and the INRIA for maintaining the FIT IoT-LAB platform and for providing feed backs in an efficient way. We thank Ana Minaburo for her ROHC review.

&RFC2119; &RFC3095; &RFC3602; &RFC3686; &RFC4104; &RFC4301; &RFC4303; &RFC4309; &RFC4555; &RFC7321; &RFC5225; &RFC5857; &RFC5858; &RFC7296; &RFC5856; &I-D.raza-6lowpan-ipsec; Diet-IPsec: ESP Payload Compression of IPv6 / UDP / TCP / UDP-Lite &I-D.mglt-6lo-diet-esp-requirements; Future Internet of Things (FIT) IoT-LAB

Diet-ESP has been designed to enable light implementation. This section illustrates the case of a sensor sending a specific amount of data periodically. This section is not normative and has only an illustrative purpose. In this scenario the sensor measures a temperature every minute and sends its value to a gateway, which is assumed to collect the data. The data is sent in an UDP packet and there is no other connection between the two peers. The communication between the sensor and the gateway should be secured by a Diet-ESP connection in transport mode. Therefore the following context is chosen: Sensors are not expected to be 32 or 64 bit CPU, and micro-controllers are expected to support 8 bit alignment. As it is a single connection, the SA can be identified by using the IP addresses. As a result the SPI is not needed. Because only one packet every minute is sent, the packets will arrive at the receiver in an ordered way. The receiver can rebuild the SN which should be present in the packet, assuming the SN is incremented by one for each packet. Note that setting SN to 0 does not mean there is no anti replay protection. In fact, the SN is needed for the computation of the Diet-ESP ICV. Since the protocol is always UDP, the Next header can be omitted. With 8 bit alignment Padding has always a Pad Length of 0. Setting PAD to "Remove Padding" removes the Pad Length field. The ICV is chosen to be 32 bits in order to find a fair trade-off between security and energy costs. Encapsulating the outgoing Diet-ESP packet is proceeded as follows: SAD lookup for outgoing traffic Compress ESP payload incl. Transport Header (UDP) Encrypt IP payload Build ESP header Calculate Diet-ESP ICV Compress ESP header Add ${Diet-ESP_ICV_SIZE} LSB of ICV to the packet.

Incoming Diet-ESP packet is processed as follows: SAD lookup for incoming traffic traffic Decompress ESP-header incl. Transport Header (UDP) Calculate packet Diet-ESP ICV Check integrity with ${Diet-ESP_ICV_SIZE} LSB of Diet-ESP ICV Check anti-replay Decrypt IP payload (excluding ICV) Decompress ESP payload

This section details how to use Diet-ESP to send and receive messages. The use of Diet-ESP is based on the IPsec architecture and ESP . We suppose the reader to be familiar with these documents and we list here possible adaptations that may be involved by Diet-ESP.

Each ESP packet has a fixed alignment to 32 bits (resp. 64 bits in IPv6). For Diet-ESP each device has an internal parameter that defines the minimal acceptable alignment. ALIGN SHOULD be a the maximum of the peer's minimal alignment. Diet-ESP Context with SPI_SIZE + SN_SIZE that is not a multiple of ALIGN MUST be rejected.

For devices that are configured with a single SPI_SIZE value can process inbound packet as defined in . As such, no modifications is required by Diet-ESP. Detecting Inbound Security Association: Identifying the SA for incoming packets is a one of the main reasons the SPI is send in each packet on the wire. For regular ESP (and AH) packets, the Security Association is detected as follows: Search the SAD for a match on {SPI, destination address, source address}. If an SAD entry matches, then process the inbound ESP packet with that matching SAD entry. Otherwise, proceed to step 2. Search the SAD for a match on {SPI, destination address}. If the SAD entry matches, then process the inbound ESP packet with that matching SAD entry. Otherwise, proceed to step 3. Search the SAD for a match on only {SPI} if the receiver has chosen to maintain a single SPI space for AH and ESP, or on {SPI, protocol} otherwise. If an SAD entry matches, then process the inbound ESP packet with that matching SAD entry. Otherwise, discard the packet and log an audible event. For device that are dealing with different SPI_SIZE SPI, the way inbound packets are handled differs from the . In fact, when a inbound packet is received, the peer does not know the SPI_SIZE. As a result, it does not know the SPI that applies to the incoming packet. The different values could be the 0 (resp. 1, 2, 3 and 4) first bytes of the IP payload. Since the size of the SPI is not known for incoming packets, the detection of inbound SAs has to be redefined in a Diet-ESP environment. In order to ensure a detection of a SA the above described regular detection have to be done for each supported SPI size (in most cases 5 times). In most common cases this will return a unique Security Association. If there is more than one SA matching the lookup, the authentication MUST be performed for all found SAs to detect the SA with the correct key. In case there is no match, the packet MUST be dropped. Of course this can lead into DoS vulnerability as an attacker recognizes an overlap of one or more IP-SPI combinations. Therefore it is highly recommended to avoid different values of the SPI_SIZE for one tuple of Source and Destination IP address. Furthermore this recommendation becomes mandatory if NULL authentication is supported. This is easy to implement as long as the sensors are not mobile and do not change their IP address. The following optimizations MAY be considered for sensor that are not likely to perform mobility or multihoming features provided by MOBIKE or any change of IP address during the lifetime of the SA. The SPI_SIZE is defined as part of the SPI sent in each packet. Therefore the receiver has to choose the most significant 2 bits of the SPI in the following way in order to recognize the right size for incoming Diet-ESP packets: SPI_SIZE of 1 byte is used. SPI_SIZE of 2 byte is used. SPI_SIZE of 3 byte is used. SPI_SIZE of 4 byte is used. If the the value 0 is chosen for the SPI_SIZE this option is not feasible. IP address based search is one optimization one may choose to avoid several SAD lookups. It is based on the IP address and the stored SPI_SIZE, which MUST be the same value for each SA of one IP address. Otherwise it can't neither be ensured that an SA is found nor that the correct one is found. Note that in case of mobile IP the SPI_SIZE MUST be updated for all SAs related to the new IP address which may cause renegotiation. shows this lookup described below. Search most significant SA as follows: Search the first SA for a match on {destination address, source address}. If an SA entry matches, then process to step 2. Otherwise, proceed to step 1.2. Search the first SA for a match on {source address}. If an SA entry matches, then process to step 2. Otherwise, drop the packet. Identify the size of the compressed SPI for the found SA, stored in the Diet-ESP context. Note that all SAs to one IP address MUST have the same value for the SPI_SIZE. Then go to step 3. If the SPI_SIZE is NOT zero, read the SPI_SIZE SPI from the packet and perform a regular SAD lookup as described in . If the SPI_SIZE is zero, the SA from step 1 is unique and can be used. Note that some implementations may collect all SPI matching the IP addresses in step 2 to avoid an additional lookup over the whole SAD. This is implementation dependent. If the sensor is likely to change its IP address, the outcome may be a given IP address associated to different SPI_SIZE. This case may occur if one IP address has been used by a device not anymore online, but the SA has not been removed. The IP has then been provided to another device. In this case the Diet-ESP Context SHOULD NOT be accepted by the Security Gateway when the new Diet-ESP Context is provided to the Security Gateway. At least the Security Gateway can check the previous peer is reachable and then delete the SA before accepting the new SA. Another case may be that a sensor got two interfaces with different IP addresses, negotiates a different SPI_SIZE on each interface and then use MOBIKE to move the IPsec channels from one interface to the other. In this case, the Security Gateway SHOULD NOT accept the update, or force a renegotiation of the SPI_SIZE for all SAs, basically by re-keying the SAs.

|'---'| | \ = 0 ? / / | SAD | | \_________/ / + + | |no / '---' | V / | +-----------------+ / | | find 1st SA to |--/ | | SPI +IP | | | address | | +-----------------+ | | | V | +-----------------+ | | Diet-ESP packet | +-->| procession | +-----------------+ ]]>

Outgoing lookups for the SPI are performed in the same way as it is done in ESP. The Traffic Selector for the packet is searched and the right SA is read from the SA. The SPI used in the packet MUST be reduced to the value stored in SPI_SIZE.

Sequence number in ESP can be of 4 bytes or 8 bytes for extended ESP. Diet-ESP introduces different sizes. One way to deal with this is to add a MAX_SN value that stores the maximum value the SN can have. Any new value of the SN will be check against this MAX_SN.

NH, TH, IH, P indicate fields or payloads that are removed from the Diet-ESP packet. How the Diet-ESP packet is generated depends on the length Payload Data LPD, BLCK the block size of the encryption algorithm and the device alignment ALIGN. We note M = MAX(BLCK, ALIGN). Compress the headers inside the ESP payload. if PAD and NH are set to present: Diet-ESP considers both fields Pad Length and Next Header. The Diet-ESP Payload is the encryption of the following clear text: Payload Data | Padding of Pad Length bytes | Pad Length field | Next Header field. The Pad Length value is (LPD + 2) mod [M]. if PAD is set to present and NH is set to removed: Diet-ESP considers the Pad Length field but removes the Next Header field. The ESP Payload is the encryption of the following clear text: Payload Data | Padding of Pad Length bytes | Pad Length field | Next Header field. The Pad Length value is (LPD + 1) mod [M]. if PAD is set to removed and NH is set to present: Diet-ESP considers the Next Header but do not consider the Pad Length field or the Padding Field. This is valid as long as (LPD + 1) mod [M] = 0. If M = 1 as it is the case for AES-CTR this equation is always true. On the other hand the use of specific block size requires the application to send specific length of application data. if PAD and NH are set to removed: Diet-ESP does consider neither the Next Header field nor the Pad Length field nor the Padding Field. This is valid as long as LPD mod [M] = 0. If M = 1 as it is the case for AES-CTR this equation is always true. On the other hand the use of specific block size requires the application to send specific length of application data. Encrypt the Diet-ESP payload. Add ESP header. Generate and add Diet-ESP ICV. Compress ESP header.

Decryption is for performed the other way around. After SAD lookup, authenticating and decrypting the Diet-ESP payload the original packet is rebuild as follows: Decompress ESP header. Generate Diet-ESP ICV and check ICV send in the packet. Check anti-replay Remove compressed header. Encrypt the Diet-ESP payload. if PAD and NH are set to removed: Diet-ESP does consider neither the Next Header field nor the Pad Length field nor the Padding Field. The Next Header field of the IP packet is set to the protocol defined for incoming traffic within the Traffic Selector of the SA. Because there is no Padding it is disregarded. if PAD is set to removed and NH is set to present: Diet-ESP considers the Next Header but do not consider the Pad Length field or the Padding Field. The Next Header field of the IP packet is set to the value within the Diet-ESP trailer. if PAD is set to present and NH is set to removed: Diet-ESP considers the Pad Length field but removes the Next Header field. The Next Header field of the IP packet is set to the protocol defined for incoming traffic within the Traffic Selector of the SA. The Pad Length field is read and the Padding is removed from the Data Payload which results the original Data Payload. if PAD and NH are set to present: Diet-ESP considers both fields Pad Length and Next Header. The Next Header field of the IP packet is set to the value within the Diet-ESP trailer. The Pad Length field is read and the Padding is removed from the Data Payload which results the original Data Payload. Decompress the headers inside the ESP payload.

Diet-ESP exclusively defines compression for the ESP protocol as well as the ESP payload. It does not consider compression of the IP protocol. ROHC or 6LoWPAN may be used by a sensor to compress the IP (resp. IPv6) header. Since compression usually occurs between the MAC and IP layers, there are no expected complications with this family of compression protocols.

Diet-ESP smoothly interacts with 6LoWPAN. Every 6LoWPAN compression header (NHC_EH) has an NH bit. This one is set to 1 if the following header is compressed with 6LoWPAN. Similarly, the NH bit is set to 0 if the following header is not compressed with 6LowPAN. Thus, interactions between 6LowPAN and Diet-ESP considers two case: 1) NH set to 0: 6LowPAN indicates the Diet-ESP payload is not compressed and 2) NH set to 1: 6LowPAN indicates the Diet-ESP payload is compressed. Suppose 6LowPAN indicates the Next Header ESP is not compressed by 6LowPAN. If the peers have agreed to use Diet-ESP, then the ESP layer on each peers receives the expected Diet-ESP packet. Diet-ESP is fully compatible with 6LowPAN ESP compression disabled. Suppose 6LowPAN indicates the Next Header ESP is compressed by 6LowPAN. ESP compression with 6LowPAN considers the compression of the ESP Header, that is to say the compression of the SPI and SN fields. As a result 6LowPAN compression expects a 4 byte SPI and a 4 byte SN from the ESP layer. Similarly 6LowPAN decompression provides a 4 byte SPI and a 4 byte SN to the ESP layer. If the peers have agreed to use Diet-ESP and one of them uses 6LowPAN ESP compression, then the Diet-ESP MUST use SPI SIZE and the SN SIZE MUST be set to 4 bytes.

ROHC and ROHCoverIPsec have been used to describe Diet-ESP. This means the ROHC and ROHCoverIPsec concepts and terminology have been used to describe Diet-ESP. In that sens Diet-ESP is compatible with the ROHC and ROHCoverIPsec framework. The remaining of the section describes how Diet-ESP interacts with ROHC and ROHCoverIPsec profiles and payloads. ROHC compress packets between the MAC and the IP layer. Compression can only be performed over non encrypted packets. As a results, this section considers the case of an ESP encrypted packet and an ESP non encrypted packet. For encrypted ESP packet, ROHC profiles that enable ESP compression (e.g. profile 0x0003 and 0x1003) compresses only the ESP Header and the IP header. To enable ROHC compression a Diet-ESP packet MUST present an similar header as the ESP Header, that is a 4 byte SPI and a 4 byte SN. This is accomplished by setting SPI_SIZE = 4 and SN_SIZE = 4 in the Diet-ESP Context. Reversely, if the Diet-ESP packet presents a 4 byte SPI and a 4 byte SN, ROHC can proceed to the compression. Note that Diet-ESP does not consider the IP header, then ESP and Diet-ESP are encrypted, thus ROHC can hardly make the difference between Diet-ESP and ESP packets. For encrypted packets, the only difference at the MAC layer might be the alignment. For non encrypted ESP packet, ROHC MAY proceed to the compression of different fields of ESP and other layers, as the payload appears in clear. ROHC compressor are unlikely to deal with ESP fields compressed by Diet-ESP. As a result, it is recommended not to combine Diet-ESP and ROHC ESP compression with non encrypted ESP packets.

ROHC or 6LoWPAN are not able to compress the ESP payload, as long as it is encrypted. Diet-ESP describes how to compress the ESP-Trailer, which is part of the encrypted payload can be compressed. 6LowPANoverIPsec (section 2 of ) and ROHCoverIPsec define the compression of the ESP payload, more specifically the upper layer headers (e.g. IP header or Transport layer header). These protocols need a second, modified ESP stack in order to make the payload compression possible. Then the packets with compressed payload are forwarded to this second ESP stack which can compress or decompress the payload. Diet-ESP and its extensions also needs a modified ESP stack in order to perform the compression of ESP payload possible. In addition, fields that are subject to compression are most likely to be the same with Diet6ESP and 6LowPANoverIPsec and/or ROHCoverIPsec. Therefore, if a device implements Diet-ESP and 6LowPANoverIPsec and/or ROHCoverIPsec the developer needs to define an order the various frameworks perform the compression. Currently this order has not been defined, and Diet-ESP is unlikely to be compatible with 6LowPANoverIPsec and/or ROHCoverIPsec. Integration of Diet-ESP and 6LowPANoverIPsec and/or ROHCoverIPsec has not been considered in the current document as Diet-ESP has been designed to avoid implementations of 6LowPANoverIPsec and/or ROHCoverIPsec frameworks to be implemented into the devices. Diet-ESP has been designed to be more lightweight than 6LowPANoverIPsec and/or ROHCoverIPsec by avoiding negotiations between compressors and decompressors.

lists the requirements for Diet-ESP. This section position Diet-ESP described in this document toward these requirements. Diet-ESP is completly based on IPsec/ESP and as such benefits from the standard IPsec security. Diet-ESP does not introduces vulnerabilities to IPsec/ESP. The only difference is that compression results in sending less bytes on the wire. In return the bytes sent over the wire are decompress to feed the IPsec stack with the appropriated bytes. Compression MAY require a non standard IPsec/ESP implementation, as some fields may have been removed. This fields are packet descriptors (like padding, length...), and are not related to the security of the standard IPsec/ESP. Diet-ESP is fully derived from IPsec/ESP ROHC and ROHCoverIPsec and as such relies on the security of these protocols. Diet-ESP is able to handle alignments of 8, 16, 32 and 64 bits. is not in the scope of Diet-ESP. Announcement of the Byte-Alignment should be performed by IKEv2. Diet-ESP does not modify how encryption occurs. It only changes the encrypted payload, which is one of the parameters for the encryption function. Therefore Diet-ESP is able to work with any encryption defined in which also includes AES-CCM . Combined Mode algorithm (e.g. AES-CCM, AES-GCM) have an additional parameter, called Addition Authentication Data (AAD). This AAD requires the uncompressed ESP header that is to say the full SPI and SN. These parameters are not removed by Diet-ESP. There are well known by the two peer. The ESP Header MUST be uncompressed before proceeding to encryption/ decryption. Diet-ESP can remove all static and compress fields from the protocol. The inner payload compression mechanisms are not defined in this document. This aspect is the purpose of [draft-mglt-inner-compression] Diet-ESP compressed packet fields are always a number of bytes -- that is Diet-ESP do not result in compressed fields that are not expressed in a natural number of bytes. Diet-ESP allows the developer define the maximum compression within the Diet-ESP context. The way the agreement is done, is out of scope of this document and is described in [draft-mglt-diet-esp-ikev2]. Each field in the packet can be compressed separately, which provides high flexibility. Since Diet-ESP does not propose compression method flexibility. The proposed methods are generic enough and there is not advantage for this flexibility and so it does not seems appropriated for Diet-ESP. Each Diet-ESP client can have his own set of supported contexts. The negotiation is out of scope of this document and described in [draft-mglt-diet-esp-ikev2]. Diet-ESP adds small complexity to Standard ESP, like described in . In- and Outbound packet procession is straight-forward, like shown in and . provides a implementation guideline for a minimal use case. This one can be ported to any other use case. Diet-ESP is easy to configure and provides a default-context if a developer does not want to dive into the details of Diet-ESP. Diet-ESP can interact with 6LoWPAN and ROHC IP compression, but SHOULD be able to interact with all future compression applying after the IP layer as well. Compatibility with Standard ESP 1: Diet-ESP can be implemented instead of, nearby or like an add-on to an existing Standard ESP implementation. Compatibility with Standard ESP 2: Diet-ESP is able to work without compression and works with 32 and 64 bits alignment, which makes it compatible with Standard ESP.

[draft-mglt-6lo-diet-esp-00.txt]: Changing affiliation [draft-mglt-6lo-diet-esp-00.txt]: Updating references [draft-mglt-ipsecme-diet-esp-01.txt]: Diet ESP described in the ROHC framework ESP is not modified. [draft-mglt-ipsecme-diet-esp-00.txt]: NAT consideration added. Comparison actualized to new Version of 6LoWPAN ESP. [draft-mglt-dice-diet-esp-00.txt]: First version published.