Internet Engineering Task Force
AVT Working Group                                       Baugher, McGrew,
INTERNET-DRAFT                                              Oran (Cisco)
Expires: April 2002                              Blom, Carrara, Naslund,
                                                      Norrman (Ericsson)
                                                           November 2001

The Secure Real Time Transport Protocol

Status of this memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or cite them other than as "work in progress".

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/lid-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html

Abstract

This document describes the Secure Real Time Transport Protocol (SRTP), a profile of the Real Time Transport Protocol (RTP), which can provide confidentiality, message authentication, and replay protection. SRTP can achieve high throughput and low packet expansion. SRTP is suitable for protecting traffic in heterogeneous environments, i.e. environments including both wired and wireless links. To achieve these properties, default transforms are described, based on an additive stream cipher for encryption, a keyed-hash based function for message authentication, and an 'implicit' index for sequencing based on the RTP sequence number.

Baugher, et al.                                                 [Page 1]
INTERNET-DRAFT                    SRTP                    November, 2001

TABLE OF CONTENTS

1. Notational Conventions
2. Goals
3. SRTP Framework
   3.1 SRTP Cryptographic Contexts
      3.1.1 Transform-independent parameters
      3.1.2 Transform-dependent parameters
      3.1.3 Mapping SRTP Packets to Cryptographic Contexts
   3.2 SRTP Packet Processing
      3.2.1 Packet Index Determination
      3.2.2 Cryptographic Transforms
      3.2.3 Replay Protection
   3.3 Secure RTCP
4. Pre-Defined Transforms
   4.1 Encryption
      4.1.1 AES in Counter Mode
         4.1.1.1 Keystream generation
      4.1.2 AES in f8-Mode
         4.1.2.1 Keystream Generation
         4.1.2.2 SRTP IV Formation
         4.1.2.3 SRTCP IV Formation
      4.1.3 NULL Cipher
   4.2 Message Authentication and Integrity
      4.2.1 HMAC/SHA1
      4.2.2 TMMH/16
   4.3 Key Derivation
      4.3.1 Key Derivation Algorithm
      4.3.2 AES-CM PRF
      4.3.3 SRTCP Key Derivation
5. Default and Mandatory Transforms
   5.1 Encryption: AES-CM
   5.2 Authentication/Integrity: HMAC/SHA1
   5.3 Key Derivation: AES-CM PRF
6. SRTP Parameters
7. Adding SRTP Transforms
8. Rationale
   8.1 Key derivation
   8.2 Salting key
   8.3 TMMH - Message Integrity from Universal Hashing
   8.4 Data Origin Authentication considerations
9. Key Management Considerations
10. Security Considerations
   10.1 Key Usage
   10.2 SSRC collision and two-time pad
   10.3 Confidentiality of the RTP Payload
   10.4 Confidentiality of the RTP Header
   10.5 Integrity of the RTP packet
      10.5.1 Integrity of the RTP header: IHA
11. Interaction with Forward Error Correction mechanisms
12. IANA Considerations
13. Open issue
14. Acknowledgements
15. Author's Addresses
16. References
Appendix A: Pseudocode for Index Determination, and ROC and s_l Update
Appendix B: Test Vectors
   B.1 AES-f8 Test Vectors
   B.2 AES-CM Test Vectors
   B.3 TMMH/16 Test Vectors

1. Notational Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

Terminology conforms to [RFC2828].

By convention, the leftmost bit (byte) is the most significant one. By XOR we mean bitwise addition modulo 2 of binary strings, and || denotes concatenation. For example, if C = A || B, then the most significant bits of C are the bits of A, and the least significant bits of C equal the bits of B. Hexadecimal numbers are prefixed by 0x.

At the time of writing, NIST has not published the Advanced Encryption Standard, AES [AES]. However, as it is clear that AES will be the Rijndael algorithm as specified in [AES], we shall throughout this document let AES denote the block cipher Rijndael.

2. Goals

The security goals for SRTP are to ensure:

* the confidentiality of the RTP payload, and

* the integrity of the entire RTP packet, together with protection against replayed RTP packets.

Each of these security services is optional and independent.

Other, functional, goals for the protocol are:

* a framework that permits upgrade to new cryptographic transforms,

* low bandwidth cost, i.e. a framework preserving RTP header compression efficiency,

and, asserted by the pre-defined transforms:

* a low computational cost,

* a small footprint (i.e.
small code size and data memory for keying information and replay lists),

* limited packet expansion to support the bandwidth economy goal,

* independence from the underlying transport, network, and physical layer used by RTP, in particular high tolerance to packet loss and re-ordering, and robustness to transmission bit-errors.

The described security services are also provided for RTCP, the control protocol defined for RTP [RFC1889], with the exception that integrity and replay protection for the RTCP packets are mandatory when SRTP services are applied to the RTP packets of the corresponding session.

These properties ensure that SRTP is a suitable protection scheme for RTP in both wired and wireless scenarios.

3. SRTP Framework

RTP is the Real Time Transport Protocol [RFC1889]. We define SRTP as a profile of RTP, in a way analogous to RFC1890, which defines the audio/video profile for RTP.

Conceptually, we consider a 'bump in the stack' implementation that resides between the RTP application and the transport layer. On the sending side it intercepts RTP packets and forwards equivalent SRTP packets; on the receiving side it intercepts SRTP packets and passes equivalent RTP packets up the stack.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-->+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
|   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   |                           timestamp                           |
|   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   |           synchronization source (SSRC) identifier            |
|   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|   |            contributing source (CSRC) identifiers             |
|   |                               ....                            |
|   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   |                   RTP extension (optional)                    |
| +>+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | |                                                               |
| | |                            payload                            |
| | |                             ....                              |
+>+>+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | |                 authentication tag (optional)                 |
| | |                                                               |
| | |                             ....                              |
| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
| +- Encrypted Portion
+---- Authenticated Portion

Figure 1. The format of an SRTP packet.

The format of an SRTP packet is illustrated in Figure 1. The optional authentication tag is the only field defined by SRTP that is not in RTP. The added field is:

Authentication tag: variable length, optional

The authentication tag shall be used to carry authentication data.

The Authenticated Portion of an SRTP packet consists of the entire equivalent RTP packet. Note that, if both encryption and authentication are applied, then 'payload' in the Authenticated Portion refers to the corresponding encrypted payload. The authentication tag provides authentication of the RTP header and payload, and it indirectly provides replay protection by authenticating the sequence number.

The Encrypted Portion of an SRTP packet consists of the RTP payload of the equivalent RTP packet.

3.1 SRTP Cryptographic Contexts

Each SRTP session requires the sender and receiver to maintain cryptographic state information. This information is called the cryptographic context.

By a session key we mean a key that is used directly in a cryptographic transform (e.g. encryption or authentication). A master key is a random bit string (given by the key management protocol) from which session keys are derived in a cryptographically secure way.
3.1.1 Transform-independent parameters

The transform-independent parameters of the cryptographic context consist of:

* a 32-bit rollover counter, ROC, which records how many times the 16-bit RTP sequence number has been reset to zero after passing through 65,535. Unlike the sequence number, SEQ, which SRTP extracts from the RTP packet header, the ROC is maintained by SRTP. The ROC is thus a parameter internal to SRTP.

* for the receiver only, a sequence number s_l, which is the last received sequence number (the last authenticated one, if authentication is provided). Here, 'sequence number' refers to the 16-bit SEQ carried in the RTP packet header.

* an identifier for the encryption algorithm, i.e. the cipher and its mode of operation, and related parameters,

* an identifier for the authentication algorithm, and related parameters (when authentication is provided),

* a replay list L, maintained by the receiver only (when authentication is provided),

* integers n_e and n_a, determining the length of the session keys for encryption and authentication,

* the master key(s),

* a 16-bit integer, the session key derivation rate,

* FirstSEQ+ROC and LastSEQ+ROC as the key lifetime for each of the master keys (FirstSEQ and LastSEQ are the RTP sequence numbers bounding the range inside which the master key is valid, and ROC is the rollover counter). These values are absolute quantities, not relative.

3.1.2 Transform-dependent parameters

Any encryption, authentication/integrity, and key derivation parameters that depend on the transform definitions are defined in Section 4 (Pre-Defined Transforms). Future SRTP transform specifications MUST include a section listing the cryptographic context parameters for that transform.
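
As an illustration only, the transform-independent parameters listed in Section 3.1.1 could be held in a structure like the one below. This is a minimal sketch, not part of the specification: the class names, field names, the string identifiers "AES-CM" and "HMAC-SHA1", and the helper master_key_for are all hypothetical, and the key lifetime is stored as a pair of absolute packet indices (65,536 * ROC + SEQ), which is one possible reading of "FirstSEQ+ROC and LastSEQ+ROC".

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class MasterKey:
    """A master key and its lifetime (hypothetical layout)."""
    key: bytes
    first_index: int  # absolute index of the first packet (FirstSEQ with its ROC)
    last_index: int   # absolute index of the last packet (LastSEQ with its ROC)


@dataclass
class CryptoContext:
    """Transform-independent parameters of Section 3.1.1 (illustrative only)."""
    roc: int = 0                          # 32-bit rollover counter
    s_l: Optional[int] = None             # receiver only: last received SEQ
    cipher_id: str = "AES-CM"             # assumed label for the cipher and mode
    auth_id: Optional[str] = "HMAC-SHA1"  # assumed label; None if no authentication
    replay_list: Set[int] = field(default_factory=set)  # receiver only
    n_e: int = 128                        # encryption session key length in bits
    n_a: int = 128                        # authentication session key length in bits
    master_keys: List[MasterKey] = field(default_factory=list)
    key_derivation_rate: int = 0          # 16-bit integer

    def master_key_for(self, index: int) -> MasterKey:
        """Return the master key whose lifetime covers the given packet index."""
        for mk in self.master_keys:
            if mk.first_index <= index <= mk.last_index:
                return mk
        raise LookupError("no master key is valid for index %d" % index)
```

A context built this way can be queried with, for example, ctx.master_key_for(70000) to pick the master key in force for a given packet index; how the lifetimes are actually signaled is left to key management.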
3.1.3 Mapping SRTP Packets to Cryptographic Contexts

Recall that an RTP session for each participant is defined [RFC1889] by a pair of destination transport addresses (one network address plus a port pair for RTP and RTCP), and that a multimedia session is defined as a collection of RTP sessions. For example, a particular multimedia session could include an audio RTP session, a video RTP session, and a text RTP session.

A cryptographic context shall be uniquely identified by the triplet context identifier:

      context id = (SSRC, destination network address, destination transport port number)

where the destination network address and the destination transport port are the ones in the current packet. It is assumed that, when presented with this information, the key management returns a context with the information as described in Section 3.1.

3.2 SRTP Packet Processing

To construct a proper SRTP packet, given an RTP packet, the sender does the following:

1. Determine which cryptographic context to use as described in Section 3.1.3.

2. Determine the index of the SRTP packet as described in Section 3.2.1, using the rollover counter in the cryptographic context and the sequence number in the RTP packet.

3. Determine the session keys, as described in Section 4.3.

4. Encrypt the Encrypted Portion of the packet (see Section 4 for the defined ciphers), using the encryption keys found in Step 3.

5. If authentication is provided, compute the authentication tag for the Authenticated Portion of the packet, as described in Section 4, using the index determined in Step 2 and the authentication key found in Step 3. Note that the Encrypted Portion is encrypted before the authentication tag is computed.

To authenticate and decrypt an SRTP packet, the receiver does the following:

1. Determine which cryptographic context to use as described in Section 3.1.3.

2. Estimate the index of the SRTP packet from the rollover counter in the cryptographic context and the sequence number in the RTP packet, as described in Section 3.2.1.

3. Determine the session keys, as described in Section 4.3.

4. If authentication is provided, check whether the packet has been replayed, by checking the Replay List to ensure that no packet with that index has been received and authenticated before. If that index is in the list, then the packet has been replayed and is invalid. It MUST be discarded, and the event SHOULD be logged.

   Next, perform verification of the authentication tag, using the authentication key and the packet index from Step 2. If the result is 'AUTHENTICATION FAILURE' (see Section 4), the packet MUST be discarded from further processing and the event SHOULD be logged.

5. Decrypt the Encrypted Portion of the packet (see Section 4 for the defined ciphers), using the decryption keys found in Step 3.

6. Update the rollover counter and last sequence number, s_l, in the local context to the values used in the packet index estimated in Step 2.

3.2.1 Packet Index Determination

SRTP implementations use an 'implicit' packet index for sequencing. When the session starts, the sender side shall set the rollover counter, ROC, to zero. Each time the RTP sequence number, SEQ, wraps modulo 2^16, the sender side shall increment ROC by one. The sender's packet index is then defined as

      i = 65,536 * ROC + SEQ.

Receiver-side implementations use the RTP sequence number to reconstruct the correct index (that is, the location in the sequence of all RTP packets). Here too, the index is defined as SEQ + ROC * 65,536, where the RTP sequence number is SEQ and the rollover counter is ROC, maintained locally by the receiver as described below.

A robust approach for the proper use of a rollover counter requires its handling and use to be well defined. In particular, out-of-order RTP packets with sequence numbers close to 65,536 or zero must be properly dealt with. A receiver reconstructs the index i of a packet with sequence number SEQ using the estimate

      i = 65,536 * v + SEQ,

where v is chosen from the set { ROC-1, ROC, ROC+1 } such that i is closest to the value 65,536 * ROC + s_l. If the value ROC+1 is used, then the rollover counter ROC in the cryptographic context is incremented by one (see Appendix A).

The index i is used in replay protection (Section 3.2.3), encryption and authentication (Section 4), and for the key derivation (Section 4.3).

As the rollover counter is 32 bits long, the maximum number of packets in any given SRTP session is 2^48 = 281,474,976,710,656. After that number of SRTP packets has been sent with a given key, the sender MUST NOT send any more packets with that key. This limitation provides a security benefit by placing an upper bound on the amount of traffic that can pass before cryptographic keys are changed. Re-keying (see Section 9) MUST be triggered no later than after this amount of traffic, and MAY be triggered earlier, e.g. for increased security and access control to media. Recurring key derivation, as determined by a non-zero derivation rate (see Section 4.3), gives even stronger security benefits, but does NOT change the above absolute maximum value.

For the receiver, the 'implicit index' approach works as long as the reordering and loss of packets is not too great. In particular, 32,768 packets would need to be lost, or a packet would need to be 32,768 packets out of sequence, in order for synchronization to be lost. Such drastic loss or reordering is likely to disrupt the RTP application itself.
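
The receiver-side estimate described in Section 3.2.1 can be sketched as follows. This is an illustrative reading of the rule "choose v from { ROC-1, ROC, ROC+1 } so that i is closest to 65,536 * ROC + s_l", not the pseudocode of Appendix A. The function name estimate_index is hypothetical, candidates outside the 32-bit ROC range are skipped, and a tie at a distance of exactly 32,768 is resolved in favor of the smaller v, which is an assumption the text does not settle.

```python
from typing import Tuple


def estimate_index(roc: int, s_l: int, seq: int) -> Tuple[int, int]:
    """Estimate (v, i) for a received packet with 16-bit sequence number seq.

    roc is the receiver's current rollover counter and s_l the last
    received sequence number, both taken from the cryptographic context.
    """
    reference = 65536 * roc + s_l
    # Only rollover counter values that fit in 32 bits are considered.
    candidates = [v for v in (roc - 1, roc, roc + 1) if 0 <= v < 2**32]
    # Pick the candidate whose resulting index is closest to the reference.
    v = min(candidates, key=lambda c: abs(65536 * c + seq - reference))
    return v, 65536 * v + seq
```

For example, with ROC = 0 and s_l = 65,530, a packet carrying SEQ = 3 is placed just after the wrap (v = 1, i = 65,539), while with ROC = 1 and s_l = 2, a late packet carrying SEQ = 65,534 is placed before the wrap (v = 0). Updating ROC and s_l in the context is left to the caller, after the packet has been authenticated.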
3.2.2 Cryptographic Transforms

While there are numerous encryption and message authentication algorithms that can be used in SRTP, we define (Section 4) default algorithms in order to avoid the complexity of specifying the encodings for the signaling of algorithm and parameter identifiers. The defined algorithms have been chosen because they fulfil the goals listed in Section 2. Recommendations on how to extend SRTP with new transforms are given in Section 7.

3.2.3 Replay Protection

Robust replay protection is possible when authentication of RTP packets is present. A packet is 'replayed' when it is stored by an adversary, and then re-injected into the network. SRTP provides protection against such attacks whenever authentication is provided, through the storage of the indices of the most recently received and authenticated packets.

Each SRTP receiver maintains a Replay List, which conceptually contains the indices of all of the packets which have been received and authenticated. In practice, the list can use a 'sliding window' approach, so that a fixed amount of storage suffices for replay protection. Packet indices which lag behind the packet index in the context by more than SRTP-WINDOW-SIZE can be assumed to have been received, where SRTP-WINDOW-SIZE is a parameter that MUST be at least 64, and which MAY be set to a higher value.

The Replay List can be efficiently implemented by using a bitmap to represent which packets have been received, as described in the Security Architecture for IP [RFC2401].

Note that there are no provisions for managing transmitted Sequence Number values among multiple senders using the same crypto context; thus the anti-replay service SHOULD NOT be used in a multi-sender environment that employs a single crypto context.

3.3 Secure RTCP

Secure RTCP follows the definition of Secure RTP. SRTCP is defined as a profile of RTCP, and it adds two mandatory new fields to the RTCP packet definition, the SRTCP index and the authentication tag. Those fields are appended to an RTCP packet in order to form an equivalent SRTCP packet, so that they follow any other profile-specific extensions. An SRTCP packet is illustrated in Figure 2.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-->+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   |V=2|P|    RC   |   PT=SR=200   |             length            |
|   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   |                         SSRC of sender                        |
| +>+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
| | |                              ...                              |
| | |                          sender info                          |
| | |                              ...                              |
| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | |                              ...                              |
| | |                         report block 1                        |
| | |                              ...                              |
| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | |                              ...                              |
| | |                         report block 2                        |
| | |                              ...                              |
| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | |                                                               |
| | |                              ...                              |
| | |                                                               |
| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | |V=2|P|    SC   |  PT=SDES=202  |             length            |
| | +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
| | |                          SSRC/CSRC_1                          |
| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | |                           SDES items                          |
| | |                              ...                              |
| | +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
| | |                                                               |
| | |                              ...                              |
| | |                                                               |
| +>+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
| | |E|                         SRTCP index                         |
+-|>+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | |                              ...                              |
| | |                     authentication field                      |
| | |                                                               |
| | |                              ...                              |
| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
| +-- Encrypted Portion (optional)
+---- Authenticated Portion (mandatory when SRTP is used for RTP session)

Figure 2. The format of a Secure RTCP packet, consisting of an underlying RTCP compound packet with a Sender Report and an SDES packet.

The added fields are:

E bit and SRTCP index: 32 bits, mandatory

The SRTCP index is a 31-bit counter for the SRTCP packets. The index is explicitly included in each packet, in contrast to the 'implicit' index approach used for SRTP. As Section 9.1 of [RFC1889] allows the split of a compound RTCP packet into two lower-layer packets, one to be encrypted and one to be sent in the clear, indices with their most significant bit (E bit) set to '1' are reserved for encrypted packets, and indices with the most significant bit set to '0' are used for non-encrypted packets. With this restriction, the remaining 31 bits are set to zero before the first SRTCP packet is sent, and the counter is incremented by one after each SRTCP packet is sent. Except for differences in the most significant (E) bit, indices form a strictly increasing sequence.

Authentication Tag: variable length, mandatory

The authentication tag shall be used to carry message authentication data.

The Authenticated Portion of an SRTCP packet consists of the entire equivalent (possibly compound) RTCP packet and the SRTCP index.

The Encrypted Portion of an SRTCP packet consists of the RTCP payload of the equivalent compound RTCP packet, from the first RTCP packet, i.e. from the ninth (9) byte to the end of the compound packet.

SRTCP packet processing is identical to SRTP packet processing, with the following changes:

* SRTCP replay protection is as defined in Section 3.2.3, but using the SRTCP index as the index i and maintaining separate values for s_l and the replay list specific to SRTCP. SRTCP replay protection is mandatory.

* SRTCP encryption is as defined in Section 4, but using the definition of the SRTCP Encrypted Portion given in this section, and using the SRTCP index as the index i. The encryption transforms shall be the same as those selected for the protection of the associated SRTP stream(s) (when RTP is encrypted too), while the NULL algorithm shall be applied to the RTCP packets that are to be authenticated but not encrypted.

* The SRTCP authentication tag is defined as in Section 4, but with the Authenticated Portion of the SRTCP packet defined in this section, and using the SRTCP index as the index i. SRTCP authentication is mandatory. The authentication transforms and related parameters (e.g., key size) shall be the same as those selected for the protection of the associated SRTP stream(s) (when SRTP is authenticated too).

* SRTCP decryption is performed as in Section 4, but only if the SRTCP index has its most significant bit (E bit) equal to 1. If so, the Encrypted Portion is decrypted, using the SRTCP index as the index i. If the most significant bit of the index is 0, the payload is simply copied.

There MAY also exist some minor transform-specific changes; see Section 4 for the defined transforms.

The encryption prefix (Section 6.1 of [RFC1889]), a random 32-bit quantity intended to improve privacy, MUST NOT be used. This is because we strongly recommend ciphers secure against known plaintext attacks. The pre-defined SRTP encryption uses a secure, additive stream cipher, and thus the prefix offers no benefit at all.

The maximum number of SRTCP packets with a fixed key is limited to 2^31 = 2,147,483,648.

Authentication MUST be applied to RTCP, as it is the control protocol (e.g. it has a BYE packet). Note, however, that the cost of RTCP authentication is not of the same order as that of RTP authentication, since the recommended share of the session bandwidth allocated to RTCP is 5% and RTCP packets are sent less frequently. However, when adding authentication to RTCP, the overhead in bandwidth SHOULD be considered (it will be more than 5%).

4. Pre-Defined Transforms

4.1 Encryption

Generic parameters, common to all pre-defined, non-NULL, encryption transforms:

* BLOCK CIPHER is the block cipher used

* n_b is the bit-size of the block for the block cipher

* k_e is the session encryption key

* n_e is the length of k_e (the default is 128 bits)

* k_s is the so-called salting key

* n_s is the length of the salting key. The default value is equal to n_b. Another (shorter) value MUST be explicitly signaled.

* SRTP_PREFIX_LENGTH is the octet length of the keystream prefix, an (at least) 8-bit integer, inferred from the message authentication code in use.

The session key is by default derived as specified in Section 4.3. The salting key is obtained directly from the cryptographic context.

The encryption transforms defined in SRTP use a "seekable" segmented keystream generator, which for each secret key maps the RTP packet index into a pseudorandom keystream segment, used to encrypt a single RTP packet (with that packet index). The process of encrypting a packet consists of generating the keystream segment corresponding to the packet, and then bitwise exclusive-oring that keystream segment onto the Encrypted Portion of the RTP packet. Decryption is done the same way, but swapping the roles of the plaintext and ciphertext.

The definition of how the keystream is generated, given the index, depends on the cipher and its mode of operation. Below, two such keystream generators are defined. The NULL cipher is also defined, to be used when encryption of RTP is not required.

The initial octets of each keystream segment MAY be reserved for use in a message authentication code, in which case the keystream used for encryption starts immediately after the last reserved octet. The initial reserved octets are called the keystream prefix, and the remaining octets are called the keystream suffix. This process is illustrated in Figure 3.
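
The split of a keystream segment into prefix and suffix, and the exclusive-or of the suffix onto the Encrypted Portion, can be sketched as follows. This is only an illustration under stated assumptions: the function name apply_keystream is hypothetical, and the keystream segment is passed in as ready-made bytes rather than produced by one of the AES modes defined below.

```python
from typing import Tuple


def apply_keystream(portion: bytes, keystream_segment: bytes,
                    prefix_length: int = 0) -> Tuple[bytes, bytes]:
    """Return (keystream prefix, portion XOR keystream suffix).

    prefix_length plays the role of SRTP_PREFIX_LENGTH (in octets).
    Because XOR is its own inverse, the same call also decrypts.
    """
    prefix = keystream_segment[:prefix_length]
    suffix = keystream_segment[prefix_length:]
    if len(suffix) < len(portion):
        raise ValueError("keystream segment too short for this packet")
    # zip() stops at the end of the portion, so surplus keystream is ignored.
    processed = bytes(p ^ k for p, k in zip(portion, suffix))
    return prefix, processed
```

With an all-zero keystream the function returns its input unchanged, which matches the behaviour of the NULL cipher; with prefix_length greater than zero the returned prefix is what a message authentication transform would consume.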
+----+   +------------------+---------------------------------+
| KG |-->| Keystream Prefix |          Keystream Suffix       |---+
+----+   +------------------+---------------------------------+   |
                                                                  |
                            +---------------------------------+   v
                            | Encrypted Portion of RTP Packet |->(*)
                            +---------------------------------+   |
                                                                  |
                            +---------------------------------+   |
                            | Encrypted Portion of SRTP Packet|<--+
                            +---------------------------------+

Figure 3: SRTP Encryption. Here KG denotes the keystream generator, and (*) denotes bitwise exclusive-or.

The number of octets in the keystream prefix is denoted as SRTP_PREFIX_LENGTH. The keystream prefix is reserved for use with certain message authentication transforms, indicated by a positive value of this parameter. This means that even if confidentiality is not to be provided, the keystream generator output MAY still need to be computed, in which case the default keystream generator SHALL be used.

The default cipher is the Advanced Encryption Standard (AES), and we define two modes of running AES, Segmented Integer Counter Mode AES and AES in f8-mode. In what follows, let E(k,x) be AES applied to key k and input block x.

4.1.1 AES in Counter Mode

The default keystream generator cipher SHALL be AES [AES] used in the Segmented Integer Counter Mode, with an n_e = 128-bit key size and an n_b = 128-bit block size.

Conceptually, counter mode consists of encrypting successive integers. The actual definition is somewhat more complicated, in order to randomize the starting point of the integer sequence. Each packet is encrypted with a distinct keystream segment, which is computed as follows.

4.1.1.1 Keystream generation

A keystream segment is the concatenation of the 128-bit output blocks of the AES cipher in the encrypt direction, using key k = k_e, in which the block indices are in increasing order. Symbolically, each keystream segment looks like

      E(k,A) || E(k,A + 1 mod 2^128) || E(k,A + 2 mod 2^128) ...

The 128-bit integer value A is defined as 2^16 times the packet index, i, plus k_s (the salting key), modulo 2^128:

      A = (k_s + (i * 2^16)) modulo 2^128.

Note that the initial value A is fixed for each packet. The number of blocks of keystream generated for any fixed value of A MUST NOT exceed 2^16. The AES has a block size of 128 bits, so 2^16 output blocks are sufficient to generate the 2^23 bits of keystream needed to encrypt the largest possible RTP packet (except for IPv6 'jumbograms' [RFC2675], which are not likely to be used for RTP-based multimedia traffic). This restriction on the number of keystream blocks per packet ensures the security of the encryption method by limiting the effectiveness of probabilistic attacks [BR98].

4.1.2 AES in f8-mode

To encrypt UMTS (Universal Mobile Telecommunications System, i.e. 3G networks) data, a solution (see [ES3D]) known as the f8-algorithm has been developed. On a high level, the proposed scheme is a variant of Output Feedback Mode (OFB) [HAC], with a more elaborate initialization and feedback function. As in normal OFB, the core consists of a block cipher. We also define here the use of AES as the default block cipher to be used in f8-mode for RTP encryption, with 128-bit key and block size.

Figure 4 shows the structure of a block cipher, E, running in what we shall call "f8-mode of operation".

                    IV
                    |
                    |
                    v
                +------+
                |      |
           +--->|  E   |
           |    |      |
           |    +------+
           |        |
     m --> *        +-----------+-------------+--  ...     ------+
           |    IV' |           |             |                  |
           |        |   j=1 --> *     j=2 --> *  ...   j=L-1 --> *
           |        |           |             |                  |
           |        |      +--> *        +--> *  ...        +--> *
           |        |      |    |        |    |             |    |
           |        v      |    v        |    v             |    v
           |    +------+   | +------+    | +------+         | +------+
           |    |      |   | |      |    | |      |         | |      |
    k_e ---+--->|  E   |   | |  E   |    | |  E   |         | |  E   |
                |      |   | |      |    | |      |         | |      |
                +------+   | +------+    | +------+         | +------+
                    |      |    |        |    |             |    |
                    +------+    +--------+    +--  ...  ----+    |
                    |           |             |                  |
                    v           v             v                  v
                   S(0)        S(1)          S(2)  . . .       S(L-1)

Figure 4. f8-mode of operation (asterisk, *, denotes bitwise XOR).

4.1.2.1 Keystream Generation

As above, let E(k_e,x) be the 128-bit output of AES in the encrypt direction when applied to the n_e = 128-bit key k_e and n_b = 128-bit plaintext block x. The Initialization Vector (IV) is determined as described in Section 4.1.2.2.

Let IV', S(j), and m denote n_b-bit blocks, determined below. The keystream, S(0) || ... || S(L-1), for an N-bit message is defined by setting

      IV' = E(k_e XOR m, IV), and S(-1) = 00..0.

For j = 0,1,.., L-1, where L = N/n_b (rounded up to the nearest integer), compute

      S(j) = E(k_e, IV' XOR j XOR S(j-1))

Notice that the IV is not used directly. Instead it is fed through E under another key to produce an internal, "masked" value (denoted IV') to prevent an attacker from gaining known input/output pairs.

The role of the internal counter is to prevent short keystream cycles. The value of the key mask m is defined to be

      m = k_s || 0x555..5,

i.e. the salting key, followed by the binary pattern 0101.. to fill the entire desired key size, n_e.

The maximum allowable packet size can be determined as follows. The AES has a block size of 128 bits, and assuming that AES behaves like a random function, it is (heuristically) secure to generate about 2^64 output blocks, which is sufficient to generate 2^71 bits of keystream. For practical sizes of RTP packets, far fewer blocks are required, and the counter j above will often be sufficient if implemented as a 16- or 32-bit counter.

4.1.2.2 SRTP IV Formation

The purpose of the following IV formation is to provide a feature which we call implicit header authentication (IHA), see Section 10.5.1.

The IV for 128-bit block AES-f8 is formed in the following way:

      IV = 0x00 || M || PT || SEQ || TS || SSRC || ROC

M, PT, SEQ, TS, SSRC are taken from the RTP header; ROC is from the crypto context.
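
As a worked illustration of the SRTP IV layout above, the following sketch packs the fields into a 16-octet value. The function name srtp_f8_iv and its parameter names are hypothetical. The field widths are those of the RTP header (M is 1 bit, PT is 7 bits, SEQ is 16 bits, TS and SSRC are 32 bits each), ROC is the 32-bit rollover counter, and the most significant octet comes first, following the convention of Section 1.

```python
import struct


def srtp_f8_iv(marker: int, payload_type: int, seq: int,
               timestamp: int, ssrc: int, roc: int) -> bytes:
    """Build IV = 0x00 || M || PT || SEQ || TS || SSRC || ROC (128 bits)."""
    if marker not in (0, 1) or not 0 <= payload_type < 128:
        raise ValueError("M must be 1 bit and PT must be 7 bits")
    # M and PT share the second octet, exactly as in the RTP header.
    m_pt = (marker << 7) | payload_type
    # "!" selects network (big-endian) byte order with no padding:
    # 1 + 1 + 2 + 4 + 4 + 4 = 16 octets.
    return struct.pack("!BBHIII", 0x00, m_pt, seq, timestamp, ssrc, roc)
```

For example, M = 1, PT = 96, SEQ = 0x1234, TS = 0x01020304, SSRC = 0xAABBCCDD and ROC = 0 give the octets 00 e0 12 34 01 02 03 04 aa bb cc dd 00 00 00 00.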
The presence of the SSRC as part of the IV allows AES-f8 to be used when a master key is shared between multiple streams, see Section 10.2.

4.1.2.3 SRTCP IV Formation

The IV for 128-bit block AES-f8 is formed in the following way:

   IV = 0x00000000 || E || SRTCP index || V || P || RC || PT || length || SSRC

V, P, RC, PT, length, SSRC are taken from the first header in the RTCP compound packet. E || SRTCP index is the 32-bit index added to the packet.

4.1.3 NULL Cipher

The NULL cipher is used when no confidentiality for RTP is requested. The keystream can be thought of as "000..0", i.e. the encryption simply copies the plaintext input into the ciphertext output.

4.2 Message Authentication and Integrity

Common parameters

   * k_a is the session authentication key.

   * n_a is the bit-length of the authentication key. The default is 128 bits.

   * n_tag is the bit-length of the output authentication tag. The default is 32 bits.

   * SRTP_PREFIX_LENGTH is the octet length of the keystream prefix as defined above.

   * M is the Authenticated Portion as specified in Section 3 for RTP and 3.3 for RTCP.

The session key is by default derived as specified in Section 4.3. The values of n_a, n_tag, and SRTP_PREFIX_LENGTH MUST be fixed for any particular fixed value of the key.

Below we describe the process of computing authentication tags. The SRTP receiver verifies a message/authentication tag pair as follows. A new authentication tag is computed using one of the algorithms below, and it is compared to the tag associated with the message. If the two tags are equal, then the message/tag pair is valid; otherwise, it is invalid and the error audit message "AUTHENTICATION FAILURE" MUST be returned.

4.2.1. HMAC/SHA1

The default authentication code is HMAC with SHA1 [HMAC]. When HMAC/SHA1 is used, the SRTP_PREFIX_LENGTH is 0.
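For illustration, the tag computation and the receiver-side check for RTP with HMAC/SHA1 can be sketched in Python as follows, applying HMAC to M || ROC and keeping the n_tag left-most bits as this section specifies for RTP. This is a minimal sketch rather than a definitive implementation: the function names are hypothetical, the ROC is assumed to be serialised as a 32-bit big-endian integer, and n_tag is assumed to be a multiple of 8.

```python
import hashlib
import hmac


def srtp_hmac_tag(k_a: bytes, m: bytes, roc: int, n_tag: int = 32) -> bytes:
    """Return HMAC-SHA1(k_a, M || ROC) truncated to the n_tag left-most bits."""
    # Assumption: ROC is appended as a 32-bit big-endian integer.
    mac = hmac.new(k_a, m + roc.to_bytes(4, "big"), hashlib.sha1).digest()
    return mac[: n_tag // 8]


def verify_srtp_tag(k_a: bytes, m: bytes, roc: int, tag: bytes) -> bool:
    """Recompute the tag and compare it with the received one."""
    expected = srtp_hmac_tag(k_a, m, roc, n_tag=8 * len(tag))
    # compare_digest runs in time independent of where the tags differ.
    return hmac.compare_digest(expected, tag)
```

A receiver for which verify_srtp_tag returns False would report "AUTHENTICATION FAILURE" as required in Section 4.2. The constant-time comparison is a design choice of this sketch and is not mandated by the text.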
For RTP, the HMAC is applied to the concatenation of the Authenticated Portion of the packet (M) and the rollover counter in the cryptographic context, i.e. HMAC(k_a, M || ROC). For RTCP, HMAC is applied to the corresponding M only. By default, the output shall be truncated to the n_tag left-most bits.

4.2.2 TMMH/16

TMMH is a simple function that maps a key and a message to a hash value. This hash value is encrypted by combining it with the keystream prefix to make the authentication tag, as described below.

TMMH/16 uses 16-bit unsigned words as its basic data unit. Besides the common parameters above, we define the following parameters for convenience:

   - MESSAGE_LENGTH is the octet length of M.

   - K is the key, i.e. k_a.

   - KEY_LENGTH is the octet length of K, i.e. n_a divided by 8.

   - TAG is the authentication tag, which is the output of TMMH/16.

   - TAG_LENGTH is the octet length of the authentication tag, i.e. n_tag divided by 8. This value defines SRTP_PREFIX_LENGTH to be equal to TAG_LENGTH.

   - PREFIX is the keystream prefix for the current packet as defined in Section 4.1.

The values of KEY_LENGTH and TAG_LENGTH MUST obey the alignment restrictions described below.

For TMMH/16, a word is 16 bits (2 octets) long, so TAG_LENGTH and KEY_LENGTH MUST be even. If MESSAGE_LENGTH is odd, the message M MUST be padded with a zero octet, but this does not change the value of MESSAGE_LENGTH.

The words of the key are denoted as K[0], K[1], ..., K[KEY_WORDS-1], and the words of the message (after zero padding, if needed) are denoted as M[1], M[2], ..., M[MSG_WORDS], where MSG_WORDS is the smallest number such that 2 * MSG_WORDS is at least MESSAGE_LENGTH, and KEY_WORDS is KEY_LENGTH / 2.

If MESSAGE_LENGTH is greater than KEY_LENGTH - TAG_LENGTH, then the value of TMMH/16 is undefined. Implementations MUST indicate an error if asked to hash a message with such a length.
Otherwise, the hash value is defined to be the sequence of TAG_WORDS words in which the j-th word is defined as

   T[j] = [[ K[j]           * MESSAGE_LENGTH +32
             K[j+1]         * M[1]           +32
             K[j+2]         * M[2]           +32
             ...
             K[j+MSG_WORDS] * M[MSG_WORDS] ] modulo p ] modulo 2^16

where j ranges from zero to TAG_WORDS-1. Here, TAG_WORDS is equal to TAG_LENGTH/2, and p is equal to 2^16 + 1. The symbol * denotes multiplication and the symbol +32 denotes addition modulo 2^32.

To compute the authentication tag of an SRTP packet, the TMMH hash value of that message is computed, then that value is combined with the keystream prefix as defined in Section 4.1. The combining operation is word-wise addition modulo 2^16 (for TMMH/16), denoted +16:

   TAG[j] = T[j] +16 PREFIX[j],

where j ranges from zero to TAG_WORDS-1.

Note that for RTP, whereas HMAC is applied to M || ROC, TMMH is applied to M only. This is because, for TMMH, the dependence on ROC is inherent in the PREFIX quantity.

4.3 Key Derivation

4.3.1 Key Derivation Algorithm

Regardless of the encryption or authentication transform that is employed (it may be a pre-defined transform or one newly introduced according to Section 7), SRTP key derivation is the process of generating session keys without extra communication between the parties and in a sender-receiver synchronized way.

                       packet index ---+
                                       |
                                       |
                                       v
             +-----------+         +--------+ session encr_key
             | ext       | master  |        |---------->
             | key mgmt  | key     | key    |
             | (optional |-------->| deriv  |---------->
             | rekey)    |         |        | session auth_key
             +-----------+         +--------+

Figure 4: SRTP key derivation.

At least one initial key derivation is always performed by SRTP. Further applications of the key derivation MAY be performed, according to the 'key derivation rate' value in the crypto context.

Let m >= 64 and n be positive integers.
A pseudo-random function family is a set of keyed functions {PRF_m^n(k,x)} such that for (secret) random key k, given m-bit x, PRF_m^n(k,x) is an n-bit string, computationally indistinguishable from random n-bit strings.

Let a DIV t denote integer division of a by t, rounded down, and with the convention that a DIV 0 = 0 for all a. We also make the convention of treating a DIV t as a bit string of the same length as a, and thus "a DIV t" will in general have leading zeros.

Key derivation is defined as follows. To generate session key(s) for the current packet, let the n-bit SRTP key for this packet be PRF_m^n(k_master,