Network Working Group H. Hannu (Editor), Ericsson INTERNET-DRAFT Z. Liu, Nokia Expires: April 2002 R. Price, Siemens/Roke Manor J. Rosenberg, Dynamicsoft J. Christoffersson, Ericsson C. Clanton, Nokia S. Forsgren. Ericsson K. Le, Nokia K. Leung, Nokia K. Svanbro, Ericsson October 19, 2001 Signaling Compression (SigComp) draft-ietf-rohc-sigcomp-00.txt Status of this memo This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or cite them other than as "work in progress". The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/lid-abstracts.txt The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html This document is a submission of the IETF ROHC WG. Comments should be directed to its mailing list, rohc@cdt.luth.se. Hannu, Liu, Price, Rosenberg, et al. [Page 1] INTERNET-DRAFT SigComp October 19 , 2001 Abstract The Session Initiation Protocol (SIP), along with many other IP protocols used for multimedia communications, such as RTSP, are textual protocols engineered for bandwidth rich links. With the planned usage of these protocols in wireless handsets as part of 2.5G and 3G wireless, the large size of these messages is problematic. This draft provides a robust and efficient message compression scheme, suitable for compression of ASCII based protocols' messages. 0. Document History - October 19, 2001, version 00 First version. The draft describes the current ideas, from people involved in the ROHC WG, of how to perform compression of application signaling messages. TABLE OF CONTENTS 1. Introduction..................................................3 2. Terminology...................................................3 3. High-level description........................................3 4. SigComp Components............................................6 5. Protocol Component............................................6 6. Compression Framework Component...............................12 7. Security considerations.......................................21 8. IANA considerations...........................................21 9. Authors addresses.............................................22 10. Intellectual Property Right Considerations....................22 11. References....................................................22 Hannu, Liu, Price, Rosenberg, et al. [Page 2] INTERNET-DRAFT SigComp October 19 , 2001 1. Introduction The Session Initiation Protocol (SIP) [SIP], along with many other IP protocols used for multimedia communications, such as RTSP [RTSP], are textual protocols engineered for bandwidth rich links. As a result, these messages have not been optimized for message size. Typical SIP messages are from a few hundred bytes to as high as 2000. To date, this has not been a significant problem. With the planned usage of these protocols in wireless handsets as part of 2.5G and 3G wireless, the large size of these messages is problematic. The problem is not bandwidth efficiency (the media stream still consumes the majority of the bandwidth), but rather latency. With low-rate IP connectivity, store-and-forward delays are significant. For CDMA2000, for example, the basic channel will support only 9.6 kbps. At this rate, transmission of each byte requires 0.8ms. A 500 byte SIP message requires nearly half a second to transmit. Taking into account retransmits, and the multiplicity of messages that are required in some flows, call setup and feature invocation are adversely affected. Therefore, we believe there is merit in investigating improvements in message sizes. This document defines SigComp, an efficient and robust scheme for message compression when the transmission path between the compressor and decompressor is unreliable, i.e. prone to errors, losses and misordering. SigComp fulfills the requirements stated in [REQ] 2. Terminology The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC-2119]. Byte buffer The SigComp decompressor maintains a byte buffer containing any previously received text strings that might be useful for future compression. Token A token is an instruction transmitted from the compressor to the decompressor. 3. High-level description SigComp is useful for compression of ASCII-based application signaling protocol messages. Compression and decompression is performed at two points, e-net and e-ms, see Figure 1. The compressor uses the context to compress a message to a SigComp message. The Hannu, Liu, Price, Rosenberg, et al. [Page 3] INTERNET-DRAFT SigComp October 19 , 2001 reverse process is done at the decompressor. The context of SigComp is defined as the information necessary to perform compression and decompression. The correct context to use is identified by the Source and Destination addresses of the IP header. By using this identification method there is no need to add an explicit context identifier in the SigComp message. To improve the compression efficiency, SigComp uses previous messages in the compression/decompression process of later message. To be robust against context inconsistency in this process, SigComp has an internal acknowledgement scheme. +------------------+ +--------------------+ | +--------------+ | | +----------------+ | | | Context | | | | Context | | | +--------------+ | | +----------------+ | | | | | | | | | original | | | | compressed | | | | message | +----------+ | | message | +------------+ | | ------------+>|Compressor|-) - + - - - - - -+>|Decompressor|-)---+-> | +----------+ | | | +------------+ | | | | | | | | decompressed | | | compressed | | | message | +--------------+ | message | +----------------+ | <------------+-| Decompressor |<+- - - - - - +-| Compressor |<+-- | +--------------+ | | +----------------+ | +------------------+ +--------------------+ E-ms E-net Figure 1. SigComp High-level view. Although Figure 1. implies that the context is shared between compressor and decompressor located in the same entity, this must not necessarily be the case. SigComp can be applied in an environment where shared context is not possible or unwanted, with just one change: The context is identified by the IP-source and IP-destination address pair. Thus, if the IP-source and IP-destination addresses would switch, it will generate a new context. However, it is recommended that SigComp is used with shared context because of the increase in compression ratio this gives. 3.1. Context data The Context of SigComp is defined as the information necessary to perform compression and decompression. The context at compressor and the corresponding decompressor must be kept consistent. The context is built up by the parts described in the following subsections. Hannu, Liu, Price, Rosenberg, et al. [Page 4] INTERNET-DRAFT SigComp October 19 , 2001 3.1.1. Static Context Data The context of SigComp is populated with static information to use in the compression/decompression process, i.e. static context data. The static information comes from knowing what protocols are to be compressed. Information such as protocol-specific header field names, Methods, Status codes etc. are typical static context data. It does not change over time, is relevant for all users of SigComp, and can be applied to all sessions. Thus, at least the static context data can be applied when compressing messages. 3.1.2. User specific Context Data To further increase the compression efficiency SigComp has the possibility to pre-populate the context with information useful in the compression process. This type of information is regarded as user specific context data as it is not defined within SigComp. Information about commonly used connections, such as SIP URLs and appliction settings (speech codecs, etc) are typical examples of user specific context data. Note: How pre-population is performed is yet to be determined. 3.1.3. Dynamic Context Data Messages associated with protocols such as SIP tend to have similar characteristics within a session. Therefore SigComp makes use of messages sent from/to a user. To be able to decompress SigComp messages correctly, information in previous messages must not be used in the compression process until the message(s) has been acknowledged. This part of the context is referred to as the dynamic context data part. 3.1.4. Cross session Context The messages also have similar characteristics between sessions. The idea is to take advantage of this with Cross session context. Basically this means that the context is kept between sessions, e.g. between two SIP INVITES. Note: How to utilize Cross Session Context is yet to be determined. 3.1.5. Keeping Context TBW. Hannu, Liu, Price, Rosenberg, et al. [Page 5] INTERNET-DRAFT SigComp October 19 , 2001 4. SigComp Components The SigComp compression scheme is divided into two components, Protocol and Compression framework component. Basically, the protocol component sees to that the contexts are consistent even for a certain amount of message loss and misordering and also make it possible to use previously sent and received message in the compression/decompression of later messages. The compression framework component defines the structure of the part of the context that is used in the compression/decompression process and what information to update the context with. Figure 2 depicts which parts of the SigComp message that relates to which component. +-----------------+ | SigComp Header | * Protocol Component +-----------------+ | | : Compressed : * Compression framework Component | information | +-----------------+ Figure 2. SigComp message. 5. Protocol Component This chapter describes the protocol component of SigComp. The protocol component include functions, such as SigComp message acknowledgement scheme and context verification function. The four main task that the protocol component performs are: 1) Process a message and sending it compressed, as a SigComp message. 2) Receiving a SigComp message and pass it on uncompressed. 3) Acknowledge received SigComp messages. 4) Receiving acknowledgements for SigComp messages. The sections in this chapter describes functions that are needed for the SigComp protocol component to perform its tasks. 5.1. SigComp header To achieve robustness and keep track of sent and received messages, a header is added to the compressed message. The SigComp header consist of the following fields: * Message IDentification (MID) number: The MID number is used to keep track of sent and received messages, and is useful for detecting message loss and/or misordering. Hannu, Liu, Price, Rosenberg, et al. [Page 6] INTERNET-DRAFT SigComp October 19 , 2001 * CRC-verification of messages (CRC-M): This CRC is calculated over the uncompressed message, e.g. the SIP INVITE message, and is used to verify the correctness of decompressed SigComp messages. * Acknowledgement: A SigComp acknowledgement can either be carried within a SigComp message (Piggybacked) or be sent by itself (Standalone). Note: It is assumed that the total length of a SigComp message is provided by the lower layer. Since the length of the header part is self-contained the length of the compressed message can be derived by subtracting the header length from the total SigComp message length. The compressed information length could be zero in the case of a Standalone acknowledgement. Formats are described in the following sub-sections. 5.1.1 Normal message formats These are the normal formats to use. Format 1 should be used when there has been no gap in the received MID numbers up to the generation of this SigComp message. Format 2 should be used to acknowledge received SigComp messages when there has been a gap in the received MID numbers. Format 1: 0 1 2 3 4 5 6 7 +---+---+---+---+---+---+---+---+ | MID | Cumulative ACK| +---+---+---+---+---+---+---+---+ | CRC-M | +---+---+---+---+---+---+---+---+ / Compressed message / / according to section 6 / +---+---+---+---+---+---+---+---+ Format 2: 0 1 2 3 4 5 6 7 +---+---+---+---+---+---+---+---+ | 1 1 1 0 | MID | +---+---+---+---+---+---+---+---+ | CRC-M | +---+---+---+---+---+---+---+---+ | AC | RMID (1) | +---+---+---+---+---+---+---+---+ | RMID (2) | RMID (3) | +---+---+---+---+---+---+---+---+ : : Hannu, Liu, Price, Rosenberg, et al. [Page 7] INTERNET-DRAFT SigComp October 19 , 2001 / / : : +---+---+---+---+---+---+---+---+ | RMID (C-1) | RMID (C) | +---+---+---+---+---+---+---+---+ / Compressed message / / according to section 6 / +---+---+---+---+---+---+---+---+ * MID: "0000" to "1101". The Message IDentification (MID) number is commonly increased with one for each sent message. However, there is the exception when the next MID to be assigned is not "free", see section X. * Cumulative ACK: Acknowledges all SigComp messages with MID number equal to or less than the value of this field. The value of this field must not be set to a value so that non-received SigComp messages are acknowledged. A received acknowledgement with higher value than the maximum MID must be ignored and not be regarded as acknowledgement for SigComp messages. * CRC-M: TBD AC: Number of received SigComp message being acknowledged (RMID), which are included in the List. If AC is an even number the last RMID, RMID(C), is only padding, in order to get octet aligned. RMID: Received MID of a SigComp message, which are being acknowledged. 5.1.2 Extended message formats Format 3: Reserved. 0 1 2 3 4 5 6 7 +---+---+---+---+---+---+---+---+ | 1 1 1 1 | TBD | +---+---+---+---+ + / TBD / : : / / +---+---+---+---+---+---+---+---+ Note: The usage of this format is yet to be determined. 5.2. Acknowledgement procedure There is one basic rule for the acknowledgement procedure: Hannu, Liu, Price, Rosenberg, et al. [Page 8] INTERNET-DRAFT SigComp October 19 , 2001 "All SigComp messages should be acknowledged". Two functions are needed: 1) A mapping function between sending MID numbers and acknowledged MID numbers, called MAPP-Function. 2) A function that keeps track of when a received SigComp MID number is not to be acknowledged anymore, called TRACK-Function. It is up to the implementation to realize the two functions. The acknowledgement procedure is best described using an example, see Figure 3. There is one purpose with the procedure: "Make it possible to use information in previous messages in the compression and decompression process of future messages". e-A e-B | | Step (0) |<----------- MID 1B ----------| | | Step (1) |-------- MID 1A, ACK 1B ---->| | | Step (2) |<--------MID 2B, ACK 1A ------| | | Figure 3. Example of the acknowledgement procedure. Step (0): A SigComp message with Message IDenitifcation number 1 is sent from e-B to e-A, denoted MID 1B in Figure 3. MID 1B must be acknowledged by e-A until e-A is positive that e-B knows that MID 1B is received. Step (1): Entity e-A acknowledges MID 1B with SigComp message MID 1A. A mapping between MID 1A and MID 1B is stored at e-A. Note that MID 1A do not have to carry compressed information, it can be a Standalone Acknowledgement. Step (2): Entity e-B acknowledges MID 1A and stores a mapping between MID 2B and MID 1A. When Entity e-A receives MID 2B it uses the mapping to find out which SigComp message(s) from e-B was acknowledged by MID 1A. The mapping is then removed and MID 1B does not have to be acknowledged anymore. To summarize: When an entity is to generate an acknowledgement it uses the TRACK- Function to find out the MID number it should acknowledge. Upon the reception of a SigComp message the header is scanned to find the acknowledgements and the MAPP-Function is used to find out the corresponding received SigComp messages that do not have to be acknowledged anymore. Hannu, Liu, Price, Rosenberg, et al. [Page 9] INTERNET-DRAFT SigComp October 19 , 2001 5.2.1. Reuse of Message IDentification numbers This section describe when a MID can be reused to avoid misinterpretations in case of wraparound of MID numbers combined with message loss and/or misordering. Basically a MID number MUST not be reused until the SigComp message using that particular MID number has been acknowledged or that the message is regarded as lost. Figure 4 is a continuation of Figure 3 and depicts when it is ok to reuse a MID number that has been acknowledged. e-A e-B | | Step (3) |-------- MID 2A, ACK 2B ---->| | | Figure 4. Continuation of Figure 3. Step (3): When Entity e-B receives an acknowledgement for MID 2B it knows that the mapping for MID 1B is removed at e-A. Therefore it is safe for entity e-B to reuse MID 1B. In the case of MID number reuse when the previous message using that MID number has not been acknowledged. Then the previous message should be regarded as lost, if the entity has received acknowledgments for higher MID numbers than the MID number the entity wishes to reuse. However, as SigComp should stand against moderate misordering according to [REQ], a MID number X should not be regarded as lost until an acknowledgement for message(s) with MID numbers > (X+2) has been received. 5.2.2. Specification of ordering constraints Inconsistency in the dynamic context data can happen if the context is shared and messages, which will update the dynamic context data, are sent concurrently by both entities. The approach to deal with this problem is to specify ordering constraints of all update messages sent by both entities so that the order is total and the same at both entities, at least for the eligible updates. Three rules are defined, local, causality and concurrent rule. With the first two rules and the use of the 3-way handshake, it is possible to ensure consistency during updates of dynamic context data, but there is one case that these do not resolve; dynamic context data updates issued concurrently by both entities. With an update request in the description of the rules below means; Every message that carries information that is used to update the dynamic context data. Hannu, Liu, Price, Rosenberg, et al. [Page 10] INTERNET-DRAFT SigComp October 19 , 2001 * Local Rule: When a dynamic context data update request R' is sent before R" at the same entity, R' is scheduled to perform an update before R" at both entities. A sending entity schedules its own update requests according to their sequence numbers, whereas a receiving entity schedules these requests according to the request sequence numbers, which can be inferred from the sequence numbers of those SigComp messages containing these requests. * Causality rule: When a SigComp message containing a dynamic context data update request R' piggybacks with acknowledgements, R' is scheduled to perform an update after all the updates corresponding to the acknowledged update requests. * Concurrent rule: The problem explained above can be eliminated by requiring the maintenance of a total order relationship for a sequence of all known context update requests at each entity. This requires that one entity is the master and one is the slave. An example showing how the scheme with the total order relationship solves the inconsistent context update problem is shown in Figure 5. Suppose all messages exchanged between Entity X (a master entity) and Entity Y (a slave entity) are employed for subsequent compression/decompression, and all received messages are acknowledged and piggybacked in all subsequent messages sent by the receiver. Whenever concurrent dictionary updates are happened, update requests initiated from the master entity are ordered first. For example, the update requests for Messages 1, 2, and 3 are ordered before those for Messages 101, 102, and 103 respectively in the list of the total order relationship for a sequence of dictionary updates (TOL). DD is a logical representation of the Dynamic Context Data, TSS is the logical representation of sent SigComp messages and TRS is the logical representation storage of received SigComp messages. Since the update request for Message 1 is ordered before that for Message 101 according to the total order relationship, the update request for Message 101 at Entity Y is blocked until that for Message 1 becomes ready to be moved, as indicated by the reception of the acknowledgement piggybacked with Message 3. | | DD = {} | | DD = {} TSS = {1} | | TSS = {101} TRS = {} | | TRS = {} TOL = {1} | | TOL = {} | [1] [101] | |-----------\ /-----------| | \/ | | /\ | |<----------/ \---------->| | | Hannu, Liu, Price, Rosenberg, et al. [Page 11] INTERNET-DRAFT SigComp October 19 , 2001 DD = {} | | DD = {} TSS = {1, 2} | | TSS = {101, 102} TRS = {101} | | TRS = {1} TOL = {1, 101, 2} | | TOL = {1} | [2] [102] | |-----------\ /-----------| | \/ | | /\ | |<----------/ \---------->| | | DD = {1} | | DD = {} TSS = {2, 3} | | TSS = {101 - 103} TRS = {101, 102} | | TRS = {1, 2} TOL = {101, 2, | | TOL = {1, 101, 2} 102, 3} | | | [3] [103] | |-----------\ /-----------| | \/ | | /\ | |<----------/ \---------->| | | DD = {1, 101, 2} | | DD = {1, 101} TSS = {3} | | TSS = {102, 103} TRS = {102, 103} | | TRS = {2, 3} TOL = {102,3,103} | | TOL = {2, 102, 3} | | CD Entity X CD Entity Y Figure 5. An Example of Proposed Scheme with Total Order Relationship for Dynamic Context Data Updates. 6. Compression Framework Component This chapter introduces a simple but flexible dictionary structure for the SigComp compression scheme. The goal with the flexible dictionary structure is to standardize a decompressor capable of interoperating with a wide range of compression algorithms. Consequently this chapter describes the decompressor operation only, i.e. the actions which the decompressor takes upon receiving a certain instruction from the compressor. 6.1. Information stored in the SigComp dictionary An important feature of SigComp is that it offers a standard decompressor which can interoperate with a wide range of compression algorithms. The precise method for compression is left as an Hannu, Liu, Price, Rosenberg, et al. [Page 12] INTERNET-DRAFT SigComp October 19 , 2001 implementation decision, and in fact the standard decompressor can interoperate with any of the following classes of algorithm: * Generic text compressor (for example [DEFLATE] or a similar algorithm) * Protocol-aware compressor offering excellent performance for one signaling protocol * Hybrid compressor with similar performance to [DEFLATE] for generic text strings and superior performance for certain signaling protocols The choice of which instructions to send to the decompressor are left as a local implementation decision at the compressor. The only requirement is that of transparency, i.e. the compressor MUST NOT send instructions which cause the decompressor to incorrectly decompress a given signaling message. Note however that it is perfectly acceptable for the compressor to send tokens which update the dictionary at the decompressor, but which cause no decompressed message to be outputted. Indeed, this is a useful technique for pre-populating the dictionary with well-known text strings. 6.1.1. Structure of SigComp dictionary The SigComp dictionary consists of a simple byte buffer designed to hold the current uncompressed message, the current compressed message, and any other previously received text strings that might be useful for future compression. The size buffer_size of the byte buffer is negotiated by an externally defined mechanism (e.g. by the underlying SigComp protocol used to transport the compressed messages, or alternatively by the mechanism used to negotiate use of SigComp itself). Entries in the byte buffer are referred to as buffer[n] where 0 =< n < buffer_size. As all of the SigComp tokens currently use 2-byte indices into the byte buffer, the maximum size of the buffer is 64K. 6.1.2. Important entries in the byte buffer The first few bytes in the SigComp byte buffer are used to store some important 2-byte integers. These integers are given the following names: Hannu, Liu, Price, Rosenberg, et al. [Page 13] INTERNET-DRAFT SigComp October 19 , 2001 Position in buffer: Name: 0 - 1 first_token 2 - 3 compressed_start 4 - 5 compressed_length 6 - 7 uncompressed_start 8 - 9 uncompressed_length 10 - 11 circular_buffer 12 - 13 result_integer 14 - 15 stack_free 16 - 17 stack[0] 18 - 19 stack[1] 20 - 21 stack[2] : : The MSBs of the integer are always stored before the LSBs. So, for example, the MSBs of first_token are stored in buffer[0] whilst the LSBs are stored in buffer[1]. The use of each integer is described in the following sections of the draft. 6.1.3. Decompressor actions upon receiving a SigComp message When a SigComp context is initialized all entries in the byte buffer are set to 0. Upon receiving a SigComp message, the decompressor strips off the underlying protocol header and then performs the following actions: 1.) The message is copied directly into the byte buffer beginning at the byte specified in compressed_start. The length of the compressed message in bytes (known from the underlying protocol used to transport the compressed message) is then copied into compressed_length. Note that the buffer is circular, so once a byte is copied into buffer[buffer_size - 1], the next byte is copied into buffer[circular_buffer]. The parameter circular_buffer (see Section 3.2) can be set to prevent the first part of the buffer from being overwritten by new compressed messages. Typically this area of the buffer is used to hold important tokens and text strings that should be kept from one compressed message to the next. 2.) Next, the tokens contained within the byte buffer are executed beginning at the byte specified in first_token. The tokens are executed consecutively unless indicated explicitly (for example when the decompressor encounters a SWITCH token). If the byte buffer[buffer_size - 1] is ever reached, the next byte is found in buffer[circular_buffer]. Hannu, Liu, Price, Rosenberg, et al. [Page 14] INTERNET-DRAFT SigComp October 19 , 2001 3.) When the decompressor reaches buffer[0], instead of executing the token contained within buffer[0] it stops token execution and outputs the uncompressed message. The location of the uncompressed message is specified by uncompressed_start and uncompressed_length. As stated before the buffer is circular, so once a byte is copied from buffer[buffer_size - 1], the next byte is copied from buffer[circular_buffer]. Note that the byte buffer is not reset between SigComp messages unless explicitly requested by the underlying protocol. Note also that if uncompressed_length is set to 0 then the decompressor does not output an uncompressed message. This is very useful for populating the byte buffer with well-known text strings, before any actual decompression of signaling messages takes place. 6.1.4. Decompression failure If the compressed messages received by the decompressor are corrupted (either accidentally or maliciously) then one of three possibilities might occur: * A decompressed message is outputted that is incorrect. * A token is encountered that cannot be processed successfully by the decompressor (for example a RETURN token when no CALL token has reviously been encountered). * The decompressor never finishes decompressing a message. To counter the first possibility the underlying protocol SHOULD include a checksum to ensure that each message is decompressed successfully. If the decompressed message fails the checksum then "decompression failure" has occurred. The decompressor does not output an uncompressed message, and ignores any future compressed message except those which explicitly request the byte buffer to e reinitialized. If a token is encountered that cannot be successfully processed then decompression failure occurs automatically. To counter the third possibility, decompression failure SHOULD also occur after a certain number of tokens have been processed for a given compressed message. The maximum number of tokens to process is currently left as an implementation decision (but might in future be negotiated). Hannu, Liu, Price, Rosenberg, et al. [Page 15] INTERNET-DRAFT SigComp October 19 , 2001 6.2. Library of tokens The SigComp decompressor currently understands seven types of token, chosen to support the widest possible range of compression algorithms with the minimum possible overhead. All tokens are stored as a single byte to indicate the token type, followed by 0 or more bytes containing the parameters required by the token. The following token types are currently available: COPY ADD / SUBTRACT LSHIFT / RSHIFT COMPARE SWITCH CALL ... RETURN HUFFMAN Each token is explained in more detail below: 6.2.1. COPY The COPY token instructs the decompressor to copy a string of bytes from one part of the byte buffer to another. A COPY token is stored in the byte buffer as 7 consecutive bytes as described below. A simple mnemonic description of the COPY token is also provided, which can be useful when writing example lists of tokens to be transmitted within a compressed message. Mnemonic description: COPY (position, length, destination) Exact byte description: +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |0 0 0 0 0 0 0 0| Position MSB | Position LSB | Length MSB | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Length LSB |Destination MSB|Destination LSB| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ The meaning of the three parameters is explained below: Position: 2-byte integer indicating the location of the first byte in the string to be copied. Length: 2-byte integer indicating the number of bytes to be Hannu, Liu, Price, Rosenberg, et al. [Page 16] INTERNET-DRAFT SigComp October 19 , 2001 copied. Destination: 2-byte integer indicating the location to which the first byte in the string will be copied. Note that once a byte is copied into buffer[buffer_size - 1], the next byte is copied into buffer[circular_buffer]. 6.2.2. ADD / SUBTRACT The ADD token instructs the decompressor to add the two 2-byte integers at the specified locations in the byte buffer (addition performed modulo 2^16) and to store the result in the location of the first integer. ADD (index_1, index_2) +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |0 0 0 0 0 0 0 1| Index 1 MSB | Index 1 LSB | Index 2 MSB | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ | Index 2 LSB | +-+-+-+-+-+-+-+-+ The SUBTRACT token is the same as the ADD token except that the second integer is subtracted from the first (subtraction performed modulo 2^16). SUBTRACT (index_1, index_2) +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |0 0 0 0 0 0 1 0| Index 1 MSB | Index 1 LSB | Index 2 MSB | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ | Index 2 LSB | +-+-+-+-+-+-+-+-+ 6.2.3. LSHIFT / RSHIFT The LSHIFT token instructs the decompressor to left shift the 2-byte integer at the specified location. As always, the MSBs of the integer are stored before the LSBs. LSHIFT (index) +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |0 0 0 0 0 0 1 1| Index MSB | Index LSB | Hannu, Liu, Price, Rosenberg, et al. [Page 17] INTERNET-DRAFT SigComp October 19 , 2001 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Note that the least significant bit of the integer becomes zero, and the most significant bit is discarded. The RSHIFT token performs a right shift of the 2-byte integer at the specified location. RSHIFT (index) +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |0 0 0 0 0 1 0 0| Index MSB | Index LSB | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 6.2.4. COMPARE The COMPARE token instructs the decompressor to compare the two 2- byte integers at the specified locations in the byte buffer. COMPARE (index_1, index_2) +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |0 0 0 0 0 1 0 1| Index 1 MSB | Index 1 LSB | Index 2 MSB | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ | Index 2 LSB | +-+-+-+-+-+-+-+-+ If the first integer referenced by the COMPARE token is less than the second integer then the decompressor sets result_integer = 0. If both integers are equal then it sets result_integer = 1. If the first integer is greater than the second then the decompressor sets result_integer = 2. 6.2.5. SWITCH The SWITCH token performs a conditional jump based on the contents of result_integer. SWITCH (index_0, index_1, ... , index_n - 1) +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |0 0 0 0 0 1 1 0| Index 0 MSB | Index 0 LSB | Index 1 MSB | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+ | Index 1 LSB | ... +-+-+-+-+-+-+-+-+-+-+-+-+ Hannu, Liu, Price, Rosenberg, et al. [Page 18] INTERNET-DRAFT SigComp October 19 , 2001 When a SWITCH token is encountered, the decompressor reads the integer contained within result_integeer. Suppose that this integer is j. The decompressor then continues token execution at the byte position specified by index j. If result_integer specifies an index which beyond the size of the byte buffer, a bad compressed message has been received and decompression failure occurs. 6.2.6. CALL ... RETURN The CALL and RETURN tokens provide support for compression algorithms with a nested structure. CALL (index) +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |0 0 0 0 0 1 1 1| Index MSB | Index LSB | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ RETURN +-+-+-+-+-+-+-+-+ |0 0 0 0 1 0 0 0| +-+-+-+-+-+-+-+-+ When the decompressor reaches a CALL token, it finds the byte position of the token immediately following the CALL token and copies this integer into stack[stack_free] ready for later retrieval. It then increases stack_free by 1 and continues token execution at the byte position specified in the CALL token. When the decompressor reaches a RETURN token it decreases stack_free by 1, and then continues token execution at the byte position specified in stack[stack_free]. If stack_free ever becomes more than buffer_size - 1 or less than 0 then a bad compressed message has been received and decompression failure occurs (see Section 3.4.). 6.2.7. HUFFMAN The HUFFMAN token maps a shorthand Huffman code onto its uncompressed equivalent. HUFFMAN (position, bit offset, destination, n, length 0, length 1, ... , length n - 1) +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |0 0 0 0 1 0 0 1| Position MSB | Position LSB | Bit Offset | Hannu, Liu, Price, Rosenberg, et al. [Page 19] INTERNET-DRAFT SigComp October 19 , 2001 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |Destination MSB|Destination LSB| MSB of n | LSB of n | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Length 0 | Length 1 | ... | Length n - 1 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ The meaning of the first three parameters is explained below: Position: 2-byte integer indicating the byte location of the Huffman code to be decompressed. Bit Offset: 1-byte integer indicating the bit offset at which the Huffman code begins within the byte specified above. Destination: 2-byte integer indicating the location to which the uncompressed value will be copied. Following the [DEFLATE] convention a bit offset of 0 indicates the least significant bit of a byte, whilst a bit offset of 7 indicates the most significant bit. For example, suppose that an 8-bit Huffman code begins at byte position 0 and bit offset 2. In this case the 8 bits of the Huffman code can be found in the following locations (Bit 0 is the first bit in the Huffman code and Bit 7 is the last bit): MSB LSB MSB LSB +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |5 4 3 2 1 0 | 7 6| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Byte 0 Byte 1 The remaining parameters specify the actual Huffman codes and their uncompressed equivalents. Note that the Huffman codes are downloaded to the decompressor using "Canonical" Huffman as described in Section 3.2.2 of [DEFLATE]. This format is very efficient because it can specify a set of Huffman codes by sending only their lengths. The parameter n specifies the total number of Huffman codes. Following this parameter is the length of each Huffman code in bits, where each code can be between 1 and 255 bits in lengths. The length j specifies the length of the Huffman code which maps onto the uncompressed 2-byte integer j. If length j is set to 0 then there Hannu, Liu, Price, Rosenberg, et al. [Page 20] INTERNET-DRAFT SigComp October 19 , 2001 is no Huffman code mapping onto the integer j, so it cannot be communicated to the decompressor. The actual Huffman codes themselves are assigned as per [DEFLATE], using the following two rules: * All codes of a given bit length have lexicographically consecutive values, in the same order as the symbols they represent; * Shorter codes lexicographically precede longer codes. As an example, suppose that the uncompressed 2-byte integers 0, 1, 2 and 3 have Huffman codes of lengths 2, 1, 3 and 3 bits respectively. Then the actual Huffman codes have the following values: Uncompressed integer Code length (bits) Huffman code 0 2 10 1 1 0 2 3 110 3 3 111 When the decompressor encounters a HUFFMAN token it reads the Huffman code at the specified position and bit offset, and maps it to the corresponding 2-byte uncompressed integer. The decompressor then copies this uncompressed integer to the specified destination in the byte buffer. Finally, the decompressor updates the position and bit offset parameters with the location of the next Huffman code (ready to decompress it at a later stage). If the bit offset does not take a value between 0 and 7 inclusive then a bad compressed message has been received and decompression failure occurs (see Section 6.1.4.). Decompression failure also occurs if a Huffman code is encountered for which no corresponding uncompressed integer has been defined. 7. Security considerations TBW 8. IANA considerations TBW Hannu, Liu, Price, Rosenberg, et al. [Page 21] INTERNET-DRAFT SigComp October 19 , 2001 9. Authors' Addresses Hans Hannu Box 920 Ericsson Erisoft AB SE-971 28 Lulea, Sweden Phone: +46 920 20 21 84 Fax: +46 920 20 20 99 EMail: hans.hannu@ericsson.com Zhigang Liu 2-700 Mobile Networks Laboratory Nokia Research Center 6000 Connection Drive Irving, TX 75039, USA Phone: +1 972 894-5935 Fax: +1 972 894-4589 EMail: zhigang.liu@nokia.com Richard Price Roke Manor Research Ltd Romsey, Hants, SO51 0ZN United Kingdom Phone: +44 1794 833681 Email: richard.price@roke.co.uk more authors... 10. Intellectual Property Right Considerations TBW 11. References [DEFLATE] "DEFLATE Compressed Data Format Specification version 1.3", RFC 1951, P. Deutsch, May 1996 [RTSP] H. Schulzrinne, A. Rao and R. Lanphier, Real Time Streaming Protocol (RTSP), RFC 2326, April 1998. [REQ] H. Hannu (Editor), Signaling Compression Requirements & Assumptions, Internet Draft (work in progress),September 2001. Hannu, Liu, Price, Rosenberg, et al. [Page 22] INTERNET-DRAFT SigComp October 19 , 2001 [SIP] M. Handley, H. Schulzrinne, E. Schooler and J. Rosenberg, SIP: Session Initiation Protocol, RFC 2543, August 2000. This Internet-Draft expires in April 2002. Hannu, Liu, Price, Rosenberg, et al. [Page 23]