Network Working Group Richard Price, Siemens/Roke Manor INTERNET-DRAFT Hans Hannu, Ericsson Expires: September 2002 Carsten Bormann, TZI/Uni Bremen Jan Christoffersson, Ericsson Zhigang Liu, Nokia Jonathan Rosenberg, dynamicsoft March 1, 2002 Signaling Compression (SigComp) Status of this memo This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or cite them other than as "work in progress". The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/lid-abstracts.txt The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html This document is a submission of the IETF ROHC WG. Comments should be directed to its mailing list, rohc@ietf.org. Abstract This document defines SigComp, a solution for compressing messages generated by application protocols such as [SIP] and [RTSP]. The architecture and pre-requisites of SigComp are outlined, along with the format of the SigComp message. Decompression functionality for the SigComp solution is provided by a "Universal Decompressor Virtual Machine" optimized for the task of running decompression algorithms. The UDVM can be configured to understand the output of many well-known compressors such as [DEFLATE]. Price, Hannu, et al. [Page 1] INTERNET-DRAFT SigComp March 1, 2002 Table of contents 1. Introduction..................................................2 2. Terminology...................................................3 3. SigComp Architecture..........................................6 4. SigComp message flow..........................................11 5. SigComp compressor............................................14 6. State handling and announcement...............................16 7. Overview of the UDVM..........................................20 8. Decompressing a SigComp message...............................23 9. UDVM instruction set..........................................26 10. Security considerations.......................................38 11. IANA considerations...........................................40 12. Acknowledgements..............................................41 13. AuthorsĘ addresses............................................41 14. References....................................................42 Appendix A. Document history......................................43 1. Introduction The Session Initiation Protocol [SIP], along with many other application protocols used for multimedia communications such as [RTSP], is a textual protocol engineered for bandwidth rich links. As a result, SIP messages have not been optimized in terms of size. Typical SIP messages range from a few hundred bytes to as high as two thousand. To date, this has not been a significant problem. With the planned usage of these protocols in wireless handsets as part of 2.5G and 3G cellular networks, the large size of these messages is problematic. With low-rate IP connectivity, store-and- forward delays are significant. Taking into account retransmits, and the multiplicity of messages that are required in some flows, call setup and feature invocation are adversely affected. Therefore, we believe there is merit in reducing these message sizes. This document outlines the architecture and pre-requisites of the SigComp solution, the format of the SigComp message, algorithm upload, and the Universal Decompressor Virtual Machine (UDVM) that provides decompression functionality. SigComp is offered to applications as a "shim" layer between the application and the transport. The service provided is that of the underlying transport plus compression. Both connection-oriented and connectionless transports are supported by SigComp. This document focuses on the signaling scenario where an end-terminal communicates with a proxy. However SigComp may be applicable to other scenarios with multiple endpoints compressing and decompressing data. Price, Hannu, et al. [Page 2] INTERNET-DRAFT SigComp March 1, 2002 2. Terminology The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC-2119]. SigComp The overall solution for signaling compression, comprising the compressor, decompressor, dispatchers and state handler. Application For the purpose of this document, an application is a text-based protocol software that: a) sends application data to the compressor dispatcher b) receives data from the decompressor dispatcher c) authenticates the sender of a decompressed message and gives permission for state to be saved in the sender's name Application message An uncompressed message provided to or from the application. Endpoint One instance of an application plus a SigComp layer. Each endpoint is capable of sending and/or receiving SigComp messages. Endpoint identity A unique indicator assigned to each endpoint by the application (for example an URI). The application authenticates the sender of a decompressed message, and provides their endpoint identity to the SigComp state handler. Transport Mechanism for passing data between two endpoints. SigComp is capable of sending messages over a wide range of transports including TCP, UDP and [SCTP]. Message-based transport A transport that carries data as a set of bounded messages. Stream-based transport A transport that carries data as a continuous stream with no message boundaries. Price, Hannu, et al. [Page 3] INTERNET-DRAFT SigComp March 1, 2002 Application-defined parameters Parameters that must be agreed upon by the local and remote endpoints invoking SigComp. Values for the application-defined parameters are typically fixed to meet the requirements of a particular signaling application. SigComp message May contain a compressed application message in the form of UDVM bytecode. In case of a message-based transport such as UDP, a SigComp message corresponds to exactly one (UDP) datagram. For a stream-based transport such as TCP, each SigComp message is separated by a reserved delimiter. Standalone SigComp message A SigComp message that does not include any compressed application data. Certain signaling applications may not allow standalone SigComp messages due to security requirements. Compressor Entity that invokes an encoder, and keeps track of states that can be used for compression. It is responsible for supplying UDVM bytecode to the remote decompressor in order for compressed data to be decompressed. Encoder Encodes data according to a particular compression algorithm. Compressor dispatcher Entity that receives uncompressed application messages, invokes a compressor, and forwards the resulting SigComp messages to a remote endpoint. Decompressor Entity that is responsible for converting a SigComp message into uncompressed data. Decompression functionality is provided by the UDVM. Decompressor dispatcher Entity that receives SigComp messages, invokes a decompressor, and forwards the decompressed application messages to an application. Price, Hannu, et al. [Page 4] INTERNET-DRAFT SigComp March 1, 2002 Virtual machine A machine architecture designed to be implemented in software (although silicon implementations are of course possible). Universal Decompressor Virtual Machine (UDVM) The virtual machine described in this document. The UDVM is used for decompression of SigComp messages. Bytecode Machine code that can be executed by a virtual machine. UDVM bytecode is a combination of UDVM instructions and compressed data. Per-message compression Compression that does not reference data from previous messages. SigComp can decompress a message of this type using only the application-defined parameters and the data in the message itself. Dynamic compression Compression relative to messages sent prior to the current compressed message. SigComp stores and retrieves this data using the state handler. State Data saved for retrieval by later SigComp messages. An item of state typically reflects the contents of the UDVM memory after decompressing a message, but state can also be created by the compressor or by the application. State handler Entity responsible for storing and accessing state information once permission is granted by the application. State identifier Reference used to access an item of state previously created by the compressor, the decompressor or the application. CPU cycles A measure of the amount of "CPU power" required to execute a UDVM instruction (the simplest UDVM instructions require a single CPU cycle). An upper limit is placed on the number of cycles that can be used to decompress each bit in a compressed message. Price, Hannu, et al. [Page 5] INTERNET-DRAFT SigComp March 1, 2002 3. SigComp Architecture In the SigComp architecture compression and decompression is performed at two communicating entities. SigComp is offered to applications as a "shim" layer between the application and the underlying transport, and so these entities are endpoints when viewed from a transport layer perspective. Note however that from the application perspective SigComp is applied on a per-hop basis. Figure 1 shows the layout of a communicating endpoint that implements a SigComp layer. The figure does not mandate any particular implementation, but is shown to the reader for the sake of clarity. The SigComp layer is further decomposed in the following components: - A compressor dispatcher: this is the interface from the application. The compressor dispatcher receives an application message and an identifier for the receiving endpoint. Based on the endpoint identity the compressor dispatcher invokes a particular compressor, which returns a SigComp message that is forwarded to the remote SigComp endpoint. - A decompressor dispatcher: this is the interface towards the application. A SigComp message is received by the decompressor dispatcher and an instance a decompressor is invoked. Once the dispatcher has received the (decompressed) application data it forwards the message to the application. - One or more compressors: a distinct compressor is invoked for each remote endpoint with which the local application wishes to communicate. A compressor receives an (uncompressed) application message from the compressor dispatcher, compresses the message, and returns a SigComp message to the compressor dispatcher. During the compression process the compressor may invoke the state handler to restore previous state or save new state. Each compressor chooses a certain algorithm to encode the data, (e.g. [DEFLATE]). - One or more decompressors: since SigComp can run over an unsecure transport layer, a distinct decompressor must be invoked on a per-message basis. A decompressor receives a SigComp message from the decompressor dispatcher, decompresses the message, and returns the (decompressed) application message to the decompressor dispatcher. During the decompression process, the decompressor may invoke the state handler to restore previous state or save new state. - State handler: this entity contains enough logic to store and retrieve states. State is information that is stored between SigComp messages: this data can be saved either by a compressor, a decompressor or an application. For security purposes the state Price, Hannu, et al. [Page 6] INTERNET-DRAFT SigComp March 1, 2002 handler must always ask the application to grant permission for new states to be saved. State creation and retrieval are further described in Chapter 6. +---------------------------------------------+ | | | Application | | | | | +---------------------------------------------+ | | ^ Message & | Endpoint | | Decompressed endpoint | identity | | message identity | | | | | | +-- -- -- -- --|-- -- -- -- -- -- --|-- -- -- |- -- -- -- -- + | | | | | v v | | +--------------+ +--------------+ +--------------+ | SigComp | | | | | | SigComp message | Compressor | | State | | Decompressor | message <-------| dispatcher | | handler | | dispatcher |<------- | | | | | | | | +--------------+ +--------------+ +--------------+ | ^ ^ ^ ^ ^ ^ ^ ^ | | | | | | | | | | | | | | | | | | | | | | | | | | | | v | | | | | v | | +--------------+ | | | | +--------------+ | | Compressor 1 | | | | | |Decompressor 1| | | |<----+ | | +---->| | | | (Encoder) | | | | (UDVM) | | | | | | | | | +--------------+ | | +--------------+ | | | | | | v | | v | +--------------+ | | +--------------+ | | Compressor 2 | | | |Decompressor 2| | | |<-------+ +------->| | | | (Encoder) | | (UDVM) | | | | | | | +--------------+ +--------------+ | | SigComp layer | +-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + Figure 1: High-level architectural overview of one SigComp endpoint Price, Hannu, et al. [Page 7] INTERNET-DRAFT SigComp March 1, 2002 Note that it is possible for SigComp to decompress messages from multiple endpoints at different physical locations in a network, as the architecture is designed to prevent data from one endpoint interfering with data from a different endpoint. A consequence of this design choice is that it is difficult for a malicious user to disrupt SigComp operation by inserting false compressed messages on the transport layer. Each decompressor in the architecture of Figure 1 is an instance of the Universal Decompressor Virtual Machine (UDVM). Figure 2 gives a more detailed view of a UDVM, including all of the interfaces between the UDVM and its environment. +----------------+ +----------------+ | | Request compressed data | | | |-------------------------------->| | | |<--------------------------------| | | | Provide compressed data | | | | | Dispatcher | | | | | | | Output decompressed data | | | |-------------------------------->| | | | | | | | +----------------+ | UDVM | | | +----------------+ | | Request state information | | | |-------------------------------->| | | |<--------------------------------| | | | Provide state information | | | | | State | | | | Handler | | | Make state creation request | | | |-------------------------------->| | | | Forward announcement | | | | | | +----------------+ +----------------+ Figure 2: Interfaces between the UDVM and its environment Note that for simplicity, the UDVM indicates when it requires additional compressed data or state information using an explicit instruction. It then pauses and waits for the information to be supplied before continuing with the next instruction. This prevents the arrival of more data from interfering with the operation of the UDVM (e.g. by accidentally overwriting UDVM memory that is currently in use). Price, Hannu, et al. [Page 8] INTERNET-DRAFT SigComp March 1, 2002 3.1. Requirements on application From an application perspective the SigComp layer appears as a new transport, with similar behavior to the original transport used to carry uncompressed data (for example SigComp/UDP behaves similarly to native UDP). If the application wishes to mix SigComp messages with other types of data (e.g. uncompressed data, or SigComp data for a different application) on the same transport then the transport must distinguish between the different types of data. This means that a new port will need to be reserved or discovered for the SigComp messages destined for a particular application. For example SIP uses port 5060 for TCP and port 5061 for TLS/TCP, so it could similarly reserve another port for SigComp/TCP. In the interests of security, a new interface is required to the signaling application in order to leverage the authentication functions built into the application itself. When the application receives a decompressed message it determines the identity of the sending endpoint and supplies this information to the state handler. 3.2. Application-defined parameters When an application invokes SigComp, a number of parameters are provided by the application to control the maximum size of compressed messages, the UDVM memory size etc. The local and remote applications that wish to communicate MUST initially agree on a common set of values for these parameters. Note that the majority of application-defined parameters are set to fixed values for a particular signaling application. However, endpoints implementing SigComp will typically have a wide range of capabilities; each offering a different amount of working memory, processing power and so on. In order to support this wide variation in endpoint capabilities, SigComp includes a mechanism for modifying the following application-defined parameters on the fly: UDVM_version UDVM_memory_size cycles_per_bit cycles_per_message Initial state The SigComp announcement mechanism is described further in Section 6.3. The advantage of building the announcement mechanism into SigComp is that it avoids the need for any form of negotiation to be performed by the application itself. Instead, it is sufficient to initialize Price, Hannu, et al. [Page 9] INTERNET-DRAFT SigComp March 1, 2002 all of the application-defined parameters to fixed values and modify them later using SigComp itself. Each application-defined parameter is described below. Note that unless otherwise indicated, all of the parameters can be stored as 2-byte integers. UDVM_version The UDVM_version parameter specifies the level of functionality available at the UDVM. The basic version of the UDVM (Version 0) is defined in this document. maximum_expansion_size The maximum_expansion_size parameter prevents the generation of excessively large SigComp messages. If set to 0 then the parameter is ignored by SigComp; for any other value then if an uncompressed message is k bytes long, the corresponding SigComp message must be no larger than (k + maximum_expansion_size). Note that any value other than 0 bans the creation of standalone SigComp messages (i.e. messages that do not contain a compressed application message). maximum_compressed_size The maximum_compressed_size parameter limits the size of one compressed message. SigComp rejects any message larger than the specified value. maximum_uncompressed_size The maximum_uncompressed_size parameter limits the size of one uncompressed message. SigComp rejects any message larger than the specified value. minimum_hash_size The minimum_hash_size parameter specifies the minimum size of the state identifier when creating new state information. This value needs to be sufficiently large to prevent malicious users from guessing a state identifier by brute force. UDVM_memory_size The UDVM_memory_size parameter specifies the total number of bytes in the UDVM memory. cycles_per_bit The cycles_per_bit parameter specifies the number of "CPU cycles" Price, Hannu, et al. [Page 10] INTERNET-DRAFT SigComp March 1, 2002 that can be used to decompress a single bit of data. One CPU cycle typically corresponds to a single UDVM instruction, although some of the high-level instructions may require additional cycles. cycles_per_message The cycles_per_message parameter specifies the number of additional CPU cycles made available at the start of a compressed message. These cycles can be useful when decompressing algorithms that upload additional data on a per-message basis, for example a new set of Huffman codes as with [DEFLATE]. The total maximum number of "CPU cycles" available for each compressed message is specified by the following formula: maximum_cycles = message_size * cycles_per_bit + cycles_per_message maximum_state_size The maximum_state_size parameter specifies the maximum amount of state information that can be saved by a local endpoint, for each remote endpoint with which it communicates. Note that the amount of state information is expressed as a multiple of the parameter UDVM_memory_size, because an item of state generally reflects the contents of the UDVM memory. Initial state The application can store useful information in the form of state. This predefined state is used to offer a range of well-known decompression algorithms to the compressor, which can choose to avoid uploading bytecode for a new algorithm if it supports one of the well-known algorithms. Each item of initial state can be made mandatory for every instance of the application, or it can be made optional (in which case support for the relevant state will need to be advertised before the state can be used). 4. SigComp message flow This chapter describes the SigComp message flow and the operation of the compressor and decompressor dispatcher. 4.1. Message exchange The local SigComp layer may send compressed data to a remote SigComp layer, and the local SigComp layer may also receive compressed data. Note however that compression in one direction does not necessarily imply compression in the reverse direction. Furthermore, even in the case that there are two unidirectional compressed flows between two Price, Hannu, et al. [Page 11] INTERNET-DRAFT SigComp March 1, 2002 SigComp layers, there is no need to use the same compression algorithm at both compressors. 4.2. SigComp message format In every SigComp message the first few bytes are interpreted as a state identifier that accesses some previously stored state information. This state information includes all of the data needed to decompress the SigComp message: including the decompression algorithm that will be applied to the remainder of the message, as well as any additional information that is required (e.g. one or more previously received messages if dynamic compression is in use). The format of the basic SigComp message is given in Figure 4: 0 1 2 3 4 5 6 7 +---+---+---+---+---+---+---+---+ | 1 1 1 1 1 | length | +---+---+---+---+---+---+---+---+ | | : state_identifier (n-bytes) : | | +---+---+---+---+---+---+---+---+ | | : Remaining SigComp message : | | +---+---+---+---+---+---+---+---+ Figure 4: Basic SigComp message The length field is a 3-bit value (MSBs before LSBs) that indicates the length of the state identifier. The actual size n of the state identifier is calculated as follows: n = minimum_hash_size + length - 1 The state identifier is then extracted from the SigComp message and then executed as defined by the STATE-EXECUTE instruction of Chapter 9. If the length value is set to 0 then no state is accessed; instead the entire SigComp message is copied into the UDVM memory beginning at Address 6, and then executed starting from Address 6. All other addresses in the UDVM memory are initialized to 0. Decompression failure occurs if the SigComp message is too short to contain the expected state identifier, or if the requested state does not exist. See Section 8.2 for further details. Price, Hannu, et al. [Page 12] INTERNET-DRAFT SigComp March 1, 2002 4.3. Interfaces to and from the compressor dispatcher When the application provides a message to be compressed, it MUST also provide an "endpoint identity" that distinguishes the endpoint from other endpoints. The exact format of the endpoint identity is unimportant, provided that distinct endpoints have distinct endpoint identities. The SigComp layer contains one compressor for each remote endpoint with which the local application is communicating; the dispatcher forwards each new application message to the appropriate compressor (invoking a new compressor if a new endpoint identity is encountered). Note that the application MUST indicate to the compressor dispatcher when it no longer wishes to communicate with a particular endpoint, so that the resources taken by the corresponding compressor can be reclaimed. 4.4. Interfaces to and from the decompressor dispatcher To ensure that SigComp can run over an unsecure transport layer, the decompressor dispatcher invokes a new decompressor for each new SigComp message. Resources for the decompressor are released as soon as the message is decompressed. Upon the arrival of a SigComp message the decompressor dispatcher invokes an instance of the UDVM and loads it with the indicated state as per Section 4.2. The message is then decompressed by the UDVM, returned to the decompressor dispatcher, and passed on to the receiving application. Note that when the UDVM is invoked it does not receive any compressed data by default, but instead requests new data explicitly using a specific instruction. Therefore, the dispatcher is responsible for buffering each SigComp message and passing the data to the UDVM when it is requested. If the UDVM requests additional compressed data that is not yet available then it pauses and waits until enough data has been received by the dispatcher. Uncompressed data is also outputted by the UDVM using a specific instruction. Note that the UDVM has no awareness of whether the underlying transport is message-based or stream-based, and so it always outputs uncompressed data as a stream. It is the responsibility of the dispatcher to provide the uncompressed message to the application in the expected form (i.e. as a stream or as a set of distinct, bounded messages). Price, Hannu, et al. [Page 13] INTERNET-DRAFT SigComp March 1, 2002 For a stream-based transport, the dispatcher delimits messages by parsing the compressed data stream for instances of 0xFF and taking the following actions: Occurs in data stream: Meaning: 0xFF 00 one 0xFF byte in the data stream 0xFF 01 same, but the next byte is quoted (could be another 0xFF) : : 0xFF 7F same, but the next 127 bytes are quoted 0xFF 80 to 0xFF FE reserved 0xFF FF message boundary The reserved characters are useful for byte stuffing (if a compression algorithm generates compressed data containing the character 0xFF then it should be replaced by the character 0xFF00 to avoid accidentally inserting a message delimiter into the compressed data stream). 5. SigComp compressor An important feature of SigComp is that if two endpoints cannot agree on a common algorithm with which to send and receive data, it is possible for the compressor to upload bytecode for its own choice of algorithm to the decompressor. In particular this means that it is not necessary to force all compressors to use the same default algorithm; instead each implementer has the freedom to pick one of the predefined algorithms or to upload their own if needed. The overall requirement placed on the compressor is that of transparency, i.e. the compressor MUST NOT send bytecode which cause the UDVM to incorrectly decompress a given message. The following more specific requirements are also placed on the compressor (they can be considered particular instances of the transparency requirement): * It is RECOMMENDED that the compressor supply a CRC over the uncompressed message to ensure that successful decompression has occurred. A UDVM instruction is provided to verify this CRC. * If the transport is message-based then the compressor MUST preserve the boundaries between messages. * If the transport is stream-based but the application defines its own internal message boundaries, then the compressor SHOULD preserve the boundaries between messages by using the "end-of- message" character 0xFFFF reserved by SigComp. Price, Hannu, et al. [Page 14] INTERNET-DRAFT SigComp March 1, 2002 * The compressor MUST NOT exceed the maximum_compressed_size and MUST ensure that the message can be decompressed using no more than the resources available at the remote decompressor. The reason for preserving the message boundaries over a stream-based transport is that damage to one compressed message does not affect the decompression of subsequent messages. Moreover, the application typically vetoes state creation requests on a per-message basis. 5.1. Supplying bytecode to the UDVM A compressor MUST be certain that compressed data can be decompressed before the data is to be sent, i.e. the UDVM instructions for decompression MUST be available at the remote decompressor. Several options exist for ensuring that this bytecode is available: 1. Each SigComp message sent from the compressor contains the necessary UDVM instructions for decompression. 2. By setting up a reliable connection, such as a TCP connection, between a compressor and its remote decompressor the UDVM instructions can be transferred and saved as state. 3. If there are predefined UDVM codes for well-known algorithms, a compressor only needs to send the state identifier of that UDVM decompression algorithm code to its remote decompressor. The decompressor can then populate the UDVM locally. In order to save delay for "time-critical" sessions, the UDVM instructions should be uploaded prior to any initiation of "time- critical" sessions. 5.2. Compression failure The compressor SHOULD make every effort to successfully compress an application message, but in certain cases this might not be possible (particularly if a low maximum_compressed_size has been set by the application). In this case a "compression failure" is called. Reasons for compression failure include the following: * A compressed or uncompressed message exceeds the maximum size defined by the application. * The maximum_compressed_size is exceeded for a certain message. * Insufficient resources are available at the compressor or at the remote decompressor. If a compression failure occurs when compressing a message then the compressor informs the dispatcher and takes no further action. The Price, Hannu, et al. [Page 15] INTERNET-DRAFT SigComp March 1, 2002 dispatcher MUST report this failure to the application. The application may then try other methods to deliver the message. 6. State handling and state announcement This chapter defines the behavior of the SigComp state handler. The function of the state handler is to retain information between successive SigComp messages; it is the only SigComp entity that is capable of this function, and so it is of particular importance from a security perspective. 6.1. Storing and retrieving state To provide security against the malicious insertion or modification of SigComp messages, the UDVM memory is reset after decompressing each message. This ensures that damaged SigComp messages do not prevent the successful decompression of subsequent valid messages. Note however that the overall compression ratio is often significantly higher if messages can be compressed relative to the information stored in previous messages. For this reason it is possible to create "state" information for access when a later message is being decompressed. Both the creation and access of state are designed to be secure against malicious tampering with the compressed data. State can only be created when a complete message has been successfully decompressed, and the state handler MUST NOT save state without permission from the application. Upon receiving a decompressed message, the application may supply the state handler with the identity of the sending endpoint. Supplying this identity grants permission for the state handler to do the following: * An item of state can be saved using the memory reserved for the specified endpoint. * Announcement information can be taken into account when sending SigComp messages to the specified endpoint. This is especially useful if the application has an authentication mechanism that can be applied to determine whether the decompressed data is legitimate. Also note that state is not deleted when it is accessed. So even if a malicious user manages to access state information, subsequent messages compressed relative to this state can still be successfully decompressed. Instead, the state handler is responsible for deleting Price, Hannu, et al. [Page 16] INTERNET-DRAFT SigComp March 1, 2002 state information once it determines that the state will no longer be needed. Each item of state stores the following information: Name: Type of data: state_identifier 16-byte value state start 2-byte value state_instruction 2-byte value state length 2-byte value state_value String of bytes The state_identifier must be supplied to retrieve an item of state from the state handler. State can be accessed using the UDVM instructions STATE-REFERENCE and STATE-EXECUTE, and can be created using the END-MESSAGE instruction. The state_value is a byte string that contains the actual value that is copied from/to the UDVM memory. The state_length specifies the number of bytes contained within state_value, and state_start gives the UDVM memory address to which the state_value is copied when it is accessed. Finally, state_instruction specifies the memory address of the next UDVM instruction to execute when state is accessed. The kind of information which is included in the state_value is up to a particular compressor and the uploaded instructions in the remote UDVM. However a compressor MUST NOT use a state that is not known to be established at the remote decompressor. 6.2. Saving and deleting states The state handler for each endpoint is expected to offer memory to store UDVM-created state. Every remote endpoint that wishes to communicate with the local endpoint expects to be able to store a fixed amount of state; the number of bytes that it can store is given by the formula UDVM_memory_size * maximum_state_size. Note that each item of state costs (state_length + 22) bytes to store. The state handler keeps track of which endpoint created each item of state; when a particular endpoint exceeds its allocated memory limit then sufficient items of state created by the same endpoint are deleted (oldest state first) until enough memory is available to accommodate the new state. Price, Hannu, et al. [Page 17] INTERNET-DRAFT SigComp March 1, 2002 The application MUST indicate to the state handler when it no longer wishes to communicate with a particular endpoint, so that the resources taken by the corresponding state can be reclaimed. 6.3. Announcement The announcement information is used to modify the value of certain application-defined parameters. Since these parameter values are saved between SigComp messages, they are considered to be part of the overall state and hence are supplied from the UDVM to the state handler. The following list of parameters is passed to the state handler using the appropriate UDVM instruction (namely the END-MESSAGE instruction): +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | length | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | UDVM_version | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | UDVM_memory_size | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | cycles_per_bit | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | cycles_per_message | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | n | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | id_length 1 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | : id_value_1 : | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ : : : : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | id_length n | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | : id_value_n : | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | : reserved : | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 5: Announcement information Price, Hannu, et al. [Page 18] INTERNET-DRAFT SigComp March 1, 2002 If the application does not return a valid endpoint identifier then the announcement information is automatically discarded by the state handler. Otherwise it is passed to the compressor responsible for sending messages to the given endpoint. The reserved field allows for additional items of data to be added to the announcement information in future. Note that the length field specifies the total length of the announcement information including the reserved field. As usual, MSBs are stored preceding LSBs. The remaining items of data are explained in greater detail below: 6.3.1. UDVM version The next 2 bytes of the announcement information specify whether only the basic version of the UDVM is available, or whether an upgraded version of the UDVM is available offering additional instructions etc. The basic version of the UDVM is Version 0, which is the version described in this document. Upgraded versions MUST be backwards- compatible with the basic version in the following sense: * If some UDVM bytecode reaches the END-MESSAGE or DECOMPRESSION- FAILURE instructions when running on Version 0 of the UDVM, then the upgraded version MUST run the bytecode in an identical manner. This condition ensures that all bytecode that is valid for Version 0 of the UDVM will continue to be valid for upgraded versions of the UDVM. However, bytecode that is invalid on Version 0 of the UDVM (i.e. bytecode that produces a decompression failure that is not manually triggered) may become valid on upgraded versions. The simplest way to upgrade the UDVM in a backwards-compatible manner is to add additional UDVM instructions, as this will not affect the operation of existing UDVM bytecode. 6.3.2. Memory size and CPU cycles The next 6 bytes of data specify new values for the application- defined parameters UDVM_memory_size, cycles_per_bit and cycles_per_message. Note that this data can only be used to increase the amount of resources available at the remote UDVM. If the data specifies a parameter value that is smaller than the value already possessed by the state handler, the parameter keeps its original value (i.e. the announcement data for this parameter is simply ignored). Price, Hannu, et al. [Page 19] INTERNET-DRAFT SigComp March 1, 2002 In particular, only allowing the parameter values to increase means that the announcement mechanism is robust against message loss or reordering. The parameters can only be restored to their original values if reset or renegotiated by the application. 6.3.3. State identifiers The list of state identifiers indicates that the sending endpoint supports one or more optional mechanisms (including well-known decompression algorithms, dictionaries of common SIP phrases, feedback mechanisms etc.). The integer n specifies the number of state identifiers to follow. The field id_length_j specifies the length in bytes of id_value_j, where acceptable values for id_length_j range from 1 to 16 inclusive. If a value outside this range is received then the subsequent state identifiers are ignored by the state handler. Each id_value_j indicates support for one optional mechanism at the sending endpoint. The optional mechanisms themselves, and their corresponding state identifiers, are beyond the scope of this document. 7. Overview of the UDVM Decompression functionality for SigComp is provided by a "Universal Decompressor Virtual Machine" (UDVM). The UDVM is a virtual machine much like the Java Virtual Machine but with a key difference: it is designed solely for the purpose of running decompression algorithms. The motivation for creating the UDVM is to provide unlimited flexibility when choosing how to compress a given item of data. Rather than picking one of a small number of pre-negotiated compression algorithms, the implementer has the freedom to select an algorithm of their choice. The compressed data is then combined with a set of UDVM instructions that allow the original data to be extracted, and the result is outputted as UDVM bytecode. Since the UDVM is optimized specifically for running decompression algorithms, the code size of a typical algorithm is small (often sub 100 bytes). Moreover the UDVM approach does not add significant extra processing or memory requirements compared to running a fixed pre- programmed decompression algorithm. This chapter describes some basic features of the UDVM, including the well-known variables and instruction operands. Price, Hannu, et al. [Page 20] INTERNET-DRAFT SigComp March 1, 2002 Recall that the amount of memory available to the UDVM is specified by the application-defined parameter UDVM_memory_size. Any attempt to read memory addresses beyond the overall memory size MUST cause a decompression failure (see Section 8.2). 7.1. Well-known variables The first few variables in the UDVM memory have special tasks, for example specifying the location of the stack used by the CALL and RETURN instructions. Each of these well-known variables is a 2-byte integer. The following list gives the name of each well-known variable and the memory address at which the variable can be found: Name: Starting memory address: byte_copy_left 0 byte_copy_right 2 stack_location 4 The MSBs of each variable are always stored before the LSBs. So, for example, the MSBs of stack_location are stored at Address 4 whilst the LSBs are stored at Address 5. The use of each well-known variable is described in the following sections of the document. 7.2. Instruction operands Each of the UDVM instructions is followed by 0 or more bytes containing the operands required by the instruction. To reduce the code size of a typical UDVM program, each operand for a UDVM instruction is compressed using variable-length encoding. The aim is to store more common operand values using fewer bits than rarely occurring values. Three different types of operand are available: the literal, the reference and the multitype. The operand types that follow each UDVM instruction are specified in Chapter 9. The UDVM bytecode for each operand type is illustrated in Figure 7 to Figure 9, together with the integer values represented by the bytecode. Note that the MSBs in the bytecode are illustrated as preceding the LSBs. Also, any string of bits marked with k consecutive "n"s is to be interpreted as an integer N from 0 to 2^k - 1 inclusive (with the MSBs of n illustrated as preceding the LSBs). Price, Hannu, et al. [Page 21] INTERNET-DRAFT SigComp March 1, 2002 The decoded integer value of the bytecode can be interpreted in two ways. In some cases it is taken to be the actual value of the operand. In other cases it is taken to be a memory address at which the 2-byte operand value can be found (MSBs found at the specified address, LSBs found at the following address). The latter case is denoted by memory[X] where X is the address and memory[X] is the 2- byte value starting at Address X. The simplest operand type is the literal (#), which encodes a constant integer from 0 to 65535 inclusive. A literal operand may require between 1 and 3 bytes depending on its value. Bytecode: Operand value: Range: 0nnnnnnn N 0 - 127 10nnnnnn nnnnnnnn N 0 - 16383 11000000 nnnnnnnn nnnnnnnn N 0 - 65535 Figure 7: Bytecode for a literal (#) operand The second operand type is the reference ($), which is always used to access a 2-byte value located elsewhere in the UDVM memory. The bytecode for a reference operand is decoded to be a constant integer from 0 to 65535 inclusive, which is interpreted as the memory address containing the actual value of the operand. Note that reference operands can always take values from 0 to 65535 inclusive, as they reference 2-byte values. Bytecode: Operand value: Range: 0nnnnnnn memory[2 * N] 0 - 65535 10nnnnnn nnnnnnnn memory[2 * N] 0 - 65535 11000000 nnnnnnnn nnnnnnnn memory[N] 0 - 65535 Figure 8: Bytecode for a reference ($) operand The third kind of operand is the multitype (%), which can be used to encode both actual values and memory addresses. The multitype operand also offers efficient encoding for small integer values (both positive and negative) and for powers of 2. Bytecode: Operand value: Range: 00nnnnnn N 0 - 63 01nnnnnn memory[2 * N] 0 - 65535 1000011n 2 ^ (N + 6) 64 , 128 10001nnn 2 ^ (N + 8) 256 , ... , 32768 111nnnnn N + 65504 65504 - 65535 1001nnnn nnnnnnnn N + 61440 61440 - 65535 101nnnnn nnnnnnnn N 0 - 8191 Price, Hannu, et al. [Page 22] INTERNET-DRAFT SigComp March 1, 2002 110nnnnn nnnnnnnn memory[N] 0 - 65535 10000000 nnnnnnnn nnnnnnnn N 0 - 65535 10000001 nnnnnnnn nnnnnnnn memory[N] 0 - 65535 Figure 9: Bytecode for a multitype (%) operand 7.3. Byte copying A number of UDVM instructions require a string of bytes to be copied to and from areas of the UDVM memory. This section defines how the byte copying operation should be performed. In general, the string of bytes is copied in ascending order of memory address. So if a byte is copied from/to Address n then the next byte is copied from/to Address n + 1. As usual, if a byte is read from an address beyond the overall memory size then decompression failure occurs. Note however that if a byte is copied from/to the memory address specified in byte_copy_right, the byte copy operation continues by copying the next byte from/to the memory address specified in byte_copy_left. This is useful for setting up a "circular buffer" within the UDVM memory. Note that the string of bytes is copied on a purely byte-by-byte basis. In particular, some of the later bytes to be copied may themselves have been written into the UDVM memory by the byte copying operation currently being performed. Equally, it is possible for a byte copying operation to overwrite the instruction that called the byte copy. If this occurs then the byte copying operation MUST be completed as if the original instruction were still in place in the UDVM memory (this also applies if byte_copy_left or byte_copy_right are overwritten). 8. Decompressing a SigComp message This chapter lists the steps involved in the decompression of a single SigComp message. 8.1. Invoking the UDVM Whenever the dispatcher receives a message to be decompressed, it invokes a new instance of the UDVM. The UDVM_memory_size is initialized using the corresponding application-defined parameter. The following steps are then taken: 1.) The number of remaining CPU cycles is set equal to the application-defined parameter cycles_per_message. Price, Hannu, et al. [Page 23] INTERNET-DRAFT SigComp March 1, 2002 Notes: The amount of compressed data available to the UDVM is exactly one compressed message. If the transport is stream-based then SigComp uses the reserved byte string 0xFFFF to delimit the compressed messages: the dispatcher takes the data between a pair of neighboring reserved byte strings to be a single compressed message. The reserved byte string itself is not considered to be part of the compressed message. The compressed data is not provided to the UDVM by default. Instead, the UDVM requests compressed data using the INPUT instructions (useful when running over a stream-based transport since there is no need to wait for the entire compressed message before decompression can begin). The dispatcher MUST NOT make more than one compressed message available to a given instance of the UDVM. In particular, the dispatcher MUST NOT concatenate two messages to form a single compressed message. This is because compressed messages are typically padded with trailing zero bits so that they are a whole number of bytes long. Concatenating two messages would cause these padding bits to be incorrectly interpreted as compressed data. 2.) Next, the instructions contained within the UDVM memory are executed beginning at the address specified by the state as per Section 4.2. Notes: The instructions are executed consecutively unless otherwise indicated (for example when the UDVM encounters a JUMP instruction). If the next instruction to be executed lies outside the available memory then decompression failure occurs (see Section 8.2). 3.) Each time an instruction is executed the number of available CPU cycles is decreased by the amount specified in Chapter 9. Additionally, if the UDVM requests n bits of compressed data (using one of the INPUT instructions) then the number of available CPU cycles is increased by n * cycles_per_bit. Notes: This means that the total number of CPU cycles available for processing a compressed message is given by the formula: maximum_cycles = cycles_per_message + message_size * cycles_per_bit The reason that this total is not allocated to the UDVM when it is invoked is that the UDVM can begin to decompress a message that has Price, Hannu, et al. [Page 24] INTERNET-DRAFT SigComp March 1, 2002 only been partially received. So the total message size may not be known when the UDVM is initialized. 4.) The UDVM stops executing instructions when it encounters an END-MESSAGE instruction or if decompression failure occurs. Notes: The UDVM passes uncompressed data to the dispatcher using the OUTPUT instruction. The OUTPUT instruction can be used to output a partially decompressed message; it is a dispatcher decision whether to use the data immediately or whether to buffer and wait until the entire message has been decompressed. The UDVM passes state creation requests to the state handler using the END-MESSAGE instruction. This means that it is only possible to make a state creation request once the message has been decompressed, which is necessary since the application typically determines the validity of these requests based on the contents of the decompressed message. 8.2. Decompression failure If a compressed message given to the UDVM is corrupted (either accidentally or maliciously) then the UDVM may terminate with a decompression failure. Reasons for decompression failure include the following: * A compressed or uncompressed message exceeds the maximum size defined by the application. * The UDVM exceeds the available CPU cycles for decompressing a message. * The UDVM attempts to read a memory address beyond the overall memory size. * An unknown instruction type is encountered. * An unknown operand type is encountered. * An instruction is encountered that cannot be processed successfully by the UDVM (for example a RETURN instruction when no CALL instruction has previously been encountered). * The UDVM attempts to access non-existent state. * A manual decompression failure is triggered using the DECOMPRESSION-FAILURE instruction. Price, Hannu, et al. [Page 25] INTERNET-DRAFT SigComp March 1, 2002 If a decompression failure occurs when decompressing a message then the UDVM informs the dispatcher and takes no further action. It is the responsibility of the dispatcher to decide how to cope with the decompression failure. In general a dispatcher SHOULD discard the compressed message and any decompressed data that has been outputted. 9. UDVM instruction set The UDVM currently understands 30 instructions, chosen to support the widest possible range of compression algorithms with the minimum possible overhead. Figure 10 lists the different instructions and the bytecode values used to store the instructions at the UDVM. The cost of each instruction in CPU cycles is also given: Instruction: Bytecode value: Cost in CPU cycles: DECOMPRESSION-FAILURE 0 1 AND 1 1 OR 2 1 NOT 3 1 ADD 4 1 SUBTRACT 5 1 MULTIPLY 6 1 DIVIDE 7 1 SORT-ASCENDING 8 1 + k * ceiling(log2(k)) SORT-DESCENDING 9 1 + k * ceiling(log2(k)) MD5 10 1 + length LOAD 11 1 MULTILOAD 12 1 + n COPY 13 1 + length COPY-LITERAL 14 1 + length COPY-OFFSET 15 1 + length + offset JUMP 16 1 COMPARE 17 1 CALL 18 1 RETURN 19 1 SWITCH 20 1 + n CRC 21 1 + length END-MESSAGE 22 1 + state length OUTPUT 23 1 + output_length NBO 24 1 INPUT-BYTECODE 25 1 + length INPUT-FIXED 26 1 INPUT-HUFFMAN 27 1 + n STATE-REFERENCE 28 1 + state_length STATE-EXECUTE 29 1 + state length Figure 10: UDVM instructions and corresponding bytecode values Price, Hannu, et al. [Page 26] INTERNET-DRAFT SigComp March 1, 2002 Each UDVM instruction costs a minimum of 1 CPU cycle. Certain high- level instructions may cost additional cycles depending on the value of one of the instruction operands. The only exception when calculating the number of CPU cycles is that the STATE-EXECUTE instruction takes (1 + state_length) cycles even though it does not have a state_length operand; instead the value of state length is provided by the state handler as part of the state being accessed. All instructions are stored as a single byte to indicate the instruction type, followed by 0 or more bytes containing the operands required by the instruction. The instruction specifies which of the three operand types of Section 7.2 is used in each case. For example, the ADD instruction is followed by two operands as shown below: ADD ($operand_1, %operand_2) When converted into bytecode the number of bytes required by the ADD instruction depends on the size of each operand value, and whether the second (multitype) operand contains the operand value itself or a memory address where the actual value of the operand can be found. The instruction set available for the UDVM offers a mix of low-level and high-level instructions. The high-level instructions can all be emulated using the low-level instructions provided, but given a choice it is generally preferable to use a single instruction rather than a large number of general-purpose instructions. The resulting bytecode will be more compact (leading to a higher overall compression ratio) and decompression will typically be faster because the implementation of the compression-specific instructions can be optimized for the UDVM. Each instruction is explained in more detail below: 9.1. Mathematical instructions The following instructions provide a number of mathematical operations including bit manipulation, arithmetic and sorting. 9.1.1. Bit manipulation The AND, OR and NOT instructions provide simple bit manipulation on 2-byte words. AND ($operand_1, %operand_2) OR ($operand_1, %operand_2) NOT ($operand_1) After the operation is complete, the value of the first operand is overwritten with the result. Note that since this operand is a Price, Hannu, et al. [Page 27] INTERNET-DRAFT SigComp March 1, 2002 reference, the memory address specified by the operand is always overwritten and not the operand itself. 9.1.2. Arithmetic The ADD, SUBTRACT, MULTIPLY and DIVIDE instructions perform arithmetic on 2-byte words. ADD ($operand_1, %operand_2) SUBTRACT ($operand_1, %operand_2) MULTIPLY ($operand_1, %operand_2) DIVIDE ($operand_1, %operand_2) After the operation is complete, the first operand is overwritten with the result. Note that in all cases the arithmetic operation is performed modulo 2^16. So for example, subtracting 1 from 0 gives the result 65535. For the SUBTRACT instruction the second operand is subtracted from the first. Similarly, for the DIVIDE instruction the first operand is divided by the second operand. Note that if the second operand does not divide exactly into the first operand then the remainder is ignored. 9.1.3. Sorting The SORT-ASCENDING and SORT-DESCENDING instructions sort lists of 2- byte words. SORT-ASCENDING (%start, %n, %k) SORT-DESCENDING (%start, %n, %k) The start operand specifies the starting memory address of the block of data to be sorted. The block of data itself is divided into n lists each containing k words. The SORT-ASCENDING instruction applies a certain permutation to the lists, such that the first list is sorted into ascending order (treating each data word as an integer). The same permutation is applied to all n lists, so lists other than the first will not necessarily be sorted into order. For example, the first list might contain a set of integers to be sorted, whilst the second list might be used to keep track of the integers: Before sorting After sorting List 1 List 2 List 1 List 2 8 1 1 2 Price, Hannu, et al. [Page 28] INTERNET-DRAFT SigComp March 1, 2002 1 2 1 3 1 3 3 4 3 4 8 1 In the case of two words of data with the same value, the original ordering of the list is preserved. The SORT-DESCENDING instruction behaves as above, except that the first list is sorted into descending order. 9.1.4. MD5 The MD5 instruction calculates an MD5 hash over the specified area of UDVM memory. MD5 (%position, %length, %destination) The position and length operands define the string of bytes over which the MD5 hash is calculated. Byte copying rules are enforced as per Section 7.3. The destination operand gives the starting address to which the resulting 16-byte hash will be copied. 9.2. Memory management instructions The following instructions are used to manipulate the UDVM memory. Bytes can be copied from one area of memory to another, and areas of memory can be write-protected to make it easier for UDVM code to be compiled. 9.2.1. LOAD The LOAD instruction sets a 2-byte variable to a certain specified value. The format of a LOAD instruction is as follows: LOAD (%address, %value) The first operand specifies the starting address of the 2-byte variable, whilst the second operand specifies the value to be loaded into this variable. As usual, MSBs are stored before LSBs in the UDVM memory. 9.2.2. MULTILOAD The MULTILOAD instruction sets a contiguous block of 2-byte variables to specified values. MULTILOAD (%address, #n, %value_0, ..., %value_n-1) The first operand specifies the starting address of the contiguous variables, whilst the operands value_0 through to value_n-1 specify Price, Hannu, et al. [Page 29] INTERNET-DRAFT SigComp March 1, 2002 the values to load into these variables (in the same order as they appear in the instruction). 9.2.3. COPY The COPY instruction is used to copy a string of bytes from one part of the UDVM memory to another. COPY (%position, %length, %destination) The position operand specifies the memory address of the first byte in the string to be copied, and the length operand specifies the number of bytes to be copied. The destination operand gives the address to which the first byte in the string will be copied. Note that byte copying is performed as per the rules of Section 7.3. 9.2.4. COPY-LITERAL A modified version of the COPY instruction is given below: COPY-LITERAL (%position, %length, $destination) The COPY-LITERAL instruction behaves as a COPY instruction except that after copying, the destination operand is replaced with the memory address immediately following the address to which the final byte was copied. If the final byte was copied to the memory address specified in byte_copy_right, the destination operand is set to the memory address specified in byte_copy_left. 9.2.5. COPY-OFFSET A further version of the COPY-LITERAL instruction is given below: COPY-OFFSET (%offset, %length, $destination) The COPY-OFFSET instruction behaves as a COPY-LITERAL instruction except that an offset operand is given instead of a position operand. To derive a suitable position operand, starting at the memory address specified by destination, the UDVM counts backwards a total of offset memory addresses. If the memory address specified in byte_copy_left is reached, the next memory address is taken to be byte_copy_right. The COPY-OFFSET instruction then behaves as a COPY-LITERAL instruction, taking the position operand to be the last memory address reached in the above step. Price, Hannu, et al. [Page 30] INTERNET-DRAFT SigComp March 1, 2002 9.3. Program flow instructions The following instructions alter the flow of UDVM code. Each instruction jumps to one of a number of memory addresses based on a certain specified criterion. Note that all of the instructions give the memory addresses in the form of deltas relative to the memory address of the instruction. The actual memory address is calculated as follows: memory_address = (memory_address_of_instruction + delta) modulo 2^16 Note that certain I/O instructions (see Section 9.4) can also alter program flow. 9.3.1. JUMP The JUMP instruction moves program execution to the specified memory address. JUMP (%delta) Note that if the address (specified as a delta from the address of the JUMP instruction) lies beyond the overall UDVM memory size then decompression failure occurs. 9.3.2. COMPARE The COMPARE instruction compares two operands and then jumps to one of three specified memory addresses depending on the result. COMPARE (%operand_1, %operand_2, %delta_1, %delta_2, %delta_3) If operand_1 < operand_2 then the UDVM continues instruction execution at the (relative) memory address specified by delta 1. If operand_1 = operand_2 then it jumps to the address specified by delta_2. If operand_1 > operand_2 then it jumps to the address specified by delta_3. 9.3.3. CALL and RETURN The CALL and RETURN instructions provide support for compression algorithms with a nested structure. CALL (%delta) RETURN The CALL and RETURN instructions make use of a stack of 2-byte variables stored at the memory address specified by the well-known variable stack_location. The stack contains the following variables: Price, Hannu, et al. [Page 31] INTERNET-DRAFT SigComp March 1, 2002 Name: Starting memory address: stack_free stack_location stack[0] stack_location + 2 stack[1] stack_location + 4 stack[2] stack_location + 6 : : The MSBs of these variables are stored before the LSBs in the UDVM memory. When the UDVM reaches a CALL instruction, it finds the memory address of the instruction immediately following the CALL instruction and copies this 2-byte value into stack[stack_free] ready for later retrieval. It then increases stack_free by 1 and continues instruction execution at the (relative) memory address specified by the operand. When the UDVM reaches a RETURN instruction it decreases stack_free by 1, and then continues instruction execution at the byte position stored in stack[stack_free]. If the variable stack_free is ever increased beyond 65535 or decreased below 0 then a bad compressed message has been received and decompression failure occurs (see Section 8.2). Decompression failure also occurs if one of the above instructions is encountered and the value of stack_location is smaller than 6 (this prevents the stack from overwriting the well-known variables). 9.3.4. SWITCH The SWITCH instruction performs a conditional jump based on the value of one of its operands. SWITCH (#n, %j, %delta_0, %delta_1, ... , %delta_n-1) When a SWITCH instruction is encountered the UDVM reads the value of j. It then continues instruction execution at the (relative) address specified by delta j. If j specifies a value of n or more, a bad compressed message has been received and decompression failure occurs. 9.3.5. CRC The CRC instruction verifies a string of bytes using a 2-byte CRC. CRC (%value, %position, %length, %delta) Price, Hannu, et al. [Page 32] INTERNET-DRAFT SigComp March 1, 2002 The actual CRC calculation is performed using the generator polynomial x^16 + x^12 + x^5 + 1, which coincides with the 2-byte Frame Check Sequence (FCS) of [RFC-1662]. The position and length operands define the string of bytes over which the CRC is evaluated. Byte copying rules are enforced as per Section 7.3. Important note: Since a CRC calculation is always performed over a bitstream, for interoperability it is necessary to define the order in which bits are supplied within each individual byte. In this case the MSBs of the byte MUST be supplied to the CRC calculation before the LSBs. The value operand contains the expected integer value of the 2-byte CRC. If the calculated CRC matches the expected value then the UDVM continues at the following instruction. Otherwise the UDVM jumps to the (relative) memory address specified by delta. 9.4. I/O instructions The following instructions allow the UDVM to interface with its environment. Note that in the overall SigComp architecture all of these interfaces pass to the decompressor dispatcher or to the state handler. 9.4.1. END-MESSAGE The END-MESSAGE instruction successfully terminates the UDVM and passes state information to the state handler. END-MESSAGE (%hash_length, %state_start, %state_length, %state_instruction, %announcement_location) Note that the actual uncompressed message is outputted separately using the OUTPUT instruction; this conserves memory at the UDVM because there is no need to buffer an entire uncompressed message before it can be passed to the dispatcher. Note that if the announcement_location operand is set to 0 then no announcement information is provided, otherwise it points to the starting memory address of the announcement information as per Section 6.3. The END-MESSAGE instruction requests the creation of state using the operands state start and state length, which together denote a byte string state_value. Provided that the application gives permission, state_value is byte copied from the UDVM memory (obeying the rules of Section 7.3) and stored together with a 16-byte state identifier that can be used to access the state by a later compressed message. Price, Hannu, et al. [Page 33] INTERNET-DRAFT SigComp March 1, 2002 To provide security against malicious access, the identifier for any item of state created by the UDVM is derived from the [MD5] hash of the state_value to be stored. The state identifier is constructed by taking the 16-byte [MD5] hash and replacing all but the first hash_length most significant bytes with zeroes. Note that if hash_length is 16 then the unmodified [MD5] hash is the state identifier. Decompression failure occurs if hash_length is less than the application-defined parameter minimum_hash_size or greater than 16. If a state identifier already exists (hash collision occurs), the decompressor should check whether the requested state is identical to the established state, and count the state creation request as successful if this is the case. If not then the state creation request is unsuccessful. The existing state MUST NOT be replaced with the requested state to be saved. This is to avoid the situation where a compressed message cannot be decompressed because a needed item of state has been replaced (possibly by a malicious sender). 9.4.2. DECOMPRESSION-FAILURE The DECOMPRESSION-FAILURE instruction triggers a manual decompression failure. This is useful if the UDVM program discovers that it cannot successfully decompress the message (e.g. by using the CRC instruction). This instruction has no operands. 9.4.3. OUTPUT The OUTPUT instruction provides successfully decompressed data to the dispatcher. OUTPUT (%output_start, %output_length) The operands define the starting memory address and length of the byte string to be provided to the dispatcher. Note that the OUTPUT instruction can be used to output a partially decompressed message; each time the instruction is encountered it appends a byte string to the end of the data previously passed to the dispatcher via the OUTPUT instruction. The string of data is byte copied from the UDVM memory obeying the rules of Section 7.3. Decompression failure occurs if the cumulative number of bytes provided to the dispatcher exceeds the application-defined parameter maximum_uncompressed_size. Price, Hannu, et al. [Page 34] INTERNET-DRAFT SigComp March 1, 2002 Since there is technically a difference between outputting a 0-byte decompressed message, and not outputting a decompressed message at all, the OUTPUT instruction needs to distinguish between the two cases. Thus, if the UDVM terminates before encountering an OUTPUT instruction it is considered not to have outputted a decompressed message. If it encounters one or more OUTPUT instructions, each of which provides 0 bytes of data to the dispatcher, then it is considered to have outputted a 0-byte decompressed message. 9.4.4. NBO The NBO instruction modifies the order in which compressed bits are passed to the UDVM. As the INPUT-FIXED and INPUT-HUFFMAN instructions read individual bits from within a byte, to avoid ambiguity it is necessary to define the order in which these bits are read. The default operation is to read the MSBs before the LSBs, but if the NBO instruction is encountered then the LSBs are read before the MSBs. Both cases are illustrated below: MSB LSB MSB LSB MSB LSB MSB LSB +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |0 1 2 3 4 5 6 7|8 9 ... | |7 6 5 4 3 2 1 0| ... 9 8| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Byte 0 Byte 1 Byte 0 Byte 1 Default operation After NBO instruction The NBO instruction can only be used before bitwise compressed data is passed to the UDVM. Therefore, a decompression failure occurs if it is encountered after an INPUT-FIXED or an INPUT-HUFFMAN instruction has been used. 9.4.5. INPUT-BYTECODE The INPUT-BYTECODE instruction requests a certain number of bytes of compressed data from the dispatcher. INPUT-BYTECODE (%length, %destination, %delta) The length operand indicates the requested number of bytes of compressed data, and the destination operand specifies the starting memory address to which they should be copied. Byte copying is performed as per the rules of Section 7.3. If the instruction requests data that lies beyond the end of the compressed message, no data is returned. Instead the UDVM moves Price, Hannu, et al. [Page 35] INTERNET-DRAFT SigComp March 1, 2002 program execution to the memory address specified by the formula (memory_address_of_INPUT-BYTECODE_instruction + delta) modulo 2^16. The INPUT-BYTECODE instruction can only be used before bitwise compressed data is passed to the UDVM. Therefore, a decompression failure occurs if it is encountered after an INPUT-FIXED or an INPUT- HUFFMAN instruction has been used. 9.4.6. INPUT-FIXED The INPUT-FIXED instruction requests a certain number of bits of compressed data from the dispatcher. INPUT-FIXED (%length, %destination, %delta) The length operand indicates the requested number of bits. If this operand does not lie between 1 and 16 inclusive then a decompression failure occurs. The destination operand specifies the memory address to which the compressed data should be copied. Note that the requested bits are interpreted as a 2-byte integer ranging from 0 to 2^length - 1. Under default operation the MSBs of this integer are provided first, but if an NBO instruction has been executed then the LSBs are provided first. If the instruction requests data that lies beyond the end of the compressed message, no data is returned. Instead the UDVM moves program execution to the memory address specified by the formula (memory_address_of_INPUT-FIXED_instruction + delta) modulo 2^16. 9.4.7. INPUT-HUFFMAN The INPUT-HUFFMAN instruction requests a variable number of bits of compressed data from the dispatcher. The instruction initially requests a small number of bits and compares the result against a certain criterion; if the criterion is not met then additional bits are requested until the criterion is achieved. The INPUT-HUFFMAN instruction is followed by three mandatory operands plus n additional sets of operands. Every additional set contains four operands as shown below: INPUT-HUFFMAN (%destination, %delta, #n, %bits_1, %lower_bound_1, %upper_bound_1, %uncompressed_1, ... , %bits_n, %lower_bound_n, %upper_bound_n, %uncompressed_n) Note that if n = 0 then the INPUT-HUFFMAN instruction is ignored by the UDVM. If bits_1 = 0 or (bits_1 + ... + bits_n) > 16 then decompression failure occurs. Price, Hannu, et al. [Page 36] INTERNET-DRAFT SigComp March 1, 2002 In all other cases, the behavior of the INPUT-HUFFMAN instruction is defined below: 1.) Set j = 1. 2.) Request an additional bits_j compressed bits. Interpret the total (bits_1 + ... + bits_j) bits of compressed data requested so far as an integer H, with the first bit to be supplied as the MSB and the last bit to be supplied as the LSB (note that this is always the case, independently of whether the NBO instruction has been used). 3.) If data is requested that lies beyond the end of the compressed message, terminate the INPUT-HUFFMAN instruction and move program execution to the memory address specified by the formula (memory_address_of_INPUT-HUFFMAN_instruction + delta) modulo 2^16. 4.) If (H < lower_bound_j) or (H > upper_bound_j) then set j = j + 1. Then go back to Step 2, unless j > n in which case decompression failure occurs. 5.) Copy (H + uncompressed_j - lower_bound_j) modulo 2^16 to the memory address specified by the destination operand. 9.4.8. STATE-REFERENCE The STATE-REFERENCE instruction retrieves some previously stored state information. STATE-REFERENCE (%id_start, %id_length, %state_start, %state_length, %state_destination) The id_start and id_length operands specify the location of the state identifier used to retrieve the state information. The state identifier is always 16 bytes long; if id_length is less than 16 then the remaining least significant bytes of the identifier are padded with zeroes. Decompression failure occurs if id_length is greater than 16. Decompression failure also occurs if no state information matching the state identifier can be found. Note that when accessing state information that has been previously created by the UDVM, the state identifier is always taken from an [MD5] hash of the state to be retrieved. However this is not necessarily the case for application-defined state as per Section 3.2. The state_start and state_length operands define the starting byte and number of bytes to copy from the state_value contained in the identified item of state. If more state is requested than is actually available then decompression failure occurs. Price, Hannu, et al. [Page 37] INTERNET-DRAFT SigComp March 1, 2002 The state_destination operand contains a UDVM memory address. The requested state is byte copied to this memory address using the rules of Section 7.3. 9.4.9. STATE-EXECUTE The STATE-EXECUTE instruction retrieves and runs some previously stored state information. STATE-EXECUTE (%id_start, %id_length) The id_start and id_length operands function as per the STATE- REFERENCE instruction. STATE-EXECUTE is similar to STATE-REQUEST except that it does not require the amount of state being requested or the proposed destination for the state to be specified explicitly. Instead, it simply puts the state_value back into the UDVM memory using the operands state_start and state_length contained as part of the state information. The entire state_value (all state length bytes of it) is byte copied into the memory address specified by state start. The UDVM then jumps to the (absolute) memory address specified by state_instruction. Note that state start, state length and state_instruction are all stored together with state_value as part of an item of state information. 10. Security considerations 10.1. Security goals The overall security goal of the SigComp architecture is to not create risks that are in addition to those already present in the application protocols. There is no intention for SigComp to enhance the security of the protocols, as it always can be circumvented by not using compression. More specifically, the high-level security goals can be described as: -- do not worsen security of existing application protocol -- do not create any new security issues -- do not hinder deployment of application security Price, Hannu, et al. [Page 38] INTERNET-DRAFT SigComp March 1, 2002 10.2. Security risks and mitigations This subsection identifies the potential security risks associated with the overall SigComp architecture, and details the proposed solution for each risk. ** Confidentiality risks *** Attacking SigComp by snooping into state of other users State can only be accessed using a state identifier, which is a (prefix of a) cryptographic hash of the state being referenced. This implies that the referencing packet already needs knowledge about the state. To enforce this, a reference length of 72 bits is defined. This also minimizes the probability of an accidental state collision. Generally, ways to obtain knowledge about the state identifier (e.g., passive attacks) will also easily provide knowledge about the state referenced, so no new vulnerability results. The application needs to handle state identifiers with the same care it would handle the state itself. ** Integrity risks The SigComp approach assumes that there is appropriate integrity protection below and/or above the SigComp layer. However, the state establishment mechanism provides additional potential to compromise the integrity of the messages (which, however, would most likely be detectable at the application layer). *** Attacking SigComp by faking state or making unauthorized changes to state State cannot be destroyed or changed by a malicious sender -- it can only add new state. Faking state is only possible if the hash allows intentional collision. ** Availability risks (avoid DoS vulnerabilities) *** Use of SigComp as a tool in a DoS attack to another target SigComp cannot easily be used as an amplifier in a reflection attack, as it only generates one decompressed message per incoming compressed message. This packet is then handed to the application; the utility as a reflection amplifier is therefore limited by the utility of the application. However, it must be noted that SigComp can be used to generate larger packets as input to the application than have to be sent from the Price, Hannu, et al. [Page 39] INTERNET-DRAFT SigComp March 1, 2002 malicious sender; this therefore can send smaller packets (at a lower bandwidth) than are delivered to the application. Depending on the reflection characteristics of the application, this can be considered a mild form of amplification. The application MUST limit the number of packets reflected to a potential target -- even if SigComp is used to generate a large amount of information from a small incoming attack packet. *** Attacking SigComp as the DoS target by filling it with state Excessive state can only be installed by a malicious sender (or a set of malicious senders) with the consent of the application. The system consisting of SigComp and application is thus approximately as vulnerable as the application itself, unless it allows the installation of state from a message where it would not have installed state itself. If this is desirable to increase the compression ratio, the effect can be mitigated by adding feedback at the application level that indicates whether the state requested was actually installed -- This allows a system under attack to gracefully degrade by no longer installing compressor state that is not matched by application state. *** Attacking the UDVM by faking state or making unauthorized changes to state (See "Integrity risks" above.) *** Attacking the UDVM by sending it looping code The application sets an upper limit to the number of "CPU cycles" that can be used per compressed message and per input bit in the compressed message. The damage inflicted by sending packets with looping code is therefore limited, although this may still be substantial if a large number of CPU cycles are offered by the UDVM. However, this would be true for any decompressor that can receive packets from anywhere. 11. IANA considerations The SigComp solution currently requires two identifiers to be assigned by IANA: the UDVM_version and the state identifier. Upgraded versions of the UDVM will contain additional instructions to improve the performance of the overall SigComp solution; new UDVM_version parameters will be needed in this case. 12. Acknowledgements Thanks to Price, Hannu, et al. [Page 40] INTERNET-DRAFT SigComp March 1, 2002 Abigail Surtees (abigail.surtees@roke.co.uk) Mark A West (mark.a.west@roke.co.uk) Lawrence Conroy (lwc@roke.co.uk) Christian Schmidt (christian.schmidt@icn.siemens.de) Max Riegel (maximilian.riegel@icn.siemens.de) Lars-Erik Jonsson (lars-erik.jonsson@epl.ericsson.se) Stefan Forsgren (stefan.forsgren@epl.ericsson.se) Krister Svanbro (krister.svanbro@epl.ericsson.se) Miguel Garcia (miguel.a.garcia@ericsson.com) Christopher Clanton (christopher.clanton@nokia.com) Khiem Le (khiem.le@nokia.com) Ka Cheong Leung (kacheong.leung@nokia.com) for valuable input and review. 13. Authors' addresses Richard Price Tel: +44 1794 833681 Email: richard.price@roke.co.uk Roke Manor Research Ltd Romsey, Hants, SO51 0ZN United Kingdom Hans Hannu Tel: +46 920 20 21 84 Email: hans.hannu@epl.ericsson.se Box 920 Ericsson Erisoft AB SE-971 28 Lulea, Sweden Carsten Bormann Tel: +49 421 218 7024 Email: cabo@tzi.org Universitaet Bremen TZI Postfach 330440 D-28334 Bremen, Germany Jan Christoffersson Tel: +46 920 20 28 40 Email: jan.christoffersson@epl.ericsson.se Box 920 Ericsson Erisoft AB SE-971 28 Lulea, Sweden Price, Hannu, et al. [Page 41] INTERNET-DRAFT SigComp March 1, 2002 Zhigang Liu Tel: +1 972 894-5935 Email: zhigang.liu@nokia.com Nokia Research Center 6000 Connection Drive Irving, TX 75039 USA Jonathan Rosenberg Email: jdrosen@dynamicsoft.com dynamicsoft 72 Eagle Rock Avenue First Floor East Hanover, NJ 07936 14. References [SIP] "SIP: Session Initiation Protocol", Handley et al, RFC 2543, Internet Engineering Task Force, March 1999 [RTSP] "Real Time Streaming Protocol (RTSP)", H. Schulzrinne, A. Rao and R. Lanphier, , RFC 2326, April 1998 [HTTP] "HyperText Transfer Protocol, HTTP/1.1", R. Fielding et al.", RFC 2616, June 1999 [SIPsrv] "SIP: Locating SIP Servers", J. Rosenberg, H. Schulzrinne, draft-ietf-sip-srv-04.txt, January 2002, work in progress [DEFLATE] "DEFLATE Compressed Data Format Specification version 1.3", P. Deutsch, RFC 1951, Internet Engineering Task Force, May 1996 [SCTP] "Stream Control Transmission Protocol", Stewart et al, RFC 2960, Internet Engineering Task Force, October 2000 [MD5] "The MD5 Message-Digest Algorithm", R. Rivest, RFC 1321, Internet Engineering Task Force, April 1992 [RFC-1662] "PPP in HDLC-like Framing", Simpson et al, Internet Engineering Task Force, July 1994 [RFC-2026] "The Internet Standards Process - Revision 3", Scott Bradner, Internet Engineering Task Force, October 1996 [RFC-2119] "Key words for use in RFCs to Indicate Requirement Levels", Scott Bradner, Internet Engineering Task Force, March 1997 Price, Hannu, et al. [Page 42] INTERNET-DRAFT SigComp March 1, 2002 Appendix A. Document history - October 19, 2001, version 00 First version. The draft describes the current ideas, from people involved in the ROHC WG, of how to perform compression of application signaling messages. - October 31, 2001, version 01 Second version. Additional section, 5.2.1, which describes when a message identifier can be reused. - November 21, 2001, version 02 Third version. Section 6 has been moved to a separate draft. The third version describes a modular solution, providing flexibility for implementers to decide which functions they want to integrate. - January 28, 2002, version 03 Fourth version. SigComp version 02 is divided into this draft, a UDVM draft and an extended operation mechanisms draft. Compressor/decompressor (UDVM) state approach has been introduced for security reasons. - February 14, 2002, version 04 Fifth version. Describes the complete base SigComp solution including the UDVM. - March 1, 2002, version 05 Sixth version. Comments from several authors and contributors have been taken into account. Announcement mechanism has been updated. This Internet-Draft expires in September 2002. Price, Hannu, et al. [Page 43]