Network Working Group                  Richard Price, Siemens/Roke Manor 
INTERNET-DRAFT                                      Hans Hannu, Ericsson  
Expires: September 2002                  Carsten Bormann, TZI/Uni Bremen 
                                           Jan Christoffersson, Ericsson 
                                                      Zhigang Liu, Nokia 
                                         Jonathan Rosenberg, dynamicsoft 
  
                                                           March 1, 2002 
 
 
                      Signaling Compression (SigComp) 
                     <draft-ietf-rohc-sigcomp-05.txt> 
                                     
    
Status of this memo 
 
   This document is an Internet-Draft and is in full conformance with 
   all provisions of Section 10 of RFC2026. 
    
   Internet-Drafts are working documents of the Internet Engineering 
   Task Force (IETF), its areas, and its working groups. Note that other 
   groups may also distribute working documents as Internet-Drafts. 
    
   Internet-Drafts are draft documents valid for a maximum of six months 
   and may be updated, replaced, or obsoleted by other documents at any 
   time. It is inappropriate to use Internet-Drafts as reference 
   material or cite them other than as "work in progress". 
    
   The list of current Internet-Drafts can be accessed at 
   http://www.ietf.org/ietf/lid-abstracts.txt 
    
   The list of Internet-Draft Shadow Directories can be accessed at 
   http://www.ietf.org/shadow.html 
    
   This document is a submission of the IETF ROHC WG. Comments should be 
   directed to its mailing list, rohc@ietf.org. 
    
    
Abstract 
    
   This document defines SigComp, a solution for compressing messages 
   generated by application protocols such as [SIP] and [RTSP]. The 
   architecture and pre-requisites of SigComp are outlined, along with 
   the format of the SigComp message. 
    
   Decompression functionality for the SigComp solution is provided by a 
   "Universal Decompressor Virtual Machine" optimized for the task of 
   running decompression algorithms. The UDVM can be configured to 
   understand the output of many well-known compressors such as 
   [DEFLATE]. 

 
Price, Hannu, et al.                                            [Page 1]  

INTERNET-DRAFT                  SigComp                   March 1, 2002 
 
 
Table of contents 
   
   1.  Introduction..................................................2 
   2.  Terminology...................................................3 
   3.  SigComp Architecture..........................................6 
   4.  SigComp message flow..........................................11 
   5.  SigComp compressor............................................14 
   6.  State handling and announcement...............................16 
   7.  Overview of the UDVM..........................................20 
   8.  Decompressing a SigComp message...............................23 
   9.  UDVM instruction set..........................................26 
   10. Security considerations.......................................38 
   11. IANA considerations...........................................40 
   12. Acknowledgements..............................................41 
   13. AuthorsÆ addresses............................................41 
   14. References....................................................42 
   Appendix A. Document history......................................43 
 
1.  Introduction 
    
   The Session Initiation Protocol [SIP], along with many other 
   application protocols used for multimedia communications such as 
   [RTSP], is a textual protocol engineered for bandwidth rich links. As 
   a result, SIP messages have not been optimized in terms of size. 
   Typical SIP messages range from a few hundred bytes to as high as two 
   thousand. To date, this has not been a significant problem.  
    
   With the planned usage of these protocols in wireless handsets as 
   part of 2.5G and 3G cellular networks, the large size of these 
   messages is problematic. With low-rate IP connectivity, store-and-
   forward delays are significant. Taking into account retransmits, and 
   the multiplicity of messages that are required in some flows, call 
   setup and feature invocation are adversely affected. Therefore, we 
   believe there is merit in reducing these message sizes.  
 
   This document outlines the architecture and pre-requisites of the 
   SigComp solution, the format of the SigComp message, algorithm 
   upload, and the Universal Decompressor Virtual Machine (UDVM) that 
   provides decompression functionality. 
    
   SigComp is offered to applications as a "shim" layer between the 
   application and the transport. The service provided is that of the 
   underlying transport plus compression. Both connection-oriented and 
   connectionless transports are supported by SigComp. 
    
   This document focuses on the signaling scenario where an end-terminal 
   communicates with a proxy. However SigComp may be applicable to other 
   scenarios with multiple endpoints compressing and decompressing data. 
    
    
Price, Hannu, et al.                                            [Page 2] 

INTERNET-DRAFT                  SigComp                   March 1, 2002 
 
 
2.  Terminology 
    
   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 
   document are to be interpreted as described in [RFC-2119]. 
    
   SigComp 
    
     The overall solution for signaling compression, comprising the  
     compressor, decompressor, dispatchers and state handler. 
    
   Application 
    
     For the purpose of this document, an application is a text-based 
     protocol software that: 
    
     a) sends application data to the compressor dispatcher 
     b) receives data from the decompressor dispatcher 
     c) authenticates the sender of a decompressed message and gives 
        permission for state to be saved in the sender's name 
    
   Application message 
    
     An uncompressed message provided to or from the application. 
    
   Endpoint 
    
     One instance of an application plus a SigComp layer. Each endpoint  
     is capable of sending and/or receiving SigComp messages. 
    
   Endpoint identity 
    
     A unique indicator assigned to each endpoint by the application  
     (for example an URI). The application authenticates the sender of 
     a decompressed message, and provides their endpoint identity to the 
     SigComp state handler. 
    
   Transport 
    
     Mechanism for passing data between two endpoints. SigComp is  
     capable of sending messages over a wide range of transports  
     including TCP, UDP and [SCTP]. 
    
   Message-based transport 
    
     A transport that carries data as a set of bounded messages. 
    
   Stream-based transport 
    
     A transport that carries data as a continuous stream with no  
     message boundaries. 

 
Price, Hannu, et al.                                            [Page 3] 

INTERNET-DRAFT                  SigComp                   March 1, 2002 
 
 
   Application-defined parameters 
    
     Parameters that must be agreed upon by the local and remote  
     endpoints invoking SigComp. Values for the application-defined  
     parameters are typically fixed to meet the requirements of a 
     particular signaling application. 
    
   SigComp message 
    
     May contain a compressed application message in the form of UDVM  
     bytecode. In case of a message-based transport such as UDP, a  
     SigComp message corresponds to exactly one (UDP) datagram. For a  
     stream-based transport such as TCP, each SigComp message is  
     separated by a reserved delimiter. 
    
   Standalone SigComp message 
    
     A SigComp message that does not include any compressed application  
     data. Certain signaling applications may not allow standalone 
     SigComp messages due to security requirements. 
    
   Compressor 
    
     Entity that invokes an encoder, and keeps track of states that can  
     be used for compression. It is responsible for supplying UDVM 
     bytecode to the remote decompressor in order for compressed 
     data to be decompressed. 
    
   Encoder 
    
     Encodes data according to a particular compression algorithm. 
    
   Compressor dispatcher 
    
     Entity that receives uncompressed application messages, invokes a 
     compressor, and forwards the resulting SigComp messages to a remote  
     endpoint. 
    
   Decompressor 
    
     Entity that is responsible for converting a SigComp message into  
     uncompressed data. Decompression functionality is provided by the  
     UDVM. 
    
   Decompressor dispatcher 
    
     Entity that receives SigComp messages, invokes a decompressor, and 
     forwards the decompressed application messages to an application. 
    
    
Price, Hannu, et al.                                            [Page 4] 

INTERNET-DRAFT                  SigComp                   March 1, 2002 
 
 
   Virtual machine 
    
     A machine architecture designed to be implemented in software 
     (although silicon implementations are of course possible). 
    
   Universal Decompressor Virtual Machine (UDVM) 
    
     The virtual machine described in this document. The UDVM is used  
     for decompression of SigComp messages. 
    
   Bytecode 
    
     Machine code that can be executed by a virtual machine. UDVM 
     bytecode is a combination of UDVM instructions and compressed data. 
    
   Per-message compression 
    
     Compression that does not reference data from previous messages.  
     SigComp can decompress a message of this type using only the  
     application-defined parameters and the data in the message itself. 
    
   Dynamic compression 
    
     Compression relative to messages sent prior to the current  
     compressed message. SigComp stores and retrieves this data using  
     the state handler. 
    
   State 
    
     Data saved for retrieval by later SigComp messages. An item of  
     state typically reflects the contents of the UDVM memory after 
     decompressing a message, but state can also be created by the 
     compressor or by the application. 
    
   State handler 
    
     Entity responsible for storing and accessing state information 
     once permission is granted by the application. 
    
   State identifier 
    
     Reference used to access an item of state previously created by the  
     compressor, the decompressor or the application. 
    
   CPU cycles 
    
     A measure of the amount of "CPU power" required to execute a UDVM  
     instruction (the simplest UDVM instructions require a single CPU  
     cycle). An upper limit is placed on the number of cycles that can  
     be used to decompress each bit in a compressed message. 


Price, Hannu, et al.                                            [Page 5] 

INTERNET-DRAFT                  SigComp                   March 1, 2002 
 
 
3.  SigComp Architecture 
 

   In the SigComp architecture compression and decompression is 
   performed at two communicating entities. SigComp is offered to 
   applications as a "shim" layer between the application and the 
   underlying transport, and so these entities are endpoints when viewed 
   from a transport layer perspective. Note however that from the 
   application perspective SigComp is applied on a per-hop basis. 
    
   Figure 1 shows the layout of a communicating endpoint that implements 
   a SigComp layer. The figure does not mandate any particular 
   implementation, but is shown to the reader for the sake of clarity. 
    
   The SigComp layer is further decomposed in the following components: 
    
   - A compressor dispatcher: this is the interface from the  
     application. The compressor dispatcher receives an application  
     message and an identifier for the receiving endpoint. Based on the 
     endpoint identity the compressor dispatcher invokes a particular  
     compressor, which returns a SigComp message that is forwarded to  
     the remote SigComp endpoint. 
    
   - A decompressor dispatcher: this is the interface towards the  
     application. A SigComp message is received by the decompressor  
     dispatcher and an instance a decompressor is invoked. Once the  
     dispatcher has received the (decompressed) application data it  
     forwards the message to the application. 
    
   - One or more compressors: a distinct compressor is invoked for each 
     remote endpoint with which the local application wishes to  
     communicate. A compressor receives an (uncompressed) application  
     message from the compressor dispatcher, compresses the  
     message, and returns a SigComp message to the compressor  
     dispatcher. During the compression process the compressor may  
     invoke the state handler to restore previous state or save new  
     state. Each compressor chooses a certain algorithm to encode the  
     data, (e.g. [DEFLATE]). 
    
   - One or more decompressors: since SigComp can run over an unsecure 
     transport layer, a distinct decompressor must be invoked on a 
     per-message basis. A decompressor receives a SigComp message from  
     the decompressor dispatcher, decompresses the message, and returns  
     the (decompressed) application message to the decompressor  
     dispatcher. During the decompression process, the decompressor may  
     invoke the state handler to restore previous state or save new  
     state. 
    
   - State handler: this entity contains enough logic to store and  
     retrieve states. State is information that is stored between  
     SigComp messages: this data can be saved either by a compressor, a  
     decompressor or an application. For security purposes the state  

 
Price, Hannu, et al.                                            [Page 6] 

INTERNET-DRAFT                  SigComp                   March 1, 2002 
 
 
     handler must always ask the application to grant permission for new  
     states to be saved. State creation and retrieval are further  
     described in Chapter 6. 
    
    
               +---------------------------------------------+ 
               |                                             | 
               |                 Application                 | 
               |                                             | 
               |                                             | 
               +---------------------------------------------+ 
                      |                    |         ^ 
            Message & |           Endpoint |         | Decompressed 
             endpoint |           identity |         | message 
             identity |                    |         | 
                      |                    |         | 
       +-- -- -- -- --|-- -- -- -- -- -- --|-- -- -- |- -- -- -- -- + 
       |              |                    |         |              | 
                      v                    v         | 
       |    +--------------+  +--------------+  +--------------+    | 
    SigComp |              |  |              |  |              | SigComp 
    message |  Compressor  |  |    State     |  | Decompressor | message 
    <-------|  dispatcher  |  |   handler    |  |  dispatcher  |<-------    
       |    |              |  |              |  |              |    | 
            +--------------+  +--------------+  +--------------+ 
       |           ^  ^          ^  ^  ^  ^          ^  ^           | 
                   |  |          |  |  |  |          |  | 
       |           |  |          |  |  |  |          |  |           | 
                   |  |          |  |  |  |          |  | 
       |           v  |          |  |  |  |          v  |           | 
            +--------------+     |  |  |  |     +--------------+ 
       |    | Compressor 1 |     |  |  |  |     |Decompressor 1|    | 
            |              |<----+  |  |  +---->|              | 
       |    |  (Encoder)   |        |  |        |    (UDVM)    |    | 
            |              |        |  |        |              | 
       |    +--------------+        |  |        +--------------+    | 
                      |             |  |                | 
       |              v             |  |                v           | 
            +--------------+        |  |        +--------------+ 
       |    | Compressor 2 |        |  |        |Decompressor 2|    | 
            |              |<-------+  +------->|              | 
       |    |  (Encoder)   |                    |    (UDVM)    |    | 
            |              |                    |              | 
       |    +--------------+                    +--------------+    | 
    
       |                        SigComp layer                       | 
       +-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + 
    
    Figure 1: High-level architectural overview of one SigComp endpoint 
    

Price, Hannu, et al.                                            [Page 7] 

INTERNET-DRAFT                  SigComp                   March 1, 2002 
 
 
   Note that it is possible for SigComp to decompress messages from 
   multiple endpoints at different physical locations in a network, as 
   the architecture is designed to prevent data from one endpoint 
   interfering with data from a different endpoint. A consequence of 
   this design choice is that it is difficult for a malicious user to 
   disrupt SigComp operation by inserting false compressed messages on 
   the transport layer. 
    
   Each decompressor in the architecture of Figure 1 is an instance of 
   the Universal Decompressor Virtual Machine (UDVM). Figure 2 gives a 
   more detailed view of a UDVM, including all of the interfaces between 
   the UDVM and its environment. 
    
    
   +----------------+                                 +----------------+ 
   |                |     Request compressed data     |                | 
   |                |-------------------------------->|                | 
   |                |<--------------------------------|                | 
   |                |     Provide compressed data     |                | 
   |                |                                 |   Dispatcher   | 
   |                |                                 |                | 
   |                |    Output decompressed data     |                | 
   |                |-------------------------------->|                | 
   |                |                                 |                | 
   |                |                                 +----------------+ 
   |      UDVM      | 
   |                |                                 +----------------+ 
   |                |    Request state information    |                | 
   |                |-------------------------------->|                | 
   |                |<--------------------------------|                | 
   |                |    Provide state information    |                | 
   |                |                                 |     State      | 
   |                |                                 |    Handler     | 
   |                |   Make state creation request   |                | 
   |                |-------------------------------->|                | 
   |                |      Forward announcement       |                | 
   |                |                                 |                | 
   +----------------+                                 +----------------+ 
    
         Figure 2: Interfaces between the UDVM and its environment 
    
   Note that for simplicity, the UDVM indicates when it requires 
   additional compressed data or state information using an explicit 
   instruction. It then pauses and waits for the information to be 
   supplied before continuing with the next instruction. This prevents 
   the arrival of more data from interfering with the operation of the 
   UDVM (e.g. by accidentally overwriting UDVM memory that is currently 
   in use). 
    
    
Price, Hannu, et al.                                            [Page 8] 

INTERNET-DRAFT                  SigComp                   March 1, 2002 
 
 
3.1.  Requirements on application 
    
   From an application perspective the SigComp layer appears as a new 
   transport, with similar behavior to the original transport used to 
   carry uncompressed data (for example SigComp/UDP behaves similarly to 
   native UDP). 
    
   If the application wishes to mix SigComp messages with other types of 
   data (e.g. uncompressed data, or SigComp data for a different 
   application) on the same transport then the transport must 
   distinguish between the different types of data. This means that a 
   new port will need to be reserved or discovered for the SigComp 
   messages destined for a particular application. For example SIP uses 
   port 5060 for TCP and port 5061 for TLS/TCP, so it could similarly 
   reserve another port for SigComp/TCP. 
    
   In the interests of security, a new interface is required to the 
   signaling application in order to leverage the authentication 
   functions built into the application itself. When the application 
   receives a decompressed message it determines the identity of the 
   sending endpoint and supplies this information to the state handler. 
    
3.2.  Application-defined parameters 
    
   When an application invokes SigComp, a number of parameters are 
   provided by the application to control the maximum size of compressed 
   messages, the UDVM memory size etc. The local and remote applications 
   that wish to communicate MUST initially agree on a common set of 
   values for these parameters. 
    
   Note that the majority of application-defined parameters are set to 
   fixed values for a particular signaling application. However, 
   endpoints implementing SigComp will typically have a wide range of 
   capabilities; each offering a different amount of working memory, 
   processing power and so on. In order to support this wide variation 
   in endpoint capabilities, SigComp includes a mechanism for modifying 
   the following application-defined parameters on the fly: 
    
   UDVM_version 
   UDVM_memory_size 
   cycles_per_bit 
   cycles_per_message 
   Initial state 
    
   The SigComp announcement mechanism is described further in Section 
   6.3. 
    
   The advantage of building the announcement mechanism into SigComp is 
   that it avoids the need for any form of negotiation to be performed 
   by the application itself. Instead, it is sufficient to initialize 


Price, Hannu, et al.                                            [Page 9] 

INTERNET-DRAFT                  SigComp                   March 1, 2002 
 
 
   all of the application-defined parameters to fixed values and modify 
   them later using SigComp itself. 
    
   Each application-defined parameter is described below. 
    
   Note that unless otherwise indicated, all of the parameters can be 
   stored as 2-byte integers. 
    
   UDVM_version 
    
     The UDVM_version parameter specifies the level of functionality 
     available at the UDVM. The basic version of the UDVM (Version 0) 
     is defined in this document. 
    
   maximum_expansion_size 
    
     The maximum_expansion_size parameter prevents the generation of 
     excessively large SigComp messages. If set to 0 then the parameter 
     is ignored by SigComp; for any other value then if an uncompressed  
     message is k bytes long, the corresponding SigComp message must be  
     no larger than (k + maximum_expansion_size). Note that any value  
     other than 0 bans the creation of standalone SigComp messages (i.e.  
     messages that do not contain a compressed application message). 
    
   maximum_compressed_size 
    
     The maximum_compressed_size parameter limits the size of one  
     compressed message. SigComp rejects any message larger than the 
     specified value. 
    
   maximum_uncompressed_size 
    
     The maximum_uncompressed_size parameter limits the size of one  
     uncompressed message. SigComp rejects any message larger than the 
     specified value. 
    
   minimum_hash_size 
    
     The minimum_hash_size parameter specifies the minimum size of the 
     state identifier when creating new state information. This value  
     needs to be sufficiently large to prevent malicious users from  
     guessing a state identifier by brute force. 
    
   UDVM_memory_size 
    
     The UDVM_memory_size parameter specifies the total number of 
     bytes in the UDVM memory. 
    
   cycles_per_bit 
    
     The cycles_per_bit parameter specifies the number of "CPU cycles"  

 
Price, Hannu, et al.                                           [Page 10] 

INTERNET-DRAFT                  SigComp                   March 1, 2002 
 
 
     that can be used to decompress a single bit of data. One CPU cycle  
     typically corresponds to a single UDVM instruction, although some  
     of the high-level instructions may require additional cycles. 
    
   cycles_per_message 
    
     The cycles_per_message parameter specifies the number of additional  
     CPU cycles made available at the start of a compressed message.  
     These cycles can be useful when decompressing algorithms that  
     upload additional data on a per-message basis, for example a new  
     set of Huffman codes as with [DEFLATE]. 
    
     The total maximum number of "CPU cycles" available for each  
     compressed message is specified by the following formula: 
    
     maximum_cycles = message_size * cycles_per_bit + cycles_per_message 
    
   maximum_state_size 
    
     The maximum_state_size parameter specifies the maximum amount of 
     state information that can be saved by a local endpoint, for each 
     remote endpoint with which it communicates. Note that the amount of  
     state information is expressed as a multiple of the parameter  
     UDVM_memory_size, because an item of state generally 
     reflects the contents of the UDVM memory. 
    
   Initial state 
    
     The application can store useful information in the form of state.  
     This predefined state is used to offer a range of well-known 
     decompression algorithms to the compressor, which can choose to 
     avoid uploading bytecode for a new algorithm if it supports one of 
     the well-known algorithms. Each item of initial state can be made 
     mandatory for every instance of the application, or it can be made 
     optional (in which case support for the relevant state will need to  
     be advertised before the state can be used). 
    
    
4.  SigComp message flow 
    
   This chapter describes the SigComp message flow and the operation of 
   the compressor and decompressor dispatcher. 
    
4.1.  Message exchange 
    
   The local SigComp layer may send compressed data to a remote SigComp 
   layer, and the local SigComp layer may also receive compressed data. 
   Note however that compression in one direction does not necessarily 
   imply compression in the reverse direction. Furthermore, even in the 
   case that there are two unidirectional compressed flows between two 


Price, Hannu, et al.                                           [Page 11] 

INTERNET-DRAFT                  SigComp                   March 1, 2002 
 
 
   SigComp layers, there is no need to use the same compression 
   algorithm at both compressors. 
    
4.2.  SigComp message format 
    
   In every SigComp message the first few bytes are interpreted as a 
   state identifier that accesses some previously stored state 
   information. 
    
   This state information includes all of the data needed to decompress 
   the SigComp message: including the decompression algorithm that will 
   be applied to the remainder of the message, as well as any additional 
   information that is required (e.g. one or more previously received 
   messages if dynamic compression is in use). 
    
   The format of the basic SigComp message is given in Figure 4: 
    
     0   1   2   3   4   5   6   7 
   +---+---+---+---+---+---+---+---+ 
   | 1   1   1   1   1 |  length   | 
   +---+---+---+---+---+---+---+---+ 
   |                               | 
   :   state_identifier (n-bytes)  :  
   |                               | 
   +---+---+---+---+---+---+---+---+ 
   |                               | 
   :   Remaining SigComp message   : 
   |                               | 
   +---+---+---+---+---+---+---+---+ 
    
    Figure 4: Basic SigComp message 
    
   The length field is a 3-bit value (MSBs before LSBs) that indicates 
   the length of the state identifier. The actual size n of the state 
   identifier is calculated as follows: 
    
                   n  =  minimum_hash_size + length - 1 
    
   The state identifier is then extracted from the SigComp message and 
   then executed as defined by the STATE-EXECUTE instruction of Chapter 
   9. 
    
   If the length value is set to 0 then no state is accessed; instead 
   the entire SigComp message is copied into the UDVM memory beginning 
   at Address 6, and then executed starting from Address 6. 
    
   All other addresses in the UDVM memory are initialized to 0. 
    
   Decompression failure occurs if the SigComp message is too short to 
   contain the expected state identifier, or if the requested state does 
   not exist. See Section 8.2 for further details. 

 
Price, Hannu, et al.                                           [Page 12] 

INTERNET-DRAFT                  SigComp                   March 1, 2002 
 
 
4.3.  Interfaces to and from the compressor dispatcher 
    
   When the application provides a message to be compressed, it MUST 
   also provide an "endpoint identity" that distinguishes the endpoint 
   from other endpoints. 
    
   The exact format of the endpoint identity is unimportant, provided 
   that distinct endpoints have distinct endpoint identities. 
    
   The SigComp layer contains one compressor for each remote endpoint 
   with which the local application is communicating; the dispatcher 
   forwards each new application message to the appropriate compressor 
   (invoking a new compressor if a new endpoint identity is 
   encountered). 
    
   Note that the application MUST indicate to the compressor dispatcher 
   when it no longer wishes to communicate with a particular endpoint, 
   so that the resources taken by the corresponding compressor can be 
   reclaimed. 
    
4.4.  Interfaces to and from the decompressor dispatcher 
    
   To ensure that SigComp can run over an unsecure transport layer, the 
   decompressor dispatcher invokes a new decompressor for each new 
   SigComp message. Resources for the decompressor are released as soon 
   as the message is decompressed. 
    
   Upon the arrival of a SigComp message the decompressor dispatcher 
   invokes an instance of the UDVM and loads it with the indicated state 
   as per Section 4.2. The message is then decompressed by the UDVM, 
   returned to the decompressor dispatcher, and passed on to the 
   receiving application. 
    
   Note that when the UDVM is invoked it does not receive any compressed 
   data by default, but instead requests new data explicitly using a 
   specific instruction. Therefore, the dispatcher is responsible for 
   buffering each SigComp message and passing the data to the UDVM when 
   it is requested. If the UDVM requests additional compressed data that 
   is not yet available then it pauses and waits until enough data has 
   been received by the dispatcher. 
    
   Uncompressed data is also outputted by the UDVM using a specific 
   instruction. Note that the UDVM has no awareness of whether the 
   underlying transport is message-based or stream-based, and so it 
   always outputs uncompressed data as a stream. It is the 
   responsibility of the dispatcher to provide the uncompressed message 
   to the application in the expected form (i.e. as a stream or as a set 
   of distinct, bounded messages). 
    

Price, Hannu, et al.                                           [Page 13] 

INTERNET-DRAFT                  SigComp                   March 1, 2002 
 
 
   For a stream-based transport, the dispatcher delimits messages by 
   parsing the compressed data stream for instances of 0xFF and taking 
   the following actions: 
    
   Occurs in data stream:     Meaning: 
    
   0xFF 00                    one 0xFF byte in the data stream 
   0xFF 01                    same, but the next byte is quoted (could  
                              be another 0xFF) 
      :                                           : 
   0xFF 7F                    same, but the next 127 bytes are quoted 
   0xFF 80 to 0xFF FE         reserved 
   0xFF FF                    message boundary 
    
   The reserved characters are useful for byte stuffing (if a 
   compression algorithm generates compressed data containing the 
   character 0xFF then it should be replaced by the character 0xFF00 to 
   avoid accidentally inserting a message delimiter into the compressed 
   data stream). 
    
    
5.  SigComp compressor  
 
   An important feature of SigComp is that if two endpoints cannot agree 
   on a common algorithm with which to send and receive data, it is 
   possible for the compressor to upload bytecode for its own choice of 
   algorithm to the decompressor. In particular this means that it is 
   not necessary to force all compressors to use the same default 
   algorithm; instead each implementer has the freedom to pick one of 
   the predefined algorithms or to upload their own if needed. 
    
   The overall requirement placed on the compressor is that of 
   transparency, i.e. the compressor MUST NOT send bytecode which cause 
   the UDVM to incorrectly decompress a given message. 
    
   The following more specific requirements are also placed on the 
   compressor (they can be considered particular instances of the 
   transparency requirement): 
    
   *    It is RECOMMENDED that the compressor supply a CRC over the 
        uncompressed message to ensure that successful decompression has 
        occurred. A UDVM instruction is provided to verify this CRC. 
    
   *    If the transport is message-based then the compressor MUST 
        preserve the boundaries between messages. 
    
   *    If the transport is stream-based but the application defines its 
        own internal message boundaries, then the compressor SHOULD 
        preserve the boundaries between messages by using the "end-of-
        message" character 0xFFFF reserved by SigComp. 
    

Price, Hannu, et al.                                           [Page 14] 

INTERNET-DRAFT                  SigComp                   March 1, 2002 
 
 
   *    The compressor MUST NOT exceed the maximum_compressed_size and  
        MUST ensure that the message can be decompressed using no more 
        than the resources available at the remote decompressor. 
    
   The reason for preserving the message boundaries over a stream-based 
   transport is that damage to one compressed message does not affect 
   the decompression of subsequent messages. Moreover, the application 
   typically vetoes state creation requests on a per-message basis. 
    
5.1.  Supplying bytecode to the UDVM 
    
   A compressor MUST be certain that compressed data can be decompressed 
   before the data is to be sent, i.e. the UDVM instructions for 
   decompression MUST be available at the remote decompressor. Several 
   options exist for ensuring that this bytecode is available: 
    
   1. Each SigComp message sent from the compressor contains the  
      necessary UDVM instructions for decompression. 
    
   2. By setting up a reliable connection, such as a TCP connection,  
      between a compressor and its remote decompressor the UDVM  
      instructions can be transferred and saved as state. 
    
   3. If there are predefined UDVM codes for well-known algorithms, a  
      compressor only needs to send the state identifier of that UDVM  
      decompression algorithm code to its remote decompressor. The  
      decompressor can then populate the UDVM locally.  
    
   In order to save delay for "time-critical" sessions, the UDVM 
   instructions should be uploaded prior to any initiation of "time-
   critical" sessions. 
    
5.2.  Compression failure 
    
   The compressor SHOULD make every effort to successfully compress an 
   application message, but in certain cases this might not be possible 
   (particularly if a low maximum_compressed_size has been set by the 
   application). In this case a "compression failure" is called.  
   Reasons for compression failure include the following: 
    
   *    A compressed or uncompressed message exceeds the maximum size 
        defined by the application. 
    
   *    The maximum_compressed_size is exceeded for a certain message. 
    
   *    Insufficient resources are available at the compressor or at the 
        remote decompressor. 
    
   If a compression failure occurs when compressing a message then the 
   compressor informs the dispatcher and takes no further action. The 


Price, Hannu, et al.                                           [Page 15] 

INTERNET-DRAFT                  SigComp                   March 1, 2002 
 
 
   dispatcher MUST report this failure to the application. The 
   application may then try other methods to deliver the message. 
    
    
6.  State handling and state announcement 
    
   This chapter defines the behavior of the SigComp state handler. The 
   function of the state handler is to retain information between 
   successive SigComp messages; it is the only SigComp entity that is 
   capable of this function, and so it is of particular importance from 
   a security perspective. 
 
6.1.  Storing and retrieving state 
    
   To provide security against the malicious insertion or modification 
   of SigComp messages, the UDVM memory is reset after decompressing 
   each message. This ensures that damaged SigComp messages do not 
   prevent the successful decompression of subsequent valid messages. 
    
   Note however that the overall compression ratio is often 
   significantly higher if messages can be compressed relative to the 
   information stored in previous messages. For this reason it is 
   possible to create "state" information for access when a later 
   message is being decompressed. 
    
   Both the creation and access of state are designed to be secure 
   against malicious tampering with the compressed data. State can only 
   be created when a complete message has been successfully 
   decompressed, and the state handler MUST NOT save state without 
   permission from the application. 
    
   Upon receiving a decompressed message, the application may supply the 
   state handler with the identity of the sending endpoint. Supplying 
   this identity grants permission for the state handler to do the 
   following: 
    
   *    An item of state can be saved using the memory reserved for the 
        specified endpoint. 
    
   *    Announcement information can be taken into account 
        when sending SigComp messages to the specified endpoint. 
    
   This is especially useful if the application has an authentication 
   mechanism that can be applied to determine whether the decompressed 
   data is legitimate. 
    
   Also note that state is not deleted when it is accessed. So even if a 
   malicious user manages to access state information, subsequent 
   messages compressed relative to this state can still be successfully 
   decompressed. Instead, the state handler is responsible for deleting 


Price, Hannu, et al.                                           [Page 16] 

INTERNET-DRAFT                  SigComp                   March 1, 2002 
 
 
   state information once it determines that the state will no longer be 
   needed. 
    
   Each item of state stores the following information: 
    
   Name:                      Type of data: 
    
   state_identifier           16-byte value 
   state start                2-byte value 
   state_instruction          2-byte value 
   state length               2-byte value 
   state_value                String of bytes 
    
   The state_identifier must be supplied to retrieve an item of state 
   from the state handler. State can be accessed using the UDVM 
   instructions STATE-REFERENCE and STATE-EXECUTE, and can be created 
   using the END-MESSAGE instruction. 
    
   The state_value is a byte string that contains the actual value that 
   is copied from/to the UDVM memory. The state_length specifies the 
   number of bytes contained within state_value, and state_start gives 
   the UDVM memory address to which the state_value is copied when it is 
   accessed. 
    
   Finally, state_instruction specifies the memory address of the next 
   UDVM instruction to execute when state is accessed. 
    
   The kind of information which is included in the state_value is up to 
   a particular compressor and the uploaded instructions in the remote 
   UDVM. However a compressor MUST NOT use a state that is not known to 
   be established at the remote decompressor. 
    
6.2. Saving and deleting states 
 
   The state handler for each endpoint is expected to offer memory to 
   store UDVM-created state. Every remote endpoint that wishes to 
   communicate with the local endpoint expects to be able to store a 
   fixed amount of state; the number of bytes that it can store is given 
   by the formula UDVM_memory_size * maximum_state_size. 
    
   Note that each item of state costs (state_length + 22) bytes to 
   store. 
    
   The state handler keeps track of which endpoint created each item of 
   state; when a particular endpoint exceeds its allocated memory limit 
   then sufficient items of state created by the same endpoint are 
   deleted (oldest state first) until enough memory is available to 
   accommodate the new state. 
    

Price, Hannu, et al.                                           [Page 17] 

INTERNET-DRAFT                  SigComp                   March 1, 2002 
 
 
   The application MUST indicate to the state handler when it no longer 
   wishes to communicate with a particular endpoint, so that the 
   resources taken by the corresponding state can be reclaimed. 
     
6.3.  Announcement 
    
   The announcement information is used to modify the value of certain 
   application-defined parameters. Since these parameter values are 
   saved between SigComp messages, they are considered to be part of the 
   overall state and hence are supplied from the UDVM to the state 
   handler. 
    
   The following list of parameters is passed to the state handler using 
   the appropriate UDVM instruction (namely the END-MESSAGE 
   instruction): 
    
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   
         |            length             |   
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
         |         UDVM_version          |   
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   
         |       UDVM_memory_size        |   
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   
         |        cycles_per_bit         |   
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   
         |      cycles_per_message       |   
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   
         |              n                | 
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
         |          id_length 1          | 
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
         |                               | 
         :          id_value_1           : 
         |                               | 
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
                    :         : 
                    :         : 
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
         |          id_length n          | 
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
         |                               | 
         :          id_value_n           : 
         |                               | 
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
         |                               | 
         :           reserved            : 
         |                               | 
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
    
         Figure 5: Announcement information 
    

Price, Hannu, et al.                                           [Page 18] 

INTERNET-DRAFT                  SigComp                   March 1, 2002 
 
 
   If the application does not return a valid endpoint identifier then 
   the announcement information is automatically discarded by the state 
   handler. Otherwise it is passed to the compressor responsible for 
   sending messages to the given endpoint. 
    
   The reserved field allows for additional items of data to be added to 
   the announcement information in future. 
    
   Note that the length field specifies the total length of the 
   announcement information including the reserved field. As usual, MSBs 
   are stored preceding LSBs. 
    
   The remaining items of data are explained in greater detail below: 
    
6.3.1.  UDVM version 
    
   The next 2 bytes of the announcement information specify whether only 
   the basic version of the UDVM is available, or whether an upgraded 
   version of the UDVM is available offering additional instructions 
   etc. 
    
   The basic version of the UDVM is Version 0, which is the version 
   described in this document. Upgraded versions MUST be backwards-
   compatible with the basic version in the following sense: 
    
   *    If some UDVM bytecode reaches the END-MESSAGE or DECOMPRESSION-
        FAILURE instructions when running on Version 0 of the UDVM, then 
        the upgraded version MUST run the bytecode in an identical 
        manner. 
    
   This condition ensures that all bytecode that is valid for Version 0 
   of the UDVM will continue to be valid for upgraded versions of the 
   UDVM. However, bytecode that is invalid on Version 0 of the UDVM 
   (i.e. bytecode that produces a decompression failure that is not 
   manually triggered) may become valid on upgraded versions. 
    
   The simplest way to upgrade the UDVM in a backwards-compatible manner 
   is to add additional UDVM instructions, as this will not affect the 
   operation of existing UDVM bytecode. 
    
6.3.2.  Memory size and CPU cycles 
    
   The next 6 bytes of data specify new values for the application-
   defined parameters UDVM_memory_size, cycles_per_bit and 
   cycles_per_message. 
    
   Note that this data can only be used to increase the amount of 
   resources available at the remote UDVM. If the data specifies a 
   parameter value that is smaller than the value already possessed by 
   the state handler, the parameter keeps its original value (i.e. the 
   announcement data for this parameter is simply ignored). 

 
Price, Hannu, et al.                                           [Page 19] 

INTERNET-DRAFT                  SigComp                   March 1, 2002 
 
 
   In particular, only allowing the parameter values to increase means 
   that the announcement mechanism is robust against message loss or 
   reordering. 
    
   The parameters can only be restored to their original values if reset 
   or renegotiated by the application. 
    
6.3.3.  State identifiers 
    
   The list of state identifiers indicates that the sending endpoint 
   supports one or more optional mechanisms (including well-known 
   decompression algorithms, dictionaries of common SIP phrases, 
   feedback mechanisms etc.). 
    
   The integer n specifies the number of state identifiers to follow. 
   The field id_length_j specifies the length in bytes of id_value_j, 
   where acceptable values for id_length_j range from 1 to 16 inclusive. 
   If a value outside this range is received then the subsequent state 
   identifiers are ignored by the state handler. 
    
   Each id_value_j indicates support for one optional mechanism at the 
   sending endpoint. The optional mechanisms themselves, and their 
   corresponding state identifiers, are beyond the scope of this 
   document. 
    
    
7.  Overview of the UDVM 
    
   Decompression functionality for SigComp is provided by a "Universal 
   Decompressor Virtual Machine" (UDVM). The UDVM is a virtual machine 
   much like the Java Virtual Machine but with a key difference: it is 
   designed solely for the purpose of running decompression algorithms. 
    
   The motivation for creating the UDVM is to provide unlimited 
   flexibility when choosing how to compress a given item of data. 
   Rather than picking one of a small number of pre-negotiated 
   compression algorithms, the implementer has the freedom to select an 
   algorithm of their choice. The compressed data is then combined with 
   a set of UDVM instructions that allow the original data to be 
   extracted, and the result is outputted as UDVM bytecode. 
    
   Since the UDVM is optimized specifically for running decompression 
   algorithms, the code size of a typical algorithm is small (often sub 
   100 bytes). Moreover the UDVM approach does not add significant extra 
   processing or memory requirements compared to running a fixed pre-
   programmed decompression algorithm. 
    
   This chapter describes some basic features of the UDVM, including the 
   well-known variables and instruction operands. 
    

Price, Hannu, et al.                                           [Page 20] 

INTERNET-DRAFT                  SigComp                   March 1, 2002 
 
 
   Recall that the amount of memory available to the UDVM is specified 
   by the application-defined parameter UDVM_memory_size. Any attempt to 
   read memory addresses beyond the overall memory size MUST cause a 
   decompression failure (see Section 8.2). 
    
7.1.  Well-known variables 
    
   The first few variables in the UDVM memory have special tasks, for 
   example specifying the location of the stack used by the CALL and 
   RETURN instructions. Each of these well-known variables is a 2-byte 
   integer. 
    
   The following list gives the name of each well-known variable and the 
   memory address at which the variable can be found: 
    
   Name:           Starting memory address: 
    
   byte_copy_left             0 
   byte_copy_right            2 
   stack_location             4 
    
   The MSBs of each variable are always stored before the LSBs. So, for 
   example, the MSBs of stack_location are stored at Address 4 whilst 
   the LSBs are stored at Address 5. 
    
   The use of each well-known variable is described in the following 
   sections of the document. 
    
7.2.  Instruction operands 
    
   Each of the UDVM instructions is followed by 0 or more bytes 
   containing the operands required by the instruction. 
    
   To reduce the code size of a typical UDVM program, each operand for a 
   UDVM instruction is compressed using variable-length encoding. The 
   aim is to store more common operand values using fewer bits than 
   rarely occurring values. 
    
   Three different types of operand are available: the literal, the 
   reference and the multitype. The operand types that follow each UDVM 
   instruction are specified in Chapter 9. 
    
   The UDVM bytecode for each operand type is illustrated in Figure 7 to 
   Figure 9, together with the integer values represented by the 
   bytecode. 
    
   Note that the MSBs in the bytecode are illustrated as preceding the 
   LSBs. Also, any string of bits marked with k consecutive "n"s is to 
   be interpreted as an integer N from 0 to 2^k - 1 inclusive (with the 
   MSBs of n illustrated as preceding the LSBs). 
    

Price, Hannu, et al.                                           [Page 21] 

INTERNET-DRAFT                  SigComp                   March 1, 2002 
 
 
   The decoded integer value of the bytecode can be interpreted in two 
   ways. In some cases it is taken to be the actual value of the 
   operand. In other cases it is taken to be a memory address at which 
   the 2-byte operand value can be found (MSBs found at the specified 
   address, LSBs found at the following address). The latter case is 
   denoted by memory[X] where X is the address and memory[X] is the 2-
   byte value starting at Address X. 
    
   The simplest operand type is the literal (#), which encodes a 
   constant integer from 0 to 65535 inclusive. A literal operand may 
   require between 1 and 3 bytes depending on its value. 
    
   Bytecode:                  Operand value:      Range: 
    
   0nnnnnnn                        N                   0 - 127 
   10nnnnnn nnnnnnnn               N                   0 - 16383 
   11000000 nnnnnnnn nnnnnnnn      N                   0 - 65535 
    
               Figure 7: Bytecode for a literal (#) operand 
    
   The second operand type is the reference ($), which is always used to 
   access a 2-byte value located elsewhere in the UDVM memory. The 
   bytecode for a reference operand is decoded to be a constant integer 
   from 0 to 65535 inclusive, which is interpreted as the memory address 
   containing the actual value of the operand. 
    
   Note that reference operands can always take values from 0 to 65535 
   inclusive, as they reference 2-byte values. 
    
   Bytecode:                  Operand value:      Range: 
    
   0nnnnnnn                        memory[2 * N]       0 - 65535 
   10nnnnnn nnnnnnnn               memory[2 * N]       0 - 65535 
   11000000 nnnnnnnn nnnnnnnn      memory[N]           0 - 65535 
    
              Figure 8: Bytecode for a reference ($) operand 
    
   The third kind of operand is the multitype (%), which can be used to 
   encode both actual values and memory addresses. The multitype operand 
   also offers efficient encoding for small integer values (both 
   positive and negative) and for powers of 2. 
    
   Bytecode:                  Operand value:      Range: 
    
   00nnnnnn                        N                   0 - 63 
   01nnnnnn                        memory[2 * N]       0 - 65535 
   1000011n                        2 ^ (N + 6)        64 , 128 
   10001nnn                        2 ^ (N + 8)    256 , ... , 32768 
   111nnnnn                        N + 65504       65504 - 65535 
   1001nnnn nnnnnnnn               N + 61440       61440 - 65535 
   101nnnnn nnnnnnnn               N                   0 - 8191 

 
Price, Hannu, et al.                                           [Page 22] 

INTERNET-DRAFT                  SigComp                   March 1, 2002 
 
 
   110nnnnn nnnnnnnn               memory[N]           0 - 65535 
   10000000 nnnnnnnn nnnnnnnn      N                   0 - 65535 
   10000001 nnnnnnnn nnnnnnnn      memory[N]           0 - 65535 
    
              Figure 9: Bytecode for a multitype (%) operand 
    
7.3.  Byte copying 
    
   A number of UDVM instructions require a string of bytes to be copied 
   to and from areas of the UDVM memory. This section defines how the 
   byte copying operation should be performed. 
    
   In general, the string of bytes is copied in ascending order of 
   memory address. So if a byte is copied from/to Address n then the 
   next byte is copied from/to Address n + 1. As usual, if a byte is 
   read from an address beyond the overall memory size then 
   decompression failure occurs. 
    
   Note however that if a byte is copied from/to the memory address 
   specified in byte_copy_right, the byte copy operation continues by 
   copying the next byte from/to the memory address specified in 
   byte_copy_left. This is useful for setting up a "circular buffer" 
   within the UDVM memory. 
    
   Note that the string of bytes is copied on a purely byte-by-byte 
   basis. In particular, some of the later bytes to be copied may 
   themselves have been written into the UDVM memory by the byte copying 
   operation currently being performed. 
    
   Equally, it is possible for a byte copying operation to overwrite the 
   instruction that called the byte copy. If this occurs then the byte 
   copying operation MUST be completed as if the original instruction 
   were still in place in the UDVM memory (this also applies if 
   byte_copy_left or byte_copy_right are overwritten). 
    
    
8.  Decompressing a SigComp message 
    
   This chapter lists the steps involved in the decompression of a 
   single SigComp message. 
    
8.1.  Invoking the UDVM 
    
   Whenever the dispatcher receives a message to be decompressed, it 
   invokes a new instance of the UDVM. The UDVM_memory_size is 
   initialized using the corresponding application-defined parameter. 
   The following steps are then taken: 
    
   1.)   The number of remaining CPU cycles is set equal to the 
   application-defined parameter cycles_per_message. 
    

Price, Hannu, et al.                                           [Page 23] 

INTERNET-DRAFT                  SigComp                   March 1, 2002 
 
 
   Notes: 
    
   The amount of compressed data available to the UDVM is exactly one 
   compressed message. If the transport is stream-based then SigComp 
   uses the reserved byte string 0xFFFF to delimit the compressed 
   messages: the dispatcher takes the data between a pair of neighboring 
   reserved byte strings to be a single compressed message. The reserved 
   byte string itself is not considered to be part of the compressed 
   message. 
    
   The compressed data is not provided to the UDVM by default. Instead, 
   the UDVM requests compressed data using the INPUT instructions 
   (useful when running over a stream-based transport since there is no 
   need to wait for the entire compressed message before decompression 
   can begin). 
    
   The dispatcher MUST NOT make more than one compressed message 
   available to a given instance of the UDVM. In particular, the 
   dispatcher MUST NOT concatenate two messages to form a single 
   compressed message. This is because compressed messages are typically 
   padded with trailing zero bits so that they are a whole number of 
   bytes long. Concatenating two messages would cause these padding bits 
   to be incorrectly interpreted as compressed data. 
    
   2.)   Next, the instructions contained within the UDVM memory are 
   executed beginning at the address specified by the state as per 
   Section 4.2. 
    
   Notes:    
    
   The instructions are executed consecutively unless otherwise 
   indicated (for example when the UDVM encounters a JUMP instruction). 
    
   If the next instruction to be executed lies outside the available 
   memory then decompression failure occurs (see Section 8.2). 
    
   3.)   Each time an instruction is executed the number of available 
   CPU cycles is decreased by the amount specified in Chapter 9. 
   Additionally, if the UDVM requests n bits of compressed data (using 
   one of the INPUT instructions) then the number of available CPU 
   cycles is increased by n * cycles_per_bit. 
    
   Notes: 
    
   This means that the total number of CPU cycles available for 
   processing a compressed message is given by the formula: 
    
    maximum_cycles = cycles_per_message + message_size * cycles_per_bit 
    
   The reason that this total is not allocated to the UDVM when it is 
   invoked is that the UDVM can begin to decompress a message that has 

 
Price, Hannu, et al.                                           [Page 24] 

INTERNET-DRAFT                  SigComp                   March 1, 2002 
 
 
   only been partially received. So the total message size may not be 
   known when the UDVM is initialized. 
    
   4.)   The UDVM stops executing instructions when it encounters an 
   END-MESSAGE instruction or if decompression failure occurs. 
    
   Notes: 
    
   The UDVM passes uncompressed data to the dispatcher using the OUTPUT 
   instruction. The OUTPUT instruction can be used to output a partially 
   decompressed message; it is a dispatcher decision whether to use the 
   data immediately or whether to buffer and wait until the entire 
   message has been decompressed. 
    
   The UDVM passes state creation requests to the state handler using 
   the END-MESSAGE instruction. This means that it is only possible to 
   make a state creation request once the message has been decompressed, 
   which is necessary since the application typically determines the 
   validity of these requests based on the contents of the decompressed 
   message. 
    
8.2.  Decompression failure 
    
   If a compressed message given to the UDVM is corrupted (either 
   accidentally or maliciously) then the UDVM may terminate with a 
   decompression failure. 
    
   Reasons for decompression failure include the following: 
    
   *    A compressed or uncompressed message exceeds the maximum size 
        defined by the application. 
    
   *    The UDVM exceeds the available CPU cycles for decompressing a 
        message. 
    
   *    The UDVM attempts to read a memory address beyond the overall 
        memory size. 
    
   *    An unknown instruction type is encountered. 
    
   *    An unknown operand type is encountered. 
    
   *    An instruction is encountered that cannot be processed 
        successfully by the UDVM (for example a RETURN instruction when 
        no CALL instruction has previously been encountered). 
    
   *    The UDVM attempts to access non-existent state. 
    
   *    A manual decompression failure is triggered using the 
        DECOMPRESSION-FAILURE instruction. 
    

Price, Hannu, et al.                                           [Page 25] 

INTERNET-DRAFT                  SigComp                   March 1, 2002 
 
 
   If a decompression failure occurs when decompressing a message then 
   the UDVM informs the dispatcher and takes no further action. It is 
   the responsibility of the dispatcher to decide how to cope with the 
   decompression failure. In general a dispatcher SHOULD discard the 
   compressed message and any decompressed data that has been outputted. 
    
9.  UDVM instruction set 
    
   The UDVM currently understands 30 instructions, chosen to support the 
   widest possible range of compression algorithms with the minimum 
   possible overhead. 
    
   Figure 10 lists the different instructions and the bytecode values 
   used to store the instructions at the UDVM. The cost of each 
   instruction in CPU cycles is also given: 
    
   Instruction:     Bytecode value:   Cost in CPU cycles: 
    
   DECOMPRESSION-FAILURE     0          1 
   AND                       1          1 
   OR                        2          1 
   NOT                       3          1 
   ADD                       4          1 
   SUBTRACT                  5          1 
   MULTIPLY                  6          1 
   DIVIDE                    7          1 
   SORT-ASCENDING            8          1 + k * ceiling(log2(k)) 
   SORT-DESCENDING           9          1 + k * ceiling(log2(k)) 
   MD5                       10         1 + length 
   LOAD                      11         1 
   MULTILOAD                 12         1 + n 
   COPY                      13         1 + length 
   COPY-LITERAL              14         1 + length 
   COPY-OFFSET               15         1 + length + offset 
   JUMP                      16         1 
   COMPARE                   17         1 
   CALL                      18         1 
   RETURN                    19         1 
   SWITCH                    20         1 + n 
   CRC                       21         1 + length 
   END-MESSAGE               22         1 + state length 
   OUTPUT                    23         1 + output_length 
   NBO                       24         1 
   INPUT-BYTECODE            25         1 + length 
   INPUT-FIXED               26         1 
   INPUT-HUFFMAN             27         1 + n 
   STATE-REFERENCE           28         1 + state_length 
   STATE-EXECUTE             29         1 + state length 
    
      Figure 10: UDVM instructions and corresponding bytecode values 
    

Price, Hannu, et al.                                           [Page 26] 

INTERNET-DRAFT                  SigComp                   March 1, 2002 
 
 
   Each UDVM instruction costs a minimum of 1 CPU cycle. Certain high-
   level instructions may cost additional cycles depending on the value 
   of one of the instruction operands. 
    
   The only exception when calculating the number of CPU cycles is that 
   the STATE-EXECUTE instruction takes (1 + state_length) cycles even 
   though it does not have a state_length operand; instead the value of 
   state length is provided by the state handler as part of the state 
   being accessed. 
    
   All instructions are stored as a single byte to indicate the 
   instruction type, followed by 0 or more bytes containing the operands 
   required by the instruction. The instruction specifies which of the 
   three operand types of Section 7.2 is used in each case. For example, 
   the ADD instruction is followed by two operands as shown below: 
    
   ADD ($operand_1, %operand_2) 
    
   When converted into bytecode the number of bytes required by the ADD 
   instruction depends on the size of each operand value, and whether 
   the second (multitype) operand contains the operand value itself or a 
   memory address where the actual value of the operand can be found. 
    
   The instruction set available for the UDVM offers a mix of low-level 
   and high-level instructions. The high-level instructions can all be 
   emulated using the low-level instructions provided, but given a 
   choice it is generally preferable to use a single instruction rather 
   than a large number of general-purpose instructions. The resulting 
   bytecode will be more compact (leading to a higher overall 
   compression ratio) and decompression will typically be faster because 
   the implementation of the compression-specific instructions can be 
   optimized for the UDVM. 
    
   Each instruction is explained in more detail below: 
    
9.1.  Mathematical instructions 
    
   The following instructions provide a number of mathematical 
   operations including bit manipulation, arithmetic and sorting. 
    
9.1.1.  Bit manipulation 
    
   The AND, OR and NOT instructions provide simple bit manipulation on 
   2-byte words. 
    
   AND ($operand_1, %operand_2) 
   OR ($operand_1, %operand_2) 
   NOT ($operand_1) 
    
   After the operation is complete, the value of the first operand is 
   overwritten with the result. Note that since this operand is a 

 
Price, Hannu, et al.                                           [Page 27] 

INTERNET-DRAFT                  SigComp                   March 1, 2002 
 
 
   reference, the memory address specified by the operand is always 
   overwritten and not the operand itself. 
    
9.1.2.  Arithmetic 
    
   The ADD, SUBTRACT, MULTIPLY and DIVIDE instructions perform 
   arithmetic on 2-byte words. 
    
   ADD ($operand_1, %operand_2) 
   SUBTRACT ($operand_1, %operand_2) 
   MULTIPLY ($operand_1, %operand_2) 
   DIVIDE ($operand_1, %operand_2) 
    
   After the operation is complete, the first operand is overwritten 
   with the result. 
    
   Note that in all cases the arithmetic operation is performed modulo 
   2^16. So for example, subtracting 1 from 0 gives the result 65535. 
    
   For the SUBTRACT instruction the second operand is subtracted from 
   the first. Similarly, for the DIVIDE instruction the first operand is 
   divided by the second operand. Note that if the second operand does 
   not divide exactly into the first operand then the remainder is 
   ignored. 
    
9.1.3.  Sorting 
    
   The SORT-ASCENDING and SORT-DESCENDING instructions sort lists of 2-
   byte words. 
    
   SORT-ASCENDING (%start, %n, %k) 
   SORT-DESCENDING (%start, %n, %k) 
    
   The start operand specifies the starting memory address of the block 
   of data to be sorted. 
   The block of data itself is divided into n lists each containing k 
   words. The SORT-ASCENDING instruction applies a certain permutation 
   to the lists, such that the first list is sorted into ascending order 
   (treating each data word as an integer). The same permutation is 
   applied to all n lists, so lists other than the first will not 
   necessarily be sorted into order. 
    
   For example, the first list might contain a set of integers to be 
   sorted, whilst the second list might be used to keep track of the 
   integers: 
    
      Before sorting              After sorting 
    
   List 1        List 2        List 1        List 2 
    
      8             1             1             2 

 
Price, Hannu, et al.                                           [Page 28] 

INTERNET-DRAFT                  SigComp                   March 1, 2002 
 
 
      1             2             1             3 
      1             3             3             4 
      3             4             8             1 
    
   In the case of two words of data with the same value, the original 
   ordering of the list is preserved. 
    
   The SORT-DESCENDING instruction behaves as above, except that the 
   first list is sorted into descending order. 
    
9.1.4.  MD5 
    
   The MD5 instruction calculates an MD5 hash over the specified area of 
   UDVM memory. 
    
   MD5 (%position, %length, %destination) 
    
   The position and length operands define the string of bytes over 
   which the MD5 hash is calculated. Byte copying rules are enforced as 
   per Section 7.3. 
    
   The destination operand gives the starting address to which the 
   resulting 16-byte hash will be copied. 
    
9.2.  Memory management instructions 
    
   The following instructions are used to manipulate the UDVM memory. 
   Bytes can be copied from one area of memory to another, and areas of 
   memory can be write-protected to make it easier for UDVM code to be 
   compiled. 
    
9.2.1.  LOAD 
    
   The LOAD instruction sets a 2-byte variable to a certain specified 
   value. The format of a LOAD instruction is as follows: 
    
   LOAD (%address, %value) 
    
   The first operand specifies the starting address of the 2-byte 
   variable, whilst the second operand specifies the value to be loaded 
   into this variable. As usual, MSBs are stored before LSBs in the UDVM 
   memory. 
    
9.2.2.  MULTILOAD 
    
   The MULTILOAD instruction sets a contiguous block of 2-byte variables 
   to specified values. 
    
   MULTILOAD (%address, #n, %value_0, ..., %value_n-1) 
   The first operand specifies the starting address of the contiguous 
   variables, whilst the operands value_0 through to value_n-1 specify 

 
Price, Hannu, et al.                                           [Page 29] 

INTERNET-DRAFT                  SigComp                   March 1, 2002 
 
 
   the values to load into these variables (in the same order as they 
   appear in the instruction). 
    
9.2.3.  COPY 
    
   The COPY instruction is used to copy a string of bytes from one part 
   of the UDVM memory to another. 
    
   COPY (%position, %length, %destination) 
    
   The position operand specifies the memory address of the first byte 
   in the string to be copied, and the length operand specifies the 
   number of bytes to be copied. 
    
   The destination operand gives the address to which the first byte in 
   the string will be copied. 
    
   Note that byte copying is performed as per the rules of Section 7.3. 
    
9.2.4.  COPY-LITERAL 
    
   A modified version of the COPY instruction is given below: 
    
   COPY-LITERAL (%position, %length, $destination) 
    
   The COPY-LITERAL instruction behaves as a COPY instruction except 
   that after copying, the destination operand is replaced with the 
   memory address immediately following the address to which the final 
   byte was copied. If the final byte was copied to the memory address 
   specified in byte_copy_right, the destination operand is set to the 
   memory address specified in byte_copy_left. 
    
9.2.5.  COPY-OFFSET 
    
   A further version of the COPY-LITERAL instruction is given below: 
    
   COPY-OFFSET (%offset, %length, $destination) 
    
   The COPY-OFFSET instruction behaves as a COPY-LITERAL instruction 
   except that an offset operand is given instead of a position operand. 
    
   To derive a suitable position operand, starting at the memory address 
   specified by destination, the UDVM counts backwards a total of offset 
   memory addresses. If the memory address specified in byte_copy_left 
   is reached, the next memory address is taken to be byte_copy_right. 
    
   The COPY-OFFSET instruction then behaves as a COPY-LITERAL 
   instruction, taking the position operand to be the last memory 
   address reached in the above step. 
    
    
Price, Hannu, et al.                                           [Page 30] 

INTERNET-DRAFT                  SigComp                   March 1, 2002 
 
 
9.3.  Program flow instructions 
    
   The following instructions alter the flow of UDVM code. Each 
   instruction jumps to one of a number of memory addresses based on a 
   certain specified criterion. Note that all of the instructions give 
   the memory addresses in the form of deltas relative to the memory 
   address of the instruction. The actual memory address is calculated 
   as follows: 
    
   memory_address = (memory_address_of_instruction + delta) modulo 2^16 
    
   Note that certain I/O instructions (see Section 9.4) can also alter 
   program flow. 
    
9.3.1.  JUMP 
    
   The JUMP instruction moves program execution to the specified memory 
   address. 
    
   JUMP (%delta) 
    
   Note that if the address (specified as a delta from the address of 
   the JUMP instruction) lies beyond the overall UDVM memory size then 
   decompression failure occurs. 
    
9.3.2.  COMPARE 
    
   The COMPARE instruction compares two operands and then jumps to one 
   of three specified memory addresses depending on the result. 
    
   COMPARE (%operand_1, %operand_2, %delta_1, %delta_2, %delta_3) 
    
   If operand_1 < operand_2 then the UDVM continues instruction 
   execution at the (relative) memory address specified by delta 1. If 
   operand_1 = operand_2 then it jumps to the address specified by 
   delta_2. If operand_1 > operand_2 then it jumps to the address 
   specified by delta_3. 
    
9.3.3.  CALL and RETURN 
    
   The CALL and RETURN instructions provide support for compression 
   algorithms with a nested structure. 
    
   CALL (%delta) 
    
   RETURN 
    
   The CALL and RETURN instructions make use of a stack of 2-byte 
   variables stored at the memory address specified by the well-known 
   variable stack_location. The stack contains the following variables: 
    

Price, Hannu, et al.                                           [Page 31] 

INTERNET-DRAFT                  SigComp                   March 1, 2002 
 
 
   Name:           Starting memory address: 
    
   stack_free            stack_location 
   stack[0]              stack_location + 2 
   stack[1]              stack_location + 4 
   stack[2]              stack_location + 6 
      :                       : 
    
   The MSBs of these variables are stored before the LSBs in the UDVM 
   memory. 
    
   When the UDVM reaches a CALL instruction, it finds the memory address 
   of the instruction immediately following the CALL instruction and 
   copies this 2-byte value into stack[stack_free] ready for later 
   retrieval. It then increases stack_free by 1 and continues 
   instruction execution at the (relative) memory address specified by 
   the operand. 
    
   When the UDVM reaches a RETURN instruction it decreases stack_free by 
   1, and then continues instruction execution at the byte position 
   stored in stack[stack_free]. 
    
   If the variable stack_free is ever increased beyond 65535 or 
   decreased below 0 then a bad compressed message has been received and 
   decompression failure occurs (see Section 8.2). 
    
   Decompression failure also occurs if one of the above instructions is 
   encountered and the value of stack_location is smaller than 6 (this 
   prevents the stack from overwriting the well-known variables). 
    
9.3.4.  SWITCH 
    
   The SWITCH instruction performs a conditional jump based on the value 
   of one of its operands. 
    
   SWITCH (#n, %j, %delta_0, %delta_1, ... , %delta_n-1) 
    
   When a SWITCH instruction is encountered the UDVM reads the value of 
   j. It then continues instruction execution at the (relative) address 
   specified by delta j. 
    
   If j specifies a value of n or more, a bad compressed message has 
   been received and decompression failure occurs. 
    
9.3.5.  CRC 
    
   The CRC instruction verifies a string of bytes using a 2-byte CRC. 
    
   CRC (%value, %position, %length, %delta) 
    

Price, Hannu, et al.                                           [Page 32] 

INTERNET-DRAFT                  SigComp                   March 1, 2002 
 
 
   The actual CRC calculation is performed using the generator 
   polynomial x^16 + x^12 + x^5 + 1, which coincides with the 2-byte 
   Frame Check Sequence (FCS) of [RFC-1662]. 
    
   The position and length operands define the string of bytes over 
   which the CRC is evaluated. Byte copying rules are enforced as per 
   Section 7.3. 
    
   Important note: Since a CRC calculation is always performed over a 
   bitstream, for interoperability it is necessary to define the order 
   in which bits are supplied within each individual byte. In this case 
   the MSBs of the byte MUST be supplied to the CRC calculation before 
   the LSBs. 
    
   The value operand contains the expected integer value of the 2-byte 
   CRC. If the calculated CRC matches the expected value then the UDVM 
   continues at the following instruction. Otherwise the UDVM jumps to 
   the (relative) memory address specified by delta. 
    
9.4.  I/O instructions 
    
   The following instructions allow the UDVM to interface with its 
   environment. Note that in the overall SigComp architecture all of 
   these interfaces pass to the decompressor dispatcher or to the state 
   handler. 
    
9.4.1.  END-MESSAGE 
    
   The END-MESSAGE instruction successfully terminates the UDVM and 
   passes state information to the state handler. 
    
   END-MESSAGE (%hash_length, %state_start, %state_length, 
   %state_instruction, %announcement_location) 
    
   Note that the actual uncompressed message is outputted separately 
   using the OUTPUT instruction; this conserves memory at the UDVM 
   because there is no need to buffer an entire uncompressed message 
   before it can be passed to the dispatcher. 
    
   Note that if the announcement_location operand is set to 0 then no 
   announcement information is provided, otherwise it points to the 
   starting memory address of the announcement information as per 
   Section 6.3. 
    
   The END-MESSAGE instruction requests the creation of state using the 
   operands state start and state length, which together denote a byte 
   string state_value. Provided that the application gives permission, 
   state_value is byte copied from the UDVM memory (obeying the rules of 
   Section 7.3) and stored together with a 16-byte state identifier that 
   can be used to access the state by a later compressed message. 
    

Price, Hannu, et al.                                           [Page 33] 

INTERNET-DRAFT                  SigComp                   March 1, 2002 
 
 
   To provide security against malicious access, the identifier for any 
   item of state created by the UDVM is derived from the [MD5] hash of 
   the state_value to be stored. The state identifier is constructed by 
   taking the 16-byte [MD5] hash and replacing all but the first 
   hash_length most significant bytes with zeroes. Note that if 
   hash_length is 16 then the unmodified [MD5] hash is the state 
   identifier. Decompression failure occurs if hash_length is less than 
   the application-defined parameter minimum_hash_size or greater than 
   16. 
    
   If a state identifier already exists (hash collision occurs), the 
   decompressor should check whether the requested state is identical to 
   the established state, and count the state creation request as 
   successful if this is the case. 
    
   If not then the state creation request is unsuccessful. The existing 
   state MUST NOT be replaced with the requested state to be saved. This 
   is to avoid the situation where a compressed message cannot be 
   decompressed because a needed item of state has been replaced 
   (possibly by a malicious sender). 
    
9.4.2.  DECOMPRESSION-FAILURE 
    
   The DECOMPRESSION-FAILURE instruction triggers a manual decompression 
   failure. This is useful if the UDVM program discovers that it cannot 
   successfully decompress the message (e.g. by using the CRC 
   instruction). 
    
   This instruction has no operands. 
    
9.4.3.  OUTPUT 
    
   The OUTPUT instruction provides successfully decompressed data to the 
   dispatcher. 
    
   OUTPUT (%output_start, %output_length) 
    
   The operands define the starting memory address and length of the 
   byte string to be provided to the dispatcher. Note that the OUTPUT 
   instruction can be used to output a partially decompressed message; 
   each time the instruction is encountered it appends a byte string to 
   the end of the data previously passed to the dispatcher via the 
   OUTPUT instruction. 
    
   The string of data is byte copied from the UDVM memory obeying the 
   rules of Section 7.3. 
    
   Decompression failure occurs if the cumulative number of bytes 
   provided to the dispatcher exceeds the application-defined parameter 
   maximum_uncompressed_size. 
    

Price, Hannu, et al.                                           [Page 34] 

INTERNET-DRAFT                  SigComp                   March 1, 2002 
 
 
   Since there is technically a difference between outputting a 0-byte 
   decompressed message, and not outputting a decompressed message at 
   all, the OUTPUT instruction needs to distinguish between the two 
   cases. Thus, if the UDVM terminates before encountering an OUTPUT 
   instruction it is considered not to have outputted a decompressed 
   message. If it encounters one or more OUTPUT instructions, each of 
   which provides 0 bytes of data to the dispatcher, then it is 
   considered to have outputted a 0-byte decompressed message. 
    
9.4.4.  NBO 
    
   The NBO instruction modifies the order in which compressed bits are 
   passed to the UDVM. 
    
   As the INPUT-FIXED and INPUT-HUFFMAN instructions read individual 
   bits from within a byte, to avoid ambiguity it is necessary to define 
   the order in which these bits are read. The default operation is to 
   read the MSBs before the LSBs, but if the NBO instruction is 
   encountered then the LSBs are read before the MSBs. Both cases are 
   illustrated below: 
    
    MSB         LSB MSB         LSB     MSB         LSB MSB         LSB 
    
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
   |0 1 2 3 4 5 6 7|8 9 ...        |   |7 6 5 4 3 2 1 0|        ... 9 8| 
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
    
        Byte 0          Byte 1              Byte 0          Byte 1 
    
           Default operation            After NBO instruction 
    
   The NBO instruction can only be used before bitwise compressed data 
   is passed to the UDVM. Therefore, a decompression failure occurs if 
   it is encountered after an INPUT-FIXED or an INPUT-HUFFMAN 
   instruction has been used. 
    
9.4.5.  INPUT-BYTECODE 
    
   The INPUT-BYTECODE instruction requests a certain number of bytes of 
   compressed data from the dispatcher. 
    
   INPUT-BYTECODE (%length, %destination, %delta) 
    
   The length operand indicates the requested number of bytes of 
   compressed data, and the destination operand specifies the starting 
   memory address to which they should be copied. Byte copying is 
   performed as per the rules of Section 7.3. 
    
   If the instruction requests data that lies beyond the end of the 
   compressed message, no data is returned. Instead the UDVM moves 


Price, Hannu, et al.                                           [Page 35] 

INTERNET-DRAFT                  SigComp                   March 1, 2002 
 
 
   program execution to the memory address specified by the formula 
   (memory_address_of_INPUT-BYTECODE_instruction + delta) modulo 2^16. 
    
   The INPUT-BYTECODE instruction can only be used before bitwise 
   compressed data is passed to the UDVM. Therefore, a decompression 
   failure occurs if it is encountered after an INPUT-FIXED or an INPUT-
   HUFFMAN instruction has been used. 
    
9.4.6.  INPUT-FIXED 
    
   The INPUT-FIXED instruction requests a certain number of bits of 
   compressed data from the dispatcher. 
    
   INPUT-FIXED (%length, %destination, %delta) 
    
   The length operand indicates the requested number of bits. If this 
   operand does not lie between 1 and 16 inclusive then a decompression 
   failure occurs. 
    
   The destination operand specifies the memory address to which the 
   compressed data should be copied. Note that the requested bits are 
   interpreted as a 2-byte integer ranging from 0 to 2^length - 1. Under 
   default operation the MSBs of this integer are provided first, but if 
   an NBO instruction has been executed then the LSBs are provided 
   first. 
    
   If the instruction requests data that lies beyond the end of the 
   compressed message, no data is returned. Instead the UDVM moves 
   program execution to the memory address specified by the formula 
   (memory_address_of_INPUT-FIXED_instruction + delta) modulo 2^16. 
    
9.4.7.  INPUT-HUFFMAN 
    
   The INPUT-HUFFMAN instruction requests a variable number of bits of 
   compressed data from the dispatcher. The instruction initially 
   requests a small number of bits and compares the result against a 
   certain criterion; if the criterion is not met then additional bits 
   are requested until the criterion is achieved. 
    
   The INPUT-HUFFMAN instruction is followed by three mandatory operands 
   plus n additional sets of operands. Every additional set contains 
   four operands as shown below: 
    
   INPUT-HUFFMAN (%destination, %delta, #n, %bits_1, %lower_bound_1, 
   %upper_bound_1, %uncompressed_1, ... , %bits_n, %lower_bound_n, 
   %upper_bound_n, %uncompressed_n) 
    
   Note that if n = 0 then the INPUT-HUFFMAN instruction is ignored by 
   the UDVM. If bits_1 = 0 or (bits_1 + ... + bits_n) > 16 then 
   decompression failure occurs. 
    

Price, Hannu, et al.                                           [Page 36] 

INTERNET-DRAFT                  SigComp                   March 1, 2002 
 
 
   In all other cases, the behavior of the INPUT-HUFFMAN instruction is 
   defined below: 
    
   1.)   Set j = 1. 
    
   2.)   Request an additional bits_j compressed bits. Interpret the 
   total (bits_1 + ... + bits_j) bits of compressed data requested so 
   far as an integer H, with the first bit to be supplied as the MSB and 
   the last bit to be supplied as the LSB (note that this is always the 
   case, independently of whether the NBO instruction has been used). 
    
   3.)   If data is requested that lies beyond the end of the compressed 
   message, terminate the INPUT-HUFFMAN instruction and move program 
   execution to the memory address specified by the formula 
   (memory_address_of_INPUT-HUFFMAN_instruction + delta) modulo 2^16. 
    
   4.)   If (H < lower_bound_j) or (H > upper_bound_j) then set j = j +  
   1. Then go back to Step 2, unless j > n in which case decompression 
   failure occurs. 
    
   5.)   Copy (H + uncompressed_j - lower_bound_j) modulo 2^16 to the 
   memory address specified by the destination operand. 
    
9.4.8.  STATE-REFERENCE 
    
   The STATE-REFERENCE instruction retrieves some previously stored 
   state information. 
    
   STATE-REFERENCE (%id_start, %id_length, %state_start, %state_length, 
   %state_destination) 
    
   The id_start and id_length operands specify the location of the state 
   identifier used to retrieve the state information. The state 
   identifier is always 16 bytes long; if id_length is less than 16 then 
   the remaining least significant bytes of the identifier are padded 
   with zeroes. 
    
   Decompression failure occurs if id_length is greater than 16. 
   Decompression failure also occurs if no state information matching 
   the state identifier can be found. 
    
   Note that when accessing state information that has been previously 
   created by the UDVM, the state identifier is always taken from an 
   [MD5] hash of the state to be retrieved. However this is not 
   necessarily the case for application-defined state as per Section 
   3.2. 
    
   The state_start and state_length operands define the starting byte 
   and number of bytes to copy from the state_value contained in the 
   identified item of state. If more state is requested than is actually 
   available then decompression failure occurs. 

 
Price, Hannu, et al.                                           [Page 37] 

INTERNET-DRAFT                  SigComp                   March 1, 2002 
 
 
   The state_destination operand contains a UDVM memory address. The 
   requested state is byte copied to this memory address using the rules 
   of Section 7.3. 
    
9.4.9.  STATE-EXECUTE 
    
   The STATE-EXECUTE instruction retrieves and runs some previously 
   stored state information. 
    
   STATE-EXECUTE (%id_start, %id_length) 
    
   The id_start and id_length operands function as per the STATE-
   REFERENCE instruction. 
    
   STATE-EXECUTE is similar to STATE-REQUEST except that it does not 
   require the amount of state being requested or the proposed 
   destination for the state to be specified explicitly. Instead, it 
   simply puts the state_value back into the UDVM memory using the 
   operands state_start and state_length contained as part of the state 
   information. 
    
   The entire state_value (all state length bytes of it) is byte copied 
   into the memory address specified by state start. The UDVM then jumps 
   to the (absolute) memory address specified by state_instruction. 
    
   Note that state start, state length and state_instruction are all 
   stored together with state_value as part of an item of state 
   information. 
    
    
10. Security considerations 
    
10.1.  Security goals 
    
   The overall security goal of the SigComp architecture is to not 
   create risks that are in addition to those already present in the 
   application protocols. There is no intention for SigComp to enhance 
   the security of the protocols, as it always can be circumvented by 
   not using compression. More specifically, the high-level security 
   goals can be described as: 
    
   -- do not worsen security of existing application protocol 
    
   -- do not create any new security issues 
    
   -- do not hinder deployment of application security 
    
    
Price, Hannu, et al.                                           [Page 38] 

INTERNET-DRAFT                  SigComp                   March 1, 2002 
 
 
10.2.  Security risks and mitigations 
    
   This subsection identifies the potential security risks associated 
   with the overall SigComp architecture, and details the proposed 
   solution for each risk. 
    
    
   ** Confidentiality risks 
    
   *** Attacking SigComp by snooping into state of other users 
    
   State can only be accessed using a state identifier, which is a 
   (prefix of a) cryptographic hash of the state being referenced. This 
   implies that the referencing packet already needs knowledge about the 
   state. To enforce this, a reference length of 72 bits is defined. 
   This also minimizes the probability of an accidental state collision. 
    
   Generally, ways to obtain knowledge about the state identifier (e.g., 
   passive attacks) will also easily provide knowledge about the state 
   referenced, so no new vulnerability results. 
    
   The application needs to handle state identifiers with the same care 
   it would handle the state itself. 
    
   ** Integrity risks 
    
   The SigComp approach assumes that there is appropriate integrity 
   protection below and/or above the SigComp layer. However, the state 
   establishment mechanism provides additional potential to compromise 
   the integrity of the messages (which, however, would most likely be 
   detectable at the application layer). 
    
   *** Attacking SigComp by faking state or making unauthorized changes 
   to state 
    
   State cannot be destroyed or changed by a malicious sender -- it can 
   only add new state. Faking state is only possible if the hash allows 
   intentional collision. 
    
   ** Availability risks (avoid DoS vulnerabilities) 
    
   *** Use of SigComp as a tool in a DoS attack to another target 
    
   SigComp cannot easily be used as an amplifier in a reflection attack, 
   as it only generates one decompressed message per incoming compressed 
   message. This packet is then handed to the application; the utility 
   as a reflection amplifier is therefore limited by the utility of the 
   application. 
    
   However, it must be noted that SigComp can be used to generate larger 
   packets as input to the application than have to be sent from the 

 
Price, Hannu, et al.                                           [Page 39] 

INTERNET-DRAFT                  SigComp                   March 1, 2002 
 
 
   malicious sender; this therefore can send smaller packets (at a lower 
   bandwidth) than are delivered to the application. Depending on the 
   reflection characteristics of the application, this can be considered 
   a mild form of amplification. The application MUST limit the number 
   of packets reflected to a potential target -- even if SigComp is used 
   to generate a large amount of information from a small incoming 
   attack packet. 
   *** Attacking SigComp as the DoS target by filling it with state 
    
   Excessive state can only be installed by a malicious sender (or a set 
   of malicious senders) with the consent of the application. The system 
   consisting of SigComp and application is thus approximately as 
   vulnerable as the application itself, unless it allows the 
   installation of state from a message where it would not have 
   installed state itself. 
    
   If this is desirable to increase the compression ratio, the effect 
   can be mitigated by adding feedback at the application level that 
   indicates whether the state requested was actually installed -- This 
   allows a system under attack to gracefully degrade by no longer 
   installing compressor state that is not matched by application state. 
    
   *** Attacking the UDVM by faking state or making unauthorized changes 
   to state 
    
    (See "Integrity risks" above.) 
    
   *** Attacking the UDVM by sending it looping code 
    
   The application sets an upper limit to the number of "CPU cycles" 
   that can be used per compressed message and per input bit in the 
   compressed message. The damage inflicted by sending packets with 
   looping code is therefore limited, although this may still be 
   substantial if a large number of CPU cycles are offered by the UDVM. 
   However, this would be true for any decompressor that can receive 
   packets from anywhere. 
    
    
11. IANA considerations 
    
   The SigComp solution currently requires two identifiers to be 
   assigned by IANA: the UDVM_version and the state identifier. 
    
   Upgraded versions of the UDVM will contain additional instructions to 
   improve the performance of the overall SigComp solution; new 
   UDVM_version parameters will be needed in this case. 
    
    
12. Acknowledgements 
    
   Thanks to  

 
Price, Hannu, et al.                                           [Page 40] 

INTERNET-DRAFT                  SigComp                   March 1, 2002 
 
 
            Abigail Surtees (abigail.surtees@roke.co.uk)  
            Mark A West (mark.a.west@roke.co.uk)  
            Lawrence Conroy (lwc@roke.co.uk)  
            Christian Schmidt (christian.schmidt@icn.siemens.de)  
            Max Riegel (maximilian.riegel@icn.siemens.de)  
            Lars-Erik Jonsson (lars-erik.jonsson@epl.ericsson.se) 
            Stefan Forsgren (stefan.forsgren@epl.ericsson.se)  
            Krister Svanbro (krister.svanbro@epl.ericsson.se)  
            Miguel Garcia (miguel.a.garcia@ericsson.com) 
            Christopher Clanton (christopher.clanton@nokia.com)  
            Khiem Le (khiem.le@nokia.com)  
            Ka Cheong Leung (kacheong.leung@nokia.com)  
    
   for valuable input and review. 
    
    
13. Authors' addresses 
    
   Richard Price         Tel: +44 1794 833681 
   Email:                richard.price@roke.co.uk 
    
   Roke Manor Research Ltd 
   Romsey, Hants, SO51 0ZN 
   United Kingdom 
    
    
   Hans Hannu            Tel: +46 920 20 21 84 
   Email:                hans.hannu@epl.ericsson.se 
    
   Box 920 
   Ericsson Erisoft AB 
   SE-971 28 Lulea, Sweden 
    
    
   Carsten Bormann       Tel: +49 421 218 7024 
   Email:                cabo@tzi.org 
    
   Universitaet Bremen TZI 
   Postfach 330440 
   D-28334 Bremen, Germany 
    
    
   Jan Christoffersson   Tel: +46 920 20 28 40 
   Email:                jan.christoffersson@epl.ericsson.se 
    
   Box 920 
   Ericsson Erisoft AB 
   SE-971 28 Lulea, Sweden 
    
    
Price, Hannu, et al.                                           [Page 41] 

INTERNET-DRAFT                  SigComp                   March 1, 2002 
 
 
   Zhigang Liu           Tel: +1 972 894-5935 
   Email:                zhigang.liu@nokia.com 
    
   Nokia Research Center 
   6000 Connection Drive 
   Irving, TX 75039 
   USA 
    
    
   Jonathan Rosenberg 
   Email:                jdrosen@dynamicsoft.com 
    
   dynamicsoft 
   72 Eagle Rock Avenue 
   First Floor 
   East Hanover, NJ 07936 
 
14. References 
 
   [SIP]       "SIP: Session Initiation Protocol", Handley et al,  
               RFC 2543, Internet Engineering Task Force, March 1999 
    
   [RTSP]      "Real Time Streaming Protocol (RTSP)", H. Schulzrinne, A.  
               Rao and R. Lanphier, , RFC 2326, April 1998 
    
   [HTTP]      "HyperText Transfer Protocol, HTTP/1.1", R. Fielding et  
               al.", RFC 2616, June 1999 
    
   [SIPsrv]    "SIP: Locating SIP Servers", J. Rosenberg, H.  
               Schulzrinne, draft-ietf-sip-srv-04.txt, January 2002,  
               work in progress 
    
   [DEFLATE]   "DEFLATE Compressed Data Format Specification version  
               1.3", P. Deutsch, RFC 1951, Internet Engineering Task  
               Force, May 1996 
    
   [SCTP]      "Stream Control Transmission Protocol", Stewart et al,  
               RFC 2960, Internet Engineering Task Force, October 2000 
    
   [MD5]       "The MD5 Message-Digest Algorithm", R. Rivest, RFC 1321,  
               Internet Engineering Task Force, April 1992 
    
   [RFC-1662]  "PPP in HDLC-like Framing", Simpson et al, Internet  
               Engineering Task Force, July 1994 
    
   [RFC-2026]  "The Internet Standards Process - Revision 3", Scott 
               Bradner, Internet Engineering Task Force, October 1996 
    
   [RFC-2119]  "Key words for use in RFCs to Indicate Requirement 
               Levels", Scott Bradner, Internet Engineering Task Force, 
               March 1997 

 
Price, Hannu, et al.                                           [Page 42] 

INTERNET-DRAFT                  SigComp                   March 1, 2002 
 
 
Appendix A. Document history 
 
   - October 19, 2001, version 00 
    
   First version. The draft describes the current ideas, from people    
   involved in the ROHC WG, of how to perform compression of  
   application signaling messages. 
    
   - October 31, 2001, version 01 
    
   Second version. Additional section, 5.2.1, which describes when a  
   message identifier can be reused. 
    
   - November 21, 2001, version 02 
    
   Third version. Section 6 has been moved to a separate draft. The  
   third version describes a modular solution, providing flexibility  
   for implementers to decide which functions they want to integrate. 
    
   - January 28, 2002, version 03  
       
   Fourth version. SigComp version 02 is divided into this draft, a UDVM 
   draft and an extended operation mechanisms draft. 
   Compressor/decompressor (UDVM) state approach has been introduced for 
   security reasons.  
    
   - February 14, 2002, version 04 
    
   Fifth version. Describes the complete base SigComp solution including 
   the UDVM. 
    
   - March 1, 2002, version 05 
    
   Sixth version. Comments from several authors and contributors have 
   been taken into account. Announcement mechanism has been updated. 
    
    
   This Internet-Draft expires in September 2002. 

 
Price, Hannu, et al.                                           [Page 43]