Network News Transfer Protocol (NNTP) Extension for Compression

Carnegie Mellon University

5000 Forbes Avenue, Pittsburgh, PA 15213, US
+1 412 268 1982
murch@andrew.cmu.edu

10 allée Clovis, 93160 Noisy-le-Grand, France
julien@trigofacile.com
http://www.trigofacile.com/

Area: Applications
Workgroup: Independent Submission
Keywords: NNTP, Usenet, NetNews, COMPRESS, DEFLATE, compression

This document defines an extension to the Network News Transfer Protocol (NNTP) that allows a connection to be effectively and efficiently compressed between an NNTP client and server.

The goal of COMPRESS is to reduce the bandwidth usage of NNTP. Compared to PPP compression and modem-based compression ( and ), COMPRESS offers greater compression efficiency. COMPRESS can be used together with Transport Layer Security (TLS) , Simple Authentication and Security Layer (SASL) encryption , Virtual Private Networks (VPNs), etc.

The point of COMPRESS as an NNTP extension is to behave as a transport layer, similar to STARTTLS . Compression can therefore benefit all NNTP commands sent or received after the use of COMPRESS. This facility responds to a long-standing need for NNTP to compress data, which has been partially addressed by unstandardized commands like XZVER, XZHDR, XFEATURE COMPRESS, or MODE COMPRESS. These commands are not wholly satisfactory because they enable compression only for the responses sent by the news server. In contrast, the COMPRESS command compresses data sent by both the client and the server, and removes the need to implement compression separately in each NNTP command. Moreover, the compression level can be dynamically adjusted and optimized at any time during the connection, which even makes it possible to disable compression for certain commands if need be. If the news client wants to stop compression on a particular connection, it can simply use QUIT ( Section 5.4) and establish a new connection. For these reasons, using NNTP commands other than COMPRESS to enable compression is discouraged once COMPRESS is supported.

In order to increase interoperability, it is desirable to have as few different compression algorithms as possible, so this document specifies only one. The DEFLATE algorithm (defined in ) MUST be implemented as part of this extension. This compression algorithm is standard, widely available, and fairly efficient.

This specification should be read in conjunction with the NNTP base specification . In the case of a conflict between these two documents, takes precedence.

Though data compression is made possible via the use of TLS with NNTP , the best current practice is to disable TLS-level compression as explained in Section 3.3 of . The COMPRESS command makes it possible to keep a compression facility in NNTP and to control when it is available during a connection. Compared to TLS-level compression , NNTP COMPRESS has the following advantages: it can be implemented easily by both NNTP servers and clients; it benefits from an intimate knowledge of the NNTP protocol's state machine, allowing for dynamic and aggressive optimization of the underlying compression algorithm's parameters; and it can be activated after authentication has completed, thus reducing the chances that authentication credentials can be leaked via, for instance, a CRIME attack ( Section 2.6).

The notational conventions used in this document are the same as those in , and any term not defined in this document has the same meaning as it does in that one. The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in . In the examples, commands from the client are indicated with [C], and responses from the server are indicated with [S]. The client is the initiator of the NNTP connection; the server is the other endpoint.

Please write the first letter of "Elie" and the penultimate letter of "allee" with an acute accent wherever possible -- they are respectively U+00C9 ("É" in XML) and U+00E9 ("é" in XML).

The COMPRESS extension is used to enable data compression on an NNTP connection. This extension provides a new COMPRESS command and has capability label COMPRESS.

A server supporting the COMPRESS command as defined in this document will advertise the "COMPRESS" capability label in response to the CAPABILITIES command ( Section 5.2). However, this capability MUST NOT be advertised once a compression layer is active (see Section 2.2.2). This capability MAY be advertised both before and after any use of the MODE READER command ( Section 5.3), with the same semantics.

The COMPRESS capability label contains a whitespace-separated list of available compression algorithms. This document defines one compression algorithm: DEFLATE. This algorithm is mandatory to implement and MUST be supported in order to advertise the COMPRESS extension. Future extensions may add additional compression algorithms to this capability. Unrecognized algorithms MUST be ignored by the client.

Because the COMPRESS command is related to security (it can weaken encryption), cached results of CAPABILITIES from a previous session MUST NOT be relied on, as per Section 12.6 of .

Example:

[C] CAPABILITIES
[S] 101 Capability list:
[S] VERSION 2
[S] READER
[S] IHAVE
[S] COMPRESS DEFLATE X-SHRINK
[S] LIST ACTIVE NEWSGROUPS
[S] .
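A client might pick an algorithm from a fresh CAPABILITIES response along the following lines. This Python sketch is illustrative only: `choose_algorithm` and the preference tuple `CLIENT_ALGORITHMS` are hypothetical names. It assumes the response has already been split into lines, without the 101 line and the terminating ".".

```python
from typing import Iterable, Optional

# Algorithms this hypothetical client implements, in order of preference.
CLIENT_ALGORITHMS = ("DEFLATE",)


def choose_algorithm(capability_lines: Iterable[str]) -> Optional[str]:
    """Return the algorithm to request with COMPRESS, or None.

    capability_lines must come from a CAPABILITIES response of the
    current session; cached results from a previous session are not used.
    """
    offered = []
    for line in capability_lines:
        tokens = line.split()
        if tokens and tokens[0] == "COMPRESS":
            offered = tokens[1:]  # whitespace-separated algorithm list
            break
    # Unrecognized algorithms (e.g. X-SHRINK) are simply skipped.
    for algorithm in CLIENT_ALGORITHMS:
        if algorithm in offered:
            return algorithm
    return None
```

Applied to the capability lines in the example above, it returns "DEFLATE" and silently ignores X-SHRINK.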

This command MUST NOT be pipelined.

Syntax
COMPRESS algorithm

Responses
206 Compression active
403 Unable to activate compression
502 Command unavailable [1]

[1] If a compression layer is already active, COMPRESS is not a valid command (see Section 2.2.2).

Parameters
algorithm = Name of compression algorithm (e.g., "DEFLATE")

The COMPRESS command instructs the server to use the named compression algorithm ("DEFLATE" is the only one defined in this document) for all commands and/or responses after COMPRESS. The client MUST NOT send any further commands until it has seen the result of COMPRESS.

If the requested compression algorithm is syntactically incorrect, the server MUST reject the COMPRESS command with a 501 response code ( Section 3.2.1). If the requested compression algorithm is invalid (e.g., is not supported), the server MUST reject the COMPRESS command with a 503 response code ( Section 3.2.1). If the server is unable to activate compression for any reason (e.g., a server configuration or resource problem), the server MUST reject the COMPRESS command with a 403 response code ( Section 3.2.1). Otherwise, the server issues a 206 response code, and the compression layer takes effect for both client and server immediately following the CRLF of the success reply.

Additionally, the client MUST NOT issue a MODE READER command after activating a compression layer, and a server MUST NOT advertise the MODE-READER capability once a compression layer is active.

Both the client and the server MUST know whether a compression layer is active (for instance, via the previous use of the COMPRESS command or the negotiation of TLS-level compression ). A client MUST NOT attempt to activate compression or negotiate a TLS layer (for instance, via the COMPRESS or STARTTLS commands) if a compression layer is already active. A server MUST NOT return the COMPRESS or STARTTLS capability labels in response to a CAPABILITIES command received after a compression layer is active, and a server MUST reply with a 502 response code if a syntactically valid COMPRESS or STARTTLS command is received while a compression layer is already active.

In order to help mitigate the leaking of authentication credentials via, for instance, a CRIME attack , authentication SHOULD NOT be attempted when a compression layer is active. Consequently, a server SHOULD NOT return any arguments with the AUTHINFO capability label (or SHOULD NOT advertise it at all) in response to a CAPABILITIES command received from an unauthenticated client after a compression layer is active, and such a client SHOULD NOT attempt to use any AUTHINFO command. Accordingly, a server SHOULD reply with a 502 response code if a syntactically valid AUTHINFO command is received while a compression layer is already active.

For DEFLATE (as for many other compression mechanisms), the compressor can trade speed against quality. The decompressor MUST automatically adjust to the parameters selected by the sender. Consequently, the client and server are both free to pick the best reasonable rate of compression for the data they send.

When COMPRESS is combined with TLS or SASL security layers, the processing order of the three layers MUST be first COMPRESS, then SASL, and finally TLS. That is, before data is transmitted, it is first compressed. Second, if a SASL security layer has been negotiated, the compressed data is then signed and/or encrypted accordingly. Third, if a TLS security layer has been negotiated, the data from the previous step is signed and/or encrypted accordingly. When receiving data, the processing order MUST be reversed. This ensures that, before sending, data is compressed before it is encrypted, independent of the order in which the client issues the COMPRESS, AUTHINFO SASL , and STARTTLS commands.

When compression is active and either the client or the server receives invalid or corrupted compressed data, the receiving end SHOULD immediately close the connection. (The sending end will then, in turn, do the same.)
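The layering order can be illustrated with a small Python sketch built on the standard zlib module. It assumes a raw DEFLATE stream (negative wbits, in line with the zlib windowBits guidance given later in this document) flushed with Z_PARTIAL_FLUSH after each unit of data. `XorLayer` is a hypothetical placeholder for a SASL or TLS security layer and provides no security whatsoever. `CorruptStreamError` is an illustrative name for the condition on which the receiving end would close the connection.

```python
import zlib


class CorruptStreamError(Exception):
    """Invalid compressed input: the receiver should close the connection."""


class DeflateLayer:
    """One endpoint's COMPRESS DEFLATE layer (raw DEFLATE, no zlib header)."""

    def __init__(self, level: int = 1) -> None:
        # wbits=-15 selects a raw DEFLATE stream with a 32 KiB window.
        self._compressor = zlib.compressobj(level=level, wbits=-15)
        self._decompressor = zlib.decompressobj(wbits=-15)

    def encode(self, data: bytes) -> bytes:
        # Z_PARTIAL_FLUSH makes all data sent so far decodable by the peer
        # while keeping the dictionary for later data.
        return self._compressor.compress(data) + self._compressor.flush(zlib.Z_PARTIAL_FLUSH)

    def decode(self, data: bytes) -> bytes:
        try:
            return self._decompressor.decompress(data)
        except zlib.error as exc:
            raise CorruptStreamError(str(exc)) from exc


class XorLayer:
    """Hypothetical stand-in for a SASL or TLS layer. NOT encryption."""

    def __init__(self, key: int) -> None:
        self._key = key

    def encode(self, data: bytes) -> bytes:
        return bytes(b ^ self._key for b in data)

    def decode(self, data: bytes) -> bytes:
        return bytes(b ^ self._key for b in data)


def send(layers, data: bytes) -> bytes:
    # layers is ordered [COMPRESS, SASL, TLS]: compress first, TLS last.
    for layer in layers:
        data = layer.encode(data)
    return data


def receive(layers, data: bytes) -> bytes:
    # Reverse order on input: TLS, then SASL, then decompression.
    for layer in reversed(layers):
        data = layer.decode(data)
    return data
```

Each endpoint keeps its own layer objects, because the compressor and decompressor states are per direction. If `decode` raises `CorruptStreamError`, the receiving end would close the connection as recommended above.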

Example of layering TLS and NNTP compression:

[C] CAPABILITIES
[S] 101 Capability list:
[S] VERSION 2
[S] READER
[S] STARTTLS
[S] AUTHINFO
[S] COMPRESS DEFLATE
[S] LIST ACTIVE NEWSGROUPS
[S] .
[C] STARTTLS
[S] 382 Continue with TLS negotiation
[TLS negotiation without compression occurs here]
[Following successful negotiation, all traffic is encrypted]
[C] CAPABILITIES
[S] 101 Capability list:
[S] VERSION 2
[S] READER
[S] AUTHINFO USER
[S] COMPRESS DEFLATE
[S] LIST ACTIVE NEWSGROUPS
[S] .
[C] AUTHINFO USER fred
[S] 381 Enter passphrase
[C] AUTHINFO PASS flintstone
[S] 281 Authentication accepted
[C] CAPABILITIES
[S] 101 Capability list:
[S] VERSION 2
[S] READER
[S] POST
[S] COMPRESS DEFLATE
[S] LIST ACTIVE NEWSGROUPS
[S] .
[C] COMPRESS DEFLATE
[S] 206 Compression active
[Henceforth, all traffic is compressed before being encrypted]

Example of a server failing to activate compression:

[C] CAPABILITIES
[S] 101 Capability list:
[S] VERSION 2
[S] IHAVE
[S] COMPRESS DEFLATE
[S] .
[C] COMPRESS DEFLATE
[S] 403 Unable to activate compression

Example of attempting to use an unsupported compression algorithm:

[C] CAPABILITIES
[S] 101 Capability list:
[S] VERSION 2
[S] IHAVE
[S] COMPRESS DEFLATE
[S] .
[C] COMPRESS X-SHRINK
[S] 503 Compression algorithm not supported

Examples of a server refusing to compress twice:

[C] CAPABILITIES
[S] 101 Capability list:
[S] VERSION 2
[S] IHAVE
[S] STARTTLS
[S] COMPRESS DEFLATE
[S] .
[C] STARTTLS
[S] 382 Continue with TLS negotiation
[TLS negotiation with compression occurs here]
[Following successful negotiation, all traffic is protected by TLS]
[C] CAPABILITIES
[S] 101 Capability list:
[S] VERSION 2
[S] IHAVE
[S] .
[C] COMPRESS DEFLATE
[S] 502 Compression already active via TLS

[C] CAPABILITIES
[S] 101 Capability list:
[S] VERSION 2
[S] IHAVE
[S] STARTTLS
[S] COMPRESS DEFLATE
[S] .
[C] COMPRESS DEFLATE
[S] 206 Compression active
[Henceforth, all traffic is compressed]
[C] CAPABILITIES
[S] 101 Capability list:
[S] VERSION 2
[S] IHAVE
[S] .
[C] STARTTLS
[S] 502 DEFLATE compression already active

Example of a server not advertising AUTHINFO arguments after compression has been activated:

[C] CAPABILITIES
[S] 101 Capability list:
[S] VERSION 2
[S] READER
[S] AUTHINFO USER
[S] COMPRESS DEFLATE
[S] LIST ACTIVE NEWSGROUPS
[S] .
[C] COMPRESS DEFLATE
[S] 206 Compression active
[Henceforth, all traffic is compressed]
[C] CAPABILITIES
[S] 101 Capability list:
[S] VERSION 2
[S] READER
[S] AUTHINFO
[S] LIST ACTIVE NEWSGROUPS
[S] .
[C] AUTHINFO USER fred
[S] 502 DEFLATE compression already active
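The response codes illustrated in the examples above can be summarized as server-side decision logic. This Python sketch is illustrative only. `Session`, `handle_compress`, `capabilities`, and the `compression_possible` flag (standing in for a configuration or resource problem) are hypothetical names. The order of checks (syntax, then an already active layer, then algorithm support, then resources) is one reading of the rules in this document. Once compression is active, the sketch drops the COMPRESS, STARTTLS, and MODE-READER labels and strips the AUTHINFO arguments, matching the last example.

```python
import re
from dataclasses import dataclass
from typing import List

# Mirrors the ABNF "algorithm" rule; "DEFLATE" is a case-insensitive literal.
ALGORITHM_SYNTAX = re.compile(r"(?i:DEFLATE)|[A-Z0-9_-]{1,20}")
SUPPORTED_ALGORITHMS = {"DEFLATE"}


@dataclass
class Session:
    # True after a successful COMPRESS or TLS-level compression.
    compression_active: bool = False
    # Hypothetical switch modelling a configuration or resource problem.
    compression_possible: bool = True


def handle_compress(session: Session, algorithm: str) -> str:
    """Return the response line for "COMPRESS <algorithm>"."""
    if not ALGORITHM_SYNTAX.fullmatch(algorithm):
        return "501 Syntax error"
    if session.compression_active:
        return "502 Compression already active"
    if algorithm.upper() not in SUPPORTED_ALGORITHMS:
        return "503 Compression algorithm not supported"
    if not session.compression_possible:
        return "403 Unable to activate compression"
    # The layer takes effect right after the CRLF of this reply.
    session.compression_active = True
    return "206 Compression active"


def capabilities(session: Session, base: List[str]) -> List[str]:
    """Filter a base capability list according to the COMPRESS rules."""
    if not session.compression_active:
        return list(base)
    result = []
    for line in base:
        label = line.split()[0]
        if label in ("COMPRESS", "STARTTLS", "MODE-READER"):
            continue  # MUST NOT be advertised once compression is active
        if label == "AUTHINFO":
            # Sketch choice: keep the label without arguments
            # (not advertising it at all is also allowed).
            result.append("AUTHINFO")
            continue
        result.append(line)
    return result
```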

This section is informative, not normative.

NNTP poses some unusual problems for a compression layer.

Upstream traffic is fairly simple. Most NNTP clients send the same few commands again and again, so any compression algorithm that can exploit repetition works efficiently. The article posting and transfer commands (e.g., POST, IHAVE, and TAKETHIS ) are exceptions; clients that send many article posting or transfer commands may want to surround large multi-line data blocks with a dictionary flush and/or, depending on the compression algorithm, a change of compression level, in the same way as is recommended for servers later in this document.

Downstream traffic has the unusual property that several kinds of data are sent, possibly confusing a dictionary-based compression algorithm.

One type is NNTP simple responses and NNTP multi-line responses not related to article header/body retrieval (e.g., CAPABILITIES, GROUP, LISTGROUP, LAST, NEXT, STAT, DATE, NEWNEWS, NEWGROUPS, LIST, CHECK , etc.). These are highly compressible; zlib using its least CPU-intensive setting compresses typical responses to 25-40% of their original size.

Another type is article headers (as retrieved via the HEAD, HDR, OVER, or ARTICLE commands). These are equally compressible and benefit from using the same dictionary as the NNTP responses.

A third type is article body text (as retrieved via the BODY or ARTICLE commands). Text is usually fairly short and includes much ASCII, so the same compression dictionary will do a good job here, too. When multiple messages in the same thread are read at the same time, quoted lines, etc. can often be compressed almost to zero.

Finally, non-text article bodies or attachments (as retrieved via the BODY and ARTICLE commands) are transmitted in encoded form, usually Base64 , UUencode , or yEnc . When already compressed articles or attachments are retrieved, a compression algorithm may be able to compress them, but the format of their encoding is usually not NNTP-like, so the dictionary built while compressing NNTP does not help much. The compressor has to adapt its dictionary from NNTP to the attachment's encoding format, and then back. When attachments are retrieved in Base64 or UUencode form, the Huffman coding usually compresses them only to approximately 75% of their encoded size. 8-bit compression algorithms such as DEFLATE work well on 8-bit file formats; however, both Base64 and UUencode transform a file into something resembling 6-bit bytes, hiding most of the 8-bit file format from the compressor. On the other hand, attachments encoded using an encoding that retains the full 8-bit spectrum, like yEnc, are much more likely to be incompressible.
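The Base64 figure can be reproduced approximately with Python's zlib module. In this sketch, pseudo-random bytes stand in for an already compressed attachment (an assumption; real attachments vary), and the raw 8-bit bytes approximate what an encoding such as yEnc preserves. Exact ratios depend on the data and on the compression level.

```python
import base64
import random
import zlib


def deflate_ratio(data: bytes, level: int = 6) -> float:
    """Size of the raw-DEFLATE output relative to the input size."""
    compressor = zlib.compressobj(level=level, wbits=-15)
    compressed = compressor.compress(data) + compressor.flush(zlib.Z_FINISH)
    return len(compressed) / len(data)


# Pseudo-random bytes stand in for an already compressed attachment.
rng = random.Random(8054)
binary = bytes(rng.getrandbits(8) for _ in range(60000))

# Base64 (here with MIME-style line breaks) carries only about 6 bits of
# information per 8-bit character, which Huffman coding can recover.
b64_ratio = deflate_ratio(base64.encodebytes(binary))

# The raw 8-bit form, roughly what an 8-bit encoding such as yEnc keeps.
raw_ratio = deflate_ratio(binary)

print(f"Base64 form: {b64_ratio:.0%} of its encoded size")
print(f"8-bit form:  {raw_ratio:.0%} of its original size")
```

The Base64 form typically shrinks to roughly three quarters of its size, while the 8-bit form does not shrink at all.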

When using the zlib library (see ), the functions deflateInit2(), deflate(), inflateInit2(), and inflate() suffice to implement this extension. The windowBits value must be in the range -8 to -15 for deflateInit2(), or else it will use the wrong format. The windowBits value should be -15 for inflateInit2(), or else it will not be able to decompress a stream with a larger window size. deflateParams() can be used to improve compression rate and resource use.

In order to improve compression efficiency, the Z_PARTIAL_FLUSH argument to deflate() should always be used to flush data. As far as DEFLATE is concerned, clearing the dictionary never improves compression over the other flushes. On the contrary, having the 32 kB dictionary from previous data, no matter how unrelated, can only help: if there are no matching strings in it, it is simply not referenced. Using Z_FULL_FLUSH clears the dictionary and consequently always results in less effective compression than Z_PARTIAL_FLUSH.

A server can improve downstream compression, as well as the CPU efficiency of both the server and the client, if it adjusts the compression level (e.g., using the deflateParams() function in zlib) at the start and end of large non-text multi-line data blocks (before and after 'content-lines' in the definition of 'multi-line-data-block' in Section 9.8). This avoids trying to compress incompressible attachments. Small multi-line data blocks are best left alone; a possible boundary is 5 kB. A very simple strategy is to change the compression level to 0 at the start of a multi-line data block provided the first two bytes are either 0x1F 0x8B (as in gzip files, whose content is deflate-compressed) or 0xFF 0xD8 (JPEG), and to keep it at 1-5 the rest of the time. More complex strategies are of course possible, and encouraged.
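Python's standard zlib module wraps the same library, so the flushing advice can be checked with a short sketch. It assumes raw DEFLATE with wbits=-15 and compresses twenty identical GROUP responses (made-up sample data), flushing after each one. With Z_PARTIAL_FLUSH, later responses become back-references into the retained dictionary; Z_FULL_FLUSH resets the dictionary every time. Python's Compress objects do not expose deflateParams(). The hypothetical helper `choose_level`, which implements the very simple strategy above, therefore only returns the level that a C implementation would pass to deflateParams().

```python
import zlib


def compress_responses(responses, flush_mode, level=1):
    """Compress each response as one unit, flushing after each (raw DEFLATE)."""
    compressor = zlib.compressobj(level=level, wbits=-15)
    return [compressor.compress(r) + compressor.flush(flush_mode) for r in responses]


def choose_level(first_bytes: bytes, block_size: int,
                 normal_level: int = 1, threshold: int = 5000) -> int:
    """Hypothetical helper: level to use for a multi-line data block."""
    if block_size < threshold:
        return normal_level  # small blocks are best left alone
    if first_bytes[:2] in (b"\x1f\x8b", b"\xff\xd8"):  # gzip or JPEG
        return 0  # do not try to compress incompressible data
    return normal_level


responses = [b"211 1234 3000234 3002322 misc.test\r\n"] * 20
partial = compress_responses(responses, zlib.Z_PARTIAL_FLUSH)
full = compress_responses(responses, zlib.Z_FULL_FLUSH)

# The receiver inflates with windowBits -15 (raw stream, 32 KiB window).
inflater = zlib.decompressobj(wbits=-15)
restored = b"".join(inflater.decompress(chunk) for chunk in partial)

print(sum(map(len, partial)), "bytes with Z_PARTIAL_FLUSH")
print(sum(map(len, full)), "bytes with Z_FULL_FLUSH")
```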

This section describes the formal syntax of the COMPRESS extension using ABNF . It extends the syntax in Section 9 of , and non-terminals not defined in this document are defined there. The ABNF should be imported first, before attempting to validate these rules.

This syntax extends the non-terminal <command>, which represents an NNTP command.

command =/ compress-command

compress-command = "COMPRESS" WS algorithm

This syntax extends the non-terminal <capability-entry>, which represents a capability that may be advertised by the server.

capability-entry =/ compress-capability

compress-capability = "COMPRESS" 1*(WS algorithm)

algorithm = "DEFLATE" / 1*20alg-char
alg-char = UPPER / DIGIT / "-" / "_"
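For implementers who validate input with regular expressions rather than an ABNF tool, the rules above might translate to Python as follows. This is a sketch under stated assumptions. It takes WS to be one or more spaces or tabs, as in the NNTP base specification. It follows the ABNF convention that quoted strings such as "COMPRESS" and "DEFLATE" are case-insensitive, while UPPER matches only A-Z. `parse_compress_command` is a hypothetical helper; a None result corresponds to the 501 response.

```python
import re

# One alternative of the ABNF "algorithm" rule each: the case-insensitive
# literal "DEFLATE", or 1 to 20 characters from UPPER / DIGIT / "-" / "_".
ALGORITHM = r"(?:(?i:DEFLATE)|[A-Z0-9_-]{1,20})"

# compress-command    = "COMPRESS" WS algorithm
COMPRESS_COMMAND = re.compile(rf"(?i:COMPRESS)[ \t]+(?P<algorithm>{ALGORITHM})")

# compress-capability = "COMPRESS" 1*(WS algorithm)
COMPRESS_CAPABILITY = re.compile(rf"(?i:COMPRESS)(?:[ \t]+{ALGORITHM})+")


def parse_compress_command(line: str):
    """Return the algorithm argument, or None if the syntax is invalid."""
    match = COMPRESS_COMMAND.fullmatch(line)
    return match.group("algorithm") if match else None
```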

This section contains a list of each new response code defined in this document and indicates whether it is multi-line, which commands can generate it, what arguments it has, and what its meaning is.

Response code 206
Generated by: COMPRESS
Meaning: compression layer activated

Security issues are discussed throughout this document. In general, the security considerations of the NNTP core specification ( Section 12) and the DEFLATE compressed data format specification ( Section 6) are applicable here.

Implementers should be aware that combining compression with encryption like TLS can sometimes reveal information that would not have been revealed without compression, as explained in Section 6 of . As a matter of fact, adversaries that observe the length of the compressed data might be able to derive information about the corresponding uncompressed data. The CRIME and BREACH attacks ( Section 2.6) are examples of such attacks.

In order to help mitigate leaking authentication credentials, this document states in that authentication SHOULD NOT be attempted when a compression layer is active. Therefore, when a client wants to authenticate, compress data, and negotiate a secure TLS layer (without TLS-level compression) in the same NNTP connection, it SHOULD use the STARTTLS, AUTHINFO, and COMPRESS commands in that order. Of course, instead of using the STARTTLS command, a client can also use implicit TLS, that is, begin the TLS negotiation immediately upon connection on a separate port dedicated to NNTP over TLS.

NNTP commands other than AUTHINFO are not believed to divulge confidential information as far as public Netnews newsgroups and articles are concerned. That is why this specification only adds a restriction to the use of AUTHINFO when a compression layer is active. In case private and confidential newsgroups or articles are accessed, a server SHOULD NOT allow compression, and a client SHOULD NOT attempt to activate compression, for the same reasons as mentioned above in this section. Implementations MUST support a configuration where compression can be easily disabled.

Future extensions to NNTP that define commands conveying confidential data SHOULD state that these commands SHOULD NOT be used along with compression.
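The recommended ordering can be expressed as a client-side plan of commands. This Python sketch is hypothetical throughout: `plan_commands` and its parameters are illustrative names. For simplicity it decides from the initial CAPABILITIES response only, whereas a real client would re-examine each fresh CAPABILITIES response. It also assumes DEFLATE is the algorithm to request.

```python
from typing import List, Optional, Tuple


def plan_commands(capabilities: List[str],
                  credentials: Optional[Tuple[str, str]],
                  implicit_tls: bool = False,
                  private_groups: bool = False) -> List[str]:
    """Order STARTTLS, AUTHINFO, and COMPRESS as recommended (sketch)."""
    labels = {line.split()[0] for line in capabilities if line.strip()}
    plan: List[str] = []
    # 1. TLS first, unless the connection already uses implicit TLS.
    if not implicit_tls and "STARTTLS" in labels:
        plan += ["STARTTLS", "CAPABILITIES"]  # capabilities may change
    # 2. Authenticate before any compression layer is active.
    if credentials is not None and "AUTHINFO" in labels:
        user, password = credentials
        plan += [f"AUTHINFO USER {user}", f"AUTHINFO PASS {password}", "CAPABILITIES"]
    # 3. Compress last, and not at all when private data is accessed.
    if "COMPRESS" in labels and not private_groups:
        plan.append("COMPRESS DEFLATE")
    return plan
```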

The NNTP Compression Algorithm registry will be maintained by IANA. The registry will be available at <http://www.iana.org/assignments/nntp-compression-algorithms>. The purpose of this registry is not only to ensure uniqueness of values used to name NNTP compression algorithms, but also to provide a definitive reference to technical specifications detailing each NNTP compression algorithm available for use on the Internet. There is no naming convention for NNTP compression algorithms; any name that conforms to the syntax of an NNTP compression algorithm name can be registered. The procedure detailed in is to be used for registration of a value naming a specific individual mechanism. Comments may be included in the registry as discussed in and may be changed as discussed in .

IANA will register new NNTP compression algorithm names on a First Come First Served basis, as defined in BCP 26 . IANA has the right to reject obviously bogus registration requests, but will perform no review of claims made in the registration form. Registration of an NNTP compression algorithm is requested by filling in the following template and sending it via electronic mail to IANA at <iana@iana.org>:

Subject: Registration of NNTP compression algorithm X

NNTP compression algorithm name:

Security considerations:

Published specification (recommended):

Contact for further information:

Intended usage: (One of COMMON, LIMITED USE, or OBSOLETE)

Owner/Change controller:

Note: (Any other information that the author deems relevant may be added here.)

While this registration procedure does not require expert review, authors of NNTP compression algorithms are encouraged to seek community review and comment whenever that is feasible. Authors may seek community review by posting a specification of their proposed mechanism as an Internet-Draft. NNTP compression algorithms intended for widespread use should be standardized through the normal IETF process, when appropriate.

Comments on a registered NNTP compression algorithm should first be sent to the "owner" of the algorithm and/or to the <ietf-nntp@lists.eyrie.org> mailing list. Submitters of comments may, after a reasonable attempt to contact the owner, request IANA to attach their comment to the NNTP compression algorithm registration itself by sending mail to <iana@iana.org>. At IANA's sole discretion, IANA may attach the comment to the NNTP compression algorithm's registration.

Once an NNTP compression algorithm registration has been published by IANA, the owner may request a change to its definition. The change request follows the same procedure as the initial registration request. The owner of an NNTP compression algorithm may pass responsibility for the algorithm to another person or agency by informing IANA; this can be done without discussion or review. The IESG may reassign responsibility for an NNTP compression algorithm. The most common case of this will be to enable changes to be made to algorithms where the owner of the registration has died, has moved out of contact, or is otherwise unable to make changes that are important to the community. NNTP compression algorithm registrations MUST NOT be deleted; algorithms that are no longer believed appropriate for use can be declared OBSOLETE by a change to their "intended usage" field; such algorithms will be clearly marked in the registry published by IANA. The IESG is considered to be the owner of all NNTP compression algorithms that are on the IETF standards track.

This section gives a formal definition of the DEFLATE compression algorithm as required by for the IANA registry.

NNTP compression algorithm name: DEFLATE
Security considerations: See the Security Considerations section of this document
Published specification: This document
Contact for further information: Authors of this document
Intended usage: COMMON
Owner/Change controller: IESG <iesg@ietf.org>
Note: This algorithm is mandatory to implement

This section gives a formal definition of the COMPRESS extension as required by Section 3.3.3 of for the IANA registry. The COMPRESS extension allows an NNTP connection to be effectively and efficiently compressed. The capability label for this extension is "COMPRESS", whose arguments list the available compression algorithms. This extension defines one new command, COMPRESS, whose behavior, arguments, and responses are defined in . This extension does not associate any new responses with pre-existing NNTP commands. This extension does affect the overall behavior of both server and client, in that after successful use of the COMPRESS command, all communication is transmitted in a compressed format. This extension does not affect the maximum length of commands or initial response lines. This extension does not alter pipelining, but the COMPRESS command cannot be pipelined. Use of this extension does alter the capabilities list; once the COMPRESS command has been used successfully, the COMPRESS capability can no longer be advertised by CAPABILITIES. Additionally, the STARTTLS and MODE-READER capabilities MUST NOT be advertised, and the AUTHINFO capability label SHOULD either return no arguments or no longer be advertised after successful execution of the COMPRESS command. This extension does not cause any pre-existing command to produce a 401, 480, or 483 response code. This extension is unaffected by any use of the MODE READER command; however, the MODE READER command MUST NOT be used in the same session following a successful execution of the COMPRESS command. The STARTTLS command MUST NOT be used, and the AUTHINFO command SHOULD NOT be used, in the same session following a successful execution of the COMPRESS command. Published Specification: This document. Contact for Further Information: Authors of this document. Change Controller: IESG <iesg@ietf.org>.

RFC 2119, RFC 3977, RFC 4642, RFC 5226, RFC 5234, RFC 1951, RFC 1962, RFC 3749, RFC 4422, RFC 4643, RFC 4644, RFC 4648, RFC 4978, RFC 5246, RFC 7457, RFC 7525, IEEE 1003

Data compression procedures for data circuit-terminating equipment (DCE) using error correction procedures, International Telecommunication Union
The Complete Modem Reference
The CRIME Attack
yEnc - Efficient encoding for Usenet and eMail

This document draws heavily on ideas in by Arnt Gulbrandsen and a large portion of this text was borrowed from that specification. The authors would like to thank the following individuals for contributing their ideas and support for writing this specification: Mark Adler, Russ Allbery, Michael Bäuerle, and Brian Peterson.

Added text stating that the receiving end SHOULD terminate the connection when receiving invalid or corrupted compressed data. Explained why COMPRESS improves on existing unstandardized commands like XZVER, XZHDR, MODE COMPRESS, and XFEATURE GZIP. Added an example of an AUTHINFO command when compression is active. The LIST capability label was missing in the examples when READER was also advertised. Improved an example to send CAPABILITIES after successful authentication. Mentioned that COMPRESS is related to security. CAPABILITIES is therefore sent again after COMPRESS. Re-added discussion of attachments in binary form and incompressible file formats. Improved the discussion about flushes, and added a specific section about DEFLATE. Changed a MUST NOT to SHOULD NOT for the use of AUTHINFO after COMPRESS. Algorithm names are case-insensitive. Mentioned the use of the 501 response code. Added the Security Considerations Section. Added Julien Élie as co-author of this document. Minor editorial changes.

Switched to using 206 response code when compression has been activated. Added text stating that TLS-level compression is susceptible to CRIME attack and current BCP is to disable it. Added text stating that AUTHINFO shouldn't be advertised or used after COMPRESS to prevent possible CRIME attack (with example). Added text stating that a windowBits value of -15 should be used for inflateInit2(). Minor editorial changes.

Made DEFLATE the mandatory to implement compression algorithm. Removed the requirement that clients/servers implementing COMPRESS also implement TLS compression. Added an example of a client trying to use an unsupported compression algorithm. Rewrote Compression Efficiency as follows: Included a sample listing of which NNTP commands produce which type of data to be compressed. Removed discussion of attachments in binary form and incompressible file formats. Mentioned UUencode and yEnc encoding of attachments. Added IANA registry of NNTP compression algorithms. Miscellaneous editorial changes submitted by Julien Élie.

How are SASL- and TLS-specific exchanges supposed to be handled? Should the RFC mention that they are outside the scope of NNTP compression? (This refers to things like TLS handshakes, renegotiations, etc. that can occur after the use of COMPRESS. OpenSSL may call SSL_read/SSL_write on its own, may it not? And it does not know that it should compress data...) Anything to add to the Security Considerations section? Whom should we get a review from? Should we add a naming convention for private compression algorithms? (I am unsure that saying it begins with "X" is a good idea, though; we may one day see a standardized compression algorithm with such a name, XZ for instance, though LZMA2 would be its real name.)