INTERNET-DRAFT                                              Eric A. Hall
Document: draft-hall-mime-app-mbox-02.txt                      June 2004
Expires: January, 2005
Category: Standards Track


                    The APPLICATION/MBOX Media-Type


Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC 2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Copyright Notice

   Copyright (C) The Internet Society (2004). All Rights Reserved.

Abstract

   This document requests that the application/MBOX media-type be
   authorized for allocation by IANA, according to the terms specified
   in RFC 2048 [RFC2048].

Internet Draft draft-hall-mime-app-mbox-02.txt July 2004

1. Background and Overview

   UNIX and look-alike operating systems have historically made use of
   "mbox" database files for a variety of messaging purposes. In the
   common case, these database files hold collections of electronic
   mail messages which are collectively manipulated as "folders" in a
   private mail-store. These files are also widely used by a variety
   of filtering systems, archival programs, and other
   messaging-related tools, and are also widely supported on non-UNIX
   platforms for similar purposes.

   The increased pervasiveness of these files has led to an increased
   demand for improvements in cross-system, network-wide interchange
   of these files.
   In turn, this requirement also dictates a need for a media-type
   definition for mbox files in general, so that the data can be
   tagged and identified during transfer.

   Note that there are many inconsistencies in how mbox databases are
   structured and stored, and some form of prior agreement is usually
   necessary before these files can be used in automated tasks. For
   example, it is entirely possible for an mbox database to contain
   untagged eight-bit character data or to use end-of-line sequences
   which are peculiar to a host platform, yet the mbox database format
   provides no means of indicating these options within the database
   itself. As a result, some form of external negotiation or prior
   agreement is often necessary in order to ensure that the contents
   of the database are accurately read by messaging systems on other
   hosts.

   On this point, it is useful to note that the multipart/digest
   [RFC2046] media-type has authoritative, platform-independent
   formatting rules which facilitate much more predictable transfer
   and conversion routines in multi-system environments, and
   implementations are strongly encouraged to give preference to the
   multipart/digest media-type where possible.

2. Prerequisites and Terminology

   Readers of this document are expected to be familiar with the
   specification for MIME registrations [RFC2048].

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC 2119
   [RFC2119].

Hall I-D Expires: December 2004 [page 2]

3. The APPLICATION/MBOX Media-Type Registration Request

   This section provides the registration request, as per [RFC2048],
   which will be submitted to IANA after IESG approval.
   MIME media type name: application

   MIME subtype name: mbox

   Required parameters: none

   Optional parameters: none

   Encoding considerations:

      mbox data typically consists of seven-bit ASCII characters in an
      eight-bit file stream, although this is not required, nor can it
      be assumed. mbox databases may contain unencoded and untagged
      eight-bit character data, or may contain data which has been
      encoded to fit within a seven-bit stream, or may contain a
      mixture of both types simultaneously. Any transfer encoding may
      be used with mbox files, with the appropriate encoding for any
      specific file being entirely dependent upon the database
      contents.

   Security considerations:

      mbox data is passive, and does not generally represent a unique
      or new security threat. However, there is some risk in sharing
      any kind of data, in that information may be exposed
      unintentionally, and that risk applies to mbox data as well.

   Interoperability considerations:

      mbox databases on UNIX-like systems typically use ASCII Line
      Feed (0x0A) as the end-of-line character. Messaging systems on
      other platforms may use the UNIX-centric end-of-line marker, but
      may also use an end-of-line signal that is specific to their
      host operating system.

      Messages in mbox databases contain header fields which appear to
      mirror common Internet message headers, but which typically use
      local encodings rather than Internet formats. For example,
      message headers in an mbox database may contain untagged
      eight-bit character data, or may contain email addresses with no
      domain name, with these usages reflecting local-system
      mail-store services that are incompatible with defined Internet
      formats.

      As a result of these and other vagaries, mbox databases
      generally require some kind of out-of-band negotiation or prior
      agreement before they can be successfully parsed and read on
      other systems.

   Published specification: see Appendix A.
   Applications which use this media type:

      scores of messaging products make use of the mbox database
      format.

   Magic number(s): no standard

   File extension(s):

      mbox files sometimes have a ".mbox" extension, but this is
      neither required nor expected.

   Macintosh File Type Code(s): no standard

   Person & email address to contact for further information:

      Eric A. Hall (ehall@ntrg.com)

   Intended usage: COMMON

4. Security Considerations

   See the discussion in section 3.

5. IANA Considerations

   After any IESG approval which may be forthcoming, IANA would be
   expected to register the application/mbox media-type, using the
   application provided in section 3 above.

6. Normative References

   [RFC2046]  Freed, N., Borenstein, N., "Multipurpose Internet Mail
              Extensions (MIME) Part Two: Media Types", RFC 2046,
              November 1996.

   [RFC2048]  Freed, N., Klensin, J., Postel, J., "Multipurpose
              Internet Mail Extensions (MIME) Part Four: Registration
              Procedures", BCP 13, RFC 2048, November 1996.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

Appendix A. The mbox Database Format

   The mbox database format is not documented by any authoritative
   source, but instead only exists as commonly-understood output from
   historical messaging tools. Partly due to the lack of any such
   authoritative documentation, the mbox format has been adapted and
   mutated by various utilities over the years, and does not exist in
   a form which is syntactically precise.

   In general, mbox files typically contain a sequence of messages,
   each of which begins with a "From_" line. The structure of the
   "From_" lines varies somewhat, but they almost always contain the
   exact character sequence "From", followed by whitespace, followed
   by an email address of some kind, followed by more whitespace, and
   terminated by a timestamp sequence of some kind.
   Note that the email address may reflect any addressing syntax which
   has ever been used on any system in all of history, and the
   timestamp sequences can also vary according to system functions. In
   most cases, the timestamp is followed by an end-of-line signal, but
   some messaging systems have also been known to append additional
   information after the timestamp.

   The exact format of the "From_" line in use with a particular mbox
   file can often be determined by examining the first line of the
   file itself, which will likely be a "From_" line, and which is easy
   to locate. Implementers are cautioned, however, that multiple mbox
   files may have been joined together, or a single file may have been
   accessed from multiple systems, resulting in different "From_" line
   formats being used within a single file.

   When multiple messages are stored in an mbox file, each message is
   separated from its neighboring messages by an empty line that
   precedes the next "From_" line. This means that the first message
   in an mbox file will immediately begin with a "From_" line, while
   every other message will begin with a "From_" line that is
   immediately preceded by two end-of-line sequences (one at the end
   of the preceding message, and another to signify the blank line).

   Many implementations are also known to escape body lines beginning
   with "From " with a leading Greater-Than symbol (0x3E) so that
   excessively-liberal parsers do not misinterpret these lines as new
   "From_" lines. However, other implementations are known not to
   escape such lines unless they are immediately preceded by a blank
   line or appear to contain an email address and a timestamp, while
   others are known to perform secondary escapes against "From_" lines
   which are already escaped or quoted.
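
   The separator and escaping conventions described above can be
   illustrated with a minimal Python sketch. It is not a definitive
   parser, and it rests on assumptions that do not hold for every
   mbox file: any line beginning with "From " at the start of the file
   or directly after an empty line is taken to be a "From_" line, and
   one leading Greater-Than symbol is removed from any line of the
   form ">From ", ">>From ", and so on (one quoting convention among
   the several in use). The names split_mbox, unquote, and SAMPLE are
   hypothetical and exist only for this illustration.

```python
import re

# Assumption: a line starting with the five characters "From ", found at
# the start of the file or right after an empty line, is a "From_" line.
FROM_LINE = re.compile(r"^From ")

# Assumption: quoted lines look like ">From ", ">>From ", and so on.
QUOTED_FROM = re.compile(r"^>+From ")


def unquote(line):
    """Strip one leading '>' from a quoted 'From ' line."""
    return line[1:] if QUOTED_FROM.match(line) else line


def split_mbox(text):
    """Return a list of (from_line, lines) tuples, one per message."""
    # Accept LF, CRLF, or CR, since the end-of-line sequence is host-specific.
    lines = re.split(r"\r\n|\r|\n", text)
    if lines and lines[-1] == "":
        lines.pop()  # artifact of the final end-of-line sequence

    messages = []
    current = None
    prev_blank = True  # the start of the file behaves like a separator
    for line in lines:
        if prev_blank and FROM_LINE.match(line):
            if current is not None:
                messages.append(current)
            current = (line, [])
        elif current is not None:
            current[1].append(line)
        # Lines before the first "From_" line are silently ignored.
        prev_blank = line == ""
    if current is not None:
        messages.append(current)

    result = []
    for from_line, body in messages:
        if body and body[-1] == "":
            body = body[:-1]  # drop the empty separator line
        # For simplicity, unquoting is applied to header lines as well.
        result.append((from_line, [unquote(item) for item in body]))
    return result


SAMPLE = (
    "From alice@example.com Thu Jul 15 10:00:00 2004\n"
    "Subject: first\n"
    "\n"
    ">From here on, this is body text.\n"
    "From time to time a body line starts this way.\n"
    "\n"
    "From bob Thu Jul 15 11:00:00 2004\n"
    "Subject: second\n"
    "\n"
    "Hello.\n"
    "\n"
)

messages = split_mbox(SAMPLE)
for from_line, lines in messages:
    print(from_line, len(lines))
```

   In the sample, the unquoted body line beginning with "From time" is
   not treated as a separator only because it does not follow an empty
   line. A writer that omitted the quoting on the line before it would
   cause this sketch to split the first message in two, which is the
   ambiguity the text above warns about.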
   A comprehensive description of mbox database files on UNIX-like
   systems can be found at http://qmail.org./man/man5/mbox.html, which
   should be treated as mostly authoritative for those variations
   which are otherwise only documented in anecdotal form.

Acknowledgments

   Funding for the RFC editor function is currently provided by the
   Internet Society.

Authors' Addresses

   Eric A. Hall
   ehall@ntrg.com

Full Copyright Statement

   Copyright (C) The Internet Society (2004). All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain
   it or assist in its implementation may be prepared, copied,
   published and distributed, in whole or in part, without restriction
   of any kind, provided that the above copyright notice and this
   paragraph are included on all such copies and derivative works.
   However, this document itself may not be modified in any way, such
   as by removing the copyright notice or references to the Internet
   Society or other Internet organizations, except as needed for the
   purpose of developing Internet standards in which case the
   procedures for copyrights defined in the Internet Standards process
   must be followed, or as required to translate it into languages
   other than English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on
   an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR
   IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.