charset="utf-8" vs. charset=utf-8

MURATA Makoto

$Id: charsetQuotes.html,v 1.2 2003/07/31 00:53:39 makoto Exp makoto $

The Content-Type header for XML media types can carry a charset parameter. Recently, questions arose about whether the value of this parameter must be quoted.

All examples in Section 8 of RFC 3023 (XML Media Types) and SOAP Version 1.2 Part 0: Primer (W3C Recommendation) use quotes. E.g.,

charset="iso-8859-1"

Meanwhile, Section 4.1 of RFC 2046 (Media types) does not use quotes. E.g.,

charset=iso-8859-1

Some implementations use the quoted form and reject the unquoted one. Others use the unquoted form and reject the quoted one.

We find that both are correct, since Section 5.1 of RFC 2045 (Format of Internet Message Bodies) explicitly allows both. It states:

Note that the value of a quoted string parameter does not include the quotes. That is, the quotation marks in a quoted-string are not a part of the value of the parameter, but are merely used to delimit that parameter value. In addition, comments are allowed in accordance with RFC 822 rules for structured header fields. Thus the following two forms

Content-type: text/plain; charset=us-ascii (Plain text)

Content-type: text/plain; charset="us-ascii"

are completely equivalent.

Thus, both charset="utf-8" and charset=utf-8 are correct. Implementations must accept both.
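
As a minimal sketch of what accepting both forms involves, the following Python function extracts the charset value from the body of a Content-Type field. The names get_charset, strip_comments, split_params, and unquote are hypothetical helpers introduced here for illustration and do not come from any library. The sketch handles tokens, quoted strings, and RFC 822 comments, but it is not a complete RFC 2045 parser.

```python
import re
from typing import List, Optional


def strip_comments(value: str) -> str:
    """Drop parenthesized (possibly nested) comments outside quoted strings."""
    out = []
    depth = 0          # current comment nesting level
    in_quotes = False  # True while inside a quoted-string
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value) and (in_quotes or depth):
            # A backslash escapes the next character in quotes and comments.
            if depth == 0:
                out.append(value[i:i + 2])
            i += 2
            continue
        if depth == 0 and ch == '"':
            in_quotes = not in_quotes
            out.append(ch)
        elif not in_quotes and ch == "(":
            depth += 1
        elif not in_quotes and ch == ")" and depth > 0:
            depth -= 1
        elif depth == 0:
            out.append(ch)
        i += 1
    return "".join(out)


def split_params(value: str) -> List[str]:
    """Split on ';' but never inside a quoted-string."""
    parts, current, in_quotes, i = [], [], False, 0
    while i < len(value):
        ch = value[i]
        if in_quotes and ch == "\\" and i + 1 < len(value):
            current.append(value[i:i + 2])  # keep the escape for unquote()
            i += 2
            continue
        if ch == '"':
            in_quotes = not in_quotes
        if ch == ";" and not in_quotes:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    parts.append("".join(current))
    return parts


def unquote(raw: str) -> str:
    """Remove delimiting quotes; they are not part of the parameter value."""
    if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
        return re.sub(r"\\(.)", r"\1", raw[1:-1])
    return raw


def get_charset(content_type: str) -> Optional[str]:
    """Return the charset parameter value, or None if it is absent."""
    parts = split_params(strip_comments(content_type))
    for part in parts[1:]:  # parts[0] is the media type itself
        name, sep, raw = part.partition("=")
        if sep and name.strip().lower() == "charset":
            return unquote(raw.strip())
    return None
```

With this sketch, get_charset('text/plain; charset=us-ascii (Plain text)') and get_charset('text/plain; charset="us-ascii"') both return 'us-ascii', matching the equivalence stated in RFC 2045.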

Further information about this issue is available at:
http://groups.yahoo.com/group/unicode/message/17065

Acknowledgements: Thanks to TAKASE Toshiro and Simon St. Laurent for their helpful comments.


(c) MURATA Makoto 2003-07-31