WWW server configuration for HTML, XML, CSS 
                     and other textual data
                                                        UCHIDA Akira
Abstract
It is difficult for WWW server administrators to configure WWW servers 
correctly, and for WWW page authors to create WWW pages correctly, when 
they have to manage Web document files in various character encodings.  
One effective solution is to edit the server configuration file so that 
a set of extended suffixes for Web document files is bound to MIME types 
together with charset parameters.  This memo proposes a guideline for 
extending the suffixes of Web document files to indicate their character 
encodings.
1. Introduction
The WWW contains Web pages written in many languages and many 
character encodings, yet a lot of WWW servers have not been configured 
to send a charset parameter.  One reason is that it is very difficult 
for a Webmaster to choose a single default character encoding for 
his/her site in this "World Wide" Web age.
Even if he/she can choose one default character encoding, he/she still 
has to configure his/her server to send the correct charset parameter 
for documents in other character encodings.
To provide the correct charset parameter, some Webmasters make a local 
rule that binds suffixes to MIME types using the server configuration 
mechanism:
    suffix ".html" indicates MIME type "text/html; charset=iso-8859-1", 
    suffix ".html2" indicates MIME type "text/html; charset=iso-8859-2", 
etc. [Bert Bos]
However, it is not easy for most people to systematically choose 
different suffixes for different charsets, provide appropriate 
configuration files, and write comprehensible manuals for WWW page 
authors.  This difficulty is one of the many reasons that most WWW 
browsers are not configured correctly.
To overcome this problem, we propose a set of file suffixes and provide 
example configuration files for some widely-used WWW servers.  These 
suffixes are systematically chosen, easy to remember, and ready to use. 
We hope that this proposal helps WWW server administrators to configure 
WWW servers correctly and also helps WWW page authors to create WWW pages 
correctly.
2. Goal
It is necessary and sufficient for our indicator to identify all 
character encodings that are registered in the IANA registry [IANA] 
(see [HTTP1.1] section 3.4, [HTML4.0] section 5, [Text/xml] section 3.1, 
[Text/css] section 4).
Design goals are:
- It shall be able to identify all character encodings that are 
  currently registered in the IANA registry.
- It shall be able to identify all character encodings that will be 
  registered in the IANA registry in the future.
- The fewer characters (digits or letters) an indicator has, the better.
3. Proposal
We propose that the choice of a new charset indicator should observe 
the following priorities (from highest priority to lowest): 
    1. If the document's character encoding is US-ASCII, use "ASCII" 
       as the indicator. 
    2. If the MIME name of the document's character encoding has fewer 
       than seven letters or digits, or the character encoding is an 
       experimental one, use the MIME name as the indicator.
    3. If the document's character encoding belongs to the ISO-8859 
       series, use the ISO-8859 series identifier as the indicator.
    4. If the document's character encoding belongs to the ISO-2022 
       series, use the ISO-2022 series identifier as the indicator.
    5. Otherwise, use the MIBnum value as the indicator.
The indicator can be used in either Mixed style (a single suffix such 
as ".htmlascii" represents both the MIME type and the character 
encoding) or Separate style (a double suffix such as ".html.ascii" 
represents the MIME type and the character encoding separately). 
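The priorities and the two styles above can be sketched in Python as 
follows.  This is only an illustration, not part of the proposal: the 
function names charset_indicator() and suffix() are hypothetical, the 
count in priority 2 is assumed to cover letters and digits only (not 
"-" or "_"), and the ISO-2022 identifier is assumed to be formed by 
dropping the hyphens after "ISO-2022-", as in the tables of section 4.

```python
import re


def charset_indicator(mime_name, mibnum=None, experimental=False):
    """Pick a charset indicator by the priorities of section 3 (sketch)."""
    name = mime_name.lower()
    # Priority 1: US-ASCII always maps to "ascii".
    if name == "us-ascii":
        return "ascii"
    # Priority 2: short MIME names (assumed: fewer than seven letters or
    # digits, punctuation not counted) and experimental encodings.
    if experimental or sum(ch.isalnum() for ch in name) < 7:
        return name
    # Priority 3: ISO-8859 series, e.g. "ISO-8859-2" -> "8859-2".
    match = re.fullmatch(r"iso-8859-(\d+)", name)
    if match:
        return "8859-" + match.group(1)
    # Priority 4: ISO-2022 series, e.g. "ISO-2022-JP-2" -> "2022jp2".
    if name.startswith("iso-2022-"):
        return "2022" + name[len("iso-2022-"):].replace("-", "")
    # Priority 5: fall back to the MIBnum value, e.g. 17 -> "mib17".
    if mibnum is None:
        raise ValueError("a MIBnum value is needed for " + mime_name)
    return "mib%d" % mibnum


def suffix(type_suffix, indicator, style="separate"):
    """Combine a type suffix such as "html" with a charset indicator."""
    if style == "separate":
        return "." + type_suffix + "." + indicator
    if style == "mixed":
        return "." + type_suffix + indicator
    raise ValueError("style must be 'separate' or 'mixed'")
```

For example, suffix("html", charset_indicator("Shift_JIS", 17), "mixed") 
yields ".htmlmib17", and suffix("css", charset_indicator("US-ASCII")) 
yields ".css.ascii".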
Note.  Suffix ".html" and the HTTP default character encoding
 When a WWW server sends an ".html" file without an explicit charset 
 parameter, HTTP/1.1 defines the default charset of the response as 
 ISO-8859-1 (see [HTTP1.1] section 3.7.1).  Either by our proposal or 
 by this default, the WWW server can therefore indicate the appropriate 
 charset for ISO-8859-1 encoded HTML files.
3.1. US-ASCII encoded file
Separate style usage: ".*.ascii"
(e.g., to indicate a US-ASCII encoded CSS file, the suffix is ".css.ascii")
Mixed style usage: ".*ascii"
(e.g., to indicate a US-ASCII encoded XML file, the suffix is ".xmlascii")
3.2. MIME name as an indicator
Separate style usage: ".*.xx"
(e.g., to indicate a UTF-8 encoded HTML file, the suffix is ".html.utf-8")
Mixed style usage: ".*xx"
(e.g., to indicate an EUC-JP encoded HTML file, the suffix is ".htmleuc-jp")
3.3. ISO-8859 series identifiers as an indicator
Separate style usage: ".*.8859-x"
(e.g., to indicate an ISO-8859-1 encoded XML file, the suffix is 
".xml.8859-1")
Preferred Mixed style usage: ".*8859-x"
(e.g., to indicate an ISO-8859-2 encoded HTML file, the suffix is 
".html8859-2")
Alternative Mixed style usage: ".*-x"
(e.g., to indicate an ISO-8859-2 encoded HTML file, the suffix is 
".html-2")
Advantages:
- Easy to remember.
Disadvantages:
- None.
3.4. ISO-2022 series identifiers as an indicator
Separate style usage: ".*.2022xx"
(e.g., to indicate an ISO-2022-JP-2 encoded CSS file, the suffix is 
".css.2022jp2")
Preferred Mixed style usage: ".*2022xx"
(e.g., to indicate an ISO-2022-JP-2 encoded HTML file, the suffix is 
".html2022jp2")
Alternative Mixed style usage: ".*xx"
(e.g., to indicate an ISO-2022-JP-2 encoded HTML file, the suffix is 
".htmljp2")
Advantages:
- Easy to remember.
Disadvantages:
- None.
3.5. MIBnum values as an indicator
Separate style usage: ".*.mibxx"
(e.g., to indicate a MIBnum 17 (Shift_JIS) encoded HTML file, the suffix 
is ".html.mib17")
Mixed style usage: ".*mibxx"
(e.g., to indicate a MIBnum 17 (Shift_JIS) encoded HTML file, the suffix 
is ".htmlmib17")
Advantages:
- Absolutely unique.
- Can indicate all IANA registered charsets.
Disadvantages:
- Hard to remember.
- Four digits are needed to represent a character encoding scheme 
  that has not been standardized by any standards-setting organization.
4. Sample suffix tables for HTML files
4.1. Suffixes for HTML files in Separate style
	encoding        suffix
	US-ASCII        .html.ascii
	ISO-8859-1      .html.8859-1
	ISO-8859-2      .html.8859-2
	ISO-8859-3      .html.8859-3
	ISO-8859-4      .html.8859-4
	ISO-8859-5      .html.8859-5
	ISO-8859-6      .html.8859-6
	ISO-8859-7      .html.8859-7
	ISO-8859-8      .html.8859-8
	ISO-8859-9      .html.8859-9
	Shift_JIS       .html.mib17
	EUC-JP          .html.euc-jp
	ISO-2022-KR     .html.2022kr
	EUC-KR          .html.euc-kr
	ISO-2022-JP     .html.2022jp
	ISO-2022-JP-2   .html.2022jp2
	UTF-7           .html.utf-7
	UTF-8           .html.utf-8
	GB2312          .html.gb2312
	Big5            .html.big5
	KOI8-R          .html.koi8-r
4.2. Preferred suffixes for HTML files in Mixed style
	encoding        suffix
	US-ASCII        .htmlascii
	ISO-8859-1      .html8859-1
	ISO-8859-2      .html8859-2
	ISO-8859-3      .html8859-3
	ISO-8859-4      .html8859-4
	ISO-8859-5      .html8859-5
	ISO-8859-6      .html8859-6
	ISO-8859-7      .html8859-7
	ISO-8859-8      .html8859-8
	ISO-8859-9      .html8859-9
	Shift_JIS       .htmlmib17
	EUC-JP          .htmleuc-jp
	ISO-2022-KR     .html2022kr
	EUC-KR          .htmleuc-kr
	ISO-2022-JP     .html2022jp
	ISO-2022-JP-2   .html2022jp2
	UTF-7           .htmlutf-7
	UTF-8           .htmlutf-8
	GB2312          .htmlgb2312
	Big5            .htmlbig5
	KOI8-R          .htmlkoi8-r
4.3. Alternative suffixes for HTML files in Mixed style
	encoding        suffix
	US-ASCII        .htmlascii
	ISO-8859-1      .html-1
	ISO-8859-2      .html-2
	ISO-8859-3      .html-3
	ISO-8859-4      .html-4
	ISO-8859-5      .html-5
	ISO-8859-6      .html-6
	ISO-8859-7      .html-7
	ISO-8859-8      .html-8
	ISO-8859-9      .html-9
	Shift_JIS       .htmlmib17
	EUC-JP          .htmleuc-jp
	ISO-2022-KR     .htmlkr
	EUC-KR          .htmleuc-kr
	ISO-2022-JP     .htmljp
	ISO-2022-JP-2   .htmljp2
	UTF-7           .htmlutf-7
	UTF-8           .htmlutf-8
	GB2312          .htmlgb2312
	Big5            .htmlbig5
	KOI8-R          .htmlkoi8-r
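The relation between the Preferred (4.2) and Alternative (4.3) Mixed 
style indicators can be sketched in Python as follows.  The function 
name alternative_indicator() is hypothetical; the sketch only mirrors 
the two shortenings visible in the tables (dropping "8859" and 
dropping "2022") and leaves every other indicator unchanged.

```python
import re


def alternative_indicator(preferred):
    """Shorten a Preferred Mixed style indicator to its Alternative form."""
    # ISO-8859 series: "8859-2" -> "-2"
    match = re.fullmatch(r"8859(-\d+)", preferred)
    if match:
        return match.group(1)
    # ISO-2022 series: "2022jp2" -> "jp2"
    if preferred.startswith("2022"):
        return preferred[len("2022"):]
    # All other indicators (ascii, MIME names, mibxx) are kept as they are.
    return preferred
```

For example, "html" + alternative_indicator("2022jp2") gives "htmljp2", 
the suffix listed for ISO-2022-JP-2 in section 4.3.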
5. Example of Configuration
5.1. Apache httpd in Separate style
The following is a sample configuration for [Apache] httpd using 
[AddCharset].
    AddCharset  US-ASCII       ascii
    AddCharset  ISO-8859-1     8859-1
    AddCharset  ISO-8859-2     8859-2
    AddCharset  ISO-8859-3     8859-3
    AddCharset  ISO-8859-4     8859-4
    AddCharset  ISO-8859-5     8859-5
    AddCharset  ISO-8859-6     8859-6
    AddCharset  ISO-8859-7     8859-7
    AddCharset  ISO-8859-8     8859-8
    AddCharset  ISO-8859-9     8859-9
    AddCharset  Shift_JIS      mib17
    AddCharset  EUC-JP         euc-jp
    AddCharset  ISO-2022-KR    2022kr
    AddCharset  EUC-KR         euc-kr
    AddCharset  ISO-2022-JP    2022jp
    AddCharset  ISO-2022-JP-2  2022jp2
    AddCharset  UTF-7          utf-7
    AddCharset  UTF-8          utf-8
    AddCharset  GB2312         gb2312
    AddCharset  Big5           big5
    AddCharset  KOI8-R         koi8-r
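How a Separate style file name is meant to be read can be sketched in 
Python as follows.  This is not Apache's implementation; it assumes 
that the server examines each dot-separated suffix independently and 
that ".html", ".xml" and ".css" are already bound to their MIME types 
elsewhere in the configuration.  The names TYPES, CHARSETS and 
resolve_content_type() are hypothetical, and CHARSETS holds only a few 
rows of the list above.

```python
# Type suffixes, assumed to be configured elsewhere on the server.
TYPES = {"html": "text/html", "xml": "text/xml", "css": "text/css"}

# A few charset suffixes taken from the AddCharset list above.
CHARSETS = {
    "ascii": "US-ASCII",
    "8859-1": "ISO-8859-1",
    "mib17": "Shift_JIS",
    "2022jp2": "ISO-2022-JP-2",
    "utf-8": "UTF-8",
}


def resolve_content_type(filename):
    """Return the Content-Type value implied by the suffixes of a file."""
    media_type = None
    charset = None
    # Everything after the first dot is treated as a list of suffixes.
    for ext in filename.split(".")[1:]:
        if ext in TYPES:
            media_type = TYPES[ext]
        elif ext in CHARSETS:
            charset = CHARSETS[ext]
    if media_type is None:
        return None  # unknown type; the server default would apply
    if charset is None:
        return media_type  # no charset parameter is sent
    return "%s; charset=%s" % (media_type, charset)
```

For example, resolve_content_type("index.html.utf-8") gives 
"text/html; charset=UTF-8", while resolve_content_type("index.html") 
gives "text/html" with no charset parameter.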
5.2. Apache httpd in Mixed style
The following is a sample configuration for [Apache] httpd.
    AddType  "text/html; charset=US-ASCII"       htmlascii
    AddType  "text/html; charset=ISO-8859-1"     html8859-1
    AddType  "text/html; charset=ISO-8859-2"     html8859-2
    AddType  "text/html; charset=ISO-8859-3"     html8859-3
    AddType  "text/html; charset=ISO-8859-4"     html8859-4
    AddType  "text/html; charset=ISO-8859-5"     html8859-5
    AddType  "text/html; charset=ISO-8859-6"     html8859-6
    AddType  "text/html; charset=ISO-8859-7"     html8859-7
    AddType  "text/html; charset=ISO-8859-8"     html8859-8
    AddType  "text/html; charset=ISO-8859-9"     html8859-9
    AddType  "text/html; charset=Shift_JIS"      htmlmib17
    AddType  "text/html; charset=EUC-JP"         htmleuc-jp
    AddType  "text/html; charset=ISO-2022-KR"    html2022kr
    AddType  "text/html; charset=EUC-KR"         htmleuc-kr
    AddType  "text/html; charset=ISO-2022-JP"    html2022jp
    AddType  "text/html; charset=ISO-2022-JP-2"  html2022jp2
    AddType  "text/html; charset=UTF-7"          htmlutf-7
    AddType  "text/html; charset=UTF-8"          htmlutf-8
    AddType  "text/html; charset=GB2312"         htmlgb2312
    AddType  "text/html; charset=Big5"           htmlbig5
    AddType  "text/html; charset=KOI8-R"         htmlkoi8-r
    AddType  "text/xml; charset=US-ASCII"       xmlascii
    AddType  "text/xml; charset=ISO-8859-1"     xml8859-1
    AddType  "text/xml; charset=ISO-8859-2"     xml8859-2
    AddType  "text/xml; charset=ISO-8859-3"     xml8859-3
    AddType  "text/xml; charset=ISO-8859-4"     xml8859-4
    AddType  "text/xml; charset=ISO-8859-5"     xml8859-5
    AddType  "text/xml; charset=ISO-8859-6"     xml8859-6
    AddType  "text/xml; charset=ISO-8859-7"     xml8859-7
    AddType  "text/xml; charset=ISO-8859-8"     xml8859-8
    AddType  "text/xml; charset=ISO-8859-9"     xml8859-9
    AddType  "text/xml; charset=Shift_JIS"      xmlmib17
    AddType  "text/xml; charset=EUC-JP"         xmleuc-jp
    AddType  "text/xml; charset=ISO-2022-KR"    xml2022kr
    AddType  "text/xml; charset=EUC-KR"         xmleuc-kr
    AddType  "text/xml; charset=ISO-2022-JP"    xml2022jp
    AddType  "text/xml; charset=ISO-2022-JP-2"  xml2022jp2
    AddType  "text/xml; charset=UTF-7"          xmlutf-7
    AddType  "text/xml; charset=UTF-8"          xmlutf-8
    AddType  "text/xml; charset=GB2312"         xmlgb2312
    AddType  "text/xml; charset=Big5"           xmlbig5
    AddType  "text/xml; charset=KOI8-R"         xmlkoi8-r
    AddType  "text/css; charset=US-ASCII"       cssascii
    AddType  "text/css; charset=ISO-8859-1"     css8859-1
    AddType  "text/css; charset=ISO-8859-2"     css8859-2
    AddType  "text/css; charset=ISO-8859-3"     css8859-3
    AddType  "text/css; charset=ISO-8859-4"     css8859-4
    AddType  "text/css; charset=ISO-8859-5"     css8859-5
    AddType  "text/css; charset=ISO-8859-6"     css8859-6
    AddType  "text/css; charset=ISO-8859-7"     css8859-7
    AddType  "text/css; charset=ISO-8859-8"     css8859-8
    AddType  "text/css; charset=ISO-8859-9"     css8859-9
    AddType  "text/css; charset=Shift_JIS"      cssmib17
    AddType  "text/css; charset=EUC-JP"         csseuc-jp
    AddType  "text/css; charset=ISO-2022-KR"    css2022kr
    AddType  "text/css; charset=EUC-KR"         csseuc-kr
    AddType  "text/css; charset=ISO-2022-JP"    css2022jp
    AddType  "text/css; charset=ISO-2022-JP-2"  css2022jp2
    AddType  "text/css; charset=UTF-7"          cssutf-7
    AddType  "text/css; charset=UTF-8"          cssutf-8
    AddType  "text/css; charset=GB2312"         cssgb2312
    AddType  "text/css; charset=Big5"           cssbig5
    AddType  "text/css; charset=KOI8-R"         csskoi8-r
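Because the list above repeats the same charsets for every MIME type, 
it may be easier to generate than to maintain by hand.  The following 
Python sketch shows one possible way; the names TYPES, CHARSETS and 
mixed_addtype_lines() are hypothetical, and CHARSETS is shortened to a 
few rows for illustration.

```python
# (type suffix, MIME type) pairs covered by this memo.
TYPES = [("html", "text/html"), ("xml", "text/xml"), ("css", "text/css")]

# (charset name, indicator) pairs; only a few rows of the full list.
CHARSETS = [
    ("US-ASCII", "ascii"),
    ("ISO-8859-1", "8859-1"),
    ("Shift_JIS", "mib17"),
    ("UTF-8", "utf-8"),
]


def mixed_addtype_lines(types, charsets):
    """Build one Mixed style AddType line per (type, charset) pair."""
    lines = []
    for type_suffix, mime_type in types:
        for charset_name, indicator in charsets:
            lines.append('AddType  "%s; charset=%s"  %s%s'
                         % (mime_type, charset_name, type_suffix, indicator))
    return lines


if __name__ == "__main__":
    print("\n".join(mixed_addtype_lines(TYPES, CHARSETS)))
```

Adding a charset then means adding one row to CHARSETS instead of one 
line per MIME type.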
5.3. CERN httpd in Mixed style
The following is a sample configuration for [CERN] httpd.
    AddType  .htmlascii    text/html;charset=US-ASCII       8bit
    AddType  .html8859-1   text/html;charset=ISO-8859-1     8bit
    AddType  .html8859-2   text/html;charset=ISO-8859-2     8bit
    AddType  .html8859-3   text/html;charset=ISO-8859-3     8bit
    AddType  .html8859-4   text/html;charset=ISO-8859-4     8bit
    AddType  .html8859-5   text/html;charset=ISO-8859-5     8bit
    AddType  .html8859-6   text/html;charset=ISO-8859-6     8bit
    AddType  .html8859-7   text/html;charset=ISO-8859-7     8bit
    AddType  .html8859-8   text/html;charset=ISO-8859-8     8bit
    AddType  .html8859-9   text/html;charset=ISO-8859-9     8bit
    AddType  .htmlmib17    text/html;charset=Shift_JIS      8bit
    AddType  .htmleuc-jp   text/html;charset=EUC-JP         8bit
    AddType  .html2022kr   text/html;charset=ISO-2022-KR    8bit
    AddType  .htmleuc-kr   text/html;charset=EUC-KR         8bit
    AddType  .html2022jp   text/html;charset=ISO-2022-JP    8bit
    AddType  .html2022jp2  text/html;charset=ISO-2022-JP-2  8bit
    AddType  .htmlutf-7    text/html;charset=UTF-7          8bit
    AddType  .htmlutf-8    text/html;charset=UTF-8          8bit
    AddType  .htmlgb2312   text/html;charset=GB2312         8bit
    AddType  .htmlbig5     text/html;charset=Big5           8bit
    AddType  .htmlkoi8-r   text/html;charset=KOI8-R         8bit
    AddType  .xmlascii    text/xml;charset=US-ASCII       8bit
    AddType  .xml8859-1   text/xml;charset=ISO-8859-1     8bit
    AddType  .xml8859-2   text/xml;charset=ISO-8859-2     8bit
    AddType  .xml8859-3   text/xml;charset=ISO-8859-3     8bit
    AddType  .xml8859-4   text/xml;charset=ISO-8859-4     8bit
    AddType  .xml8859-5   text/xml;charset=ISO-8859-5     8bit
    AddType  .xml8859-6   text/xml;charset=ISO-8859-6     8bit
    AddType  .xml8859-7   text/xml;charset=ISO-8859-7     8bit
    AddType  .xml8859-8   text/xml;charset=ISO-8859-8     8bit
    AddType  .xml8859-9   text/xml;charset=ISO-8859-9     8bit
    AddType  .xmlmib17    text/xml;charset=Shift_JIS      8bit
    AddType  .xmleuc-jp   text/xml;charset=EUC-JP         8bit
    AddType  .xml2022kr   text/xml;charset=ISO-2022-KR    8bit
    AddType  .xmleuc-kr   text/xml;charset=EUC-KR         8bit
    AddType  .xml2022jp   text/xml;charset=ISO-2022-JP    8bit
    AddType  .xml2022jp2  text/xml;charset=ISO-2022-JP-2  8bit
    AddType  .xmlutf-7    text/xml;charset=UTF-7          8bit
    AddType  .xmlutf-8    text/xml;charset=UTF-8          8bit
    AddType  .xmlgb2312   text/xml;charset=GB2312         8bit
    AddType  .xmlbig5     text/xml;charset=Big5           8bit
    AddType  .xmlkoi8-r   text/xml;charset=KOI8-R         8bit
    AddType  .cssascii    text/css;charset=US-ASCII       8bit
    AddType  .css8859-1   text/css;charset=ISO-8859-1     8bit
    AddType  .css8859-2   text/css;charset=ISO-8859-2     8bit
    AddType  .css8859-3   text/css;charset=ISO-8859-3     8bit
    AddType  .css8859-4   text/css;charset=ISO-8859-4     8bit
    AddType  .css8859-5   text/css;charset=ISO-8859-5     8bit
    AddType  .css8859-6   text/css;charset=ISO-8859-6     8bit
    AddType  .css8859-7   text/css;charset=ISO-8859-7     8bit
    AddType  .css8859-8   text/css;charset=ISO-8859-8     8bit
    AddType  .css8859-9   text/css;charset=ISO-8859-9     8bit
    AddType  .cssmib17    text/css;charset=Shift_JIS      8bit
    AddType  .csseuc-jp   text/css;charset=EUC-JP         8bit
    AddType  .css2022kr   text/css;charset=ISO-2022-KR    8bit
    AddType  .csseuc-kr   text/css;charset=EUC-KR         8bit
    AddType  .css2022jp   text/css;charset=ISO-2022-JP    8bit
    AddType  .css2022jp2  text/css;charset=ISO-2022-JP-2  8bit
    AddType  .cssutf-7    text/css;charset=UTF-7          8bit
    AddType  .cssutf-8    text/css;charset=UTF-8          8bit
    AddType  .cssgb2312   text/css;charset=GB2312         8bit
    AddType  .cssbig5     text/css;charset=Big5           8bit
    AddType  .csskoi8-r   text/css;charset=KOI8-R         8bit
6. Notice
This proposal is no more than one of the many possible ways to 
configure servers.  For example, one WWW server administrator may use 
the [AddCharset] patch for Apache 1.3.3, and another may permit authors 
of HTML documents to use a ".htaccess" file to configure the types of 
their own HTML files, and so on.  We welcome any additional 
configurations that maximize usability for a specific purpose.
If your WWW server can be configured to specify the MIME type and the 
charset separately, we encourage you to configure it in Separate 
style, because Mixed style configuration leads to a combinatorial 
explosion: every MIME type has to be paired with every charset.
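The difference in size can be estimated with a small Python sketch.  
It assumes that Separate style needs one directive per MIME type plus 
one per charset, while Mixed style needs one directive per combination; 
the function name directive_counts() is hypothetical.

```python
def directive_counts(num_types, num_charsets):
    """Estimate the number of directives needed in each style."""
    return {
        # One directive per MIME type plus one per charset.
        "separate": num_types + num_charsets,
        # One directive per (MIME type, charset) combination.
        "mixed": num_types * num_charsets,
    }
```

With the 3 MIME types and 21 charsets used in section 5, 
directive_counts(3, 21) gives 24 directives for Separate style and 63 
for Mixed style.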
7. References
[Bert Bos]
    Bert Bos,
    Creating a multilingual site with the CERN-httpd server, 1996,
    http://www.w3.org/International/O-help-CERN.html
[HTTP1.1]
    R. Fielding, J. Gettys, J. Mogul, H. Frystyk, T. Berners-Lee,
    Hypertext Transfer Protocol -- HTTP/1.1, RFC 2068, 1997,
    http://www.w3.org/Protocols/rfc2068/rfc2068
[HTML4.0]
    W3C,
    HTML 4.0 Specification Recommendation, 1997-1998,
    http://www.w3.org/TR/REC-html40/
[Text/css]
    H. Lie, B. Bos, C. Lilley,
    The text/css Media Type, RFC 2318, 1998,
    ftp://ftp.isi.edu/in-notes/rfc2318.txt
[Text/xml]
    E. Whitehead, M. Murata,
    XML Media Types, RFC 2376, 1998,
    ftp://ftp.isi.edu/in-notes/rfc2376.txt
[IANA]
    IANA,
    Character sets,
    http://www.isi.edu/in-notes/iana/assignments/character-sets
[Apache]
    Apache HTTP Server Project,
    Apache 1.3 User's Guide,
    http://www.apache.org/docs/
[CERN]
    W3C,
    CERN httpd,
    http://www.w3.org/Daemon/
[AddCharset]
    KOGA Youichirou,
    Koga's Apache page, 1998,
    http://www.isoternet.org/~y-koga/Apache/
Author's address
    UCHIDA Akira
    Hachiman 2-11-1-101, Aoba-ku, Sendai, Japan
    Email: uchida@happy.email.ne.jp