Internationalization of the Hypertext Markup Language
RFC 2070 HTML Internationalization January 1997
3. The LANG attribute
4. Additional entities, attributes and elements
   4.1. Full Latin-1 entity set
   4.2. Markup for language-dependent presentation
5. Forms
   5.1. DTD additions
   5.2. Form submission
6. External character encoding issues
7. HTML public text
   7.1. HTML DTD
   7.2. SGML declaration for HTML
   7.3. ISO Latin 1 character entity set
8. Security Considerations
Bibliography
Authors' Addresses
1. Introduction
The Hypertext Markup Language (HTML) is a markup language used to
create hypertext documents that are platform independent. Initially,
the application of HTML on the World Wide Web was seriously
restricted by its reliance on the ISO-8859-1 coded character set,
which is appropriate only for Western European languages. Despite
this restriction, HTML has been widely used with other languages,
using other coded character sets or character encodings, through
various ad hoc extensions to the language [TAKADA].
This document addresses the internationalization of HTML by
extending the specification of HTML and giving additional
recommendations for proper internationalization support. It is based
in large part on a paper by one of the authors on multilingualism on
the WWW [NICOL]. A foremost consideration is to ensure that HTML
remains a valid application of SGML, while enabling its use with all
languages of the world.
The specific issues addressed are the SGML document character set to
be used for HTML, the proper treatment of the charset parameter
associated with the "text/html" content type, and the specification
of some additional elements and entities.
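As an informal illustration of the charset parameter mentioned above,
the following Python sketch shows one way a receiver might read that
parameter from a "text/html" content type value and use it to decode
the document bytes. It is not part of this specification. The names
html_charset, decode_html and DEFAULT_CHARSET are hypothetical, and the
fallback to ISO-8859-1 when no parameter is present is an assumption
made for this example only.

```python
from email.message import Message

# Assumption for this sketch: fall back to ISO-8859-1 when the
# content type carries no charset parameter.
DEFAULT_CHARSET = "iso-8859-1"


def html_charset(content_type: str) -> str:
    """Return the charset parameter of a Content-Type value, lowercased."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_param returns None when the parameter is absent.
    return (msg.get_param("charset") or DEFAULT_CHARSET).lower()


def decode_html(body: bytes, content_type: str) -> str:
    """Decode document bytes using the charset named in the content type."""
    return body.decode(html_charset(content_type))
```

For example, decode_html(data, "text/html; charset=ISO-2022-JP")
decodes data as ISO-2022-JP, while a bare "text/html" value falls back
to the assumed default.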
1.1 Scope
HTML has been in use by the World Wide Web (WWW) global information
initiative since 1990. This specification extends the capabilities
of HTML 2.0 (RFC 1866), primarily by removing the restriction to the
ISO-8859-1 coded character set [ISO-8859].
Yergeau, et al. Standards Track