19 SGML reference information for HTML

Contents

  1. Document Validation
  2. Sample SGML catalog

The following sections contain the formal SGML definition of HTML 4.0: the SGML declaration, the Document Type Definition (DTD), and the character entity references, as well as a sample SGML catalog.

These files are also available in ASCII format as listed below:

Default DTD:
http://www.w3.org/TR/REC-html40/strict.dtd
Transitional DTD:
http://www.w3.org/TR/REC-html40/loose.dtd
Frameset DTD:
http://www.w3.org/TR/REC-html40/frameset.dtd
SGML declaration:
http://www.w3.org/TR/REC-html40/HTML4.decl
Entity definition files:
http://www.w3.org/TR/REC-html40/HTMLspecial.ent
http://www.w3.org/TR/REC-html40/HTMLsymbol.ent
http://www.w3.org/TR/REC-html40/HTMLlat1.ent
A sample catalog:
http://www.w3.org/TR/REC-html40/HTML4.cat

19.1 Document Validation

Many authors rely on a limited set of browsers to check the documents they produce, assuming that if the browsers can render their documents, the documents are valid. Unfortunately, this is a very ineffective means of verifying a document's validity, precisely because browsers are designed to cope with invalid documents by rendering them as well as they can to avoid frustrating users.

For better validation, you should check your document with an SGML parser such as nsgmls (see [SP]) to verify that it conforms to the HTML 4.0 DTD. If the document type declaration of your document includes a URI and your SGML parser supports this type of system identifier, the parser will retrieve the DTD directly. Otherwise, you can use the sample SGML catalog given in section 19.2. It assumes that the DTD has been saved as the file "strict.dtd" and that the entities are in the files "HTMLlat1.ent", "HTMLsymbol.ent" and "HTMLspecial.ent". In any case, make sure your SGML parser is capable of handling Unicode. See your validation tool documentation for further details.
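As a rough illustration of the workflow above, the following Python sketch assembles an nsgmls command line and runs it. It assumes the files HTML4.cat and HTML4.decl listed earlier in this section have been saved in the current directory. The helper names build_command and validate are hypothetical, and the -s (suppress normal output) and -c (catalog file) options should be confirmed against your SP documentation.

```python
import subprocess


def build_command(document, catalog="HTML4.cat", declaration="HTML4.decl"):
    """Return an argument list for validating `document` with nsgmls.

    Assumes the catalog and SGML declaration were saved locally under
    the names used in this section.
    """
    # -s: print error messages only; -c: catalog used to resolve identifiers.
    # The SGML declaration is listed before the document so it is read first.
    return ["nsgmls", "-s", "-c", catalog, declaration, document]


def validate(document):
    """Run nsgmls and return (ok, messages). Requires nsgmls on the PATH."""
    result = subprocess.run(
        build_command(document), capture_output=True, text=True
    )
    # Conservative assumption: any diagnostic on standard error counts
    # as a failure, whether it is an error or only a warning.
    messages = result.stderr.strip()
    return messages == "", messages
```

Calling validate("index.html") would be expected to return an empty message string for a document that the parser accepts. Treating warnings as failures is a design choice of this sketch, not a requirement of the specification.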

Beware that such validation, although useful and highly recommended, does not guarantee that a document fully conforms to the HTML 4.0 specification. This is because an SGML parser relies solely on the given SGML DTD, which does not express all aspects of a valid HTML 4.0 document. Specifically, an SGML parser ensures that the syntax, the structure, the list of elements, and their attributes are valid. It cannot, however, catch errors such as setting the width attribute of an IMG element to an invalid value (e.g., "foo" or "12.5"). Although the specification restricts the value for this attribute to an "integer representing a length in pixels," the DTD only defines it to be CDATA, which actually allows any value. Only a specialized program could capture the complete specification of HTML 4.0.

Nevertheless, this type of validation is still highly recommended since it permits the detection of a large set of errors that make documents invalid.
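To make the IMG width case above concrete, here is a minimal Python sketch of one check that a DTD-based parser cannot perform. It uses the standard library's html.parser module rather than an SGML parser, so it is not a validator. It only flags width values on IMG elements that are not plain integers, following the "integer representing a length in pixels" wording quoted above. The names ImgWidthChecker and check_widths are hypothetical.

```python
import re
from html.parser import HTMLParser

# Assumption: a valid width is a plain non-negative integer (a pixel
# count), so values such as "foo" and "12.5" are rejected.
PIXELS = re.compile(r"[0-9]+")


class ImgWidthChecker(HTMLParser):
    """Collect IMG width values that are not integer pixel lengths."""

    def __init__(self):
        super().__init__()
        self.errors = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lowercase.
        if tag != "img":
            return
        for name, value in attrs:
            if name != "width":
                continue
            if value is None or not PIXELS.fullmatch(value.strip()):
                line, _column = self.getpos()
                self.errors.append((line, value))


def check_widths(markup):
    """Return a list of (line, value) pairs for suspicious widths."""
    checker = ImgWidthChecker()
    checker.feed(markup)
    checker.close()
    return checker.errors
```

A check of this kind complements SGML validation rather than replacing it: the SGML parser still verifies structure and attribute names, while the extra pass covers one constraint that the DTD declares only as CDATA.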

19.2 Sample SGML catalog

This catalog includes the OVERRIDE directive to ensure that processing software such as nsgmls uses public identifiers in preference to system identifiers. As a result, users do not have to be connected to the Web when a document's system identifier is a URI: the public identifier is resolved to one of the local files named in the catalog.

OVERRIDE YES

PUBLIC "-//W3C//DTD HTML 4.0//EN" strict.dtd
PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" loose.dtd
PUBLIC "-//W3C//DTD HTML 4.0 Frameset//EN" frameset.dtd
PUBLIC "-//W3C//ENTITIES Latin1//EN//HTML" HTMLlat1.ent
PUBLIC "-//W3C//ENTITIES Special//EN//HTML" HTMLspecial.ent
PUBLIC "-//W3C//ENTITIES Symbols//EN//HTML" HTMLsymbol.ent
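The way the catalog above maps public identifiers to local files can be sketched in a few lines of Python. This is only an illustration of the mapping, not how nsgmls itself reads catalogs. It handles just the OVERRIDE and PUBLIC entries shown in this section, and the function name parse_catalog is hypothetical.

```python
import shlex


def parse_catalog(text):
    """Return (override, public) for a simple catalog like the one above.

    Only OVERRIDE and PUBLIC entries are handled; real catalogs support
    further entry types and comments, which this sketch ignores.
    """
    override = False
    public = {}
    for line in text.splitlines():
        # shlex keeps a quoted public identifier together as one token.
        tokens = shlex.split(line)
        if not tokens:
            continue
        keyword = tokens[0].upper()
        if keyword == "OVERRIDE" and len(tokens) == 2:
            override = tokens[1].upper() == "YES"
        elif keyword == "PUBLIC" and len(tokens) == 3:
            # Map the public identifier to its local system identifier.
            public[tokens[1]] = tokens[2]
    return override, public
```

Given the catalog text above, looking up "-//W3C//DTD HTML 4.0//EN" in the returned mapping would yield the local file name strict.dtd, which is the substitution that lets validation proceed without fetching the DTD from the Web.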