Where is the HTML5 Document Type Definition?

Solution 1

There is no HTML5 DTD. The HTML5 CR (Candidate Recommendation) explicitly says this when discussing the XHTML serialization, and it clearly applies to the HTML serialization as well.

DTDs have been regarded by the designers of HTML5 as too limited in expressive power, and HTML5 validators (basically the HTML5 mode of http://validator.nu and its copy at http://validator.w3.org/nu/) use schemas and ad hoc checks, not DTD-based validation.

Moreover, HTML5 has been designed so that writing a DTD for it is impossible. For example, there is no SGML way to capture the HTML5 rule that any attribute name that starts with “data-” and complies with certain general rules is valid. In SGML, attributes need to be listed individually, so a DTD would need to be infinite.
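
To make the contrast concrete, here is a minimal sketch of the kind of ad hoc check a validator can run where a DTD cannot: a pattern match on the attribute name instead of an enumerated list. The function name `is_data_attribute` and the regular expression are illustrative assumptions, not taken from any validator; the pattern is a simplified, ASCII-only approximation of the HTML5 rule (the real rule also allows other XML-compatible name characters).

```python
import re

# Simplified approximation of the HTML5 custom data attribute rule:
# the name starts with "data-", has at least one character after the
# hyphen, and contains no ASCII uppercase letters. The real rule allows
# more characters (any XML-compatible name); this ASCII-only pattern
# is an assumption made for illustration.
DATA_ATTR = re.compile(r"data-[a-z0-9_.\-]+")


def is_data_attribute(name: str) -> bool:
    """Return True if `name` looks like a valid data-* attribute name."""
    return DATA_ATTR.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["data-user-id", "data-", "data-userId", "title"]:
        print(candidate, is_data_attribute(candidate))
```

A DTD has no equivalent of the pattern: every permitted attribute name would need its own entry in an attribute list declaration, which is why the set of valid names cannot be written down.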

It is possible to design DTDs that correspond to HTML5 with some omissions and perhaps with some extra rules imposed, but they won’t really be HTML5 DTDs. My experiment with the idea is not very encouraging: too many limitations, too tricky, and the DTD would need to be so permissive that many syntax errors would go uncaught.

Solution 2

Correct. There is no DTD. However, HTML5 documents should start with <!DOCTYPE html>. So there's a DOCTYPE, but no DTD.
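
The difference between a DOCTYPE and a DTD reference can be seen by reading the declaration itself. The following is a rough sketch using Python's standard `html.parser` module; the helper names `doctype_of` and `references_external_dtd` are made up for this example, and the keyword test is a simplification (it does not treat the legacy-compat form `<!DOCTYPE html SYSTEM "about:legacy-compat">` specially).

```python
from html.parser import HTMLParser


class DoctypeCollector(HTMLParser):
    """Collects the text of <!...> declarations such as the DOCTYPE."""

    def __init__(self):
        super().__init__()
        self.declarations = []

    def handle_decl(self, decl):
        # decl is the text between "<!" and ">", e.g. "DOCTYPE html"
        self.declarations.append(decl)


def doctype_of(markup):
    """Return the first declaration found in the markup, or None."""
    parser = DoctypeCollector()
    parser.feed(markup)
    parser.close()
    return parser.declarations[0] if parser.declarations else None


def references_external_dtd(decl):
    """True if the declaration carries a PUBLIC or SYSTEM identifier."""
    keywords = decl.upper().split()
    return "PUBLIC" in keywords or "SYSTEM" in keywords


html5 = "<!DOCTYPE html><html><head><title>t</title></head><body></body></html>"
html4 = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
         '"http://www.w3.org/TR/html4/strict.dtd"><html></html>')

if __name__ == "__main__":
    print(doctype_of(html5), references_external_dtd(doctype_of(html5)))
    print(references_external_dtd(doctype_of(html4)))
```

The HTML 4.01 declaration names a DTD by public identifier and URL; the HTML5 one names nothing.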

See:

Solution 3

I have created an HTML5 DTD for use in my PHP XML projects. It ain't beautiful, but it works with well-formed XHTML5 (that is, HTML5 expressed as XML).

You can grab it from my bitbucket account here:

https://bitbucket.org/kashbridge/dtd/overview

Enjoy!

Solution 4

A certain Marcus from sgmljs.net created and analyzed an SGML DTD for HTML 5.1 and started a thread on the XML-DEV mailing list for review and discussion. So far, the discussion revolves around entity definitions. From his announcement:

I've just completed my analysis of W3C's HTML 5.1 recommendation at http://sgmljs.net/docs/html5.html (from a markup language rather than web development PoV), and I'm publishing it here for review in the form of an initial SGML DTD for parsing HTML 5.1, along with a lengthy analysis text.

[…]

I'm aware that WHATWG and W3C have long since moved away from SGML (and XML in most web-related specification work), treating it as a legacy technique and with a somewhat presumptuous attitude in the specification text and elsewhere. But as the analysis of HTML5's grammar shows, they've essentially abandoned use of any formal methods altogether (and it shows in at least two flaws discussed in the analysis).

Nothing official yet, but maybe this initiative will get traction, or at least find its users as an unofficial resource.

Author by

Šime Vidas

I write dailies for the "Open Web Platform Daily Digest" at webplatformdaily.org. Twitter: http://twitter.com/simevidas

Updated on July 05, 2022

Comments

  • Šime Vidas
    Šime Vidas almost 2 years

    The "old" HTML/XHTML standards have a DTD (Document Type Definition) defined for them:

    HTML 4.01 http://www.w3.org/TR/html401/sgml/dtd.html
    XHTML 1.0 http://www.w3.org/TR/xhtml1/dtds.html#a_dtd_XHTML-1.0-Strict

    These DTDs specify the rules for nesting elements - "which types of elements may appear in which types of elements". I made a diagram for XHTML 1.0 here (sorry, I no longer have that resource)

    I would like to update that diagram with a new version which also includes the new HTML5 elements. However, there doesn't seem to be an HTML5 DTD. It seems that the nesting rules are defined by the various content models that are defined in HTML5.

    So there is no DTD, correct?

    Follow-up question: Is there a reason why there is no DTD in HTML5? The DTD is such a nice method of defining the nesting rules for all the different types of elements. Why wouldn't they include such a thing?

    Update: I found this: http://www.w3.org/TR/html5/dom.html#kinds-of-content I guess this is the closest thing to a DTD.

    Update: The Visual Studio Team made an XML Schema for XHTML5. I guess that answers my question: Link

  • Šime Vidas
    Šime Vidas over 13 years
    @Adam The DOCTYPE has no reference to a DTD, obviously. However, it would be nice if there were an "unofficial" DTD just for the sake of having a good overview of the nesting rules...
  • Adam
    Adam over 13 years
    @Šime Vidas The DTD is from HTML's SGML roots. HTML5 is no longer based on SGML so there is no DTD.
  • Davis Peixoto
    Davis Peixoto over 13 years
    +1 concise answer. Also, it is worth mentioning that HTML5 is currently a working draft, with a bunch of changes in recent months. A DTD makes sense once the spec reaches a stable status, which is not the case right now. It is safe to use some elements and APIs that are stable, but the spec as a whole isn't.
  • Šime Vidas
    Šime Vidas over 13 years
    @Adam But what about XHTML5? It is an application of XML. So, it should have a DTD or XML Schema, right?
  • Adam
    Adam over 13 years
    @Šime Vidas Good point. I didn't know about XHTML5. You're right, it should be possible to create one. I did a quick search to see if anyone had made one and I found johndyer.name/post/… and for HTML5 entities w3.org/2003/entities/2007/w3centities-f.ent
  • Šime Vidas
    Šime Vidas over 13 years
    @Adam Excellent. The link to the XML Schema is here: blogs.msdn.com/b/webdevtools/archive/2009/11/18/…
  • RubenGeert
    RubenGeert over 11 years
    @Adam Sorry to respond to this old thread but if I leave out the DTD, then how can the browser know how I'd like it to interpret the HTML it receives?
  • Adam
    Adam over 11 years
    @pythonforspss.org The browser knows from the doctype that the document is HTML5. Modern browsers know how to interpret HTML5.
  • Jukka K. Korpela
    Jukka K. Korpela about 11 years
    DTD is an SGML and XML thing. An XML DTD is even more limited in expressive power than an SGML DTD; XML is a simplification of SGML in this area, too.
  • pgoetz
    pgoetz about 9 years
    @JukkaK.Korpela - not sure if you still care about this, but the <colgroup> entry in your faux HTML5 DTD seems to be definitely incorrect. The only allowed child is <col>, and this doesn't seem to be included, while invalid children are listed via %phrase;
  • blagus
    blagus over 7 years
    All this is so unfortunate; there are some things that make no sense to do, e.g. a head tag inside a body tag, or a div inside a span. So there should be a way to validate your HTML syntax, just as JavaScript throws an error when you make a logic mistake.
  • Palec
    Palec over 7 years
    I found the info about the initiative interesting and this answer certainly brought more than just a link, @cpburnz (and other reviewers). Another answer in this Q&A has very similar content – just a link and a short description of another unofficial HTML 5 DTD. It got 6 upvotes and no downvotes. I included relevant info from the xml-dev list and I don’t see a better way to answer this question now.
  • Palec
    Palec over 7 years
    The DTD by Jukka K. Korpela has been already mentioned by himself in the accepted answer, @Hibou57.
  • raner
    raner over 2 years
    Another hand-rolled DTD for HTML5 is provided in an answer to this related question
  • raner
    raner over 2 years
    Another answer to the same question also offers a possible solution, though it is not completely clear whether that DTD is actually open-source.
  • That Realty Programmer Guy
    That Realty Programmer Guy about 2 years
    @blagus I agree; though perhaps you hit the nail on the head there. You could write a validator in JavaScript at least
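
The hand-written validator suggested in the comments can be sketched in a few lines. This version is in Python rather than JavaScript and uses only the standard `html.parser` module. The rule table `FORBIDDEN_ANCESTORS` is a hypothetical, deliberately tiny stand-in for the HTML5 content models (it covers only the two cases mentioned in the comments), and the tokenizer-level tag tracking here is not the HTML5 parsing algorithm, so treat it as an illustration of the approach rather than a real conformance checker.

```python
from html.parser import HTMLParser

# Hypothetical rule table: tag -> ancestors it must not appear inside.
# Only the two examples from the discussion are encoded.
FORBIDDEN_ANCESTORS = {
    "div": {"span"},
    "head": {"body"},
}

# Void elements never get an end tag, so they are not pushed on the stack.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class NestingChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open elements
        self.errors = []

    def handle_starttag(self, tag, attrs):
        forbidden = FORBIDDEN_ANCESTORS.get(tag, ())
        for ancestor in self.open_tags:
            if ancestor in forbidden:
                self.errors.append(f"<{tag}> inside <{ancestor}>")
        if tag not in VOID_ELEMENTS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag, if there is one.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass


def check_nesting(markup):
    """Return a list of nesting errors found in the markup."""
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    return checker.errors


if __name__ == "__main__":
    print(check_nesting("<body><span><div>x</div></span></body>"))
    print(check_nesting("<body><div><span>ok</span></div></body>"))
```

A production checker would derive its rules from the content models in the specification instead of a hand-maintained table, which is roughly what the schema-based validators mentioned in the first answer do.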