An XHTML 2 far

Before the weekend the W3C announced that the XHTML2 Working Group would be discontinued. That hardly came as a surprise, and it was mixed with the feeling of relief and melancholy that the death of a terminally ill patient may elicit. To me XHTML2 was the next HTML3: another ill-fated W3C spec, discontinued at an early stage and superseded by a browser-supported spec, HTML 3.2. The difference was that I had an inside view of XHTML2. …

Close to five years ago I withdrew from the working group, as it had become evident that I was wasting my time and Opera’s money. More depressingly, the HTML Working Group, as it was known then, was one of the best-functioning technical working groups I’ve encountered inside or outside of the W3C. Participating in standards committee work doesn’t pay for anyone but the largest companies. It is much better to just implement what is eventually specified, ignore it if it is unusable, or write suggestions if the specs are nearly usable but deficient on specific points.

HTML5 is a vastly better spec than HTML 3.2 was, and vastly bigger, and just as HTML4 later took up most of the good ideas in HTML3, HTML5 is already subsuming XHTML2.

Much of the animosity has unsurprisingly been about syntax; this goes with the coding mentality. Coders can easily spend more time quarrelling about how many spaces should be used for code indentation, or whether tabs should be used instead, than they spend discussing the actual code. Early on this took the guise of a holy war: XML was sent from heaven to cleanse us of the sin that HTML had been tainted with. The W3C took this as gospel, as did all its working groups, including the HTML Working Group. All working groups, that is, except the CSS Working Group (or for that matter RDF), which had long been criticised for not converting to XML.

This is all the more a pity, because for much more pragmatic reasons XML is a good thing. I listed two reasons to like XML: it separates errors into two classes, syntax errors and expectation errors, and it has nicer character set handling. The pros and cons of XML are too much to cover in this entry, but XML had an important impact on the HTML Working Group: XML issues kept it occupied for a decade. First came XHTML 1.0, which was HTML 4.0 with an XML syntax. Then XHTML Basic, which was supposed to be a device-friendly subset of XHTML 1.0. Then XHTML Modularization, another way to split up XHTML 1.0 so that subsets of the HTML vocabulary could be used by other specifications, though what it showed most of all was that DTDs are pretty much useless for anything. Then XHTML 1.1, which was essentially XHTML 1.0 Strict with a few changes and with Ruby, a Japanese annotation system, added. Then a number of other XHTML subsets to show that modularisation was feasible, all promptly ignored.
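As a rough illustration of the two error classes, here is a minimal Python sketch using the standard library’s `xml.etree.ElementTree`. A malformed snippet is rejected by the parser itself (a syntax error), while a well-formed snippet containing an element the application did not expect parses fine and has to be caught by the application (an expectation error). The `EXPECTED` vocabulary and the `classify` helper are hypothetical, made up for this example only.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary the application expects (illustrative only).
EXPECTED = {"html", "head", "title", "body", "p"}


def classify(markup: str) -> str:
    """Return 'syntax error', 'expectation error' or 'ok' for a snippet."""
    try:
        root = ET.fromstring(markup)
    except ET.ParseError:
        # The XML parser itself refuses input that is not well-formed.
        return "syntax error"
    # Well-formed input always parses; checking the vocabulary
    # is left to the application.
    unexpected = {el.tag for el in root.iter()} - EXPECTED
    return "expectation error" if unexpected else "ok"


if __name__ == "__main__":
    print(classify("<html><body><p>Hello</p></body></html>"))
    print(classify("<html><body><p>Hello</body></html>"))
    print(classify("<html><body><blink>Hi</blink></body></html>"))
```

The split is the point: the parser never has to guess what a mismatched tag meant, and the application never has to deal with anything but a well-formed tree.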

All in all the working group spent a decade XMLifying HTML, often dropping features but never adding new ones or improving existing features of HTML4. Apart from XHTML being pure and clean, and latterly modular, there was no compelling reason to choose it over HTML4, and there were a good number of reasons to choose HTML4 over XHTML. In the beginning the browsers had enough to do catching up with HTML4, so this detour didn’t matter, but towards the end it began to chafe. By then work on XHTML 2 had begun. It was never officially stated, but one of the chief motivations for XHTML2 was to provide compelling reasons to switch from HTML to XHTML, something the previous specs had failed to do.

Also unstated: XHTML2 was a Trojan elephant for XForms. The other changes in XHTML2 were fairly minor, all things considered, but this one was not. This was what started WHATWG, Web Forms 2, Web Applications, and the browser rebellion that caused the shift in W3C governance of HTML. Whether HTML was serialised in XML or not was of lesser importance, though the rebels were unquestionably syntax warriors as well, and the dual serialisation we have ended up with for HTML5 is a triple win. XForms still exists, but is no longer tightly coupled with (X)HTML. Browsers these days are capable of supporting XForms, but whether they will depends on what web developers want. So far there has been no great uptake, but then again neither has there been for Web Forms 2.

With the death of XHTML2, the incentive to choose the XML serialisation over the HTML serialisation, because the former would give you access to new XHTML2 features that HTML4 lacks, is gone as well. The same goes for the working group’s rationale for not publishing HTML4 errata, namely that XHTML is the future; HTML5 in effect incorporates the HTML4 errata. This leaves it up to HTML authors to pick a serialisation. Choice is always bad, but in this case we can live with it. It is fairly probable that the serialisation of HTML6 will be either HTML or XML exclusively; which one, we will find out in the decades to come.

In a way it is flattering that many of the new features named as XHTML2 advantages were originally proposed by me, though in all cases they had precedents, and I am happy that they have ended up in HTML5 in a better form. This includes “href everywhere”, in other words that any element with an [href] attribute denotes a hyperlink, not just a[href] (for those of an XPath inclination that would be written @href, but why squabble over syntax?). This was based on some observations on how links are handled, but also on CLink and on the fact that any attribute expecting a URI in effect makes its element a reference or link. A document I wrote comparing CLink, XLink, and HLink is offline now that I no longer work for Opera, but the CLink reference above should do the trick. The crux of my argument is that the markup should provide the link (in the XLink sense), while the hyperlink is a matter of presentation.
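The difference between the two rules can be sketched in Python with the limited XPath support in the standard library’s `xml.etree.ElementTree`, where the selectors `.//a[@href]` and `.//*[@href]` mirror a[href] and [href]. The sample markup, with `href` on `li` and `h2`, is hypothetical and XHTML2-flavoured; it only shows which elements each rule would treat as links and is not valid HTML4 or HTML5.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML2-style snippet: href appears on several element types.
DOC = """
<body>
  <p>See <a href="a.html">A</a> for details.</p>
  <ul>
    <li href="b.html">B</li>
    <li>No link here</li>
  </ul>
  <h2 href="c.html">C</h2>
</body>
"""


def anchor_links(root):
    """Classic rule: only 'a' descendants with an href are links (a[href])."""
    return [el.get("href") for el in root.findall(".//a[@href]")]


def href_everywhere_links(root):
    """'href everywhere' rule: any descendant with an href is a link ([href])."""
    return [el.get("href") for el in root.findall(".//*[@href]")]


if __name__ == "__main__":
    root = ET.fromstring(DOC.strip())
    print(anchor_links(root))
    print(href_everywhere_links(root))
```

Under the first rule only `a.html` is a link; under the second, every element carrying a URI in `href` is one, in document order.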

Contextual headlines, whose level is based on the ‘section’ depth, were improved upon by the HTML5 version, which uses ‘h1’ instead of adding a new ‘h’ element. Like the elements above, the line or ‘l’ element, which denotes the lines themselves instead of the ‘br’ breaks between them, was based on the possibility of styling line elements; styling ‘br’ is less interesting.
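The idea behind contextual headlines, where a heading’s rank follows from how deeply it is nested in ‘section’ elements, can be sketched as a small recursive walk in Python. This is a simplification under stated assumptions: it uses the XHTML2-style generic ‘h’ element on a made-up document, counts only ‘section’ ancestors, and caps the level at 6; it is not the HTML5 outline algorithm.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML2-style document using a generic 'h' heading element.
DOC = """<body>
  <h>Top</h>
  <section>
    <h>Chapter</h>
    <section>
      <h>Subsection</h>
    </section>
  </section>
  <section>
    <h>Another chapter</h>
  </section>
</body>"""


def heading_levels(element, depth=0):
    """Yield (level, text) for each 'h', with level = enclosing sections + 1."""
    for child in element:
        if child.tag == "h":
            # Cap at 6, since h1-h6 is all that HTML offers.
            yield min(depth + 1, 6), (child.text or "").strip()
        elif child.tag == "section":
            # Entering a section makes its headings one level deeper.
            yield from heading_levels(child, depth + 1)
        else:
            yield from heading_levels(child, depth)


if __name__ == "__main__":
    root = ET.fromstring(DOC)
    for level, text in heading_levels(root):
        print(f"h{level}: {text}")
```

The same walk works if the headings are written as ‘h1’ instead of ‘h’, which is essentially the HTML5 approach of reusing an existing element.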

Join the Conversation

  1. “Choice is always bad […]. It is fairly probable that the serialisation of HTML6 will be either HTML or XML exclusively” Why? I see advantages in both HTML and XHTML, and I don’t see why that would change. XHTML is extensible, and there are plenty of (corporate) applications that need it to be. HTML is not extensible, and the web needs it not to be.

  2. Given the current time-scale, an HTML6 would be formalised in the 2020s to 2030s. By then I would expect a slow migration from one to the other, or even a non-draconian XML 5.0 inspired by the XHTML5 work on error handling. The (non-)extensibility of HTML and XHTML would warrant a second blog entry. As a format the serialisation of HTML is not such a big deal as it has been made out to be.

  3. Well, there’s the exact same debate between compiled and interpreted languages. That discussion has been going on for decades, and both types of languages still exist. Some like the flexibility. Others don’t, and prefer (“draconian”) blocking errors to be reported, as a kind of initial debugging system. And sometimes, the errors should be handled instead. Having two syntaxes to represent one infoset makes a lot of sense to me. I like the idea of a non-draconian XML syntax, but I’d love to see the equivalent draconian syntax kept alive!

