
Anomaly ~ G. Wade Johnson

October 06, 2005

To XML or not to XML...

I've been seeing a lot of comments in various forums similar to this one by Christopher Diggins: "XML down a slippery slope". Most of them imply that XML is the solution to all problems and that anything that violates the spirit of XML is to be condemned.

I have personally been working with XML since just after XML 1.0 became a recommendation. I've been through the "everything in XML" phase and come through it alive. The most important issue that people need to understand is that XML is just a data (or document) format; it is not a religion. It is a very useful data format. It can be a powerful way to represent some kinds of data, but it is not always the best way.

For example, I doubt that anyone would really recommend that we use the following XML anywhere:

<number type="integer" sign="positive">
  <thousand>2</thousand>
  <hundred>0</hundred>
  <ten>0</ten>
  <ones>5</ones>
</number>

Obviously, 2005 is more useful, even though marking up the number as above would be more in the spirit of XML. After all, you might need to do something specific with all numbers that have a 2 in the thousands position, and you'll need a micro-parser to deal with this embedded format if you don't mark it up.
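To make the comparison concrete, here is a minimal Python sketch of both approaches. The function names and the `PLACE_VALUES` table are assumptions made up for this illustration; only the markup itself comes from the example above.

```python
import xml.etree.ElementTree as ET

MARKED_UP = """
<number type="integer" sign="positive">
  <thousand>2</thousand>
  <hundred>0</hundred>
  <ten>0</ten>
  <ones>5</ones>
</number>
"""

# Place value of each child element used in the example markup (hypothetical table).
PLACE_VALUES = {"thousand": 1000, "hundred": 100, "ten": 10, "ones": 1}


def value_from_markup(xml_text: str) -> int:
    """Rebuild the integer from the element-per-digit form."""
    root = ET.fromstring(xml_text.strip())
    total = sum(PLACE_VALUES[child.tag] * int(child.text) for child in root)
    return -total if root.get("sign") == "negative" else total


def thousands_digit(plain_text: str) -> int:
    """The 'micro-parser' for the plain form: a line of integer arithmetic."""
    return abs(int(plain_text)) // 1000 % 10


print(value_from_markup(MARKED_UP))  # 2005
print(thousands_digit("2005"))       # 2
```

The "micro-parser" for the plain number turns out to be a single expression, which is roughly why nobody marks up digits.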

Some may consider this to be a bit of a strawman argument, but I would like to propose that it is actually one point along a continuum. Individual numbers and words obviously do not (necessarily) need to be marked up in XML. Just as obviously, a complicated, nested data structure or document greatly benefits from XML markup or something similar.

Diggins references a previous article, XML.com: Painting by Numbers with SVG, which discusses some of the reasons the SVG recommendation uses a micro format for the path element. According to the SVG Working Group, an element-based path format could easily have produced documents twice the size of those using the chosen approach. The working group decided that this was not acceptable.
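The price of the micro format is that a consumer needs a small parser for the path data. As a rough sketch of what that involves, the Python below handles only absolute moveto (M) and lineto (L) commands with one coordinate pair each. The real SVG path grammar has many more commands and shorthand rules, so treat `parse_path` as a hypothetical illustration rather than a usable SVG parser.

```python
import re

# One command letter followed by everything up to the next command letter.
COMMAND = re.compile(r"([ML])([^ML]*)")
# A plain decimal number with an optional sign (no exponents in this sketch).
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def parse_path(d: str):
    """Split simplified path data into (command, x, y) tuples.

    Assumes every command is an absolute M or L followed by exactly
    one coordinate pair; anything else raises ValueError.
    """
    segments = []
    for letter, args in COMMAND.findall(d):
        x, y = (float(n) for n in NUMBER.findall(args))
        segments.append((letter, x, y))
    return segments


print(parse_path("M10 20 L30 40 L50.5 -60"))
# [('M', 10.0, 20.0), ('L', 30.0, 40.0), ('L', 50.5, -60.0)]
```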

In fact, their foresight has paid off. One of the areas where SVG has done very well is in mapping. Maps tend not to have many regular shapes. They are mostly built from paths. For those kinds of documents, the increase in size may be a factor of three or higher. In addition, some of these maps are very large. A factor of three for a multi-megabyte file is very different from a factor of three for a 10K file.
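The size argument is easy to check with a back-of-the-envelope script. The Python sketch below encodes the same list of points twice: once in the path micro format, and once in an element-per-segment form. The `moveto` and `lineto` element names are invented for this comparison and are not part of SVG, and the exact ratio depends on the tag names and coordinates chosen.

```python
def compact_path(points):
    """Encode points in the path micro format: 'Mx y Lx y ...'."""
    commands = []
    for i, (x, y) in enumerate(points):
        letter = "M" if i == 0 else "L"
        commands.append(f"{letter}{x} {y}")
    return '<path d="' + " ".join(commands) + '"/>'


def element_path(points):
    """Encode the same points with one hypothetical element per segment."""
    parts = ["<path>"]
    for i, (x, y) in enumerate(points):
        tag = "moveto" if i == 0 else "lineto"
        parts.append(f'<{tag} x="{x}" y="{y}"/>')
    parts.append("</path>")
    return "".join(parts)


# An irregular polyline standing in for a map outline.
points = [(i * 3, (i * 7) % 100) for i in range(1000)]
compact = compact_path(points)
verbose = element_path(points)
print(len(compact), len(verbose), round(len(verbose) / len(compact), 2))
```

With short integer coordinates like these, the element form comes out well over twice as large, because the fixed per-segment markup outweighs the coordinates themselves.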

The important thing to remember is that XML is not a religion; it is only a tool we use in solving problems. Every tool involves tradeoffs. Sometimes we decide in the direction of purity of expression; sometimes we bow to practicality. While I personally don't particularly like the path element, I understand the tradeoffs involved. Just as importantly, I'm not sure I would have done anything differently in their shoes.

It may be useful to consider that the people who worked on these standards were not stupid. They thought about and worked on the standard for a long time. Any compromises they made to purity were probably carefully considered. This does not mean that they are always right, but it does mean that second-guessing them without considering all of the use cases they considered might be a bit rash.

Posted by GWade at October 6, 2005 07:14 AM.