Anomaly ~ G. Wade Johnson

February 05, 2004

XML Data Representation

I had an interesting thought in an email conversation with a friend yesterday. One problem many people have when using XML for data is a misunderstanding of what the XML is.

(If you don't believe in the data in XML approach, feel free to ignore me.<grin/>)

When you are first learning to use XML this way, it's easy to make the mistake of treating the XML as if it is the data. But it is really important to realize that the XML represents the data; it is not the same as the data.

You would never have a problem with the idea that a line chart or pie chart is not the data, just a representation of the data. XML is just another representation.

How does that help? In much the same way that you decide to add or remove information from a line chart to make it serve its purpose better, you can do the same with XML. Let's look at some of the representation-only issues you consider when making a line chart. The most obvious information removed is the actual values. On a line chart, the trends and relative levels matter more than the exact numbers. On the other hand, color is often added to a line chart to differentiate between kinds or levels of data. Error bars are sometimes added to enhance your understanding of the fuzziness of the data.

None of these changes actually alters the data; they only change the representation. In some cases, they might add implied information (error range, data grouping) or remove unneeded details (values). But the data remains.

I have realized that the same is true of XML (when used for data). You may include structure or grouping that isn't evident in the raw values. You may add scaling or units that are only implied in the original data. You can even add links to explanations of results. This allows for a richer representation of the data, so you really aren't limited to how you represented your data inside your application. I have sometimes marked up a data set with exactly those pieces of implied information that have always given me problems when communicating between programs. Since I was using XML as an interchange format, making the implied assumptions explicit simplified the overall project.
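
As a minimal sketch of that idea, here is a small Python example. Everything in it is made up for illustration: the sample readings, the element names (readings, sensor, value) and the unit attribute are assumptions, not taken from any real project. It takes bare numbers as an application might hold them, writes an XML representation that states the unit and the grouping explicitly, and then reads the same data back out unchanged.

```python
import xml.etree.ElementTree as ET

# Hypothetical raw data as an application might hold it: bare numbers,
# with the unit and the grouping known only to the programmer.
readings = [("inlet", 20.5), ("inlet", 21.0), ("outlet", 35.2)]


def to_xml(rows, unit):
    """Build an XML representation that makes unit and grouping explicit."""
    root = ET.Element("readings", {"unit": unit})
    groups = {}
    for sensor, value in rows:
        group = groups.get(sensor)
        if group is None:
            # First reading for this sensor: create its grouping element.
            group = ET.SubElement(root, "sensor", {"name": sensor})
            groups[sensor] = group
        ET.SubElement(group, "value").text = repr(value)
    return root


def from_xml(root):
    """Recover the original rows; the data itself is unchanged."""
    return [
        (group.get("name"), float(value.text))
        for group in root.findall("sensor")
        for value in group.findall("value")
    ]


doc = to_xml(readings, unit="celsius")
xml_text = ET.tostring(doc, encoding="unicode")
print(xml_text)

# The richer representation still carries exactly the same data.
assert from_xml(ET.fromstring(xml_text)) == readings
```

The unit and the per-sensor grouping appear only in the XML, not in the list of tuples. A program receiving this document no longer has to guess at them, yet the values that come back out are the same ones that went in.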

Posted by GWade at February 5, 2004 05:45 PM.