Wednesday, February 22, 2006

XML, Future Wave of the Future

It is my goal here at OaO to write about math/science/engineering in a way that is neither math-y, nor science-y, nor engineering-y. It is a task that I constantly keep in mind whilst writing. It is, concurrently, a task at which I am consistently failing. This, apparently, is because I never explain the (supposedly) non-technical narrative model I'm using to talk about things. I am bad writer. But is okay. I fix problem. I start with XML.

You can look up the wikipedia entry for XML as well as I can, so I won't repeat anything definitional; that definition also lacks any of the nuances I was trying to invoke in my last entry anyway. In the comments, Sam has asked the existential question, "So is XML like a grammar or something?" Yes, it is, sort of. But it's more of a language for making languages. It's a language about language. A meta-language if you will. Mmm...meta.

Here are some key points of the nuts and bolts: xml is made up of tags which form a tree of information about something. Here's a sample one:

<people>
<person type="freak of nature">
<name>Sam</name>
<occupation>Professor</occupation>
<iPod>true</iPod>
<blog>
<url>http://secondamericano.blogspot.com</url>
<host>Blogger</host>
<subject>
How you can't really get down
to work until your second one.
</subject>
<coauthor ref="person" name="Rebecca"/>
</blog>
</person>
</people>

This tells you there's a class of things called "people." One of the members of that set is a "person," which apparently come in different types, like "freak of nature." Then a person has a bunch of attributes, like his name and his job, and whether he owns an iPod. Then in this model a person has a sub-attribute called "blog" which tells you information about his blog. Note that the blog co-author has the refence type "person"--this gives us a clue that somewhere else in this document, there might be a "person" named "Rebecca" defined. Note also that this tag doesn't have a close tag the way the others do--if we define all of the tag's attributes within the tag, we can just end it by putting a "/" before the closing ">".

The real key here, though, is what this document doesn't say. It doesn't say anything about what you should do with this information. It doesn't say how you should read it, or display it, or feed it into some sort of computer program. The point of XML is that other than tags and attributes (which are themselves pretty flexible), the people who pass these documents around decide what they mean. You could, for instance, write a little plug-in for your browser that read this document and produced an image of a person. The image could have a name tag that read "Sam," it could put him in his graduate robes since he's a professor, and he could be carrying a little iPod icon. Next to him could be an image of a computer that would navigate to his blog when you clicked on it. So an XML document can be extremely meaningful, or totally meaningless. It's just like HTML--HTML is meaningful to a web browser and meaningless some simple text-editing program. The Browser gets it and displays things in fonts and nicely formatted with colors and images and such. A text editor opens it up and doesn't know any of the rules that go with HTML, so lacking anything better to do, just shows you the text.

This is what a newsreader does with an RSS feed. An RSS feed is a simple XML document that you get via a web service call, and contains the bare minimum of data about the news:

<thenews>
<story>
<headline>
Cheney Shoots Bush in Hunting "Accident"
</headline>
<url>
http://www.nytimes.com/stories/accident.html
</url>
<summary>
Vice President Dick Cheney "accidentally" "shot"
President George Bush today whilst the two were
in the oval office. A spokesman said they were
"hunting" at the time.
</summary>
</story>
<story>
<headline>
Neoconservatives: Ethics Are Post-Hoc
Rationalizations To Justify What People Were
Already Going To Do Anyway
</headline>
<url>
http://www.washingtonpost.com/neocons.html
</url>
<summary>
The Project For The New American Century today
released a report revealing that the entire
foundation of Ethics is built upon a facade people
use to justify what they were already going to do
anyway. It is expected that this revelation will
have no effect on anything whatsoever.
</summary>
</story>
</thenews>


A newsreader or browser can take this and pretty much do whatever it wants with it, since nobody has really developed a standard for what RSS feeds should "look like." And, frankly, there's no real need to--the idea behind RSS feeds is that they're supposed to be simple, and that any style and formatting will be left up to the displayer. This is in marked contrast to HTML, where the formatting and styles are rigidly defined.

Calvino: I get it! HTML is a form of XML, except that HTML came first.

The Stoat: Yes. Ish. Technically, they are both offshoots of SGML, and HTML in its original form doesn't quite work as XML, but the W3 Consortium is working on a new standard for HTML called XHTML to fix that.

Calvino: Isn't "consortium" a great word?

The Stoat: It is.

Maybe you are, at this point, seeing the abstract beauty of XML and its derivations that exists outside of its rather abstract and technical nature. Or maybe not. Whatever. The examples I have given above are simple XML documents. You could also, as in HTML, rigidly define how those documents should be "understood." But it's a much more interesting metaphor if people just ignore the definitions, or if you just don't publish any. An XML document can have multiple translations, and all of them are meaningful, and they can all still be "true" translations of the original material. What I was trying to point out yesterday is that XML is all about Reader Response. You read the XML. You divine your own meaning. You build models and tools out of that meaning. You publish. The world is made new again. Repeat ad infinitum.

Next: Language doesn't work that way!
Tags: , ,

2 comments:

Rebecca said...

thesis du jour:
XML is Derridean. discuss.

Seriously. it's not reader-response. that's too structuralist. it's something better, more poetic. more about the role of the reader in (re)creating the text through reading. producing something new each time you read. very cool

Sam said...

Where the hell did my comment go? Was it removed by the 'moderator' (i.e. Calvino)?

Anyway, 24 hours ago, I said this:

Excellent! I GET it. And also, Rebecca is right that it's far beyond reader-response. I wish I could go write something in XML - perhaps the next chapter of my book.