Wednesday, December 06, 2006

Representing Knowledge using XML

Yesterday I was reading a book on XML and the structure of an XML document, and it suddenly struck me that XML can be used to represent any kind of data.

1. XML is a metalanguage that lets users define the markup of their documents using tags. Users can define their own tags and are not limited to a fixed set of predefined tags / keywords, as they are in HTML.

2. Nesting of tags introduces structure. The structure of a document can be constrained using DTDs or XML Schemas, which makes it possible to validate the data before it goes for processing.

3. XML separates content and structure from formatting. XML is meant to carry data only, which leaves plenty of room for programs to present it in different ways. Using appropriate XSL stylesheets (XSLT, plus XSL-FO for PDF), we can generate PDF, CSV, HTML and many other outputs from a single input XML document.

4. XML is the de facto standard for representing structured information on the web, and it supports machine processing of that information.

5. XML supports the exchange of structured information across different applications through markup, structure and transformations.

6. XML documents can be queried using XML query languages such as XPath and XQuery.
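
To make points 1, 2 and 6 a bit more concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree module. The tag names (library, book, title, author, year) and the two helper functions are made up for illustration; they are not from the book I was reading. Note also that ElementTree only supports a small subset of XPath, not full XQuery, so treat this as a rough illustration of querying rather than the real thing.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: these tag names are invented for this example.
# Nothing in XML itself predefines them, which is the point of item 1.
CATALOG_XML = """
<library>
  <book id="b1">
    <title>Learning XML</title>
    <author>Jane Doe</author>
    <year>2003</year>
  </book>
  <book id="b2">
    <title>Knowledge Representation</title>
    <author>John Roe</author>
    <year>2005</year>
  </book>
</library>
"""


def titles_published_after(xml_text, year):
    """Return the titles of all books whose <year> is greater than `year`."""
    root = ET.fromstring(xml_text)
    titles = []
    # Nesting gives structure: each <book> is a child of <library>.
    for book in root.findall("./book"):
        if int(book.findtext("year")) > year:
            titles.append(book.findtext("title"))
    return titles


def title_by_id(xml_text, book_id):
    """Look up a book title by its id attribute, or return None if absent."""
    root = ET.fromstring(xml_text)
    # ElementTree understands a limited XPath subset, including
    # attribute predicates such as [@id='b1'].
    node = root.find(f"./book[@id='{book_id}']/title")
    return node.text if node is not None else None


if __name__ == "__main__":
    print(titles_published_after(CATALOG_XML, 2004))
    print(title_by_id(CATALOG_XML, "b1"))
```

A DTD or Schema would additionally let us reject a document where, say, a book has no title before this code ever runs. The standard library does not validate against either, so that step is left out of the sketch.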

While XML has many advantages, it also has a few drawbacks, described below:

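1. Nesting of tags has no standard meaning. The same nesting could stand for quite different relationships between the outer and inner element, so it does not always reflect the intended structure of the data.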

2. The semantics of an XML document are accessible to humans but not to machines. A machine just parses the document into child nodes, attributes and so on; the tag names carry no special meaning that it can understand.

3. Collaboration and exchange are supported only if there is an underlying shared understanding of the vocabulary. XML is well suited for closed collaboration, where domain or community-based vocabularies are used, but not so well suited for global communication. As long as the participating organizations use the same schema, they can share XML documents and interpret them. There is no such thing as a global representation that everyone understands.
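
Drawbacks 2 and 3 are easy to see in a few lines of Python. In the sketch below, two hypothetical organizations describe the same book with different tag names. The documents, the B_TO_A mapping and the as_record helper are all assumptions I made up for illustration. To the parser the two documents simply look different; they only line up once a human supplies the mapping between the vocabularies.

```python
import xml.etree.ElementTree as ET

# Two hypothetical organizations describing the same book with
# different, privately chosen vocabularies.
ORG_A = "<book><title>Learning XML</title><author>Jane Doe</author></book>"
ORG_B = "<publication><name>Learning XML</name><writer>Jane Doe</writer></publication>"

# The shared understanding has to come from outside the documents.
# Here it is a hand-written mapping from B's tag names to A's.
B_TO_A = {"name": "title", "writer": "author"}


def as_record(xml_text, tag_map=None):
    """Flatten a one-level document into {tag: text}, optionally renaming tags."""
    tag_map = tag_map or {}
    root = ET.fromstring(xml_text)
    return {tag_map.get(child.tag, child.tag): child.text for child in root}


# Without the mapping the parser has no way to know that <writer>
# and <author> mean the same thing, so the records differ.
naive_match = as_record(ORG_A) == as_record(ORG_B)

# With the agreed mapping applied, the two records are identical.
mapped_match = as_record(ORG_A) == as_record(ORG_B, B_TO_A)

if __name__ == "__main__":
    print("match without shared vocabulary:", naive_match)
    print("match with shared vocabulary:", mapped_match)
```

The mapping works fine between two parties who agree on it, which is the closed collaboration case. It does not scale to everyone on the web agreeing on one.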

While the benefits appear to outweigh the drawbacks, I think there is still a lot of work to be done before XML can be used to represent data in a way that machines can understand.

A common semantics or grammar is required to do so. What is your opinion?

Until Next Time :)