Extensible Markup Language (XML)

 

XML describes a class of data objects called XML documents and partially describes the behavior of computer programs that process them. XML documents consist of storage units called entities, which contain either parsed or unparsed data. Parsed data consists of characters, some of which form markup; the rest form character data. Markup encodes a description of the document’s logical structure. Unparsed data is a resource whose content may or may not be text and, if it is text, need not be XML. A software module called an XML processor is used to read XML documents and provide access to their structure and content.
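These terms can be seen together in one small, self-contained document. The sketch below is illustrative only: the element `note`, the entities `arc` and `louvrePhoto`, the notation `jpeg`, and the file `louvre.jpg` are hypothetical names. `arc` is a parsed entity whose replacement text becomes part of the character data; `louvrePhoto` is an unparsed entity, a non-XML resource that the document can refer to only through an attribute of type ENTITY.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE note [
  <!ELEMENT note (#PCDATA)>
  <!ATTLIST note photo ENTITY #IMPLIED>
  <!-- Parsed entity: its replacement text is parsed as part of the document -->
  <!ENTITY arc "ArcWorld">
  <!-- Unparsed entity: a (hypothetical) JPEG file, identified by a notation -->
  <!NOTATION jpeg SYSTEM "image/jpeg">
  <!ENTITY louvrePhoto SYSTEM "louvre.jpg" NDATA jpeg>
]>
<!-- The tags and the arc entity reference are markup;
     "Welcome to " and "!" are character data -->
<note photo="louvrePhoto">Welcome to &arc;!</note>
```

An XML processor expands `&arc;`, so the application sees the text "Welcome to ArcWorld!". It does not parse `louvre.jpg`; it only passes the entity's system identifier and notation on to the application.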

 

XML was designed to be the standard format for describing and exchanging structured data on the Web. XML with a DTD is suitable for representing both the relational data model (which consists of tables and is flat) and the object-oriented data model (which consists of classes and is hierarchical). However, DTD is quite simple, so the following features of the relational data model cannot be directly represented in DTD:

 

- Primary keys and secondary keys

- Data types other than strings

- Constraints

For the same reason, the following features of the object-oriented data model cannot be directly represented in DTD:

 

- Keys

- Data types other than strings

- Inheritance
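The closest a DTD comes to keys is the ID/IDREF attribute type. The sketch below is a hypothetical, cut-down variant of the ArcWorld vocabulary (not the ArcWorld.dtd used later): `City/@name` is declared as an ID and `Building/@city` as an IDREF. It also shows the data-type gap: `yearBuilt` is CDATA, so a value that is not a year still validates.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ArcWorld [
  <!ELEMENT ArcWorld (City*, Building*)>
  <!ELEMENT City EMPTY>
  <!-- ID: unique across the whole document, the nearest DTD analogue of a key -->
  <!ATTLIST City name ID #REQUIRED>
  <!ELEMENT Building EMPTY>
  <!-- IDREF: must match some ID, but the DTD cannot require it to be a City's ID -->
  <!ATTLIST Building
    name      CDATA #REQUIRED
    city      IDREF #IMPLIED
    yearBuilt CDATA #IMPLIED>
]>
<ArcWorld>
  <City name="Paris"/>
  <!-- Valid: yearBuilt is only a string, so the DTD cannot demand a number -->
  <Building name="Louvre" city="Paris" yearBuilt="long ago"/>
</ArcWorld>
```

Changing `city="Paris"` to `city="Pariss"` would make the document invalid, because no element carries that ID. Note, too, that ID values must be XML names, so a city named "New York" could not serve as its own ID.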

 

XML has a hierarchical, containment-based data model; consequently it can be used to advantage when representing an object-oriented data model.

 

For example, we can easily represent the fact that a Country may contain Cities and that, when it does, the City that it contains can reference it. 
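One way to express this in a DTD is to nest City elements inside Country. The sketch below is a hypothetical variant, not the flat ArcWorld.dtd that follows; the root element `World` and the `id` values `FR` and `CN` are assumptions. Nesting records the containment, and an IDREF attribute lets each City point back at its Country explicitly.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE World [
  <!ELEMENT World (Country*)>
  <!-- Containment: a Country holds zero or more City elements -->
  <!ELEMENT Country (City*)>
  <!ATTLIST Country
    id   ID    #REQUIRED
    name CDATA #REQUIRED>
  <!ELEMENT City EMPTY>
  <!-- Back-reference from a City to the Country that contains it -->
  <!ATTLIST City
    name    CDATA #REQUIRED
    country IDREF #IMPLIED>
]>
<World>
  <Country id="FR" name="France">
    <City name="Paris" country="FR"/>
  </Country>
  <Country id="CN" name="China">
    <City name="Beijing" country="CN"/>
  </Country>
</World>
```

The IDREF is partly redundant, since a program can also reach a City's Country through its parent element; the DTD, however, cannot enforce that the two agree.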

 

Each attribute declaration specifies whether the attribute’s presence is required and, if not, how an XML processor should react when the attribute is absent:

 

- #REQUIRED – the attribute must always be provided.

- #IMPLIED – the attribute is optional, and no default value is provided.

- Default value – the default value is used if the attribute is not provided.

- #FIXED default value – the attribute must always have the default value.
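The four options can be compared in a single declaration. The element `Building` comes from the ArcWorld example, but the attributes `status` and `unit` and their values are hypothetical, added only to illustrate the last two options.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Building [
  <!ELEMENT Building EMPTY>
  <!ATTLIST Building
    name   CDATA #REQUIRED
    style  CDATA #IMPLIED
    status CDATA "standing"
    unit   CDATA #FIXED "meters">
]>
<!-- name is #REQUIRED: omitting it would make the document invalid.
     style is #IMPLIED: it is simply absent here.
     status receives its default "standing"; unit is #FIXED at "meters". -->
<Building name="Louvre"/>
```

An XML processor that reads these declarations reports `status="standing"` and `unit="meters"` for this element. Writing `unit="feet"` would be a validity error, because a #FIXED attribute may take only its declared value.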

 

ArcWorld.dtd

 

<?xml version="1.0" encoding="UTF-8"?>
<!--ArcWorld.dtd-->
<!ELEMENT ArcWorld (Country*, City*, Architect*, Building*, Church*)>

<!ELEMENT Country EMPTY>
<!ATTLIST Country
  name CDATA #REQUIRED>

<!ELEMENT City EMPTY>
<!ATTLIST City
  name    CDATA #REQUIRED
  country CDATA #IMPLIED>

<!ELEMENT Architect EMPTY>
<!ATTLIST Architect
  name        CDATA #REQUIRED
  nationality CDATA #IMPLIED>

<!ELEMENT Building EMPTY>
<!ATTLIST Building
  name        CDATA #REQUIRED
  type        CDATA #IMPLIED
  address     CDATA #IMPLIED
  city        CDATA #IMPLIED
  yearBuilt   CDATA #IMPLIED
  architect   CDATA #IMPLIED
  style       CDATA #IMPLIED
  description CDATA #IMPLIED>

<!ELEMENT Church EMPTY>
<!ATTLIST Church
  name         CDATA #REQUIRED
  type         CDATA #IMPLIED
  address      CDATA #IMPLIED
  city         CDATA #IMPLIED
  yearBuilt    CDATA #IMPLIED
  architect    CDATA #IMPLIED
  style        CDATA #IMPLIED
  description  CDATA #IMPLIED
  denomination CDATA #IMPLIED
  pastor       CDATA #IMPLIED>
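The Church declaration repeats every Building attribute, which is how the missing inheritance shows up in practice. A parameter entity can at least remove the duplication. The sketch below is one possible refactoring (the entity name `buildingAttrs` is hypothetical); it belongs in an external DTD such as ArcWorld.dtd, because parameter-entity references inside declarations are not allowed in a document's internal subset.

```xml
<!-- Hypothetical parameter entity holding the attributes Building and Church share -->
<!ENTITY % buildingAttrs
  "name        CDATA #REQUIRED
   type        CDATA #IMPLIED
   address     CDATA #IMPLIED
   city        CDATA #IMPLIED
   yearBuilt   CDATA #IMPLIED
   architect   CDATA #IMPLIED
   style       CDATA #IMPLIED
   description CDATA #IMPLIED">

<!ELEMENT Building EMPTY>
<!ATTLIST Building %buildingAttrs;>

<!ELEMENT Church EMPTY>
<!-- Church reuses the shared attributes and adds its own -->
<!ATTLIST Church
  %buildingAttrs;
  denomination CDATA #IMPLIED
  pastor       CDATA #IMPLIED>
```

This is textual reuse, not inheritance: a validator still sees two unrelated element types, and nothing states that a Church is a kind of Building.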

 

ArcWorld.xml

<?xml version="1.0"?>
<!--ArcWorld.xml -->
<!DOCTYPE ArcWorld SYSTEM "ArcWorld.dtd">

<ArcWorld>

  <Country name="USA"/>
  <Country name="France"/>
  <Country name="China"/>

  <City name="Washington" country="USA"/>
  <City name="Paris" country="France"/>
  <City name="Beijing" country="China"/>

  <Building name="Lincoln Memorial" city="Washington"/>
  <Building name="National Gallery" city="Washington"/>
  <Building name="The Capitol" city="Washington"/>
  <Building name="Washington Monument" city="Washington"/>
  <Building name="Arc de Triomphe" city="Paris"/>
  <Building name="Eiffel Tower" city="Paris"/>
  <Building name="Louvre" city="Paris"/>
  <Building name="Great Wall" city="Beijing"/>
  <Building name="Tiananmen" city="Beijing"/>

</ArcWorld>

XML provides a standard that can be used to encode the structure and content of all sorts of information, from simple to complex. XML can encode the representation for:

 

- An ordinary document

- A structured record, such as a purchase order

- A data record, such as the result set of a query

- An object, with data and methods, such as the persistent form of a Java object

- Metadata (schema) entities and types, such as XMI

- Meta-content about a Web site, such as CDF (Channel Definition Format)
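As an illustration of the third item, a query result set can be encoded as nested elements, one per row and one per column. The element names `ResultSet` and `Row` and the query text are hypothetical; no standard vocabulary is implied.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical encoding of part of the result of:
     SELECT name, city FROM Building WHERE city = 'Paris' -->
<ResultSet>
  <!-- One Row element per tuple; one child element per column -->
  <Row>
    <name>Eiffel Tower</name>
    <city>Paris</city>
  </Row>
  <Row>
    <name>Louvre</name>
    <city>Paris</city>
  </Row>
</ResultSet>
```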

 

XML can encode not only the information itself but also its metadata; when the information is itself metadata, XML encodes the meta-metadata. As such, the XML encoding is self-describing and can be parsed, interpreted, and processed by machines without human intervention.

 

XML provides a powerful and flexible format for expressing data. It can be used as:

 

- An exchange format for sharing data, such as between an application and a database.

- A wire format for transferring data, such as between a client and a server.

- A persistence format for storing data, such as in a document repository.

When used as an exchange format for sharing data, XML by itself is not sufficient. Even with a DTD, XML encodes only the syntactic (structural) information of the data; it does not provide any semantic information (meaning) about it. What is needed is a shared XML vocabulary for each domain, whose element and attribute names carry agreed-upon meanings. Such XML vocabularies already exist in certain domains, and many more will come in the future. Some examples are:

 

- Channel Definition Format (CDF), for describing Web content

- Open Financial Exchange (OFX), for exchanging financial data and instructions among financial institutions

- Open Software Description (OSD), for describing software components, their versions, their underlying structure, and their relationships to other components

- Chemical Markup Language (CML)

- Mathematical Markup Language (MathML)

 

XML, XML Namespaces, XLink, and XPointer are the core technologies required to encode or represent data as XML documents. Additional technologies are needed to process and display them. The key ones are DOM (Document Object Model), a platform-neutral and language-neutral interface that allows programs and scripts to dynamically access and update the structure and content of XML documents, and XSL (Extensible Stylesheet Language), a mechanism for adding style (for example, fonts, colors, and spacing) to XML documents.
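As a sketch of the XSL side, the XSLT 1.0 stylesheet below (XSLT is the transformation language in the XSL family) turns ArcWorld.xml into an HTML page that lists each City's Buildings, with the font and color set on the page body. The file name `ArcWorld.xsl` and the chosen styles are assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical ArcWorld.xsl: renders each City and its Buildings as styled HTML -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="/ArcWorld">
    <html>
      <!-- Presentation (font, color) lives here, not in ArcWorld.xml -->
      <body style="font-family: serif; color: navy;">
        <xsl:for-each select="City">
          <h2><xsl:value-of select="@name"/></h2>
          <ul>
            <!-- current() is the City being processed by the outer for-each -->
            <xsl:for-each select="../Building[@city = current()/@name]">
              <li><xsl:value-of select="@name"/></li>
            </xsl:for-each>
          </ul>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

A browser or XSLT processor can apply it when ArcWorld.xml carries the processing instruction `<?xml-stylesheet type="text/xsl" href="ArcWorld.xsl"?>` in its prolog, before the root element.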

 

Various types of tools are also needed. The most urgent are XML processors, XML viewers, and XML editors. Of these, a number of XML processors are already in wide use, some validating and some non-validating.
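The difference between the two kinds of processor shows up with a document that is well-formed but not valid. The fragment below is hypothetical and reuses ArcWorld.dtd from above.

```xml
<?xml version="1.0"?>
<!DOCTYPE ArcWorld SYSTEM "ArcWorld.dtd">
<ArcWorld>
  <!-- Well-formed: every tag is closed and attribute values are quoted.
       Not valid: ArcWorld.dtd declares Building's name as #REQUIRED. -->
  <Building city="Paris"/>
</ArcWorld>
```

A non-validating processor accepts this document (and is not even required to read ArcWorld.dtd), whereas a validating processor reports the missing `name` attribute as a validity error.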