Examining XML

A Closer Look at the Extensible Markup Language

By , About.com Guide


The latest specification of the Extensible Markup Language (XML) is published online by the W3C. It describes XML completely, but it can be difficult to read. In this article, we examine several parts of the XML specification to understand the basics of an XML document.

XML Definitions

characters
a character is one unit of text, such as a letter, a numeral, a space, a tab, or any other Unicode character

DTD Document Type Definition, the grammar of the XML document

Document Type Declaration the <!DOCTYPE> statement near the top of a valid XML document; it names the root element and identifies where to find the Document Type Definition

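entity a storage unit for the XML document. Each XML document consists of one or more entities: the document itself is the document entity, and it can refer to others, such as an external DTD file or a named piece of text declared in the DTD. Note that a pair of tags such as <html></html> is an element, not an entity; &amp; is a reference to one of XML's predefined entities.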

XML Extensible Markup Language

XML document a document that is well-formed as described in the XML specification.
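The entity definition is easier to see in a parser. As a minimal sketch, assuming Python's standard xml.etree.ElementTree module (the article itself does not name a parser), the document below declares a hypothetical entity named site in its internal DTD subset. The parser replaces each &site; reference with the entity's text, just as it replaces the predefined &amp; with a literal &.

```python
import xml.etree.ElementTree as ET

# A hypothetical document whose internal DTD subset declares one entity, "site".
DOC = """<?xml version="1.0"?>
<!DOCTYPE page [
  <!ENTITY site "About.com">
]>
<page>Published on &site; &amp; mirrored elsewhere</page>"""

root = ET.fromstring(DOC)

# The parser expands &site; from the DTD and &amp; from the predefined entities.
print(root.text)  # Published on About.com & mirrored elsewhere
```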

XML Documents

As mentioned in the definitions, an XML document is made up of entities, and it is well-formed if it follows the well-formedness rules in the XML specification. Here are a few basic aspects of an XML document:

  • white space
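    Unlike HTML, XML does not collapse white space (spaces, tabs, carriage returns, and line feeds). A parser passes every white space character in element content through to the application, after normalizing line endings to a single line feed; it is up to the application to decide whether that white space matters.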
  • character tags
    XML uses the same markup characters as HTML: < and > delimit tags, and & begins an entity or character reference such as &amp; or &#60;. It also uses the colon (:) within XML names to separate a namespace prefix from the local name, as in xsl:template.
  • other characters
    All other ASCII and Unicode characters are taken as literal text. A DTD can declare entities whose references expand into text, but it cannot redefine what an individual character means.
  • comments
    XML uses the same comment syntax you are familiar with from HTML: <!-- -->
  • processing instructions
    These are special tags that carry instructions for applications. They begin with <? and end with ?>, as in <?xml-stylesheet type="text/xsl" href="style.xsl"?>.
  • CDATA
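    When you have a block of text containing many < or & characters, such as a code sample, you can mark it as character data rather than markup by starting a CDATA section with <![CDATA[ and ending it with ]]>. The parser does not interpret markup inside the section, but the text is still part of the document; it is not a comment.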
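A quick way to see these rules in action is to feed a small fragment to a parser. The sketch below assumes Python's standard xml.etree.ElementTree and xml.dom.minidom modules; the element names and the my-app processing-instruction target are made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# A made-up fragment exercising white space, a comment, a processing
# instruction, a CDATA section, and an entity reference.
SAMPLE = (
    "<note>"
    "<text>two   spaces\tand a tab</text>"
    "<!-- <hidden>this tag is inside a comment</hidden> -->"
    "<?my-app refresh?>"
    "<code><![CDATA[if (a < b && c > d) {}]]></code>"
    "<amp>Fish &amp; Chips</amp>"
    "</note>"
)

root = ET.fromstring(SAMPLE)

# White space in element content reaches the application unchanged.
print(repr(root.find("text").text))  # 'two   spaces\tand a tab'

# The element inside the comment never becomes part of the tree.
print(root.find("hidden"))  # None

# CDATA content arrives as ordinary text; < and & are not parsed as markup.
print(root.find("code").text)  # if (a < b && c > d) {}

# &amp; is a reference the parser replaces with a literal &.
print(root.find("amp").text)  # Fish & Chips

# ElementTree drops processing instructions by default, so use minidom to see one.
dom = minidom.parseString(SAMPLE)
pis = [n for n in dom.documentElement.childNodes
       if n.nodeType == n.PROCESSING_INSTRUCTION_NODE]
print(pis[0].target, pis[0].data)  # my-app refresh
```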

When you start an XML document, you should begin with an XML declaration that indicates the version of XML used in the document. To write a valid XML document, you must also include a document type declaration, which associates the document with a document type definition (DTD), before the first element. Here is a sample XML document that a validating parser would check against its DTD:

<?xml version="1.0"?>
<!DOCTYPE firstxml SYSTEM "first.dtd">
<firstxml>
<greeting>Hello World!</greeting>
</firstxml>

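The first line, <?xml version="1.0"?>, is the XML declaration, which states the version of XML being used. It is optional in XML 1.0 but recommended, and when present it must be the very first thing in the file. If your document does not conform to the version it declares, it contains an error.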
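One practical consequence, assuming a conforming parser: nothing, not even a blank line or a space, may come before the XML declaration. This sketch uses Python's xml.etree.ElementTree; the helper name parses is our own, not part of any library.

```python
import xml.etree.ElementTree as ET

def parses(text):
    """Hypothetical helper: True if text is well-formed XML, else False."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(parses('<?xml version="1.0"?><firstxml/>'))    # True
print(parses('\n<?xml version="1.0"?><firstxml/>'))  # False: a line break comes first
```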

Line 2, <!DOCTYPE firstxml SYSTEM "first.dtd">, is the document type declaration. The name firstxml is the name of the document's root element (it is not the name of the DTD file), and the SYSTEM identifier "first.dtd" gives the URI of the DTD. Because it is a relative URI, the parser looks for first.dtd in the same directory as the XML document.
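Because "first.dtd" is a relative URI, it is resolved against the location of the XML document itself. The Python sketch below reads the declaration back with xml.dom.minidom and performs that resolution with urllib.parse.urljoin; the file path is hypothetical. Note that minidom is a non-validating parser, so it neither fetches nor checks the DTD.

```python
from urllib.parse import urljoin
from xml.dom import minidom

SAMPLE = """<?xml version="1.0"?>
<!DOCTYPE firstxml SYSTEM "first.dtd">
<firstxml>
<greeting>Hello World!</greeting>
</firstxml>"""

# A hypothetical location for the XML document on disk.
doc_url = "file:///home/user/xml/first.xml"

dom = minidom.parseString(SAMPLE)
doctype = dom.doctype

print(doctype.name)      # firstxml  (the root element's name)
print(doctype.systemId)  # first.dtd (the DTD's location, as written)

# The relative system identifier is resolved against the document's own URL.
print(urljoin(doc_url, doctype.systemId))  # file:///home/user/xml/first.dtd
```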

The third line, <firstxml>, is the start tag of the document's root element, which is named in the document type declaration.

The fourth line, <greeting>Hello World!</greeting>, is the document's content: a greeting element containing the text "Hello World!". The greeting element is declared in the DTD, first.dtd.

Finally, the last line, </firstxml>, closes the root element.
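To check the walkthrough above, this short Python sketch parses the same sample with xml.etree.ElementTree and prints the root element and its child. ElementTree is a non-validating parser, so it confirms only that the document is well-formed; it does not load or enforce first.dtd.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<?xml version="1.0"?>
<!DOCTYPE firstxml SYSTEM "first.dtd">
<firstxml>
<greeting>Hello World!</greeting>
</firstxml>"""

root = ET.fromstring(SAMPLE)

print(root.tag)  # firstxml
for child in root:
    print(child.tag, child.text)  # greeting Hello World!
```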

There is Lots More XML to Learn!

Because XML is so generic, it can get very confusing very quickly. Here are some basic points to remember when starting XML:

  1. Each XML document should start with an XML declaration giving the version of XML you are using:
    <?xml version="1.0"?>
  2. The next line should be the document type declaration, which names the root element and gives the URI, or location, of the DTD (Document Type Definition):
    <!DOCTYPE mydocument SYSTEM "mydtd.dtd">
    Note: if you do not need your document to be validated against a DTD, you may omit this line. To validate against an XML Schema instead, you would normally reference the schema from the root element with an xsi:schemaLocation or xsi:noNamespaceSchemaLocation attribute.
  3. Elements in an XML document are delimited by tags written with < and >. XML is case sensitive (i.e., <greeting>, <GREETING>, and <Greeting> are three different element names), and I recommend writing tags in lower case.
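  4. Comments look just like HTML comments, <!-- -->. The parser ignores everything inside a comment, including tags, so a comment can hide a block of markup. Comments cannot be nested, however, and the string -- may not appear inside one.
  5. To include text full of markup characters without escaping each < and &, use a CDATA section. Its contents are not parsed as markup, but unlike a comment they are still passed to the application as ordinary text:
    <greeting>Hello World</greeting>
    <![CDATA[
    This information is data for the XML document, but the parser does not read it as markup.
    <cdata_tag>this is text, not an element</cdata_tag>
    The CDATA section ends on the next line, so the element after it is parsed as XML again.
    ]]>

    <closing>Goodbye World!</closing>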
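Two of these rules are easy to test with a parser: tag names are case sensitive, and the XML specification forbids the string -- inside a comment. The sketch below assumes Python's xml.etree.ElementTree and a hypothetical is_well_formed helper of our own; it also shows that an unclosed tag inside a CDATA section is just text.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Hypothetical helper: True if ElementTree can parse text, else False."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Tag names are case sensitive, so the second end tag does not match.
print(is_well_formed("<greeting>Hi</greeting>"))  # True
print(is_well_formed("<greeting>Hi</GREETING>"))  # False

# The string -- is not allowed inside a comment.
print(is_well_formed("<root><!-- one - two --></root>"))   # True
print(is_well_formed("<root><!-- one -- two --></root>"))  # False

# Inside CDATA, an unclosed tag is only text, so the document stays well-formed.
print(is_well_formed("<root><![CDATA[<cdata_tag>]]></root>"))  # True
```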

Once you understand the basic aspects of an XML document, you’re ready to start creating your own XML documents.

©2013 About.com. All rights reserved.