Examining XML
A Closer Look at the Extensible Markup Language

By Jennifer Kyrnin

The latest specification of the Extensible Markup Language is available online at the W3C. This specification completely describes XML, but it can be fairly difficult to understand. In this article, we will examine several parts of the XML specification in order to understand the basics of an XML document.

Definitions

characters

a character is one unit of text, such as a letter, numeral, space, tab, or any other Unicode character
DTD
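Document Type Definition, the grammar for a class of XML documents: which elements and attributes are allowed and how they may nest
document type declaration
the <!DOCTYPE ...> statement near the top of a valid XML document that contains the Document Type Definition or says where to find it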
entity
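a storage unit for the XML document. Each XML document consists of one or more entities, starting with the document entity that holds the document itself. Other entities are named pieces of content, such as the predefined &amp; or an external file declared in the DTD, that are pulled in by reference. An entity is not the same as an element: the HTML tags <html></html> mark up an element, not an entity.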
XML
Extensible Markup Language
XML document
a document that is well-formed as described in the XML specification
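
To make the difference between an entity and an element concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree module. The element name "note", the entity name "company", and the text "Example Corp" are made up for illustration; they do not come from the XML specification or from this article.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the names "note" and "company" are illustrative only.
# The internal DTD subset declares an entity, a named storage unit for text.
# The <note> element then pulls that entity in by reference with &company;.
DOCUMENT = (
    '<!DOCTYPE note [<!ENTITY company "Example Corp">]>'
    '<note>Written at &company;</note>'
)


def read_note(xml_text):
    """Parse the document and return (element name, element text)."""
    root = ET.fromstring(xml_text)
    return root.tag, root.text


if __name__ == "__main__":
    print(read_note(DOCUMENT))
```

The parser reports one element, note, whose text already has the entity reference replaced by the entity's content.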

XML Documents
As mentioned in the definitions, an XML document is made up of entities and is well-formed if it follows the rules in the XML specification. Every XML document shares some basic aspects:

  • white space
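    Unlike HTML, XML does not collapse white space (spaces, tabs, carriage returns, line feeds). A parser passes all white space in element content on to the application, which decides whether it matters. Only line endings are normalized, to a single line feed.
  • markup characters
    XML uses the same characters as HTML to mark up a document: < and > delimit tags, and & starts an entity or character reference. The colon (:) is allowed in XML names but is reserved for namespaces.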
  • other characters
    Other ASCII and Unicode characters are taken literally as character data. A literal < or & in text must be written as &lt; or &amp;, or placed in a CDATA section.
  • comments
    XML also uses the same comment style you are familiar with in HTML: <!-- -->. The text of a comment must not contain two hyphens in a row (--).
  • processing instructions
    These are special instructions passed through to applications. They are delimited by <? and ?> and begin with the name of the target application.
  • CDATA
    When you have text that contains characters such as < and & and you want it treated as data rather than markup, you can start a CDATA section with <![CDATA[ and end it with ]]>. A CDATA section does not comment anything out: its contents are passed to the application as character data.
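
The white space, entity reference, and comment behavior listed above can be checked with a short Python sketch using the standard library's xml.etree.ElementTree module. The element names "memo" and "line" are hypothetical and chosen only for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the element names "memo" and "line" are illustrative.
# The text contains runs of spaces, a tab, and the entity reference &amp;.
SAMPLE = (
    "<memo>"
    "<!-- this comment is not part of the data -->"
    "<line>two  spaces   &amp; a\ttab</line>"
    "</memo>"
)


def inspect(xml_text):
    """Return the text of <line> and the names of the root's child elements."""
    root = ET.fromstring(xml_text)
    return {
        "line": root.find("line").text,
        "children": [child.tag for child in root],
    }


if __name__ == "__main__":
    print(inspect(SAMPLE))
```

In this sketch the runs of spaces and the tab come back exactly as written, &amp; comes back as a single & character, and the comment does not appear among the child elements because the default parser discards comments.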

When you start an XML document, you should begin with an XML declaration that indicates the version of XML used in the document. To write a valid XML document, you must also have a document type declaration, which contains or points to the document type definition (DTD), before the first element in the document. Here is a sample XML document (one that could be checked by a validating parser):

<?xml version="1.0"?>
<!DOCTYPE firstxml SYSTEM "first.dtd">
<firstxml>
<greeting>Hello World!</greeting>
</firstxml>

The first line, <?xml version="1.0"?>, is the XML declaration. It states the version of XML being used. If your XML document does not conform to the version specified, then your document has an error.

Line 2, <!DOCTYPE firstxml SYSTEM "first.dtd">, is the document type declaration. The name it gives, "firstxml", is the name of the root element of the XML document. The system identifier "first.dtd" is the URI of the DTD; because it is a relative URI, the DTD is expected in the same directory as the XML document.

The third line of the document, <firstxml>, is the opening tag of the root element. Its name must match the name given in the document type declaration.

The fourth line of the document, <greeting>Hello World!</greeting>, is the content: a greeting element containing the text "Hello World!". The greeting element is declared in the DTD, "first.dtd".

Finally, the last line in the document is the closing tag of the root element, </firstxml>.
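
As a quick check of the walkthrough, the sample document can be read with Python's standard xml.etree.ElementTree module. This is only a sketch: ElementTree is a non-validating parser, so it does not read first.dtd or check the document against it. The DTD content mentioned in the comment is an assumption, since the article does not show the file.

```python
import xml.etree.ElementTree as ET

# The sample document from the article, unchanged.
# Assumption: a matching first.dtd might contain declarations such as
#   <!ELEMENT firstxml (greeting)>
#   <!ELEMENT greeting (#PCDATA)>
# The article does not show the file, and this parser never opens it.
FIRST_XML = """<?xml version="1.0"?>
<!DOCTYPE firstxml SYSTEM "first.dtd">
<firstxml>
<greeting>Hello World!</greeting>
</firstxml>
"""


def read_greeting(xml_text):
    """Return the root element's name and the text of its greeting child."""
    root = ET.fromstring(xml_text)
    return root.tag, root.find("greeting").text


if __name__ == "__main__":
    print(read_greeting(FIRST_XML))
```

The root element name comes back as firstxml, the same name used in the document type declaration, and the greeting text is "Hello World!". Checking validity against the DTD would need a validating parser, which is outside this sketch.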

Confused Yet?
Because XML is so generic, it can get very confusing very quickly. Here are some basic points to remember when starting XML:

  1. Each XML document should start with the version of XML you are using
    <?xml version="1.0"?>


  2. The second line of your document should be the document type declaration, which includes the name of your root element and the URI or location of your DTD (Document Type Definition)
    <!DOCTYPE mydocument SYSTEM "mydtd.dtd">
    Note: if you do not need your document to be validated against a DTD, you may omit this line. You would use different notation if you were going to validate against an XML Schema document.


  3. The tags that mark up elements in an XML document are delimited by < and >. XML is case sensitive (i.e., <greeting>, <GREETING>, and <Greeting> are three different elements), and I recommend that it be written in lower case.


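  4. Standard comments are just like HTML, <!-- -->. Anything inside a comment, including XML tags, is ignored by the parser, but a comment cannot contain two hyphens in a row (--) and cannot appear inside a tag.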


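  5. To include text that contains markup characters without having it parsed as markup, use a CDATA section. The contents are not ignored; they are passed to the application as character data:
    <greeting>Hello World</greeting>
    <![CDATA[
    This information is character data in the XML document. It is passed to the application as-is.
    <cdata_tag>even this tag is treated as plain text, not markup</cdata_tag>
    But the tag following the next line is again part of the XML markup
    ]]>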
    <closing>Goodbye World!</closing>
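
Points 3 through 5 can be tried out with a short Python sketch using the standard library's xml.etree.ElementTree module. The wrapper element "doc" is hypothetical; it is added only because a well-formed document needs a single root element, and the CDATA text is shortened for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as well-formed XML, else False."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical wrapper element "doc" so the fragment has a single root.
FRAGMENT = (
    "<doc>"
    "<greeting>Hello World</greeting>"
    "<![CDATA[<cdata_tag>kept as text</cdata_tag>]]>"
    "<closing>Goodbye World!</closing>"
    "</doc>"
)


def summarize(xml_text):
    """Return the child element names and the text that follows <greeting>."""
    root = ET.fromstring(xml_text)
    greeting = root.find("greeting")
    return [child.tag for child in root], greeting.tail


if __name__ == "__main__":
    # Case matters: </Greeting> does not close <greeting>.
    print(is_well_formed("<greeting>Hi</Greeting>"))
    # A comment may contain a tag; the parser skips the whole comment.
    print(is_well_formed("<doc><!-- <b>unclosed --></doc>"))
    print(summarize(FRAGMENT))
```

In this sketch the mismatched-case document is rejected, the comment containing a tag is accepted, and the CDATA contents show up as plain text following the greeting element rather than as a cdata_tag element.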

Once you understand the basic aspects of an XML document, you're ready to start creating your own.

©2008 About.com, a part of The New York Times Company.

All rights reserved.