The latest specification of the Extensible Markup Language (XML) is available online from the W3C. The specification describes XML completely, but it can be difficult to understand. In this article, we will examine several parts of it in order to understand the basics of an XML document.
Character: one unit of text, such as a letter, numeral, space, tab, or any other Unicode character.
DTD: Document Type Definition, the grammar of the XML document.
Document Type Declaration: the <!DOCTYPE ...> statement at the top of a valid XML document that identifies where to find the Document Type Definition.
Entity: a storage unit for the XML document. Each XML document consists of one or more entities. For example, the XML document file itself is the document entity, and an external DTD file that it references is a separate external entity.
XML: Extensible Markup Language.
As the definitions note, an XML document is composed of entities. A document is well-formed if it conforms to the syntax rules in the XML specification. There are a few basic aspects of an XML document:
- white space
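Unlike HTML, XML does not collapse white space (spaces, tabs, carriage returns, and line feeds). The parser passes all white space in element content through to the application, which decides what is significant. The only change the parser makes is to normalize line endings to a single line feed. The xml:space attribute can also signal that white space in an element should be preserved.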
- markup characters
XML uses the same special characters as HTML for markup. The < and > characters delimit tags, and & begins an entity reference such as &amp; or &lt;. The colon (:) is reserved within XML names for namespace prefixes.
- other characters
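All other ASCII and Unicode characters are taken literally as character data. To write a literal < or & in text, use the predefined entity reference &lt; or &amp; instead. A DTD can also declare additional named entities that the parser replaces with their defined text.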
- comments
XML uses the same comment syntax you are familiar with from HTML: <!-- comment text -->.
- processing instructions
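These are special tags that carry instructions for applications. They are delimited by <? and ?>. The name that follows <? is called the target, and it identifies the application the instruction is meant for.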
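- CDATA sections
Sometimes you have a block of text containing characters such as < and & that you want treated as character data rather than markup. You can wrap such a block in a CDATA section: start the section with <![CDATA[ and end it with ]]>. The parser does not interpret anything inside the section as markup, but the content remains part of the document's text.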
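The sketch below shows how a parser reports each of the aspects above. It uses Python's standard-library xml.dom.minidom, which keeps comments, processing instructions, and CDATA sections as separate nodes. The sample document, the my-app target, and the describe helper are hypothetical and exist only for illustration.

```python
from xml.dom import Node, minidom
from xml.sax.saxutils import escape

# Hypothetical sample touching each aspect above: repeated spaces and a tab,
# escaped < and &, a comment, a processing instruction, and a CDATA section.
SAMPLE = (
    '<?xml version="1.0"?>\n'
    "<note>two  spaces\tand a tab, 3 &lt; 4 &amp; 5"
    "<!-- a comment -->"
    "<?my-app refresh?>"
    "<![CDATA[<b>not a tag</b> & not an entity]]>"
    "</note>"
)

# Readable names for the DOM node types this sample can produce.
KINDS = {
    Node.TEXT_NODE: "text",
    Node.COMMENT_NODE: "comment",
    Node.PROCESSING_INSTRUCTION_NODE: "pi",
    Node.CDATA_SECTION_NODE: "cdata",
}


def describe(xml_text):
    """Return (kind, value) pairs for the children of the root element."""
    root = minidom.parseString(xml_text).documentElement
    parts = []
    for child in root.childNodes:
        kind = KINDS.get(child.nodeType, "other")
        if kind == "pi":
            # A processing instruction has a target and its data.
            parts.append((kind, (child.target, child.data)))
        elif kind == "other":
            parts.append((kind, child.nodeName))
        else:
            parts.append((kind, child.data))
    return parts


if __name__ == "__main__":
    for kind, value in describe(SAMPLE):
        print(kind, repr(value))
    # Going the other way: escape() turns &, < and > into entity references.
    print(escape("3 < 4 & 5"))  # 3 &lt; 4 &amp; 5
```

In the output, the text node should still contain both spaces and the tab, with &lt; and &amp; decoded to < and &. Inside the CDATA section, <b> is reported as plain text rather than as an element.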
When you start an XML document, you should begin with an XML declaration that states the version of XML the document uses. To write a valid XML document, you must also include a document type declaration before the first element. This declaration associates the document with a document type definition (DTD). Here is a sample XML document that a validating parser would check against its DTD:
<?xml version="1.0"?>
<!DOCTYPE firstxml SYSTEM "first.dtd">
<firstxml>
<greeting>Hello World!</greeting>
</firstxml>
The first line, <?xml version="1.0"?>, is the XML declaration; it states the version of XML the document uses. If the document does not conform to the version specified, it has an error.
The second line, <!DOCTYPE firstxml SYSTEM "first.dtd">, is the document type declaration. Its name, firstxml, must match the name of the document's root element. The SYSTEM identifier "first.dtd" gives the URI of the DTD; because this is a relative URI, the DTD is expected in the same directory as the XML document.
The third line, <firstxml>, is the start tag of the root element, which is named in the document type declaration.
The fourth line, <greeting>Hello World!</greeting>, is the content of the document: a greeting element containing the text "Hello World!". The greeting element is declared in the DTD, first.dtd.
Finally, the last line of the document, </firstxml>, closes the root element.
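The next sketch examines this sample from a program's point of view. It uses Python's standard-library xml.dom.minidom; the inspect_document helper is hypothetical. Because minidom does not validate, it reads the document type declaration but does not check the document against first.dtd. That check would require a validating parser.

```python
from xml.dom import minidom

# The sample document from above, as a Python string.
FIRST_XML = """<?xml version="1.0"?>
<!DOCTYPE firstxml SYSTEM "first.dtd">
<firstxml>
<greeting>Hello World!</greeting>
</firstxml>"""


def inspect_document(xml_text):
    """Parse xml_text and return the parts of the sample discussed above.

    minidom is non-validating: it records the document type declaration
    but does not check the document against the DTD it names.
    """
    doc = minidom.parseString(xml_text)
    root = doc.documentElement
    greeting = root.getElementsByTagName("greeting")[0]
    return {
        "doctype_name": doc.doctype.name,  # name given in <!DOCTYPE ...>
        "dtd_uri": doc.doctype.systemId,  # the SYSTEM identifier
        "root_element": root.tagName,  # name of the root element
        "greeting": greeting.firstChild.data,  # text inside <greeting>
    }


if __name__ == "__main__":
    info = inspect_document(FIRST_XML)
    print(info)
    # For a valid document, the DOCTYPE name must match the root element.
    print("root matches DOCTYPE:", info["doctype_name"] == info["root_element"])
```

Running the script should report firstxml as both the DOCTYPE name and the root element, first.dtd as the DTD URI, and Hello World! as the greeting text.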
There is Lots More XML to Learn!
Because XML is so generic, it can get very confusing very quickly. Here are some basic points to remember when starting XML:
- Each XML document should start with an XML declaration stating the version of XML you are using:
<?xml version="1.0"?>
- The second line of your document should be the document type declaration. It names the root element and gives the URI (location) of your DTD, or Document Type Definition:
<!DOCTYPE mydocument SYSTEM "mydtd.dtd">
Note: if you do not need your document validated against a DTD, you may omit this line. To validate against an XML Schema instead, you would reference the schema differently, typically with an xsi:schemaLocation or xsi:noNamespaceSchemaLocation attribute on the root element.
- The elements in an XML document are delimited by tags written between < and >. XML is case sensitive (i.e. <greeting>, <GREETING>, and <Greeting> are three different elements), and I recommend writing element names in lower case.
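- Standard comments are written just as in HTML, <!-- -->. A comment can contain tags, which the parser then ignores. However, a comment cannot contain the string --, and comments cannot be nested.
- To include tags as literal text rather than markup, use a CDATA section:
<![CDATA[
This information is character data for the XML document; the parser does not treat it as markup.
<cdata_tag>even this tag is read as plain text</cdata_tag>
]]>
After the closing ]]>, tags are once again part of the XML markup.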
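The rules in this list can be checked with Python's standard-library xml.etree.ElementTree. The following is a minimal sketch; the is_well_formed helper and the sample strings are hypothetical. ElementTree's default parser discards comments entirely and merges CDATA content into the element's text.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if ElementTree's non-validating parser accepts xml_text."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Case sensitivity: three different element names, not one.
CASES = ET.fromstring("<root><greeting/><GREETING/><Greeting/></root>")
CASE_NAMES = [child.tag for child in CASES]

# An end tag must match its start tag exactly, including case.
MISMATCHED_OK = is_well_formed("<greeting>Hi</GREETING>")

# A comment may contain tags; the default parser drops the whole comment.
COMMENTED = ET.fromstring("<root><!-- <old>ignored</old> --></root>")

# "--" is not allowed inside a comment, so this is not well-formed.
DOUBLE_HYPHEN_OK = is_well_formed("<root><!-- a -- b --></root>")

# A CDATA section keeps the tag, but only as text: no child element is created.
CDATA = ET.fromstring(
    "<root><![CDATA[<cdata_tag>kept as text</cdata_tag>]]></root>"
)

if __name__ == "__main__":
    print(CASE_NAMES)  # ['greeting', 'GREETING', 'Greeting']
    print(MISMATCHED_OK)  # False
    print(len(COMMENTED), COMMENTED.text)  # 0 None
    print(DOUBLE_HYPHEN_OK)  # False
    print(len(CDATA), CDATA.text)  # 0 <cdata_tag>kept as text</cdata_tag>
```

This sketch only checks well-formedness. Validating against a DTD would require a validating parser, which Python's standard library does not provide.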
Once you understand the basic aspects of an XML document, you’re ready to start creating your own XML documents.