
What are Markup Languages?

HTML is a Markup Language

Image courtesy J Kyrnin

What Do the Abbreviations with “ML” in Them Mean?:

Nearly every acronym on the web that has an “ML” in it is a “markup language.” Markup languages are the languages that build the web.

There are many different markup languages. This site focuses on HTML and XML, but if you are doing web design or development there are three you should be aware of: HTML, XML, and XHTML.

What is a Markup Language?:

A markup language is a language that annotates text so that a computer can manipulate it. Most markup languages are human readable because the annotations are written in a way that distinguishes them from the text. For example, in HTML, XML, and XHTML, markup tags are delimited by the < and > characters. Any text that appears between those characters is considered part of the markup language and not part of the annotated text. For example:

<p>
this is a paragraph of text written in HTML
</p>

When you format text to be printed (or displayed on a computer or TV), you need to distinguish between the text itself and the instructions for printing the text. The markup is the instructions for displaying or printing the text.
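To see how a program tells the text apart from the instructions, here is a minimal sketch in Python using the standard library's html.parser module. The class name MarkupSplitter and the function split_markup are hypothetical names chosen for this illustration, and the sketch ignores attributes, comments, and other details a real browser handles.

```python
from html.parser import HTMLParser


class MarkupSplitter(HTMLParser):
    """Collects tags and text separately (hypothetical helper for illustration)."""

    def __init__(self):
        super().__init__()
        self.tags = []  # the markup: instructions about the text
        self.text = []  # the annotated text itself

    def handle_starttag(self, tag, attrs):
        # Attributes are ignored in this sketch; only the tag name is kept.
        self.tags.append(f"<{tag}>")

    def handle_endtag(self, tag):
        self.tags.append(f"</{tag}>")

    def handle_data(self, data):
        # Skip chunks that are only whitespace, such as bare line breaks.
        if data.strip():
            self.text.append(data.strip())


def split_markup(source):
    """Return (tags, text) found in an HTML string."""
    splitter = MarkupSplitter()
    splitter.feed(source)
    splitter.close()
    return splitter.tags, splitter.text


tags, text = split_markup("<p>\nthis is a paragraph of text written in HTML\n</p>")
print("markup:", tags)
print("text:  ", text)
```

Everything between < and > ends up in the first list, and the sentence itself ends up in the second.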

Markup doesn’t have to be computer readable. Annotations done in print or in a book are also markup. For example, many students in school will highlight certain phrases in their textbooks. This indicates that the highlighted text is more important than the surrounding text. The highlight color is markup.

Markup becomes a language when rules are codified around how to write and use the markup. That same student could have their own “note taking markup language” if they codified rules like “purple highlighter is for definitions, yellow highlighter is for exam details, and pencil notes in the margins are for additional resources.” But most markup languages are defined by an outside authority for use by many different people.
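As a rough illustration of what "codified rules" means, the student's note-taking language can be written down as a lookup table. This is only a sketch: the names NOTE_RULES and interpret are made up for this example, and the rules are the ones from the paragraph above.

```python
# Hypothetical rule set taken from the student example: each kind of mark
# has exactly one agreed meaning.
NOTE_RULES = {
    "purple highlighter": "definition",
    "yellow highlighter": "exam detail",
    "pencil margin note": "additional resource",
}


def interpret(mark, passage):
    """Return what a mark says about a passage, or fail if the mark is undefined."""
    if mark not in NOTE_RULES:
        # A mark outside the rules has no agreed meaning in this language.
        raise ValueError(f"{mark!r} is not part of this markup language")
    return f"{NOTE_RULES[mark]}: {passage}"


print(interpret("purple highlighter", "A markup language annotates text."))
```

Once the rules are fixed like this, anyone who knows them can read the notes the same way, which is what separates a markup language from ad hoc marks.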

HTML—HyperText Markup Language:

HTML or HyperText Markup Language is the language of the web. All web pages are written in HTML. HTML defines the way that images, multimedia, and text are displayed in web browsers. It includes elements to connect your documents (hypertext) and make your web documents interactive (such as with forms).

HTML is a standardized markup language. The standard is published by the World Wide Web Consortium (W3C). It is based upon SGML (Standard Generalized Markup Language) and uses tags to define the structure of your text. Tags are delimited by the < and > characters.

But HTML is no longer the only standard for web development. As HTML developed, it became more and more complicated, and style and content tags were mixed together in one language. Eventually, the W3C decided that the style of a web page should be separated from its content. Tags that define content alone, such as H1, remain in HTML, while tags that define style, such as FONT, were deprecated in HTML 4.01 in favor of style sheets.

The newest version of HTML is HTML5. HTML5 adds more features to HTML and relaxes some of the strictness that was imposed by XHTML. But HTML5 is still a markup language.

XML—eXtensible Markup Language:

The eXtensible Markup Language is the language that XHTML, another version of HTML, is based on. Like HTML, XML is based on SGML. It is simpler than SGML and stricter than plain HTML, and it provides the extensibility to create many different languages.

XML is a language for writing markup languages. For example, if you are working on genealogy, you might create tags using XML to define the father, mother, daughter, and son in your XML like this: <father> <mother> <daughter> <son>. There are also several standardized languages already created with XML: MathML for defining mathematics, SMIL for working with multimedia, XHTML, and many others.
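The genealogy example can be tried out with Python's standard xml.etree.ElementTree module. This is a sketch under a few assumptions: the <family> wrapper element is added here because an XML document needs a single root element, the names inside the tags are made-up sample data, and read_family is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Made-up sample document using the custom genealogy tags.
# <family> is an assumed root element; XML requires exactly one.
document = """
<family>
  <father>John</father>
  <mother>Mary</mother>
  <daughter>Ann</daughter>
  <son>Tom</son>
</family>
"""


def read_family(xml_text):
    """Map each custom tag name to the text it annotates."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


family = read_family(document)
for role, name in family.items():
    print(f"{role}: {name}")
```

The parser knows nothing about genealogy. It only needs the tags to follow XML's syntax rules, which is what lets you invent your own vocabulary.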

XHTML—eXtensible HyperText Markup Language:

XHTML 1.0 is HTML 4.01 redefined to meet the XML standard. There are only a few major differences between HTML and XHTML:

  • XHTML is written in lower case. While HTML tags can be written in UPPER case, MiXeD case, or lower case, XHTML element and attribute names must be all lower case to be correct.
  • All XHTML elements must be closed. Elements with only one tag, such as HR and IMG, need a closing slash (/) at the end of the tag:
    <hr />
    <img />
  • All attributes must be quoted in XHTML. Some people remove the quotes around attributes to save space, but they are required for correct XHTML.
  • XHTML requires that tags are nested correctly. If you open a bold (B) element and then an italics (I) element, you must close the italics element (</i>) before you close the bold (</b>).
  • Attributes must have a name and a value. Attributes that stand alone in HTML must be given a value in XHTML; for example, the HR attribute noshade would be written noshade="noshade".
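Several of the rules in the list above are ordinary XML syntax rules, so any XML parser will reject markup that breaks them. The sketch below uses Python's standard xml.etree.ElementTree module; is_well_formed is a hypothetical helper name. It only checks XML well-formedness, so it covers closing, nesting, quoting, and attribute values, but it does not check XHTML-specific rules such as lower-case names.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


samples = [
    "<p><b><i>text</i></b></p>",    # correctly nested
    "<p><b><i>text</b></i></p>",    # bold closed before italics
    "<p>line<hr /></p>",            # single tag closed with a slash
    "<p>line<hr></p>",              # single tag left open
    '<img src="photo.png" />',      # quoted attribute value
    "<img src=photo.png />",        # unquoted attribute value
    '<hr noshade="noshade" />',     # attribute with a value
    "<hr noshade />",               # stand-alone attribute
]

for sample in samples:
    verdict = "ok" if is_well_formed(sample) else "rejected"
    print(f"{verdict:9}{sample}")
```

An HTML parser would typically tolerate all eight samples, which is the practical difference the list describes.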

©2014 About.com. All rights reserved.