Meta Charset Tag in HTML5

Setting Character Encoding in HTML5

In HTML4, to set the character encoding of a document with a META element, you would write:

<meta http-equiv="content-type" content="text/html; charset=iso-8859-1">

The important thing to notice is the quotation marks around the value of the content attribute: content="text/html; charset=iso-8859-1". They indicate that the entire string text/html; charset=iso-8859-1 is the value of that one attribute.

But a lot of web developers were writing the character encoding META element without any quotation marks:

<meta http-equiv=content-type content=text/html; charset=iso-8859-1>

To interpret this, browsers had to recognize that charset was not a separate attribute but part of the content attribute. Luckily for us, browser manufacturers are smart (and kind): they built their browsers to recognize what the developer meant and set the character encoding correctly, even though the element was written incorrectly.
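
To see why the unquoted form is ambiguous, here is a minimal sketch using Python's standard html.parser module, which makes no attempt to repair the markup the way browsers do. The class name MetaAttrCollector and the helper parse_meta are made up for this illustration. Without the quotation marks, the parser reports charset as an attribute of its own rather than as part of content:

```python
from html.parser import HTMLParser


class MetaAttrCollector(HTMLParser):
    """Collects the attribute list of every <meta> start tag it sees."""

    def __init__(self):
        super().__init__()
        self.meta_attrs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs in source order.
        if tag == "meta":
            self.meta_attrs.append(attrs)


def parse_meta(markup):
    """Return the attribute lists of all <meta> tags found in markup."""
    collector = MetaAttrCollector()
    collector.feed(markup)
    collector.close()
    return collector.meta_attrs


quoted = parse_meta(
    '<meta http-equiv="content-type" content="text/html; charset=iso-8859-1">'
)
unquoted = parse_meta(
    "<meta http-equiv=content-type content=text/html; charset=iso-8859-1>"
)

print(quoted[0])    # two attributes: http-equiv and content
print(unquoted[0])  # three attributes: charset has split off from content
```

Browsers go a step further than this parser and treat the stray charset as the encoding declaration the developer intended.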

HTML5 Cut Out the Extra Stuff

The HTML5 editors saw that developers were writing the META element incorrectly and that browsers were interpreting it correctly anyway, so they decided to make the shortened syntax valid. With HTML5, you can declare your character encoding with a META element that is much easier to remember:

<meta charset=utf-8>

Always Include the Character Encoding

You should always declare a character encoding for your web pages, even if you never use any special characters. If you don't, your site becomes vulnerable to a cross-site scripting (XSS) attack using UTF-7.

The attacker sees that your site has no character encoding defined and tricks the browser into treating the page as UTF-7. The attacker then injects UTF-7 encoded scripts into the web page, and your site is hacked. Because UTF-7 can spell characters such as < and > using only plain ASCII, those scripts can slip past filters that look for literal angle brackets.
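
The following sketch shows why the guessed encoding matters. It decodes one made-up payload twice with Python's built-in codecs. The payload string is an illustrative example, not taken from a real attack. Read as UTF-8 it is harmless-looking text, while read as UTF-7 it turns into a script element:

```python
# The same bytes, interpreted under two different character encodings.
payload = b"+ADw-script+AD4-alert(1)+ADw-/script+AD4-"

as_utf8 = payload.decode("utf-8")  # plain ASCII text, no angle brackets
as_utf7 = payload.decode("utf-7")  # "+ADw-" becomes "<" and "+AD4-" becomes ">"

print(as_utf8)  # +ADw-script+AD4-alert(1)+ADw-/script+AD4-
print(as_utf7)  # <script>alert(1)</script>
```

A page that declares its encoding explicitly leaves the browser nothing to guess, so the payload stays in its harmless form.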

The Character Encoding Should Be the First Line of Your HTML After the Root and Head Elements

This ensures that the browser knows what the character encoding is before it does anything else. Your HTML should read:

<!DOCTYPE HTML>
<html>
<head>
<meta charset="UTF-8">
...
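
One way to check this placement on your own pages is sketched below. It assumes a 1024-byte window, which mirrors the limit the HTML specification places on where the declaration may appear. That limit is not mentioned in this article. The function name declared_charset and the regular expression are hypothetical helpers and are far simpler than a browser's real encoding detection:

```python
import re

# Assumption: only look at the first 1024 bytes, the window the HTML
# specification gives for the character encoding declaration.
PRESCAN_BYTES = 1024

# Matches both <meta charset="..."> and the older
# <meta http-equiv="content-type" content="text/html; charset=..."> form.
META_CHARSET = re.compile(
    rb"<meta[^>]*?charset\s*=\s*[\"']?([A-Za-z0-9_.:-]+)",
    re.IGNORECASE,
)


def declared_charset(document: bytes):
    """Return the charset named by a META element near the top, or None."""
    match = META_CHARSET.search(document[:PRESCAN_BYTES])
    return match.group(1).decode("ascii").lower() if match else None


early = b'<!DOCTYPE HTML>\n<html>\n<head>\n<meta charset="UTF-8">\n'
late = b"<html><head>" + b" " * 2000 + b'<meta charset="UTF-8">'

print(declared_charset(early))  # utf-8
print(declared_charset(late))   # None
```

The second call returns None because the declaration sits too far down the document, which is the situation this placement rule is meant to prevent.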

You can also specify the character encoding in the HTTP headers. This is even more secure than adding it to the HTML, but you need access to the server configuration or to .htaccess files.

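In Apache, you can set the default character set for your entire site by adding the following line to your root .htaccess file:

AddDefaultCharset UTF-8

If you enable the directive with AddDefaultCharset On instead of naming an encoding, Apache falls back to its internal default, ISO-8859-1.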
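
Once the server is configured, the encoding arrives in the Content-Type response header. As a rough sketch, the charset can be read out of a header value with Python's standard email.message module. The function name header_charset is made up for this example, and the header strings are sample values rather than output from a real server:

```python
from email.message import Message


def header_charset(content_type_value):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type_value
    # get_content_charset() lowercases the value and returns None
    # when the header carries no charset parameter.
    return msg.get_content_charset()


print(header_charset("text/html; charset=UTF-8"))  # utf-8
print(header_charset("text/html"))                 # None
```

A result of None means the server is not declaring an encoding, so the browser has to rely on the META element or on guessing.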

©2014 About.com. All rights reserved.