Word writes ugly HTML

By Jennifer Kyrnin, About.com Guide   January 9, 2007

It used to be true that FrontPage and other HTML editors (including Dreamweaver, in fact) wrote HTML that was ugly. It was often bulky and had lots of extra tags or styles or just extraneous stuff.
Well, now we have that same problem - but with Word.

Yes, Word will write HTML, but it's painfully ugly. When you add any special styles to your Word documents, Word saves those styles out as inline CSS in every paragraph. It adds huge lists of classes to the top of the document, and they're given such eminently clear names as "MsoNormal" and "SpellE". I suppose there is someone out there who knows what those mean. And it adds special Microsoft-only styles like "mso-font-signature:536871559 0 0 0 415 0;". Font-signature?
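To get a feel for how much of that markup is Office-only residue, here is a minimal Python sketch that strips "mso-" declarations and "Mso" class attributes from a fragment. The sample fragment and the function name `strip_office_residue` are made up for illustration, and regular expressions are only good enough for a quick demonstration; a real cleanup would want a proper HTML parser.

```python
import re

# Hypothetical fragment written in the style of Word's HTML output.
WORD_HTML = (
    '<p class="MsoNormal" style="margin-bottom:0in;mso-pagination:none;'
    'mso-font-signature:536871559 0 0 0 415 0;">Hello</p>'
)

def strip_office_residue(html: str) -> str:
    """Remove mso-* declarations and Mso* class attributes (regex sketch only)."""
    # Drop each "mso-...: value;" declaration inside style attributes.
    html = re.sub(r'mso-[a-z\-]+\s*:[^;"]*;?', '', html)
    # Drop class attributes whose value starts with "Mso" (e.g. MsoNormal).
    html = re.sub(r'\s+class="?Mso\w+"?', '', html)
    # Drop style attributes left empty by the first step.
    html = re.sub(r'\s+style="\s*"', '', html)
    return html

cleaned = strip_office_residue(WORD_HTML)
print(cleaned)
```

On this fragment, only the ordinary CSS declaration (`margin-bottom:0in;`) survives in the paragraph's style attribute.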

One page I built in Word was 367 characters (with spaces) total as a Word document. When I saved it as a Web page the HTML was 6,204 characters (with spaces). That's almost 17 times as many characters. Converting the page to HTML in Dreamweaver used only 1,476 characters.
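The arithmetic behind those figures can be checked with a few lines of Python. The variable names are my own; the three character counts are the ones quoted above.

```python
word_chars = 367          # Word document, with spaces
word_html_chars = 6204    # HTML saved from Word
dw_html_chars = 1476      # HTML converted in Dreamweaver

# How many times larger each HTML version is than the original text.
word_ratio = word_html_chars / word_chars
dw_ratio = dw_html_chars / word_chars
# How much larger Word's HTML is than Dreamweaver's.
word_vs_dw = word_html_chars / dw_html_chars

print(f"Word HTML: {word_ratio:.1f}x the original")            # 16.9x
print(f"Dreamweaver HTML: {dw_ratio:.1f}x the original")       # 4.0x
print(f"Word HTML vs Dreamweaver HTML: {word_vs_dw:.1f}x")     # 4.2x
```

So the Dreamweaver version is about 4 times the size of the original text, and Word's HTML is a little over 4 times larger again.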

Don't use Word to write HTML - use an HTML or Web editor, like Dreamweaver, instead.

Comments
January 9, 2007 at 3:52 pm
(1) Steve says:

MSO stands for MicroSoft Office and all those extra tags are there so that Office knows how to convert it back into a Word document.

January 9, 2007 at 4:19 pm
(2) Dwight Blubaugh says:

I saw the same behaviour, and it served its purpose as a very quick and dirty means of putting a flier on a website. Steve is probably right that MS Office always wants to leave a trail, so that when you regain your senses and want the document back in Word format you can still get there :-) I find it only moderately better to convert the .DOC file into a PDF using OpenOffice, but that seems to add so many extra steps.

January 9, 2007 at 7:14 pm
(3) phil says:

Well, duh! Even worse, in that it’s deceptive on the part of Microsoft, is that Word XP has a “filtered HTML” save option (available as a plugin for earlier versions of Word) that allegedly filters out Word-specific code. I guess it depends on what MS believes to be “Word-specific”, but even their filtered code is horribly bloated with a bunch of nonsensical, MS-specific code. Even worse, there are no “filter” options that I’m aware of for other MS Office products, and don’t even get me started on MS Publisher files.
The best option where I work is to save a Word document as a .txt file, copy/paste it into Dreamweaver, and apply the markup all over again, using of course a proper .css file. Sometimes I wonder just how much faster the WWW would be if it were free of MS bloat-code.

January 9, 2007 at 8:02 pm
(4) Roger says:

I guess I’ve been around too long. This is the very reason that my HTML editor is still Notepad.

January 9, 2007 at 8:42 pm
(5) Gary says:

I learned HTML viewing source in Notepad.

January 9, 2007 at 8:46 pm
(6) Gary says:

I learned HTML viewing source in Notepad and use it for quick edits. Dreamweaver is great for the WYSIWYG and very clean.

Why not use Word for text creation and copy & paste into Dreamweaver, which maintains formatting other than font? Why save as a txt doc first?

January 9, 2007 at 11:13 pm
(7) theron says:

Wrote my first site for work in a text editor about 12 years ago. I’ll admit to using Word’s “save as HTML” when I was in a hurry and wanted the page to look exactly like the original document (on the intranet at work). For real web pages Notepad is fine, but FirstPage or HTML-Kit have nice code snippet buttons and they’re free.

January 10, 2007 at 6:02 am
(8) Tara says:

I’ll stick with Notepad and other variations of text editors to do my coding. I like those better than the WYSIWYG editors.

January 10, 2007 at 12:57 pm
(9) Jennifer Kyrnin says:

Gary: you asked “Why save as a txt doc first?” and I can give you an answer that I came upon every day in my 9 years working as a corporate Web developer.

The Web developer receives Word docs from the content teams – like Marketing – that they want put up on the Web. In many cases, my Marketing teams (and this happened at 4 companies that I worked for, so I don’t think it’s that uncommon) would style their documents for print and then want those styles to come through in the Web version of the document as well. For example, they might build a fact sheet on a product, and then want the fact sheet information on the product page. So they’d send me a Word document and ask me to put it up on the site.

Converting the Word document to HTML while maintaining the styles and reducing bloated code was always an issue. And of course, they would send me the doc at about 4pm on the day they wanted it to go live – so speed was usually an issue too. :-)

But you’re right, if you’re the person who builds the content, and the content is only going to go on the Web site, then writing it in Word first might not make a lot of sense.

Jennifer

January 10, 2007 at 1:12 pm
(10) Phil says:

Gary asked “why save as a .txt file?”
I work for a government agency, and I don’t know what happens in private industry, but a bureaucracy is a bureaucracy.
Virtually every single Word doc I’ve ever received to post on our websites is written using manual markup, rather than using style assignments (body text, heading 1, caption, etc.). This is in spite of our agency (including me) providing training for several years in how to properly create Word documents. People just don’t seem to understand how important this is for any re-purposing of a document, and it ticks me off because I get paid way too much and have too high of a skillset to be doing their clerical work, but I digress.
I have to convert the files to .txt files and then restyle them manually to preserve the visual intent. I’m using Dreamweaver MX, not v. 8, and maybe 8 has a better way of doing this, but copy/pasting a Word doc sometimes preserves all of the crappy manual markup, and while it’s amusing to see sometimes, it’s a horrible thing to experience when you have tight deadlines.
Please forgive me if I seem to be ranting, but for years I’ve had people beating me up because they send me crap and can’t seem to understand that there isn’t one button I can push to get their content on the web, and they either won’t or can’t believe that they cause most of the delay because they’re too self-absorbed to do their part of the job.
… not that I have an opinion about these things…

January 11, 2007 at 8:07 am
(11) Riann says:

@Phil:
That is exactly what I have to put up with every day. So I am not alone after all ;)
Our clients are using a CMS called WebEdition, and I have to tell them over and over to first save their Word content as a .txt file. Otherwise the CMS will use all the Word styles and not the internal CSS we want it to use – and the site starts to look crappy.

January 12, 2007 at 2:23 am
(12) jeff says:

I’ve seen people use all kinds of software from Microsoft and others to show me what they want for their website, and I’ve created a few available here: WebCityPages.com
It’s actually really helpful since people are more familiar with them than they are with Dreamweaver. However, I always copy the textual content to a plain text editor (Notepad), then style or dress it up in DW or other tools. I would never keep the MS HTML format since it would take much longer to render on the web.

