Making Web Pages With Word?

September 2, 2021

If you’ve ever examined the messy HTML that results from doing a Save As HTML from Microsoft Word, you can appreciate [Jim Yuill]’s motivation for his WordWebNav (WWN) project. [Jim] uses Word to document his technical projects, and wanted an easy way to generate web pages. Not only is Word-generated HTML nearly unreadable, [Jim] notes there are known bugs, as well. His project attempts to solve these shortcomings, and adds new features like a navigation pane and headers, among others. Here is a link to a dummy project which shows off these features.

There are, of course, other ways of generating web pages from your technical documentation — there is the Markdown / Pandoc combination, various Wiki solutions, or GitHub Pages, for example. If you’re Python-focused, there’s always the Jupyter Notebooks / JupyterLab approach which we wrote about in 2019. But these presume the source documents are in a certain format. If you have years of existing documentation in Word, or you prefer (or are required) to use Word, [Jim]’s WWN tool might be of interest.

The open source, Python-based program can be found in the project’s GitHub repository. [Jim] has a lot of experience writing software, and the clean and well-organized source code reflects this. Do you convert project documentation to HTML for browsing, be it local or online? If so, share your techniques in the comments below.

28 thoughts on “Making Web Pages With Word?”

ubvt says:

September 2, 2021 at 11:17 pm

Hint: use TeX.

Report comment

Reply
1. Chris says:
  
  September 3, 2021 at 6:49 am
  
  Hinthint: LaTeX ( Texstudio is my preferred editor)
  Plain TeX is… *Tilt*
  
  Report comment
  
  Reply
Glen Kleinschmidt says:

September 3, 2021 at 12:01 am

IMO, the best way to publish and distribute a lengthy technical document in MS Word is to print it to a PDF. I find it annoying when I stumble upon a technical document worth saving/archiving and it’s presented in HTML – damn awful to print too.

Report comment

Reply
Luke says:

September 3, 2021 at 3:32 am

Bring back FrontPage Express!

Report comment

Reply
1. geocrasher says:
  
  September 3, 2021 at 5:14 am
  
  /passes out
  
  Report comment
  
  Reply
2. bastetfurry says:
  
  September 3, 2021 at 6:02 am
  
  *plasters everything with “Proudly made with Notepad” buttons*
  
  Report comment
  
  Reply
  1. ArcReactorKC says:
    
    September 3, 2021 at 8:55 am
    
    *ahem* notepad++
    
    Report comment
    
    Reply
    1. brucedesertrat says:
      
      September 3, 2021 at 12:14 pm
      
      edlin or bust. Had a co-orker who wrote a term paper in high school with it.
      
      Report comment
      
      Reply
thoriumbr says:

September 3, 2021 at 3:49 am

Word is great for normal text, but terrible for technical papers. I had lost count of how many times I got a script from Word and all dashes and quotes are helpfully changed.
I use Markdown for that, it’s made for it. TeX works too, but it’s way more easy to learn Markdown. I reserve TeX for scientific papers.

Report comment

Reply
1. Elliot Williams says:
  
  September 3, 2021 at 4:24 am
  
  Back in the day, I had a Markdown++ that I’d do academic work with, and a Python pre-processor that would turn it into either Latex for publication or Beamer/Latex for presentation, depending.
  
  It was all hacks and regex text substitutions, but it make life easy.
  
  Then along came AsciiDoc, which basically sorted that whole biscuit. https://asciidoc.org/
  
  Report comment
  
  Reply
Adrián Pardini says:

September 3, 2021 at 4:27 am

The html files created with WordPerfect are completely fine.

Report comment

Reply
1. Gregg Eshelman says:
  
  September 3, 2021 at 11:06 pm
  
  20-ish years ago, whatever was the then latest (and last from whomever was the owner of Word Perfect at the time) Word Perfect for Windows was completely useless for HTML. I started a simple document and saved it. When I re-opened it, it was a horrible mess. I closed it then opened the HTML file in Notepad. I dunno what Word Perfect did but it was like the transporter incident in that early Star Trek movie.
  
  Report comment
  
  Reply
Dan says:

September 3, 2021 at 11:30 am

The HRML word spits out is awful, and each version is slightly different, but (within a version) it is consistent.
I worked on a big project to convert hundreds of large word docs to HTML, and later to a domain-specific markup language.

The key things are:
– make sure your word doc is consistently formatted. It’s generally much easier to fix that in word than later.
– use styles! It makes conversions much easier
– double-check everything at every stage, and at the end do a text comparison with the original doc.

Technically, we used a mix of word macros to scan for and fix word formatting issues, Perl scripts for the bulk of regexing (fixing up word’s html, removing spurious rubbish from the files, and some sanity and internal consistency checks), and nodejs for manipulating the DOM and final consistency checks.

Report comment

Reply
1. Dan says:
  
  September 3, 2021 at 11:30 am
  
  *HTML
  … please can we get an edit button?!
  
  Report comment
  
  Reply
  1. brucedesertrat says:
    
    September 3, 2021 at 12:16 pm
    
    I dunno HRML seems a pretty good description of Word’s HTML :-) Hyper Really Messy Language
    
    Report comment
    
    Reply
    1. Ricardo says:
      
      September 3, 2021 at 2:12 pm
      
      HaRMfuL
      
      Report comment
      
      Reply
2. Gregg Eshelman says:
  
  September 3, 2021 at 11:02 pm
  
  For the love of whatever, don’t use UTF-8 or any other multi-byte representation (such as “HTML Friendly” codes that start with an ampersand) of any character that’s in the extended ASCII set. Except for one character in Norwegian, every character for any language whose alphabet is based on the English alphabet is in Extended ASCII, and that includes things like left and right single and double quotes. I’ve run into a large number of documents where for some reason ALL punctuation was there with multi-byte encoding. Why? Who the heck knows because there was no reason at all to have it that way.
  
  I had to convert them for PalmOS software and nothing on PalmOS supports Unicode, so without pre-processing away the multi-byte characters, the punctuation would either all come out as ? or would get removed and replaced with nothing, not even a space.
  
  Back in the day I would have happily paid money for software to add Unicode support to PalmOS 5.
  
  For gits and shiggles I’d like to see an entire HTML document converted to UTF-8 codes, WITH the leading zeroes that are optional for character numbers under 1000. Displaying such should be part of the ACID web browser test.
  
  Report comment
  
  Reply
3. cmholm says:
  
  September 5, 2021 at 10:38 am
  
  These are good suggestions.
  
  They take me back to 2001 or so, when a new intern’s CV included HTML/web development experience, which we quickly realized consisted of formatting in Word to get the desired look, then saving as HTML. Since she was an intern, it wasn’t the end of the world, and devoting time for training is part of the job.
  
  The issue I used to see was that most colleagues (and I) were trained in Word at Jumprightin U, seldom used styles or constant formatting, and kept reformatting individual portions of a document until it passed muster on a printer.
  
  Report comment
  
  Reply
e says:

September 3, 2021 at 12:46 pm

There I was, thinking golf was the archetypal “invented by sadists, for masochists” activity

Report comment

Reply
Ren says:

September 3, 2021 at 1:44 pm

I had never considered converting a Word doc to HTML (ahem! HRML) before reading this article.
So, I opened a recent Word doc to Save As: an HTML file, just to see how “bad” it could be.
I did not see an HTML option, just a couple XML options, and the one I tried didn’t turn out bad.
The doc had a number of photos and a couple of tables along with the text, but it was still readable…

Oh, come to think of it, the file I used didn’t have any URLs…

Report comment

Reply
1. Benny Harvey says:
  
  September 3, 2021 at 2:05 pm
  
  LibreOffice Writer supports exporting as HTML.
  
  Report comment
  
  Reply
  1. Drone says:
    
    September 4, 2021 at 7:12 pm
    
    Plus LibreOffice is free, open-source, cross-platform, available with an installer or portable, and it doesn’t spy on you. I’m very experienced with both Microsoft Office and LibreOffice, and I struggle to understand why someone would prefer Microsoft Office, especially since Microsoft is data harvesting for money and clearly moving toward a cloud-based SAAS business model. Before it’s too late, jump off the Microsoft Office train now so you can ease into LibreOffice at your own pace.
    
    Report comment
    
    Reply
Cbob says:

September 3, 2021 at 5:05 pm

It’s better than copy con>whatever.html or whatever the syntax used to be, but it’s something I’d rather do on pre-aol Netscape as opposed Word

Report comment

Reply
1. Gregg Eshelman says:
  
  September 3, 2021 at 11:08 pm
  
  Back in the day I used to write autoexec.bat and config.sys with Edlin and low level format hard drives with debug.
  
  Report comment
  
  Reply
  1. Amiable Ninja says:
    
    September 4, 2021 at 7:50 am
    
    Irrelevant, incompetent and immaterial.
    
    Report comment
    
    Reply
KenN says:

September 4, 2021 at 9:04 am

Heard the saying “every day I don’t have to wear a suit is a good day”? I feel the same way about Word. If I’m using Word (or another word processor), it means I’m doing something I’d rather not be doing.

There are many better tools than Word for web pages. For electronic distribution of more formal and/or printable docs, PDF is my preference. (but probably composing in Word or Open Office [sad face] then converted)

Report comment

Reply
Stellan says:

September 4, 2021 at 12:33 pm

I’m annoyed at how rare it is to see a pure-HTML word processor. 99% of what I’ve ever done in a word processor could be done in HTML4 with a small chunk of CSS to make it all pretty. Even fancy things like Word’s auto-outline function could easily be done with HTML and some basic parsing. Especially since HTML can be viewed on the vast majority of computing devices nowadays, and if you can mentally parse HTML, on pretty much any computing device ever created.

Some years ago, I had a word processor that did exactly that, it just used HTML as it file storage format, the top of the file contained some CSS to define the various styles and formatting used in the document. The program itself was a tiny statically compiled binary that produced some very clean HTML. The version I had was forked and rebranded by my employer, and I’ve completely forgot what the original was called.

Report comment

Reply
Jouni says:

September 6, 2021 at 2:42 am

Could we all just Markdown, mmmkay?

Report comment

Reply

Hackaday

Making Web Pages With Word?

28 thoughts on “Making Web Pages With Word?”

Leave a Reply to DanCancel reply

Search

Never miss a hack

If you missed it

How Vibe Coding Is Killing Open Source

Building Natural Seawalls To Fight Off The Rising Tide

Ask Hackaday: How Do You Digitize Your Documents?

The Amazing Maser

Zombie Netscape Won’t Die

Our Columns

Hackaday Links: February 1, 2026

Secret Ingredients

Hackaday Podcast Episode 355: Person Detectors, Walkie Talkies, Open Smartphones, And A WiFi Traffic Light

Did We Overestimate The Potential Harm From Microplastics?

FLOSS Weekly Episode 862: Have Your CAKE And Eat It Too

28 thoughts on “Making Web Pages With Word?”

Leave a Reply to DanCancel reply

Search

Never miss a hack

Subscribe

If you missed it

Our Columns