HTML to Word

If there are no images, both Netscape and IE work fine, so the rest of this paper assumes that the HTML files have graphic images embedded. The goal is to create one Microsoft Word document that contains all the images, without having to use OLE (Object Linking and Embedding). If you use OLE, you must keep the main Word file and all the images as separate files. If you move the graphic files, or if the main document is emailed to someone without them, Word will display only an error image with a square, triangle, and circle.


DO NOT USE NETSCAPE - Netscape does not save the images at all. I have not tried the newest version (6.0, I believe), which "may" save them.


Use IE (Internet Explorer) and Word

IE allows you to save the entire web page, including the images. When Word opens and converts the HTML file, it initially treats the images as external links to the location on your drive where the files reside, so if you move or email the Word file, you lose the pictures. But within Word, you can edit the links so that all the pictures are saved in the Word document instead.

NOTE: if you want to try this, I used the following web page as a test: http://www.dcbnet.com/notes/9611t1.html

Method 1 - Save the Web Page and Open in Word

1) open the page in IE and let it complete loading

2) File/Save As . . . 
Save as Type: Web Page Complete
enter a name for the file, such as temp1.htm
Select a temporary folder, such as c:\temp, and click Save
NOTE: the main html file will be saved in c:\temp, and IE will
create a new folder under c:\temp and place the images there
3) open Word, and open the file temp1.htm

NOTE: you must have the HTML converter installed in your Word setup (most installs do have this feature). Word will convert the HTML file to Word format and will link to the image files in the folder where IE saved them.

4) Edit/Links . . .

Select all the files and links by clicking once on the top one,
holding the Shift key down, and clicking once on the bottom one

Click the check box, "Save picture in document", and click OK

5) File/Save as . . . - and make sure to select Save as Type: Word Document (*.doc)

DONE ! !
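
Before opening the saved page in Word (after step 2), it can be worth confirming that every image the HTML file refers to really was saved next to it, since a missing file is what later produces the error image. The following is only a minimal sketch in Python, not part of the original procedure. It assumes the saved page refers to its images with relative paths and can be read as text; the function names `find_image_sources` and `missing_local_images` are made up for this illustration.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urlparse


class ImageSourceCollector(HTMLParser):
    """Collects the src attribute of every <img> tag it sees."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def find_image_sources(html_text):
    """Return the src values of all <img> tags, in document order."""
    collector = ImageSourceCollector()
    collector.feed(html_text)
    collector.close()
    return collector.sources


def missing_local_images(html_path):
    """Return the relative image paths in html_path that do not exist on disk."""
    html_path = Path(html_path)
    # Assumption: undecodable bytes do not matter for finding <img> tags.
    text = html_path.read_text(encoding="utf-8", errors="replace")
    missing = []
    for src in find_image_sources(text):
        src = src.replace("\\", "/")
        if urlparse(src).scheme:
            # http:, file:, data: and similar are not relative local files.
            continue
        if not (html_path.parent / unquote(src)).is_file():
            missing.append(src)
    return missing
```

Calling `missing_local_images` on the saved file (for example the temp1.htm from step 2) returns a list of image paths that could not be found; an empty list suggests that all the linked pictures are in place.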

 

Method 2 - Copy and Paste from HTML to Word

*** causes the "Line Feeds Problem" (aka the "Narrow Column" problem) - here we list a fix for this

*** if you want to try this, use any of the newsgroup postings.  Go to www.google.com and click the "Groups" tab, then type in a search for anything, go to that page, and select the posting to copy.  Of course, this works for any web page that has text !!

This fix is for converting an article on a web page, but NOT for converting images that may be present in the article.  This is for a simple document that you want to convert to a text file or a Word file.  In particular, this problem applies to all those helpful tips and methods of solving PC problems that users post on the newsgroups (available by going to www.google.com and clicking "Groups").

If you have ever tried this, then you know the problem.  The HTML web page has paragraph line feeds at the end of every line.  The lines are short, so you end up with a lot of pages consisting of a narrow column.  These line feeds are automatically inserted by newsreaders such as Free Agent when a user sends a post to a newsgroup.  Here is an example, a portion of an actual post that I wanted to convert to a text file and save to my hard drive for future reference:

Example of the "Line Feeds" Problem (Narrow Column)

When you copy and paste the HTML into Notepad or Word - these annoying line feeds are pasted as well !!  Word assumes that they were placed there by the author.  It has no idea that newsreaders insert them automatically.   Here we will describe how to get rid of them, resulting in the following:

 

Example of the "Line Feeds Problem" Fixed (full-width column)

Using Word

You want to keep all the double paragraph line feeds, because they separate the actual paragraphs, but you want to delete all the single paragraph marks that cause the document to be a narrow column.  This way Notepad and/or Word will simply format the rows of text at full width.  To do this, we temporarily replace every instance of two consecutive paragraph marks with a placeholder to save them, then replace all the single paragraph marks with a space, and then restore the double paragraph marks, as follows:

(this looks like a long process - but it can be done in about one minute !!)

  1. select the article in your web browser and hit CTRL-C to copy
  2. open Word, click the cursor at the beginning of a new document, and hit CTRL-V to paste
    Replace double-paragraph marks with a Placeholder, "qqqqq"
  3. edit/Replace . . .
  4. click the cursor to place it in the "Find what" field
  5. click "More", then "Special", then "Paragraph Mark"
  6. click Special then Paragraph mark again
  7. click the "Replace with" field and enter something that does not exist anywhere in the document - I use "qqqqq"
    click Replace All
    Replace single-paragraph marks with a Space
  8. erase the contents of both fields
  9. click the cursor to place it in the "Find what" field
  10. click "More", then "Special", then "Paragraph Mark"
  11. click the "Replace with" field and enter one "Space"
  12. click Replace All
    Retrieve the double-paragraph Marks by replacing the placeholders (qqqqq) with Double-Paragraph Marks
  13. erase the contents of both fields
  14. enter "qqqqq" in the "Find what" field
  15. click the cursor in the "Replace with" field
  16. click "More", then "Special", then "Paragraph Mark"
  17. click Special then Paragraph mark again
  18. click "Replace All"
  19. Done
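
The same three-pass idea (protect the double breaks with a placeholder, turn the single breaks into spaces, restore the double breaks) can also be applied to a plain text file outside Word. The following is only a sketch in Python under a few assumptions: the text uses ordinary line breaks rather than Word paragraph marks, the placeholder "qqqqq" does not occur in the text, and the function name `unwrap_hard_line_breaks` is made up for this illustration.

```python
PLACEHOLDER = "qqqqq"  # same placeholder as in the Word steps; must not occur in the text


def unwrap_hard_line_breaks(text, placeholder=PLACEHOLDER):
    """Join hard-wrapped lines while keeping blank-line paragraph breaks."""
    if placeholder in text:
        raise ValueError("placeholder already occurs in the text; choose another")
    # Normalize Windows and old Mac line endings so every break is one "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Pass 1: protect the double breaks (the real paragraph boundaries).
    text = text.replace("\n\n", placeholder)
    # Pass 2: turn every remaining single break into a space.
    text = text.replace("\n", " ")
    # Pass 3: restore the double breaks.
    return text.replace(placeholder, "\n\n")


if __name__ == "__main__":
    sample = (
        "These lines were\nwrapped by a newsreader.\n\n"
        "This is the\nsecond paragraph."
    )
    print(unwrap_hard_line_breaks(sample))
```

The check at the top of the function plays the role of step 7's advice to pick a placeholder that does not exist anywhere in the document: if the placeholder were already present, the last pass would wrongly turn it into a paragraph break.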

Using FrontPage

  1. select the article in your web browser and hit CTRL-C
  2. open FrontPage, and click the "New Page" icon on the left of the Toolbar
  3. click the top of the page to place the cursor there
  4. click Edit/Paste Special  . . . Normal Paragraphs     OR     Edit/Paste
    NOTE: "Edit/Paste Special  . . . Normal Paragraphs" will keep the paragraph breaks intact but you lose links and font properties such as Bold, Italic, Coloring, etc.  "Edit/Paste" will retain the font properties and links - but the text will all be in one long paragraph.
  5. reformat where necessary
  6. Done