There’s another way to archive web pages besides saving them in personal software or printing them to PDF. It’s called MHTML.
Scrapbook, the Firefox extension, is a good place to save pages. I have my whole website in it. Printing pages, though, is messy.
Printing to PDF out of Firefox means you get the whole page, including sidebars and advertising. It also splits text and images across page breaks and pushes sidebars to the bottom.
The best results come from leaving out the sidebars and pasting into EverNote 2. You can copy & paste just the post and comments out of a blog, then use a PDF driver (like PDF Creator) to print to PDF. The result has a page break between posts, and the images aren't split across pages.
PDF Printing Kills Links
Every PDF printer I've tried (about a dozen) kills links unless the link text is the raw http:// address.
Print Friendly and Web2PDF both print web pages with the links intact. However, Print Friendly prints only the post and leaves out the comments, while Web2PDF prints the entire page, including things you might not want.
When you save a web page as HTML, you get an HTML file plus a folder containing all the images and other files it uses.
MHTML is a way of saving web pages that embeds the images and other files into the page itself, so your output is a single file (usually with an .mht or .mhtml extension).
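To make the single-file idea concrete, here is a minimal sketch in Python that packs an HTML page and one image into an MHTML document. It assumes the basic layout described in RFC 2557: a `multipart/related` MIME message whose first part is the page and whose other parts are the resources, each tagged with the `Content-Location` URL the page uses to refer to it. The function name `build_mhtml`, the example.com URLs, and the placeholder image bytes are all hypothetical, and real tools like UnMHT or IE9 write more headers than this.

```python
import email
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_mhtml(html: str, page_url: str, images: dict[str, tuple[str, bytes]]) -> str:
    """Pack a page and its images into one MHTML (multipart/related) string.

    `images` maps each image URL, exactly as written in the page's <img src>,
    to a (subtype, data) pair such as ("gif", b"...").
    """
    # The outer container; type="text/html" marks the page as the root part.
    root = MIMEMultipart("related", type="text/html")

    # First part: the HTML page itself.
    page = MIMEText(html, "html", "utf-8")
    page["Content-Location"] = page_url
    root.attach(page)

    # One part per image, base64-encoded (MIMEImage's default), so the bytes
    # live inside the file instead of in a separate folder.
    for url, (subtype, data) in images.items():
        part = MIMEImage(data, _subtype=subtype)
        part["Content-Location"] = url
        root.attach(part)

    return root.as_string()


# Hypothetical example: one post with one GIF (placeholder bytes, not a real image).
html = '<html><body><p>Hello</p><img src="http://example.com/logo.gif"></body></html>'
gif_bytes = b"GIF89a placeholder"
mhtml = build_mhtml(html, "http://example.com/post.html",
                    {"http://example.com/logo.gif": ("gif", gif_bytes)})

# Writing it out would give one self-contained file instead of page + folder:
# with open("post.mht", "w", encoding="utf-8") as f:
#     f.write(mhtml)

# Reading it back lists the container, the HTML part, and the image part.
parsed = email.message_from_string(mhtml)
for part in parsed.walk():
    print(part.get_content_type(), part.get("Content-Location"))
```

The point of the design is that the image is stored as a base64 text part in the same file, and the page finds it by matching its `src` URL against that part's `Content-Location`.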
MHTML in Different Browsers
IE9 can output MHTML pages, but it doesn't do it well. The sidebars get thrown around and the font is inconsistent.
Versions of Firefox before 4 neither produce nor read MHTML files on their own. From version 3 onward, though, Firefox can save MHTML pages using an add-on called UnMHT. It makes very nice-looking pages, and Firefox renders them well.
UnMHT can also save several tabs at once, but it makes a separate page for each tab.
I first discovered MHTML through EverNote 2's Export option. Right-click on a note header to find it.
Then choose Web Archive from the Save As options.
In EverNote 2 you can select several notes at once and export them all into one tidy MHTML page that opens in your browser.
And it does a lovely job of it, putting a thin blue border around each note and separating the notes with a blue spacer bar.
I exported 45 blog posts from EverNote 2 at once into a single 7.4 MB file, and IE9 took about 15 seconds to open it.
Google Chrome reads MHTML but not consistently. It likes pages produced with UnMHT in Firefox. It cannot read MHTML produced by IE9 at all. It reads MHTML produced by EverNote 2 but leaves out the images.
GIF images tend to come out a bit blurry; JPGs are fine. As far as I can tell, the images really are embedded in the pages rather than just linked to them, which is the whole point.
Text links are preserved just as they are in web pages, though of course they won't work if they point to a site that no longer exists.
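One way to check whether the images in an MHTML file are embedded rather than linked is to open the file as the MIME message it is and compare the page's `<img>` sources against the parts actually stored inside. The sketch below does that with Python's standard `email` module. It is a rough check under some assumptions: it finds `<img src="...">` with a regular expression rather than a real HTML parser, and it matches each source exactly against a part's `Content-Location` or `cid:` Content-ID, so relative URLs that a browser would resolve won't match. The helper names and the sample message are made up for illustration.

```python
import email
import re

# Crude <img src="..."> matcher; a real HTML parser would be more robust.
IMG_SRC = re.compile(r'<img\b[^>]*?\bsrc=["\']([^"\']+)["\']', re.IGNORECASE)


def embedded_images(mhtml_text: str) -> dict[str, int]:
    """Map each embedded image's address to its decoded size in bytes."""
    msg = email.message_from_string(mhtml_text)
    found = {}
    for part in msg.walk():
        if part.get_content_maintype() != "image":
            continue  # skip the container and the HTML part
        size = len(part.get_payload(decode=True) or b"")
        # A page can refer to a part by its URL or by cid:<Content-ID>.
        if part["Content-Location"]:
            found[part["Content-Location"]] = size
        if part["Content-ID"]:
            found["cid:" + part["Content-ID"].strip("<>")] = size
    return found


def linked_only_images(mhtml_text: str) -> list[str]:
    """Return <img> sources in the page that have no matching embedded part."""
    msg = email.message_from_string(mhtml_text)
    # Assumes the file contains at least one text/html part.
    page = next(p for p in msg.walk() if p.get_content_type() == "text/html")
    charset = page.get_content_charset() or "utf-8"
    html = page.get_payload(decode=True).decode(charset, errors="replace")
    embedded = embedded_images(mhtml_text)
    return [src for src in IMG_SRC.findall(html) if src not in embedded]


# Hypothetical sample: the page uses two images, but only a.jpg is stored.
SAMPLE = """MIME-Version: 1.0
Content-Type: multipart/related; boundary="XYZ"; type="text/html"

--XYZ
Content-Type: text/html; charset="utf-8"
Content-Location: http://example.com/post.html

<html><body><img src="http://example.com/a.jpg"><img src="http://example.com/b.gif"></body></html>
--XYZ
Content-Type: image/jpeg
Content-Transfer-Encoding: base64
Content-Location: http://example.com/a.jpg

/9j/4AAQ
--XYZ--
"""

print(embedded_images(SAMPLE))     # {'http://example.com/a.jpg': 6}
print(linked_only_images(SAMPLE))  # ['http://example.com/b.gif']
```

Run against a real .mht file (read it in as text first), an empty list from `linked_only_images` suggests every image the page asks for is stored in the file itself.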
Combining MHTML files
The only way I know of to combine MHTML files right now is to keep my web pages or snippets in EverNote 2, then select the ones I want and export them to MHTML together.
You can also use the Scrapbook extension for Firefox, but it's not a great solution if images are involved. First save each page as an MHT file, then save those files with Scrapbook, and use its Combine wizard to merge multiple pages. This works well only with text-only pages, because any images are linked from the location of each individual MHT file; if you delete or move those files, there go your images.