Batch-saving multiple HTML / Web Pages to PDF

Earlier today I put out a plea:

…which (long story short) led to a bunch of suggestions, none of which quite worked for me.

A more succinct problem statement would be: I have a list of about 50 wiki pages, want to create PDF versions of each one, and don’t care too much about links/inclusions because they are text heavy.

Fighting frustration, I passed through a barrier:

…then someone mentioned convert on OSX, which sparked a memory from long ago; some Googling revealed that this was essentially cupsfilter in disguise (good tip, shame about the slideshow with its broken and flaky script examples), which led me to a solution along the lines of:


# read wiki page names from stdin, one per line
while read foo
do
  # fetch the page as HTML, then hand it to cupsfilter to render a PDF
  curl "http://server/$foo/" > "$foo.html"
  cupsfilter -f "$foo.html" -a media=A4 -a scaling=75 > "$foo.pdf"
done

…and that worked. Got what I wanted anyway. Not sure what would happen if I passed entire HTML directories into it.
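
For the record, the loop reads page names from stdin, one per line, so I just redirected a list file into a script containing it; the filenames below are made up:

# example invocation: pagelist.txt holds one wiki page name per line,
# and makepdfs.sh is a made-up name for a script holding the loop above
sh makepdfs.sh < pagelist.txt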

Comments

4 responses to “Batch-saving multiple HTML / Web Pages to PDF”

  1. I have a list of about 50 wiki pages, want to create PDF versions of each one, and don’t care too much about links/inclusions because they are text heavy.

    Now why didn’t you ask that, as this is exactly what I’m doing at the moment! :-p

    I have ~120 MediaWiki pages that I just want to archive in an easy form. My solution has been…
    – Use MediaWiki’s Special:Export to dump all versions of all pages as a single XML file.
    – Write a quick bit of Perl to pull out each article.
    – Pass each article to the awesome unoconv program, which writes them out as OpenOffice .odt files.

    unoconv will also write out to PDF.
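
    Roughly, the unoconv step is just the following; the filename is made up and -f picks the output format:

    # convert one extracted article (example filename) to ODT, then to PDF
    unoconv -f odt article-0001.html
    unoconv -f pdf article-0001.html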

    YMMV

    I love that there are so many solutions to problems 🙂

  2. Alan Burlison

    Which Wiki package was this? Some of them will export pages directly as HTML.

    1. Exporting as HTML is not too hard. As PDF…

  3. Colin E.

    Hi Alec!

    You piqued my interest, because this kind of looks similar to one of my current projects. I need to pull off and save chunks of web sites in a nice self-contained “bundle”, preferably with each chunk being one self-describing file.

    First priority is “near WYSIWYG” (i.e. the file opens in a browser, looking like the old site). Second priority would be an “archived” read-only (or at least hard-to-edit) copy; PDF is an obvious candidate.

    For No. 1 I’ve had some success with Firefox, ScrapBook, the ScrapBook MAF writer extension, and MAF for Firefox. This is fine as a quick+dirty solution, but capture is a bit uncontrollable. Something like HTTrack for capture would be better, but I haven’t found a MAFF writer for HTTrack or its equivalents.
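
    For reference, the sort of HTTrack capture I mean is just a basic mirror, something like this (URL and output directory are placeholders):

    # mirror part of a site into a local directory (placeholder URL and paths)
    httrack "http://www.example.com/wiki/" -O "./mirror" "+*.example.com/*" -v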

    A slightly off-the-wall alternative is to mirror the HTML and convert it into an eBook (ePub). It’s more finicky than MAF, and it can’t handle non-HTML attachments, but it’s a nice way to preserve readable content over the longer term.
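
    If anyone fancies the eBook route, calibre’s ebook-convert handles the HTML->ePub step; the filenames here are only illustrative:

    # turn a mirrored HTML page (example name) into an ePub
    ebook-convert mirror/index.html site-archive.epub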

    Adobe Acrobat claims to do bulk HTML->PDF conversion, preserving internal links etc. I haven’t tried it, and of course it’s ££. Any experiences others have had, and ideally an Open Source alternative to filling Adobe’s coffers, would be great to hear.
