Creating paged documents in a web environment

For a web developer, generating markup which is then rendered in a client's browser is an ingrained design pattern, with plenty of accompanying tools.

However, there are often situations on web projects where it is desirable to output documents, books, or other paper-based information from an existing database. In many cases much of the business logic to extract the data already exists. Turning your beautifully crafted HTML into something that looks good on paper or as an email attachment is not always easy, though, particularly when the content has to break gracefully across pages.

There are a number of ways of achieving this:

  • Print style sheets – quick and dirty, and they rarely handle page breaks well.
  • PDF converters. These appear to have improved significantly since I last had to do this. Programs of particular note are:
    • Prince XML – This is an expensive proprietary application, but it has a free version for non-commercial use, which you could use for testing. It integrates well with web programming languages, so it is easy to generate files from within your web application.
    • Aspose – I used this as a .NET application, but I think there is at least a Java implementation too. The project I was working on had very high expectations in terms of the quality and flexibility of output, and Aspose felt clumsy when we tried to get it to do what we wanted. This impression may be unfair and out of date.
    • There are lots of others – this SO post suggests some.
  • Use the new generation of iOS ebook publishers. The Baker and Laker frameworks allow HTML5 web pages to be converted into books on the iPhone and iPad, with more devices coming soon. I think it would need some hacking to get working, but there is definitely some potential here. I am not yet sure about conversion to PDF: this could be a useful intermediate step towards a print style sheet, or a direct conversion to PDF may be available.
  • LaTeX – This is a markup language designed specifically for documents, with all the features required for producing well laid out and formatted documents. It looks as if it will take some investment of time to get started and create something worthwhile. This in-depth tutorial looks like a good starting point.
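
To give a feel for how the LaTeX route might plug into an existing web application, here is a minimal sketch in Python. It is only an illustration under some assumptions: the `ROWS` list stands in for whatever your existing business logic pulls out of the database, and the names `escape_latex` and `render_document` are made up for this example rather than taken from any library. The one real gotcha it shows is that text coming from a database has to be escaped before it goes into a LaTeX file, in much the same way as you would escape it for HTML.

```python
# Hypothetical sample rows standing in for the result of a database query.
ROWS = [
    {"title": "Widgets & Gadgets", "summary": "Sales rose 12% in Q1."},
    {"title": "Support_tickets", "summary": "Backlog cleared."},
]

# Characters with special meaning in LaTeX and their escaped forms.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def escape_latex(text: str) -> str:
    """Escape LaTeX special characters, one character at a time.

    Working per character avoids re-escaping the braces and backslashes
    that earlier replacements would otherwise introduce.
    """
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)


def render_document(rows) -> str:
    """Build a complete LaTeX source string with one section per row."""
    parts = [
        r"\documentclass[a4paper]{article}",
        r"\begin{document}",
    ]
    for row in rows:
        parts.append(r"\section{" + escape_latex(row["title"]) + "}")
        parts.append(escape_latex(row["summary"]))
        parts.append("")  # a blank line ends the paragraph
    parts.append(r"\end{document}")
    return "\n".join(parts) + "\n"


if __name__ == "__main__":
    # Print the generated source; in practice you would save it to a .tex file.
    print(render_document(ROWS))
```

If the output is saved to a file such as `report.tex`, running `pdflatex report.tex` on a machine with a LaTeX distribution installed should produce a PDF, with LaTeX deciding where the page breaks fall.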

There are a number of options; I will try some out and see what happens.