Use of XSL-FO, CSS3 instead of CSS2 to create Paginated documents like PDF?


Solution 1

Thanks for all the comments and answers!

Now, in 2014, more than 1.5 years after my post (May 17 '12), it is time to consolidate: no answer was, for me, a "full answer", but all answers (see Nenotlep's and Alex's) contributed to the big picture. My main motivation for consolidating now is @mzjn's news (here) from 2013-11.

XSL-FO is officially dying

On Sat, 2013-11-02, Liam R. E. Quin, W3C XML Activity Lead, wrote about the failure to continue XSL-FO 2.0: "We have closed the Working Group because not enough people were taking part" (see a better copy here).

The last update of the Working Draft was in January 2012, and it is now confirmed: W3C has stopped developing XSL-FO 2.0.

Why? It will be replaced by CSS3-page, see below.

PS: to discuss the "official statement", use https://stackoverflow.com/a/21345449/287948

CSS3 is officially growing

The CSS3-page standard is still a draft, but many applications, such as PrinceXML v9 and AntennaHouse Formatter v6, have demonstrated that it is ready (!); and the expected launch of HTML5 in 2014 is carrying along the forecast release of CSS3.

So, I understand that, for W3C, CSS3-page does all that we need to express good print output and good PDF.

Other motivations

One day, in a far future... PDF will die (it is complex and is not part of the XML family or of W3C investments), and many claim that EPUB will replace it. This is another good motivation: tablet readers and PC browsers will print HTML, XHTML and EPUB as well as PDF prints today, so PDF will not be necessary... And, by that day, the only standard needed by, e.g., the Webkit printing project will be the CSS3-page standard.

CSS3 is the key point in two strategic matters: 1) generating good PDF from XML or HTML content; 2) replacing PDF.


NOTE: more 2014 updates for the links in the question: wkHtmlToPDF is now here. About "new texts", we now have many; see e.g. Building Books with CSS3.



An updated answer for programmers to this page's question: why use XSL-FO instead of CSS2 to transform HTML into good PDF?

If you go further and implement a new system for XML publishing, there is no good reason to use XSL-FO. SUMMARIZING:

  • XSL-FO is a dead technology today, used only by niche companies to maintain legacy systems in big publishing companies, like Elsevier... Most writers/readers of Stack Overflow are from small and medium companies. Companies like O'Reilly Media, Inc. already use CSS3 for print.

  • CSS3 will replace CSS2, covering all the gaps (and fears, such as @AlexS's) of CSS2.

  • today (2014), as you can check via Google or my links (see PrinceXML v9 and AntennaHouse Formatter v6), we have some good software to render content with CSS2 or CSS3.

  • as @bytebuster says, "CSS is much easier to develop" (and easier to learn!).

  • as I said above, CSS3 is not isolated; it is a piece of the "XML/HTML/SVG" family.

  • it is much cheaper to develop "HTML+CSS templates" (hourly cost of a standard web designer doing a simple task) than "XSL-FO templates" (hourly cost of a rare professional doing a complex task).

  • ....



News...

Jan'2016, the definitive CSS3 standard is coming!

About W3C standards: the page-breaking rules of the old "css-page" moved into "css-break", where breaking for "paged media" is generalized as "fragmentation"... It is now a Candidate Recommendation, see https://www.w3.org/TR/css-break-3
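
As a rough illustration of what css-break-3 covers, here is a minimal sketch of its break properties in a print stylesheet. The selectors are hypothetical, and renderer support varies, so treat it as an example under assumptions rather than a tested recipe.

```css
/* Sketch only: selectors are illustrative, not taken from the spec. */
@media print {
  /* Start every chapter heading on a new page. */
  h1 {
    break-before: page;
  }

  /* Avoid a break directly after a sub-heading. */
  h2, h3 {
    break-after: avoid;
  }

  /* Keep figures and tables in one piece where possible. */
  figure, table {
    break-inside: avoid;
  }

  /* Keep at least 3 lines of a paragraph on each side of a break. */
  p {
    orphans: 3;
    widows: 3;
  }
}
```

The older CSS2 `page-break-before`, `page-break-after` and `page-break-inside` properties are kept by css-break-3 as legacy aliases, which can matter for engines that have not caught up with the new names.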

Apr'2020, Blimey, +4 years and nothing!... Ok, need more tests

A total of 8 years since the question was posted, and 4 years since the "css-break-3 finished!" announcement ...

Chrome was the first to finish, in 2019, but something was wrong in the W3C test validation team, and in 2020 it was back... Now the status (in 23 tests) is:

  • Chrome's Blink engine fails 1 test;
  • Firefox's Gecko engine fails 3 tests.


The draft is now here and the tests are here.

Solution 2

Updated 01.10.2015

I have done both CSS to PDF (wkhtmltopdf) and XSL-FO to PDF, and I prefer CSS, but there are lots of issues with it. IMO the best CSS/HTML to PDF renderer is wkhtmltopdf, but it has tons of problems in areas like print-quality material, page breaking, CMYK coloring, exact positioning and fullscreen rendering.

Requirements like "move that box 1.8mm to the right and up so that it touches the top of the paper" and "we need the last page to be a 100% wide marginless table" are both quite doable in XSL-FO, but in CSS they are too frightening to even consider. In some cases CSS just doesn't cut it, because good enough software to render it doesn't exist even if the tags do. Even wkhtmltopdf (0.11, not sure about later) uses XSLT when rendering the TOC and doesn't really support @page.
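
For reference, the two requirements quoted above would look roughly like this in CSS syntax. This is only a sketch: the class names are hypothetical, and it assumes a renderer that implements `@page`, named pages and physical-unit offsets, which is exactly the part that cannot be taken for granted.

```css
/* Sketch only: class names are hypothetical; renderer support is assumed. */

/* Default page: A4 with ordinary margins. */
@page {
  size: A4;
  margin: 20mm;
}

/* A named page type with no margins at all. */
@page full-bleed {
  margin: 0;
}

/* Put the final table on a "full-bleed" page. Per CSS Paged Media,
   a change of page type implies a page break before the element. */
table.last-page {
  page: full-bleed;
  width: 100%;
}

/* Shift a box 1.8mm right and 1.8mm up from its normal position. */
.nudged-box {
  position: relative;
  left: 1.8mm;
  top: -1.8mm;
}
```

The syntax is short, but whether a given engine places the box and the marginless page precisely is a separate matter.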

I can't speak for PrinceXML as although it looks great I know in advance that the price tag would be impossible so it's not an option - I suspect this is true for a lot of developers and companies.

If there were better software to do the rendering, and more users, I really do think CSS would usually be the better option, as it's so much nicer to write (both the CSS and the source (X)HTML) and there are tons of editors out there. It's a bit like the old Linux vs Windows debate - IMO Linux is nicer to use but lacks the software, existing expertise and support that is often required.

And to echo the comments, source material is always an issue with CSS. CSS for XML is a bit of an uncharted territory, and just about everything everywhere is XML. Unfortunately. I have a severe dislike for XML even though it's practically much more usable than (X)HTML.

Solution 3

One possible reason for banking on CSS rather than XSL-FO in the future is that the XML Print and Page Layout Working Group at W3C is no longer active. There was not enough interest to sustain this working group. The group published an XSL 2.0 working draft in early 2012, but now it seems quite unlikely that an updated W3C recommendation will ever emerge.

There is a very recent thread on the XSL-List mailing list about the reasons for closing the working group and about the future of XSL-FO vs. CSS. See http://markmail.org/thread/65j2ah2kulcp35fm.

And by the way, even though this is an interesting topic, I'm not sure if the question is a good fit for Stack Overflow. IMHO, it is more of an open-ended invitation to discuss something rather than a question about a specific, practical, answerable problem.

Solution 4

I agree with some of what has been posted by @Nenotlep. But I am not sure if CSS markup is yet as extensive for Paginated documents as XSL-FO. But I would not know that.

I also added this part to his answer because I was unable to "comment" on the answer.

There is some history to the whole issue.

Additionally, XSL-FO is rich, and FO rendering has had a learning and burn-in curve of 10+ years, which is quite a tenure in which to get "more" things ironed out.

I was responsible for the proof of concept and prototyping of an enterprise-wide XML content system for a Fortune 20 company back in 2003.

One of the pieces of that system had to render PDF, Word, X/HTML versions of documents on the fly as people changed, added & modified content XML.

Even XSL-FO > PDF and to Word-ML had a bunch of teething issues at the time.

These issues were inherent, for the following reasons:

  • Original and new goals and capabilities of the Markup & Styling languages
  • Ability & Limitations of the Final Rendering Component to accurately represent the given markup (i.e. XSL-FO to PDF Component or X/HTML to Screen via Web Browser)

It has been 10 years since I was frequently hands-on with XSL-FO / HTML / CSS, but the above issues were interesting to discuss with the gods of the XML/XSL world at the time (Dave Pawson, Michael Kay, Wendell Piez etc.).

It is quite possible that all the markup XSL-FO had over CSS for paginated output is now (2013) replicated in CSS3 and rendered appropriately.

I hope this helps.

2017 Edit:

Apparently CSS is still playing catch up in some ways and I remember having most of this in 2003 - That is 14 years and in web tech that's an eon too slow :) .

https://twitter.com/t_machine_org/status/917025348646199297


Author by

Peter Krauss

Hello! I use PostgreSQL, PHP, Javascript, jQuery, HTML, XML, XSLT, and ... "Everybody stand back, I know regular expressions!" ─ xkcd 208 2015 consulting on the following areas, LexML (XML for law): see lexML.gov.br JATS (XML for Science): see NISO's Journal Article Tag Suite HTML+RDFa and Web Semantic ... Corporate Social Responsibility ...

Updated on April 10, 2020

Comments

  • Peter Krauss about 4 years

    There are a lot of old texts, like this 2002 book, stating that we must use "CSS for Web" and "XSL-FO for print". I think that nowadays (2012) we can, finally, use CSS with render engines that understand the paged media of CSS2 and something of CSS3... But where are the "new texts", the consensus of programmers, and the investment of software houses?

    XSL-FO or "XSL Formatting Objects" (a W3C standard) was the most often used technology to generate PDF documents, from XML or XHTML content. Version 1.1 of XSL-FO was published in 2006, 1.0 in 2001.

    CSS2.1 is from 2011, but CSS2.0 is a 1998 standard, revised in 2008... I think the age of the standards is not a problem. CSS with HTML, XHTML or XML has "the power of print": see tools like PrinceXML, the WebKit print module (or wkhtmltopdf), ABCpdf and others.

    Choosing between CSS and XSL-FO: with CSS2 you can fit the text exactly to the paper page, etc. It's not a matter of pagination, multiple column layouts, footnote placement, running headers, or page margins... Both CSS (paged media) and XSL-FO are good standards for this.

    PS: there are some related questions/answers for this context, about webkit transform, converting with PHP and about Generation PDF from HTML. None of them has a good answer to the question presented here.
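
    To make the CSS side of that comparison concrete, here is a minimal sketch of page size, margins, a running header, page numbers and columns, using CSS paged media and multi-column syntax. All values and the `.body-text` class are illustrative assumptions. The margin boxes (`@top-center`, `@bottom-center`) come from the CSS3 paged media draft and are mostly implemented by dedicated formatters rather than browsers. Footnotes are left out, since they belong to a separate draft (CSS Generated Content for Paged Media).

```css
/* Sketch only: values and selectors are illustrative assumptions. */
@page {
  size: A4 portrait;
  margin: 25mm 20mm;

  /* Running header: fixed text in the top-center margin box. */
  @top-center {
    content: "Document title";
  }

  /* Page number and page count in the bottom-center margin box. */
  @bottom-center {
    content: counter(page) " / " counter(pages);
  }
}

/* First page: larger top margin and no running header. */
@page :first {
  margin-top: 50mm;

  @top-center {
    content: none;
  }
}

/* Wider inner (binding) margins for double-sided printing. */
@page :left  { margin-left: 20mm; margin-right: 30mm; }
@page :right { margin-left: 30mm; margin-right: 20mm; }

/* Two-column body text. */
.body-text {
  column-count: 2;
  column-gap: 8mm;
}
```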