HTML/PDF to DOC(X) in Linux command line?


Solution 1

I've just stumbled on this question and after a bit more googling, found pandoc: http://johnmacfarlane.net/pandoc/README.html

A simple command will create a DOCX or PDF (or RTF, etc.) file from HTML input, like so:

pandoc -o output.docx input.html

It can also write to stdout (with some formats) and read from stdin.
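The single-file command above extends naturally to a batch job. The following is only a sketch: the helper name `docx_name` is made up for illustration, and it assumes the inputs are `.html` files in the current directory and that `pandoc` is on the PATH.

```shell
#!/bin/sh
# Hypothetical helper: derive the output name, e.g. report.html -> report.docx
docx_name() {
    printf '%s\n' "${1%.html}.docx"
}

# Convert every .html file in the current directory, if pandoc is installed.
if command -v pandoc >/dev/null 2>&1; then
    for f in *.html; do
        [ -e "$f" ] || continue   # the glob matched nothing
        pandoc -f html -o "$(docx_name "$f")" "$f"
    done
fi
```

For the stdin/stdout case, text formats such as RTF can be piped, for example `pandoc -f html -t rtf -s < input.html > output.rtf`; the input format has to be given with `-f` because there is no file extension to guess it from.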

I'm not sure whether it's in the Debian / Ubuntu repos, but it's in the EPEL 6 repo for Red Hat / CentOS 6 ( yum install pandoc ).

Hope this helps someone :)

Solution 2

You can convert HTML into .doc using an OpenOffice macro, see this thread:

http://www.oooforum.org/forum/viewtopic.phtml?p=44367#44367

Converting PDF to .doc is much harder, due to the wide variety of content that can be inside a PDF. Quite often PDFs are used for things such as scanned text.

Solution 3

You can use pdftohtml to make an HTML file from a PDF.

Word can open html files directly.
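If an actual .docx file is needed rather than HTML that Word opens, the HTML from pdftohtml could be passed on to pandoc. This is a rough sketch, not a tested recipe: the function name `pdf_to_docx` is hypothetical, it assumes the poppler version of pdftohtml (with its `-stdout`, `-noframes` and `-i` options) plus pandoc, and layout-heavy or scanned PDFs will likely come out poorly.

```shell
#!/bin/sh
# Hypothetical wrapper: PDF -> HTML (pdftohtml) -> DOCX (pandoc)
pdf_to_docx() {
    if [ "$#" -ne 2 ]; then
        echo "usage: pdf_to_docx input.pdf output.docx" >&2
        return 2
    fi
    for tool in pdftohtml pandoc; do
        command -v "$tool" >/dev/null 2>&1 || {
            echo "$tool not found" >&2
            return 127
        }
    done
    # -stdout: write HTML to stdout; -noframes: one plain page; -i: skip images
    pdftohtml -stdout -noframes -i "$1" | pandoc -f html -o "$2"
}
```

Usage would be `pdf_to_docx input.pdf output.docx`. Images are dropped here (`-i`) because they cannot travel through the pipe; writing the HTML to a directory first would be needed to keep them.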

Solution 4

You might be able to do the latter using OpenOffice from the command line. There are also bridges for scripting languages; you can find out more on OpenOffice's website. There is one for PHP called PUNO, but I have no personal experience with it yet.
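As a rough idea of what a command-line office conversion can look like, here is a sketch that assumes LibreOffice (a fork of OpenOffice) rather than OpenOffice itself, with its `soffice` binary on the PATH. The function name `office_to_docx` is hypothetical, and the explicit filter name `MS Word 2007 XML` is an assumption that is sometimes needed for HTML input; it may differ or be unnecessary on a given install.

```shell
#!/bin/sh
# Hypothetical wrapper around LibreOffice's headless converter
office_to_docx() {
    if [ "$#" -ne 2 ]; then
        echo "usage: office_to_docx input.html output_dir" >&2
        return 2
    fi
    command -v soffice >/dev/null 2>&1 || {
        echo "soffice not found" >&2
        return 127
    }
    # --headless: no GUI; --convert-to: target format (and export filter);
    # --outdir: directory that receives the converted file
    soffice --headless --convert-to 'docx:MS Word 2007 XML' --outdir "$2" "$1"
}
```

Calling `office_to_docx input.html out` would, under these assumptions, leave `out/input.docx` behind. The output keeps the input's base name, so only the directory is chosen here.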

Author by

studiohack

Updated on September 17, 2022

Comments

  • studiohack
    studiohack over 1 year

    I need to convert PDF or HTML+CSS into DOC or DOCX under Linux, it can be from the command line or with a scripting language.

    Any idea?

  • saunderl
    saunderl over 14 years
    None of these work for me.
  • Sevki
    Sevki over 14 years
    It parses HTML very badly, ignoring most of its CSS.