Get docx file contents using JavaScript/jQuery

javascript jquery docx

37,760

Solution 1

With docxtemplater, you can easily get the full text of a Word document (docx format only) by using the doc.getFullText() method.

HTML code:

<body>
    <button onclick="gettext()">Get document text</button>
</body>
<script src="https://cdnjs.cloudflare.com/ajax/libs/docxtemplater/3.26.2/docxtemplater.js"></script>
<script src="https://unpkg.com/pizzip/dist/pizzip.js"></script>
<script src="https://unpkg.com/pizzip-utils/dist/pizzip-utils.js"></script>
<script>
    function loadFile(url, callback) {
        PizZipUtils.getBinaryContent(url, callback);
    }
    function gettext() {
        loadFile(
            "https://docxtemplater.com/tag-example.docx",
            function (error, content) {
                if (error) {
                    throw error;
                }
                var zip = new PizZip(content);
                var doc = new window.docxtemplater(zip);
                var text = doc.getFullText();
                console.log(text);
                alert("Text is " + text);
            }
        );
    }
</script>
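
The example above fetches the document from a URL. For a browser-based tool, the file would more likely be chosen by the user from their own machine. The following is a minimal sketch of that case, not a tested recipe: it assumes the same PizZip and docxtemplater globals loaded by the script tags above, and a hypothetical `<input type="file" id="docx-input">` element on the page. The function names are illustrative only.

```javascript
// Sketch only: assumes the PizZip and docxtemplater globals from the
// script tags above. The element id "docx-input" is hypothetical.

// Turn the raw bytes of a .docx file into its plain text.
// The constructors are parameters so the function can be tried with stubs.
function extractDocxText(buffer, Zip = globalThis.PizZip, Docx = globalThis.docxtemplater) {
    const zip = new Zip(buffer);
    const doc = new Docx(zip);
    return doc.getFullText();
}

// Read a File object chosen by the user and pass its text to a callback.
function readDocxFile(file, callback) {
    const reader = new FileReader();
    reader.onload = function () {
        let text;
        try {
            text = extractDocxText(reader.result);
        } catch (error) {
            callback(error);
            return;
        }
        callback(null, text);
    };
    reader.onerror = function () {
        callback(reader.error);
    };
    reader.readAsArrayBuffer(file);
}

// Browser wiring; skipped when no DOM is available.
if (typeof document !== "undefined") {
    const input = document.getElementById("docx-input");
    if (input) {
        input.addEventListener("change", function () {
            if (input.files.length === 0) {
                return;
            }
            readDocxFile(input.files[0], function (error, text) {
                if (error) {
                    console.error(error);
                    return;
                }
                console.log(text);
            });
        });
    }
}
```

Nothing is uploaded in this sketch: the file is read with FileReader and unzipped in the user's browser.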

Solution 2

I know this is an old post, but docxtemplater has moved on and the accepted answer no longer works. This worked for me in Node.js, using the adm-zip and cheerio packages:

// Node.js: requires the adm-zip and cheerio packages
const AdmZip = require("adm-zip");
const cheerio = require("cheerio");

function loadDocx(filename) {
  // Read word/document.xml from the docx archive
  const zip = new AdmZip(filename);
  const xml = zip.readAsText("word/document.xml");
  // Load the XML into a DOM
  const $ = cheerio.load(xml, {
    normalizeWhitespace: true,
    xmlMode: true
  });
  // Collect the text of every <w:t> (text run) element
  const out = [];
  $("w\\:t").each((i, el) => {
    out.push($(el).text());
  });
  return out;
}
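
The function above returns one array entry per text run, so paragraph boundaries are lost. If line breaks matter, one option is to group the runs by their enclosing `<w:p>` (paragraph) element. Below is a dependency-free sketch that does this with regular expressions on the XML string returned by `zip.readAsText("word/document.xml")`. The function names are hypothetical, and a regex is a simplification that assumes paragraphs are not nested (text boxes, for example, would break it); a real XML parser is more robust.

```javascript
// Decode the five predefined XML entities (&amp; last, to avoid double decoding).
function decodeXmlEntities(s) {
  return s
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&");
}

// Return one string per <w:p> paragraph found in a document.xml string.
function paragraphsFromDocumentXml(xml) {
  // Matches <w:p>...</w:p> and self-closing <w:p/>, but not <w:pPr> etc.
  const paragraphRe = /<w:p(?=[\s>\/])[^>]*?(?:\/>|>[\s\S]*?<\/w:p>)/g;
  // Matches <w:t>...</w:t> and self-closing <w:t/>, but not <w:tab/>, <w:tbl> etc.
  const textRe = /<w:t(?=[\s>\/])[^>]*?(?:\/>|>([\s\S]*?)<\/w:t>)/g;

  const paragraphs = [];
  for (const paragraph of xml.match(paragraphRe) || []) {
    let text = "";
    for (const match of paragraph.matchAll(textRe)) {
      // match[1] is undefined for a self-closing <w:t/>
      text += decodeXmlEntities(match[1] || "");
    }
    paragraphs.push(text);
  }
  return paragraphs;
}
```

Joining the result with `paragraphsFromDocumentXml(xml).join("\n")` gives a single string with one line per paragraph.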

Author by

Abdul Ali

Interested in web application development using microsoft technologies..

Updated on November 06, 2021

Comments

Abdul Ali over 2 years

I wish to open and read a docx file using client-side technologies (HTML/JS).

Kindly assist if this is possible. I have found a JavaScript library named docx.js, but I cannot locate any documentation for it (http://blog.innovatejs.com/?p=184).

The goal is to make a browser-based search tool for docx and txt files.

Any help appreciated.
Abdul Ali about 9 years

Thank you for the reply. I will look into it; it seems to solve the issue.
Bit_hunter almost 8 years

Your code is not working with JSZip version 3.0.0. Would you please update it?
edi9999 almost 8 years

Docxtemplater still depends on a JSZip release older than 3.0. You can still install that release, so it should work. Future versions of docxtemplater will work with JSZip 3.x.
Tyler B. Wear almost 7 years

Why does that API squash all the newlines?
edi9999 almost 7 years

That is how it works: it returns a single string. Otherwise we would have to use formatting (an array of strings or HTML).
edi9999 about 5 years

You could use pandoc for that, for example to convert docx to HTML: github.com/jgm/pandoc
fdrv over 2 years

Use DocxGen() instead
fdrv over 2 years

Uncaught Error: The constructor with parameters has been removed in JSZip 3.0, please check the upgrade guide. Docxgen is old
Udi over 2 years

Life saver, thanks for this!
garek007 about 2 years

Is this node JS? What is cheerio?
James almost 2 years

Hi, thanks for the answer. Is there a way to get the line breaks as well? getFullText seems to have no line breaks. Thanks
edi9999 almost 2 years

Hello @James, I've released an improved code sample here that returns the separate paragraphs. docxtemplater.com/faq/…
James almost 2 years

@edi9999, thanks for the link, but the problem is that it is the Node.js version, which seems to run on the server side. Any idea how to do this on the client side, using only the user's browser? Thanks