get docx file contents using javascript/jquery
Solution 1
With docxtemplater, you can easily get the full text of a Word document (docx only) by using the doc.getFullText() method.
HTML code:
<body>
<button onclick="gettext()">Get document text</button>
</body>
<script src="https://cdnjs.cloudflare.com/ajax/libs/docxtemplater/3.26.2/docxtemplater.js"></script>
<script src="https://unpkg.com/pizzip/dist/pizzip.js"></script>
<script src="https://unpkg.com/pizzip/dist/pizzip-utils.js"></script>
<script>
function loadFile(url, callback) {
  // Fetch the docx file as binary content
  PizZipUtils.getBinaryContent(url, callback);
}
function gettext() {
  loadFile(
    "https://docxtemplater.com/tag-example.docx",
    function (error, content) {
      if (error) {
        throw error;
      }
      // Unzip the docx and extract its text
      var zip = new PizZip(content);
      var doc = new window.docxtemplater(zip);
      var text = doc.getFullText();
      console.log(text);
      alert("Text is " + text);
    }
  );
}
</script>
Solution 2
I know this is an old post, but docxtemplater has moved on and the accepted answer no longer works. This worked for me in Node.js, using the adm-zip and cheerio packages:
function loadDocx(filename) {
  // Read word/document.xml from the docx archive
  const AdmZip = require("adm-zip");
  const zip = new AdmZip(filename);
  const xml = zip.readAsText("word/document.xml");
  // Load the XML into a cheerio DOM
  const cheerio = require("cheerio");
  const $ = cheerio.load(xml, {
    normalizeWhitespace: true,
    xmlMode: true
  });
  // Collect the text of every <w:t> run
  const out = [];
  $("w\\:t").each((i, el) => {
    out.push($(el).text());
  });
  return out;
}
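Note that collecting every w:t run loses paragraph boundaries, which is the same limitation the comments raise about getFullText(). Below is a minimal sketch of one way to keep them, assuming you already have the contents of word/document.xml as a string. The names extractParagraphs and decodeXmlEntities are hypothetical helpers, not part of docxtemplater, adm-zip, or cheerio. It uses regular expressions rather than a real XML parser, so it ignores nested paragraphs (for example inside text boxes) and numeric character references.

```javascript
// Hypothetical helper: decode the five predefined XML entities.
// Numeric references such as &#160; are left untouched in this sketch.
function decodeXmlEntities(s) {
  return s
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&"); // decode &amp; last to avoid double decoding
}

// Hypothetical helper: return one string per <w:p> paragraph.
// Assumes documentXml is the text of word/document.xml.
function extractParagraphs(documentXml) {
  // Match <w:p> or <w:p attr="..."> but not <w:pPr>, <w:pStyle> or self-closing tags.
  const paragraphRe = /<w:p(?:\s[^>]*)?(?<!\/)>([\s\S]*?)<\/w:p>/g;
  // Match <w:t> or <w:t xml:space="preserve"> but not <w:tab/>, <w:tbl>, <w:tc>.
  const textRe = /<w:t(?:\s[^>]*)?(?<!\/)>([\s\S]*?)<\/w:t>/g;
  const paragraphs = [];
  let p;
  while ((p = paragraphRe.exec(documentXml)) !== null) {
    let text = "";
    let t;
    textRe.lastIndex = 0; // restart the run search for each paragraph
    while ((t = textRe.exec(p[1])) !== null) {
      text += decodeXmlEntities(t[1]);
    }
    paragraphs.push(text);
  }
  return paragraphs;
}

// Usage with a small hand-written sample (not a real document):
const sampleXml =
  "<w:document><w:body>" +
  '<w:p><w:pPr><w:pStyle w:val="Title"/></w:pPr>' +
  "<w:r><w:t>Hello </w:t></w:r>" +
  '<w:r><w:t xml:space="preserve">world</w:t></w:r></w:p>' +
  '<w:p w:rsidR="1"><w:r><w:tab/><w:t>A &amp; B</w:t></w:r></w:p>' +
  "</w:body></w:document>";
console.log(extractParagraphs(sampleXml).join("\n"));
```

Since this is plain string processing, it should also run in the browser. There, the XML string could probably be read with PizZip's zip.file("word/document.xml").asText(), though I have not checked that against every PizZip version.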
Author
Abdul Ali
Interested in web application development using Microsoft technologies.
Updated on November 06, 2021
Comments
-
Abdul Ali over 2 years
I wish to open and read a docx file using client-side technologies (HTML/JS).
Kindly assist if this is possible. I have found a JavaScript library named docx.js, but I cannot locate any documentation for it (http://blog.innovatejs.com/?p=184).
The goal is to make a browser-based search tool for docx and txt files.
Any help is appreciated.
-
Abdul Ali about 9 years
Thank you for the reply. I will look into it, although it seems to solve the issue.
-
Bit_hunter almost 8 years
Your code is not working with jszip version 3.0.0. Would you please update it?
-
edi9999 almost 8 years
Docxtemplater still depends on JSZip 2.x; you can still install it, so it should work. In future versions, docxtemplater will work with JSZip 3.x.
-
Tyler B. Wear almost 7 years
Why does that API squash all the newlines?
-
edi9999 almost 7 years
That is how it works: it just returns a single string. Otherwise we would have to use formatting (an array of strings or HTML).
-
edi9999 about 5 years
You could use pandoc for that, for example to convert docx to HTML: github.com/jgm/pandoc
-
fdrv over 2 years
Use DocxGen() instead
-
fdrv over 2 years
Uncaught Error: The constructor with parameters has been removed in JSZip 3.0, please check the upgrade guide. Docxgen is old
-
Udi over 2 years
Life saver, thanks for this!
-
garek007 about 2 years
Is this Node.js? What is cheerio?
-
James almost 2 years
Hi, thanks for the answer. Is there a way we could get the line breaks as well? getFullText seems to have no line breaks. Thanks
-
edi9999 almost 2 years
Hello @James, I've released a new enhanced code part here that will get the different paragraphs. docxtemplater.com/faq/…
-
James almost 2 years
@edi9999, thanks for the link, but the problem is that it is the Node.js version, which seems to run on the server side. Any idea for client-side use, in the user's browser only? Thanks