Replacing a text in Apache POI XWPF
Solution 1
The method you need is XWPFRun.setText(String). Simply work your way through the file until you find the XWPFRun of interest, work out what you want the new text to be, and replace it. (A run is a sequence of text with the same formatting)
You should be able to do something like:
XWPFDocument doc = new XWPFDocument(OPCPackage.open("input.docx"));
for (XWPFParagraph p : doc.getParagraphs()) {
List<XWPFRun> runs = p.getRuns();
if (runs != null) {
for (XWPFRun r : runs) {
String text = r.getText(0);
if (text != null && text.contains("needle")) {
text = text.replace("needle", "haystack");
r.setText(text, 0);
}
}
}
}
for (XWPFTable tbl : doc.getTables()) {
for (XWPFTableRow row : tbl.getRows()) {
for (XWPFTableCell cell : row.getTableCells()) {
for (XWPFParagraph p : cell.getParagraphs()) {
for (XWPFRun r : p.getRuns()) {
String text = r.getText(0);
if (text != null && text.contains("needle")) {
text = text.replace("needle", "haystack");
r.setText(text,0);
}
}
}
}
}
}
doc.write(new FileOutputStream("output.docx"));
Solution 2
Here is what we did for text replacement using Apache POI. We found that it was not worth the hassle and simpler to replace the text of an entire XWPFParagraph instead of a run. A run can be randomly split in the middle of a word as Microsoft Word is in charge of where runs are created within the paragraph of a document. Therefore the text you might be searching for could be half in one run and half in another. Using the full text of a paragraph, removing its existing runs, and adding a new run with the adjusted text seems to solve the problem of text replacement.
However there is a cost of doing the replacement at the paragraph level; you lose the formatting of the runs in that paragraph. For example if in the middle of your paragraph you had bolded the word "bits", and then when parsing the file you replaced the word "bits" with "bytes", the word "bytes" would no longer be bolded. Because the bolding was stored with a run that was removed when the paragraph's entire body of text was replaced. The attached code has a commented out section that was working for replacement of text at the run level if you need it.
It should also be noted that the below works if the text you are inserting contains \n return characters. We could not find a way to insert returns without creating a run for each section prior to the return and marking the run addCarriageReturn(). Cheers
package com.healthpartners.hcss.client.external.word.replacement;
import java.util.List;
import org.apache.commons.lang.StringUtils;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
public class TextReplacer {
private String searchValue;
private String replacement;
public TextReplacer(String searchValue, String replacement) {
this.searchValue = searchValue;
this.replacement = replacement;
}
public void replace(XWPFDocument document) {
List<XWPFParagraph> paragraphs = document.getParagraphs();
for (XWPFParagraph xwpfParagraph : paragraphs) {
replace(xwpfParagraph);
}
}
private void replace(XWPFParagraph paragraph) {
if (hasReplaceableItem(paragraph.getText())) {
String replacedText = StringUtils.replace(paragraph.getText(), searchValue, replacement);
removeAllRuns(paragraph);
insertReplacementRuns(paragraph, replacedText);
}
}
private void insertReplacementRuns(XWPFParagraph paragraph, String replacedText) {
String[] replacementTextSplitOnCarriageReturn = StringUtils.split(replacedText, "\n");
for (int j = 0; j < replacementTextSplitOnCarriageReturn.length; j++) {
String part = replacementTextSplitOnCarriageReturn[j];
XWPFRun newRun = paragraph.insertNewRun(j);
newRun.setText(part);
if (j+1 < replacementTextSplitOnCarriageReturn.length) {
newRun.addCarriageReturn();
}
}
}
private void removeAllRuns(XWPFParagraph paragraph) {
int size = paragraph.getRuns().size();
for (int i = 0; i < size; i++) {
paragraph.removeRun(0);
}
}
private boolean hasReplaceableItem(String runText) {
return StringUtils.contains(runText, searchValue);
}
//REVISIT The below can be removed if Michele tests and approved the above less versatile replacement version
// private void replace(XWPFParagraph paragraph) {
// for (int i = 0; i < paragraph.getRuns().size() ; i++) {
// i = replace(paragraph, i);
// }
// }
// private int replace(XWPFParagraph paragraph, int i) {
// XWPFRun run = paragraph.getRuns().get(i);
//
// String runText = run.getText(0);
//
// if (hasReplaceableItem(runText)) {
// return replace(paragraph, i, run);
// }
//
// return i;
// }
// private int replace(XWPFParagraph paragraph, int i, XWPFRun run) {
// String runText = run.getCTR().getTArray(0).getStringValue();
//
// String beforeSuperLong = StringUtils.substring(runText, 0, runText.indexOf(searchValue));
//
// String[] replacementTextSplitOnCarriageReturn = StringUtils.split(replacement, "\n");
//
// String afterSuperLong = StringUtils.substring(runText, runText.indexOf(searchValue) + searchValue.length());
//
// Counter counter = new Counter(i);
//
// insertNewRun(paragraph, run, counter, beforeSuperLong);
//
// for (int j = 0; j < replacementTextSplitOnCarriageReturn.length; j++) {
// String part = replacementTextSplitOnCarriageReturn[j];
//
// XWPFRun newRun = insertNewRun(paragraph, run, counter, part);
//
// if (j+1 < replacementTextSplitOnCarriageReturn.length) {
// newRun.addCarriageReturn();
// }
// }
//
// insertNewRun(paragraph, run, counter, afterSuperLong);
//
// paragraph.removeRun(counter.getCount());
//
// return counter.getCount();
// }
// private class Counter {
// private int i;
//
// public Counter(int i) {
// this.i = i;
// }
//
// public void increment() {
// i++;
// }
//
// public int getCount() {
// return i;
// }
// }
// private XWPFRun insertNewRun(XWPFParagraph xwpfParagraph, XWPFRun run, Counter counter, String newText) {
// XWPFRun newRun = xwpfParagraph.insertNewRun(counter.i);
// newRun.getCTR().set(run.getCTR());
// newRun.getCTR().getTArray(0).setStringValue(newText);
//
// counter.increment();
//
// return newRun;
// }
Solution 3
my task was to replace texts of the format ${key} with values of a map within a word docx document. The above solutions were a good starting point but did not take into account all cases: ${key} can be spread not only across multiple runs but also across multiple texts within a run. I therefore ended up with the following code:
private void replace(String inFile, Map<String, String> data, OutputStream out) throws Exception, IOException {
XWPFDocument doc = new XWPFDocument(OPCPackage.open(inFile));
for (XWPFParagraph p : doc.getParagraphs()) {
replace2(p, data);
}
for (XWPFTable tbl : doc.getTables()) {
for (XWPFTableRow row : tbl.getRows()) {
for (XWPFTableCell cell : row.getTableCells()) {
for (XWPFParagraph p : cell.getParagraphs()) {
replace2(p, data);
}
}
}
}
doc.write(out);
}
private void replace2(XWPFParagraph p, Map<String, String> data) {
String pText = p.getText(); // complete paragraph as string
if (pText.contains("${")) { // if paragraph does not include our pattern, ignore
TreeMap<Integer, XWPFRun> posRuns = getPosToRuns(p);
Pattern pat = Pattern.compile("\\$\\{(.+?)\\}");
Matcher m = pat.matcher(pText);
while (m.find()) { // for all patterns in the paragraph
String g = m.group(1); // extract key start and end pos
int s = m.start(1);
int e = m.end(1);
String key = g;
String x = data.get(key);
if (x == null)
x = "";
SortedMap<Integer, XWPFRun> range = posRuns.subMap(s - 2, true, e + 1, true); // get runs which contain the pattern
boolean found1 = false; // found $
boolean found2 = false; // found {
boolean found3 = false; // found }
XWPFRun prevRun = null; // previous run handled in the loop
XWPFRun found2Run = null; // run in which { was found
int found2Pos = -1; // pos of { within above run
for (XWPFRun r : range.values())
{
if (r == prevRun)
continue; // this run has already been handled
if (found3)
break; // done working on current key pattern
prevRun = r;
for (int k = 0;; k++) { // iterate over texts of run r
if (found3)
break;
String txt = null;
try {
txt = r.getText(k); // note: should return null, but throws exception if the text does not exist
} catch (Exception ex) {
}
if (txt == null)
break; // no more texts in the run, exit loop
if (txt.contains("$") && !found1) { // found $, replace it with value from data map
txt = txt.replaceFirst("\\$", x);
found1 = true;
}
if (txt.contains("{") && !found2 && found1) {
found2Run = r; // found { replace it with empty string and remember location
found2Pos = txt.indexOf('{');
txt = txt.replaceFirst("\\{", "");
found2 = true;
}
if (found1 && found2 && !found3) { // find } and set all chars between { and } to blank
if (txt.contains("}"))
{
if (r == found2Run)
{ // complete pattern was within a single run
txt = txt.substring(0, found2Pos)+txt.substring(txt.indexOf('}'));
}
else // pattern spread across multiple runs
txt = txt.substring(txt.indexOf('}'));
}
else if (r == found2Run) // same run as { but no }, remove all text starting at {
txt = txt.substring(0, found2Pos);
else
txt = ""; // run between { and }, set text to blank
}
if (txt.contains("}") && !found3) {
txt = txt.replaceFirst("\\}", "");
found3 = true;
}
r.setText(txt, k);
}
}
}
System.out.println(p.getText());
}
}
private TreeMap<Integer, XWPFRun> getPosToRuns(XWPFParagraph paragraph) {
int pos = 0;
TreeMap<Integer, XWPFRun> map = new TreeMap<Integer, XWPFRun>();
for (XWPFRun run : paragraph.getRuns()) {
String runText = run.text();
if (runText != null && runText.length() > 0) {
for (int i = 0; i < runText.length(); i++) {
map.put(pos + i, run);
}
pos += runText.length();
}
}
return map;
}
Solution 4
If somebody needs also to keep the formatting of the text, this code works better.
private static Map<Integer, XWPFRun> getPosToRuns(XWPFParagraph paragraph) {
int pos = 0;
Map<Integer, XWPFRun> map = new HashMap<Integer, XWPFRun>(10);
for (XWPFRun run : paragraph.getRuns()) {
String runText = run.text();
if (runText != null) {
for (int i = 0; i < runText.length(); i++) {
map.put(pos + i, run);
}
pos += runText.length();
}
}
return (map);
}
public static <V> void replace(XWPFDocument document, Map<String, V> map) {
List<XWPFParagraph> paragraphs = document.getParagraphs();
for (XWPFParagraph paragraph : paragraphs) {
replace(paragraph, map);
}
}
public static <V> void replace(XWPFDocument document, String searchText, V replacement) {
List<XWPFParagraph> paragraphs = document.getParagraphs();
for (XWPFParagraph paragraph : paragraphs) {
replace(paragraph, searchText, replacement);
}
}
private static <V> void replace(XWPFParagraph paragraph, Map<String, V> map) {
for (Map.Entry<String, V> entry : map.entrySet()) {
replace(paragraph, entry.getKey(), entry.getValue());
}
}
public static <V> void replace(XWPFParagraph paragraph, String searchText, V replacement) {
boolean found = true;
while (found) {
found = false;
int pos = paragraph.getText().indexOf(searchText);
if (pos >= 0) {
found = true;
Map<Integer, XWPFRun> posToRuns = getPosToRuns(paragraph);
XWPFRun run = posToRuns.get(pos);
XWPFRun lastRun = posToRuns.get(pos + searchText.length() - 1);
int runNum = paragraph.getRuns().indexOf(run);
int lastRunNum = paragraph.getRuns().indexOf(lastRun);
String texts[] = replacement.toString().split("\n");
run.setText(texts[0], 0);
XWPFRun newRun = run;
for (int i = 1; i < texts.length; i++) {
newRun.addCarriageReturn();
newRun = paragraph.insertNewRun(runNum + i);
/*
We should copy all style attributes
to the newRun from run
also from background color, ...
Here we duplicate only the simple attributes...
*/
newRun.setText(texts[i]);
newRun.setBold(run.isBold());
newRun.setCapitalized(run.isCapitalized());
// newRun.setCharacterSpacing(run.getCharacterSpacing());
newRun.setColor(run.getColor());
newRun.setDoubleStrikethrough(run.isDoubleStrikeThrough());
newRun.setEmbossed(run.isEmbossed());
newRun.setFontFamily(run.getFontFamily());
newRun.setFontSize(run.getFontSize());
newRun.setImprinted(run.isImprinted());
newRun.setItalic(run.isItalic());
newRun.setKerning(run.getKerning());
newRun.setShadow(run.isShadowed());
newRun.setSmallCaps(run.isSmallCaps());
newRun.setStrikeThrough(run.isStrikeThrough());
newRun.setSubscript(run.getSubscript());
newRun.setUnderline(run.getUnderline());
}
for (int i = lastRunNum + texts.length - 1; i > runNum + texts.length - 1; i--) {
paragraph.removeRun(i);
}
}
}
}
Solution 5
There is the replaceParagraph
implementation that replaces ${key}
with value
(the fieldsForReport
parameter) and saves format by merging runs
contents ${key}
.
private void replaceParagraph(XWPFParagraph paragraph, Map<String, String> fieldsForReport) throws POIXMLException {
String find, text, runsText;
List<XWPFRun> runs;
XWPFRun run, nextRun;
for (String key : fieldsForReport.keySet()) {
text = paragraph.getText();
if (!text.contains("${"))
return;
find = "${" + key + "}";
if (!text.contains(find))
continue;
runs = paragraph.getRuns();
for (int i = 0; i < runs.size(); i++) {
run = runs.get(i);
runsText = run.getText(0);
if (runsText.contains("${") || (runsText.contains("$") && runs.get(i + 1).getText(0).substring(0, 1).equals("{"))) {
//As the next run may has a closed tag and an open tag at
//the same time, we have to be sure that our building string
//has a fully completed tags
while (!openTagCountIsEqualCloseTagCount(runsText))) {
nextRun = runs.get(i + 1);
runsText = runsText + nextRun.getText(0);
paragraph.removeRun(i + 1);
}
run.setText(runsText.contains(find) ?
runsText.replace(find, fieldsForReport.get(key)) :
runsText, 0);
}
}
}
}
private boolean openTagCountIsEqualCloseTagCount(String runText) {
int openTagCount = runText.split("\\$\\{", -1).length - 1;
int closeTagCount = runText.split("}", -1).length - 1;
return openTagCount == closeTagCount;
}
Gagan93
Senior Software Developer with over 2 years of experience in backend development, design, estimation, cloud infrastructure management and optimization.
Updated on July 09, 2022Comments
-
Gagan93 almost 2 years
I just found Apache POI library very useful for editing Word files using Java. Specifically, I want to edit a DOCX file using Apache POI's XWPF classes. I found no proper method / documentation following which I could do this. Can somebody please explain in steps, how to replace some text in a DOCX file.
** The text may be in a line / paragraph or in a table row/column
Thanks in Advance :)
-
Gagan93 about 10 yearsthanks for this, but this is not reading the table data. Is there some other class which I need to use for that ??
-
Gagan93 about 10 yearsand you defined RUN as text with similar formatting, Right ? It is breaking the text with similar formatting into parts. How to correct that ?
-
Gagravarr about 10 yearsApache POI just gives you the text in the file, it has no control over how Word chooses to structure it in terms of Runs... And Word is known to do strange things! If need be, check nearby runs for part of the text.
-
Justin Skiles about 10 yearsI'm using Apache PIO 3.10 and it appears that
getCells()
onXWPFTableRow
is nowgetTableCells()
. -
Cole almost 10 yearsThe first chunk of code is giing me a NullPointerException, anyone know what is wrong?
-
Gagravarr almost 10 years@Cole Your best bet would be to ask a new question, include the code you're using, the stacktrace you get, and clarify where exactly it's giving the NPE
-
Justin Skiles over 9 yearsPlease update your answer.
tbl.getRow()
should betbl.getRows()
androw.getCells()
should berow.getTableCells()
. -
mike rodent about 8 yearsI confirm finding an oddity.
run.setText( new_text, 0 )
works OK for me, in fact, butrun.setText( new_text )
does indeed appendnew_text
to the existing text of theXWPFRun
. A bug, surely? PS using Jython. -
Ced over 7 years@Cole (null != text) && text.contains("${exercice}")
-
Ced over 7 yearsIs it working well ? Is the file structure etc intact after ? I tried your code, but I didn't manage to make it work. You might wanna add some comments
-
ron over 7 yearshello, yes it is working well for me and the structure is intact. what problems do you have ? i will add some comments and update the code.
-
Gullbyrd over 7 yearsHate to tell you this, but this approach does not work. It MIGHT work in some cases, but every time I try to do it, the text is arbitrarily broken up into multiple runs. It doesn't depend on formatting or punctuation... the runs can be broken anywhere in the text. So searching and replacing within individual runs is doomed to failure. The only choices are to replace at the paragraph level (probably not acceptable because of loss of formatting) or to find the text at the paragraph level, then map the runs to the offsets of the found text, and manipulate all overlapping runs accordingly.
-
Monir almost 7 yearsThis doesn't always work because word sometimes decides to split single camel case word into multiple runs.
-
martian over 6 yearsI want to replace
{10}
to the corresponding text in a table, but the words not in the same XWPFRun. -
Antimo almost 6 yearsWe have been using this library to text replacement and in our feature, it works like a charm! Thank you!
-
EnGoPy over 4 yearsIf you want to replace specific values in a template document (in my case) my post in topic : stackoverflow.com/questions/19137818/… might help you. It's little bit way around but in my case solved problems with divided runs and get specific word in separated run.
-
spy about 3 yearsthis didn't work if i had two ${} tokens in the same run. It's the regex i think
-
Diego Juliao almost 3 yearsThis is the most accurate solution that I've found across the web. It's aware the run composition is very unpredictable and you need to find the tag that you want to replace.
-
Admin about 2 yearsSuggestion for improval:
paragraph.getRuns()
does not return runs that e.g. contain fields. Usingparagraph.getIRuns()
(that returnsIRunElement
s) gives you more runs. -
David almost 2 yearsYour java lib is really nice ! But you use "contains()" so to can't make difference between $rabbit and $rabbitYellow during replacement. @see stackoverflow.com/questions/42904361/… So you can update source with : Pattern.compile("(?<!\\S)" + Pattern.quote(bookmark) + "(?!\\S)");
-
roma2341 almost 2 yearsIt doesn't work, becuase runsText equal to $ at line:
run.setText(runsText.contains(find) ? runsText.replace(find, fieldsForReport.get(key)) : runsText, 0);
-
roma2341 almost 2 yearsbest answer ! thanks) Perfectly works with placeholders taking two runs.