How can I remove a blank page from a PDF in iText?
Solution 1
I am sure there are several ways, but here is an example of how I have done it. I check the amount of content data on each page, and if it is 20 bytes or less I don't include that page:
public void removeBlankPdfPages(String pdfSourceFile, String pdfDestinationFile, boolean debug)
{
try
{
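// step 0: set minimum page size (content of 20 bytes or less is treated as blank)
final int blankPdfsize = 20;
// step 1: create new reader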
PdfReader r = new PdfReader(pdfSourceFile);
RandomAccessFileOrArray raf = new RandomAccessFileOrArray(pdfSourceFile);
Document document = new Document(r.getPageSizeWithRotation(1));
// step 2: create a writer that listens to the document
PdfCopy writer = new PdfCopy(document, new FileOutputStream(pdfDestinationFile));
// step 3: we open the document
document.open();
// step 4: we add content
PdfImportedPage page = null;
//loop through each page; if bs is larger than 20 bytes, we know the page is not blank.
//if it is 20 bytes or less, we don't include that blank page.
for (int i=1;i<=r.getNumberOfPages();i++)
{
//get the page content
byte bContent [] = r.getPageContent(i,raf);
ByteArrayOutputStream bs = new ByteArrayOutputStream();
//write the content to an output stream
bs.write(bContent);
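//no logger is defined in this snippet, so print to stdout when the debug flag is set
if (debug) System.out.println("page content length of page "+i+" = "+bs.size());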
//add the page to the new pdf
if (bs.size() > blankPdfsize)
{
page = writer.getImportedPage(r, i);
writer.addPage(page);
}
bs.close();
}
//close everything
document.close();
writer.close();
raf.close();
r.close();
}
catch(Exception e)
{
//do what you need here
}
}
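The decision inside the loop above depends only on the length of each page's content bytes, so it can be pulled out and checked without a PDF at hand. Below is a minimal sketch of that idea, not the answerer's code: `BlankPageFilter`, `isBlank`, and `pagesToKeep` are hypothetical names, and the byte arrays are stand-in data that would come from `r.getPageContent(i, raf)` in real use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of iText: decides which pages to keep by content length. */
final class BlankPageFilter {

    /** Same cut-off as the answer: content of 20 bytes or fewer counts as blank. */
    static final int BLANK_PDF_SIZE = 20;

    /** A page is treated as blank when it has no content or its content is at most the threshold. */
    static boolean isBlank(byte[] pageContent, int threshold) {
        return pageContent == null || pageContent.length <= threshold;
    }

    /** Returns the 1-based numbers of the pages that should be copied to the output. */
    static List<Integer> pagesToKeep(List<byte[]> pageContents, int threshold) {
        List<Integer> keep = new ArrayList<>();
        for (int i = 0; i < pageContents.size(); i++) {
            if (!isBlank(pageContents.get(i), threshold)) {
                keep.add(i + 1); // PDF page numbers start at 1
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        // Stand-in data; with iText these arrays would come from r.getPageContent(i, raf).
        List<byte[]> pages = Arrays.asList(new byte[150], new byte[8], new byte[21]);
        System.out.println(pagesToKeep(pages, BLANK_PDF_SIZE)); // prints [1, 3]
    }
}
```

The returned page numbers could then be passed one by one to `writer.getImportedPage(r, n)`. Comparing the array length directly also avoids copying the bytes into a stream just to read its size.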
Solution 2
C# (as requested by kalyan)
public static void removeBlankPdfPages(string pdfSourceFile, string pdfDestinationFile, bool debug) {
// step 0: set minimum page size
const int blankPdfsize = 20;
// step 1: create new reader
var r = new PdfReader(pdfSourceFile);
var raf = new RandomAccessFileOrArray(pdfSourceFile);
var document = new Document(r.GetPageSizeWithRotation(1));
// step 2: create a writer that listens to the document
var writer = new PdfCopy(document, new FileStream(pdfDestinationFile, FileMode.Create));
// step 3: we open the document
document.Open();
// step 4: we add content
PdfImportedPage page = null;
//loop through each page; if bs is larger than 20 bytes, we know the page is not blank.
//if it is 20 bytes or less, we don't include that blank page.
for (var i=1 ; i <= r.NumberOfPages; i++)
{
//get the page content
byte[] bContent = r.GetPageContent(i, raf);
var bs = new MemoryStream();
//write the content to an output stream
bs.Write(bContent, 0, bContent.Length);
Console.WriteLine("page content length of page {0} = {1}", i, bs.Length);
//add the page to the new pdf
if (bs.Length > blankPdfsize)
{
page = writer.GetImportedPage(r, i);
writer.AddPage(page);
}
bs.Close();
}
//close everything
document.Close();
writer.Close();
raf.Close();
r.Close();
}
Author by
Tushar Ahirrao
Updated on June 04, 2022
Comments
-
Tushar Ahirrao about 2 years
I want to remove a blank page from a PDF generated using the iText library in Java.
How do I do it?
-
Simon about 11 years
What is the point of writing a byte array to a MemoryStream just to get the length of the MemoryStream?
-
Filippo Vitale about 11 years
Sorry @Simon, I don't remember! This is a 3-year-old question.
-
Simon about 11 years
Well, I did the comparison directly on bContent.Length and it worked. Not sure if you want to bother updating the answer :)
-
ThanhLD over 5 years
I think you should check the content of the PDF pages. In my case, the blank page has a size > 20, so it passed the check. I recommend using this line of code: string extractedText = PdfTextExtractor.GetTextFromPage(pdfreader, pageNum, new LocationTextExtractionStrategy()); and checking whether the result is null or empty.
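The text-based check suggested in the last comment can be sketched in Java as follows. This is an illustration under assumptions, not a tested iText integration: `BlankTextCheck` and `isBlankText` are hypothetical names, the strings are stand-in data, and treating whitespace-only text as blank goes slightly beyond the "null or empty" test in the comment.

```java
/** Hypothetical helper, not part of iText: classifies a page from its extracted text. */
final class BlankTextCheck {

    /** True when the extracted text is null, empty, or only whitespace (the last is an added assumption). */
    static boolean isBlankText(String extractedText) {
        return extractedText == null || extractedText.trim().isEmpty();
    }

    public static void main(String[] args) {
        // Stand-in strings; with iText 5 the text would come from
        // PdfTextExtractor.getTextFromPage(reader, pageNum, new LocationTextExtractionStrategy()).
        System.out.println(isBlankText(" \n\t "));     // prints true
        System.out.println(isBlankText("Invoice 42")); // prints false
    }
}
```

A page that contains only images or drawings has no extractable text, so this check alone would classify it as blank; combining it with the content-length test may be safer.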