How to create a PDF document in languages of the Unicode character set using third-party fonts

14,097

Solution 1

If you are using iText, it has quite good Unicode support.

You can read more in iText in Action (chapter 2.2.2).

You have to download a Unicode font such as arialuni.ttf and use it like this:

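    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.util.List;

    import com.itextpdf.text.Document;
    import com.itextpdf.text.DocumentException;
    import com.itextpdf.text.Font;
    import com.itextpdf.text.Paragraph;
    import com.itextpdf.text.Phrase;
    import com.itextpdf.text.pdf.BaseFont;
    import com.itextpdf.text.pdf.FontSelector;
    import com.itextpdf.text.pdf.PdfWriter;

    // A Unicode TrueType font, downloaded separately
    public static File fontFile = new File("fonts/arialuni.ttf");

    public static void createITextDocument(File from, File to) throws DocumentException, IOException {
        Document document = new Document();
        PdfWriter.getInstance(document, new FileOutputStream(to));
        document.open();

        // IDENTITY_H: Unicode (Identity-H) encoding; EMBEDDED: embed the font in the PDF
        BaseFont unicode = BaseFont.createFont(fontFile.getAbsolutePath(), BaseFont.IDENTITY_H, BaseFont.EMBEDDED);

        FontSelector fs = new FontSelector();
        fs.addFont(new Font(unicode));

        // getParagraphs(File) is the helper from the question (not shown here)
        addContent(document, getParagraphs(from), fs);
        document.close();
    }

    private static void addContent(Document document, List<String> paragraphs, FontSelector fs) throws DocumentException {
        for (String text : paragraphs) {
            // FontSelector uses the first added font that has a glyph for each character
            Phrase phrase = fs.process(text);
            document.add(new Paragraph(phrase));
        }
    }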
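
The code above calls a getParagraphs(File) helper that is not shown. Below is a minimal sketch of one, under assumptions that are not part of the original answer: the class name ParagraphReader is hypothetical, and paragraphs are assumed to be separated by one or more blank lines. It reads the file explicitly as UTF-8, since the question's input files are UTF-8 and the platform default charset may differ. It needs Java 7 or later.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: the original answer does not show getParagraphs.
// Assumption: paragraphs are separated by one or more blank lines.
final class ParagraphReader {

    private ParagraphReader() {}

    /** Reads the whole file as UTF-8 (not the platform default) and splits it into paragraphs. */
    static List<String> getParagraphs(File from) throws IOException {
        String text = new String(Files.readAllBytes(from.toPath()), StandardCharsets.UTF_8);
        return splitParagraphs(text);
    }

    /** Splits on blank lines, trims each paragraph and drops empty ones. */
    static List<String> splitParagraphs(String text) {
        // Normalize Windows and old Mac line endings to '\n' first
        String normalized = text.replace("\r\n", "\n").replace('\r', '\n');
        List<String> paragraphs = new ArrayList<>();
        for (String block : normalized.split("\n\\s*\n")) {
            String trimmed = block.trim();
            if (!trimmed.isEmpty()) {
                paragraphs.add(trimmed);
            }
        }
        return paragraphs;
    }
}
```

In createITextDocument, getParagraphs(from) would then become ParagraphReader.getParagraphs(from).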

The arialuni.ttf font works for me. So far I have checked it with these languages:

BG, ES, CS, DA, DE, ET, EL, EN, FR, IT, LV, LT, HU, MT, NL, PL, PT, RO, SK, SL, FI, SV

Of these, only the Romanian PDF was not created properly.
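To find out which characters a font lacks before generating a PDF (for example, when investigating a language that does not come out properly, like the Romanian case above), one option is to ask the JDK's own font loader. The helper below is a sketch and is not part of iText or PDFBox: the class name FontCoverage is hypothetical, and it takes the coverage test as an IntPredicate so it can be fed java.awt.Font::canDisplay after loading the TTF with Font.createFont(Font.TRUETYPE_FONT, file). It needs Java 8 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical helper: lists the characters of `text` that a font cannot display.
// `covers` is any code-point test, e.g. awtFont::canDisplay for a java.awt.Font
// loaded via Font.createFont(Font.TRUETYPE_FONT, new File("fonts/arialuni.ttf")).
final class FontCoverage {

    private FontCoverage() {}

    static List<String> missing(String text, IntPredicate covers) {
        List<String> result = new ArrayList<>();
        text.codePoints()
            .distinct()                                   // report each character once
            .filter(cp -> !Character.isWhitespace(cp))    // only visible characters are checked
            .filter(cp -> !covers.test(cp))
            .forEach(cp -> result.add(String.format("U+%04X %s", cp, new String(Character.toChars(cp)))));
        return result;
    }
}
```

An empty result means every visible character of the sample text has a glyph in that font; anything listed would need a different or additional font.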

With PDFBox it's almost the same:

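    import java.io.FileNotFoundException;
    import java.io.IOException;

    import org.apache.pdfbox.exceptions.COSVisitorException;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDPage;
    import org.apache.pdfbox.pdmodel.edit.PDPageContentStream;
    import org.apache.pdfbox.pdmodel.font.PDFont;
    import org.apache.pdfbox.pdmodel.font.PDTrueTypeFont;

    // PDFBox 1.x API
    private void createPdfBoxDoc() throws IOException, FileNotFoundException, COSVisitorException {
        PDDocument document = new PDDocument();
        PDPage page = new PDPage();
        document.addPage(page);
        PDPageContentStream contentStream = new PDPageContentStream(document, page);

        // Load the Unicode TrueType font instead of a standard Type 1 font
        PDFont font = PDTrueTypeFont.loadTTF(document, "fonts/arialuni.ttf");
        contentStream.setFont(font, 12);
        contentStream.beginText();
        contentStream.moveTextPositionByAmount(100, 400);
        contentStream.drawString("š");
        contentStream.endText();
        contentStream.close();
        document.save("test.pdf");
        document.close();
    }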

However, as Gagravarr says, it doesn't work because of issue PDFBOX-903, even with the 1.6.0-SNAPSHOT version. Maybe trunk will work. I suggest you use iText; it works perfectly there.

Solution 2

You may find this answer helpful: it confirms that you can't do what you need with any of the standard Type 1 fonts, as they are Latin-1 only.
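To see which characters of your text fall outside Latin-1 (and so, following the claim above, cannot be shown with the standard Type 1 fonts), a small JDK-only check is enough. Latin-1 (ISO-8859-1) covers exactly U+0000 to U+00FF, so anything above that is reported. The class name Latin1Check is hypothetical; the sketch needs Java 8 or later for String.codePoints().

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper: collects the characters that ISO-8859-1 (Latin-1) cannot represent.
final class Latin1Check {

    private Latin1Check() {}

    /** Returns the distinct characters above U+00FF, in order of first appearance. */
    static Set<String> outsideLatin1(String text) {
        Set<String> result = new LinkedHashSet<>(); // keeps first-seen order, no duplicates
        text.codePoints()
            .filter(cp -> cp > 0xFF)
            .forEach(cp -> result.add(new String(Character.toChars(cp))));
        return result;
    }
}
```

For Russian text every Cyrillic letter would be reported, which matches what the question describes.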

In theory, you just need to embed a suitable font that covers all your code points into the document, and use that. However, there is at least one open bug with writing Unicode strings, so there is a chance it might not work just yet. Try the latest PDFBox from SVN trunk too, to see if it helps.
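Choosing a font that covers all your code points starts with knowing which scripts the text actually uses. A JDK-only sketch using Character.UnicodeScript follows; the class name ScriptReport is hypothetical, and treating COMMON and INHERITED characters (spaces, punctuation, digits, combining marks) as script-neutral is an assumption of this sketch. It needs Java 8 or later.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical helper: reports which Unicode scripts a text uses,
// so you know which glyph ranges the embedded font must cover.
final class ScriptReport {

    private ScriptReport() {}

    /** Returns the scripts used by text, ignoring script-neutral characters. */
    static Set<Character.UnicodeScript> scriptsUsed(String text) {
        Set<Character.UnicodeScript> scripts = EnumSet.noneOf(Character.UnicodeScript.class);
        text.codePoints()
            .mapToObj(Character.UnicodeScript::of)
            .filter(s -> s != Character.UnicodeScript.COMMON && s != Character.UnicodeScript.INHERITED)
            .forEach(scripts::add);
        return scripts;
    }
}
```

For the question's Russian input the result would include CYRILLIC, so the embedded font must contain Cyrillic glyphs.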

Author: lisak

Updated on June 29, 2022

Comments

  • lisak:

    I'm using PDFBox and iText to create simple PDF documents (just paragraphs) in various languages. Something like:

    PDFBox:

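    private static void createPdfBoxDocument(File from, File to) throws IOException, COSVisitorException {
        PDDocument document = null;
        // Read the input explicitly as UTF-8; FileReader would use the platform default charset
        Reader reader = new InputStreamReader(new FileInputStream(from), "UTF-8");
        try {
            document = new TextToPDF().createPDFFromText(reader);
            document.save(to.getAbsolutePath());
        } finally {
            reader.close();
            if (document != null)
                document.close();
        }
    }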
    
    private void createPdfBoxDoc() throws IOException, FileNotFoundException, COSVisitorException {
        PDDocument document = new PDDocument();
        PDPage page = new PDPage();
        document.addPage(page);
        PDPageContentStream contentStream = new PDPageContentStream(document, page);
    
        PDType1Font font = PDType1Font.TIMES_ROMAN;
        contentStream.setFont(font, 12);
        contentStream.beginText();
        contentStream.moveTextPositionByAmount(100, 400);
        contentStream.drawString("š");
        contentStream.endText();
        contentStream.close();
        document.save("test.pdf");
        document.close();
    }
    

    iText:

    private static Font blackFont = new Font(Font.FontFamily.COURIER, 12, Font.NORMAL, BaseColor.BLACK);
    
    private static void createITextDocument(File from, File to) throws DocumentException, IOException {
        Document document = new Document();
        PdfWriter.getInstance(document, new FileOutputStream(to));
        document.open();
        addContent(document, getParagraphs(from));
        document.close();
    }
    
    private static void addContent(Document document, List<String> paragraphs) throws DocumentException {
    
        for (int i = 0; i < paragraphs.size(); i++) {
            document.add(new Paragraph(paragraphs.get(i), blackFont));
        }
    }
    

    The input files are encoded in UTF-8, and some languages of the Unicode character set, such as Russian, are not rendered properly in the PDF. I suppose the fonts in both libraries don't support the Unicode character set, and I can't find any documentation on how to add and use third-party fonts. Could anybody please help me out with an example?