How to convert txt file or PDF to Word doc using python?
Solution 1
Using python-docx I was able to pretty easily convert the txt files to Word docs.
Here's what I did.
from docx import Document
import re
import os

path = '/users/tdobbins/downloads/smithtxt'  # folder holding the txt files
direct = os.listdir(path)

for i in direct:
    document = Document()
    document.add_heading(i, 0)  # use the file name as the document title
    # read each file from the same folder that was listed above
    with open(os.path.join(path, i)) as f:
        myfile = f.read()
    # replace runs of non-ASCII characters and form feeds (\x0c) with a space
    myfile = re.sub(r'[^\x00-\x7F]+|\x0c', ' ', myfile)
    p = document.add_paragraph(myfile)
    document.save('/path/to/write/to/' + i + '.docx')
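Note that the regex in this solution replaces every non-ASCII character, so accented letters are lost as well. If the goal is only to drop characters that XML cannot store, a narrower filter may be enough. The following is a minimal sketch, assuming Python 3 and the character ranges allowed by the XML 1.0 specification; the helper name strip_xml_invalid is hypothetical and not part of python-docx.

```python
import re

# Characters that fall outside the XML 1.0 "Char" production:
# C0 control characters other than tab, newline and carriage return,
# lone surrogates, and the noncharacters U+FFFE and U+FFFF.
_XML_INVALID = re.compile(r'[\x00-\x08\x0b\x0c\x0e-\x1f\ud800-\udfff\ufffe\uffff]')


def strip_xml_invalid(text, replacement=' '):
    """Return text with XML-incompatible characters replaced.

    Hypothetical helper: legitimate non-ASCII text (e.g. accented
    letters) is left untouched; only characters XML 1.0 forbids go.
    """
    return _XML_INVALID.sub(replacement, text)


# A form feed (\x0c), common in text extracted from PDFs, becomes a space.
print(strip_xml_invalid('page one\x0cpage two'))  # page one page two
```

In the loop above, this could stand in for the re.sub line, e.g. myfile = strip_xml_invalid(myfile), if keeping non-ASCII text matters for your documents.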
Solution 2
You could check out python-docx. It can create Word documents from Python, so you could write the contents of the text files into Word docs. See the python-docx documentation, "What it can do".
Author: tmthyjames
Updated on June 04, 2022
Comments
-
tmthyjames, almost 2 years ago:
Is there a way to convert PDFs (or text files) to Word docs in python? I'm doing some web-scraping for my professor and the original docs are PDFs. I converted all 1,611 of those to text files and now we need to convert them to Word docs. The only thing I could find was a Word-to-txt converter, not the reverse.
Thanks!
-
tmthyjames, about 9 years ago: Thanks. I'm checking it out. Other than installing it being a pain, it looks like it'll work.
-
Anmol Monga, about 6 years ago: I don't want to do the formatting in my code. Is there any way to accept an input file and convert it to .doc/.docx?