Sort a list of files using Python
Solution 1
What you need is a "Natural Order String Comparison", which compares runs of digits as numbers instead of character by character. Hopefully someone has done this already and shared it.
EDIT: Here's a brute-force example of doing this in Python.
import re

digits = re.compile(r'(\d+)')

def tokenize(filename):
    return tuple(int(token) if match else token
                 for token, match in
                 ((fragment, digits.search(fragment))
                  for fragment in digits.split(filename)))

# Now you can sort your PDF file names like so:
pdfList.sort(key=tokenize)
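As a quick check, here is a minimal sketch of the same tokenizing idea applied to the sample names from the question. The helper name `natural_key` and the extra `WR_Mapbook__10.pdf` entry are assumptions added for illustration; the extra entry is only there to show how a multi-digit number behaves.

```python
import re

digits = re.compile(r'(\d+)')

def natural_key(filename):
    # re.split with a capturing group keeps the digit runs, so the parts
    # alternate text, digits, text, ...; odd positions are the digit runs.
    parts = digits.split(filename)
    return tuple(int(part) if index % 2 else part
                 for index, part in enumerate(parts))

names = [
    "WR_Mapbook__10.pdf",   # hypothetical name, not in the question's sample
    "WR_Mapbook__2a.pdf",
    "WR_Mapbook__1a.pdf",
    "WR_Mapbook__2.pdf",
    "WR_Mapbook__1.pdf",
]
names.sort(key=natural_key)
print(names)
```

With this key, `WR_Mapbook__10.pdf` sorts after `WR_Mapbook__2a.pdf`, because 10 is compared with 2 as a number rather than as text.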
Solution 2
Try putting () after pdfList.sort, as in:
pdfList.sort()
The way you've got it written, the statement only refers to the sort method without calling it, so it won't actually sort the list. I grabbed your list of file names, stuck them in a list, and with sort() they sorted in the order you show them.
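The difference between referencing and calling the method can be seen in a small sketch. The list contents are taken from the question's sample, and the `WR_Mapbook__10.pdf` name in the last step is a hypothetical addition to show where a plain sort stops matching the desired order.

```python
pdf_list = ["WR_Mapbook__2.pdf", "WR_Mapbook__1a.pdf", "WR_Mapbook__1.pdf"]
before = list(pdf_list)

pdf_list.sort                      # only looks up the method; nothing is called
unchanged = (pdf_list == before)   # the list is still in its original order

pdf_list.sort()                    # calls the method and sorts in place
print(unchanged, pdf_list)

# Plain string sorting compares character by character, so a hypothetical
# two-digit page number would land before a one-digit one.
lexicographic = sorted(["WR_Mapbook__2.pdf", "WR_Mapbook__10.pdf"])
print(lexicographic)
```

A plain `sort()` is enough while the numbers all have the same number of digits. Beyond that, a natural-order key is the safer choice.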
Solution 3
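Replace pdfList.sort
with
pdfList = sorted(pdfList, key=lambda x: x[:-4])
or
pdfList = sorted(pdfList, key=lambda x: x.rsplit('.', 1)[0])
to ignore the file extension while sorting. The first form assumes a three-letter extension such as .pdf; the second works for an extension of any length.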
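A minimal sketch of the two keys on the question's sample names follows. The third variant, using `os.path.splitext` from the standard library, is an alternative added here for comparison and is not part of the original answer.

```python
import os

pdf_list = [
    "WR_Mapbook__2a.pdf",
    "WR_Mapbook__1a.pdf",
    "WR_Mapbook__2.pdf",
    "WR_Mapbook__1.pdf",
]

# Drop the last four characters (".pdf") before comparing.
by_slice = sorted(pdf_list, key=lambda x: x[:-4])

# Split once from the right at the final dot and compare the stem.
by_rsplit = sorted(pdf_list, key=lambda x: x.rsplit('.', 1)[0])

# Added alternative: let the standard library separate stem and extension.
by_splitext = sorted(pdf_list, key=lambda x: os.path.splitext(x)[0])

print(by_slice)
```

All three keys compare `WR_Mapbook__1` with `WR_Mapbook__1a`, and the shorter string sorts first because it is a prefix of the longer one.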
Updated on June 29, 2022
I need to combine a folder full of PDFs into one file. However, they must be combined in a certain order. A sample of the file names is:
WR_Mapbook__1.pdf
WR_Mapbook__1a.pdf
WR_Mapbook__2.pdf
WR_Mapbook__2a.pdf
WR_Mapbook__3.pdf
WR_Mapbook__3a.pdf
etc.
The way they are sorted in Windows Explorer is the order in which I need them added to the single file. However, my script adds all the "a" files first, and then the files without an "a". Why does it do that? How can I sort them so that the files are added in the order I want?
See the code below. Thanks!
from pyPdf import PdfFileWriter, PdfFileReader
import glob

outputLoc = "K:\\test\\pdf_output\\"
output = PdfFileWriter()

pdfList = glob.glob(r"K:\test\lidar_MB_ALL\*.pdf")
pdfList.sort
print pdfList
for pdf in pdfList:
    print pdf
    input1 = PdfFileReader(file(pdf, "rb"))
    output.addPage(input1.getPage(0))
    # finally, write "output" to document-output.pdf
    outputStream = file(outputLoc + "WR_Imagery_LiDar_Mapbook.pdf", "wb")
    output.write(outputStream)
    print ("adding " + pdf)

outputStream.close()