Sort a list of files using Python

19,666

Solution 1

What you need is natural-order string comparison, where runs of digits are compared as numbers rather than character by character, so that a name containing 2 sorts before one containing 10.

EDIT: Here's a brute force example of doing this in Python.

import re

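digits = re.compile(r'(\d+)')

def tokenize(filename):
    # Splitting on a capturing group keeps the digit runs, so the result
    # alternates between text and digits. Digit runs become ints so that
    # they compare numerically instead of character by character.
    return tuple(int(token) if digits.match(token) else token
                 for token in digits.split(filename))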

# Now you can sort your PDF file names like so:
pdfList.sort(key=tokenize)
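
To see what the natural-order key changes, here is a small self-contained sketch, assuming Python 3. The helper name natural_key and the extra WR_Mapbook__10.pdf entry are made up for illustration; the other names follow the pattern in the question.

```python
import re

DIGITS = re.compile(r'(\d+)')

def natural_key(name):
    # re.split with a capturing group returns text and digit runs
    # alternately, so odd positions always hold the digit runs.
    parts = DIGITS.split(name)
    return tuple(int(p) if i % 2 else p for i, p in enumerate(parts))

names = [
    "WR_Mapbook__10.pdf",  # hypothetical two-digit page
    "WR_Mapbook__2a.pdf",
    "WR_Mapbook__1a.pdf",
    "WR_Mapbook__2.pdf",
    "WR_Mapbook__1.pdf",
]

plain = sorted(names)                     # character-by-character order
natural = sorted(names, key=natural_key)  # digit runs compared as numbers

print(plain)    # 10 lands between 1 and 1a
print(natural)  # 1, 1a, 2, 2a, 10
```

The plain sort only goes wrong once the numbers have different lengths, which is why a short sample of names can look correctly sorted without the key.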

Solution 2

Try putting () after pdfList.sort, as in:

pdfList.sort()

The way you've written it, pdfList.sort only refers to the method without calling it, so the list is never actually sorted. I grabbed your list of file names, stuck them in a list, and with sort() they sorted in the order you show them.
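
A minimal sketch of the difference, assuming Python 3 and a few file names in the style of the question:

```python
names = ["WR_Mapbook__2.pdf", "WR_Mapbook__1a.pdf", "WR_Mapbook__1.pdf"]
before = list(names)

# Without parentheses this only looks up the bound method; nothing is called.
names.sort
unchanged = (names == before)

# With parentheses the list is sorted in place and the call returns None.
result = names.sort()

print(unchanged)  # True: the bare attribute access did not sort anything
print(names)
print(result)     # None, so do not write names = names.sort()
```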

Solution 3

Replace pdfList.sort with

pdfList = sorted(pdfList, key=lambda x: x[:-4])

which drops the last four characters (".pdf") before comparing, or, to ignore the file extension whatever its length while sorting, with

pdfList = sorted(pdfList, key=lambda x: x.rsplit('.', 1)[0])
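
Here is a rough sketch of when ignoring the extension makes a difference, assuming Python 3. The report names are hypothetical and chosen because "-" sorts before "." in ASCII; the key only changes the order when a name contains such a character right where another name's extension begins.

```python
# Hypothetical names, not taken from the question.
names = ["report_b.pdf", "report-a.pdf", "report.pdf"]

# Plain comparison includes the extension, so "-" (45) beats "." (46).
plain = sorted(names)

# Comparing only the part before the last "." puts the bare name first.
by_stem = sorted(names, key=lambda x: x.rsplit('.', 1)[0])

print(plain)    # ['report-a.pdf', 'report.pdf', 'report_b.pdf']
print(by_stem)  # ['report.pdf', 'report-a.pdf', 'report_b.pdf']
```

Note that this key still compares digits character by character, so it does not by itself put a 2 before a 10.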

Author by

Admin

Updated on June 29, 2022

Comments

  • Admin
    Admin about 2 years

    I need to combine a folder full of pdfs into one file. However they must be combined in a certain order. A sample of the file names is:

    WR_Mapbook__1.pdf  
    WR_Mapbook__1a.pdf  
    WR_Mapbook__2.pdf  
    WR_Mapbook__2a.pdf  
    WR_Mapbook__3.pdf  
    WR_Mapbook__3a.pdf  
    etc...  
    

    The way that they are sorted in Windows Explorer is the way I need them to be added to a single file. However, my script adds all the "a" files first, and then the files without an "a". Why does it do that? How can I sort it so that the files are added in the way I want?

    See the code below. Thanks!

    from pyPdf import PdfFileWriter, PdfFileReader  
    import glob
    
    outputLoc = "K:\\test\\pdf_output\\"
    output = PdfFileWriter()
    
    
    pdfList = glob.glob(r"K:\test\lidar_MB_ALL\*.pdf")
    pdfList.sort
    print pdfList
    for pdf in pdfList:
        print pdf
        input1 = PdfFileReader(file(pdf, "rb"))
        output.addPage(input1.getPage(0))
        # finally, write "output" to document-output.pdf
        outputStream = file(outputLoc + "WR_Imagery_LiDar_Mapbook.pdf", "wb")
        output.write(outputStream)
        print ("adding " + pdf)
    
    outputStream.close()