Non-alphanumeric list order from os.listdir()

199,273

Solution 1

I think the order has to do with the way the files are indexed on your filesystem. If you really want it to follow a particular order, you can always sort the list after getting the files.

Solution 2

You can use the built-in sorted function to sort the strings however you want. Based on what you describe, this should work:

sorted(os.listdir(whatever_directory))

Alternatively, you can use the .sort method of a list:

lst = os.listdir(whatever_directory)
lst.sort()

Either approach should do the trick.

Note that the order in which os.listdir returns the filenames is probably completely dependent on your filesystem.
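A minimal sketch of the difference between the two approaches, using a hypothetical list of names in place of a real directory listing: sorted() returns a new list, while list.sort() reorders the list in place and returns None, which is why the .sort() call cannot be chained onto os.listdir().

```python
# Hypothetical names standing in for os.listdir(whatever_directory)
names = ['run18', 'run01', 'run14', 'run02']

# sorted() leaves the original list alone and returns a new, sorted one
ordered = sorted(names)

# list.sort() reorders the list in place and returns None
result = names.sort()

print(ordered)  # ['run01', 'run02', 'run14', 'run18']
print(result)   # None
print(names)    # ['run01', 'run02', 'run14', 'run18']
```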

Solution 3

Per the documentation:

os.listdir(path)

Return a list containing the names of the entries in the directory given by path. The list is in arbitrary order. It does not include the special entries '.' and '..' even if they are present in the directory.

Order cannot be relied upon and is an artifact of the filesystem.

To sort the result, use sorted(os.listdir(path)).
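A quick way to see both points from the documentation, sketched with a throwaway directory (the file names are hypothetical): the raw listing has no guaranteed order and never contains '.' or '..', while sorted() gives the same result no matter in which order the files were created.

```python
import os
import tempfile

# Create a throwaway directory and add files in a deliberately scrambled order
with tempfile.TemporaryDirectory() as path:
    for name in ['run03', 'run01', 'run20', 'run02']:
        open(os.path.join(path, name), 'w').close()

    raw = os.listdir(path)               # order depends on the filesystem
    ordered = sorted(os.listdir(path))   # deterministic lexicographic order

    print(ordered)                       # ['run01', 'run02', 'run03', 'run20']
    print('.' in raw, '..' in raw)       # False False
```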

Solution 4

Python for whatever reason does not come with a built-in way to have natural sorting (meaning 1, 2, 10 instead of 1, 10, 2), so you have to write it yourself:

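import re
def sorted_alphanumeric(data):
    # Digit runs become ints (so '2' < '10'); other text is compared case-insensitively
    convert = lambda text: int(text) if text.isdigit() else text.lower()
    # The capturing group makes re.split() keep the digit runs in its result
    alphanum_key = lambda key: [ convert(c) for c in re.split('([0-9]+)', key) ]
    return sorted(data, key=alphanum_key)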

You can now use this function to sort a list:

dirlist = sorted_alphanumeric(os.listdir(...))
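To see what the helper buys you, here is a small comparison against plain sorted() on some hypothetical unpadded names (the function is repeated so the snippet runs on its own). Note that the zero-padded names in the question already sort correctly with plain sorted(); the natural sort matters once the numbers are not padded.

```python
import re

def sorted_alphanumeric(data):
    convert = lambda text: int(text) if text.isdigit() else text.lower()
    alphanum_key = lambda key: [convert(c) for c in re.split('([0-9]+)', key)]
    return sorted(data, key=alphanum_key)

# Hypothetical unpadded names, with one capitalised to show the case-insensitivity
names = ['run10', 'run2', 'run1', 'Run3']

print(sorted(names))               # ['Run3', 'run1', 'run10', 'run2']
print(sorted_alphanumeric(names))  # ['run1', 'run2', 'Run3', 'run10']
```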

PROBLEMS: In case you use the above function to sort strings (for example folder names) and want them sorted like Windows Explorer does, it will not work properly in some edge cases.
This sorting function will return results that differ from Windows Explorer if you have folder names with certain 'special' characters in them. For example, this function sorts 1, !1, !a, a, whereas Windows Explorer sorts !1, 1, !a, a.
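The edge case is easy to reproduce. The cause is that re.split() yields an empty leading string for a name that starts with a digit, and the empty string sorts before '!':

```python
import re

def sorted_alphanumeric(data):
    convert = lambda text: int(text) if text.isdigit() else text.lower()
    alphanum_key = lambda key: [convert(c) for c in re.split('([0-9]+)', key)]
    return sorted(data, key=alphanum_key)

names = ['a', '!a', '!1', '1']
# '1' splits into ['', '1', ''] -> key ['', 1, ''], and '' < '!', so '1' comes first
print(sorted_alphanumeric(names))  # ['1', '!1', '!a', 'a']
```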

So if you want to sort exactly like Windows Explorer does in Python, you have to call the Windows built-in function StrCmpLogicalW via ctypes (this of course won't work on Unix):

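from ctypes import wintypes, windll
from functools import cmp_to_key

def winsort(data):
    # StrCmpLogicalW (Shlwapi.dll) compares strings case-insensitively, treating digit runs as numbers
    _StrCmpLogicalW = windll.Shlwapi.StrCmpLogicalW
    _StrCmpLogicalW.argtypes = [wintypes.LPWSTR, wintypes.LPWSTR]
    _StrCmpLogicalW.restype  = wintypes.INT

    # It returns <0, 0 or >0 like a classic cmp function, so cmp_to_key adapts it for sorted()
    cmp_fnc = lambda psz1, psz2: _StrCmpLogicalW(psz1, psz2)
    return sorted(data, key=cmp_to_key(cmp_fnc))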

This function is slightly slower than sorted_alphanumeric().

Bonus: winsort can also sort full paths on Windows.
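Since StrCmpLogicalW only exists on Windows, here is a portable sketch of the cmp_to_key pattern that winsort relies on. The comparison function logical_cmp is a hypothetical stand-in and is NOT Explorer's algorithm; it just returns a negative, zero or positive number the way StrCmpLogicalW does, which is all cmp_to_key needs.

```python
import re
from functools import cmp_to_key

def logical_cmp(a, b):
    # Hypothetical stand-in for StrCmpLogicalW: case-insensitive, digit runs compared
    # as numbers. It does not replicate Explorer's handling of special characters.
    def key(s):
        # With a capturing group, the digit runs land at the odd indices of re.split()
        return [int(t) if i % 2 else t.lower() for i, t in enumerate(re.split('([0-9]+)', s))]
    ka, kb = key(a), key(b)
    return (ka > kb) - (ka < kb)  # -1, 0 or 1, like a C-style comparison

def portable_sort(data):
    # cmp_to_key wraps a two-argument comparison so sorted() can use it as a key
    return sorted(data, key=cmp_to_key(logical_cmp))

print(portable_sort(['file10', 'File2', 'file1']))  # ['file1', 'File2', 'file10']
```

On Windows you would pass the real StrCmpLogicalW wrapper to cmp_to_key instead, exactly as winsort does.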

Alternatively, especially if you use Unix, you can use the natsort library (pip install natsort) to sort full paths correctly (meaning subfolders end up in the correct position).

You can use it like this to sort full paths:

from natsort import natsorted, ns
dirlist = natsorted(dirlist, alg=ns.PATH | ns.IGNORECASE)
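To illustrate what "subfolders at the correct position" means, here is a rough stdlib-only sketch that compares paths component by component. It is not natsort's implementation; the helpers natural_key and path_key are hypothetical names, and it assumes POSIX-style '/' separators. With plain string sorting, 'a-b/x' lands before the contents of 'a/' because '-' sorts before '/'.

```python
import re
from pathlib import PurePosixPath

def natural_key(text):
    # Odd indices of re.split() with a capturing group are the digit runs
    parts = re.split('([0-9]+)', text)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]

def path_key(path):
    # Compare a path component by component, so everything under 'a/' stays together
    return [natural_key(part) for part in PurePosixPath(path).parts]

paths = ['a-b/x', 'a/b10', 'a/b2', 'a/c']

print(sorted(paths))                # ['a-b/x', 'a/b10', 'a/b2', 'a/c']
print(sorted(paths, key=path_key))  # ['a/b2', 'a/b10', 'a/c', 'a-b/x']
```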

Starting with version 7.1.0, natsort supports os_sorted, which internally uses either the aforementioned Windows API or Linux sorting, and should be used instead of natsorted().

Solution 5

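I think by default the order is determined by the ASCII value. For names that share a prefix and differ only in their number, a simple solution is to sort by length first and then alphabetically:

names = sorted(os.listdir(os.getcwd()), key=lambda name: (len(name), name))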
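A short check of why length-based sorting needs a tie-breaker: sorted() is stable, so with key=len alone, names of equal length keep whatever order they arrived in. The names below are hypothetical; note that the trick only mimics numeric order when all names share the same prefix.

```python
# Hypothetical unpadded names in a scrambled order
names = ['run2', 'run10', 'run1', 'run11']

# key=len only groups by length; equal-length names keep their incoming order
print(sorted(names, key=len))                             # ['run2', 'run1', 'run10', 'run11']

# Sorting by (length, name) also orders each length group alphabetically
print(sorted(names, key=lambda name: (len(name), name)))  # ['run1', 'run2', 'run10', 'run11']
```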

Author: marshall.ward

I call myself a scientist, but I spend all day reading programming language standards.

Updated on January 27, 2022

Comments

  • marshall.ward
    marshall.ward over 2 years

    I often use python to process directories of data. Recently, I have noticed that the default order of the lists has changed to something almost nonsensical. For example, if I am in a current directory containing the following subdirectories: run01, run02, ... run19, run20, and then I generate a list from the following command:

    dir = os.listdir(os.getcwd())
    

    then I usually get a list in this order:

    dir = ['run01', 'run18', 'run14', 'run13', 'run12', 'run11', 'run08', ... ]
    

    and so on. The order used to be alphanumeric. But this new order has remained with me for a while now.

    What is determining the (displayed) order of these lists?

  • Joachim Sauer
    Joachim Sauer over 13 years
    The order you're seeing may depend on a lot of factors, such as OS, filesystem, time of creation of files, actions during the last defragmentation, ...
  • Lavaman65
    Lavaman65 over 11 years
    This explains why they are seeing the behaviour, without offering a solution.
  • Denis
    Denis over 11 years
    OP just wants to know why, not how.
  • Dimitris
    Dimitris over 11 years
    @Denis thanks for pointing this out - I didn't notice it before
  • Denis
    Denis over 11 years
    @DanielWatkins OK, now it isn't. :)
  • Elliot
    Elliot almost 10 years
    Does not change the order if dealing with number-first filenames (e.g. 59.9780radps-0096 is still before 9.9746radps-0082). I think it's because everything is a string, so the decimal is not treated properly.
  • mgilson
    mgilson almost 10 years
    @Elliot -- Correct, it's sorting lexicographically as strings. To sort some other way, you'd need to define a key function that determines the sort order. In your case, you'd want the key function to look at the string and return 59.9780 or 9.9746 (as a float) for your respective filenames.
  • Elliot
    Elliot almost 10 years
    Or use the natsort library, which I just found.
  • AXO
    AXO over 8 years
    This is expected behavior. ('5' > '403') is True.
  • Andrew
    Andrew almost 7 years
    @AXO is correct, because at this point you're comparing strings alphanumerically, not the numeric values they represent. To get a sort closer to what you expect, you may want to zero-pad the numbers in your folder names... ['002', '003', '004', '005', '403', '404', '405', '406']
  • paul_h
    paul_h over 6 years
    Only sorted(listdir) worked for me. listdir.sort() gave me: TypeError: 'NoneType' object is not iterable
  • Sean_Syue
    Sean_Syue almost 6 years
    @paul_h -- listdir.sort() won't work in statements like for i in listdir.sort(), because the list.sort() method sorts the list IN PLACE: it modifies the list itself and returns None. So you need to use a_list = listdir('some_path'); a_list.sort() and then do for i in a_list
  • Alex B
    Alex B almost 6 years
    Do you know how to change the order to ascending or descending using .sort ?
  • mgilson
    mgilson almost 6 years
    @AlexB -- sure ... just pass reverse=True to make it descending sort.
  • Farid Alijani
    Farid Alijani over 4 years
    That is more accurate than sorted()! Thanks
  • user136036
    user136036 over 4 years
    Works perfectly fine. print( sorted_alphanumeric(["1", "10", "2", "foo_10", "foo_8"]) ) -> ['1', '2', '10', 'foo_8', 'foo_10']. Exactly as expected.
  • SethMMorton
    SethMMorton about 4 years
    There is a longstanding open issue on natsort to get Windows Explorer matching functionality implemented. Perhaps you should contribute a solution? github.com/SethMMorton/natsort/issues/41
  • user3895596
    user3895596 about 4 years
    @mgilson is it possible to do it in one line? Something like lst = os.listdir(whatever_directory).sort() - this of course will just make lst = None, but do we need to do it in two lines?
  • mgilson
    mgilson about 4 years
    @user3895596 -- I think that the sorted version written first does it in a single line, OK?
  • Amit Amola
    Amit Amola almost 4 years
    Inarguably the best answer here.
  • Spider999
    Spider999 over 3 years
    None of the above worked for me; that "key=len" seemed to be the last remaining trick, thanks so much.
  • Puddle
    Puddle over 3 years
    oh wow that sure solves the problem doesn't it. it just doesn't get sorted. accept it. what a genius answer! so useful! you deserve a ton of reputation for this!
  • Elegant Code
    Elegant Code over 3 years
    @Puddle That's very kind of you. Thank you very much.
  • Amin Guermazi
    Amin Guermazi over 3 years
    The winsort function was exactly what I needed :)
  • Paloha
    Paloha about 3 years
    You can use key in sorted to parse more complex filenames. A simple example of sorting a list like this ['0001.txt', '0002.txt'] is: sorted(os.listdir(path), key=lambda filename: int(filename.split('.')[0]))
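Following up on mgilson's suggestion to Elliot, a key function that sorts by the leading decimal number could look like this sketch. The regex, the helper name leading_float and the third file name are assumptions for illustration; names without a leading number are pushed to the end.

```python
import re

def leading_float(name):
    # Pull the leading decimal number out of the name; names without one sort last
    match = re.match(r'[0-9]+(?:\.[0-9]+)?', name)
    return float(match.group()) if match else float('inf')

# Hypothetical file names in the style from Elliot's comment
names = ['59.9780radps-0096', '9.9746radps-0082', '10.5radps-0001']

print(sorted(names))                                   # ['10.5radps-0001', '59.9780radps-0096', '9.9746radps-0082']
print(sorted(names, key=leading_float))                # ['9.9746radps-0082', '10.5radps-0001', '59.9780radps-0096']
print(sorted(names, key=leading_float, reverse=True))  # ['59.9780radps-0096', '10.5radps-0001', '9.9746radps-0082']
```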