Listing directories based on size from largest to smallest on a single line

Solution 1

If you are confident that the directory names do not contain whitespace, then it is simple to get all the directory names on one line:

du -sk [a-z]*/ 2>/dev/null | sort -nr | awk '{printf "%s ", $2}'

Getting the information into python

To capture that output in a Python program as a list, using Python 2.7 or later (universal_newlines=True makes Python 3 return text rather than bytes):

import subprocess
dir_list = subprocess.check_output("du -sk [a-z]*/ 2>/dev/null | sort -nr | awk '{printf \"%s \", $2}'", shell=True, universal_newlines=True).split()

In python2.6:

import subprocess
dir_list = subprocess.Popen("du -sk [a-z]*/ 2>/dev/null | sort -nr | awk '{printf \"%s \", $2}'", shell=True, stdout=subprocess.PIPE).communicate()[0].split()

We can also take advantage of Python's features to reduce the amount of work done by the shell and, in particular, to eliminate the need for awk. Splitting du's output on whitespace yields alternating sizes and names, so the slice [1::2] keeps only the names:

dir_list = subprocess.Popen("du -sk [a-z]*/ | sort -nr", shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE).communicate()[0].split()[1::2]

One could go further and read the du output directly into python, convert the sizes to integers, and sort on size. It is simpler, though, just to do this with sort -nr in the shell.
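As a rough illustration of that "go further" option, here is a minimal sketch that takes the raw text printed by du -sk and does the sorting in Python. The function name dirs_by_size and the sample string are made up for this example. It relies on du separating the size from the name with a tab, so names containing spaces survive, but names containing newlines would not.

```python
def dirs_by_size(du_output):
    """Return the names found in `du -sk` output, largest first.

    Hypothetical helper: expects one "<size>\t<name>" entry per line.
    """
    entries = []
    for line in du_output.splitlines():
        size, sep, name = line.partition("\t")
        if not sep:
            continue  # skip blank lines and anything without a tab
        entries.append((int(size), name))
    # Sort on the integer size only, largest first.
    entries.sort(key=lambda entry: entry[0], reverse=True)
    return [name for _size, name in entries]


# Made-up sample in the same shape as du's output.
sample = "97452\tthree\n413096\tone\n106572\ttwo words\n"
print(dirs_by_size(sample))  # ['one', 'two words', 'three']
```

In practice the input string would come from something like the subprocess calls shown earlier, with the sort -nr stage dropped from the pipeline.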

Specifying a directory

If the directories whose size you want are not in the current directory, there are two possibilities:

du -sk /some/path/[a-z]*/ 2>/dev/null | sort -nr | awk '{printf "%s ", $2}'

and also:

cd /some/path/ && du -sk [a-z]*/ 2>/dev/null | sort -nr | awk '{printf "%s ", $2}'

The first form prints each directory with its /some/path/ prefix; the second prints the names alone.

Solution 2

Using paste, with -d ' ' so that the names are joined by spaces rather than paste's default tabs:

du -sk [a-z]* 2>/dev/null | sort -nr | cut -f2- | paste -sd ' ' -

Solution 3

zsh can sort the results of its globs using glob qualifiers. You can also define your own sort order with a function. For instance:

zdu() REPLY=$(du -s -- "$REPLY")

print -r -- [[:alpha:]]*(/nO+zdu)

would print the directories (/) whose names start with a letter ([a-z] only makes sense in the C locale), sorted numerically (n) in reverse order (O) on the value returned by the zdu function.

Note that when you do:

du -s a b

if a and b contain hard links to the same files, their disk usage will be counted for a but not for b. The zsh approach here avoids that, because du is run separately for each directory.

If you're going to use python, I'd do the same from there: call du -s for each of the files, and sort that list there. Remember that file names can contain any character including space, tab and newline.
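A minimal sketch of that suggestion, assuming Python 3.5 or later (the question itself was constrained to 2.6) and a du on the PATH that accepts -sk and --. The names du_kilobytes and subdirs_by_size are invented for this example. du is called once per directory with an argument list rather than a shell string, so names containing spaces, tabs or newlines need no quoting, and hard links shared between directories are counted for each of them.

```python
import os
import subprocess


def du_kilobytes(path):
    """Size of `path` in kilobytes, as reported by `du -sk` (assumed to be on PATH)."""
    result = subprocess.run(
        ["du", "-sk", "--", path],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # same effect as 2>/dev/null
    )
    # du prints "<size>\t<name>"; only the first tab-separated field is needed.
    return int(result.stdout.split(b"\t", 1)[0])


def subdirs_by_size(parent, size_of=du_kilobytes):
    """Return the names of the directories inside `parent`, largest first.

    `size_of` is a hypothetical hook so the measuring function can be swapped.
    """
    names = [
        name for name in os.listdir(parent)
        if os.path.isdir(os.path.join(parent, name))
    ]
    return sorted(
        names,
        key=lambda name: size_of(os.path.join(parent, name)),
        reverse=True,
    )
```

Calling subdirs_by_size("/some/path") would return a plain Python list of names. Unlike the [a-z]* glob, this sketch includes every subdirectory, hidden ones too, so filter the names first if that matters.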

Author by

etho201

Updated on September 18, 2022

Comments

  • etho201
    etho201 over 1 year

    I can use the following command to get a list of directories and their sizes and sort them from largest to smallest (in the example I renamed the directories to numbers to make this easier to understand).

    $: du -sk [a-z]* 2>/dev/null | sort -nr
    413096  one
    106572  two
    97452   three
    76428   four
    55052   five
    45068   six
    33680   seven
    23220   eight
    17716   nine
    

    I'm writing a program that requires input of these directories from largest to smallest, but for matters of convenience it needs them all on one line. Is there a command that will allow me to sort the directories from largest to smallest on one line without the size?

    I would like the output to be like this:

    one two three four five six seven eight nine
    
    • etho201
      etho201 almost 10 years
      Better yet... Since I will be pasting the one line into Python and splitting it into a list, is there any way I can just have Python check a specified directory for the directories within and sort them by the size of its contents, and produce a list like this: [one, two, three, four, five, six, seven, eight, nine]?
  • etho201
    etho201 almost 10 years
    Oh I found a way... I can just use: cd dir && du -sk [a-z]*/ 2>/dev/null | sort -nr | awk '{printf $2" "}'
  • John1024
    John1024 almost 10 years
    @user2554129 Good. I added that and an alternative to the answer.
  • etho201
    etho201 almost 10 years
    AttributeError: 'module' object has no attribute 'check_output' --> I should have mentioned I'm constrained to using Python 2.6. It doesn't look like "check_output" works on 2.6. I read somewhere to use subprocess.Popen() but that doesn't seem to work as expected. Any ideas what I am doing wrong?
  • etho201
    etho201 almost 10 years
    I like the nice clean list that produces. Thank you!
  • etho201
    etho201 almost 10 years
    I like the alternate method, and that it is clean and easy to understand... Is this more efficient than using "du -sk [a-z]* 2>/dev/null | sort -nr | awk '{print $2}' | sed ':a;N;$!ba;s/\n/ /g'" ?
  • Neven
    Neven almost 10 years
    No problem, glad to help. Could you please mark this as the answer, if it is what you were looking for, of course? :)
  • Stéphane Chazelas
    Stéphane Chazelas almost 10 years
    @user2554129, yes, and more portable and more reliable.
  • etho201
    etho201 almost 10 years
    I'm really new to Python so I'm having trouble understanding what you're trying to say, but it seems good because it is capable of accurately recognizing file names with spaces. If you had the folders located here:

    $: cd $ORACLE_BASE/admin
    $: du -sk [a-z]* 2>/dev/null | sort -n
    14994   two words
    12194   oneword
    1692    one
    1499    two
    1432    this folder
    1300    three
    1129    four

    How would you do what you're suggesting with Python to get a sorted list like this: [two words, oneword, one, two, this folder, three, four]?
  • Bernhard
    Bernhard almost 10 years
    I would use tr instead of paste