How to use regex with cut at the command line?

Solution 1

This answer tackles the question as asked, but consider George Vasiliou's helpful find solution as a potentially superior alternative.

  • cut only supports a single, literal character as the delimiter (-d), so it isn't the right tool to use.

  • For extracting tokens (fields) that are separated with a variable amount of whitespace per line, awk is the best tool, so the solution proposed by George Vasiliou is the simplest one:
    ls -alth | awk '{print $5}'
    extracts the 5th whitespace-separated field ($5), which is the size.

  • Rather than using -h and then converting the human-readable suffixes (such as B, M, and G) back to byte counts (incidentally, the multipliers are powers of 1024, not 1000), simply omit -h from the ls command, which then prints the raw byte counts:
    ls -alt | awk '{print $5}'
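
The points above can be sketched in a few lines of shell. This is only an illustration: the sample line is copied from the question, and the function names size_with_cut, size_with_awk and to_bytes are hypothetical helpers, not part of any tool. The first function shows how cut can be made to work by squeezing repeated spaces with tr first; the last one converts an ls -h style size back to bytes using powers of 1024.

```shell
#!/bin/sh
# Sample line in the shape of `ls -alth` output (taken from the question).
sample='drwxr-xr-x    5 root    admin   170B Aug  3  2016 ..'

# cut workaround: squeeze runs of spaces into one, then split on a single space.
size_with_cut() {
    tr -s ' ' | cut -d ' ' -f 5
}

# awk splits on runs of whitespace by default, so no squeezing is needed.
size_with_awk() {
    awk '{ print $5 }'
}

# Hypothetical helper: convert a size such as 170B, 4.0K or 9M to bytes.
# Assumes a single-letter suffix (B, K, M, G or T) or no suffix at all.
to_bytes() {
    awk '{
        n = $1 + 0                            # numeric prefix, e.g. 9 from "9M"
        s = toupper(substr($1, length($1)))   # last character, i.e. the suffix
        p = index("KMGT", s)                  # 0 for B or a bare number
        printf "%.0f\n", n * (1024 ^ p)
    }'
}

printf '%s\n' "$sample" | size_with_cut              # prints 170B
printf '%s\n' "$sample" | size_with_awk              # prints 170B
printf '%s\n' "$sample" | size_with_awk | to_bytes   # prints 170
printf '9M\n' | to_bytes                             # prints 9437184
```

Because ls -h rounds its output, a converted value such as 1.9M is only an approximation of the real byte count, which is one more reason to drop -h instead.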

Solution 2

As an alternative to the awk solution, which handles the whitespace correctly, you can also use the find utility, which can produce results similar to ls.

In fact, find can print the size of each result directly, without any other tool or pipe such as cut or awk.

So, to list mere bytes you can use:

$ find . -maxdepth 1 -printf %s\\n
173
3
684

You can combine filename + bytes in find with

$ find . -maxdepth 1 -printf %f-%s\\n
bsd.txt-173
file4-3
shellcolors.sh-684

You can consult man find to see a lot of available options under -printf.

Moreover, if you remove the -maxdepth option, find also lists all the files in the subdirectories.
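
As a small sketch of how the -printf format can be combined with other tools, the snippet below prints "size, tab, name" for regular files only and sorts the result numerically. It assumes GNU find (-printf is a GNU extension); the function name list_sizes and the temporary directory are made up for the demonstration.

```shell
#!/bin/sh
# Build a throwaway directory with files of known sizes.
dir=$(mktemp -d)
printf 'abc' > "$dir/file4"           # 3 bytes
printf '%0173d' 0 > "$dir/bsd.txt"    # 173 bytes (173 zero characters)

# Hypothetical helper: list regular files in a directory as "size<TAB>name",
# smallest first. Requires GNU find for -printf.
list_sizes() {
    find "$1" -maxdepth 1 -type f -printf '%s\t%f\n' | sort -n
}

list_sizes "$dir"
```

Using a tab as the separator keeps the output easy to split afterwards, even when file names contain spaces.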

One more alternative is the du utility, which can report sizes in human-readable format:

$ du -a -b -h -d1
1.9M    ./appsfiles
173     ./bsd.txt
3       ./file4
684     ./shellcolors.sh

-a : list all files, not just directories. Remove this option to get directory sizes only.
-b : report the apparent size of each file in bytes. Without this option, du reports the disk space the file occupies (e.g. a 3 kB file typically occupies 4K on disk).
-h : human-readable sizes
-d1 : limit the depth to 1 level (same as --max-depth=1)

You can further parse the results of du with |cut -f1 (du separates its two columns with a tab, which is cut's default delimiter) or with |awk '{print $1}'
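
A minimal sketch of that parsing step, assuming GNU du (the -b option is a GNU extension). The function names size_of and size_of_awk are hypothetical, and the temporary file exists only for the demonstration.

```shell
#!/bin/sh
# Create a 3-byte file to measure.
f=$(mktemp)
printf 'abc' > "$f"

# du prints "size<TAB>path"; cut splits on tabs by default, so -f1 is the size.
size_of() {
    du -b "$1" | cut -f1
}

# Same result with awk, which splits on any run of whitespace.
size_of_awk() {
    du -b "$1" | awk '{ print $1 }'
}

size_of "$f"        # prints 3
size_of_awk "$f"    # prints 3
rm -f "$f"
```

The cut variant stays correct for paths that contain spaces, because only the tab counts as a separator.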

Solution 3

I was getting annoyed with having to look up awk(ward) syntax and wrote my own:

https://www.npmjs.com/package/cutr

Install with

npm i -g cutr
ls --full-time | cutr -d ' +' -f 6-

or run with something like

ls --full-time | npx cutr -d ' +' -f 6-

Your command could be

ls -alth | cutr -f 5 -d '\s+'
Author: makansij

I'm a PhD Student at University of Southern California.

Updated on June 05, 2022

Comments

  • makansij
    makansij almost 2 years

    I have some output like this from ls -alth:

    drwxr-xr-x    5 root    admin   170B Aug  3  2016 ..
    drwxr-xr-x    5 root    admin    70B Aug  3  2016 ..
    drwxr-xr-x    5 root    admin     3B Aug  3  2016 ..
    drwxr-xr-x    5 root    admin     9M Aug  3  2016 ..
    

    Now, I want to parse out the 170B part, which is obviously the size in human readable format. I wanted to do this using cut or sed, because I don't want to use tools that are any more complicated/difficult to use than necessary.

    Ideally I want it to be robust enough to handle the B, M or K suffix that comes with the size, and multiply by 1, 1000000 and 1000 respectively. I haven't found a good way to do that, though.

    I've tried a few things without really knowing the best approach:

    ls -alth | cut -f 5 -d \s+
    

    I was hoping that would work because I'd be able to just delimit it on one or more spaces.

    But that doesn't work. How do I supply cut with a regex delimiter? Or is there an easier way to extract only the size of the file from ls -alth?

    I'm using CentOS 6.4

  • makansij
    makansij about 7 years
    That's actually a great idea to omit the -h flag from ls -alth. I hadn't thought of that.