How to use regex with cut at the command line?
Solution 1
This answer tackles the question as asked, but consider George Vasiliou's helpful find
solution as a potentially superior alternative.
cut only supports a single, literal character as the delimiter (-d), so it isn't the right tool to use.

For extracting tokens (fields) that are separated by a variable amount of whitespace per line, awk is the best tool, so the solution proposed by George Vasiliou is the simplest one:
ls -alth | awk '{print $5}'
extracts the 5th whitespace-separated field ($5), which is the size.

Rather than use -h first and then reconvert the human-readable suffixes (such as B, M, and G) back to raw byte counts (incidentally, the multipliers must be powers of 1024, not 1000), simply omit -h from the ls command, which outputs the raw byte counts by default:
ls -alt | awk '{print $5}'
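If only the human-readable output is available, the suffixes can still be converted back in awk. The following is a minimal sketch rather than part of the original answer: the function name to_bytes is made up for illustration, it assumes the size is the 5th whitespace-separated field, and the result is only approximate because ls -h has already rounded the value.

```shell
# Convert sizes such as 170B, 9M or 1.5K (5th field) to bytes.
# to_bytes is a hypothetical helper name; multipliers are powers of 1024.
to_bytes() {
  awk 'NF >= 5 {
    size = $5
    unit = substr(size, length(size))   # last character, e.g. "K"
    mult = 1                            # plain digits or "B" mean bytes
    if (unit == "K") mult = 1024
    else if (unit == "M") mult = 1024 ^ 2
    else if (unit == "G") mult = 1024 ^ 3
    else if (unit == "T") mult = 1024 ^ 4
    # size + 0 makes awk read the leading number and ignore the suffix
    printf "%.0f\n", (size + 0) * mult
  }'
}

# Sample line taken from the question; prints 170
printf 'drwxr-xr-x 5 root admin 170B Aug 3 2016 ..\n' | to_bytes
```

With real input this would be run as ls -alth | to_bytes; lines with fewer than five fields, such as the "total" line, are skipped.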
Solution 2
As an alternative to the awk solution, which handles whitespace correctly, one can also use the find utility, which can produce results similar to ls.

In fact, find can display the size of each result directly, without piping into another tool such as cut or awk.

So, to list the sizes in bytes, you can use:
$ find . -maxdepth 1 -printf %s\\n
173
3
684
You can combine filename + bytes in find with
$ find . -maxdepth 1 -printf %f-%s\\n
bsd.txt-173
file4-3
shellcolors.sh-684
You can consult man find to see the many options available under -printf.

Moreover, by removing the -maxdepth option you can also list all the files in the subdirectories.
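As a hedged variation on the commands above, the format string can be quoted and the listing restricted to regular files, with a tab between size and name so that the output stays easy to split even when names contain spaces. This sketch assumes GNU find, since -printf is a GNU extension; the function name list_sizes is made up for illustration.

```shell
# Print "<bytes><TAB><name>" for regular files directly inside a directory,
# smallest first. list_sizes is a hypothetical helper; requires GNU find.
list_sizes() {
  # ${1:-.} falls back to the current directory when no argument is given
  find "${1:-.}" -maxdepth 1 -type f -printf '%s\t%f\n' | sort -n
}

# Usage: list_sizes .
```

Adding -type f leaves out the directory entries (including . itself) that the plain command also reports.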
One more alternative is the du utility, which can report sizes in human-readable format:
$ du -a -b -h -d1
1.9M ./appsfiles
173 ./bsd.txt
3 ./file4
684 ./shellcolors.sh
-a: all files and directories. Remove this option to get only directory sizes.
-b: reports the real size of the file. Without this option, du reports the disk space the file occupies (e.g. a 3 kB file occupies 4K in reality).
-h: human-readable size
-d1: depth 1
You can further parse the results of du with |cut -f1 (du separates the size from the path with a tab, which is cut's default delimiter, so no -d is needed) or with |awk '{print $1}'
Solution 3
I was getting annoyed with having to look up awk(ward) syntax and wrote my own:
https://www.npmjs.com/package/cutr
Install with
npm i -g cutr
ls --full-time | cutr -d ' +' -f 6-
or run with something like
ls --full-time | npx cutr -d ' +' -f 6-
Your command could be
ls -alth | cutr -f 5 -d '\s+'
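If installing a Node package is not an option, a similar effect can be had with standard tools by squeezing runs of spaces with tr -s before handing the line to cut. This is a minimal sketch and not part of cutr: the function name field is made up for illustration, and it assumes lines do not begin with a space, since a leading space would shift every field by one.

```shell
# Squeeze each run of spaces into one, then split on that single space.
# field is a hypothetical helper name; $1 is the field number or range for cut.
field() {
  tr -s ' ' | cut -d ' ' -f "$1"
}

# Sample line with irregular spacing; prints 170B
printf 'drwxr-xr-x   5 root  admin   170B Aug  3  2016 ..\n' | field 5
```

With real input this would be run as ls -alth | field 5.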
Comments
-
makansij almost 2 years
I have some output like this from ls -alth:
drwxr-xr-x 5 root admin 170B Aug 3 2016 ..
drwxr-xr-x 5 root admin 70B Aug 3 2016 ..
drwxr-xr-x 5 root admin 3B Aug 3 2016 ..
drwxr-xr-x 5 root admin 9M Aug 3 2016 ..
Now, I want to parse out the 170B part, which is obviously the size in human-readable format. I wanted to do this using cut or sed, because I don't want to use tools that are any more complicated/difficult to use than necessary.
Ideally I want it to be robust enough to handle the B, M or K suffix that comes with the size, and multiply by 1, 1000000 and 1000 accordingly. I haven't found a good way to do that, though.
I've tried a few things without really knowing the best approach:
ls -alth | cut -f 5 -d \s+
I was hoping that would work because I'd be able to just delimit it on one or more spaces.
But that doesn't work. How do I supply cut with a regex delimiter? Or is there an easier way to extract only the size of the file from ls -alth?
I'm using CentOS 6.4
-
makansij about 7 years
That's actually a great idea to omit the -h flag from ls -alth. I hadn't thought of that.