Easily get a particular column from output without sed or awk

Solution 1

I'm not sure why

ls -hal / | awk '{print $5, $9}'

seems to you to be much more disruptive to your thought processes than

ls -hal / | cut -d'\s' -f5,9

would have been, had it worked. Would you really have to write that down? It only takes a few awk lines before adding the {} becomes automatic. (For me the hardest issue is remembering which field number corresponds to which piece of data, but perhaps you don't have that problem.)

You don't have to use all of awk's features; for simply outputting specific columns, you need to know very little awk.

The irritating issue would have been if you'd wanted to output the symlink as well as the filename, or if your filenames might have spaces in them. (Or, worse, newlines). With the hypothetical regex-aware cut, this is not a problem (except for the newlines); you would just replace -f5,9 with -f5,9-. However, there is no awk syntax for "fields 9 through to the end", and you're left with having to remember how to write a for loop.
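
For reference, here is roughly what that hand-written loop looks like for the example above (just a sketch, assuming the usual ls -l layout where the size is field 5 and the name starts at field 9):

ls -hal / | awk '{printf "%s ", $5; for (i=9; i<=NF; ++i) printf "%s ", $i; printf "\n"}'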

Here's a little shell script which turns cut-style -f options into an awk program and then runs it. It needs much better error-checking, but it seems to work. (Added bonus: it handles a -d option by passing the delimiter on to awk as a field separator.)

#!/bin/bash
# Translate cut-style -f (field list) and -d (delimiter) options into an awk
# program, then run it on standard input.
prog=\{
while getopts f:d: opt; do
  case $opt in
    f) IFS=, read -ra fields <<<"$OPTARG"       # split the comma-separated field list
       for field in "${fields[@]}"; do
         case $field in
           *-*) low=${field%-*}; high=${field#*-}     # a range like 5-9, 5- or -9
                if [[ -z $low  ]]; then low=1; fi
                if [[ -z $high ]]; then high=NF; fi
                ;;
            "") ;;
             *) low=$field; high=$field ;;            # a single field number
         esac
         if [[ $low == $high ]]; then
           prog+='printf "%s ", $'$low';'
         else
           prog+='for (i='$low';i<='$high';++i) printf "%s ", $i;'
         fi
       done
       prog+='printf "\n"}'
       ;;
    d) sep="-F$OPTARG";;                        # pass the delimiter through to awk
    *) exit 1;;
  esac
done
if [[ -n $sep ]]; then
  awk "$sep" "$prog"
else
  awk "$prog"
fi

Quick test:

$ ls -hal / | ./cut.sh -f5,9-
7.0K bin 
5.0K boot 
4.2K dev 
9.0K etc 
1.0K home 
8.0K host 
33 initrd.img -> /boot/initrd.img-3.2.0-51-generic 
33 initrd.img.old -> /boot/initrd.img-3.2.0-49-generic 
...
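
And a quick check of the -d pass-through (assuming the script is saved as cut.sh, as above), which should print each user name and login shell from the colon-delimited /etc/passwd:

$ ./cut.sh -d: -f1,7 </etc/passwd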

Solution 2

I believe there is no simpler solution than sed or awk, but you can write your own function.

Here is a list function (copy and paste it into your terminal):

function list() { ls -hal "$@" | awk '{printf "%-10s%-30s\n", $5, $9}'; }

Then use the list function:

list /

list /etc
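
If you'd rather pass the column numbers as arguments (and feed the function from a pipe instead of having it run ls itself), a small wrapper along these lines should do for whitespace-separated input; the name cols is just made up:

cols() {
  local fields
  fields=$(printf '$%s, ' "$@")   # e.g. "5 9" becomes '$5, $9, '
  awk "{print ${fields%, }}"      # runs awk '{print $5, $9}'
}

ls -hal / | cols 5 9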

Solution 3

You can't just talk about "columns" without also explaining what a column is!

In Unix text processing it is very common to have whitespace as the column (field) separator and (naturally) newline as the row (record) separator. In that case awk is an excellent tool, and very readable as well:

# for words (columns) 5 and 9:
ls -lah | awk '{print $5 " " $9}'
# or this, for the fifth and the last word:
ls -lah | awk '{print $5 " " $NF}'

If the columns are instead defined by character position, then cut -c may be better.

ls -lah | cut -c 31-33,46-

You can tell awk to use another field separator with the -F option. With cut, if you are not selecting by character (-c) or byte (-b) position, use -d to name the delimiter and -f to specify which fields to output.
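
For example, with a colon-delimited file such as /etc/passwd, both of the following print the user name (field 1) and the login shell (field 7); cut keeps the colon as its output delimiter, awk prints a space:

awk -F: '{print $1, $7}' /etc/passwd
cut -d: -f1,7 /etc/passwd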

The trick is knowledge about the input

Generally speaking, it's not always a good idea to parse the output of ls, df, ps and similar tools with text-processing tools, at least not if you wish to stay portable/compatible. In those cases, try to force the output into a POSIX-defined format. Sometimes this can be achieved by passing a certain option (often -P) to the command generating the output, sometimes by setting an environment variable such as POSIXLY_CORRECT, or by calling a specific binary such as /usr/xpg4/bin/ls.
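
For example, df is one of the tools where -P requests the POSIX output format (one line per filesystem with fixed columns), which makes the field positions dependable:

df -P | awk 'NR > 1 {print $5, $6}'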

Solution 4

This is an old question, but at the risk of rocking the boat, I have to say I agree with @iconoclast: there really ought to be a good, simple way of extracting selected columns in Unix.

Now, yes, awk can easily do this, it's true. But I think it's also true that it's "overkill" for a simple, common task. And even if the overkill factor isn't a concern, the extra typing certainly is: given how often I have columns to extract, I'd really rather not have to type print, and those braces, and those dollar signs, and those quotes, every time. And if the existence of awk and sed really implies that we don't need a simple column extractor, then by the same token I guess we don't need grep, either!

The cut utility ought to be the answer, but it's sadly broken. Its default is not "whitespace separated columns", despite the (IMO) overwhelming predominance of that need. And, in fact, it can't do arbitrary whitespace separation at all! (Thus iconoclast's question.)
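
(The usual workaround, for what it's worth, is to squeeze the runs of blanks first so that cut's single-character delimiter applies:)

ls -hal / | tr -s ' ' | cut -d' ' -f5,9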

Something like 35 years ago, before I'd even heard of cut, I wrote my own version. It works well; I'm still using it every day; I commend it to anyone who would like a better cut and who isn't hung up on using only "standard" tools. It's got one significant drawback in that its name, "column", has since been taken by a BSD utility.

Anyway, with this version of column in hand, my answer to iconoclast's question is

ls -hal / | column 5 9

or if you wish

ls -hal / | column 5,9

Man page and source tarball at http://www.eskimo.com/~scs/src/#column . Use it if you're interested; ignore it as the off-topic answer I suppose this is if you're not.

Solution 5

If you just want to display these two attributes (size and name), you can also use the stat tool (which is designed for just that - querying file attributes):

stat -c "%s  %n" .* *

will display the size and name of all files (including "hidden" files) in the current directory.
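
(The -c format string above is GNU stat syntax; on BSD/macOS, where the question's listing appears to come from, the rough equivalent uses -f, with %z for the size in bytes and %N for the name:)

stat -f "%z  %N" .* *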

Notice: You chose ls as one application example for the use cases where you want to extract specific columns from the output of a program. Unfortunately, ls is one of the examples where you should really avoid using text-processing tools to parse the output.


Author: iconoclast

Contractor at Infinite Red, a mobile app & web site design & development company with employees worldwide, experts in React Native, Rails, Phoenix, and all things JavaScript!

Updated on September 18, 2022

Comments

  • iconoclast
    iconoclast over 1 year

    Is there a quicker way of getting a couple of columns of values than futzing with sed and awk?

    For instance, if I have the output of ls -hal / and I want to just get the file and directory names and sizes, how can I easily and quickly do that, without having to spend several minutes tweaking my command?

    total 16078
    drwxr-xr-x    33 root  wheel   1.2K Aug 13 16:57 .
    drwxr-xr-x    33 root  wheel   1.2K Aug 13 16:57 ..
    -rw-rw-r--     1 root  admin    15K Aug 14 00:41 .DS_Store
    d--x--x--x     8 root  wheel   272B Jun 20 16:40 .DocumentRevisions-V100
    drwxr-xr-x+    3 root  wheel   102B Mar 27 12:26 .MobileBackups
    drwx------     5 root  wheel   170B Jun 20 15:56 .Spotlight-V100
    d-wx-wx-wt     2 root  wheel    68B Mar 27 12:26 .Trashes
    drwxrwxrwx     4 root  wheel   136B Mar 30 20:00 .bzvol
    srwxrwxrwx     1 root  wheel     0B Aug 13 16:57 .dbfseventsd
    ----------     1 root  admin     0B Aug 16  2012 .file
    drwx------  1275 root  wheel    42K Aug 14 00:05 .fseventsd
    drwxr-xr-x@    2 root  wheel    68B Jun 20  2012 .vol
    drwxrwxr-x+  289 root  admin   9.6K Aug 13 10:29 Applications
    drwxrwxr-x     7 root  admin   238B Mar  5 20:47 Developer
    drwxr-xr-x+   69 root  wheel   2.3K Aug 12 21:36 Library
    drwxr-xr-x@    2 root  wheel    68B Aug 16  2012 Network
    drwxr-xr-x+    4 root  wheel   136B Mar 27 12:17 System
    drwxr-xr-x     6 root  admin   204B Mar 27 12:22 Users
    drwxrwxrwt@    6 root  admin   204B Aug 13 23:57 Volumes
    drwxr-xr-x@   39 root  wheel   1.3K Jun 20 15:54 bin
    drwxrwxr-t@    2 root  admin    68B Aug 16  2012 cores
    dr-xr-xr-x     3 root  wheel   4.8K Jul  6 13:08 dev
    lrwxr-xr-x@    1 root  wheel    11B Mar 27 12:09 etc -> private/etc
    dr-xr-xr-x     2 root  wheel     1B Aug 12 21:41 home
    -rw-r--r--@    1 root  wheel   7.8M May  1 20:57 mach_kernel
    dr-xr-xr-x     2 root  wheel     1B Aug 12 21:41 net
    drwxr-xr-x@    6 root  wheel   204B Mar 27 12:22 private
    drwxr-xr-x@   68 root  wheel   2.3K Jun 20 15:54 sbin
    lrwxr-xr-x@    1 root  wheel    11B Mar 27 12:09 tmp -> private/tmp
    drwxr-xr-x@   13 root  wheel   442B Mar 29 23:32 usr
    lrwxr-xr-x@    1 root  wheel    11B Mar 27 12:09 var -> private/var
    

    I realize there are a bazillion options for ls and I could probably do it for this particular example that way, but this is a general problem and I'd like a general solution to getting specific columns easily and quickly.

    cut doesn't cut it because it doesn't take a regular expression, and I virtually never have the situation where there's a single space delimiting columns. This would be perfect if it would work:

    ls -hal / | cut -d'\s' -f5,9
    

    awk and sed are more general than I want, basically entire languages unto themselves. I have nothing against them, it's just that unless I've recently been doing a lot with them, it requires a pretty sizable mental shift to start thinking in their terms and write something that works. I'm usually in the middle of thinking about some other problem I'm trying to solve, and suddenly having to solve a sed/awk problem throws off my focus.

    Is there a flexible shortcut to achieving what I want?

    • Admin
      Admin about 10 years
      futz |fəts| verb [ no obj. ] informal: waste time; idle or busy oneself aimlessly. Getting to know sed and awk is in no way futzing, my friend. If it is anything, it is the opposite, as it saves many, many hours.
    • Admin
      Admin about 10 years
      That's an overly-narrow definition of "futz". Would you prefer I used "fiddle"? I'm in no way disputing the value of sed or awk, just pointing out that I don't want to shift focus from one thing to another.
    • Admin
      Admin almost 4 years
      Opposed to most others in this thread, I think this is a good idea. AWK and SED are hard to get into if you are new to this type of language, especially if you don't use them much. I use both from time to time, not too often but definitely regularly, and it is definitely not easy to handle them, especially after a long break. I guess it would be helpful if there were an awk/sed wrapper with a much simpler API. That would be (1) a possible and (2) not bad solution, I think.
    • Admin
      Admin over 2 years
      See also du -ahd1 for this particular task. Disk usage, -a to include files, -h for human-readable, -d1 for max depth 1.
  • iconoclast
    iconoclast about 10 years
    Actually, the problem here is that the function doesn't take arguments, so it's very narrowly focussed on a specific case, and is not flexible. If you could work out the quoting so that you can pass arguments into it, that would be useful. And of course remove everything before the pipe, so you can use it with any arbitrary input, and specify which columns you want.
  • Zac Thompson
    Zac Thompson over 6 years
    in other words, please re-create awk but give it a different name and fewer features
  • iconoclast
    iconoclast over 4 years
    @ZacThompson: yeah, like 0.1% of the features, and a much simpler API. That would be quite useful.
  • iconoclast
    iconoclast over 4 years
    Awk is quite literally another language with different syntax. Would you understand if we were talking about AppleScript instead? Or take human languages: no matter how well you know another human language, switching back and forth requires extra effort, unless you have a lot of practice making that switch frequently.
  • iconoclast
    iconoclast over 2 years
      Thanks, this is interesting, but it is not relevant to the question of getting columns of data. I only gave that ls example for the sake of having something concrete to talk about. I'm concerned with easily getting columns from shell output, not with that particular use case.
  • AdminBee
    AdminBee over 2 years
    @iconoclast I see, but unfortunately your example (the output of ls) is one use case where you really should avoid using text-processing tools for parsing.
  • iconoclast
    iconoclast over 2 years
    Pretend this doesn't use ls. ls is 1000% irrelevant to the question.
  • Stephen Kitt
    Stephen Kitt over 2 years
    Your first command doesn’t use find. Your second option misses the point of the question, which is about generic extraction of column information, not the specific file information from ls.
  • αғsнιη
    αғsнιη over 2 years
    also your awk command says the same thing as mentioned in other existing answers