Select a particular column using awk or cut or perl


Solution 1

If the data is unambiguously tab-separated, then cut will cut on tabs, not spaces:

cut -f7 filename

You can certainly do that with awk, too:

awk -F'\t' '{ print $7 }' filename
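As a quick check, the two commands can be compared on a single line. This is only a sketch: the sample line is a hypothetical tab-separated version of the last row in the question, and the temporary file stands in for the real input file.

```shell
# Hypothetical sample: eight tab-separated fields; the fourth contains spaces.
sample=$(mktemp)
printf 'tree\t826\tFL_vol\tOut Global Doc Mark\t16875162\t-\t9618\t-\n' > "$sample"

# cut splits on tabs by default; awk needs the separator set explicitly.
col_cut=$(cut -f7 "$sample")
col_awk=$(awk -F'\t' '{ print $7 }' "$sample")

echo "cut: $col_cut"
echo "awk: $col_awk"

rm -f "$sample"
```

Both commands should report the same value, because the spaces inside the fourth field are never treated as separators.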

Solution 2

If the fields are separated by tabs and your concern is that some fields contain spaces, there is no problem. Just use:

cut -f 7

(cut defaults to tab delimited fields.)
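To see why the original awk command picks the wrong value while cut does not, here is a small sketch. It assumes the real file is tab-separated; the line below is a hypothetical reconstruction of the last row in the question.

```shell
# Hypothetical tab-separated line; field 4 is "Out Global Doc Mark".
line=$(printf 'tree\t826\tFL_vol\tOut Global Doc Mark\t16875162\t-\t9618\t-')

# Default awk splitting treats every run of blanks or tabs as a separator,
# so the words inside field 4 are counted as separate fields.
by_whitespace=$(printf '%s\n' "$line" | awk '{ print $7 }')

# cut only splits on tabs, so field 4 stays in one piece.
by_tab=$(printf '%s\n' "$line" | cut -f 7)

echo "awk, default splitting: $by_whitespace"
echo "cut, tab splitting:     $by_tab"
```

With default splitting awk returns the word Mark, while cut returns 9618.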

Solution 3

Judging by the format of your input file, you can get away with delimiting on - instead of spaces:

awk 'BEGIN{FS="-"} {print $2}' filename
  • FS stands for Field Separator, just think of it as the delimiter for input.
  • Because we now delimit on -, what used to be the 7th field becomes the 2nd field (with its surrounding spaces still attached).
  • Save a cat! Specify input file filename as an argument to awk instead.
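A sketch of this approach on two rows copied from the question, assuming they are space-padded exactly as shown. The second command uses the separator suggested in the comments, which also strips the blanks around each -. Note that this only works while the column just before the wanted one is always a lone -, which the asker says is not guaranteed.

```shell
# Two rows from the question, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
user  \Adminis FL_vol Design         0         -       1       -
tree       826 FL_vol Out Global Doc Mark     16875162         -    9618       - /vol/FL_vol/Out Global Doc Mark
EOF

# Plain "-" as separator: the value keeps its surrounding padding.
padded=$(awk 'BEGIN{FS="-"} {print $2}' "$sample")

# Separator that also swallows blanks and tabs around the "-".
trimmed=$(awk 'BEGIN{FS="[ \t]*-[ \t]*"} {print $2}' "$sample")

printf 'padded:  [%s]\n' "$padded"
printf 'trimmed: [%s]\n' "$trimmed"

rm -f "$sample"
```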

Alternatively, if your data fields are separated by tabs, you can do it more explicitly as follows:

awk 'BEGIN{FS="\t"} {print $7}' filename

This resolves the issue because the words in a value such as Out Global Doc Mark are separated by spaces, which are no longer treated as field separators.

Solution 4

This might work for you (GNU sed):

sed -r 's/(([^\t]*)\t?){7}.*/\2/' file

This substitute command matches the whole line and replaces it with the 7th tab-separated field. In sed, a back-reference on the right-hand side of the substitution returns whatever a group (...) matched; when the group is repeated, it holds the text from the last repetition. Here the first back-reference (\1) would return both the non-tab characters and the trailing tab, if present (N.B. the ? metacharacter matches one or none of the preceding pattern), so the nested second group (\2) is used to return only the non-tab characters. The .* swallows whatever is left on the line, if anything.
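A minimal sketch of the command on a hypothetical line of nine tab-separated fields. It assumes GNU sed, which accepts -r and interprets \t inside a bracket expression as a tab; other sed implementations may behave differently.

```shell
# Hypothetical input: fields a, b, c, "d e", f, g, h, i separated by tabs.
line=$(printf 'a\tb\tc\td e\tf\tg\th\ti')

# Seven repetitions of "field plus optional tab"; \2 keeps the last field matched.
seventh=$(printf '%s\n' "$line" | sed -r 's/(([^\t]*)\t?){7}.*/\2/')

echo "7th field: $seventh"
```

The space inside the fourth field does not matter here, since only tabs end a field.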


Author: javed

Updated on October 20, 2020

Comments

  • javed
    javed over 3 years

    I have a requirement to select the 7th column from a tab delimited file. eg:

    cat filename | awk '{print $7}'
    

    The issue is that the data in the 4th column can contain multiple words separated by blanks, for example the last line in the output below:

    user  \Adminis FL_vol Design         0         -       1       -
    group        0 FL_vol Design   19324481         -    3014       -
    user      \MAK FL_vol Design   16875161         -    2618       -
    tree       826 FL_vol Out Global Doc Mark     16875162         -    9618       - /vol/FL_vol/Out Global Doc Mark
    
  • F. Hauri  - Give Up GitHub
    F. Hauri - Give Up GitHub over 11 years
    ... And to strip the white space as well: awk 'BEGIN{FS="[ \t]*-[ \t]*"} {print $2}'
  • javed
    javed over 11 years
    Sometimes the 5th column has numbers in it. It need not be a "-" all the time. Also it could be a "-" in the 4th column instead.
  • potong
    potong over 9 years
    @shgnInc the substitute command matches the whole line and returns the 7th tab-separated field. In sed, a back-reference on the right-hand side of the substitution returns what a group (...) matched; for a repeated group, that is the text from the last repetition. In this case the first back-reference would return both the non-tab characters and the tab character, if present (N.B. the ? metacharacter matches one or none of the preceding pattern). The .* just swallows up what was left on the line, if any.