Select a particular column using awk or cut or perl
Solution 1
If the data is unambiguously tab-separated, then cut will split on tabs, not spaces:
cut -f7 filename
You can certainly do that with awk, too:
awk -F'\t' '{ print $7 }' filename
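As a quick check, here is a minimal sketch that builds a small tab-separated sample and extracts the 7th field both ways. The temporary file, the variable names, and the placement of the tabs are assumptions made for illustration (the rows are loosely modeled on the question's data); the 4th field of the first row deliberately contains spaces.

```shell
# Hypothetical sample: tab-separated fields, with spaces inside field 4.
tmp=$(mktemp)
printf 'tree\t826\tFL_vol\tOut Global Doc Mark\t16875162\t-\t9618\t-\t/vol/FL_vol/Out Global Doc Mark\n' > "$tmp"
printf 'user\t\\MAK\tFL_vol\tDesign\t16875161\t-\t2618\t-\t\n' >> "$tmp"

# cut splits on tabs by default, so the spaces in field 4 are harmless.
cut_result=$(cut -f7 "$tmp")

# awk needs the separator set explicitly; its default splits on any whitespace.
awk_result=$(awk -F'\t' '{ print $7 }' "$tmp")

printf '%s\n' "$cut_result"
rm -f "$tmp"
```

Both commands should report the same values, because neither treats the spaces inside "Out Global Doc Mark" as separators.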
Solution 2
If fields are separated by tabs and your concern is that some fields contain spaces, there is no problem; just use:
cut -f 7 filename
(cut uses tab as its default field delimiter.)
Solution 3
Judging by the format of your input file, you can get away with delimiting on - instead of spaces:
awk 'BEGIN{FS="-"} {print $2}' filename
- FS stands for Field Separator; just think of it as the delimiter for input.
- Given that we are now delimiting on -, what was your 7th field now becomes the 2nd field.
- Save a cat! Specify the input file filename as an argument to awk instead.
Alternatively, if your data fields are separated by tabs, you can do it more explicitly as follows:
awk 'BEGIN{FS="\t"} {print $7}' filename
This resolves the issue, since the words in Out Global Doc Mark appear to be separated by spaces rather than tabs.
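To see what delimiting on - actually yields, here is a small sketch run against the last row of the question's sample, treated here as space-separated (an assumption, since the original whitespace is not recoverable). It also tries the regex separator suggested in the comments, which absorbs the blanks around each dash. The brackets in the output are only there to make leading and trailing spaces visible.

```shell
# Last row from the question's sample, assumed space-separated here.
row='tree 826 FL_vol Out Global Doc Mark 16875162 - 9618 - /vol/FL_vol/Out Global Doc Mark'

# Splitting on "-" alone keeps the spaces around the value.
raw=$(printf '%s\n' "$row" | awk 'BEGIN{FS="-"} {print "[" $2 "]"}')

# A regex separator that also swallows blanks and tabs around each "-".
trimmed=$(printf '%s\n' "$row" | awk 'BEGIN{FS="[ \t]*-[ \t]*"} {print "[" $2 "]"}')

printf '%s\n%s\n' "$raw" "$trimmed"
```

Note that this approach relies on the - placeholders being present in the 6th and 8th columns and absent elsewhere; as the asker points out in the comments, that is not guaranteed for every row.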
Solution 4
This might work for you (GNU sed):
sed -r 's/(([^\t]*)\t?){7}.*/\2/' file
This substitute command matches the whole line and replaces it with the 7th run of non-tab characters, i.e. the 7th field. In sed, when a group (...) is repeated, a back-reference to it returns the last thing the group matched, and back-references are used on the right-hand side of the substitution. In this case the first back-reference (\1) would return both the non-tab characters and the tab character (if present; N.B. the ? metacharacter matches either one or none of the preceding pattern), which is why \2 is used to return only the non-tab characters. The .* just swallows up whatever is left on the line, if anything.
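The difference between the two back-references can be checked with a short sketch like the one below. It assumes GNU sed (for `-r` and for `\t` being understood inside the bracket expression); the sample line and variable names are hypothetical.

```shell
# Hypothetical tab-separated line; field 4 contains spaces.
line=$(printf 'a\tb\tc\td e f\t5\t-\t9618\t-\tpath')

# \2 is the inner group: only the non-tab characters of the 7th repetition.
field=$(printf '%s\n' "$line" | sed -r 's/(([^\t]*)\t?){7}.*/\2/')

# \1 is the outer group: the same characters plus the trailing tab.
with_tab=$(printf '%s\n' "$line" | sed -r 's/(([^\t]*)\t?){7}.*/[\1]/')

printf '%s\n%s\n' "$field" "$with_tab"
```

The second result should show a tab before the closing bracket, which is the extra character that using \2 avoids.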
javed
Updated on October 20, 2020
Comments
-
javed over 3 years
I have a requirement to select the 7th column from a tab-delimited file, e.g.:
cat filename | awk '{print $7}'
The issue is that the data in the 4th column can contain multiple values with blanks in between. For example, see the last line of the output below:
user \Adminis FL_vol Design 0 - 1 -
group 0 FL_vol Design 19324481 - 3014 -
user \MAK FL_vol Design 16875161 - 2618 -
tree 826 FL_vol Out Global Doc Mark 16875162 - 9618 - /vol/FL_vol/Out Global Doc Mark
-
F. Hauri - Give Up GitHub over 11 years
... And wipe the whitespace:
awk 'BEGIN{FS="[ \t]*-[ \t]*"} {print $2}'
-
javed over 11 years
Sometimes the 5th column has numbers in it. It need not be a "-" all the time. Also it could be a "-" in the 4th column instead.
-
potong over 9 years
@shgnInc The substitute command matches the whole line and returns the 7th run of non-tab characters. In sed, a back-reference to a repeated group (...) returns the last thing that group matched, and it is used on the right-hand side of the substitution. In this case the first back-reference would return both the non-tab characters and the tab character (if present; N.B. the ? metacharacter matches either one or none of the preceding pattern). The .* just swallows up what was left on the line, if any.