How to use awk/cut to get column data that contains spaces

sed awk cut

13,940

Solution 1

If the columns are separated by tabs, you can specify the tab character as the field separator. This overrides awk's default behavior of splitting fields on any run of whitespace, so spaces inside a column no longer start a new field.

cat <data file> | awk -F"\t" '{print $1, $2}'

root@ubuntu32:/tmp# cat testtext | awk -F"\t" '{print $1, $2}'
16 SQL*Plus
16 TOAD background query session

Solution 2

I liked @Costas's suggestion; another option is:

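gawk '
  {
    f1 = substr($0, 2, 2)      # characters 2-3: the first column
    f2 = substr($0, 4, 36)     # characters 4-39: the second column plus padding
    gsub(/^ +| +$/, "", f2)    # strip the padding on both sides
    print f1 " " f2
  }
'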
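
For comparison, the suggestion from the comments (cut -c -40 followed by sed 's/ *$//') can be sketched as below. The printf line only rebuilds the two sample rows from the question; the column widths of 36 and 26 characters are an assumption read off that sample, not something the question states.

```shell
# Rebuild the sample rows (assumed widths: 36 and 26 characters for columns 2 and 3)
data=$(printf ' %2s %-36s%-26s%s\n' \
  16 'SQL*Plus' vilconv1 dox-conv2 \
  16 'TOAD background query session' Disha 'WORKGROUP\AD')

# Keep the first 40 characters of each line, then strip the trailing padding
result=$(printf '%s\n' "$data" | cut -c -40 | sed 's/ *$//')
printf '%s\n' "$result"
```

This only works while the data really is fixed-width; if the second column ever grows past character 40, the cut position has to change with it.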

Solution 3

One way to do this could involve unexpand. The POSIX description of it reads:

The unexpand utility shall copy files or standard input to standard output, converting <blank> characters at the beginning of each line into the maximum number of <tab> characters followed by the minimum number of <space> characters needed to fill the same column positions originally filled by the translated <blank> characters. By default, tabstops shall be set at every eighth column position. Each <backspace> shall be copied to the output, and shall cause the column position count for tab calculations to be decremented; the count shall never be decremented to a value less than one.

You'd probably want the -a switch though.

-a - In addition to translating <blank> characters at the beginning of each line, translate all sequences of two or more <blank> characters immediately preceding a tab stop to the maximum number of <tab> characters followed by the minimum number of <space> characters needed to fill the same column positions originally filled by the translated <blank> characters.
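
To make the effect of -a visible, here is a small sketch. The input strings are made up for illustration, and the awk step only replaces each tab with the marker <TAB> so the result can be read on screen; default tab stops every eight columns are assumed.

```shell
# Line 1: "ab" plus six spaces ends exactly at the tab stop in column 8.
# Line 2: single spaces that never reach a tab stop.
result=$(printf 'ab      cd\na b c\n' | unexpand -a | awk '{ gsub(/\t/, "<TAB>"); print }')
printf '%s\n' "$result"
```

The run of spaces that ends at a tab stop becomes one tab, while the single spaces in the second line are left alone.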

It's a simple utility for converting runs of spaces into tabs. That way you could...

unexpand -a <<\IN | cut -f1
 16 SQL*Plus                            vilconv1                  dox-conv2
 16 TOAD background query session       Disha                     WORKGROUP\AD
IN

...which prints...

 16 SQL*Plus
 16 TOAD background query session

I just use cut there, but you could use awk or anything else, really. I only suggest unexpand because you almost certainly already have it installed, it is very simple to use, and it is very fast. It solves the space problem by swapping delimiters, and it does so very easily.

I use a here-document only to show how it works; in practice you'd probably do this instead...

unexpand -a <infile | filter program
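
As a sketch of the "use awk instead of cut" variant: the example below pipes the unexpand output into awk with a tab-based field separator. The printf line rebuilds the sample rows with assumed column widths (36 and 26 characters), and the "|" in the output is only there to show where the fields split. Because a wide gap can turn into several consecutive tabs, the separator is written as '\t+' so that a run of tabs counts as one delimiter.

```shell
# Rebuild the sample rows (assumed widths: 36 and 26 characters for columns 2 and 3)
data=$(printf ' %2s %-36s%-26s%s\n' \
  16 'SQL*Plus' vilconv1 dox-conv2 \
  16 'TOAD background query session' Disha 'WORKGROUP\AD')

# Convert padding to tabs, then split on runs of tabs and print the first two fields
result=$(printf '%s\n' "$data" | unexpand -a | awk -F'\t+' '{ print $1 "|" $2 }')
printf '%s\n' "$result"
```

Note that this depends on the padding between columns crossing a tab stop (every eighth column by default); a gap that ends short of a tab stop stays as spaces and will not split.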

stackoverflow_unicorn

Updated on September 18, 2022

Comments

stackoverflow_unicorn over 1 year
I have data in the format below:
```
 16 SQL*Plus                            vilconv1                  dox-conv2
 16 TOAD background query session       Disha                     WORKGROUP\AD
```
Now I want to get the data by column. I am using the command below:
```
awk '{print $1,$2}' 
```
But since column 2 contains spaces, it gives me the output below:
```
16 SQL*Plus      
  16 TOAD
```
whereas what I want is:
```
16 SQL*Plus  
   16 TOAD background query session   
```
- Admin about 9 years
  
  Does your data fit a fixed-width format?
- Admin about 9 years
  
  Is each row delimited by tabs, or are those literal tabs? Also, is it only the second column that has spaces? Are there spaces in any other column?
- Admin about 9 years
  
  Use cut -c -40
- Admin about 9 years
  
  And maybe sed 's/ *$//' to remove trailing spaces, if that matters
- Admin about 9 years
  
  @Costas that's by far the best method if there are no tabs, why not make it an answer?
jasonwryan about 9 years

No need to flog the feline; Awk can be passed a filename for input...
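
A sketch of what this comment suggests: the pipeline from Solution 1 can drop cat by handing the file name to awk directly. The file name testtext comes from that answer; the temporary directory and the printf line that writes tab-separated sample data are assumptions made so the example runs on its own.

```shell
# Create a small tab-separated sample file (assumed layout) in a scratch directory
tmp=$(mktemp -d)
printf '16\tSQL*Plus\tvilconv1\n16\tTOAD background query session\tDisha\n' > "$tmp/testtext"

# awk opens the file itself, so no cat is needed
result=$(awk -F'\t' '{ print $1, $2 }' "$tmp/testtext")
printf '%s\n' "$result"

rm -r "$tmp"
```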
Arunas Bartisius over 4 years

Thanks, this helped me a lot in finding a way to extract part of a string from a specific column.