How to use awk/cut to get column data that contains spaces


Solution 1

If the columns are separated by tabs, you can specify the tab character as the field separator. By default, awk splits fields on any run of spaces or tabs. With tab as the separator, the spaces inside a column stay part of that column.

cat <data file> | awk -F"\t" '{print $1, $2}'

root@ubuntu32:/tmp# cat testtext | awk -F"\t" '{print $1, $2}'
16 SQL*Plus
16 TOAD background query session
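Below is a self-contained sketch of Solution 1. It assumes the columns really are separated by tab characters. The sample rows are hypothetical and are generated with printf so that the tabs are real, and the function name first_two_cols is only illustrative. Because "$@" is passed through, awk can read a file argument directly, so cat is not needed.

```shell
#!/bin/sh
# Print the first two tab-separated columns of the named files (or of stdin).
# Assumption: the input uses literal tab characters between columns.
first_two_cols() {
    # -F'\t' makes the tab the only field separator, so spaces inside a
    # column stay part of that column
    awk -F'\t' '{ print $1, $2 }' "$@"
}

# Hypothetical sample rows; printf turns each \t into a real tab.
printf '16\tSQL*Plus\tvilconv1\tdox-conv2\n16\tTOAD background query session\tDisha\tWORKGROUP\\AD\n' |
    first_two_cols
```

To process a file, call it as first_two_cols testtext, and awk opens the file itself.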

Solution 2

I liked @Costas's suggestion. Another option is:

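gawk '
  {
    f1 = substr($0, 2, 2)     # characters 2-3: the leading number ("16")
    f2 = substr($0, 5, 36)    # characters 5-40: the second column, space-padded
    gsub(/ *$/, "", f2)       # strip the trailing padding
    print f1 " " f2
  }
' <data file>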
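The @Costas suggestion mentioned above comes from the comments under the question. It keeps the first 40 characters with cut -c -40, then removes trailing spaces with sed. Here is a minimal sketch that assumes the first two columns always fit within character positions 1-40 and that there are no tabs. The function name first_40_cols and the printf-generated sample rows are hypothetical.

```shell
#!/bin/sh
# Fixed-width extraction: keep characters 1-40, then drop the padding.
# Assumption: columns 1 and 2 always end by character position 40.
first_40_cols() {
    # cut -c -40 is the range 1-40; sed removes the spaces left at the end
    cut -c -40 "$@" | sed 's/ *$//'
}

# Hypothetical sample: %-40s pads the first part to exactly 40 characters,
# and printf reuses the format for each pair of arguments.
printf '%-40s%s\n' ' 16 SQL*Plus' 'vilconv1' \
    ' 16 TOAD background query session' 'Disha' | first_40_cols
```

This approach depends only on character positions, so it works whatever characters the columns contain. However, it breaks if a value ever spills past column 40.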

Solution 3

One way to do this involves unexpand, the counterpart of expand. Its POSIX description reads:

  • The unexpand utility shall copy files or standard input to standard output, converting <blank> characters at the beginning of each line into the maximum number of <tab> characters followed by the minimum number of <space> characters needed to fill the same column positions originally filled by the translated <blank> characters. By default, tabstops shall be set at every eighth column position. Each <backspace> shall be copied to the output, and shall cause the column position count for tab calculations to be decremented; the count shall never be decremented to a value less than one.

You'd probably want the -a switch though.

  • -a - In addition to translating <blank> characters at the beginning of each line, translate all sequences of two or more <blank> characters immediately preceding a tab stop to the maximum number of <tab> characters followed by the minimum number of <space> characters needed to fill the same column positions originally filled by the translated <blank> characters.

It's a simple utility that converts runs of spaces into tabs. That way you could...

unexpand -a <<\IN | cut -f1
 16 SQL*Plus                            vilconv1                  dox-conv2
 16 TOAD background query session       Disha                     WORKGROUP\AD
IN

...which prints...

 16 SQL*Plus
 16 TOAD background query session
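You can see why cut -f1 stops at the right place by making the tabs that unexpand -a inserts visible. This sketch translates each tab to a | character. It assumes the default tab stops every 8 columns and uses a hypothetical sample line whose next column starts at character 41. The name show_tabs is only illustrative.

```shell
#!/bin/sh
# Make the tabs produced by unexpand -a visible by turning each into '|'.
# Assumption: default tab stops (every 8 columns).
show_tabs() {
    unexpand -a "$@" | tr '\t' '|'
}

# Hypothetical line: 12 characters of data padded with 28 spaces to 40.
printf '%-40s%s\n' ' 16 SQL*Plus' 'vilconv1' | show_tabs
```

With GNU coreutils, the 28 spaces after SQL*Plus come out as four tabs, one for each tab stop they reach. The single spaces inside the column are left alone, which is why cut -f1 returns the whole second column.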

I use cut there, but you could just as well use awk or any other tool. I suggest unexpand only because you almost certainly already have it installed (it is a standard POSIX utility), and it is simple to use and fast. It solves the space problem by swapping delimiters: wide runs of spaces become tabs, which cut and awk can split on directly.

I use a here-document only to demonstrate how it works. In practice you would probably do this instead:

unexpand -a <infile | filter program
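Any tab-aware tool can follow unexpand, so here is a sketch of the awk variant. One detail matters: a wide gap can become several tabs, and awk -F'\t' would then see empty fields between them. Using the regular expression \t+ as the separator treats each run of tabs as a single separator. This sketch assumes that each gap between columns is wide enough to reach a tab stop, as in the question's data, and that no column contains two consecutive spaces. The sample rows and the split_columns name are hypothetical.

```shell
#!/bin/sh
# unexpand -a turns each wide gap into one or more tabs; awk then splits on
# runs of tabs, so single spaces inside a column are kept.
split_columns() {
    unexpand -a "$@" | awk -F'\t+' '{ print $1 " | " $2 }'
}

# Hypothetical fixed-width rows built with printf (%s prints the backslash as-is).
printf '%-40s%-26s%s\n' \
    ' 16 SQL*Plus' 'vilconv1' 'dox-conv2' \
    ' 16 TOAD background query session' 'Disha' 'WORKGROUP\AD' | split_columns
```

Printing $1 alone reproduces the cut -f1 output, and $2 is the next column.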

Author: stackoverflow_unicorn

Updated on September 18, 2022

Comments

  • stackoverflow_unicorn
    stackoverflow_unicorn over 1 year

    I have data in the following format:

     16 SQL*Plus                            vilconv1                  dox-conv2
     16 TOAD background query session       Disha                     WORKGROUP\AD
    

    Now I want to extract the data by column. I am using the following command:

    awk '{print $1,$2}' 
    

    but since column 2 contains spaces, it gives me the following output:

    16 SQL*Plus
    16 TOAD
    

    whereas what I want is:

    16 SQL*Plus
    16 TOAD background query session
    
    • Admin
      Admin about 9 years
      Does your data fit a fixed-width format?
    • Admin
      Admin about 9 years
      Are the columns delimited by tabs, or are those literal spaces? Also, is it only the second column that has spaces? Are there spaces in any other column?
    • Admin
      Admin about 9 years
      Use cut -c -40
    • Admin
      Admin about 9 years
      And maybe sed 's/ *$//' to remove trailing spaces, if that matters
    • Admin
      Admin about 9 years
      @Costas that's by far the best method if there are no tabs. Why not make it an answer?
  • jasonwryan
    jasonwryan about 9 years
    No need to flog the feline; Awk can be passed a filename for input...
  • Arunas Bartisius
    Arunas Bartisius over 4 years
    Thanks, this helped me a lot in finding a way to extract part of a string from a specific column.