Extracting column from comma separated text

text-processing sed awk perl csv

25,787

Solution 1

awk -F , -v OFS='\t' 'NR == 1 || $6 > 4 {print $2, $6, $7, $8}' input.txt
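
The sample input in the question wraps the id column in double quotes, so a plain comma split prints "id" and "MRTAT_3of3.RTS" with the quotes still attached. A minimal sketch of one way to drop them, assuming no field ever contains an embedded comma or double quote (the function name strip_and_filter is hypothetical and not part of the original answer):

```shell
# Hypothetical helper, not from the original answer.
# Assumes no field contains an embedded comma or double quote.
strip_and_filter() {
    awk -F , -v OFS='\t' '
        { gsub(/"/, "") }    # remove all double quotes; changing $0 re-splits the fields
        NR == 1 || $6 + 0 > 4 { print $2, $6, $7, $8 }    # + 0 forces a numeric comparison
    '
}
```

It reads standard input, so it could be called as strip_and_filter < input.txt.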

Solution 2

I agree that awk is the best solution, but you can also do this in bash with a couple of other tools:

cut -d , -f 2,6,7,8 filename | {
    read header
    tr , $'\t' <<< "$header"
    while IFS=, read -r id num4 num5 num6; do
        # bash can only do integer arithmetic, so bc does the comparison
        if [[ $(bc <<< "$num4 > 4.0") = 1 ]]; then
            printf "%s\t%s\t%s\t%s\n" "$id" "$num4" "$num5" "$num6"
        fi
    done
}
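
The loop above depends on bc for the floating-point test. Where bc is not installed, the same comparison could be delegated to awk instead. This is a sketch of that substitution rather than part of the original answer, and the name float_gt is hypothetical:

```shell
# Hypothetical helper (not from the original answer): succeeds when the
# first argument is greater than the second, compared as floating-point
# numbers. It uses awk instead of bc; adding 0 forces a numeric comparison.
float_gt() {
    awk -v a="$1" -v b="$2" 'BEGIN { exit !(a + 0 > b + 0) }'
}

if float_gt 5.31274131274131 4.0; then
    echo "greater"
fi
```

Inside the loop, the test would then read: if float_gt "$num4" 4.0; then ... fi.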

Solution 3

You really can't beat the awk script above, but here's a Ruby solution:

#!/usr/bin/ruby1.9.1

puts "id\tnumber4\tnumber5\tnumber6"

ARGF.each_line do |line|
  arr = line.split(',')
  puts "#{arr[1]}\t#{arr[5]}\t#{arr[6]}\t#{arr[7]}" if arr[5].to_f > 4.0
end

To use the script, call it with the filename or pipe the file into it.

Solution 4

Perl solution:

perl -F, -le '$, = "\t"; print @F[1,5,6,7] if $F[5] > 4 || $. == 1' file

-F, specifies the pattern to split on. In Perl 5.20 and later, -F implicitly sets -a.

-a turns on autosplit mode when used with -n: an implicit split into the @F array is the first thing done inside the implicit while loop produced by -n. In Perl 5.20 and later, -a implicitly sets -n.

-n makes Perl assume a loop around your program, so that it iterates over the filename arguments somewhat like sed -n or awk.

-l enables automatic line-ending processing. It has two separate effects: first, it chomps the input record separator (\n) from each line; second, it sets the output record separator to \n.

-e supplies the program text on the command line.

So perl -F, -le '$, = "\t"; print @F[1,5,6,7] if $F[5] > 4 || $. == 1' does something like this:

use English;

$OUTPUT_RECORD_SEPARATOR = $INPUT_RECORD_SEPARATOR;

while (<>) { # iterate over each line of each file
    chomp;
    @F = split(',');
    $OUTPUT_FIELD_SEPARATOR = "\t";
    print @F[1,5,6,7] if $F[5] > 4 || $INPUT_LINE_NUMBER == 1;
}


Baraskar Sandeep

Updated on September 18, 2022

Comments

Baraskar Sandeep over 1 year

I have a long comma-separated file with 20K lines. Here's a sample:

"","id","number1","number2","number3","number4","number5","number6","number7"
"1","MRTAT_1of3.RTS",17.1464602742708,17.1796255746079,17.1132949739337,0.996138996138996,-0.0055810322632996,1,1
"2","MRTAT_2of3.RTS",3.88270908946253,6.13558056235995,1.62983761656512,0.265637065637066,-1.91247162787182,0.718084341158075,1
"3","MRTAT_3of3.RTS",3.87323328936623,1.22711611247199,6.51935046626046,5.31274131274131,2.40945646701554,0.676814519398334,1

I want to print the id, number4, number5 and number6 columns, tab-delimited, for rows where number4 is greater than 4.0. Here's some sample output:

id         number4           number5           number6
MRTAT_3of3.RTS 5.31274131274131  2.40945646701554  0.676814519398334

Evgeny Vereshchagin almost 9 years

should be {print $2, $6, $7, $8}

roaima almost 9 years

To add value to your answer please could you take a few moments to explain how it works.

Evgeny Vereshchagin almost 9 years

@roaima, ok. one moment.

Evgeny Vereshchagin almost 9 years

@roaima, done. what do you think?