Sorting a file based on one column using Unix and Awk


One possible solution is to join each pair of lines into a single line, sort the joined lines, and then split them apart again:

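awk '{ getline line; print $0, line }' input_file |     # join each odd line with the even line after it
    sort -k15,15nr -k6,6nr |                           # even line's score (field 15), then odd line's (field 6)
    awk '{ $10 = "\n" $10; sub(/ \n/, "\n"); print }'  # split after field 9, dropping the trailing space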
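The pipeline above hardcodes field positions: each input line has 9 fields, so field 15 of a joined line is the second line's score and field 10 is where the second line starts. Below is a minimal sketch that reads the field count from the first line instead. It assumes every line has the same number of fields, the score is always field 6, and the file has an even number of lines. `join_sort_split` is a hypothetical helper name. The key order follows the ranking rule stated in the question: the even line's score first, with the odd line's score as the tie-breaker.

```shell
# join_sort_split FILE (hypothetical helper name)
# Same join -> sort -> split idea, but the per-line field count is read
# from the first line instead of being hardcoded.
# Assumes: every line has the same number of fields, the score is field 6,
# and FILE has an even number of lines (odd line, then its even line).
join_sort_split() {
    file=$1
    nf=$(awk 'NR == 1 { print NF; exit }' "$file")   # fields per line (9 in the sample)
    key=$((nf + 6))                                   # even line's score in a joined line
    awk '{ getline line; print $0, line }' "$file" |
        sort -k"$key","$key"nr -k6,6nr |              # even line's score, then odd line's score
        awk -v n="$nf" '{ $(n + 1) = "\n" $(n + 1); sub(/ \n/, "\n"); print }'
}
```

Usage: `join_sort_split input_file`. As in the original pipeline, assigning a field makes awk rebuild each record with single spaces, so runs of blanks in the input are collapsed in the output.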


Author: Namrata

I am currently at the Swiss Institute of Bioinformatics (SIB) and UNIL, Switzerland, after completing a Master's in Bioinformatics at King's College London, UK. I work on Next Generation Sequencing data analysis and genome assemblies.

Updated on September 18, 2022

Comments

  • Namrata, over 1 year ago

    I need to sort the input file according to the 6th column, which is the score.

    Input File:

    Sc2/80  20 . A T 86 Pass N=2 F=5;U=4
    Sc2/80  20 . A C 80 Pass N=2 F=5;U=4
    Sc2/60  55 . G T 90 Pass N=2 F=5;U=4
    Sc2/60  55 . G C 99 Pass N=2 F=5;U=4
    Sc2/20  39 . C T 97 Pass N=2 F=5;U=4
    Sc2/20  39 . C A 99 Pass N=2 F=5;U=4
    

    Expected Output:

    Sc2/20 39 . C T 97 Pass N=2 F=5;U=4
    Sc2/20 39 . C A 99 Pass N=2 F=5;U=4
    Sc2/60 55 . G T 90 Pass N=2 F=5;U=4
    Sc2/60 55 . G C 99 Pass N=2 F=5;U=4
    Sc2/80 20 . A T 86 Pass N=2 F=5;U=4
    Sc2/80 20 . A C 80 Pass N=2 F=5;U=4
    

    Logic: Compare the even-numbered lines of the input file and rank them by score in descending order, printing each one together with its corresponding odd-numbered line. If two even lines have the same score, compare the scores of their corresponding odd lines. The pair with the higher score gets priority and is printed first.

    • Kartik, almost 11 years ago
      This seems related to some DNA (AGCT). Is it really related to DNA of some kind?
    • Namrata, almost 11 years ago
      @Kartik: Yes, you are right. This work is part of a genome data analysis.
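The ranking rule in the question can also be expressed without joining and re-splitting lines. This sketch uses decorate-sort-undecorate, a swapped-in technique and not the answer's method. Both lines of each pair are tagged with the same keys: the even line's score, the odd line's score, a pair number, and the line's position within the pair. The lines are sorted on those keys, and the keys are then stripped with `cut`. Because the original lines pass through untouched, their spacing survives (for example, the double spaces in the sample input). The sketch assumes the score is whitespace-separated field 6 and the input has an even number of lines. `sort_pairs` is a hypothetical name.

```shell
# sort_pairs [FILE...] (hypothetical helper name)
# Decorate-sort-undecorate: prefix both lines of each pair with the same
# sort keys, sort, then cut the keys off so the original lines come back
# unchanged. Reads the named files, or stdin if none are given.
# Assumes: the score is whitespace-separated field 6 and the input has an
# even number of lines (odd line, then its even line).
sort_pairs() {
    awk '
        {
            odd = $0; odd_score = $6
            if ((getline even) <= 0) {
                print "sort_pairs: odd number of lines" | "cat 1>&2"
                exit 1
            }
            split(even, f)        # split the even line on blanks, like $0
            pair++
            # keys: even score, odd score, pair number, position in pair
            print f[6], odd_score, pair, 1, odd
            print f[6], odd_score, pair, 2, even
        }' "$@" |
        sort -k1,1nr -k2,2nr -k3,3n -k4,4n |
        cut -d ' ' -f 5-      # drop the four key fields
}
```

Usage: `sort_pairs input_file` or `sort_pairs < input_file`. Since no record is ever rebuilt, no field-count bookkeeping is needed. The cost is one extra `cut` pass.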