Sort a text file according to a character within a field

18,519

Solution 1

To sort on a specific character within a field (i.e. a block of characters surrounded by blank characters), you can use this syntax:

sort -k 1.4 file

This sorts starting at the fourth character of the first field; since no end position is given, the key runs to the end of the line. See https://stackoverflow.com/questions/12383706/unix-sort-on-column-without-separator for details.
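As a quick illustration of the field.character form, here is a minimal sketch on made-up fixed-width records (the data and the variable names are hypothetical, not from the question). LC_ALL=C is an assumption added so the comparison is byte-wise and does not depend on the locale.

```shell
# Hypothetical sample data: records with no separator, so each line is one field.
records=$(printf '%s\n' 'zzzA' 'aaaC' 'mmmB')

# The key starts at character 4 of field 1 and runs to the end of the line.
by_fourth=$(printf '%s\n' "$records" | LC_ALL=C sort -k 1.4)
printf '%s\n' "$by_fourth"
```

A plain sort would put aaaC first; with -k 1.4 the lines come out ordered by their last letter instead (zzzA, mmmB, aaaC).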

If you get counterintuitive results while playing with -k, add the -b option. It makes sort ignore leading blanks when it locates the key; without it, the blanks in front of a field count as part of that field. So

sort -b -k 2.2 file

gives what you want: the key starts at the second character of the second field, ignoring leading blanks.
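Note that -k 2.2 on its own only sets where the key starts; the rest of the line still takes part in the comparison. A minimal sketch, using the five lines from the question, that restricts the key to that single character by adding an end position (2.2,2.2). The variable names and LC_ALL=C are assumptions made for the example.

```shell
# The question's data, held in a variable instead of the file "newfile".
data=$(printf '%s\n' '1 AC BB CC' '2 AB CC DD' '3 CA BB CC' '4 BE DD EE' '5 BD AA AA')

# -b skips the blanks in front of field 2; 2.2,2.2 starts and ends the key
# at the second character of that field.
sorted=$(printf '%s\n' "$data" | LC_ALL=C sort -b -k 2.2,2.2)
printf '%s\n' "$sorted"
```

On this data the result is the order the question asked for: lines 3, 2, 1, 5, 4.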

Solution 2

The third field means the third block of characters when the line is split on blanks, much like shell word splitting. This is exactly what you got and what anyone would expect from this feature: people usually sort tables with arbitrarily long words or numbers in the fields, and this is actually the first time I've seen sorting by single characters. If you want to sort by a character column, you need to split the line into characters, sort, and squeeze it back together. Since the only blanks in the file are spaces, we can put an additional tab after every character with sed, sort, and then remove the tabs:

cat "newfile" | sed 's/./&\t/g' | sort -k3 | tr -d '\t'

You could also provide the filename to sed directly, but I usually do it with a pipe because I may have to receive the input from another script anyway. Note that \t in the sed replacement is a GNU sed extension.

If you already have spaces AND tabs in your file, you will have to be more creative to avoid deleting original whitespace too.
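A variant of the same idea, as a sketch rather than the answer's exact command: if sort is told that the tab is the field separator (-t), every character becomes its own field, so the fourth character of the line is simply field 4. This assumes the input contains no tabs of its own; the literal tab is built with printf so the sed call does not depend on \t being understood, and the variable names are hypothetical.

```shell
# A literal tab character, kept in a variable for sed and sort.
tab=$(printf '\t')

# The question's data.
data=$(printf '%s\n' '1 AC BB CC' '2 AB CC DD' '3 CA BB CC' '4 BE DD EE' '5 BD AA AA')

# 1. append a tab after every character, 2. sort on field 4 only
# (the fourth character of the original line), 3. strip the tabs again.
by_char=$(printf '%s\n' "$data" | sed "s/./&$tab/g" | LC_ALL=C sort -t "$tab" -k 4,4 | tr -d '\t')
printf '%s\n' "$by_char"
```

Because the key is limited to a single field (4,4), the rest of the line only matters as a tie-breaker.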

Author: Computernerd

Updated on September 18, 2022

Comments

  • Computernerd
    Computernerd over 1 year

    I have a file named newfile which consists of the following data:

    1 AC BB CC
    2 AB CC DD
    3 CA BB CC
    4 BE DD EE
    5 BD AA AA
    

    I typed the following command in bash to sort the data according to the second character of the second field:

    sort -k3 newfile

    I expected the following results

    3 CA BB CC
    2 AB CC DD
    1 AC BB CC
    5 BD AA AA
    4 BE DD EE
    

    Why am I getting the following results, and how am I supposed to sort according to the third character (ignoring the blank)?

    5 BD AA AA
    1 AC BB CC
    3 CA BB CC
    2 AB CC DD
    4 BE DD EE
    
    • devnull
      devnull about 10 years
      What is your definition of third field?
    • devnull
      devnull about 10 years
      Your expected result doesn't seem to be sorted by any field.
    • lgeorget
      lgeorget about 10 years
      A field is not a single letter! By default, it's a block of characters separated by blank characters.
    • cuonglm
      cuonglm about 10 years
      It seems he wants to sort by the second char of the second column.
    • Computernerd
      Computernerd about 10 years
      @Gnouc Yes, I want to sort by the second char of the second column.
    • Computernerd
      Computernerd about 10 years
      Why does it sort according to the second column, first field when I type -k2 instead of -k3?
  • orion
    orion about 10 years
    Vote up! I didn't know about extended keydef in sort.
  • Computernerd
    Computernerd about 10 years
    Why does it sort according to the second column, first field when I type -k2 instead of -k3?
  • lgeorget
    lgeorget about 10 years
    @Computernerd The first number in the -k option is the field number. The second one is the character position inside the field. By default, it's 1. Hence, -k2 is equivalent to -k2.1: first character of the second field.
  • user208145
    user208145 over 6 years
    Only 6 upvotes!? This answer is incredible!