Remove all duplicate words from a string using a shell script

28,890

Solution 1

One more awk, just for fun:

$ a="aaa bbb aaa bbb ccc aaa ddd bbb ccc"
$ echo "$a" | awk '{for (i=1;i<=NF;i++) if (!a[$i]++) printf("%s%s",$i,FS)}{printf("\n")}'
aaa bbb ccc ddd 

By the way, even your solution works fine with variables:

$ b="zebra ant spider spider ant zebra ant" 
$ echo "$b" | xargs -n1 | sort -u | xargs
ant spider zebra
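
The string in the question is comma-separated rather than space-separated, so the pipelines above see it as one single word. A minimal sketch of the same awk loop with a comma as field separator, assuming that empty fields (such as the one left by a trailing comma) should be dropped. The function name dedupe_csv is made up for illustration:

```shell
# dedupe_csv is a hypothetical helper name, not part of the answer above.
# Prints the comma-separated list in $1 with later duplicates removed,
# keeping the order of first appearance.
dedupe_csv() {
    printf '%s\n' "$1" | awk -F, '{
        split("", seen)            # reset the lookup table for every line
        out = ""
        for (i = 1; i <= NF; i++) {
            if ($i == "" || seen[$i]++) continue   # skip empty and repeated fields
            out = (out == "" ? $i : out "," $i)
        }
        print out
    }'
}

c="red,black,blue,red,green,red,black,blue,red,green,"
dedupe_csv "$c"    # prints: red,black,blue,green
```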

Solution 2

With tr and sort -u (sorting and removing duplicates in one step):

echo "zebra ant spider spider ant zebra ant" | tr ' ' '\n' | sort -u

or

echo "zebra ant spider spider ant zebra ant" | tr ' ' '\n' | sort -u | xargs 

to get one line
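
Two things to keep in mind here: sort -u returns the words in sorted order, not in the order of first appearance, and xargs gives quotes and backslashes a special meaning, so a word such as it's makes it fail. A small sketch that joins the lines with paste instead, assuming the words are separated by one or more spaces. The name unique_sorted is invented here:

```shell
# unique_sorted is a hypothetical helper name used only for this sketch.
unique_sorted() {
    printf '%s\n' "$1" |
        tr -s ' ' '\n' |        # one word per line, squeezing repeated spaces
        LC_ALL=C sort -u |      # unique words, in byte order
        paste -sd ' ' -         # join the lines again with single spaces
}

unique_sorted "zebra ant spider spider ant zebra ant"    # prints: ant spider zebra
```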

Solution 3

$ echo "zebra ant spider spider ant zebra ant"  | awk -v RS="[ \n]+" '!n[$0]++' 
zebra
ant
spider
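
A regular expression in RS is an extension (gawk and mawk accept it, while POSIX only defines the behaviour for a single-character RS), and the result is printed one word per line. A sketch of a variant that should work with any POSIX awk and prints a single line again, keeping the order of first appearance. The name unique_in_order is hypothetical:

```shell
# unique_in_order is a made-up helper name for this sketch.
unique_in_order() {
    printf '%s\n' "$1" |
        tr -s ' \t' '\n\n' |     # blanks and tabs become (squeezed) newlines
        awk 'NF && !n[$0]++' |   # print a non-empty line only the first time it is seen
        paste -sd ' ' -          # back to one space-separated line
}

unique_in_order "zebra ant spider spider ant zebra ant"    # prints: zebra ant spider
```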

Solution 4

With GNU sed:

sed ':s;s/\(\<\S*\>\)\(.*\)\<\1\>/\1\2/g;ts'

You may add ;s/  */ /g (note the two spaces before the *) to squeeze repeated spaces.

It works like this: if a word occurs a second time in the line, remove that occurrence and start over, until no duplicates are left.
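
A sketch of the whole command applied to a sample string, assuming that sed is GNU sed (\<, \> and \S are GNU extensions). The loop leaves runs of spaces where words were removed, so two clean-up substitutions follow it. The final s/ $// and the function name dedupe_sed are additions made here, not part of the answer:

```shell
# dedupe_sed is a hypothetical helper name; requires GNU sed.
dedupe_sed() {
    printf '%s\n' "$1" |
        sed ':s;s/\(\<\S*\>\)\(.*\)\<\1\>/\1\2/g;ts;s/  */ /g;s/ $//'
    #        ^ loop: drop a repeated word, retry   ^ squeeze   ^ trim the end
}

dedupe_sed "zebra ant spider spider ant zebra ant"
```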

Solution 5

perl -lane '$,=$";print grep { ! $h{$_}++ } @F'

Author: Urvashi

Updated on September 18, 2022

Comments

  • Urvashi
    Urvashi over 1 year

    I have a string like

    "aaa,aaa,aaa,bbb,bbb,ccc,bbb,ccc"
    

    I want to remove the duplicate words from the string, so that the output looks like

    "aaa,bbb,ccc"
    

    I tried this code (source):

    $ echo "zebra ant spider spider ant zebra ant" | xargs -n1 | sort -u | xargs
    

    It works fine with this sample value, but when I pass my own variable's value, the output still contains all the duplicate words.

    How can I remove the duplicate values?

    UPDATE

    My question is about collecting all the values that belong to the same user into a single string. I have data like this:

       user name    | colour
        AAA         | red
        AAA         | black
        BBB         | red
        BBB         | blue
        AAA         | blue
        AAA         | red
        CCC         | red
        CCC         | red
        AAA         | green
        AAA         | red
        AAA         | black
        BBB         | red
        BBB         | blue
        AAA         | blue
        AAA         | red
        CCC         | red
        CCC         | red
        AAA         | green
    

    In my code I fetch all distinct users and then concatenate the colour string successfully. For that I use this code:

    # inside the loop that reads the records:

        if [ "$c" = "" ]; then  # $c is defined globally
            c="$colour1"
        else
            c="$c,$colour1"
        fi
    

    When I print this $c variable, I get this output (for user AAA):

    "red,black,blue,red,green,red,black,blue,red,green,"
    

    I want to remove the duplicate colours. The desired output is:

    "red,black,blue,green"
    

    To get this output I used the code above:

     echo "zebra ant spider spider ant zebra ant" | xargs -n1 | sort -u | xargs
    

    but it still displays the output with the duplicate values, like

    "red,black,blue,red,green,red,black,blue,red,green,"

    Thanks

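The duplicates can also be avoided while the string is being built, instead of being removed afterwards. A sketch of that idea in awk, assuming the data is available as text in exactly the layout shown in the question (a header row, then user | colour rows). The function name colours_per_user and the output format are made up for illustration:

```shell
# colours_per_user is a hypothetical helper; it reads the table on stdin.
colours_per_user() {
    awk -F'|' '
        NR == 1 { next }                        # skip the header row
        {
            user = $1; colour = $2
            gsub(/^[ \t]+|[ \t]+$/, "", user)   # trim surrounding blanks
            gsub(/^[ \t]+|[ \t]+$/, "", colour)
            if (user == "" || colour == "") next
            if (!(user in list)) order[++n] = user   # remember users in first-seen order
            if (seen[user, colour]++) next           # colour already recorded for this user
            list[user] = (user in list) ? (list[user] "," colour) : colour
        }
        END { for (i = 1; i <= n; i++) print order[i] ": " list[order[i]] }
    '
}

colours_per_user <<'EOF'
user name    | colour
AAA         | red
AAA         | black
BBB         | red
BBB         | blue
AAA         | blue
AAA         | red
CCC         | red
CCC         | red
AAA         | green
EOF
# prints:
# AAA: red,black,blue,green
# BBB: red,blue
# CCC: red
```
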
    • terdon
      terdon about 7 years
      Please clarify what is wrong with what you are using. I don't understand what you mean by "when I give my variable value". What value do you give? Where does it fail?
    • Sundeep
      Sundeep about 7 years
      echo 'aaa aaa aaa bbb bbb ccc bbb ccc' | xargs -n1 | sort -u | xargs gives aaa bbb ccc, so you need to show the exact code you tried and the output you got. With the string in a variable: s='aaa aaa aaa bbb bbb ccc bbb ccc'; echo "$s" | xargs -n1 | sort -u | xargs
    • Urvashi
      Urvashi about 7 years
      The string value comes in dynamically. It prints the same value (still containing the duplicate values).
    • Sundeep
      Sundeep about 7 years
      yeah, show the code that failed, otherwise how would we know what could've gone wrong?
    • Jacob Vlijm
      Jacob Vlijm about 7 years
      Does the order matter?
    • Urvashi
      Urvashi about 7 years
      @JacobVlijm Yes, the order matters. I updated my question so you can understand it more easily.
    • Urvashi
      Urvashi about 7 years
      @Sundeep I updated my question, please take a look.
    • Sundeep
      Sundeep about 7 years
      @Urvashi Your string uses , as the delimiter, while the code you found works on space as the delimiter. Why do you expect it to work on your string? All answers attempted so far are now invalidated because of that.
    • Sundeep
      Sundeep about 7 years
      Again, we cannot debug code which you don't show. Also, your expected output "red,black,blue,red,green," has red repeated. And is the , at the end of the string required?
    • Urvashi
      Urvashi about 7 years
      @Sundeep "red,black,blue,green" is the desired output. It was a typing mistake; I corrected it.
    • Sundeep
      Sundeep about 7 years
      Try a simple awk + paste command instead of shell scripting: awk '$1=="AAA" {if(!seen[$3]++) print $3}' input.txt | paste -sd, where you need to replace input.txt with the name of your file.
  • Philippos
    Philippos about 7 years
    You need to add | xargs to join the output to one line again
  • George Vasiliou
    George Vasiliou about 7 years
    Plus one for the awk! I was also building an awk solution just for fun. There is a slight possibility that the words are printed in random order in the END section, due to the arbitrary order in which awk iterates over array keys.
  • ilkkachu
    ilkkachu about 7 years
    Yes, they will be printed in an essentially random order. The sort solution doesn't keep the original order either, though.
  • George Vasiliou
    George Vasiliou about 7 years
    Yes, good point! Even sort prints in different order than input.
  • Andrew Carsell
    Andrew Carsell about 7 years
    @ilkkachu Actually we don't need to wait for the input to end. We can decide whether or not to print with a slight modification to your code: awk -vRS=" " -vORS=" " '!a[$1]++ {print $1}' ; echo. This preserves the order.
  • Benoît
    Benoît about 7 years
    Or use sort -u. Or even awk '!u[$0]++'.
  • someonewithpc
    someonewithpc about 7 years
    What are \< and \>?
  • Philippos
    Philippos about 7 years
    @someonewithpc They match no character, but the beginning and end of a word to prevent substrings from being matched.
  • someonewithpc
    someonewithpc about 7 years
    Nice, but is that portable? Also, aren't words separated by whitespace? Seems redundant to match not whitespace followed by the end of a word.
  • Philippos
    Philippos about 7 years
    @someonewithpc No, it's not standard, that's why I wrote GNU sed. The nice part is that you don't have to handle the first and last string separately.
  • George Vasiliou
    George Vasiliou about 7 years
    Very clever!!!!
  • gardenhead
    gardenhead about 7 years
    @Benoît Wow, I did not know about sort -u. I've been using sort | uniq all this time. The wasted keystrokes...
  • xhienne
    xhienne about 7 years
    Please add an explanation on how your code works and why you did this and that.
  • JJoao
    JJoao about 7 years
    @GeorgeVasiliou, thank you [or to tell the truth, very lazy :-) ]
  • Pierre.Vriens
    Pierre.Vriens over 5 years
    I do not get it
  • JeremyCanfield
    JeremyCanfield about 5 years
    Neat approach. The only adjustment I had to make was to use %s instead of %s%s. The reason is that I was looping over the results in a for loop, and the two white spaces caused some problems with regex matches.
  • Kusalananda
    Kusalananda about 5 years
    Your code lacks an explanation. With no explanation, it's difficult to follow what's happening. You also seem to make assumptions about the data that seem wrong (whitespace-delimited fields) and about the particular awk implementation being used (asorti() is not a standard awk function).
  • xeruf
    xeruf over 2 years
    what if there are other word separators, such as dots, involved?