Linux command to remove the duplicate lines but keep the first occurrence

Solution 1

If sorting the output is acceptable, this will work:

sort file1 | uniq > file2

-u was the source of your trouble, because (from man 1 uniq):

-u, --unique
only print unique lines

while by default:

With no options, matching lines are merged to the first occurrence.
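
To make the difference concrete, here is a small sketch that runs both variants on the sample data from the question. The temporary file and the variable names merged and only_unique are illustrative assumptions, not part of the original answer.

```shell
# Sample input from the question, written to a temporary file.
tmp=$(mktemp)
printf '%s\n' line1 line1 line2 line3 line4 line3 line5 > "$tmp"

# uniq with no options: adjacent duplicates collapse into one copy each.
merged=$(sort "$tmp" | uniq)

# uniq -u: only lines that occur exactly once survive.
only_unique=$(sort "$tmp" | uniq -u)

printf '%s\n' "$merged"
printf -- '---\n'
printf '%s\n' "$only_unique"

rm -f "$tmp"
```

With this input, the plain uniq run keeps line1 through line5 once each, while uniq -u drops line1 and line3 entirely because they were repeated. Note that both variants emit the lines in sorted order, not in their original order.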

Solution 2

If you want to dedup while keeping first occurrence, you can do

awk '!visited[$0]++' "$your_hist_file" > "$your_new_hist_file"
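
The one-liner is terse, so here is a rough expansion of the same idea, as a sketch rather than an exact transcription: awk keeps an associative array keyed by the whole line ($0) and prints a line only the first time its key is seen. The sample input and the variable names input and deduped are assumptions for illustration.

```shell
# Sample input from the question.
input=$(printf '%s\n' line1 line1 line2 line3 line4 line3 line5)

# Same effect as awk '!visited[$0]++', written out step by step.
deduped=$(printf '%s\n' "$input" | awk '
{
    if (!($0 in visited)) {   # first time this exact line appears
        print                 # keep it
    }
    visited[$0] = 1           # remember the line for later records
}')

printf '%s\n' "$deduped"
```

Because no sorting is involved, the surviving lines stay in their original order. The array grows with the number of distinct lines, so memory use is proportional to the unique content of the file.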

If you want to dedup while keeping last occurrence, you can do

tac "$your_hist_file" | awk '!visited[$0]++' | tac > "$your_new_hist_file"

You can use one awk command and no tac to achieve this too, but it is not as straightforward as using two tacs.
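
One possible single-awk variant, as a sketch and not necessarily what the answer had in mind, reads the file twice: the first pass records the last line number of each distinct line, and the second pass prints a line only at that position. The temporary file and the names last and keep_last are assumptions for illustration.

```shell
# Sample input from the question, written to a temporary file.
tmp=$(mktemp)
printf '%s\n' line1 line1 line2 line3 line4 line3 line5 > "$tmp"

# Pass 1 (NR == FNR): remember the last line number of each distinct line.
# Pass 2: print a line only when it sits at that remembered position.
keep_last=$(awk 'NR == FNR { last[$0] = FNR; next } last[$0] == FNR' "$tmp" "$tmp")

printf '%s\n' "$keep_last"

rm -f "$tmp"
```

This needs a regular file, since the input is named twice on the command line and read twice, so it does not work on a pipe. The tac pipeline has no such restriction.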

Author: user9371654

Updated on September 18, 2022

Comments

  • user9371654
    user9371654 almost 2 years

    I have a text file. Each line contains a string. Some strings are repeated. I want to remove repetition but I want to keep the first occurrence. For example:

    line1
    line1
    line2
    line3
    line4
    line3
    line5
    

    Should be

    line1
    line2
    line3
    line4
    line5
    

    I tried: sort file1 | uniq -u > file2 but this did not help. It removed every repeated string entirely, whereas I want the first occurrence to be kept. I do not need sorting. I only want to remove exact repetitions of a line while keeping everything else as it is.

  • MMM
    MMM about 4 years
    Welcome to Super User! Generally, answers are much more helpful if they include an explanation of what the code is intended to do, and why that solves the problem without introducing others.
  • DavidPostill
    DavidPostill about 4 years
    Welcome to Super User! Could you please edit your answer to give an explanation of why this code answers the question? Code-only answers are discouraged, because they don't teach the solution.