Linux command to remove the duplicate lines but keep the first occurrence

linux ubuntu command-line string-manipulation

6,351

Solution 1

If sorting is acceptable, this will work:

sort file1 | uniq > file2

-u was the source of your trouble, because (from man 1 uniq):

-u, --unique
only print unique lines

while by default:

With no options, matching lines are merged to the first occurrence.
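To make the difference concrete, here is a small sketch that runs both variants on the sample data from the question. The temporary file and the variable names (`input`, `merged`, `only_unique`) are only illustrative, not part of the original answer.

```shell
# Sample input from the question, written to a temporary file.
input=$(mktemp)
printf '%s\n' line1 line1 line2 line3 line4 line3 line5 > "$input"

# Default uniq: each run of identical adjacent lines collapses to one line.
merged=$(sort "$input" | uniq)

# uniq -u: only lines that occur exactly once survive,
# so line1 and line3 disappear completely.
only_unique=$(sort "$input" | uniq -u)

printf 'sort | uniq:\n%s\n\n' "$merged"
printf 'sort | uniq -u:\n%s\n' "$only_unique"

rm -f "$input"
```

With this input, the first pipeline should keep all five distinct lines, while the second should keep only line2, line4 and line5.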

If you want to dedup while keeping first occurrence, you can do

awk '!visited[$0]++' "$your_hist_file" > "$your_new_hist_file"
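The one-liner is terse, so here is a sketch of what it does, written out step by step. The array name `visited` comes from the answer; the temporary file and the shell variables `input`, `short` and `expanded` are assumptions made for the demonstration.

```shell
input=$(mktemp)
printf '%s\n' line1 line1 line2 line3 line4 line3 line5 > "$input"

# The compact form from the answer.
short=$(awk '!visited[$0]++' "$input")

# An expanded equivalent: visited[$0] counts how often the
# current line ($0) has been seen so far.
expanded=$(awk '{
    if (visited[$0] == 0) {   # first time this exact line appears
        print $0
    }
    visited[$0]++             # remember it for later lines
}' "$input")

printf '%s\n' "$expanded"
rm -f "$input"
```

Both forms should print each distinct line once, in the order of first appearance and without sorting. awk keeps one array entry per distinct line, so memory use grows with the number of distinct lines rather than with the file size.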

If you want to dedup while keeping last occurrence, you can do

tac "$your_hist_file" | awk '!visited[$0]++' | tac > "$your_new_hist_file"

You can use one awk command and no tac to achieve this too, but it's not as straightforward as using two tacs.
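For reference, one possible single-awk version is sketched below. It is not from the original answer: the array names `last` and `lines` and the shell variables are made up for illustration. It buffers the whole file in memory, which is one reason the two-tac pipeline may be preferable.

```shell
input=$(mktemp)
printf '%s\n' line1 line1 line2 line3 line4 line3 line5 > "$input"

keep_last=$(awk '
    # Remember every line and the record number where it last occurred.
    { last[$0] = NR; lines[NR] = $0 }
    END {
        # Replay the file in order, printing a line only at its final position.
        for (i = 1; i <= NR; i++)
            if (last[lines[i]] == i)
                print lines[i]
    }' "$input")

printf '%s\n' "$keep_last"
rm -f "$input"
```

For the sample input this should give line1, line2, line4, line3, line5: line3 moves after line4 because its last occurrence comes later in the file.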


Updated on September 18, 2022

user9371654 almost 2 years
I have a text file. Each line contains a string. Some strings are repeated. I want to remove repetition but I want to keep the first occurrence. For example:
```
line1
line1
line2
line3
line4
line3
line5
```
Should be
```
line1
line2
line3
line4
line5
```
I tried sort file1 | uniq -u > file2, but this did not help: it removed every repeated string entirely, whereas I want the first occurrence to be kept. I do not need sorting. I just want to remove exact repetitions of a line while keeping everything else as it is.
MMM about 4 years

Welcome to Super User! Generally, answers are much more helpful if they include an explanation of what the code is intended to do, and why that solves the problem without introducing others.
DavidPostill about 4 years

Welcome to Super User! Could you please edit your answer to give an explanation of why this code answers the question? Code-only answers are discouraged, because they don't teach the solution.