How can I remove duplicates in my .bash_history, preserving order?

bash command-line command-history sort uniq

41,316

Solution 1

Sorting the history

This command works like sort|uniq, but keeps the lines in place

nl|sort -k 2|uniq -f 1|sort -n|cut -f 2

Basically, it prepends each line with its line number. After the sort|uniq step, all lines are sorted back into their original order (using the line-number field), and the line-number field is then removed from the lines.

This solution has the flaw that it is undefined which representative of a class of equal lines makes it into the output, so its position in the final output is undefined as well. However, if the latest representative should be chosen, you can sort the input by a second key:

nl|sort -k2 -k 1,1nr|uniq -f1|sort -n|cut -f2
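
To make the two pipelines concrete, here is a minimal sketch that wraps them in shell functions and runs the second one on a small sample. The function names are made up for illustration. Three details are assumptions of this sketch rather than part of the commands above: nl -ba (so blank lines are numbered too), cut -f 2- (so a command that itself contains a tab is not truncated), and LC_ALL=C on the grouping sort (so identical lines are grouped byte-wise regardless of locale).

```shell
# Hypothetical wrappers around the pipelines above (names are illustrative).
# Assumed tweaks: nl -ba, cut -f 2-, and LC_ALL=C for the grouping sort.

# Keep one copy of each line; which copy survives is not guaranteed.
dedup_keep_any() {
    nl -ba | LC_ALL=C sort -k 2 | uniq -f 1 | sort -n | cut -f 2-
}

# Keep the latest copy of each line: ties on the text are broken by
# descending line number, so uniq sees the newest copy first.
dedup_keep_latest() {
    nl -ba | LC_ALL=C sort -k 2 -k 1,1nr | uniq -f 1 | sort -n | cut -f 2-
}

printf 'ls\ncd ~\nls\nmake\ncd ~\n' | dedup_keep_latest
# prints:
# ls
# make
# cd ~
```

Both functions read standard input, so something like dedup_keep_latest < ~/.bash_history > cleaned.txt should work as well; writing straight back to the same file in one pipeline would truncate it before it is read.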

Managing .bash_history

For re-reading and writing back the history, you can use history -r and history -w respectively.

Solution 2

So I was looking for the same exact thing after being annoyed by duplicates, and found that if I edit my ~/.bash_profile or my ~/.bashrc with:

export HISTCONTROL=ignoreboth:erasedups

It does exactly what you wanted: it keeps only the latest occurrence of any command. ignoreboth is shorthand for ignorespace:ignoredups, and that, along with erasedups, gets the job done.

At least on my Mac terminal with bash this works perfectly. Found it here on askubuntu.com.

Solution 3

Found this solution in the wild and tested:

awk '!x[$0]++'

The first time a specific line ($0) is seen, the value of x[$0] is zero.
The value of zero is inverted with ! and becomes one.
An expression that evaluates to one triggers the default action, which is print.

Therefore, the first time a specific $0 is seen, it is printed.

Every subsequent time (the repeats), the value of x[$0] has already been incremented,
so its negated value is zero, and an expression that evaluates to zero doesn't print.

To keep the last repeated value, reverse the history and use the same awk:

awk '!x[$0]++' ~/.bash_history                 # keep the first value repeated.

tac ~/.bash_history | awk '!x[$0]++' | tac     # keep the last.
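
As a small worked example of both variants, here is a sketch with made-up function names. The fallback to tail -r when tac is missing is an assumption for BSD/macOS systems and is not part of the commands above.

```shell
# Hypothetical helper names; the awk program is the one explained above,
# with the array called "seen" instead of "x".

# Print each distinct line once, at the position of its first occurrence.
dedup_keep_first() {
    awk '!seen[$0]++' "$@"
}

# Reverse the order of lines: tac where available (GNU coreutils),
# otherwise tail -r (assumed to exist on BSD/macOS).
reverse_lines() {
    if command -v tac >/dev/null 2>&1; then
        tac "$@"
    else
        tail -r "$@"
    fi
}

# Print each distinct line once, at the position of its last occurrence.
dedup_keep_last() {
    reverse_lines "$@" | awk '!seen[$0]++' | reverse_lines
}

printf 'ls\ncd ~\nls\nmake\n' | dedup_keep_first
# prints:
# ls
# cd ~
# make

printf 'ls\ncd ~\nls\nmake\n' | dedup_keep_last
# prints:
# cd ~
# ls
# make
```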

Solution 4

Extending Clayton's answer:

tac "$HISTFILE" | awk '!x[$0]++' | tac | sponge "$HISTFILE"

tac reverses the file. Make sure you have installed moreutils so that sponge is available; otherwise use a temp file.
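
If sponge is not installed, the temp-file route could look roughly like this. It is only a sketch: the function name and the temp-file template are made up, and it assumes tac and mktemp are available. The result is copied back with cat rather than mv so that the history file keeps its permissions and any symlink stays intact.

```shell
# Hypothetical helper: de-duplicate a file in place, keeping the last
# occurrence of each line, without sponge.
dedup_file_in_place() {
    [ -f "$1" ] || return 1
    hist_tmp=$(mktemp "${TMPDIR:-/tmp}/histdedup.XXXXXX") || return 1
    # Write the cleaned copy to the temp file first; only overwrite the
    # original once the whole pipeline has finished reading it.
    if tac "$1" | awk '!seen[$0]++' | tac > "$hist_tmp"; then
        cat "$hist_tmp" > "$1"
    fi
    rm -f "$hist_tmp"
}

# Usage (assumes HISTFILE is set, as it is in an interactive bash):
# dedup_file_in_place "$HISTFILE"
```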

Solution 5

This is an old post, but a perpetual issue for users who want to have multiple terminals open, and have the history synched between windows, but not duplicated.

My solution in .bashrc:

shopt -s histappend
export HISTCONTROL=ignoreboth:erasedups
export PROMPT_COMMAND="history -n; history -w; history -c; history -r"
tac "$HISTFILE" | awk '!x[$0]++' > /tmp/tmpfile  &&
                tac /tmp/tmpfile > "$HISTFILE"
rm /tmp/tmpfile

The histappend option appends the session's history to the end of the history file ($HISTFILE) instead of overwriting it.
ignoreboth and erasedups prevent duplicate entries from being saved in the $HISTFILE.
The prompt command updates the history cache:
- history -n reads all lines from $HISTFILE that may have been added by a different terminal since the last carriage return
- history -w writes the updated buffer to $HISTFILE
- history -c wipes the buffer so no duplication occurs
- history -r re-reads the $HISTFILE, appending to the now blank buffer
The awk script keeps the first occurrence of each line it encounters. tac reverses the file before awk and reverses it back afterwards, so the most recent copy of each command is the one kept and the history stays in chronological order.
rm removes the /tmp file.

Every time you open a new shell, the history has all dupes wiped, and every time you hit the Enter key in a different shell/terminal window, it updates this history from the file.


cwd

Updated on September 18, 2022

Comments

cwd over 1 year
I really enjoy using control+r to reverse-search my command history. I've found a few good options I like to use with it:
```
# ignore duplicate commands, ignore commands starting with a space
export HISTCONTROL=erasedups:ignorespace

# keep the last 5000 entries
export HISTSIZE=5000

# append to the history instead of overwriting (good for multiple connections)
shopt -s histappend
```
The only problem for me is that erasedups only erases sequential duplicates - so that with this string of commands:
```
ls
cd ~
ls
```
The ls command will actually be recorded twice. I've thought about periodically running this with cron:
```
cat .bash_history | sort | uniq > temp.txt
mv temp.txt .bash_history
```
This would achieve removing the duplicates, but unfortunately the order would not be preserved. If I don't sort the file first I don't believe uniq can work properly.
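
A minimal demonstration of that limitation, using the three commands above as sample input: uniq compares only adjacent lines, so without sort the two ls entries both survive, and with sort the original order is gone.

```shell
# uniq alone: the duplicates are not adjacent, so nothing is removed.
printf 'ls\ncd ~\nls\n' | uniq
# prints:
# ls
# cd ~
# ls

# sort | uniq: the duplicate is removed, but the original order is lost.
printf 'ls\ncd ~\nls\n' | sort | uniq
# prints:
# cd ~
# ls
```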

How can I remove duplicates in my .bash_history, preserving order?

Extra Credit:

Are there any problems with overwriting the .bash_history file via a script? For example, if you remove an apache log file I think you need to send a HUP / reset signal with kill to have it flush its connection to the file. If that is the case with the .bash_history file, perhaps I could somehow use ps to check and make sure there are no connected sessions before the filtering script is run?
- jw013 over 11 years
  
  Try ignoredups instead of erasedups for a while and see how that works for you.
- Jazz over 11 years
  
  I don't think bash holds an open file handle to the history file - it reads/writes it when it needs to, so it should (note - should - I haven't tested) be safe to overwrite it from elsewhere.
- Ricardo over 7 years
  
  I just learned something new on the 1st sentence of your question. Good trick!
- Jonathan Hartley over 4 years
  
  I'm failing to find the man page for all the options to the history command. Where should I be looking?
- Jonathan Hartley over 2 years
  
  This answer unix.stackexchange.com/a/18443/8650 claims to erase all duplicates, not just sequential ones, using HISTCONTROL in conjunction with a PROMPT_COMMAND which re-reads the whole HISTFILE after every prompt, which gives erasedups a chance to erase older commands.
wnrph over 11 years

With sort, the -r switch always reverses the sorting order. But this won't yield the result you have in mind. sort regards the two occurrences of ls as identical with the result that, even when reversed, the eventual order depends on the sorting algorithm. But see my update for another idea.
Nathan over 10 years

In case you don't want to modify .bash_history, you could put the following in .bashrc: alias history='history | sort -k2 -k 1,1nr | uniq -f 1 | sort -n'
trss over 9 years

Wow! That just worked. But it removes all but the first occurrence I guess. I'd reversed the ordering of the lines using Sublime Text before running this. Now I'll reverse it again to get a clean history with only the last occurrence of all duplicates left behind. Thank you.
Mohd over 9 years

Check out my answer!
A.L over 9 years

What is nl at the beginning of each code line? Shouldn't it be history?
tralston almost 9 years

For those on Mac, use brew install coreutils, and notice that all the GNU utils have a g prepended to avoid confusion with the BSD built-in Mac commands (e.g. gsed is GNU whereas sed is BSD). So use gtac.
cbmanica over 8 years

I'm resisting the urge to downvote, but the fact, as you noted, that there is no way to choose which of equal lines makes it in the output means the awk answer below may be much more helpful for others (including for the case that brought me here).
wnrph over 8 years

@cbmanica That was true only for the first command and meant as a help to understand the second one. The only difference between the first and the second command is, that the second one does exercise control over output sorting.
vaichidrewar about 8 years

I had to cleanup my history file to remove invalid characters. I used "iconv -f utf-8 -t utf-8 -c file.txt"
Ricardo over 7 years

Tested on Mac OS X Yosemite and on Ubuntu 14.04.
Georg Jung almost 7 years

agree with @MitchBroadhead. this solves the problem within bash itself, without external cron-job. tested it on ubuntu 17.04 and 16.04 LTS
WeakPointer over 6 years

works on OpenBSD too. It only removes dups of any command it is appending to the history file, which is fine for me. It has the interesting effect of shortening the history file as I enter commands that had existed as duplicates before. Now I can make my history file max shorter.
smilingfrog over 6 years

Here is an excellent explanation to this in the comments
JepZ over 5 years

Nice, clean and general answer (not restricted to the history use-case) without launching a bazillion sub-processes ;-)
Dylanthepiguy over 5 years

This only ignores duplicate, consecutive commands. If you alternate repeatedly between two given commands, your bash history will fill up with duplicates
drescherjm over 4 years

I needed history -c and history -r to get it to use the history
Jonathan Hartley over 4 years

If "ignoreboth and erasedups prevent dupes from being saved", then why do you also need to do the "awk" command to remove dupes from the file? Is it because "ignoreboth and erasedups" only prevent consecutive dupes being saved? Sorry to be pedantic, I'm just trying to understand.
Jonathan Hartley over 4 years

Can you help me understand why, on logout, you need to append unwritten history to the history file before then rewriting the whole history file? Can't you just write the entire file without the 'append'?
smilingfrog over 4 years

erasedups only erases consecutive duplicates. And you are correct that the awk command duplicates the erasedups setting, making it superfluous.
Jonathan Hartley over 4 years

To be explicit, am I understanding right that you've shown two (splendid) solutions here, and a user only needs to execute one of them? Either the ruby one, or the Bash one?
laur over 3 years

Wouldn't this sort of break if .bash_history entries are on two lines - timestamp followed by the command itself?
anthony over 3 years

fails with bash timestamps. Most things do!
anthony over 3 years

Fails with bash timestamps. Most things don't take timestamps into account. See my solution.
VinayChoudhary99 about 3 years

This one works perfectly.
anthony about 3 years

The only reason I do something fancy with history during logout is that I merge (with locks) the history, sorting by timestamps and removing some 'sensitive' commands. I don't just simply append, which does not work well when you have multiple shell windows on the same machine.
Jonathan Hartley over 2 years

This answer contains useful information, but misleadingly claims to "do exactly what you wanted". The question states the "problem for me is that erasedups only erases sequential duplicates". This answer only explains how to use erasedups to erase sequential duplicates. It is not an answer to the actual question of how to erase all duplicates, not just sequential ones.
Jonathan Hartley over 2 years

This answer is bash-fu black belt, of which I am in awe. But it cannot handle history files with multi-line commands in it, or with timestamps in it. (Enabling timestamps in the history file is required for readline to correctly retrieve multi-line commands from the history.)
Jonathan Hartley over 2 years

This answer is sublime in the appropriate wielding of awk, at which I'm awestruck. However, as @laur notes, it doesn't work for history files with timestamps in. Enabling timestamps is important because these form the delimiters in the history file that enables readline to retrieve multi-line commands.
Jonathan Hartley over 2 years

Can you explain what sponge is, and why you appended it to Clayton's answer?
Jonathan Hartley over 2 years

This is brilliant, but like many other answers here, doesn't handle history files with timestamps enabled, which is required if you want readline to be able to retrieve multi-line commands saved to your history file.
Jonathan Hartley over 2 years

$ sponge -h: soak up all input from stdin and write it to <file>. I don't yet understand why it has been appended to Clayton's answer. (although I suspect it is incidental, and the main value of this answer was using 'tac', which Clayton later incorporated in his answer too.)
Jonathan Hartley over 2 years

Aha, from man sponge: Unlike a shell redirect, sponge soaks up all its input before writing the output file. This allows constructing pipelines that read from and write to the same file.
alchemy over 2 years

This works, or appears to (I didn't check what it deleted, but the multiple exits are gone, leaving the last one entered, and it removed ~200 of 500 entries). I just had to exit the shell and re-enter (reloading the history file; there is a command for that somewhere). Thanks!
BlueC about 2 years

This is really nice, thank you!
Admin about 2 years

Another issue with the awk approach is that it works line by line, so it doesn't know where a history entry starts and ends. When a single line of a multi-line command matches another command, it fails.
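
For the timestamp and multi-line objections raised in several comments, a record-based variant might look like the sketch below. It is not taken from any of the answers: the function name is made up, and it assumes that timestamp lines have the form "#" followed by digits (what bash writes when HISTTIMEFORMAT is set) and that every line up to the next timestamp belongs to the same command. A command consisting only of "#" and digits would be mistaken for a timestamp.

```shell
# Sketch only: dedup_timestamped_history is a hypothetical helper name.
# Keeps the last occurrence of each command together with its timestamp.
dedup_timestamped_history() {
    awk '
        /^#[0-9]+$/ {        # timestamp line opens a new record
            n++; stamp[n] = $0; body[n] = ""; lines[n] = 0; stamped = 1
            next
        }
        !stamped {           # no timestamp seen yet: one line per record
            n++; stamp[n] = ""; body[n] = $0; lines[n] = 1
            next
        }
        {                    # continuation of the current record
            body[n] = (lines[n]++ ? body[n] "\n" : "") $0
        }
        END {
            # Remember the index of the last record holding each command.
            for (i = 1; i <= n; i++) if (lines[i]) last[body[i]] = i
            # Print only those records, in their original order.
            for (i = 1; i <= n; i++) {
                if (!lines[i] || last[body[i]] != i) continue
                if (stamp[i] != "") print stamp[i]
                print body[i]
            }
        }
    ' "$@"
}

printf '#100\nls\n#200\ncd ~\n#300\nls\n' | dedup_timestamped_history
# prints:
# #200
# cd ~
# #300
# ls
```

Because whole records are compared, a line inside a multi-line command that happens to equal another command is left alone. The output still has to be written to a temp file (or through sponge) before replacing the history file.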