How can I remove duplicates in my .bash_history, preserving order?

Solution 1

Sorting the history

This command works like sort|uniq, but keeps the lines in place

nl|sort -k 2|uniq -f 1|sort -n|cut -f 2

Basically, this prepends its line number to each line. After the sort|uniq step, the lines are sorted back into their original order (using the line number field), and the line number field is then removed from the lines.

This solution has the flaw that it is undefined which representative of a class of equal lines makes it into the output, so the position of that line in the final output is undefined as well. However, if the latest representative should be chosen, you can sort the input by a second key:

nl|sort -k2 -k 1,1nr|uniq -f1|sort -n|cut -f2
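As a rough illustration of the second pipeline, the sketch below wraps it in a shell function and feeds it a four-line sample instead of a real history file. The function name dedup_keep_last and the sample commands are made up for this example. It assumes the GNU versions of nl, sort, uniq and cut, and commands that contain no tab characters (cut -f2 would truncate them).

```shell
# dedup_keep_last is a hypothetical name, not part of the original answer.
# Reads lines on stdin, keeps only the last occurrence of each line,
# and prints the survivors in their original relative order.
dedup_keep_last() {
    nl |                   # prefix each line with its line number and a tab
    sort -k2 -k 1,1nr |    # group equal lines; highest line number first in each group
    uniq -f1 |             # ignore the number field; keep the first line of each group
    sort -n |              # restore the original order
    cut -f2                # drop the number field
}

# Sample input stands in for a history file.
printf 'ls\ncd ~\nls\npwd\n' | dedup_keep_last
```

With this sample the output should be cd ~, ls, pwd (one per line): the earlier ls is dropped and the later one is kept.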

Managing .bash_history

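To re-read the history file and to write the in-memory history back to it, you can use history -r and history -w respectively. (history -a only appends the current session's new lines to the file.)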

Solution 2

So I was looking for the same exact thing after being annoyed by duplicates, and found that if I edit my ~/.bash_profile or my ~/.bashrc with:

export HISTCONTROL=ignoreboth:erasedups

It does exactly what you wanted: it keeps only the latest of any command. ignoreboth is shorthand for ignorespace:ignoredups, and that along with erasedups gets the job done.

At least on my Mac terminal with bash this works perfectly. Found it here on askubuntu.com.

Solution 3

Found this solution in the wild and tested:

awk '!x[$0]++'

The first time a specific line value ($0) is seen, the value of x[$0] is zero.
The value of zero is inverted with ! and becomes one.
A statement that evaluates to one causes the default action, which is print.

Therefore, the first time a specific $0 is seen, it is printed.

On every later occurrence (the repeats), the value of x[$0] has been incremented,
its negated value is zero, and a statement that evaluates to zero doesn't print.

To keep the last repeated value, reverse the history and use the same awk:

awk '!x[$0]++' ~/.bash_history                 # keep the first value repeated.

tac ~/.bash_history | awk '!x[$0]++' | tac     # keep the last.
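To see both variants side by side without touching a real history file, here is a small sketch that wraps them in functions and runs them on sample input. The names keep_first and keep_last are invented for this example, and tac is assumed to be available (GNU coreutils).

```shell
# keep_first and keep_last are hypothetical wrapper names.
# Both read the named files, or stdin when no file is given.
keep_first() { awk '!x[$0]++' "$@"; }              # first occurrence survives
keep_last()  { tac "$@" | awk '!x[$0]++' | tac; }  # last occurrence survives

printf 'ls\ncd ~\nls\npwd\n' | keep_first   # prints ls, cd ~, pwd (one per line)
printf 'ls\ncd ~\nls\npwd\n' | keep_last    # prints cd ~, ls, pwd (one per line)
```

The only difference is which copy of ls survives, and therefore where it ends up relative to cd ~.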

Solution 4

Extending Clayton's answer:

tac "$HISTFILE" | awk '!x[$0]++' | tac | sponge "$HISTFILE"

tac reverses the file. Make sure you have installed moreutils so that sponge is available; otherwise use a temp file.
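If sponge is not installed, the temp-file route could look roughly like the sketch below. The function name dedupe_history_file is made up for this example, and it assumes that mktemp and tac exist on the system. It takes the file to rewrite as its argument, so it can be tried on a copy first.

```shell
# dedupe_history_file is a hypothetical helper, not part of the original answer.
# Usage: dedupe_history_file FILE
dedupe_history_file() {
    file=$1
    tmp=$(mktemp) || return 1
    # Keep the last occurrence of each line, preserving relative order.
    # The result goes to a temp file because redirecting straight to "$file"
    # would truncate it before tac has read it.
    if tac "$file" | awk '!x[$0]++' | tac > "$tmp"; then
        cat "$tmp" > "$file"   # overwrite contents, keeping the file's permissions
    fi
    rm -f "$tmp"
}
```

It would be called as dedupe_history_file "$HISTFILE". As one of the comments notes, a shell that is already running may need history -c followed by history -r before it picks up the rewritten file.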

Solution 5

This is an old post, but a perpetual issue for users who want to have multiple terminals open, and have the history synched between windows, but not duplicated.

My solution in .bashrc:

shopt -s histappend
export HISTCONTROL=ignoreboth:erasedups
export PROMPT_COMMAND="history -n; history -w; history -c; history -r"
tac "$HISTFILE" | awk '!x[$0]++' > /tmp/tmpfile  &&
                tac /tmp/tmpfile > "$HISTFILE"
rm /tmp/tmpfile
  • The histappend option appends the history of the buffer to the end of the history file ($HISTFILE)
  • ignoreboth and erasedups prevent duplicate entries from being saved in the $HISTFILE
  • The prompt command updates the history cache
    • history -n reads all lines from $HISTFILE that have not been read yet, such as lines added from a different terminal since the last prompt
    • history -w writes the updated buffer to $HISTFILE
    • history -c wipes the buffer so no duplication occurs
    • history -r re-reads the $HISTFILE, appending to the now blank buffer
  • The awk script keeps the first occurrence of each line it encounters. tac reverses the file before awk runs and reverses it back afterwards, so the most recent occurrence of each command is the one kept and the most recent commands are still last in the history
  • rm removes the /tmp file

Every time you open a new shell, the history has all dupes wiped, and every time you hit the Enter key in a different shell/terminal window, it updates this history from the file.

Author by

cwd

Updated on September 18, 2022

Comments

  • cwd
    cwd over 1 year

    I really enjoy using control+r to recursively search my command history. I've found a few good options I like to use with it:

    # ignore duplicate commands, ignore commands starting with a space
    export HISTCONTROL=erasedups:ignorespace
    
    # keep the last 5000 entries
    export HISTSIZE=5000
    
    # append to the history instead of overwriting (good for multiple connections)
    shopt -s histappend
    

    The only problem for me is that erasedups only erases sequential duplicates - so that with this string of commands:

    ls
    cd ~
    ls
    

    The ls command will actually be recorded twice. I've thought about periodically running w/ cron:

    cat .bash_history | sort | uniq > temp.txt
    mv temp.txt .bash_history
    

    This would achieve removing the duplicates, but unfortunately the order would not be preserved. If I don't sort the file first I don't believe uniq can work properly.

    How can I remove duplicates in my .bash_history, preserving order?

    Extra Credit:

    Are there any problems with overwriting the .bash_history file via a script? For example, if you remove an apache log file I think you need to send a nohup / reset signal with kill to have it flush its connection to the file. If that is the case with the .bash_history file, perhaps I could somehow use ps to check and make sure there are no connected sessions before the filtering script is run?

    • jw013
      jw013 over 11 years
      Try ignoredups instead of erasedups for a while and see how that works for you.
    • Jazz
      Jazz over 11 years
      I don't think bash holds an open file handle to the history file - it reads/writes it when it needs to, so it should (note - should - I haven't tested) be safe to overwrite it from elsewhere.
    • Ricardo
      Ricardo over 7 years
      I just learned something new on the 1st sentence of your question. Good trick!
    • Jonathan Hartley
      Jonathan Hartley over 4 years
      I'm failing to find the man page for all the options to the history command. Where should I be looking?
    • Jonathan Hartley
      Jonathan Hartley over 2 years
      This answer unix.stackexchange.com/a/18443/8650 claims to erase all duplicates, not just sequential ones, using HISTCONTROL in conjunction with a PROMPT_COMMAND which re-reads the whole HISTFILE after every prompt, which gives erasedups a chance to erase older commands.
  • wnrph
    wnrph over 11 years
    With sort, the -r switch always reverses the sorting order. But this won't yield the result you have in mind. sort regards the two occurrences of ls as identical with the result that, even when reversed, the eventual order depends on the sorting algorithm. But see my update for another idea.
  • Nathan
    Nathan over 10 years
    In case you don't want to modify .bash_history, you could put the following in .bashrc: alias history='history | sort -k2 -k 1,1nr | uniq -f 1 | sort -n'
  • trss
    trss over 9 years
    Wow! That just worked. But it removes all but the first occurrence I guess. I'd reversed the ordering of the lines using Sublime Text before running this. Now I'll reverse it again to get a clean history with only the last occurrence of all duplicates left behind. Thank you.
  • Mohd
    Mohd over 9 years
    Check out my answer!
  • A.L
    A.L over 9 years
    What is nl at the beginning of each code line? Shouldn't it be history?
  • tralston
    tralston almost 9 years
    For those on Mac, use brew install coreutils, and notice that all the GNU utils have a g prepended to avoid confusion with the BSD built-in Mac commands (e.g. gsed is GNU whereas sed is BSD). So use gtac.
  • cbmanica
    cbmanica over 8 years
    I'm resisting the urge to downvote, but the fact, as you noted, that there is no way to choose which of equal lines makes it in the output means the awk answer below may be much more helpful for others (including for the case that brought me here).
  • wnrph
    wnrph over 8 years
    @cbmanica That was true only for the first command and was meant as a help to understand the second one. The only difference between the first and the second command is that the second one does exercise control over output sorting.
  • vaichidrewar
    vaichidrewar about 8 years
    I had to clean up my history file to remove invalid characters. I used "iconv -f utf-8 -t utf-8 -c file.txt"
  • Ricardo
    Ricardo over 7 years
    Tested on Mac OS X Yosemite and on Ubuntu 14.04
  • Georg Jung
    Georg Jung almost 7 years
    agree with @MitchBroadhead. this solves the problem within bash itself, without external cron-job. tested it on ubuntu 17.04 and 16.04 LTS
  • WeakPointer
    WeakPointer over 6 years
    works on OpenBSD too. It only removes dups of any command it is appending to the history file, which is fine for me. It has the interesting effect of shortening the history file as I enter commands that had existed as duplicates before. Now I can make my history file max shorter.
  • smilingfrog
    smilingfrog over 6 years
  • JepZ
    JepZ over 5 years
    Nice clean and general answer (not restricted to the history use-case) without launching a bazilion sub-processes ;-)
  • Dylanthepiguy
    Dylanthepiguy over 5 years
    This only ignores duplicate, consecutive commands. If you alternate repeatedly between two given commands, your bash history will fill up with duplicates
  • drescherjm
    drescherjm over 4 years
    I needed history -c and history -r to get it to use the history
  • Jonathan Hartley
    Jonathan Hartley over 4 years
    If "ignoreboth and erasedups prevent dupes from being saved", then why do you also need to do the "awk" command to remove dupes from the file? Is it because "ignoreboth and erasedups" only prevent consecutive dupes being saved? Sorry to be pedantic, I'm just trying to understand.
  • Jonathan Hartley
    Jonathan Hartley over 4 years
    Can you help me understand why, on logout, you need to append unwritten history to the history file before then rewriting the whole history file? Can't you just write the entire file without the 'append'?
  • smilingfrog
    smilingfrog over 4 years
    erasedups only erases consecutive duplicates. And you are correct that the awk command duplicates the erasedupes command making it superfluous.
  • Jonathan Hartley
    Jonathan Hartley over 4 years
    To be explicit, am I understanding right that you've shown two (splendid) solutions here, and a user only needs to execute one of them? Either the ruby one, or the Bash one?
  • laur
    laur over 3 years
    Wouldn't this sort of break if .bash_history entries are on two lines - timestamp followed by the command itself?
  • anthony
    anthony over 3 years
    fails with bash timestamps. Most things do!
  • anthony
    anthony over 3 years
    Fails with bash timestamps. Most things don't take timestamps into account. See my solution.
  • VinayChoudhary99
    VinayChoudhary99 about 3 years
    This one works perfectly.
  • anthony
    anthony about 3 years
    The only reason I do something fancy with history during logout is because I merge (with locks) the history, sorting by timestamps, and removing some 'sensitive' commands. I don't just simply append, which does not work well when you have multiple shell windows on the same machine.
  • Jonathan Hartley
    Jonathan Hartley over 2 years
    This answer contains useful information, but misleadingly claims to "do exactly what you wanted". The question states the "problem for me is that erasedups only erases sequential duplicates". This answer only explains how to use erasedups to erase sequential duplicates. It is not an answer to the actual question of how to erase all duplicates, not just sequential ones.
  • Jonathan Hartley
    Jonathan Hartley over 2 years
    This answer is bash-fu black belt, of which I am in awe. But it cannot handle history files with multi-line commands in it, or with timestamps in it. (Enabling timestamps in the history file is required for readline to correctly retrieve multi-line commands from the history.)
  • Jonathan Hartley
    Jonathan Hartley over 2 years
    This answer is sublime in the appropriate wielding of awk, at which I'm awestruck. However, as @laur notes, it doesn't work for history files with timestamps in. Enabling timestamps is important because these form the delimiters in the history file that enables readline to retrieve multi-line commands.
  • Jonathan Hartley
    Jonathan Hartley over 2 years
    Can you explain what sponge is, and why you appended it to Clayton's answer?
  • Jonathan Hartley
    Jonathan Hartley over 2 years
    This is brilliant, but like many other answers here, doesn't handle history files with timestamps enabled, which is required if you want readline to be able to retrieve multi-line commands saved to your history file.
  • Jonathan Hartley
    Jonathan Hartley over 2 years
    $ sponge -h: soak up all input from stdin and write it to <file>. I don't yet understand why it has been appended to Clayton's answer. (although I suspect it is incidental, and the main value of this answer was using 'tac', which Clayton later incorporated in his answer too.)
  • Jonathan Hartley
    Jonathan Hartley over 2 years
    Aha, from man sponge: Unlike a shell redirect, sponge soaks up all its input before writing the output file. This allows constructing pipelines that read from and write to the same file.
  • alchemy
    alchemy over 2 years
    This works, or appears to (I didn't check what it deleted, but the multiple exits are gone, leaving the last one entered, and it removed ~200 of 500 entries). I just had to exit the shell and re-enter (reloading the history file; there is a command for that somewhere). Thanks!
  • BlueC
    BlueC about 2 years
    This is really nice, thank you!
  • Admin
    Admin about 2 years
    Another issue with the awk approach is that it works line by line, so it doesn't understand where a history entry starts and ends. When a single line of a multi-line command matches another command, it fails.
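Several comments above point out that the line-by-line approaches break on history files that contain timestamps and multi-line commands. The sketch below is one possible way to handle that case, not a method taken from any of the answers. It assumes that every entry starts with a timestamp line of the form # followed by digits, and that everything up to the next such line is one command. The function name dedupe_timestamped is made up, and a command that itself looks like a timestamp line would be misread.

```shell
# dedupe_timestamped is a hypothetical helper, not from the original answers.
# Reads a timestamped history (files or stdin) and keeps only the last
# occurrence of each entry, together with that occurrence's timestamp.
dedupe_timestamped() {
    awk '
    # Store the entry collected so far, if any, and note where it was last seen.
    function flush() {
        if (!open) return
        n++
        stamp[n] = ts
        cmd[n] = body
        last[body] = n
        open = 0
        body = ""
    }
    # A timestamp line closes the previous entry and opens a new one.
    /^#[0-9]+$/ { flush(); ts = $0; open = 1; first = 1; next }
    # Inside an entry, collect all lines (multi-line commands stay together).
    open { body = (first ? $0 : body "\n" $0); first = 0; next }
    # Lines before the first timestamp are passed through unchanged.
    { print }
    END {
        flush()
        for (i = 1; i <= n; i++)
            if (last[cmd[i]] == i)
                print stamp[i] "\n" cmd[i]
    }
    ' "$@"
}

printf '#1\nls\n#2\ncd ~\n#3\nls\n' | dedupe_timestamped
```

Because whole entries are compared rather than single lines, a multi-line command is only treated as a duplicate of another entry with identical text, which avoids the partial-match problem described in the last comment.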