Remove duplicate $PATH entries with awk command

bash shell awk path

37,992

Solution 1

If you don't already have duplicates in the PATH and you only want to add directories if they are not already there, you can do it easily with the shell alone.

for x in /path/to/add …; do
  case ":$PATH:" in
    *":$x:"*) :;; # already there
    *) PATH="$x:$PATH";;
  esac
done
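
As a minimal sketch, the loop above can be wrapped in a function so that it is reusable from a startup file. The function name path_prepend and the example directories are hypothetical, not part of the original answer:

```shell
# Hypothetical helper: prepend each argument to PATH unless it is already listed.
path_prepend() {
  for x in "$@"; do
    case ":$PATH:" in
      *":$x:"*) ;;              # already there, leave PATH unchanged
      *) PATH="$x:$PATH" ;;     # not there yet, put it in front
    esac
  done
  unset x
}

# Demonstration on a throwaway value; the command substitution runs in a
# subshell, so the real PATH of the calling shell is not modified.
demo=$(PATH=/usr/bin:/bin; path_prepend /opt/tools/bin /usr/bin; printf %s "$PATH")
printf '%s\n' "$demo"
```

Here /opt/tools/bin is added in front, while /usr/bin is skipped because it is already present.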

And here's a shell snippet that removes duplicates from $PATH. It goes through the entries one by one, and copies those that haven't been seen yet.

if [ -n "$PATH" ]; then
  old_PATH=$PATH:; PATH=
  while [ -n "$old_PATH" ]; do
    x=${old_PATH%%:*}       # the first remaining entry
    case $PATH: in
      *:"$x":*) ;;          # already there
      *) PATH=$PATH:$x;;    # not there yet
    esac
    old_PATH=${old_PATH#*:}
  done
  PATH=${PATH#:}
  unset old_PATH x
fi
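
The same loop can be tried safely on a sample string instead of the live PATH. The sketch below assumes a hypothetical function name, dedup_list, and prints the result rather than assigning it:

```shell
# Hypothetical helper: print the colon-separated list in $1 with duplicates
# removed, keeping the first occurrence of each entry.
dedup_list() {
  old=$1:                     # add a trailing colon so every entry ends in ":"
  new=
  while [ -n "$old" ]; do
    x=${old%%:*}              # the first remaining entry
    case $new: in
      *:"$x":*) ;;            # already copied
      *) new=$new:$x ;;       # not copied yet
    esac
    old=${old#*:}             # drop the entry that was just handled
  done
  printf '%s\n' "${new#:}"    # strip the leading colon added by the loop
  unset old new x
}

dedup_list /usr/bin:/bin:/usr/bin:/usr/local/bin:/bin
```

To apply it for real, something like PATH=$(dedup_list "$PATH") should work under the same assumptions.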

Solution 2

Here's an intelligible one-liner solution that does all the right things: removes duplicates, preserves the ordering of paths, and doesn't add a colon at the end. So it should give you a deduplicated PATH that gives exactly the same behavior as the original:

PATH="$(perl -e 'print join(":", grep { not $seen{$_}++ } split(/:/, $ENV{PATH}))')"

It simply splits on colons (split(/:/, $ENV{PATH})), uses grep { not $seen{$_}++ } to filter out every repeated path except its first occurrence, and then joins the remaining ones back together separated by colons and prints the result (print join(":", ...)).

If you want some more structure around it, as well as the ability to deduplicate other variables as well, try this snippet, which I'm currently using in my own config:

# Deduplicate path variables
get_var () {
    eval 'printf "%s\n" "${'"$1"'}"'
}
set_var () {
    eval "$1=\"\$2\""
}
dedup_pathvar () {
    pathvar_name="$1"
    pathvar_value="$(get_var "$pathvar_name")"
    deduped_path="$(perl -e 'print join(":",grep { not $seen{$_}++ } split(/:/, $ARGV[0]))' "$pathvar_value")"
    set_var "$pathvar_name" "$deduped_path"
}
dedup_pathvar PATH
dedup_pathvar MANPATH

That code will deduplicate both PATH and MANPATH, and you can easily call dedup_pathvar on other variables that hold colon-separated lists of paths (e.g. PYTHONPATH).
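
To illustrate just the indirection part, here is a small sketch of how get_var and set_var read and write a variable whose name is held in a string. The variable names DEMO_PATHS and DEMO_COPY are made up for the example. Because both helpers use eval, the name passed as the first argument should be a trusted identifier, never user input:

```shell
# Same helpers as above: read or write a variable whose name is given in $1.
get_var () {
    eval 'printf "%s\n" "${'"$1"'}"'
}
set_var () {
    eval "$1=\"\$2\""
}

# Hypothetical sample variable; the value contains a space on purpose.
DEMO_PATHS='/opt/a:/opt/b c:/opt/a'

# Copy it to another variable by name; the value survives unchanged.
set_var DEMO_COPY "$(get_var DEMO_PATHS)"
printf '%s\n' "$DEMO_COPY"
```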

Solution 3

Here's a sleek one:

printf %s "$PATH" | awk -v RS=: -v ORS=: '!arr[$0]++'

Longer (to see how it works):

printf %s "$PATH" | awk -v RS=: -v ORS=: '{ if (!arr[$0]++) { print $0 } }'

Ok, since you're new to Linux, here is how to actually set PATH without a trailing ":":

PATH=`printf %s "$PATH" | awk -v RS=: '{ if (!arr[$0]++) {printf("%s%s",!ln++?"":":",$0)}}'`
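
To see why the trailing ":" matters, the sketch below runs the first command on a made-up sample value (the variable names are hypothetical). As the comments on this answer point out, an empty entry at the end of PATH means the current directory is searched, so the colon should be stripped before assigning:

```shell
# Hypothetical sample list with one duplicate.
sample='/usr/bin:/bin:/usr/bin'

# With ORS=: every printed record is followed by a colon, including the last.
with_colon=$(printf %s "$sample" | awk -v RS=: -v ORS=: '!arr[$0]++')
printf '%s\n' "$with_colon"        # note the colon at the end

# One way to drop it afterwards: strip a single trailing colon in the shell.
printf '%s\n' "${with_colon%:}"
```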

btw make sure to NOT have directories containing ":" in your PATH, otherwise it is gonna be messed up.

some credit to:

Solution 4

Here is an AWK one-liner.

$ PATH=$(printf %s "$PATH" \
     | awk -v RS=: -v ORS= '!a[$0]++ {if (NR>1) printf(":"); printf("%s", $0) }' )

where:

printf %s "$PATH" prints the content of $PATH without a trailing newline
RS=: changes the input record delimiter character (default is newline)
ORS= changes the output record delimiter to the empty string
a the name of an implicitly created array
$0 references the current record
a[$0] is an associative array lookup
++ is the post-increment operator
!a[$0]++ guards the action on the right-hand side, i.e. it makes sure that the current record is printed only if it was not printed before
NR the current record number, starting with 1

That means that AWK is used to split the PATH content along the : delimiter characters and to filter out duplicate entries without modifying the order.
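
To watch the counter behind !a[$0]++ at work, the following sketch prints the value of a[$0]++ next to each record for a made-up sample list (the variable names are hypothetical). This is the same experiment one of the commenters suggests running on $PATH:

```shell
# Hypothetical sample list with one duplicate.
sample='/usr/bin:/bin:/usr/bin'

# Print the pre-increment count for every record: 0 means "first time seen",
# anything larger means "duplicate".
counts=$(printf %s "$sample" | awk -v RS=: '{ print a[$0]++, $0 }')
printf '%s\n' "$counts"
```

Records shown with a count of 0 are the ones that !a[$0]++ lets through.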

Since AWK associative arrays are typically implemented as hash tables, the expected runtime is linear (i.e. O(n)).

Note that we don't need to look for quoted : characters, because shells don't provide any quoting that would support directories with : in their names in the PATH variable.

Awk + paste

The above can be simplified with paste:

$ PATH=$(printf %s "$PATH" | awk -v RS=: '!a[$0]++' | paste -s -d: -)

The paste command is used to intersperse the awk output with colons. This simplifies the awk action to printing (which is the default action).
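
As a quick check on a made-up sample value (the variable names are hypothetical), the pipeline can be run without touching PATH. The sketch passes an explicit - operand to paste so that it reads standard input, which the BSD paste mentioned in the comments appears to require:

```shell
# Hypothetical sample list with one duplicate.
sample='/usr/bin:/bin:/usr/bin:/usr/local/bin'

# awk prints each first occurrence on its own line; paste -s joins the lines
# with colons and adds no trailing colon.
joined=$(printf %s "$sample" | awk -v RS=: '!a[$0]++' | paste -s -d: -)
printf '%s\n' "$joined"
```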

Python

The same as a Python two-liner:

$ PATH=$(python3 -c 'import os; from collections import OrderedDict; \
    l=os.environ["PATH"].split(":"); print(":".join(OrderedDict.fromkeys(l)))' )

Solution 5

As long as we are adding non-awk one-liners:

PATH=$(zsh -fc "typeset -TU P=$PATH p; echo \$P")

(Could be as simple as PATH=$(zsh -fc 'typeset -U path; echo $PATH') but zsh always reads at least one zshenv configuration file, which can modify PATH.)

It uses two nice zsh features:

scalars tied to arrays (typeset -T)
and arrays that autoremove duplicate values (typeset -U).

Author by

Johnny Williem

Updated on September 18, 2022

Comments

Johnny Williem 6 months

I am trying to write a bash shell function that will allow me to remove duplicate copies of directories from my PATH environment variable.

I was told that it is possible to achieve this with a one line command using the awk command, but I cannot figure out how to do it. Anybody know how?
Ken Lassesen over 4 years

See Duplicate entries in $PATH a problem?
g2mk over 4 years

linuxjournal.com/content/removing-duplicate-path-entries
CurtainDog over 10 years

-1 this doesn't work. I still see duplicates in my path.
camh over 10 years

@dogbane: It removes duplicates for me. However, it has a subtle problem. The output has a : on the end which, if set as your $PATH, means the current directory is added to the path. This has security implications on a multi-user machine.
Sahil Chaudhary over 10 years

@dogbane, it works and I edited post to have a one line command without the trailing :
Sahil Chaudhary over 10 years

@dogbane your solution has a trailing : in the output
CurtainDog over 10 years

hmm, your third command works, but the first two do not work unless I use echo -n. Your commands don't seem to work with "here strings" e.g. try: awk -v RS=: -v ORS=: '!arr[$0]++' <<< ".:/foo/bin:/bar/bin:/foo/bin"
Johnny Williem over 10 years

so which one will actually give me the desired result?
Sahil Chaudhary over 10 years

@dogbane, right, initially I didn't notice the extra line and when I wrote the third command I forgot to update the other two. wrt <<< it adds a new line at end like echo without -n. It is a bash extension though so not portable and does not provide any advantages over piping for this task. Johnny Williem, use the third command that starts with PATH=
rush almost 10 years

btw, you can use sort -u instead of sort | uniq.
maxschlepzig almost 9 years

Since the order of the PATH elements is significant, this is not very useful.
maxschlepzig almost 9 years

This is very dangerous because you add a trailing : to the PATH (i.e. an empty string entry), because then the current working directory is part of your PATH.
Håkon Hægland about 8 years

For some reason I had to add a chomp to remove a trailing newline. This worked for me: perl -ne 'chomp; print join(":", grep { !$seen{$_}++ } split(/:/))' <<<"$PATH"
Cbhihe over 6 years

+1: nice script style, but can you explain the particular syntax: ${a[$i]+_} by editing your answer and adding one bullet. The rest is perfectly understandable but you lost me there. Thank you.
Cbhihe over 6 years

Old post, but could you explain: if ( !x[$i]++ ) . Thanks.
maxschlepzig over 6 years

@Cbhihe, I've added a bullet point that addresses this expansion.
Cbhihe over 6 years

Thank you very much. Very interesting. I did not think that was possible with arrays (non-strings)...
Eric over 6 years

It would be better to iterate over the items in $PATH in reverse, because the later ones are usually newly added and might have the up-to-date value.
Gilles 'SO- stop being evil' over 6 years

@EricWang I don't understand your reasoning. PATH elements are traversed from front to back, so when there are duplicates, the second duplicate is effectively ignored. Iterating from back to front would change the order.
Eric over 6 years

@Gilles When you have a duplicated variable in PATH, it was probably added like this: PATH=$PATH:x=b, where the x in the original PATH might have the value a. When iterating in order, the new value will be ignored, but in reversed order, the new value will take effect.
Gilles 'SO- stop being evil' over 6 years

@EricWang In that case, the added value has no effect so should be ignored. By going backwards, you're making the added value come before. If the added value had been supposed to go before, it would have been added as PATH=x:$PATH.
Eric over 6 years

@Gilles When you append something, that means it's not there yet, or you want to override the old value, so you need to make the newly added entry visible. And, by convention, entries are usually appended this way: PATH=$PATH:... not PATH=...:$PATH. Thus it's more proper to iterate in reverse order. Your way would also work, but only if people appended the other way around.
Stéphane Chazelas over 6 years

Note that echo -n outputs -n in Unix-compliant echo implementations. The standard way to output a $string without the trailing newline character is printf %s "$string", hence Gilles' edit. Generally you can't use echo for arbitrary data
Sahil Chaudhary over 6 years

@StéphaneChazelas, ok, old UNIXes. Btw the new line was confusing awk so last entry was not deduplicated. Thanks to Gilles for catching that (and fixing portability).
Stéphane Chazelas over 6 years

@akostadinov, not only old. That's the Unix requirement as in the latest version of the Unix specification (from 2013, same goes for the 2016 specification which is going out shortly). For instance /bin/sh on OS/X is based on bash and echo -n outputs -n<newline> like the Unix specification requires (POSIX leaves the behaviour unspecified for echo -n)
Alexander Mills about 6 years

looks like the former
maxschlepzig about 6 years

@AlexanderMills, well, the OP just asked about removing duplicates so this is what the awk call does.
maxschlepzig almost 6 years

@wisbucky, hm, does your paste print some error message? I tested it with 'paste (GNU coreutils) 8.25'.
wisbucky almost 6 years

It prints usage: paste [-s] [-d delimiters] file .... This is on mac, which I think uses BSD not GNU versions.
wisbucky almost 6 years

Also, I need to add spaces after the -v or else I get an error. -v RS=: -v ORS=. Just different flavors of awk syntax.
wisbucky almost 6 years

For those that don't understand the !a[$0]++ part, what's going on is that 1) a[$0]++ is creating an associative array with the path as the key, and the incrementing count as the value. The first time a unique path is seen, the value will be initialized to 0 and incremented to 1. The second time a path is seen, the value will be incremented to 2, etc. To see this clearly, run this command: printf %s "$PATH" | awk -v RS=: '{print a[$0]++, $0 }'
wisbucky almost 6 years

2) In awk, the statement before the {action} is a pattern. If pattern is TRUE, then execute the {action}. Any nonzero number is TRUE, 0 is FALSE. The first time a path is seen, the value of a[$0] is 0 (remember, we are post-incrementing), which evaluates to FALSE. The negated value ! is TRUE. Therefore, it executes the {action}, which is to print the path. All subsequent occurrences of the same path will have value > 0, so they evaluate to TRUE, and the negated values are FALSE. Therefore, the {action} is not executed.
Don Hatch over 5 years

I almost passed over this answer because it starts with an "add only if not already there" method, which I wouldn't want to use since it loses the important property of where in PATH I'm inserting the new entry (at the beginning, if I want it to win over everything else, or at the end if I want it to lose over everything else). But then you show an excellent shell-only way to remove dups; that is the valuable part of this answer.
Gilles 'SO- stop being evil' over 5 years

@DonHatch When you add-only-if-not-already-there, you can choose where to insert. Ok, I only show inserting at the beginning, but it's trivial to change the code to insert at the end.
Don Hatch over 5 years

@Gilles The problem is that if the entry is already in $PATH, then your first method won't change $PATH. I am suggesting that in that case it would be better to move the entry to the beginning (if overriding other entries is indeed what is desired). A nice way to accomplish that is to prepend the entry as usual, and then use your second function to remove dups.
Gilles 'SO- stop being evil' over 5 years

@DonHatch My own .profile is even more complicated than that (it has complex stuff to sort both existing and added entries), but not everyone needs the complexity. I generally prefer to present possibilities in order of increasing complexity.
Don Hatch over 5 years

@Gilles Certainly, but how about refraining from presenting the first possibility at all? It's an accident waiting to happen. E.g. say my original .bashrc prepends ~/bin because I want my ~/bin/cat to win over /usr/bin/cat, then I notice my path is growing so I use your first version to prevent that, without thinking about it deeply enough. Now my setup is broken in a non-obvious way. I think your answer could be improved if you would refrain from presenting the error-prone first method at all-- or, if you are attached to keeping it for some reason, at least point out that it's error prone.
Gilles 'SO- stop being evil' over 5 years

@DonHatch I want to keep it because it serves the needs of most people. I do point out that it assumes that there are no duplicates at the beginning, what more do you want? The order of addition is a different issue which is not mentioned in the question and not solved by the duplicate removal code.
MoVod over 3 years

extremely sweet! I just love one-liners...
Christian Herenz about 3 years

A problem I ran into: duplicates with and without trailing slashes, such as "/foo/bar:/foo/bar/", will not be removed, although they are equivalent within the PATH variable.
Sahil Chaudhary about 3 years

@ChristianHerenz, maybe awk can also split on /: and : at the same time, maybe with regular expression/pattern. Not sure ATM but might be a good thing to explore if you want to improve current solution.
einpoklum about 3 years

Why do you use printf rather than echo?
Ryan C. Thompson about 2 years

@KyleVassella Did you add this code to your shell startup file?