Remove duplicate $PATH entries with awk command
Solution 1
If you don't already have duplicates in the PATH
and you only want to add directories if they are not already there, you can do it easily with the shell alone.
for x in /path/to/add …; do
  case ":$PATH:" in
    *":$x:"*) :;; # already there
    *) PATH="$x:$PATH";;
  esac
done
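A comment on this answer points out that the same membership test also works when inserting at the end rather than the beginning. A minimal sketch of that variant, wrapped in a function; the name path_append and the sample directories are hypothetical and not part of the original answer:

```shell
# path_append is a hypothetical helper name, not part of the original answer.
# It appends its argument to PATH only if that directory is not already listed.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;                      # already there, leave PATH unchanged
        *) PATH="${PATH:+$PATH:}$1" ;;    # not there yet, add at the end
    esac
}

# Usage on a throwaway PATH inside a subshell, so the real PATH is untouched.
demo=$(PATH=/usr/bin:/bin; path_append /opt/tools/bin; path_append /bin; printf '%s' "$PATH")
printf '%s\n' "$demo"
```

The ${PATH:+$PATH:} expansion avoids a leading colon when PATH starts out empty.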
And here's a shell snippet that removes duplicates from $PATH. It goes through the entries one by one, and copies those that haven't been seen yet.
if [ -n "$PATH" ]; then
  old_PATH=$PATH:; PATH=
  while [ -n "$old_PATH" ]; do
    x=${old_PATH%%:*} # the first remaining entry
    case $PATH: in
      *:"$x":*) ;; # already there
      *) PATH=$PATH:$x;; # not there yet
    esac
    old_PATH=${old_PATH#*:}
  done
  PATH=${PATH#:}
  unset old_PATH x
fi
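To try the loop without touching the real PATH, the same logic can be put in a function that works on any colon-separated string. This is only a sketch: the function name dedup_colon_list, its variable names, and the sample value are made up for illustration.

```shell
# dedup_colon_list is a hypothetical name. It prints its first argument with
# repeated entries removed, keeping the first occurrence of each entry.
dedup_colon_list() {
    rest=$1:            # remaining input, with a sentinel colon appended
    out=                # result so far, built with a leading colon
    while [ -n "$rest" ]; do
        entry=${rest%%:*}               # the first remaining entry
        case $out: in
            *:"$entry":*) ;;            # already copied
            *) out=$out:$entry ;;       # not copied yet
        esac
        rest=${rest#*:}                 # drop the entry just handled
    done
    printf '%s\n' "${out#:}"            # strip the leading colon
}

deduped=$(dedup_colon_list "/usr/bin:/bin:/usr/bin:/opt/bin:/bin")
printf '%s\n' "$deduped"
```

The surrounding colons in the case pattern are what keep /bin from matching inside /usr/bin.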
Solution 2
Here's an intelligible one-liner solution that does all the right things: removes duplicates, preserves the ordering of paths, and doesn't add a colon at the end. So it should give you a deduplicated PATH that gives exactly the same behavior as the original:
PATH="$(perl -e 'print join(":", grep { not $seen{$_}++ } split(/:/, $ENV{PATH}))')"
It simply splits on colons (split(/:/, $ENV{PATH})), uses grep { not $seen{$_}++ } to filter out any repeated instances of paths except for the first occurrence, and then joins the remaining ones back together separated by colons and prints the result (print join(":", ...)).
If you want some more structure around it, as well as the ability to deduplicate other variables, try this snippet, which I'm currently using in my own config:
# Deduplicate path variables
get_var () {
  eval 'printf "%s\n" "${'"$1"'}"'
}
set_var () {
  eval "$1=\"\$2\""
}
dedup_pathvar () {
  pathvar_name="$1"
  pathvar_value="$(get_var "$pathvar_name")"
  deduped_path="$(perl -e 'print join(":",grep { not $seen{$_}++ } split(/:/, $ARGV[0]))' "$pathvar_value")"
  set_var "$pathvar_name" "$deduped_path"
}
dedup_pathvar PATH
dedup_pathvar MANPATH
That code will deduplicate both PATH and MANPATH, and you can easily call dedup_pathvar on other variables that hold colon-separated lists of paths (e.g. PYTHONPATH).
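The two eval helpers are what let dedup_pathvar take a variable name instead of a value. A small sketch of how they behave in isolation, using a made-up variable DEMO_LIST and a made-up value (neither is from the original answer):

```shell
# Same helper definitions as in the snippet above.
get_var () {
    eval 'printf "%s\n" "${'"$1"'}"'    # expands to: printf "%s\n" "${NAME}"
}
set_var () {
    eval "$1=\"\$2\""                   # expands to: NAME="$2"
}

# DEMO_LIST is a hypothetical variable used only for this demonstration.
set_var DEMO_LIST '/opt/a:/opt/b with space:$HOME'
stored=$(get_var DEMO_LIST)
printf '%s\n' "$stored"
```

Because the value is only ever referenced as "$2" inside the eval, spaces and a literal $HOME in it are stored as-is rather than being re-expanded. The variable name itself is evaluated, so it should only ever come from trusted input.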
Solution 3
Here's a sleek one:
printf %s "$PATH" | awk -v RS=: -v ORS=: '!arr[$0]++'
Longer (to see how it works):
printf %s "$PATH" | awk -v RS=: -v ORS=: '{ if (!arr[$0]++) { print $0 } }'
Ok, since you're new to linux, here is how to actually set PATH without a trailing ":"
PATH=`printf %s "$PATH" | awk -v RS=: '{ if (!arr[$0]++) {printf("%s%s",!ln++?"":":",$0)}}'`
btw make sure to NOT have directories containing ":" in your PATH, otherwise it is gonna be messed up.
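To see the difference between the first two commands and the third, here is a small sketch on a sample value instead of the real PATH. The variable names are made up for illustration; the sample string is the one used in the comments.

```shell
sample=".:/foo/bin:/bar/bin:/foo/bin"

# With ORS=: every printed record, including the last one, is followed by a colon.
with_trailing=$(printf %s "$sample" | awk -v RS=: -v ORS=: '!arr[$0]++')

# Printing the separator before every record except the first avoids that.
without_trailing=$(printf %s "$sample" | awk -v RS=: '{ if (!arr[$0]++) { printf("%s%s", !ln++ ? "" : ":", $0) } }')

printf '%s\n' "$with_trailing" "$without_trailing"
```

The first result ends in a colon, which PATH lookup treats as an empty entry, i.e. the current directory.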
Solution 4
Here is an AWK one liner.
$ PATH=$(printf %s "$PATH" \
| awk -vRS=: -vORS= '!a[$0]++ {if (NR>1) printf(":"); printf("%s", $0) }' )
where:
- printf %s "$PATH" prints the content of $PATH without a trailing newline
- RS=: changes the input record delimiter character (default is newline)
- ORS= changes the output record delimiter to the empty string
- a is the name of an implicitly created array
- $0 references the current record
- a[$0] is an associative array dereference
- ++ is the post-increment operator
- !a[$0]++ guards the right hand side, i.e. it makes sure that the current record is only printed if it wasn't printed before
- NR is the current record number, starting with 1
That means that AWK is used to split the PATH content along the : delimiter characters and to filter out duplicate entries without modifying the order.
Since AWK associative arrays are implemented as hash tables the runtime is linear (i.e. in O(n)).
Note that we don't need to look for quoted : characters because shells don't provide quoting to support directories with : in their names in the PATH variable.
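A short sketch of the guard at work on a sample value (the variable names and the sample string are made up for illustration). It uses -v RS=: with a space after -v, since a comment below reports that some awk implementations require it.

```shell
sample="/usr/bin:/bin:/usr/bin"

# Show the value a[$0] has before the post-increment, next to each record.
counts=$(printf %s "$sample" | awk -v RS=: '{ print a[$0]++, $0 }')
printf '%s\n' "$counts"

# Only records whose counter was still 0 pass the !a[$0]++ guard.
deduped=$(printf %s "$sample" | awk -v RS=: -v ORS= '!a[$0]++ { if (NR > 1) printf(":"); printf("%s", $0) }')
printf '%s\n' "$deduped"
```

The second /usr/bin shows a counter of 1, so its negation is false and the record is skipped.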
Awk + paste
The above can be simplified with paste:
$ PATH=$(printf %s "$PATH" | awk -vRS=: '!a[$0]++' | paste -s -d:)
The paste command is used to intersperse the awk output with colons. This simplifies the awk action to printing (which is the default action).
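A comment below reports that BSD paste (as on macOS) prints a usage message for this command, because it expects a file operand. A sketch of a variant that passes - to read standard input, shown on a sample value rather than the real PATH (the variable names are made up):

```shell
sample="/usr/bin:/bin:/usr/bin:/opt/bin"

# awk prints each first-seen entry on its own line;
# paste -s joins those lines with ":", and "-" tells paste to read stdin.
joined=$(printf %s "$sample" | awk -v RS=: '!a[$0]++' | paste -s -d: -)
printf '%s\n' "$joined"
```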
Python
The same as a Python two-liner:
$ PATH=$(python3 -c 'import os; from collections import OrderedDict; \
l=os.environ["PATH"].split(":"); print(":".join(OrderedDict.fromkeys(l)))' )
Solution 5
As long as we are adding non-awk oneliners:
PATH=$(zsh -fc "typeset -TU P=$PATH p; echo \$P")
(Could be as simple as PATH=$(zsh -fc 'typeset -U path; echo $PATH') but zsh always reads at least one zshenv configuration file, which can modify PATH.)
It uses two nice zsh features:
- scalars tied to arrays (typeset -T)
- and arrays that autoremove duplicate values (typeset -U).
Johnny Williem
Updated on September 18, 2022
Comments
-
Johnny Williem 6 months
I am trying to write a bash shell function that will allow me to remove duplicate copies of directories from my PATH environment variable.
I was told that it is possible to achieve this with a one line command using the awk command, but I cannot figure out how to do it. Anybody know how?
-
CurtainDog over 10 years
-1 this doesn't work. I still see duplicates in my path.
-
camh over 10 years
@dogbane: It removes duplicates for me. However it has a subtle problem. The output has a : on the end which, if set as your $PATH, means the current directory is added to the path. This has security implications on a multi-user machine.
-
Sahil Chaudhary over 10 years
@dogbane, it works and I edited post to have a one line command without the trailing :
-
Sahil Chaudhary over 10 years
@dogbane your solution has a trailing : in the output
-
CurtainDog over 10 years
hmm, your third command works, but the first two do not work unless I use echo -n. Your commands don't seem to work with "here strings" e.g. try: awk -v RS=: -v ORS=: '!arr[$0]++' <<< ".:/foo/bin:/bar/bin:/foo/bin"
-
Johnny Williem over 10 years
so which one will actually give me the desired result?
-
Sahil Chaudhary over 10 years
@dogbane, right, initially I didn't notice the extra line and when I wrote the third command I forgot to update the other two. wrt <<< it adds a new line at end like echo without -n. It is a bash extension though so not portable and does not provide any advantages over piping for this task. Johnny Williem, use the third command that starts with PATH=
-
rush almost 10 years
btw, you can use sort -u instead of sort | uniq.
-
maxschlepzig almost 9 years
Since the order of the PATH elements is significant, this is not very useful.
-
maxschlepzig almost 9 years
This is very dangerous because you add a trailing : to the PATH (i.e. an empty string entry), because then the current working directory is part of your PATH.
-
Håkon Hægland about 8 years
For some reason I had to add a chomp to remove a trailing newline. This worked for me: perl -ne 'chomp; print join(":", grep { !$seen{$_}++ } split(/:/))' <<<"$PATH"
-
Cbhihe over 6 years
+1: nice script style, but can you explain the particular syntax: ${a[$i]+_} by editing your answer and adding one bullet. The rest is perfectly understandable but you lost me there. Thank you.
-
Cbhihe over 6 years
Old post, but could you explain: if ( !x[$i]++ ). Thanks.
-
maxschlepzig over 6 years
@Cbhihe, I've added a bullet point that addresses this expansion.
-
Cbhihe over 6 years
Thank you very much. Very interesting. I did not think that was possible with arrays (non-strings)...
-
Eric over 6 years
It would be better to iterate over the items in $PATH in reverse, because the later ones are usually newly added, and they might have the up-to-date value.
-
Gilles 'SO- stop being evil' over 6 years
@EricWang I don't understand your reasoning. PATH elements are traversed from front to back, so when there are duplicates, the second duplicate is effectively ignored. Iterating from back to front would change the order.
-
Eric over 6 years
@Gilles When you have a duplicated variable in PATH, it was probably added in this way: PATH=$PATH:x=b, the x in the original PATH might have value a, thus when iterating in order the new value will be ignored, but in reversed order the new value will take effect.
-
Gilles 'SO- stop being evil' over 6 years
@EricWang In that case, the added value has no effect so should be ignored. By going backwards, you're making the added value come before. If the added value had been supposed to go before, it would have been added as PATH=x:$PATH.
-
Eric over 6 years
@Gilles When you append something, that means it's not there yet, or you want to override the old value, so you need to make the newly added variable visible. And, by convention, it's usually appended in this way: PATH=$PATH:... not PATH=...:$PATH. Thus it's more proper to iterate in reversed order. Even though your way would also work, then people append the reverse way.
-
Stéphane Chazelas over 6 years
Note that echo -n outputs -n in Unix-compliant echo implementations. The standard way to output a $string without the trailing newline character is printf %s "$string", hence Gilles' edit. Generally you can't use echo for arbitrary data.
-
Sahil Chaudhary over 6 years
@StéphaneChazelas, ok, old UNIXes. Btw the new line was confusing awk so last entry was not deduplicated. Thanks to Gilles for catching that (and fixing portability).
-
Stéphane Chazelas over 6 years
@akostadinov, not only old. That's the Unix requirement as in the latest version of the Unix specification (from 2013, same goes for the 2016 specification which is going out shortly). For instance /bin/sh on OS/X is based on bash and echo -n outputs -n<newline> like the Unix specification requires (POSIX leaves the behaviour unspecified for echo -n)
-
Alexander Mills about 6 years
looks like the former
-
maxschlepzig about 6 years
@AlexanderMills, well, the OP just asked about removing duplicates so this is what the awk call does.
-
maxschlepzig almost 6 years
@wisbucky, hm, does your paste print some error message? I tested it with 'paste (GNU coreutils) 8.25'.
-
wisbucky almost 6 years
It prints
usage: paste [-s] [-d delimiters] file ...
This is on mac, which I think uses BSD not GNU versions.
-
wisbucky almost 6 years
Also, I need to add spaces after the -v or else I get an error: -v RS=: -v ORS=. Just different flavors of awk syntax.
-
wisbucky almost 6 years
For those that don't understand the !a[$0]++ part, what's going on is that 1) a[$0]++ is creating an associative array with the path as the key, and the incrementing count as the value. The first time a unique path is seen, the value will be initialized to 0 and incremented to 1. The second time a path is seen, the value will be incremented to 2, etc. To see this clearly, run this command: printf %s "$PATH" | awk -v RS=: '{print a[$0]++, $0 }'
-
wisbucky almost 6 years
2) In awk, the statement before the {action} is a pattern. If pattern is TRUE, then execute the {action}. Any nonzero number is TRUE, 0 is FALSE. The first time a path is seen, the value of a[$0] is 0 (remember, we are post-incrementing), which evaluates to FALSE. The negated value ! is TRUE. Therefore, it executes the {action}, which is to print the path. All subsequent occurrences of the same path will have value > 0, so they evaluate to TRUE, and the negated values are FALSE. Therefore, the {action} is not executed.
-
Don Hatch over 5 years
I almost passed over this answer because it starts with an "add only if not already there" method, which I wouldn't want to use since it loses the important property of where in PATH I'm inserting the new entry (at the beginning, if I want it to win over everything else, or at the end if I want it to lose over everything else). But then you show an excellent shell-only way to remove dups; that is the valuable part of this answer.
-
Gilles 'SO- stop being evil' over 5 years
@DonHatch When you add-only-if-not-already-there, you can choose where to insert. Ok, I only show inserting at the beginning, but it's trivial to change the code to insert at the end.
-
Don Hatch over 5 years
@Gilles The problem is if the entry is already in $PATH, then your first method won't change $PATH. I am suggesting that in that case it would be better to move the entry to the beginning (if overriding other entries is indeed what is desired). A nice way to accomplish that is to prepend the entry as usual, and then use your second function to remove dups.
-
Gilles 'SO- stop being evil' over 5 years
@DonHatch My own .profile is even more complicated than that (it has complex stuff to sort both existing and added entries), but not everyone needs the complexity. I generally prefer to present possibilities in order of increasing complexity.
-
Don Hatch over 5 years
@Gilles Certainly, but how about refraining from presenting the first possibility at all? It's an accident waiting to happen. E.g. say my original .bashrc prepends ~/bin because I want my ~/bin/cat to win over /usr/bin/cat, then I notice my path is growing so I use your first version to prevent that, without thinking about it deeply enough. Now my setup is broken in a non-obvious way. I think your answer could be improved if you would refrain from presenting the error-prone first method at all-- or, if you are attached to keeping it for some reason, at least point out that it's error prone.
-
Gilles 'SO- stop being evil' over 5 years
@DonHatch I want to keep it because it serves the needs of most people. I do point out that it assumes that there are no duplicates at the beginning, what more do you want? The order of addition is a different issue which is not mentioned in the question and not solved by the duplicate removal code.
-
MoVod over 3 years
extremely sweet! I just love one-liners...
-
Christian Herenz about 3 years
Problem I ran into: duplicates with and without trailing slashes, e.g. "/foo/bar:/foo/bar/", will not be removed. However, they are equivalent within the PATH variable.
-
Sahil Chaudhary about 3 years
@ChristianHerenz, maybe awk can also split on /: and : at the same time, maybe with regular expression/pattern. Not sure ATM but might be a good thing to explore if you want to improve current solution.
-
einpoklum about 3 years
Why do you use printf rather than echo?
-
Ryan C. Thompson about 2 years
@KyleVassella Did you add this code to your shell startup file?