List of all unique characters in a string?
Solution 1
The simplest solution is probably:
In [10]: ''.join(set('aaabcabccd'))
Out[10]: 'acbd'
Note that this doesn't guarantee the order in which the letters appear in the output, even though the example might suggest otherwise.
You refer to the output as a "list". If a list is what you really want, replace ''.join
with list
:
In [1]: list(set('aaabcabccd'))
Out[1]: ['a', 'c', 'b', 'd']
As far as performance goes, worrying about it at this stage sounds like premature optimization.
Solution 2
Use an OrderedDict. This will ensure that the order is preserved
>>> ''.join(OrderedDict.fromkeys( "aaabcabccd").keys())
'abcd'
PS: I just timed both the OrderedDict and Set solution, and the later is faster. If order does not matter, set should be the natural solution, if Order Matter;s this is how you should do.
>>> from timeit import Timer
>>> t1 = Timer(stmt=stmt1, setup="from __main__ import data, OrderedDict")
>>> t2 = Timer(stmt=stmt2, setup="from __main__ import data")
>>> t1.timeit(number=1000)
1.2893918431815337
>>> t2.timeit(number=1000)
0.0632140599081196
Solution 3
For completeness sake, here's another recipe that sorts the letters as a byproduct of the way it works:
>>> from itertools import groupby
>>> ''.join(k for k, g in groupby(sorted("aaabcabccd")))
'abcd'
Solution 4
char_seen = []
for char in string:
if char not in char_seen:
char_seen.append(char)
print(''.join(char_seen))
This will preserve the order in which alphabets are coming,
output will be
abcd
Solution 5
if the result does not need to be order-preserving, then you can simply use a set
>>> ''.join(set( "aaabcabccd"))
'acbd'
>>>
Comments
-
Ali almost 2 years
I want to append characters to a string, but want to make sure all the letters in the final list are unique.
Example:
"aaabcabccd"
→"abcd"
Now of course I have two solutions in my mind. One is using a
list
that will map the characters with their ASCII codes. So whenever I encounter a letter it will set the index toTrue
. Afterwards I will scan the list and append all the ones that were set. It will have a time complexity of O(n).Another solution would be using a
dict
and following the same procedure. After mapping every char, I will do the operation for each key in the dictionary. This will have a linear running time as well.Since I am a Python newbie, I was wondering which would be more space efficient. Which one could be implemented more efficiently?
PS: Order is not important while creating the list.
-
Ali over 11 yearsThanks a lot. I forget to mention that order does not matter.
-
Aviram Segal over 11 yearsI'm pretty sure that with preserving order the complexity will be o(nlogn) and not o(n) like the set solutions
-
Ali over 11 yearsThanks a lot for the answer. I was just wondering how come this method is more efficient than the dictionary or list?
-
Marcin over 11 years@Ali It's time complexity is the same as the
dict
method (it's the same implementation), but you save on creating key-value pairs. -
NPE over 11 years@Ali: I didn't say it was more efficient (although it almost certainly is). My point is that you should focus on clarity and correctness first, and only optimize when everything is working well and you have profiled your code and know what to optimize.
-
NPE over 11 years@AviramSegal: Why is that? Surely, preserving the order just requires a linked list running through the dict elements in insertion order?
-
Ali over 11 years@NPE I was also wondering. This solution contains calling a join over a string I guess (''.join). After this will I have to like iterate through the string and add the chars into my list, or I can apply this join to a list as well?
-
Admin over 11 years@AviramSegal
OrderedDict
usesdict
internally and has hence O(1) expected lookup/membership check. Maintaining the order requires additional space (a lot), but only amortized constant additional (appending to a list if the key wasn't in the dictionary before). -
Admin over 11 years@NPE Yeah, but note that Python doesn't use linked lists (
list
is a dynamic over-allocating array). -
NPE over 11 years@Ali: If you want a list, just use
list(set('aaabcabccd'))
. Depending on your requirements, it might even make sense to keep theset
and use it directly. -
NPE over 11 years@delnan: Erm, if you look at the source for
OrderedDict
, you'll see that a linked list is exactly what it uses. -
Admin over 11 years@NPE I stand corrected,
OrderedDict
rolls its own doubly-linked list. I assumed it was just alist
and you made the newbie mistake of takinglist
to mean "linked list". -
M. Volf over 6 years@martineau he posted at almost the same moment as the author of the accepted answer...
-
Santhosh Dhaipule Chandrakanth over 5 years@All to get order in which they appear
sorted(set('aaabcabccd'), key=('aaabcabccd').index)
-
Ashish Pratap about 4 yearsFor preserving the order this seems to be working, the other method which include typecasting to set, destroys the order.