Python: Split a comma-delimited string directly into a set

10,119

Solution 1

You're ignoring the fact that in order to convert anything to a set, you need to iterate it. And that iteration is exactly the same as you are already doing in order to search the original list. So there cannot be any advantage in doing this, only overhead.

Searching against a set is more efficient if you're doing it multiple times, as that allows you to amortise the cost of the conversion. But the conversion itself is always going to be a linear scan; there's no way of avoiding that.

Solution 2

No, the str.split operation always returns a list and trying to convert that into a set will cost time. Also writing your own handmade split that produces directly a set will be slower, because str.split is implemente in C (the source code should be under Objects/stringlib/split.h)

Note however that if your string does not contain a comma and you expect string to not be a substring of the elements returned by split, then you can just do:

if string in comma_delimited_string:

If string contains a comma, then your test will always fail (because by definition the elements of text.split(',') will never contain one.

The case in which the above condition fail is when you have something like:

if "a" in "aaa,bb,c".split(',')

because in this case "a" in ["aaa", "bb", "c"] fails.

Alternatively you could use a regex:

import re
if re.search(r'(^{0},)|(,{0},)|(,{0}$)|(^{0}$)'.format(re.escape(string)), comma_delimited_string):

However I don't know whether this would be faster, it probably depends on your inputs.

Solution 3

While a membership test on an existing set may be faster (O(1)) than on a list (O(n)), you'd still need to create the set from the string, which will be O(n). So there's nothing you can do about the time complexity.

You can speed up the test by a constant factor though by just scanning the string instead of constructing intermediate data structures:

(',%s,' % string) in (',%s,' % comma_delimited_string)

Don't use this unless you have a really good reason to.

Share:
10,119
dieggsy
Author by

dieggsy

Updated on June 04, 2022

Comments

  • dieggsy
    dieggsy almost 2 years

    I have some code that does something like:

    if string in comma_delimited_string.split(','):
        return True
    

    This website says that membership testing with sets and dicts is much faster that with lists or tuples. I know doing set(comma_delimited_string.split(',')) wouldn't improve speed because a list is still being created before it's converted into a set (or at least, it appeared to slow it down when I timed it).

    I was wondering then, (mostly out of curiosity than real benefit to my code), is there a way to achieve the same effect of comma_delimited_string.split(',') but directly creating a set, instead of a list, with the intention of speeding up the above operation?