How To Get All The Contiguous Substrings Of A String In Python?
Solution 1
The only improvement I could think of is, to use list comprehension like this
def get_all_substrings(input_string):
length = len(input_string)
return [input_string[i:j+1] for i in xrange(length) for j in xrange(i,length)]
print get_all_substrings('abcde')
The timing comparison between, yours and mine
def get_all_substrings(string):
length = len(string)
alist = []
for i in xrange(length):
for j in xrange(i,length):
alist.append(string[i:j + 1])
return alist
def get_all_substrings_1(input_string):
length = len(input_string)
return [input_string[i:j + 1] for i in xrange(length) for j in xrange(i,length)]
from timeit import timeit
print timeit("get_all_substrings('abcde')", "from __main__ import get_all_substrings")
# 3.33308315277
print timeit("get_all_substrings_1('abcde')", "from __main__ import get_all_substrings_1")
# 2.67816185951
Solution 2
can be done concisely with itertools.combinations
from itertools import combinations
def get_all_substrings_2(string):
length = len(string) + 1
return [string[x:y] for x, y in combinations(range(length), r=2)]
Solution 3
You could write it as a generator to save storing all the strings in memory at once if you don't need to
def get_all_substrings(string):
length = len(string)
for i in xrange(length):
for j in xrange(i + 1, length + 1):
yield(string[i:j])
for i in get_all_substrings("abcde"):
print i
you can still make a list if you really need one
alist = list(get_all_substrings("abcde"))
The function can be reduced to return a generator expression
def get_all_substrings(s):
length = len(s)
return (s[i: j] for i in xrange(length) for j in xrange(i + 1, length + 1))
Or of course you can change two characters to return a list if you don't care about memory
def get_all_substrings(s):
length = len(s)
return [s[i: j] for i in xrange(length) for j in xrange(i + 1, length + 1)]
Solution 4
I've never been fond of range(len(seq))
, how about using enumerate and just using the index value:
def indexes(seq, start=0):
return (i for i,_ in enumerate(seq, start=start))
def gen_all_substrings(s):
return (s[i:j] for i in indexes(s) for j in indexes(s[i:], i+1))
def get_all_substrings(string):
return list(gen_all_substrings(string))
print(get_all_substrings('abcde'))
Solution 5
Use itertools.permutations
to generate all pairs of possible start and end indexes,
and filter out only those where the start index is less than then end index. Then
use these pairs to return slices of the original string.
from itertools import permutations
def gen_all_substrings(s):
lt = lambda pair: pair[0] < pair[1]
index_pairs = filter(lt, permutations(range(len(s)+1), 2))
return (s[i:j] for i,j in index_pairs)
def get_all_substrings(s):
return list(gen_all_substrings(s))
print(get_all_substrings('abcde'))
Comments
-
lqhcpsgbl about 3 years
Here is my code, but I want a better solution, how do you think about the problem?
def get_all_substrings(string): length = len(string) alist = [] for i in xrange(length): for j in xrange(i,length): alist.append(string[i:j + 1]) return alist print get_all_substring('abcde')
-
John La Rooy about 10 yearsIf you're doing timeit, you might find
range
is quicker thanxrange
for such a small range -
thefourtheye about 10 years@gnibbler I get slightly bigger number when I use
range
:( -
John La Rooy about 10 years@thefourtheye, don't be sad - it's a good thing if there isn't any longer a penalty to using xrange for small ranges (apart from the Python3 thing)
-
Sonny Parlin over 5 yearsThis saved me, I was getting a memory error, I did some searching, found your solution and it solved my issue as well. Thank you.
-
Tian over 3 yearsI'm guessing the time complexity of this is O(n log n)?
-
jidicula about 3 yearsThis is much more performant than the currently selected answer too.