String count with overlapping occurrences

73,328

Solution 1

Well, this might be faster since it does the comparing in C:

def occurrences(string, sub):
    count = start = 0
    while True:
        start = string.find(sub, start) + 1
        if start > 0:
            count+=1
        else:
            return count

Solution 2

>>> import re
>>> text = '1011101111'
>>> len(re.findall('(?=11)', text))
5

If you didn't want to load the whole list of matches into memory, which would never be a problem! you could do this if you really wanted:

>>> sum(1 for _ in re.finditer('(?=11)', text))
5

As a function (re.escape makes sure the substring doesn't interfere with the regex):

>>> def occurrences(text, sub):
        return len(re.findall('(?={0})'.format(re.escape(sub)), text))

>>> occurrences(text, '11')
5

Solution 3

You can also try using the new Python regex module, which supports overlapping matches.

import regex as re

def count_overlapping(text, search_for):
    return len(re.findall(search_for, text, overlapped=True))

count_overlapping('1011101111','11')  # 5

Solution 4

Python's str.count counts non-overlapping substrings:

In [3]: "ababa".count("aba")
Out[3]: 1

Here are a few ways to count overlapping sequences, I'm sure there are many more :)

Look-ahead regular expressions

How to find overlapping matches with a regexp?

In [10]: re.findall("a(?=ba)", "ababa")
Out[10]: ['a', 'a']

Generate all substrings

In [11]: data = "ababa"
In [17]: sum(1 for i in range(len(data)) if data.startswith("aba", i))
Out[17]: 2

Solution 5

def count_substring(string, sub_string):
    count = 0
    for pos in range(len(string)):
        if string[pos:].startswith(sub_string):
            count += 1
    return count

This could be the easiest way.

Share:
73,328
calccrypto
Author by

calccrypto

Updated on July 08, 2022

Comments

  • calccrypto
    calccrypto almost 2 years

    What's the best way to count the number of occurrences of a given string, including overlap in Python? This is one way:

    def function(string, str_to_search_for):
          count = 0
          for x in xrange(len(string) - len(str_to_search_for) + 1):
               if string[x:x+len(str_to_search_for)] == str_to_search_for:
                    count += 1
          return count
    
    
    function('1011101111','11')
    

    This method returns 5.

    Is there a better way in Python?