Create a compress function in Python?

42,102

Solution 1

Here is a short python implementation of a compression function:

def compress(string):

    res = ""

    count = 1

    #Add in first character
    res += string[0]

    #Iterate through loop, skipping last one
    for i in range(len(string)-1):
        if(string[i] == string[i+1]):
            count+=1
        else:
            if(count > 1):
                #Ignore if no repeats
                res += str(count)
            res += string[i+1]
            count = 1
    #print last one
    if(count > 1):
        res += str(count)
    return res

Here are a few examples:

>>> compress("ddaaaff")
'd2a3f2'
>>> compress("daaaafffyy")
'da4f3y2'
>>> compress("mississippi")
'mis2is2ip2i'

Solution 2

Short version with generators:

from itertools import groupby
import re
def compress(string):
    return re.sub(r'(?<![0-9])[1](?![0-9])', '', ''.join('%s%s' % (char, sum(1 for _ in group)) for char, group in groupby(string)))

(1) Grouping by chars with groupby(string)

(2) Counting length of group with sum(1 for _ in group) (because no len on group is possible)

(3) Joining into proper format

(4) Removing 1 chars for single items when there is a no digit before and after 1

Solution 3

There are several reasons why this doesn't work. You really need to try debugging this yourself first. Put in a few print statements to trace the execution. For instance:

def compress(s):
    count=0

    for i in range(0, len(s)):
        print "Checking character", i, s[i]
        if s[i] == s[i-1]:
            count += 1
        c = s.count(s[i])
        print "Found", s[i], c, "times"

    return str(s[i]) + str(c)

print compress("ddaaaff")

Here's the output:

Checking character 0 d
Found d 2 times
Checking character 1 d
Found d 2 times
Checking character 2 a
Found a 3 times
Checking character 3 a
Found a 3 times
Checking character 4 a
Found a 3 times
Checking character 5 f
Found f 2 times
Checking character 6 f
Found f 2 times
f2

Process finished with exit code 0

(1) You throw away the results of all but the last letter's search. (2) You count all occurrences, not merely the consecutive ones. (3) You cast a string to a string -- redundant.

Try working through this example with pencil and paper. Write down the steps you use, as a human being, to parse the string. Work on translating those to Python.

Solution 4

from collections import Counter
def string_compression(string):
    counter = Counter(string)
    result = ''
    for k, v in counter.items():
        result = result + k + str(v)
    print(result)

Solution 5

Just another simplest way to perform this:

def compress(str1):
    output = ''
    initial = str1[0]
    output = output + initial
    count = 1
    for item in str1[1:]:
        if item == initial:
            count = count + 1
        else:
            if count == 1:
                count = ''
            output = output + str(count)
            count = 1
            initial = item
            output = output + item
    print (output)

Which gives the output as required, examples:

>> compress("aaaaaaaccddddeehhyiiiuuo")
a7c2d4e2h2yi3u2o

>> compress("lllhhjuuuirrdtt")
l3h2ju3ir2dt

>> compress("mississippi")
mis2is2ip2i
Share:
42,102
Cero
Author by

Cero

Updated on July 14, 2022

Comments

  • Cero
    Cero almost 2 years

    I need to create a function called compress that compresses a string by replacing any repeated letters with a letter and number. My function should return the shortened version of the string. I've been able to count the first character but not any others.

    Ex:

    >>> compress("ddaaaff")
    'd2a3f2'
    
    
     def compress(s):
         count=0
    
         for i in range(0,len(s)):
             if s[i] == s[i-1]:
                 count += 1
             c = s.count(s[i])
    
         return str(s[i]) + str(c)
    
  • Cero
    Cero over 8 years
    How would I count all occurrences?
  • Prune
    Prune over 8 years
    You did count all occurrences with the "count" function. Reworking the algorithm from here is your job; we'll help with code that's already written.
  • Cero
    Cero over 8 years
    What would I do once I got the result of the first character?
  • Prune
    Prune over 8 years
    Did you work through this with pencil & paper? What did you do there? [These are leading questions; give it a try.]
  • Prune
    Prune over 8 years
    More specifically, how did you go about turning "ddaaaff" into "d2a3f2" when you worked on paper? Note the areas of the paper where you kept partial results: those are your variables. Write down the steps you used, and where you repeated things (loops).
  • Cero
    Cero over 8 years
    I'm done, your no help. going through the process only made me less sure on what my translation to python should be. Frustration beyond belief, thanks though for your attempt.
  • Prune
    Prune over 8 years
    Ah! You have something to translate into Python? You have something that works in another language or application? That would help a lot -- can you post it?
  • Cero
    Cero over 8 years
    Thank you so much, the second step is where I was confused.
  • Patrick Yu
    Patrick Yu over 8 years
    Glad to hear that :-)
  • Cero
    Cero over 8 years
    Is there a way to do the reverse procedure?
  • Patrick Yu
    Patrick Yu over 8 years
    Do you mean by "reverse procedure" as to uncompress text? If that's what you want, you can just iterate over the compressed string, and everytime you hit an integer n, you can print the last character n times.
  • Cero
    Cero over 8 years
    Yes, what would the notation be for checking the integer in a string?
  • Patrick Yu
    Patrick Yu over 8 years
    You can use isinstance(n, int ), which returns True if n is an integer. Reference: stackoverflow.com/questions/3501382/…
  • Cero
    Cero over 8 years
    Sorry I keep bothering you but can you provide an example of use. I'm a newb :(, and lack knowledge of proper usage of methods.
  • Patrick Yu
    Patrick Yu over 8 years
    Here is my code for uncompression: ideone.com/x6BWX0 Note that I used a different integer checking method instead of isinstance, because I just found that you can't check for ints inside strings. So, I adapted my integer check code from stackoverflow.com/questions/1265665/…. Also, try printing different uncompresses, like uncompress("mis2is2ip2i").
  • Puneet Chaudhary
    Puneet Chaudhary almost 6 years
    Output: m 1 i 4 s 4 p 2
  • Baptiste Mille-Mathias
    Baptiste Mille-Mathias almost 6 years
    I think the author wanted number for repeated character only, so it should "mis2is2p2i"
  • Baptiste Mille-Mathias
    Baptiste Mille-Mathias over 5 years
    indentation of the last line is wrong, so explanation for the person asking the question would be great.
  • www.hybriscx.com
    www.hybriscx.com over 4 years
    Hi Vikas - Its recommended to add textual deails and code only answers are not highly appreciated on this forum.
  • Jagrut Trivedi
    Jagrut Trivedi about 4 years
    This will not work with 11m character. Expected output should be m11 but it will give only m.
  • finnmglas
    finnmglas about 4 years
    DO NOT ANSWER CODE ONLY. Describe what you changed. Describe how it effects the outcome and what it improves. That greatly improves you answers quality ^^