Create a compress function in Python?
Solution 1
Here is a short python implementation of a compression function:
def compress(string):
res = ""
count = 1
#Add in first character
res += string[0]
#Iterate through loop, skipping last one
for i in range(len(string)-1):
if(string[i] == string[i+1]):
count+=1
else:
if(count > 1):
#Ignore if no repeats
res += str(count)
res += string[i+1]
count = 1
#print last one
if(count > 1):
res += str(count)
return res
Here are a few examples:
>>> compress("ddaaaff")
'd2a3f2'
>>> compress("daaaafffyy")
'da4f3y2'
>>> compress("mississippi")
'mis2is2ip2i'
Solution 2
Short version with generators:
from itertools import groupby
import re
def compress(string):
return re.sub(r'(?<![0-9])[1](?![0-9])', '', ''.join('%s%s' % (char, sum(1 for _ in group)) for char, group in groupby(string)))
(1) Grouping by chars with groupby(string)
(2) Counting length of group with sum(1 for _ in group)
(because no len
on group is possible)
(3) Joining into proper format
(4) Removing 1
chars for single items when there is a no digit before and after 1
Solution 3
There are several reasons why this doesn't work. You really need to try debugging this yourself first. Put in a few print statements to trace the execution. For instance:
def compress(s):
count=0
for i in range(0, len(s)):
print "Checking character", i, s[i]
if s[i] == s[i-1]:
count += 1
c = s.count(s[i])
print "Found", s[i], c, "times"
return str(s[i]) + str(c)
print compress("ddaaaff")
Here's the output:
Checking character 0 d
Found d 2 times
Checking character 1 d
Found d 2 times
Checking character 2 a
Found a 3 times
Checking character 3 a
Found a 3 times
Checking character 4 a
Found a 3 times
Checking character 5 f
Found f 2 times
Checking character 6 f
Found f 2 times
f2
Process finished with exit code 0
(1) You throw away the results of all but the last letter's search. (2) You count all occurrences, not merely the consecutive ones. (3) You cast a string to a string -- redundant.
Try working through this example with pencil and paper. Write down the steps you use, as a human being, to parse the string. Work on translating those to Python.
Solution 4
from collections import Counter
def string_compression(string):
counter = Counter(string)
result = ''
for k, v in counter.items():
result = result + k + str(v)
print(result)
Solution 5
Just another simplest way to perform this:
def compress(str1):
output = ''
initial = str1[0]
output = output + initial
count = 1
for item in str1[1:]:
if item == initial:
count = count + 1
else:
if count == 1:
count = ''
output = output + str(count)
count = 1
initial = item
output = output + item
print (output)
Which gives the output as required, examples:
>> compress("aaaaaaaccddddeehhyiiiuuo")
a7c2d4e2h2yi3u2o
>> compress("lllhhjuuuirrdtt")
l3h2ju3ir2dt
>> compress("mississippi")
mis2is2ip2i
Cero
Updated on July 14, 2022Comments
-
Cero almost 2 years
I need to create a function called compress that compresses a string by replacing any repeated letters with a letter and number. My function should return the shortened version of the string. I've been able to count the first character but not any others.
Ex:
>>> compress("ddaaaff") 'd2a3f2' def compress(s): count=0 for i in range(0,len(s)): if s[i] == s[i-1]: count += 1 c = s.count(s[i]) return str(s[i]) + str(c)
-
Cero over 8 yearsHow would I count all occurrences?
-
Prune over 8 yearsYou did count all occurrences with the "count" function. Reworking the algorithm from here is your job; we'll help with code that's already written.
-
Cero over 8 yearsWhat would I do once I got the result of the first character?
-
Prune over 8 yearsDid you work through this with pencil & paper? What did you do there? [These are leading questions; give it a try.]
-
Prune over 8 yearsMore specifically, how did you go about turning "ddaaaff" into "d2a3f2" when you worked on paper? Note the areas of the paper where you kept partial results: those are your variables. Write down the steps you used, and where you repeated things (loops).
-
Cero over 8 yearsI'm done, your no help. going through the process only made me less sure on what my translation to python should be. Frustration beyond belief, thanks though for your attempt.
-
Prune over 8 yearsAh! You have something to translate into Python? You have something that works in another language or application? That would help a lot -- can you post it?
-
Cero over 8 yearsThank you so much, the second step is where I was confused.
-
Patrick Yu over 8 yearsGlad to hear that :-)
-
Cero over 8 yearsIs there a way to do the reverse procedure?
-
Patrick Yu over 8 yearsDo you mean by "reverse procedure" as to uncompress text? If that's what you want, you can just iterate over the compressed string, and everytime you hit an integer n, you can print the last character n times.
-
Cero over 8 yearsYes, what would the notation be for checking the integer in a string?
-
Patrick Yu over 8 yearsYou can use
isinstance(n, int )
, which returns True if n is an integer. Reference: stackoverflow.com/questions/3501382/… -
Cero over 8 yearsSorry I keep bothering you but can you provide an example of use. I'm a newb :(, and lack knowledge of proper usage of methods.
-
Patrick Yu over 8 yearsHere is my code for uncompression: ideone.com/x6BWX0 Note that I used a different integer checking method instead of isinstance, because I just found that you can't check for ints inside strings. So, I adapted my integer check code from stackoverflow.com/questions/1265665/…. Also, try printing different uncompresses, like uncompress("mis2is2ip2i").
-
Puneet Chaudhary almost 6 yearsOutput: m 1 i 4 s 4 p 2
-
Baptiste Mille-Mathias almost 6 yearsI think the author wanted number for repeated character only, so it should "mis2is2p2i"
-
Baptiste Mille-Mathias over 5 yearsindentation of the last line is wrong, so explanation for the person asking the question would be great.
-
www.hybriscx.com over 4 yearsHi Vikas - Its recommended to add textual deails and code only answers are not highly appreciated on this forum.
-
Jagrut Trivedi about 4 yearsThis will not work with 11m character. Expected output should be m11 but it will give only m.
-
finnmglas about 4 yearsDO NOT ANSWER CODE ONLY. Describe what you changed. Describe how it effects the outcome and what it improves. That greatly improves you answers quality ^^