Using regex to findall lowercase letters in string append to list. Python
Sounds like (I may be misunderstanding your question) you just need to capture runs of lowercase letters, rather than each individual lowercase letter. This is easy: just add the +
quantifier to your regular expression.
for seq in sequences:
lower_output.append(re.findall("[a-z]+", seq)) # add substrings
The +
quantifier specifies that you want "at least one, and as many as you can find in a row" of the preceding expression (in this case '[a-z]'
). So this will capture your full runs of lowercase letters all in one group, which should cause them to appear as you want them to in your output lists.
It gets a little big uglier if you want to preserve your list-of-list structure and get the indices as well, but it's still very simple:
for seq in sequences:
matches = re.finditer("[a-z]+", seq) # List of Match objects.
lower_output.append([match.group(0) for match in matches]) # add substrings
lower_indx.append([match.start(0) for match in matches]) # add indices
print lower_output
>>> [['defgdefgdefg'], ['wowhello', 'onemore'], []]
print lower_indx
>>> [[9], [9, 23], []]
O.rka
I am an academic researcher studying machine-learning and microorganisms
Updated on June 05, 2022Comments
-
O.rka almost 2 years
I'm looking for a way to get the lowercase values out of a string that has both uppercase and potentially lowercase letters
here's an example
sequences = ['CABCABCABdefgdefgdefgCABCAB','FEGFEGFEGwowhelloFEGFEGonemoreFEG','NONEARELOWERCASE'] #sequences with uppercase and potentially lowercase letters
this is what i want to output
upper_output = ['CABCABCABCABCAB','FEGFEGFEGFEGFEGFEG','NONEARELOWERCASE'] #the upper case letters joined together lower_output = [['defgdefgdefg'],['wowhello','onemore'],[]] #the lower case letters in lists within lists lower_indx = [[9],[9,23],[]] #where the lower case values occur in the original sequence
so i want the lower_output list be a LIST of SUBLISTS. the SUBLISTS would have all the strings of lowercase letters .
i was thinking of using regex . . .
import re lower_indx = [] for seq in sequences: lower_indx.append(re.findall("[a-z]", seq).start()) print lower_indx
for the lowercase lists i was trying:
lower_output = [] for seq in sequences: temp = '' temp = re.findall("[a-z]", seq) lower_output.append(temp) print lower_output
but the values are not in separate lists (i still need to join them)
[['d', 'e', 'f', 'g', 'd', 'e', 'f', 'g', 'd', 'e', 'f', 'g'], ['w', 'o', 'w', 'h', 'e', 'l', 'l', 'o', 'o', 'n', 'e', 'm', 'o', 'r', 'e'], []]
-
O.rka about 11 yearsi didn't even know that existed. any idea on how to get the index value ?
-
Henry Keiter about 11 years@draconisthe0ry Certainly; I've updated my answer to include those.