Python re.findall prints output as list instead of string
Solution 1
Thank you for everyone's help!
Both of the snippets below print the match as a string:
> re.findall(r'gene=[^;\n]+', line)[0]
> re.search(r'gene=[^;\n]+', line).group()
However, I kept getting "list index out of range" errors with one of my regexes, even though results printed when I used re.findall() on its own.
> re.findall(r'transcript_id=[^\s]+',line)
I realized that this seemingly impossible result was because I was calling re.findall() within a for loop that was iterating over every line in a file. There were matches for some lines but not for others, so I was receiving the "list index out of range" error for those lines in which there was no match.
The code below resolved the issue:
> if re.findall(r'transcript_id=[^\s]+', line):
>     transcript = re.findall(r'transcript_id=[^\s]+', line)[0]
> else:
>     transcript = "NA"
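A minimal sketch of the same fix that runs the regex only once per line. It assumes the input is an iterable of attribute strings; the sample lines, the compiled pattern name `TRANSCRIPT_RE`, and the function `extract_transcripts` are hypothetical. Because `re.search()` returns `None` on a line without a match, the conditional expression falls back to `"NA"` instead of indexing an empty list.

```python
import re

# Hypothetical sample lines: the second one has no transcript_id attribute.
lines = [
    "ID=id5;gene=WASH7P;transcript_id=NR_024540.1",
    "ID=id6;gene=OR4F5",
]

TRANSCRIPT_RE = re.compile(r'transcript_id=[^\s]+')


def extract_transcripts(lines):
    """Return one transcript_id match per line, or "NA" when the line has none."""
    results = []
    for line in lines:
        m = TRANSCRIPT_RE.search(line)  # None if this line has no match
        results.append(m.group() if m else "NA")
    return results


print(extract_transcripts(lines))
# ['transcript_id=NR_024540.1', 'NA']
```

Searching once per line also avoids scanning each line twice, which the `if re.findall(...)` / `re.findall(...)[0]` pair does.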
Thank you!
Solution 2
It prints it as a list because it is a list. As the re.findall() documentation says: "Return all non-overlapping matches of pattern in string, as a list of strings."
To print only the string, use print(re.findall(r'Name=[^;]+', line)[0]) instead.
That code assumes you have exactly one match. If you have 0 matches, you'll get an IndexError. If you have more, only the first match is printed.
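To make the zero/one/many cases concrete, here is a small demonstration on made-up sample strings (hypothetical, not taken from the question). It also shows joining the list, which is one way to print every match as plain text rather than only the first.

```python
import re

pattern = r'Name=[^;]+'

one_match = "ID=1;Name=alpha;Note=x"
two_matches = "Name=alpha;Name=beta"
no_match = "ID=2;Note=y"

# Exactly one match: [0] gives the string itself.
print(re.findall(pattern, one_match)[0])    # Name=alpha

# Several matches: [0] silently keeps only the first one.
print(re.findall(pattern, two_matches)[0])  # Name=alpha

# To print every match as plain text, join the list instead.
print(", ".join(re.findall(pattern, two_matches)))  # Name=alpha, Name=beta

# No match: findall returns [], and indexing it raises IndexError.
try:
    re.findall(pattern, no_match)[0]
except IndexError as err:
    print("IndexError:", err)  # IndexError: list index out of range
```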
To make sure you don't get an error, check that a match was found before you use [0] (or .group() for re.search()):
s = re.search(r'Name=[^;]+', my_str)
if s:
    print(s.group())  # or print(s[0])
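A short sketch of the re.search() route on a hypothetical sample line. Indexing a match object with `s[0]` returns the same string as `s.group()`, but it requires Python 3.6 or later; on older versions only `.group()` works.

```python
import re

line = "ID=id5;Name=WASH7P;gbkey=misc_RNA"  # hypothetical sample line

s = re.search(r'Name=[^;]+', line)
if s:
    # Both forms return the whole match (group 0) as a str.
    print(s.group())  # Name=WASH7P
    print(s[0])       # Name=WASH7P (indexing a Match needs Python 3.6+)

# With no match, re.search returns None, so calling .group() on it would fail.
print(re.search(r'Name=[^;]+', "ID=id6"))  # None
```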
Solution 3
The error you are getting could be because your regex is not finding any match, in which case re.findall returns an empty list. Check whether the list returned by re.findall is empty before indexing it, so that an empty result does not raise an IndexError:
x = re.findall(r'Name=[^;]+', line)
if not x:  # empty list: no match on this line
    pass  # write your logic here
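The check can be wrapped in a small helper. This is a sketch: the function name `first_name` and the `"NA"` default (borrowed from Solution 1) are illustrative choices, not part of the original answer.

```python
import re


def first_name(line, default="NA"):
    """Return the first Name=... match in line, or default if there is none."""
    x = re.findall(r'Name=[^;]+', line)
    if not x:  # an empty list is falsy, so this is the same test as len(x) == 0
        return default
    return x[0]


print(first_name("ID=1;Name=alpha"))  # Name=alpha
print(first_name("ID=2;Note=y"))      # NA
```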
Question
Ilea, updated on March 30, 2020
My re.findall search is matching and returning the right string, but when I try to print the result, it prints it as a list instead of a string. Example below:
> line = 'ID=id5;Parent=rna1;Dbxref=GeneID:653635,Genbank:NR_024540.1,HGNC:38034;gbkey=misc_RNA;gene=WASH7P;product=WAS protein family homolog 7 pseudogene;transcript_id=NR_024540.1'
> print(re.findall(r'gene=[^;\n]+', line))
> ['gene=WASH7P']
I would like print to output just
gene=WASH7P
without the brackets and quotes around it. How can I adjust my code so that it prints only the match?
Thank you!
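For reference, a runnable Python 3 version of the example in the question, with the line stored as a string and the fix from the answers applied. The only change needed is printing the list's first element instead of the list itself.

```python
import re

# The example line from the question, stored as a string.
line = ("ID=id5;Parent=rna1;Dbxref=GeneID:653635,Genbank:NR_024540.1,HGNC:38034;"
        "gbkey=misc_RNA;gene=WASH7P;product=WAS protein family homolog 7 pseudogene;"
        "transcript_id=NR_024540.1")

matches = re.findall(r'gene=[^;\n]+', line)
print(matches)     # ['gene=WASH7P']  (a list, hence the brackets and quotes)
print(matches[0])  # gene=WASH7P      (the str element itself)
```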