Python re.findall prints output as list instead of string

41,340

Solution 1

Thank you for everyone's help!

Both of the snippets below successfully printed the output as a string.

> re.findall(r'gene=[^;\n]+', line)[0]  

> re.search(r'gene=[^;\n]+', line).group()

However, I kept getting "list index out of range" errors on one of my regexes, even though results were printed when I used re.findall() without indexing:

> re.findall(r'transcript_id=[^\s]+',line)

I realized that this seemingly impossible result was because I was calling re.findall() within a for loop that was iterating over every line in a file. There were matches for some lines but not for others, so I was receiving the "list index out of range" error for those lines in which there was no match.

The code below resolved the issue:

> if re.findall(r'transcript_id=[^\s]+', line):
>     transcript = re.findall(r'transcript_id=[^\s]+', line)[0]
> else:
>     transcript = "NA"
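The same fallback can be written so that the regex runs only once per line. The following is a minimal sketch, not the original script: the `lines` list is a hypothetical stand-in for the file being iterated over, and the names `TRANSCRIPT_RE` and `transcripts` are made up for illustration.

```python
import re

# Hypothetical sample lines standing in for the file being read.
lines = [
    "ID=id5;gene=WASH7P;transcript_id=NR_024540.1",
    "ID=id6;gene=WASH7P",  # no transcript_id on this line
]

# Compile once, since the same pattern is applied to every line.
TRANSCRIPT_RE = re.compile(r"transcript_id=[^\s]+")

transcripts = []
for line in lines:
    match = TRANSCRIPT_RE.search(line)
    # search() returns None when nothing matches, so test before calling group().
    transcripts.append(match.group() if match else "NA")

print(transcripts)
```

Using re.search here avoids building a list just to take its first element, and the `if match` test covers the lines that have no match.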

Thank you!

Solution 2

It prints it as a list because it is a list.

findall():

Return all non-overlapping matches of pattern in string, as a list of strings.

To print only the string use print(re.findall(r'Name=[^;]+', line)[0]) instead.

That code assumes you have exactly one match. If you have 0 matches, you'll get an error. If you have more, you'll print only the first match.
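To make the three cases concrete, here is a small sketch. The input strings are hypothetical and chosen only to produce zero, one, and two matches for the pattern used above.

```python
import re

pattern = r"Name=[^;]+"

# Hypothetical inputs with zero, one, and two matches.
no_match = "ID=id1;Parent=rna1"
one_match = "ID=id1;Name=abc"
two_matches = "Name=abc;Name=def"

print(re.findall(pattern, no_match))     # an empty list
print(re.findall(pattern, one_match))    # a one-element list
print(re.findall(pattern, two_matches))  # both matches; [0] would keep only the first

# Indexing the empty result is what raises the error.
error_message = None
try:
    re.findall(pattern, no_match)[0]
except IndexError as exc:
    error_message = str(exc)
    print("IndexError:", error_message)
```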

To ensure you are not getting an error, check if a match was found before you use [0] (or .group() for re.search()).

s = re.search(r'Name=[^;]+', my_str)
if s:
    print(s.group())

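Alternatively, print(s[0]) works in place of print(s.group()) on Python 3.6 and later, where match objects support indexing.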

Solution 3

The error you are getting is probably because your regex returns no match in re.findall, so the resulting list is empty. Check whether the list returned by re.findall is empty before indexing it; that way an empty list will not raise an IndexError.

x = re.findall(r'Name=[^;]+', line)
if not x:
    # No match: handle the empty list here (write your logic)
    pass
else:
    print(x[0])
Author by

Ilea

Updated on March 30, 2020

Comments

  • Ilea
    Ilea about 4 years

    My re.findall search is matching and returning the right string, but when I try to print the result, it prints it as a list instead of a string. Example below:

    > line = "ID=id5;Parent=rna1;Dbxref=GeneID:653635,Genbank:NR_024540.1,HGNC:38034;gbkey=misc_RNA;gene=WASH7P;product=WAS protein family homolog 7 pseudogene;transcript_id=NR_024540.1"
    
    > print re.findall(r'gene=[^;\n]+', line)
    
    >     ['gene=WASH7P']
    

    I would like print to output just gene=WASH7P, without the brackets and quotes around it.

    How can I adjust my code so that it prints only the match?

    Thank you!