Read file object as string in Python

Solution 1

You can use Python in interactive mode to search for solutions.

If f is your object, you can enter dir(f) to see all of its methods and attributes. One of them is called read. Enter help(f.read) and it tells you that f.read() is the way to retrieve a string from a file object.
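As a rough illustration of that exploration, here is a minimal sketch. It assumes io.StringIO as a stand-in for the file-like object (in the question it would be the object returned by urllib2.urlopen), and the sample text is made up.

```python
import io

# Stand-in for the file-like object (an assumption for illustration);
# in the question this would be the urllib2 response.
f = io.StringIO(u"<title>Example</title>\n")

# dir(f) lists the object's attributes; filter out the private ones
# to see what the interactive prompt would reveal.
public_names = [name for name in dir(f) if not name.startswith("_")]
has_read = "read" in public_names

# read() with no argument returns everything up to EOF as one string.
text = f.read()

# A second read() at EOF returns an empty string.
leftover = f.read()
```

In an interactive session you would simply type dir(f) and help(f.read) instead of building the list by hand.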

Solution 2

From the documentation for file.read() (my emphasis):

file.read([size])

Read at most size bytes from the file (less if the read hits EOF before obtaining size bytes). If the size argument is negative or omitted, read all data until EOF is reached. The bytes are returned as a string object. An empty string is returned when EOF is encountered immediately. (For certain files, like ttys, it makes sense to continue reading after an EOF is hit.) Note that this method may call the underlying C function fread more than once in an effort to acquire as close to size bytes as possible. Also note that when in non-blocking mode, less data than was requested may be returned, even if no size parameter was given.

Be aware that a regexp search on a large string object may not be efficient, and consider doing the search line-by-line, using file.next() (a file object is its own iterator).
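The line-by-line approach might look something like the sketch below. The helper name find_first, the sample HTML, and the use of io.StringIO in place of a real file object are all assumptions for illustration. Iterating over the file object directly is equivalent to calling file.next() repeatedly.

```python
import io
import re


def find_first(file_obj, pattern):
    """Return the first match found while scanning line by line, or None."""
    regex = re.compile(pattern, re.IGNORECASE)
    # A file object is its own iterator, so this reads one line at a time
    # instead of loading the whole content into memory.
    for line in file_obj:
        match = regex.search(line)
        if match:
            return match
    return None


# io.StringIO stands in for the real file object (hypothetical sample data).
source = io.StringIO(u"<html>\n<title>Voidspace</title>\n</html>\n")
match = find_first(source, r"<title>(.+?)</title>")
title = match.group(1) if match else None
```

Note that this only works when the pattern cannot span a line break. If it can, reading the whole content with read() is the simpler option.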

Solution 3

Michael Foord, aka Voidspace, has an excellent tutorial on urllib2: urllib2 - The Missing Manual

What you are doing should be pretty straightforward. Observe this sample code:

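import re
import urllib2

# urlopen returns a file-like object; read() returns the whole body as a string
response = urllib2.urlopen("http://www.voidspace.org.uk/python/articles/urllib2.shtml")
html = response.read()
# Case-insensitive search for text that starts with "V" and ends with "space"
pattern = '(V.+space)'
wordPattern = re.compile(pattern, re.IGNORECASE)
results = wordPattern.search(html)
# search() returns None when nothing matches, so check before calling groups()
if results:
    print results.groups()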
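The sample above targets Python 2. In Python 3, urllib2 became urllib.request and read() returns bytes, so the content needs decoding before a string regex can be applied. The following is only a sketch of that variant: the helper name extract_groups, the UTF-8 default, and the io.BytesIO stand-in for the response are assumptions, not part of the original answer.

```python
import io
import re


def extract_groups(response, pattern, encoding="utf-8"):
    """Read a binary file-like response, decode it, and return the regex groups (or None)."""
    # The encoding is assumed here; a real page may declare a different one.
    html = response.read().decode(encoding)
    match = re.search(pattern, html, re.IGNORECASE)
    return match.groups() if match else None


# io.BytesIO stands in for the object returned by urllib.request.urlopen(url).
fake_response = io.BytesIO(b"<p>Welcome to Voidspace</p>")
groups = extract_groups(fake_response, r"(V.+space)")
```

With a real page you would pass the result of urllib.request.urlopen(url) in place of fake_response.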
Author: Oli


Updated on December 15, 2020

Comments

  • Oli
    Oli over 3 years

    I'm using urllib2 to read in a page. I need to do a quick regex on the source and pull out a few variables but urllib2 presents as a file object rather than a string.

    I'm new to Python, so I'm struggling to see how to use a file object to do this. Is there a quick way to convert it into a string?