Python - How can I open a file and specify the offset in bytes?


Solution 1

You can manage the position in the file using the seek and tell methods of file objects; see https://docs.python.org/2/tutorial/inputoutput.html

The tell method tells you where to seek the next time you open the file.
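A minimal sketch of the tell/seek round trip, assuming the log is opened in binary mode ("rb") so that tell() returns a real byte offset (in Python 3 text mode, tell() returns an opaque number that is only meaningful to seek()). The file name sample.log and its contents are made up for the example.

```python
import os
import tempfile

# Create a small sample log so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "sample.log")
with open(path, "wb") as f:
    f.write(b"line1\nline2\nline3\n")

# First pass: read everything, then ask tell() for the current byte offset.
with open(path, "rb") as f:
    first_pass = f.readlines()
    offset = f.tell()  # 18: the number of bytes consumed so far

# Simulate the log growing between runs.
with open(path, "ab") as f:
    f.write(b"line4\nline5\n")

# Second pass: seek() straight to the saved offset and read only new data.
with open(path, "rb") as f:
    f.seek(offset)
    second_pass = f.readlines()

print(second_pass)  # [b'line4\n', b'line5\n']
```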

Solution 2

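# First run: read a line, then save the current offset
with open('myfile.log') as log, open('pos.dat', 'w') as pos:
    print(log.readline())
    pos.write(str(log.tell()))

# Next run: restore the saved offset and continue from there
with open('myfile.log') as log, open('pos.dat') as pos:
    log.seek(int(pos.readline()))
    print(log.readline())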

Of course, you shouldn't use it exactly like that; you should wrap the operations in functions such as save_position(myfile) and load_position(myfile). But the functionality is all there.
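One way those helpers could look, as a sketch only: the signatures below (taking the log path plus an offset), the "<log>.pos" naming convention and the read_new_lines wrapper are hypothetical choices, not something given in the answer. The log is opened in binary mode so the stored value is a plain byte offset.

```python
import os
import tempfile

def position_file(log_path):
    # Hypothetical convention: keep the offset next to the log as "<log>.pos".
    return log_path + ".pos"

def save_position(log_path, offset):
    """Persist the byte offset reached in log_path."""
    with open(position_file(log_path), "w") as pos:
        pos.write(str(offset))

def load_position(log_path):
    """Return the saved byte offset for log_path, or 0 if none exists yet."""
    try:
        with open(position_file(log_path)) as pos:
            return int(pos.read().strip() or 0)
    except FileNotFoundError:
        return 0

def read_new_lines(log_path):
    """Return lines appended since the last call and remember the new offset."""
    with open(log_path, "rb") as log:
        log.seek(load_position(log_path))
        lines = log.readlines()
        save_position(log_path, log.tell())
    return lines

# Usage example with a throwaway log file.
path = os.path.join(tempfile.mkdtemp(), "access.log")
with open(path, "wb") as f:
    f.write(b"line1\nline2\nline3\n")
first = read_new_lines(path)   # all three lines
with open(path, "ab") as f:
    f.write(b"line4\nline5\n")
second = read_new_lines(path)  # only line4 and line5
```

This simple version consumes a trailing line even if it has no newline yet, so if the web server may be mid-write when you read, you may want to stop at the last complete line instead.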

Solution 3

If your log files fit easily in memory (that is, you have a reasonable rotation policy), you can do something like:

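# get_last_lineprocessed, parse_log and store_last_lineprocessed are placeholders you supply
with open('logfile') as f:
    log_lines = f.readlines()
last_line = get_last_lineprocessed()      # from some persistent storage
parse_log(log_lines[last_line:])          # parse only the lines not seen yet
store_last_lineprocessed(len(log_lines))  # remember how far we got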
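A runnable sketch of the line-count approach, with assumed bodies for the placeholder helpers: the state file location and the parse_log logic (counting requests per client, i.e. the first whitespace-separated field of an Apache Common Log Format line) are hypothetical. The "fewer lines than before means the log was rotated" check is also an assumption and is only a heuristic.

```python
import os
import tempfile
from collections import Counter

STATE_FILE = os.path.join(tempfile.mkdtemp(), "last_line.txt")  # hypothetical location

def get_last_lineprocessed():
    """Number of lines already processed; 0 on the first run."""
    try:
        with open(STATE_FILE) as f:
            return int(f.read())
    except FileNotFoundError:
        return 0

def store_last_lineprocessed(n):
    with open(STATE_FILE, "w") as f:
        f.write(str(n))

def parse_log(lines):
    """Hypothetical parser: count requests per client (first field of each line)."""
    return Counter(line.split()[0] for line in lines if line.strip())

def process(log_path):
    with open(log_path) as f:
        log_lines = f.readlines()
    last_line = get_last_lineprocessed()
    if last_line > len(log_lines):  # heuristic: log was rotated/truncated, start over
        last_line = 0
    stats = parse_log(log_lines[last_line:])
    store_last_lineprocessed(len(log_lines))
    return stats

# Usage example with a throwaway log file.
log_path = os.path.join(tempfile.mkdtemp(), "access.log")
with open(log_path, "w") as f:
    f.write('10.0.0.1 - - [01/Mar/2020:10:00:00 +0000] "GET / HTTP/1.1" 200 512\n')
    f.write('10.0.0.2 - - [01/Mar/2020:10:00:01 +0000] "GET / HTTP/1.1" 200 512\n')
first = process(log_path)   # both clients counted once
with open(log_path, "a") as f:
    f.write('10.0.0.1 - - [01/Mar/2020:10:00:02 +0000] "GET /a HTTP/1.1" 404 128\n')
second = process(log_path)  # only the newly appended line
```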

If you cannot do this, you can use something like the answers to "Get last n lines of a file with Python, similar to tail" (see the accepted answer's use of seek and tell, in case you need to do it with them).

Author: dave

Updated on March 01, 2020

Comments

  • dave
    dave over 4 years

    I'm writing a program that will periodically parse an Apache log file to log its visitors, bandwidth usage, etc.

    The problem is, I don't want to open the log and parse data I've already parsed. For example:

    line1
    line2
    line3
    

    If I parse that file, I'll save all the lines, then save that offset. That way, when I parse it again, I get:

    line1
    line2
    line3 - The log will open from this point
    line4
    line5
    

    Second time round, I'll get line4 and line5. Hopefully this makes sense...

    What I need to know is how to accomplish this. Python has the seek() function to specify the offset... So do I just get the file size of the log (in bytes) after parsing it, then use that as the offset (in seek()) the second time I log it?

    I can't seem to think of a way to code this >.<

  • Duncan
    Duncan almost 14 years
    That would actually put the read position 3 characters from the EOF, not 3 lines.
  • dave
    dave almost 14 years
    This seems like it'll do exactly what I want. Cheers.
  • dave
    dave almost 14 years
    The logs are for virtual hosts, so currently there's no log rotation. I suppose I should look into setting that up... which would make your solution rather useful. Cheers.
  • cevaris
    cevaris about 8 years
    Hmm, it seems that link needs to be updated. It has no reference to file objects; perhaps: docs.python.org/2/tutorial/inputoutput.html
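On the question's idea of using the log's size in bytes as the next seek() offset: a sketch under a few assumptions not stated in the thread. It advances the offset by the bytes actually read rather than calling os.path.getsize() afterwards (the log can grow between parsing and measuring, which would skip lines). It treats a file smaller than the saved offset as rotated or truncated and re-reads it from 0. It leaves a trailing line without a newline for the next run, in case Apache is still writing it. The function name read_since is hypothetical, and the new data is read into memory in one go.

```python
import os
import tempfile

def read_since(path, offset):
    """Return (complete_lines, new_offset) for bytes appended after offset.

    Assumptions: a file smaller than offset is treated as rotated/truncated
    and re-read from 0; a trailing line without a newline is left for later.
    """
    if os.path.getsize(path) < offset:
        offset = 0
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    end = data.rfind(b"\n") + 1  # 0 if there is no complete line yet
    lines = data[:end].splitlines(keepends=True)
    return lines, offset + end

# Usage example with a throwaway log file.
path = os.path.join(tempfile.mkdtemp(), "access.log")
with open(path, "wb") as f:
    f.write(b"line1\nline2\nline3\npartial")
lines, pos = read_since(path, 0)    # three complete lines, pos == 18
with open(path, "ab") as f:
    f.write(b" line4\nline5\n")
more, pos = read_since(path, pos)   # the finished partial line plus line5
with open(path, "wb") as f:         # simulate rotation: a new, smaller file
    f.write(b"new1\n")
rotated, pos2 = read_since(path, pos)
```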