Many threads writing to a log file at the same time in Python

Solution 1

You can simply create your own locking mechanism to ensure that only one thread is ever writing to a file.

import threading

import wmi  # third-party package (Windows only)

lock = threading.Lock()

def write_to_file(f, text, file_size):
    lock.acquire()  # thread blocks at this line until it can obtain the lock
    try:
        # in this section, only one thread can be present at a time.
        print(text, file_size, file=f)
        f.flush()  # flush while holding the lock so buffered data is not written later, outside it
    finally:
        lock.release()  # always release, even if the write raises

def filesize(asset):
    c = wmi.WMI(asset)
    wql = 'SELECT FileSize,Name FROM CIM_DataFile where (Drive="D:" OR Drive="E:") and Caption like "%file%"'
    with open("results.txt", 'a') as f:  # close the file when this thread is done
        for item in c.query(wql):
            write_to_file(f, item.Name.split("\\")[2].strip().upper(), str(item.FileSize))

You may want to consider holding the lock around the entire loop (for item in c.query(wql):) instead, so that each thread does a larger chunk of work, and writes its results contiguously, before releasing the lock.
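
A minimal sketch of that coarser locking, with made-up data standing in for the WMI query. The names write_results and run_demo and the host names are hypothetical, not part of the original code:

```python
import threading

lock = threading.Lock()

def write_results(path, asset, rows):
    """Append every row for one asset while holding the lock once."""
    with lock:  # one acquisition per batch instead of one per line
        with open(path, "a") as f:
            for name, size in rows:
                print(asset, name, size, file=f)
        # the file is closed (and therefore flushed) before the lock is released

def run_demo(path):
    # made-up results; in the real script these would come from c.query(wql)
    fake = {
        "host-a": [("A1.TXT", 10), ("A2.TXT", 20), ("A3.TXT", 30)],
        "host-b": [("B1.TXT", 40), ("B2.TXT", 50), ("B3.TXT", 60)],
    }
    threads = [
        threading.Thread(target=write_results, args=(path, asset, rows))
        for asset, rows in fake.items()
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    with open(path) as f:
        return f.read().splitlines()
```

With this arrangement the lines belonging to one host should stay together in the file, although which host comes first depends on thread scheduling. In the real script the slow query would ideally run before the lock is taken, so that only the writing is serialized.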

Solution 2

print is not thread-safe. Use the logging module instead, which is:

import logging
import threading
import time


FORMAT = '[%(levelname)s] (%(threadName)-10s) %(message)s'

logging.basicConfig(level=logging.DEBUG,
                    format=FORMAT)

file_handler = logging.FileHandler('results.log')
file_handler.setFormatter(logging.Formatter(FORMAT))
logging.getLogger().addHandler(file_handler)


def worker():
    logging.info('Starting')
    time.sleep(2)
    logging.info('Exiting')


t1 = threading.Thread(target=worker)
t2 = threading.Thread(target=worker)

t1.start()
t2.start()

Output (and contents of results.log):

[INFO] (Thread-1  ) Starting
[INFO] (Thread-2  ) Starting
[INFO] (Thread-1  ) Exiting
[INFO] (Thread-2  ) Exiting

Instead of using the default name (Thread-n), you can set your own name using the name keyword argument, which the %(threadName)s formatting directive will then use:

t = threading.Thread(name="My worker thread", target=worker)

(This example was adapted from an example from Doug Hellmann's excellent article about the threading module)

Solution 3

For another solution, use a Pool to calculate the data and return it to the parent process. The parent then writes all the data to the file. Since only one process ever writes to the file, there's no need for additional locking.

Note that the following uses a pool of processes, not threads. This makes the code much simpler than putting something together with the threading module. (There is also a ThreadPool class in multiprocessing.pool, which offers the same interface backed by threads.)

source

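import glob, os, time
from multiprocessing import Pool

def filesize(path):
    time.sleep(0.1)
    return (path, os.path.getsize(path))

if __name__ == '__main__':          # guard needed where worker processes are spawned (e.g. Windows)
    paths = glob.glob('*.py')
    pool = Pool()                   # default: proc per CPU

    with open("results.txt", 'w') as dataf:
        # only the parent process writes; workers just return (path, size)
        for (apath, asize) in pool.imap_unordered(filesize, paths):
            print(apath, asize, file=dataf)

    pool.close()
    pool.join()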

Output in results.txt:

zwrap.py 122
usercustomize.py 38
tpending.py 2345
msimple4.py 385
parse2.py 499
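
If threads are preferred over processes, the same "workers compute, one writer writes" pattern can be sketched with multiprocessing.pool.ThreadPool. This is only an illustration under assumptions: write_sizes and its workers parameter are made-up names, and os.path.getsize stands in for the real per-item work:

```python
import os
from multiprocessing.pool import ThreadPool

def filesize(path):
    # stand-in for the real (slow) per-item work
    return (path, os.path.getsize(path))

def write_sizes(paths, out_path, workers=4):
    """Compute sizes in worker threads; only the calling thread writes."""
    with ThreadPool(workers) as pool, open(out_path, "w") as dataf:
        # results arrive in completion order, not input order
        for apath, asize in pool.imap_unordered(filesize, paths):
            print(apath, asize, file=dataf)
```

Because every write happens in the calling thread, no lock is needed here either. A call such as write_sizes(glob.glob('*.py'), "results.txt") would mirror the process-based version.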
Author by

user3515946

Updated on July 09, 2022

Comments

  • user3515946
    user3515946 almost 2 years

    I am writing a script to retrieve WMI info from many computers at the same time and then write this info to a text file:

    f = open("results.txt", 'w+') ## to clean the results file before the start
    
    
    def filesize(asset):  
        f = open("results.txt", 'a+')  
        c = wmi.WMI(asset)  
        wql = 'SELECT FileSize,Name FROM CIM_DataFile where (Drive="D:" OR Drive="E:") and Caption like "%file%"'  
        for item in c.query(wql):  
            print >> f, item.Name.split("\\")[2].strip().upper(), str(item.FileSize)  
    
    
    
    
    class myThread (threading.Thread):  
        def __init__(self,name):  
            threading.Thread.__init__(self)  
            self.name = name  
        def run(self):  
            pythoncom.CoInitialize ()  
            print "Starting " + self.name       
            filesize(self.name)  
            print "Exiting " + self.name  
    
    
    
    thread1 = myThread('10.24.2.31')  
    thread2 = myThread('10.24.2.32')  
    thread3 = myThread('10.24.2.33')  
    thread4 = myThread('10.24.2.34')  
    thread1.start()  
    thread2.start()  
    thread3.start()  
    thread4.start()
    

    The problem is that all the threads are writing at the same time.

  • George L
    George L over 8 years
    If a thread tries to write_to_file while the file is locked to another thread, will both still get their turn to write?
  • Martin Konecny
    Martin Konecny over 8 years
    Yes, when the thread that has the lock releases it, the waiting thread will obtain the lock.
  • ProGrammer
    ProGrammer almost 7 years
    @MartinKonecny I am using this acquire and release method for a Python script where 4 threads write to one file simultaneously. It neatly aligns the tasks and prevents writing errors. I am working on a second (completely separate) script that writes to the same file as the first. When both run at the same time, does the acquire and release method implemented in the first script also prevent simultaneous access by the second script (where it isn't implemented)? I.e., does this method lock the file for every script attempting to access it at that time? ...curious, but the documentation is unclear.
  • Martin Konecny
    Martin Konecny almost 7 years
    Nope, this lock method will only work for threads running in the same script.
  • Mayur
    Mayur over 4 years
    Why do some people use with lock: # operation
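
On the with lock: question, a threading.Lock can be used as a context manager, so the with form acquires the lock on entry and releases it on exit, even if the body raises. A minimal sketch comparing the two spellings (the function names and the lines list are made up for illustration):

```python
import threading

lock = threading.Lock()
lines = []

def append_explicit(text):
    lock.acquire()
    try:
        lines.append(text)
    finally:
        lock.release()  # must be released by hand, even on error

def append_with(text):
    with lock:  # acquire on entry, release on exit
        lines.append(text)

def failing():
    with lock:
        raise ValueError("boom")  # the lock is still released on the way out
```

Both functions protect the append in the same way; the with form is shorter and makes it harder to forget the release.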