Python -- "Batch Processing" of multiple existing scripts

15,092

Solution 1

If you just need the scripts to run, a shell script is probably the easiest option.

If you want to stay in Python, the best approach is to give each script a main() function (or something similar), make each script importable, and have the batch script import each subscript and call its main().

If staying in Python:

- your three scripts must have the .py extension to be importable
- they should either be on Python's search path, or the batch control script can add their location to the path
- they should each have a main function (or whatever name you choose) that runs that script

For example:

batch_script

import sys

# Make the directory holding the subscripts importable.
sys.path.insert(0, '/location/of/subscripts')

import first_script
import second_script
import third_script

# Run each subscript on the same directory of data files.
first_script.main('/location/of/files')
second_script.main('/location/of/files')
third_script.main('/location/of/files')

example sub_script

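import os
import sys
# import some_other_stuff  (placeholder for whatever else the script needs)

SOMETHING_IMPORTANT = 'a value'

def do_frobber(a_file):
    ...  # the real per-file processing goes here

def main(path_to_files):
    # Process every file in the given directory.
    for filename in os.listdir(path_to_files):
        do_frobber(os.path.join(path_to_files, filename))

# Only runs when the script is executed directly, not when it is imported.
if __name__ == '__main__':
    main(sys.argv[1])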

This way, each subscript can be run on its own (python sub_script.py /location/of/files) or called from the batch script.
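Here is a minimal, self-contained sketch of the same pattern, assuming two hypothetical subscripts named `first_script` and `second_script` whose main() just labels the files it sees. To stay runnable anywhere, it writes those modules into a temporary directory, adds that directory to `sys.path`, and imports each one by name with `importlib.import_module` rather than writing one `import` statement per script. The names `SUBSCRIPT_TEMPLATE` and `run_batch` are illustrative only.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical subscript source: each defines main(path_to_files) and only
# runs main() itself when executed directly, not when imported.
SUBSCRIPT_TEMPLATE = '''
import os
import sys

def main(path_to_files):
    names = sorted(os.listdir(path_to_files))
    return [{label!r} + ":" + name for name in names]

if __name__ == "__main__":
    print(main(sys.argv[1]))
'''

def run_batch(script_dir, script_names, data_dir):
    """Import each subscript from script_dir and call its main() on data_dir."""
    sys.path.insert(0, script_dir)  # make the subscripts importable
    importlib.invalidate_caches()   # the .py files were created at runtime
    results = {}
    for name in script_names:
        module = importlib.import_module(name)
        results[name] = module.main(data_dir)
    return results

if __name__ == "__main__":
    script_dir = tempfile.mkdtemp()
    data_dir = tempfile.mkdtemp()
    for name in ("first_script", "second_script"):
        with open(os.path.join(script_dir, name + ".py"), "w") as f:
            f.write(SUBSCRIPT_TEMPLATE.format(label=name))
    for name in ("a.dat", "b.dat"):
        open(os.path.join(data_dir, name), "w").close()
    print(run_batch(script_dir, ["first_script", "second_script"], data_dir))
```

Looping over a list of module names is only a convenience; separate `import` statements, as in the batch script above, work just as well for three scripts.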

Solution 2

You can write a batch script in Python that uses os.walk() to generate a list of the files and then processes them one by one with your existing Python programs.

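import os
import re
import subprocess
import sys

DAT_FILE = re.compile(r'\.dat$')  # only process files ending in ".dat"

for root, dirs, files in os.walk('/path/to/files'):
    for f in files:
        if DAT_FILE.search(f):
            path = os.path.join(root, f)
            # existing_script1.py / existing_script2.py stand in for your own scripts
            subprocess.call([sys.executable, 'existing_script1.py', path])
            subprocess.call([sys.executable, 'existing_script2.py', path])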

If there are other files in the directory, you might want to add a regex to ensure you only process the files you're interested in.

EDIT: added a regular expression to ensure only files ending in ".dat" are processed.
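Since the comments below ask what a regex is, here is a small sketch of just the file-selection step, assuming the data files end in `.dat` as in the answer above. In the pattern `r'\.dat$'`, `\.` means a literal dot and `$` means end of the name; the raw string keeps the backslash intact. The function name `find_dat_files` is hypothetical.

```python
import os
import re
import tempfile

# Matches file names that end in ".dat"; \. is a literal dot, $ is end of string.
DAT_PATTERN = re.compile(r"\.dat$")

def find_dat_files(top):
    """Walk the tree under `top` and return the sorted full paths of all .dat files."""
    matches = []
    for root, dirs, files in os.walk(top):
        for name in files:
            if DAT_PATTERN.search(name):
                matches.append(os.path.join(root, name))
    return sorted(matches)

if __name__ == "__main__":
    print(bool(DAT_PATTERN.search("run01.dat")))      # True
    print(bool(DAT_PATTERN.search("run01.dat.bak")))  # False
    # Build a tiny throwaway tree to try the walker on.
    top = tempfile.mkdtemp()
    os.mkdir(os.path.join(top, "sub"))
    for rel in ("a.dat", "b.txt", os.path.join("sub", "c.dat")):
        open(os.path.join(top, rel), "w").close()
    print(find_dat_files(top))  # paths of a.dat and sub/c.dat only
```

For a plain suffix test like this one, `name.endswith('.dat')` does the same job without a regex; the regex becomes worthwhile once the naming rule is more complicated.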

Author: Palmetto_Girl86, graduate student, Physics, Lehigh University

Updated on June 05, 2022

Comments

  • Palmetto_Girl86, about 2 years

    I have written three simple scripts (which I will not post here, as they are part of my dissertation research) that are all in working order.

    What I would like to do now is write a "batch-processing" script for them. I have many data files (potentially tens of thousands) on which I want these scripts to act.

    My questions about this process are as follows:

    1. What is the most efficient way to go about this sort of thing?
    2. I am relatively new to programming. Is there a simple way to do this, or is this a very complex endeavor?

    Before anyone downvotes this question as "unresearched" or whatever negative connotation comes to mind, PLEASE just offer help. I have spent days reading documentation and following leads from Google searches, and it would be most appreciated if a human being could offer some input.

  • Palmetto_Girl86, over 9 years
    Forgive my ignorance, but what is a regex?
  • Palmetto_Girl86, over 9 years
    I need to stay in Python. What do you mean about having each script importable? Is that something that must be done to a .py file?
  • Eenvincible, over 9 years
    A regex is a regular expression.
  • Tim B, over 9 years
    @Palmetto_Girl86 - Sorry, a regex is a regular expression and is used to match string patterns. You can use them in Python with the re module. I'll update my answer to include one as an example.
  • Ethan Furman, over 9 years
    @Palmetto_Girl86: '.py' files are importable, but they also have to be accessible to Python (which means on the Python search path). So the batch script needs to be able to say import script1 (where 'script1' is your actual script name) and have your first script available to it.
  • Palmetto_Girl86, over 9 years
    Is it necessary for the file I'm importing to be in the same directory as the batch processing file, or is it sufficient to specify the path?
  • Tim B, over 9 years
    @Palmetto_Girl86 It should be ok to specify the path.
  • Ethan Furman, over 9 years
    @TimB: The path cannot be specified in an import.
  • abarnert, over 9 years
    @Palmetto_Girl86: Being in the same directory actually isn't relevant at all, because it's the current working directory, not the script directory, that matters. (Well, you can use sys.path.insert(0, os.path.abspath(os.path.dirname(sys.argv[0]))), but the point is, you still need the sys.path munging.)
  • abarnert, over 9 years
    @Palmetto_Girl86: Actually, another alternative is to put all of the scripts in a package together (a directory named pkgname with an __init__.py file), which means you run it with python -m pkgname.main; in that case, you can from . import first_script. (Or you can rename main.py to __init__.py, and then run it with python -m pkgname.)
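A runnable sketch of abarnert's package suggestion, assuming a hypothetical package named `batchpkg` that contains `__init__.py`, `first_script.py`, and a batch script `main.py`. It builds the package in a temporary directory and runs `python -m batchpkg.main` from that directory, so the relative import `from . import first_script` resolves without any `sys.path` edits. The helper name `run_package_demo` is illustrative only.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical package layout:
#   batchpkg/__init__.py
#   batchpkg/first_script.py   (defines main(path))
#   batchpkg/main.py           (the batch script, uses a relative import)
FILES = {
    "__init__.py": "",
    "first_script.py": (
        "def main(path):\n"
        "    return 'first_script ran on ' + path\n"
    ),
    "main.py": (
        "from . import first_script\n"
        "print(first_script.main('/location/of/files'))\n"
    ),
}

def run_package_demo():
    """Create batchpkg in a temp dir and run it with `python -m batchpkg.main`."""
    workdir = tempfile.mkdtemp()
    pkg_dir = os.path.join(workdir, "batchpkg")
    os.mkdir(pkg_dir)
    for name, source in FILES.items():
        with open(os.path.join(pkg_dir, name), "w") as f:
            f.write(source)
    # -m runs main.py as a module inside the package, so "from . import" works;
    # with -m, the working directory (workdir) is put on the search path.
    output = subprocess.check_output(
        [sys.executable, "-m", "batchpkg.main"], cwd=workdir
    )
    return output.decode().strip()

if __name__ == "__main__":
    print(run_package_demo())  # first_script ran on /location/of/files
```

The trade-off is that the scripts then live inside the package and are normally run with -m rather than by file path.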