Python -- "Batch Processing" of multiple existing scripts
Solution 1
If you just need to have the scripts run, probably a shell script would be the easiest thing.
If you want to stay in Python, the best way would be to have a main() (or somesuch) function in each script (and have each script importable), have the batch script import the subscript and then run its main.
If staying in Python:
- your three scripts must have the .py ending to be importable
- they should either be in Python's search path, or the batch control script can set the path
- they should each have a main function (or whatever name you choose) that will activate that script
For example:
batch_script

    import sys
    sys.path.insert(0, '/location/of/subscripts')

    import first_script
    import second_script
    import third_script

    first_script.main('/location/of/files')
    second_script.main('/location/of/files')
    third_script.main('/location/of/files')

example sub_script

    import os
    import sys
    import some_other_stuff

    SOMETHING_IMPORTANT = 'a value'

    def do_frobber(a_file):
        ...

    def main(path_to_files):
        all_files = os.listdir(path_to_files)
        for file in all_files:
            do_frobber(os.path.join(path_to_files, file))

    if __name__ == '__main__':
        main(sys.argv[1])

This way, your subscript can be run on its own, or called from the main script.
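
If the list of subscripts is likely to grow, the import-and-call pattern can be written as a loop. This is only a sketch: `run_all` and its parameters are hypothetical names, and it assumes every subscript exposes a `main(path_to_files)` function as in the example above.

```python
import importlib
import sys

def run_all(module_names, path_to_files, search_dir=None):
    """Import each named module and call its main(path_to_files), in order."""
    # Optionally make the directory holding the subscripts importable.
    if search_dir is not None and search_dir not in sys.path:
        sys.path.insert(0, search_dir)
    results = []
    for name in module_names:
        # import_module('first_script') has the same effect as `import first_script`.
        module = importlib.import_module(name)
        results.append(module.main(path_to_files))
    return results
```

Calling `run_all(['first_script', 'second_script', 'third_script'], '/location/of/files', search_dir='/location/of/subscripts')` would then do the same work as the batch_script above.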
Solution 2
You can write a batch script in Python using os.walk() to generate a list of the files and then process them one by one with your existing Python programs.

    import os, re

    # run_existing_script1/2 stand in for however you call your existing scripts
    for root, dirs, files in os.walk('/path/to/files'):
        for f in files:
            if re.match(r'.*\.dat$', f):
                run_existing_script1(os.path.join(root, f))
                run_existing_script2(os.path.join(root, f))

If there are other files in the directory you might want to add a regex to ensure you only process the files you're interested in.
EDIT - added regular expression to ensure only files ending ".dat" are processed.
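
One way to fill in the "run existing script" step without touching the scripts is to launch each one as a separate process. The sketch below assumes (this is not stated in the question) that each existing script accepts the data file's path as its first command-line argument; `find_files`, `run_scripts` and `process_all` are hypothetical helper names.

```python
import os
import subprocess
import sys

def find_files(top, suffix='.dat'):
    """Yield the full path of every file under `top` whose name ends with `suffix`."""
    for root, dirs, files in os.walk(top):
        for name in sorted(files):
            if name.endswith(suffix):
                yield os.path.join(root, name)

def run_scripts(scripts, data_file):
    """Run each script as `python script data_file`, stopping at the first failure."""
    for script in scripts:
        # check=True raises CalledProcessError if the script exits with an error.
        subprocess.run([sys.executable, script, data_file], check=True)

def process_all(top, scripts, suffix='.dat'):
    """Run every script on every matching file; return how many files were handled."""
    count = 0
    for data_file in find_files(top, suffix):
        run_scripts(scripts, data_file)
        count += 1
    return count
```

Starting a new interpreter for every script and every file adds overhead, so with tens of thousands of files the import-based approach in Solution 1 is likely to be faster.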
Comments
- Palmetto_Girl86, about 2 years ago:
I have written three simple scripts (which I will not post here, as they are part of my dissertation research) that are all in working order.
What I would like to do now is write a "batch-processing" script for them. I have many (read: potentially tens of thousands of) data files on which I want these scripts to act.
My questions about this process are as follows:
- What is the most efficient way to go about this sort of thing?
- I am relatively new to programming. Is there a simple way to do this, or is this a very complex endeavor?
Before anyone downvotes this question as "unresearched" or whatever negative connotation comes to mind, PLEASE just offer help. I have spent days reading documentation and following leads from Google searches, and it would be most appreciated if a human being could offer some input.
- Palmetto_Girl86, over 9 years ago: Forgive my ignorance, but what is a regex?
- Palmetto_Girl86, over 9 years ago: I'm needing to stay in Python. What do you mean about having each script importable? Is that something that must be done to a .py file?
- Eenvincible, over 9 years ago: Regex is regular expression
- Tim B, over 9 years ago: @Palmetto_Girl86 - Sorry, a regex is a regular expression and is used to match string patterns. You can use them in Python with the re module. I'll update my answer to include one as an example.
- Ethan Furman, over 9 years ago: @Palmetto_Girl86: '.py' files are importable, but they also have to be accessible to Python (which means in the Python search path). So the batch script needs to be able to say import script1 * and have your first script available to it. * 'script1' should be your actual script name.
- Palmetto_Girl86, over 9 years ago: Is it necessary for the file I'm importing to be in the same directory as the batch processing file, or is it sufficient to specify the path?
- Tim B, over 9 years ago: @Palmetto_Girl86 It should be ok to specify the path.
- Ethan Furman, over 9 years ago: @TimB: The path cannot be specified in an import.
- abarnert, over 9 years ago: @Palmetto_Girl86: Being in the same directory actually isn't relevant at all, because it's the current working directory, not the script directory, that matters. (Well, you can use sys.path.insert(0, os.path.abspath(os.path.dirname(sys.argv[0]))), but the point is, you still need the sys.path munging.)
- abarnert, over 9 years ago: @Palmetto_Girl86: Actually, another alternative is to put all of the scripts in a package together (a directory named pkgname with an __init__.py file), which means you run it with python -m pkgname.main; in that case, you can from . import first_script. (Or you can rename main.py to __init__.py, and then run it with python -m pkgname.)
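
The package layout described in the last comment can be tried out with a throwaway example. This is a sketch only: `build_and_run` is a hypothetical helper, the file contents are invented for illustration, and the package is written to a temporary directory so nothing on disk is affected.

```python
import os
import subprocess
import sys
import tempfile

# File contents for a throwaway package; the names mirror the comment above.
FILES = {
    '__init__.py': '',
    'first_script.py': (
        "def main(path_to_files):\n"
        "    print('first_script got', path_to_files)\n"
    ),
    'main.py': (
        "import sys\n"
        "from . import first_script  # relative import works inside a package\n"
        "first_script.main(sys.argv[1])\n"
    ),
}

def build_and_run(path_to_files):
    """Create pkgname/ in a temp dir and run it with `python -m pkgname.main`."""
    parent = tempfile.mkdtemp()
    pkg_dir = os.path.join(parent, 'pkgname')
    os.mkdir(pkg_dir)
    for filename, source in FILES.items():
        with open(os.path.join(pkg_dir, filename), 'w') as fh:
            fh.write(source)
    # -m looks the package up on sys.path, so run from its parent directory.
    result = subprocess.run(
        [sys.executable, '-m', 'pkgname.main', path_to_files],
        cwd=parent, capture_output=True, text=True, check=True)
    return result.stdout
```

Because the package is found relative to the directory the command is run from, no sys.path.insert call is needed in main.py.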