How do I make Python programs behave like proper Unix tools?

Solution 1

Why not just

import sys

# Fall back to stdin when no file names are given on the command line.
files = sys.argv[1:]
if not files:
    files = ["/dev/stdin"]

for file in files:
    f = open(file)
    ...

Solution 2

Check if a filename is given as an argument, or else read from sys.stdin.

Something like this:

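import sys

# Test the argument count: `if sys.argv[1]:` raises IndexError when no argument is given.
if len(sys.argv) > 1:
    f = open(sys.argv[1])
else:
    f = sys.stdin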

It's similar to Mikel's answer, except that it uses sys.stdin from the sys module instead of opening /dev/stdin. I figure if they have it in there, it must be for a reason...
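As one of the comments on this page notes, sys.stdin is already open and should not be closed, which makes it awkward to treat exactly like a file argument. A minimal sketch of one way around that, assuming Python 3: a small context manager that closes real files but leaves stdin alone. The name `open_input` and the convention that "-" means stdin are my own assumptions, not part of the answer above.

```python
import contextlib
import sys


@contextlib.contextmanager
def open_input(path=None):
    """Yield a readable text stream: the named file, or sys.stdin.

    Hypothetical helper: path=None or "-" selects standard input.
    """
    if path is None or path == "-":
        # sys.stdin is already open; hand it out without closing it.
        yield sys.stdin
    else:
        # A real file is closed again when the with-block ends.
        with open(path) as f:
            yield f


def read_text(argv):
    """Return the whole input named by argv (the arguments without the program name)."""
    path = argv[0] if argv else None
    with open_input(path) as f:
        return f.read()
```

A script would then call something like `text = read_text(sys.argv[1:])`, and the same `with` statement works whether the input is a file or a pipe.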

Solution 3

My preferred way of doing it turns out to be... (and this is taken from a nice little Linux blog called Harbinger's Hollow)

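#!/usr/bin/env python

import argparse, sys

parser = argparse.ArgumentParser()
parser.add_argument('filename', nargs='?')
args = parser.parse_args()
if args.filename:
    # Read the named file, closing it afterwards.
    with open(args.filename) as f:
        string = f.read()
elif not sys.stdin.isatty():
    # No file name, but stdin is piped or redirected.
    string = sys.stdin.read()
else:
    # No input at all: show usage and stop, since `string` was never set.
    parser.print_help()
    sys.exit(1)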

The reason why I liked this best is that, as the blogger says, it just prints a help message if accidentally called without input. It also slots so nicely into all of my existing Python scripts that I have modified them all to include it.
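Two of the comments on this page raise points about this approach: more than one file name may be given, and bailing out on a tty is not how Unix filters usually behave. A rough sketch of a variant that addresses both, assuming Python 3: it accepts any number of file names, treats "-" or no arguments as stdin, and drops the isatty check. The function name `read_inputs` is hypothetical.

```python
import argparse
import sys


def read_inputs(argv):
    """Return the full text of each input named in argv, in order.

    No arguments, or the argument "-", mean standard input.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("files", nargs="*",
                        help="input files; '-' or no files means stdin")
    args = parser.parse_args(argv)

    texts = []
    for name in args.files or ["-"]:
        if name == "-":
            # Read stdin even when it is a terminal, like cat or grep do.
            texts.append(sys.stdin.read())
        else:
            with open(name) as f:
                texts.append(f.read())
    return texts
```

Called as `read_inputs(sys.argv[1:])`, this waits for typed input when run with no arguments at a terminal, which is the trade-off for behaving like other filters.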

Solution 4

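import sys

files = sys.argv[1:]

# With no arguments, fall back to the already-open sys.stdin.
for f in files or [sys.stdin]:
    if f is sys.stdin:
        txt = f.read()
    else:
        with open(f) as fh:
            txt = fh.read()

    process(txt)  # process() stands for your own function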
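The question mentions the fileinput module and objects that it only iterates over single lines. If the whole text is needed as one string, the lines can simply be joined back together. This is only a sketch, assuming Python 3; the name `read_all` is hypothetical, and whether it is fast enough for large inputs is something to measure rather than assume.

```python
import fileinput


def read_all(files=None):
    """Return the concatenated text of all inputs as one string.

    With files=None, fileinput falls back to sys.argv[1:], and to
    stdin when that is empty; the name "-" also means stdin.
    """
    with fileinput.input(files=files) as lines:
        # Joining the lines restores the original text, newlines included.
        return "".join(lines)
```

With this, `txt = read_all()` gives the same string whether the script is run as `progname characters` or as `cat characters | progname`.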

Author: Matthew

Updated on September 18, 2022

Comments

  • Matthew
    Matthew over 1 year

    I have a few Python scripts lying around, and I'm working on rewriting them. I have the same problem with all of them.

    It's not obvious to me how to write the programs so that they behave like proper unix tools.

    Because this

    $ cat characters | progname
    

    and this

    $ progname characters
    

    should produce the same output.

    The closest thing I could find to that in Python was the fileinput library. Unfortunately, I don't really see how to rewrite my Python scripts, all of which look like this:

    #!/usr/bin/env python
    # coding=UTF-8
    
    import sys, re
    
    for file in sys.argv[1:]:
        f = open(file)
        fs = f.read()
        # Raw string, so that \s and \w reach the regex engine unchanged.
        regexnl = re.compile(r'[^\s\w.,?!:;-]')
        rstuff = regexnl.sub('', fs)
        f.close()
        print(rstuff)
    

    The fileinput library reads stdin when no file names are given and reads the named files otherwise. But it iterates over single lines.

    import fileinput
    for line in fileinput.input():
        process(line)
    

    I really don't get that. I guess if you're dealing with small files, or if you're not doing much to the files, this may seem obvious. But, for my purposes, this makes it much slower than simply opening the entire file and reading it into a string, as above.

    Currently I run the script above like

    $ pythonscript textfilename1 > textfilename2
    

    But I want to be able to run it (and its brethren) in pipes, like

    $ grep pattern textfile1 | pythonscript | pythonscript | pythonscript > textfile2
    
  • Mikel
    Mikel over 11 years
    What if two file names are specified on the command line?
  • rahmu
    rahmu over 11 years
    Oh absolutely! I didn't bother showing it because it was already shown in your answer. At some point you have to trust the user to decide what she needs. But feel free to edit if you believe this is best. My point is only to replace open("/dev/stdin") with sys.stdin.
  • musiphil
    musiphil over 10 years
    Sometimes you do want to enter the input interactively from a tty; checking isatty and bailing out does not conform to the philosophy of Unix filters.
  • Piotr Dobrogost
    Piotr Dobrogost over 9 years
    sys.stdin should be used instead, as it's more portable than a hardcoded path to a file.
  • tripleee
    tripleee over 8 years
    Apart from the isatty wart, this covers useful and important ground not found in the other answers, so it gets my upvote.
  • smci
    smci over 8 years
    sys.stdin should be used instead, as Piotr says
  • Yibo Yang
    Yibo Yang over 7 years
    You may want to check if len(sys.argv) > 1: instead of if sys.argv[1]:, otherwise you get an index out of range error.
  • Mikel
    Mikel almost 6 years
    This is how I would have written it, if /dev/stdin were unavailable on all my systems.
  • alexis
    alexis over 5 years
    But sys.stdin is a file, and it's already open, and must not be closed. Impossible to handle just like a file argument without jumping through hoops.
  • Mikel
    Mikel over 5 years
    @alexis Sure, if you want to close f, or want to use a context manager, you need something more complex. See my new answer as an alternative.
  • alexis
    alexis over 5 years
    Why do you move the file pointer on exit? Bad idea. If input was redirected from a file, the next program will read it again. (And if stdin is a terminal, seek usually does nothing, right?) Just leave it alone.
  • Mikel
    Mikel over 5 years
    Yeah, done. I just thought it was cute to use - multiple times. :)
  • abalter
    abalter about 2 years
    Could someone explain what the Stdin class does?