Downloading text files with Python and ftplib.FTP from z/os


Solution 1

Just came across this question as I was trying to figure out how to recursively download datasets from z/OS. I've been using a simple Python script for years now to download EBCDIC files from the mainframe. It effectively just does this:

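# ftp is an already logged-in ftplib.FTP session; filename is the dataset to fetch
def writeline(line):
    # retrlines() strips the line terminator, so add it back
    file.write(line + "\n")

file = open(filename, "w")
ftp.retrlines("RETR " + filename, writeline)
file.close()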

Solution 2

You should be able to download the file as a binary (using retrbinary) and use the codecs module to convert from EBCDIC to whatever output encoding you want. You should know the specific EBCDIC code page being used on the z/OS system (e.g. cp500). If the files are small, you could even do something like (for a conversion to UTF-8):

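# Read the raw EBCDIC bytes (a file previously fetched with retrbinary)
with open(ebcdic_filename, "rb") as src:
    data = src.read()
# Decode from the host code page, then re-encode as UTF-8
converted = data.decode("cp500").encode("utf8")
with open(utf8_filename, "wb") as dst:
    dst.write(converted)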
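If the binary transfer delivers the records back to back with no terminators (as the comments further down suggest for a PDS), the decoded data has to be cut into records before it is written. A minimal sketch, assuming fixed-length records; the function name `ebcdic_records_to_text`, the record length of 80 and the cp500 code page are illustrative assumptions, not values confirmed by the question:

```python
def ebcdic_records_to_text(data: bytes, lrecl: int = 80, codepage: str = "cp500") -> str:
    """Split back-to-back fixed-length EBCDIC records into newline-terminated text.

    Assumes every record is exactly `lrecl` bytes long (hypothetical default of 80).
    """
    lines = []
    for start in range(0, len(data), lrecl):
        record = data[start:start + lrecl]
        # Decode the record and drop the blank padding on the right
        lines.append(record.decode(codepage).rstrip(" "))
    return "".join(line + "\n" for line in lines)


# Build two 80-byte sample records locally instead of contacting a host
sample = "".join(s.ljust(80) for s in ("HELLO", "WORLD")).encode("cp500")
text = ebcdic_records_to_text(sample)
```

With a real session, the bytes could be collected with something like `ftp.retrbinary("RETR " + name, chunks.append)` followed by `b"".join(chunks)`.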

Update: If you need to use retrlines to get the lines and your lines are coming back in the correct encoding, your approach will not work, because the callback is called once for each line. So in the callback, sequence will be the line, and your for loop will write the individual characters of the line to the output, each on its own line. So you probably want to do self.write(sequence + "\r\n") rather than the for loop. It still doesn't feel especially right to subclass file just to add this utility method, though - it probably needs to be in a different class in your bells-and-whistles version.
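The per-character effect can be reproduced without an FTP server by calling both kinds of callback by hand, once per line, the way retrlines would. A small sketch; `buggy_callback`, `fixed_callback` and the sample lines are made up for illustration, and "\n" is used as the terminator to keep the demonstration about the loop only:

```python
import io

buggy_out = io.StringIO()
fixed_out = io.StringIO()


def buggy_callback(sequence):
    # Iterating over a str yields single characters, not lines
    for s in sequence:
        buggy_out.write(s + "\n")


def fixed_callback(line):
    # The callback already receives one whole line per call
    fixed_out.write(line + "\n")


for received in ["AB", "CD"]:  # stand-in for the lines retrlines() would pass
    buggy_callback(received)
    fixed_callback(received)

buggy_text = buggy_out.getvalue()
fixed_text = fixed_out.getvalue()
```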

Solution 3

Your writelineswitheol method appends '\r\n' instead of '\n' and then writes the result to a file opened in text mode. The effect, no matter what platform you are running on, will be an unwanted '\r'. Just append '\n' and you will get the appropriate line ending.
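The doubled carriage return can be shown on any platform by forcing Windows-style newline translation with the `newline` argument of `open()`. A sketch using a temporary file; the file name and sample record are arbitrary:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# newline="\r\n" mimics text mode on Windows: each "\n" written becomes "\r\n"
with open(path, "w", newline="\r\n") as out:
    out.write("record one" + "\r\n")   # what writelineswitheol appends
with open(path, "rb") as raw:
    with_cr = raw.read()

with open(path, "w", newline="\r\n") as out:
    out.write("record one" + "\n")     # the suggested fix
with open(path, "rb") as raw:
    without_cr = raw.read()
```

The first write leaves a stray "\r" before the translated line ending; the second produces a single, correct "\r\n".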

Proper error handling should not be relegated to a "bells and whistles" version. Set up your callback object so that (1) the file open() is in a try/except and the object retains a reference to the output file handle, (2) the write call is in a try/except, and (3) there is a callback_obj.close() method, called when retrlines() returns, that explicitly does file_handle.close() (also in a try/except). That way you get explicit error handling, e.g. messages like "can't (open|write to|close) file X because Y", AND you no longer have to think about when your files are going to be implicitly closed or whether you risk running out of file handles.
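One possible shape for such a callback object is sketched below. The class name `LineSink`, its method names and the choice of `RuntimeError` are my own assumptions for illustration; the lines are fed in by hand here so the example runs without a host:

```python
import os
import tempfile


class LineSink:
    """Callback object for retrlines(): open, write and close with explicit errors."""

    def __init__(self, path):
        self.path = path
        try:
            self.handle = open(path, "w")
        except OSError as exc:
            raise RuntimeError("can't open file %s because %s" % (path, exc)) from exc

    def write_line(self, line):
        try:
            self.handle.write(line + "\n")
        except OSError as exc:
            raise RuntimeError("can't write to file %s because %s" % (self.path, exc)) from exc

    def close(self):
        try:
            self.handle.close()
        except OSError as exc:
            raise RuntimeError("can't close file %s because %s" % (self.path, exc)) from exc


path = os.path.join(tempfile.mkdtemp(), "MEMBER.txt")
sink = LineSink(path)
for line in ["FIRST", "SECOND"]:   # retrlines() would make these calls
    sink.write_line(line)
sink.close()
with open(path) as check:
    written = check.read()

# Opening inside a directory that does not exist triggers the explicit message
bad_path = os.path.join(tempfile.mkdtemp(), "no_such_dir", "X.txt")
try:
    LineSink(bad_path)
    open_error = None
except RuntimeError as exc:
    open_error = str(exc)
```

With a real session the call would look like `sess.retrlines("RETR " + name, sink.write_line)`, followed by `sink.close()` (ideally in a `finally` clause).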

Python 3.x ftplib.FTP.retrlines() should give you str objects which are in effect Unicode strings, and you will need to encode them before you write them -- unless the default encoding is latin1 which would be rather unusual for a Windows box. You should have test files with (1) all possible 256 bytes (2) all bytes that are valid in the expected EBCDIC codepage.
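Test data of the first kind can be generated locally. A sketch, using Python's built-in cp500 codec as a stand-in for whatever code page the host really uses (that choice, the file name and the UTF-8 output encoding are assumptions):

```python
import os
import tempfile

all_bytes = bytes(range(256))            # test case (1): every possible byte value
decoded = all_bytes.decode("cp500")      # str, as Python 3 would hand it over
round_trip = decoded.encode("cp500")     # back to the original EBCDIC bytes

# Write with an explicit encoding instead of relying on the platform default;
# newline="" switches off newline translation so control characters survive
path = os.path.join(tempfile.mkdtemp(), "all_bytes.txt")
with open(path, "w", encoding="utf-8", newline="") as out:
    out.write(decoded)
with open(path, "r", encoding="utf-8", newline="") as check:
    read_back = check.read()
```

Comparing `round_trip` with `all_bytes`, and `read_back` with `decoded`, shows whether any byte is lost or altered along the way.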

[a few "sanitation" remarks]

  1. You should consider upgrading your Python from 3.0 (a "proof of concept" release) to 3.1.

  2. To facilitate better understanding of your code, use "i" as an identifier only as a sequence index and only if you irredeemably acquired the habit from FORTRAN 3 or more decades ago :-)

  3. Two of the problems discovered so far (appending line terminator to each character, wrong line terminator) would have shown up the first time you tested it.

Author by

mikeramos

Old-ish IT Geezer, young at heart, memoir fanboy

Updated on June 13, 2022

Comments

  • mikeramos
    mikeramos almost 2 years

    I'm trying to automate downloading of some text files from a z/os PDS, using Python and ftplib.

    Since the host files are EBCDIC, I can't simply use FTP.retrbinary().

    FTP.retrlines(), when used with open(file, "w").writelines as its callback, doesn't, of course, provide EOLs.

    So, for starters, I've come up with this piece of code which "looks OK to me", but as I'm a relative Python noob, can anyone suggest a better approach? Obviously, to keep this question simple, this isn't the final, bells-and-whistles thing.

    Many thanks.

    #!python.exe
    from ftplib import FTP
    
    class xfile (file):
        def writelineswitheol(self, sequence):
            for s in sequence:
                self.write(s+"\r\n")
    
    sess = FTP("zos.server.to.be", "myid", "mypassword")
    sess.sendcmd("site sbd=(IBM-1047,ISO8859-1)")
    sess.cwd("'FOO.BAR.PDS'")
    a = sess.nlst("RTB*")
    for i in a:
        sess.retrlines("RETR "+i, xfile(i, 'w').writelineswitheol)
    sess.quit()
    

    Update: Python 3.0, platform is MingW under Windows XP.

    z/os PDSs have a fixed record structure, rather than relying on line endings as record separators. However, the z/os FTP server, when transmitting in text mode, provides the record endings, which retrlines() strips off.

    Closing update:

    Here's my revised solution, which will be the basis for ongoing development (removing built-in passwords, for example):

    import ftplib
    import os
    
    sess = ftplib.FTP("undisclosed.server.com", "userid", "password")
    sess.sendcmd("site sbd=(IBM-1047,ISO8859-1)")
    for dirname in ["ASM", "ASML", "ASMM", "C", "CPP", "DLLA", "DLLC", "DLMC", "GEN", "HDR", "MAC"]:
        sess.cwd("'ZLTALM.PREP.%s'" % dirname)
        try:
            filelist = sess.nlst()
        except ftplib.error_perm as x:
            # A 550 reply is ignored (nothing to list); anything else is re-raised
            if x.args[0][:3] != '550':
                raise
        else:
            try:
                os.mkdir(dirname)
            except OSError:
                # Skip this library if its local directory cannot be created
                continue
            for hostfile in filelist:
                lines = []
                sess.retrlines("RETR " + hostfile, lines.append)
                # retrlines() strips the record endings, so add them back
                with open("%s/%s" % (dirname, hostfile), 'w') as pcfile:
                    for line in lines:
                        pcfile.write(line + "\n")
            print("Done: " + dirname)
    sess.quit()
    

    My thanks to both John and Vinay

  • mikeramos
    mikeramos almost 15 years
    Thanks, Vinay, that's an interesting idea, but how do I insert the newlines? (These are conventional zos PDSs, not OpenEdition files)
  • Vinay Sajip
    Vinay Sajip almost 15 years
    How are the lines terminated on the host system, then, if not with EBCDIC line feeds?
  • mikeramos
    mikeramos almost 15 years
    The host file system is record-based. It's either fixed-length, in which case all the records have the same length, or variable-length, where the length is stored in a descriptor field at the start of each record. FTP.retrlines() extracts the records correctly, but (correctly, I think) doesn't provide the newlines.
  • mikeramos
    mikeramos almost 15 years
    @Vinay.Update: Oops, yes, I understand. When I get back to the mainframe, later this week, I'll give some ideas a try, and post back.
  • john ktejik
    john ktejik over 2 years
    Callback? What callback? Code, please.
  • john ktejik
    john ktejik over 2 years
    file is not defined