Transfer raw binary with apache commons-net FTPClient?


Solution 1

After logging in to the FTP server, set the file type:

ftp.setFileType(FTP.BINARY_FILE_TYPE);

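The line below doesn't solve it, because setFileTransferMode() selects the transfer mode (stream, block, or compressed), not the file type: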

//ftp.setFileTransferMode(org.apache.commons.net.ftp.FTP.BINARY_FILE_TYPE);

Solution 2

It sounds to me as if your application code might have the selection of ASCII and BINARY mode inverted. ASCII coming through unchanged while BINARY performs end-of-line character translation is the exact opposite of how FTP is supposed to work.

If that is not the problem, please edit your question to add the relevant part of your code.

EDIT

A couple of other possible (but IMO unlikely) explanations:

  • The FTP server is broken / misconfigured. (Can you successfully download the file in ASCII / BINARY mode using a non-Java command-line FTP utility?)
  • You are talking to the FTP server via a proxy that is broken or misconfigured.
  • You've somehow managed to get hold of a dodgy (hacked) copy of the Apache FTP client JAR file. (Yeah, yeah, very unlikely ...)

Solution 3

I found that Apache's retrieveFile(...) sometimes did not work for files whose size exceeded a certain limit. To work around that, I used retrieveFileStream() instead. Before downloading, I set the correct file type and switched to passive mode.

So the code will look like this:

    ....
    ftpClientConnection.setFileType(FTP.BINARY_FILE_TYPE);
    ftpClientConnection.enterLocalPassiveMode();
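    // Note: setAutodetectUTF8() does not affect an existing connection; it must be called before connect() to take effect
    ftpClientConnection.setAutodetectUTF8(true);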

    //Create an InputStream to the File Data and use FileOutputStream to write it
    InputStream inputStream = ftpClientConnection.retrieveFileStream(ftpFile.getName());
    FileOutputStream fileOutputStream = new FileOutputStream(directoryName + "/" + ftpFile.getName());
    //Using org.apache.commons.io.IOUtils
    IOUtils.copy(inputStream, fileOutputStream);
    fileOutputStream.flush();
    IOUtils.closeQuietly(fileOutputStream);
    IOUtils.closeQuietly(inputStream);
    boolean commandOK = ftpClientConnection.completePendingCommand();
    ....
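For reference, the copy step can also be written with JDK classes only. The sketch below assumes Java 9+ for InputStream.transferTo; the ByteArrayInputStream merely stands in for the stream returned by retrieveFileStream(), and the class and method names are made up for illustration. try-with-resources closes both streams even if the copy throws, which closeQuietly calls placed after the copy do not guarantee.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class StreamCopySketch {
    /** Copies every byte from source to target and returns the number of bytes written. */
    static long copyToFile(InputStream source, Path target) throws IOException {
        // try-with-resources closes both streams even if the copy fails midway
        try (InputStream in = source;
             OutputStream out = Files.newOutputStream(target)) {
            return in.transferTo(out); // Java 9+
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in payload containing bytes that a text-mode transfer would touch
        byte[] payload = {0x00, 0x0a, 0x0d, 0x0a, (byte) 0xff};
        Path target = Files.createTempFile("download", ".bin");
        try {
            long copied = copyToFile(new ByteArrayInputStream(payload), target);
            boolean identical = Arrays.equals(payload, Files.readAllBytes(target));
            System.out.println(copied + " bytes copied, identical=" + identical);
        } finally {
            Files.deleteIfExists(target);
        }
    }
}
```

With a real FTPClient, completePendingCommand() would still need to be called after the stream is closed, as in the snippet above.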
Chris Suter
Author by

Chris Suter

Senior Software Engineer, Google Research. Background in mathematics. Interest in systems, frameworks, linux, music, neat ideas and difficult proofs.

Updated on May 16, 2020

Comments

  • Chris Suter
    Chris Suter about 4 years

    UPDATE: Solved

    I was calling FTPClient.setFileType() before I logged in, causing the FTP server to use the default mode (ASCII) no matter what I set it to. The client, on the other hand, was behaving as though the file type had been properly set. BINARY mode is now working exactly as desired, transporting the file byte-for-byte in all cases. All I had to do was a little traffic sniffing in Wireshark and then mimicking the FTP commands using netcat to see what was going on. Why didn't I think of that two days ago!? Thanks, everyone, for your help!

    I have an xml file, utf-16 encoded, which I am downloading from an FTP site using apache's commons-net-2.0 java library's FTPClient. It offers support for two transfer modes: ASCII_FILE_TYPE and BINARY_FILE_TYPE, the difference being that ASCII will replace line separators with the appropriate local line separator ('\r\n' or just '\n' -- in hex, 0x0d0a or just 0x0a). My problem is this: I have a test file, utf-16 encoded, that contains the following:

    <?xml version='1.0' encoding='utf-16'?>
    <data>
        <blah>blah</blah>
    </data>

    Here's the hex:
    0000000: 003c 003f 0078 006d 006c 0020 0076 0065 .<.?.x.m.l. .v.e
    0000010: 0072 0073 0069 006f 006e 003d 0027 0031 .r.s.i.o.n.=.'.1
    0000020: 002e 0030 0027 0020 0065 006e 0063 006f ...0.'. .e.n.c.o
    0000030: 0064 0069 006e 0067 003d 0027 0075 0074 .d.i.n.g.=.'.u.t
    0000040: 0066 002d 0031 0036 0027 003f 003e 000a .f.-.1.6.'.?.>..
    0000050: 003c 0064 0061 0074 0061 003e 000a 0009 .<.d.a.t.a.>....
    0000060: 003c 0062 006c 0061 0068 003e 0062 006c .<.b.l.a.h.>.b.l
    0000070: 0061 0068 003c 002f 0062 006c 0061 0068 .a.h.<./.b.l.a.h
    0000080: 003e 000a 003c 002f 0064 0061 0074 0061 .>...<./.d.a.t.a
    0000090: 003e 000a                               .>..

    When I use ASCII mode for this file it transfers correctly, byte-for-byte; the result has the same md5sum. Great. When I use BINARY transfer mode, which is not supposed to do anything but shuffle bytes from an InputStream into an OutputStream, the result is that the newlines (0x0a) are converted to carriage return + newline pairs (0x0d0a). Here's the hex after binary transfer:

    0000000: 003c 003f 0078 006d 006c 0020 0076 0065 .<.?.x.m.l. .v.e
    0000010: 0072 0073 0069 006f 006e 003d 0027 0031 .r.s.i.o.n.=.'.1
    0000020: 002e 0030 0027 0020 0065 006e 0063 006f ...0.'. .e.n.c.o
    0000030: 0064 0069 006e 0067 003d 0027 0075 0074 .d.i.n.g.=.'.u.t
    0000040: 0066 002d 0031 0036 0027 003f 003e 000d .f.-.1.6.'.?.>..
    0000050: 0a00 3c00 6400 6100 7400 6100 3e00 0d0a ..<.d.a.t.a.>...
    0000060: 0009 003c 0062 006c 0061 0068 003e 0062 ...<.b.l.a.h.>.b
    0000070: 006c 0061 0068 003c 002f 0062 006c 0061 .l.a.h.<./.b.l.a
    0000080: 0068 003e 000d 0a00 3c00 2f00 6400 6100 .h.>....<./.d.a.
    0000090: 7400 6100 3e00 0d0a                     t.a.>...

    Not only does it convert the newline characters (which it shouldn't), but it doesn't respect the utf-16 encoding (not that I would expect it to know that it should, it's just a dumb FTP pipe). The result is unreadable without further processing to realign the bytes. I would just use ASCII mode, but my application will also be moving real binary data (mp3 files and jpeg images) across the same pipe. Using the BINARY transfer mode on these binary files also causes them to have random 0x0ds injected into their contents, which can't safely be removed since the binary data often contains legitimate 0x0d0a sequences. If I use ASCII mode on these files, then the "clever" FTPClient converts these 0x0d0as into 0x0a leaving the file inconsistent no matter what I do.
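The misalignment in the second dump is what a byte-level newline translation produces on UTF-16 data. The sketch below is a toy model of that translation, not commons-net's or any server's actual code; the class and method names are invented for illustration. Inserting one 0x0d byte before each 0x0a shifts every following two-byte code unit by one byte, and applying the reverse conversion on the receiving side undoes it, which would fit the ASCII-mode download of test.xml matching the original.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class NewlineTranslationDemo {
    /** Toy sender-side text conversion: put CR (0x0d) in front of every LF (0x0a) byte. */
    static byte[] lfToCrlf(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte b : in) {
            if (b == 0x0a) {
                out.write(0x0d);
            }
            out.write(b);
        }
        return out.toByteArray();
    }

    /** Toy receiver-side text conversion: drop a CR that is immediately followed by LF. */
    static byte[] crlfToLf(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < in.length; i++) {
            boolean crBeforeLf = in[i] == 0x0d && i + 1 < in.length && in[i + 1] == 0x0a;
            if (!crBeforeLf) {
                out.write(in[i]);
            }
        }
        return out.toByteArray();
    }

    /** Formats bytes as lowercase hex without separators. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] original = ">\n<".getBytes(StandardCharsets.UTF_16BE);
        byte[] onTheWire = lfToCrlf(original);  // sender stuck in text mode
        byte[] roundTrip = crlfToLf(onTheWire); // receiver also in text mode

        System.out.println("original:   " + hex(original));  // 003e000a003c
        System.out.println("translated: " + hex(onTheWire)); // 003e000d0a003c
        System.out.println("round trip restores original: " + Arrays.equals(original, roundTrip));
    }
}
```

The translated bytes 003e 000d 0a00 3c.. follow the same pattern as offsets 0x4c to 0x52 of the dump above. A receiver that treats the stream as binary writes those shifted bytes to disk unchanged.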

    I guess my question(s) is(are): does anyone know of any good FTP libraries for Java that just move the damned bytes from there to here, or am I going to have to hack up Apache commons-net-2.0 and maintain my own FTP client code just for this simple application? Has anyone else dealt with this bizarre behavior? Any suggestions would be appreciated.

    I checked out the commons-net source code and it doesn't look like it's responsible for the weird behavior when BINARY mode is used. The InputStream it reads from in BINARY mode is just a java.io.BufferedInputStream wrapped around a socket InputStream. Do these lower-level Java streams ever do any weird byte manipulation? I would be shocked if they did, but I don't see what else could be going on here.
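One quick way to rule out the JDK streams is to push every possible byte value through a BufferedInputStream and compare the result with the input. The sketch below does that with an in-memory source in place of the socket stream (an assumption, so that it runs without a server); the class and method names are invented for illustration.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class BufferedPassThroughCheck {
    /** Reads the stream to the end through a BufferedInputStream and returns the bytes seen. */
    static byte[] readThroughBuffer(InputStream raw) throws IOException {
        ByteArrayOutputStream seen = new ByteArrayOutputStream();
        try (InputStream in = new BufferedInputStream(raw)) {
            byte[] chunk = new byte[64];
            int n;
            while ((n = in.read(chunk)) != -1) {
                seen.write(chunk, 0, n);
            }
        }
        return seen.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Every possible byte value, including 0x0a and 0x0d
        byte[] allValues = new byte[256];
        for (int i = 0; i < allValues.length; i++) {
            allValues[i] = (byte) i;
        }
        byte[] seen = readThroughBuffer(new ByteArrayInputStream(allValues));
        System.out.println("unchanged: " + Arrays.equals(allValues, seen));
    }
}
```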

    EDIT 1:

    Here's a minimal piece of code that mimics what I'm doing to download the file. To compile, just do

    javac -classpath /path/to/commons-net-2.0.jar Main.java
    

    To run, you'll need directories /tmp/ascii and /tmp/binary for the file to download to, as well as an ftp site set up with the file sitting in it. The code will also need to be configured with the appropriate ftp host, username and password. I put the file on my testing ftp site under the test/ folder and called the file test.xml. The test file should at least have more than one line, and be utf-16 encoded (this may not be necessary, but will help to recreate my exact situation). I used vim's :set fileencoding=utf-16 command after opening a new file and entered the xml text referenced above. Finally, to run, just do

    java -cp .:/path/to/commons-net-2.0.jar Main
    

    Code:

    (NOTE: this code modified to use custom FTPClient object, linked below under "EDIT 2")

    import java.io.*;
    import java.util.zip.CheckedInputStream;
    import java.util.zip.CheckedOutputStream;
    import java.util.zip.CRC32;
    import org.apache.commons.net.ftp.*;
    
    public class Main implements java.io.Serializable
    {
        public static void main(String[] args) throws Exception
        {
            Main main = new Main();
            main.doTest();
        }
    
        private void doTest() throws Exception
        {
            String host = "ftp.host.com";
            String user = "user";
            String pass = "pass";
    
            String asciiDest = "/tmp/ascii";
            String binaryDest = "/tmp/binary";
    
            String remotePath = "test/";
            String remoteFilename = "test.xml";
    
            System.out.println("TEST.XML ASCII");
            MyFTPClient client = createFTPClient(host, user, pass, org.apache.commons.net.ftp.FTP.ASCII_FILE_TYPE);
            File path = new File("/tmp/ascii");
            downloadFTPFileToPath(client, "test/", "test.xml", path);
            System.out.println("");
    
            System.out.println("TEST.XML BINARY");
            client = createFTPClient(host, user, pass, org.apache.commons.net.ftp.FTP.BINARY_FILE_TYPE);
            path = new File("/tmp/binary");
            downloadFTPFileToPath(client, "test/", "test.xml", path);
            System.out.println("");
    
            System.out.println("TEST.MP3 ASCII");
            client = createFTPClient(host, user, pass, org.apache.commons.net.ftp.FTP.ASCII_FILE_TYPE);
            path = new File("/tmp/ascii");
            downloadFTPFileToPath(client, "test/", "test.mp3", path);
            System.out.println("");
    
            System.out.println("TEST.MP3 BINARY");
            client = createFTPClient(host, user, pass, org.apache.commons.net.ftp.FTP.BINARY_FILE_TYPE);
            path = new File("/tmp/binary");
            downloadFTPFileToPath(client, "test/", "test.mp3", path);
        }
    
        public static File downloadFTPFileToPath(MyFTPClient ftp, String remoteFileLocation, String remoteFileName, File path)
            throws Exception
        {
            // path to remote resource
            String remoteFilePath = remoteFileLocation + "/" + remoteFileName;
    
            // create local result file object
            File resultFile = new File(path, remoteFileName);
    
            // local file output stream
            CheckedOutputStream fout = new CheckedOutputStream(new FileOutputStream(resultFile), new CRC32());
    
            // try to read data from remote server
            if (ftp.retrieveFile(remoteFilePath, fout)) {
                System.out.println("FileOut: " + fout.getChecksum().getValue());
                return resultFile;
            } else {
                throw new Exception("Failed to download file completely: " + remoteFilePath);
            }
        }
    
        public static MyFTPClient createFTPClient(String url, String user, String pass, int type)
            throws Exception
        {
            MyFTPClient ftp = new MyFTPClient();
            ftp.connect(url);
            if (!ftp.setFileType( type )) {
                throw new Exception("Failed to set ftpClient object to BINARY_FILE_TYPE");
            }
    
            // check for successful connection
            int reply = ftp.getReplyCode();
            if (!FTPReply.isPositiveCompletion(reply)) {
                ftp.disconnect();
                throw new Exception("Failed to connect properly to FTP");
            }
    
            // attempt login
            if (!ftp.login(user, pass)) {
                String msg = "Failed to login to FTP";
                ftp.disconnect();
                throw new Exception(msg);
            }
    
            // success! return connected MyFTPClient.
            return ftp;
        }
    
    }
    

    EDIT 2:

    Okay, I followed the CheckedXputStream advice and here are my results. I made a copy of apache's FTPClient called MyFTPClient, and I wrapped both the SocketInputStream and the BufferedInputStream in a CheckedInputStream using CRC32 checksums. Furthermore, I wrapped the FileOutputStream that I give to FTPClient to store the output in a CheckedOutputStream with a CRC32 checksum. The code for MyFTPClient is posted here and I've modified the above test code to use this version of the FTPClient (tried to post a gist URL to the modified code, but I need 10 reputation points to post more than one URL!). I ran it against test.xml and test.mp3, and the results were thus:

    14:00:08,644 DEBUG [main,TestMain] TEST.XML ASCII
    14:00:08,919 DEBUG [main,MyFTPClient] Socket CRC32: 2739864033
    14:00:08,919 DEBUG [main,MyFTPClient] Buffer CRC32: 2739864033
    14:00:08,954 DEBUG [main,FTPUtils] FileOut CRC32: 866869773
    
    14:00:08,955 DEBUG [main,TestMain] TEST.XML BINARY
    14:00:09,270 DEBUG [main,MyFTPClient] Socket CRC32: 2739864033
    14:00:09,270 DEBUG [main,MyFTPClient] Buffer CRC32: 2739864033
    14:00:09,310 DEBUG [main,FTPUtils] FileOut CRC32: 2739864033
    
    14:00:09,310 DEBUG [main,TestMain] TEST.MP3 ASCII
    14:00:10,635 DEBUG [main,MyFTPClient] Socket CRC32: 60615183
    14:00:10,635 DEBUG [main,MyFTPClient] Buffer CRC32: 60615183
    14:00:10,636 DEBUG [main,FTPUtils] FileOut CRC32: 2352009735
    
    14:00:10,636 DEBUG [main,TestMain] TEST.MP3 BINARY
    14:00:11,482 DEBUG [main,MyFTPClient] Socket CRC32: 60615183
    14:00:11,482 DEBUG [main,MyFTPClient] Buffer CRC32: 60615183
    14:00:11,483 DEBUG [main,FTPUtils] FileOut CRC32: 60615183
    

    This makes basically zero sense whatsoever, because here are the md5sums of the corresponding files:

    bf89673ee7ca819961442062eaaf9c3f  ascii/test.mp3
    7bd0e8514f1b9ce5ebab91b8daa52c4b  binary/test.mp3
    ee172af5ed0204cf9546d176ae00a509  original/test.mp3
    
    104e14b661f3e5dbde494a54334a6dd0  ascii/test.xml
    36f482a709130b01d5cddab20a28a8e8  binary/test.xml
    104e14b661f3e5dbde494a54334a6dd0  original/test.xml
    

    I'm at a loss. I swear I haven't permuted the filenames/paths at any point in this process, and I've triple-checked every step. It must be something simple, but I haven't the foggiest idea where to look next. In the interest of practicality I'm going to proceed by calling out to the shell to do my FTP transfers, but I intend to pursue this until I understand what the hell is going on. I'll update this thread with my findings, and I'll continue to appreciate any contributions anyone may have. Hopefully this will be useful to someone at some point!
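The tapping approach from EDIT 2 can be tried in isolation with JDK classes only. The sketch below is a stand-in under stated assumptions, not the MyFTPClient code: in-memory streams replace the socket and the file, and the stripCarriageReturns flag is a made-up substitute for whatever conversion layer might sit between the two taps. Matching checksums at both taps say only that nothing changed between them; they say nothing about what happened to the bytes before the first tap, for example on the server.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;
import java.util.zip.CheckedOutputStream;

public class ChecksumTapSketch {
    /** Copies data through checked streams; returns {CRC32 of bytes read, CRC32 of bytes written}. */
    static long[] copyWithChecksums(byte[] data, boolean stripCarriageReturns) throws IOException {
        CheckedInputStream in = new CheckedInputStream(new ByteArrayInputStream(data), new CRC32());
        CheckedOutputStream out = new CheckedOutputStream(new ByteArrayOutputStream(), new CRC32());
        int b;
        while ((b = in.read()) != -1) {
            // Hypothetical conversion layer between the two taps: drops every 0x0d byte
            if (stripCarriageReturns && b == 0x0d) {
                continue;
            }
            out.write(b);
        }
        return new long[] {in.getChecksum().getValue(), out.getChecksum().getValue()};
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {0x3e, 0x0d, 0x0a, 0x3c};
        long[] plain = copyWithChecksums(data, false);
        long[] converted = copyWithChecksums(data, true);
        System.out.println("pass-through checksums equal: " + (plain[0] == plain[1]));
        System.out.println("converted checksums equal:    " + (converted[0] == converted[1]));
    }
}
```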