How to know if a file is complete on the server using FTP?

35,869

Solution 1

This is a very old and well-known problem.

There is no way to be absolutely certain that a file being written by the FTP daemon is complete. It is even possible that a transfer fails and is later restarted and completed. The usual workaround is to poll the file's size against a time limit, say 5 minutes: if the size does not change during that window, you assume the file is complete.
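
The polling approach can be sketched roughly as follows. This is only an illustration, not a definitive implementation: the class name `StableSizeCheck`, the method `waitUntilStable`, and its parameters are hypothetical, and the size probe is left abstract so that it can be backed by whatever size or listing call your FTP client offers. A stable size is still only a heuristic, for the reasons given above.

```java
import java.util.function.LongSupplier;

/** Sketch: treat a remote file as complete once its size stops changing. */
class StableSizeCheck {

    /**
     * Polls sizeProbe up to maxPolls times. Returns true as soon as the
     * reported size has repeated the previous value requiredStablePolls
     * times in a row; returns false on timeout or interruption.
     */
    public static boolean waitUntilStable(LongSupplier sizeProbe,
                                          int requiredStablePolls,
                                          int maxPolls,
                                          long pollIntervalMillis) {
        long lastSize = -1;   // no size observed yet
        int stablePolls = 0;
        for (int i = 0; i < maxPolls; i++) {
            long size = sizeProbe.getAsLong();
            if (size == lastSize) {
                stablePolls++;
                if (stablePolls >= requiredStablePolls) {
                    return true;
                }
            } else {
                // Size changed: restart the stability count.
                stablePolls = 0;
                lastSize = size;
            }
            try {
                Thread.sleep(pollIntervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Simulated probe (assumption): the file grows for three polls, then stops.
        long[] sizes = {100, 200, 300, 300, 300, 300};
        int[] index = {0};
        LongSupplier fakeProbe = () -> sizes[Math.min(index[0]++, sizes.length - 1)];
        System.out.println(waitUntilStable(fakeProbe, 2, 10, 10)); // prints: true
    }
}
```

The poll interval and the number of stable polls trade latency against the risk of picking up a stalled upload; shorter windows give faster availability but more false positives.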

If possible, the program that processes the file should be able to deal with partial files.

A much better alternative is rsync, which is more robust and deterministic. It can even be configured (via command-line option) to write the data initially to a temporary location and move it to its final destination path upon successful completion. If the file exists where you expect it, then it is by definition complete.

Solution 2

A possible solution would be first uploading the file with a different filename (e.g. adding ".partial") and then renaming it to its final name.

If the server finds the final name then the upload has been completed.
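
As a rough sketch of the rename idea, here is the same pattern on a local file system using only the Java standard library. The class and method names (`PartialThenRename`, `publish`, `isReady`) are hypothetical. Over FTP the equivalent would be an upload under the temporary name followed by the client's rename operation, which is not shown here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Sketch: write under a temporary name, then rename to the final name. */
class PartialThenRename {

    /** Writes data to "<name>.partial" and renames it once fully written. */
    public static Path publish(Path finalPath, byte[] data) throws IOException {
        Path partial = finalPath.resolveSibling(finalPath.getFileName() + ".partial");
        Files.write(partial, data);
        // ATOMIC_MOVE throws instead of silently falling back to copy-and-delete
        // if the file system cannot rename atomically.
        return Files.move(partial, finalPath, StandardCopyOption.ATOMIC_MOVE);
    }

    /** Reader-side rule: names still carrying the ".partial" suffix are skipped. */
    public static boolean isReady(Path file) {
        return !file.getFileName().toString().endsWith(".partial");
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("upload-demo");
        Path published = publish(dir.resolve("report.csv"), "a,b,c\n".getBytes());
        System.out.println(published + " ready: " + isReady(published));
    }
}
```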

If you cannot control the upload process then what you are asking is impossible by definition: the file upload could stop because of a network problem or because the sending process is stopped for whatever reason.

What the receiving end will observe is just a closing of the incoming stream; there is no way to guarantee that the data will not be a partial transfer.

Other workarounds could be checking for an end-of-data marker or using a request to the sending server to check if (in their view) the transfer has been completed.
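
A minimal sketch of the end-of-data marker check on the receiving side, assuming a text file whose last non-blank line is a trailer agreed with the sender. The marker `#EOF` and the names `EndMarkerCheck` and `isComplete` are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

/** Sketch: accept a downloaded file only if it ends with an agreed trailer line. */
class EndMarkerCheck {

    // Hypothetical trailer; the real value must be agreed with the sender.
    static final String END_MARKER = "#EOF";

    /** Returns true if the last non-blank line of the content is the marker. */
    public static boolean isComplete(byte[] downloaded) {
        String text = new String(downloaded, StandardCharsets.UTF_8);
        String[] lines = text.split("\\R");
        for (int i = lines.length - 1; i >= 0; i--) {
            if (!lines[i].trim().isEmpty()) {
                return lines[i].equals(END_MARKER);
            }
        }
        return false; // empty content has no marker
    }

    public static void main(String[] args) {
        byte[] full = "row1\nrow2\n#EOF\n".getBytes(StandardCharsets.UTF_8);
        byte[] cut = "row1\nrow2\n#EO".getBytes(StandardCharsets.UTF_8);
        System.out.println(isComplete(full)); // prints: true
        System.out.println(isComplete(cut));  // prints: false
    }
}
```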

Solution 3

This is more fundamental than FTP: you'd have a similar problem reading those files even if they were being created on the local machine.

If you can't modify the writing process, you'll need to jump through some hoops. None are great, but some are safer than others.

  • Keep reading until nothing changes for some window (maybe a minute, like David Schwartz suggests). You could optimize this a bit by watching the file size.
  • Figure out if the files are written serially in a reliable order. When you see file N appear, you know that file N-1 is ready. (Assumes that the directory is empty before the files are written, though you could also look at timestamps.) The downside is that your logic will break if the writer ever changes order or starts writing in parallel.
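
The serial-order heuristic from the second bullet could look something like this. It is a sketch under a strong assumption that is not in the original: file names sort lexicographically in the order they are written (for example, zero-padded sequence numbers). The names `SerialOrderCheck` and `readyFiles` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Sketch: if files are written one after another, all but the newest are done. */
class SerialOrderCheck {

    /**
     * Assumes the natural sort order of the names matches the write order.
     * Returns every name except the last one, which may still be in progress.
     */
    public static List<String> readyFiles(List<String> listing) {
        List<String> sorted = new ArrayList<>(listing);
        Collections.sort(sorted);
        if (sorted.isEmpty()) {
            return sorted;
        }
        return new ArrayList<>(sorted.subList(0, sorted.size() - 1));
    }

    public static void main(String[] args) {
        List<String> listing = Arrays.asList("data_003.csv", "data_001.csv", "data_002.csv");
        System.out.println(readyFiles(listing)); // prints: [data_001.csv, data_002.csv]
    }
}
```

Note that the last file of a batch is never released by this rule alone, so it is usually combined with the stability window from the first bullet.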

The reliable, safe solutions require improving the writer process.

  • Writer can write the files to hidden or temporary locations and only make them visible once the entire file (or directory) is ready, using symlinks or file-moving or chmod.
  • Writer creates a special file (e.g., "./DONE") only after all other files have been written, and reader doesn't read any files until that file is present.
  • Depending on the file type, the writer could add some kind of end-of-file record/line at the end of the file, and the reader could ensure that it's present.
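
For the "DONE" marker variant, the reader-side rule might be sketched as below. The marker name `DONE` follows the example in the list above; the class and method names are hypothetical, and the directory listing is passed in as plain names so the sketch does not depend on any particular FTP client.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

/** Sketch: ignore a directory until the writer has dropped a DONE marker file. */
class DoneMarkerCheck {

    static final String DONE_MARKER = "DONE";

    /** Returns the data files to process, or an empty list if DONE is absent. */
    public static List<String> filesToProcess(Collection<String> listing) {
        if (!listing.contains(DONE_MARKER)) {
            return Collections.emptyList(); // writer has not finished yet
        }
        List<String> result = new ArrayList<>();
        for (String name : listing) {
            if (!name.equals(DONE_MARKER)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(filesToProcess(Arrays.asList("a.csv", "b.csv")));         // prints: []
        System.out.println(filesToProcess(Arrays.asList("a.csv", "b.csv", "DONE"))); // prints: [a.csv, b.csv]
    }
}
```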

Author: mostafa.S

Senior Java and Oracle Developer

Updated on July 19, 2022

Comments

  • mostafa.S
    mostafa.S almost 2 years

    I have a file scanner application in Java that keeps scanning a directory on a server using FTP. It gets the list of files in the directory and downloads them one by one. On the server side, there is a process that writes these files. If I'm lucky I won't try to download an incomplete file, but how can I make sure that the write process on the server is complete, the file handle is closed, and the file is ready to be downloaded?

    I have no control over the write process, which runs on the server. Moreover, I don't have write permission on the directory, so I can't try to obtain a write handle to check whether one is already open; that option is off the table.

    Is there an FTP function addressing this problem?

    • David Schwartz
      David Schwartz over 11 years
      The best you can do is see that the file hasn't been modified for some amount of time, say a minute.
    • Mohammod Hossain
      Mohammod Hossain over 11 years
      Which library are you using for the FTP client?
    • Karthik T
      Karthik T over 11 years
      What if write starts after you start downloading?
    • Anantha Sharma
      Anantha Sharma over 11 years
      I agree with David: you should poll the FTP folder for at least a couple of minutes, ensure the file's last-modified time remains the same, and then download it.
    • mostafa.S
      mostafa.S over 11 years
      @MohammodHossain I use sauronsoftware.it/projects/ftp4j
    • mostafa.S
      mostafa.S over 11 years
      @DavidSchwartz I wonder, isn't there a way to query the operating system for the number of open "write handles" on a file, especially through FTP?
    • David Schwartz
      David Schwartz over 11 years
      @mostafa.S: Maybe. But that won't help you unless you know what ftp's write logic is. Does it close the write handle as soon as the upload is finished? Does it open files for writing even when it doesn't actually plan to write to them?
  • Anantha Sharma
    Anantha Sharma over 11 years
    @Hossain, his question was not about which library to use, but about how to ensure he doesn't download incomplete files from the server. The Apache FTP library doesn't guarantee that it will always download the complete file from the server.
  • mostafa.S
    mostafa.S over 11 years
    Sadly, the writer process is out of my control and won't cooperate with me, so I'm on my own in this.
  • mostafa.S
    mostafa.S over 11 years
    Actually, I'm already using this 5-minute threshold. The thing is, I could really use faster file availability. However, I might be able to check the size of the file twice in less than a minute to make your solution work for me :) Thank you, Jim.
  • mostafa.S
    mostafa.S over 11 years
    @Mohammod Anyway, I will take a look at the Apache Commons Net FTPClient, thanks.
  • mostafa.S
    mostafa.S over 11 years
    @Mohammod I checked the documentation. It seems the flag is true if the retrieval completes successfully; that doesn't mean it won't download a file that is still being written to. It will download as much of the file as has been written, and it will return true if it downloads the incomplete file successfully :) That's not what I'm talking about ;) Anyway, thanks.
  • 6502
    6502 over 11 years
    I think what you are asking is impossible. If the sending server is shut down in the middle of the transfer and never turned on again is the transfer complete? There is no way to detect that from the receiving site.
  • mostafa.S
    mostafa.S over 11 years
    Thanks, dbort. I wonder, isn't there a way to query the operating system for the number of open "write handles" on a file, especially through FTP?
  • Jim Garrison
    Jim Garrison about 10 years
    That wouldn't discriminate between a failed transfer and a complete transfer. Use the rsync protocol instead if you can, it does a much better job.
  • Hejazzman
    Hejazzman over 5 years
    There is, but it would not be useful. The "write handles" would also appear closed when the FTP transfer has stopped, e.g. because of a network connection loss, and restarts later.