Java - Download a Binary File (e.g. PDF) from a Web Server


Solution 1

Never store binary data into a String.

Never use PrintWriter for binary data.

Never write binary files line by line.

I don't want to be harsh or impolite, but these three nevers have to take root in your mind! :)
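To see why the first rule matters, here is a minimal sketch of what a String round trip can do to raw bytes. It assumes UTF-8 as the charset purely for illustration (the question's code uses the platform default, so the exact damage may differ), and the class and method names are made up for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class StringRoundTrip {

    /** Decodes raw bytes as UTF-8 text and encodes the text back to bytes. */
    static byte[] throughString(byte[] raw) {
        String asText = new String(raw, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "%PDF" followed by four high bytes that do not form valid UTF-8.
        byte[] raw = {0x25, 0x50, 0x44, 0x46, (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3};
        byte[] roundTripped = throughString(raw);
        System.out.println("identical after round trip: " + Arrays.equals(raw, roundTripped));
    }
}
```

The plain ASCII part survives, but the high bytes are replaced during decoding, so the re-encoded array no longer matches the input. Compressed PDF streams are full of such bytes, which is consistent with a file that opens but shows blank pages.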

You can see this page for an example of how to download a binary file. I don't like this example because it caches the whole document in memory (what happens if it is 5 GB?), but you can start from it. :)
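One way to avoid caching the whole document is to copy through a small fixed-size buffer, so memory use stays constant regardless of file size. The following is only a sketch: the class name `StreamCopy` and the 8 KB buffer size are arbitrary choices, not something taken from the linked example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class StreamCopy {

    /** Copies all bytes from in to out through a fixed-size buffer and returns the byte count. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        // read(buffer) returns the number of bytes actually read, or -1 at end of stream.
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }
}
```

You would typically call it with the response stream and a `FileOutputStream`, both opened in a try-with-resources block. On Java 9 and later, `InputStream.transferTo(OutputStream)` does the same job.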

Solution 2

Use FileUtils from Apache Commons IO. I tried it with a small PDF and a 60 MB JAR. Works great!

import java.io.File;
import java.io.IOException;
import java.net.URL;
import org.apache.commons.io.FileUtils;

String uri = "http://localhost:8080/PMInstaller/f1.pdf";
URL url = new URL(uri);
File destination = new File("f1.pdf");
FileUtils.copyURLToFile(url, destination);
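If adding Commons IO is not an option, roughly the same thing can be done with the JDK alone. This is a sketch rather than a drop-in replacement for `copyURLToFile`: the class name `UrlToFile` is hypothetical, and unlike some `FileUtils` overloads it sets no connection or read timeouts.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class UrlToFile {

    /** Streams the resource at url into destination, replacing any existing file. */
    static long download(URL url, Path destination) throws IOException {
        // try-with-resources closes the stream even if the copy fails.
        try (InputStream in = url.openStream()) {
            return Files.copy(in, destination, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

`Files.copy` works on raw bytes and returns the number of bytes written, so nothing is ever decoded as text.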

Solution 3

Can't you just take the link?

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLConnection;

public static void downloadFile(URL from, File to, boolean overwrite) throws IOException {
    if (to.exists()) {
        if (!overwrite)
            throw new IOException("File " + to.getAbsolutePath() + " exists already.");
        if (!to.delete())
            throw new IOException("Cannot delete the file " + to.getAbsolutePath() + ".");
    }

    // One connection serves both the length query and the download.
    URLConnection connection = from.openConnection();
    int lengthTotal = connection.getContentLength(); // -1 if unknown; useful for a progress bar
    int lengthSoFar = 0;

    // try-with-resources closes both streams even if the copy fails.
    try (InputStream is = connection.getInputStream();
         OutputStream fos = new FileOutputStream(to)) {
        int c;
        // Copies one byte at a time: simple, but slow for large files.
        while ((c = is.read()) != -1) {
            fos.write(c);
            lengthSoFar++;
        }
    }
}
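The `lengthTotal` and `lengthSoFar` variables above hint at progress reporting. A buffered variant can still report progress without reading one byte at a time. This is a sketch under assumptions: `ProgressCopy` and the `onProgress` callback are names invented here, not part of the answer's code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.function.LongConsumer;

final class ProgressCopy {

    /** Copies in to out in 8 KB chunks, reporting the running byte count after each chunk. */
    static long copy(InputStream in, OutputStream out, LongConsumer onProgress) throws IOException {
        byte[] buffer = new byte[8192];
        long soFar = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            soFar += n;
            // The caller can divide by the content length (when known) to get a percentage.
            onProgress.accept(soFar);
        }
        return soFar;
    }
}
```

The callback fires once per chunk rather than once per byte, which is usually fine-grained enough for a progress bar.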
Author by

Augusto Picciani

if ( development != passion ){ isDeveloperDead = true; }else{ isDeveloperDead = false; }

Updated on July 09, 2022

Comments

  • Augusto Picciani
    Augusto Picciani almost 2 years

    I need to download a PDF file from a web server to my PC and save it locally.

    I used HttpClient to connect to the web server and get the content body:

    HttpEntity entity = response.getEntity();
    InputStream in = entity.getContent();

    String stream = CharStreams.toString(new InputStreamReader(in));
    int size = stream.length();
    System.out.println("HTML page string LENGTH:" + stream.length());
    System.out.println(stream);
    SaveToFile(stream);
    

    Then I save the content to a file:

    // check CRLF (I don't know if I need to do this)
    String[] fix = stream.split("\r\n");

    File file = new File("C:\\Users\\augusto\\Desktop\\progetti web\\test\\test2.pdf");
    PrintWriter out = new PrintWriter(new FileWriter(file));
    for (int i = 0; i < fix.length; i++) {
        out.print(fix[i]);
        out.print("\n");
    }
    out.close();
    

    I also tried to save the String content to a file directly:

    OutputStream out = new FileOutputStream("pathPdfFile");
    out.write(stream.getBytes());
    out.close();

    But the result is always the same: I can open the PDF file, but I see only blank pages. Is the mistake related to the charset encoding of the PDF data between stream and endstream? Does the PDF content between stream and endstream need to be manipulated in some other way?


    I hope this helps to avoid misunderstandings about what I want to do:

    This is my login (it works perfectly):

    public static void postForm() {
        String cookie = "";
        try {
            System.out.println("POSTFORM ###################################");
            String postURL = "http://login.libero.it/logincheck.php";
            HttpPost post = new HttpPost(postURL);
            post.setHeader("User-Agent", "Chrome/14.0.835.202");
            post.setHeader("Referer", "http://login.libero.it/?layout=m&service_id=m_mail&ret_url=http://m.mailbeta.libero.it/m/wmm/auth/check");
            if (cookieVector.size() > 0) {
                for (int i = 0; i < cookieVector.size(); i++) {
                    cookie = cookie + cookieVector.elementAt(i).toString().replace("Set-Cookie:", "") + ";";
                }
                post.setHeader("Cookie", cookie);
            }
            //System.out.println("POST cookie sequence:" + cookie);
            List<NameValuePair> params = new ArrayList<NameValuePair>();
            params.add(new BasicNameValuePair("SERVICE_ID", "m_mail"));
            params.add(new BasicNameValuePair("LAYOUT", "m"));
            params.add(new BasicNameValuePair("DEVICE", ""));
            params.add(new BasicNameValuePair("RET_URL", "http://m.mailbeta.libero.it/m/wmm/auth/check"));
            params.add(new BasicNameValuePair("LOGINID", "secret"));
            params.add(new BasicNameValuePair("PASSWORD", "secret"));
            UrlEncodedFormEntity ent = new UrlEncodedFormEntity(params, HTTP.UTF_8);
            System.out.println("urlPost string:" + ent.toString());
            post.setEntity(ent);
            HttpResponse responsePOST = client.execute(post);
            System.out.println("Response postForm: " + responsePOST.getStatusLine());
            Header[] allHeaders = responsePOST.getAllHeaders();

            String location = "";
            for (Header header : allHeaders) {
                if ("location".equalsIgnoreCase(header.getName())) location = header.getValue();
                responsePOST.addHeader(header.getName(), header.getValue());
            }
            cookieVector.clear();
            Header[] headerx = responsePOST.getHeaders("Set-Cookie");
            System.out.println("array header:" + headerx.length);
            for (int i = 0; i < headerx.length; i++) {
                System.out.println("cookie returned by POST:" + headerx[i].getValue());
                cookieVector.add(headerx[i]);
                //System.out.println("cookie found in POST:" + cookieVector.elementAt(i));
            }
            //System.out.println("inserted" + cookieVector.size() + "" + "elements");
            //HttpEntity resEntity = responsePOST.getEntity();

            // populate redirect information in response
            // CHECK LOGIN OUTCOME
            if (location.contains("https://login.libero.it/logincheck.php")) {
                loginError = 1;
            }
            System.out.println("Redirecting to: " + location);
            //EntityUtils.consume(resEntity);
            responsePOST.getEntity().consumeContent();
            System.out.println("back to GET:" + "url:" + location + "cookieVector size:" + cookieVector.size());
            get(location, "http://login.libero.it/logincheck.php");
        } catch (IOException ex) {
            Logger.getLogger(LiberoLoginNew.class.getName()).log(Level.SEVERE, null, ex);
        }
    }
    

    Once logged in, I can access the file's link (PDF, image, DOC, etc.). Take a PDF file as an example:

    public static void httpConnection(String url, String referer, String cookieAuth) {
        try {
            String location = "";
            String cookie = "";
            HttpResponse response;
            HttpGet get;
            HttpEntity respEntity;
            Referer = referer;
            System.out.println("HTTPCONNECTION ################################");
            System.out.println("connecting to:" + url + "............");

            get = new HttpGet(url);
            if (referer.length() > 0) {
                //httpget.setHeader("Referer", referer);
            }
            if (attachmentURL.size() == 0) {
                get.setHeader("User-Agent", "Chrome/14.0.835.202");
            } else {
                get.setHeader("Accept-charset", "UTF-8");
                get.setHeader("Content-type", "application/pdf");
            }
            if (cookieVector.size() > 0) {
                System.out.println("adding cookies from vector");
                for (int i = 0; i < cookieVector.size(); i++) {
                    cookie = cookie + cookieVector.elementAt(i).toString().replace("Set-Cookie:", "") + ";";
                }
                get.setHeader("Cookie", cookie);
            } else if (cookieAuth.length() > 0) {
                System.out.println("adding cookieAuth....");
                System.out.println("cookieSession value:" + cookieAuth);
                get.setHeader("Cookie", cookieAuth.replace("Set-Cookie:", "") + ";");
            }

            response = client.execute(get);
            cookieVector.clear(); // reset cookies

            System.out.println("home get: " + response.getStatusLine());

            Header[] headery = response.getAllHeaders();
            for (int j = 0; j < headery.length; j++) {
                System.out.println(headery[j].getName() + " " + " VALUE:" + " " + headery[j].getValue());
            }
            Header[] headerx = response.getHeaders("Set-Cookie");
            System.out.println("array header:" + headerx.length);
            System.out.print("httpconnection SERVER HEADERS ###############");
            for (int i = 0; i < headerx.length; i++) {
                if ("location".equalsIgnoreCase(headerx[i].getName())) {
                    location = headerx[i].getValue();
                    //ResponseGET.addHeader(headerx[i].getName(), header.getValue());
                }
                //System.out.println(headerx[i].getValue());
                cookieVector.add(headerx[i]);
            }

            // STREAM CONTENT BODY
            HttpEntity entity2 = response.getEntity();
            InputStream in = entity2.getContent(); // <== THIS IS THE WAY I GET THE RESPONSE STREAM

            if (attachmentURL.size() > 0) {
                saveAttachment(in); // SAVE FILE <==
            } else {
                // Parse and grab: message title, subject, attachments.
                // If attachments are found, come back here and execute saveAttachment.
                from(in, htmlpage);
                in.close();
            }
        } catch (IOException ex) {
            Logger.getLogger(LiberoLoginNew.class.getName()).log(Level.SEVERE, null, ex);
        }
    }
    

    The httpConnection method works and I can download the file!

    Server Response:

    Date  VALUE: Fri, 18 Nov 2011 13:09:46 GMT
    Server  VALUE: Apache/2.2.21 (Unix) mod_jk/1.2.23
    Set-Cookie  VALUE: MST_PVP=tiQZO3nbl9_5f_OQXtJP32YiqQx_5f_kSh6F6Io7r3xS; Domain=m.libero.it; Path=/
    Content-Type  VALUE: application/octet-stream
    Expires  VALUE: Fri, 18 Nov 2011 15:09:46 GMT
    Transfer-Encoding  VALUE: chunked
    

    Example of response body:

     %PDF-1.7
    
     1 0 obj  % entry point
     <<
    /Type /Catalog
    /Pages 2 0 R
    >>
    endobj

     2 0 obj
     <<
     /Type /Pages
     /MediaBox [ 0 0 200 200 ]
     /Count 1
     /Kids [ 3 0 R ]
     >>
      endobj
    
      3 0 obj
      <<
     /Type /Page
     /Parent 2 0 R
     /Resources <<
      /Font <<
      /F1 4 0 R 
    >>
    >>
    /Contents 5 0 R
    >>
    endobj
    
    4 0 obj
    <<
    /Type /Font
    /Subtype /Type1
    /BaseFont /Times-Roman
    >>
    endobj
    
    5 0 obj  % page content
    <<
     /Length 44
     >>
     stream
      BT
      70 50 TD
     /F1 12 Tf
     (Hello, world!) Tj
      ET
      endstream
      endobj
    
      xref
      0 6
     0000000000 65535 f 
     0000000010 00000 n 
     0000000079 00000 n 
     0000000173 00000 n 
     0000000301 00000 n 
    0000000380 00000 n 
    trailer
    <<
    /Size 6
    /Root 1 0 R
     >>
     startxref
     492
     %%EOF
    

    Now, let's start from here. Can you please tell me what I have to do to save the stream to a file?

    ########### SOLVED:

    To save a file locally from the stream data while respecting its binary nature, I did this:

    public void saveFile(InputStream is) {
        try {
            DataOutputStream out = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(new File("test.pdf"))));
            int c;
            while ((c = is.read()) != -1) {
                out.writeByte(c);
            }
            out.close();
            is.close();
        } catch (IOException e) {
            System.err.println("Error Writing/Reading Streams.");
        }
    }
    

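    If you want a shorter method, you can use IOUtils from Apache Commons IO, like this (note that toByteArray holds the whole file in memory):

    public void saveFile(InputStream is) throws IOException {
        OutputStream os = new FileOutputStream(new File("test.pdf"));
        byte[] bytes = IOUtils.toByteArray(is); // reads the entire stream into memory
        os.write(bytes);
        os.close();
    }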
    
  • gd1
    gd1 over 12 years
    Reading byte by byte is crazy. However +1 because the overall intentions are good.
  • hurtledown
    hurtledown over 12 years
    You are right... this was developed for small files and to have a precise progress bar. Anyway, believe it or not, I recently compared it with the speed of downloading a file with NIO, in which you just connect the two streams, and it takes the same time...
  • gd1
    gd1 over 12 years
    Yeah, the network can be "crappier" than any unoptimized code we may write. Our machines get better whereas our networks get worse. You should try it with a 5 GB document on a 100Mbit LAN, and it will eventually make some difference...
  • Augusto Picciani
    Augusto Picciani over 12 years
    hurtledown, I have to log in to the web server with a series of cookies first, and then I can download the file. Any suggestions?
  • gd1
    gd1 over 12 years
    You are programming an HTTP robot. It will take some effort, since there are no one-liner hints for it. For the cookies: download.oracle.com/javase/tutorial/networking/cookies/… To log in, you probably need to POST the credentials to a login page. See: bytestrike.blogspot.com/2008/05/…
  • Augusto Picciani
    Augusto Picciani over 12 years
    I tried your example (I had already tried some days ago) but nada. In this case the PDF file does not open.
  • Augusto Picciani
    Augusto Picciani over 12 years
    I have already written the login part of my code and it works like a charm. But now I'm stuck here, on the PDF stream problem, and I can't move forward. I'm asking you all why a PDF doesn't open after I download it. If the downloaded PDF has no encoding such as "FlateDecode" (or others) between "stream" and "endstream", I'm able to open and view the file. But if there is some kind of encoding, I can't.
  • gd1
    gd1 over 12 years
    You shouldn't copy and paste pieces of code written by other people, hoping you've randomly found the correct combination. Once you have understood the problem (downloading a binary file, not a text one), you should use the examples ALONG WITH the Java documentation to find a solution that is both correct and tailored to your needs, yet written by you. Write some code line by line, debug each single line, and create an SSCCE (sscce.org) for us.
  • gd1
    gd1 over 12 years
    You are downloading text whereas you should download raw bytes. See: google.com/?q=difference+between+text+files+and+binary+files
  • gd1
    gd1 over 12 years
    So don't try the examples you find on the Internet, or the one provided by hurtledown, INSIDE your program; try them in an appropriate, separate test case, and really show us WHERE and HOW they fail. Also, please learn about Java byte streams, because if you find it acceptable to write a binary file line by line, you'll have more and more problems even after you've succeeded in making your program work in some awkward way. Fix things up!
  • Augusto Picciani
    Augusto Picciani over 12 years
    Obviously, I adapted the example code to my code, I'm not a super-dummy!!!.. :)) I had already tested another script using URLConnection (like hurtledown's example) to download other PDF files from other web servers that don't need a login, and everything was fine (the PDF opened successfully). I would try your example and hurtledown's in a separate test case, but I can't, because to reproduce the real test I first need to log in to the specific web server and then download the PDF file. But managing cookies with URLConnection is so hard!
  • gd1
    gd1 over 12 years
    OK. So the problem cannot be solved here, because we have no idea what your code may be messing up in the login part, and "I tried your example (I had already tried some days ago) but nada" is misleading at the least, don't you think? You don't need S.O.; you basically have to debug your code. If your login part does not work, then stop saying you've got a PDF download issue and concentrate on the login. But you apparently have a PDF download issue, too, if you keep thinking it's fine to use PrintWriter for binary files. :)
  • Augusto Picciani
    Augusto Picciani over 12 years
    gd1, can you take a look at the new answer I published?
  • gd1
    gd1 over 12 years
    Sure, we will. It looks complete and understandable. Give me some time. It's also possible others will look at it and solve the issue. ;-)