Download a file via a proxy in Java

Solution 1

The Apache HttpClient library takes care of most proxy-related issues. To compile the code below, you can use the following Maven POM:

Maven:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>stackoverflow.test</groupId>
  <artifactId>proxyhttp</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>

  <name>proxy</name>
  <url>http://maven.apache.org</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>

  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.httpcomponents</groupId>
      <artifactId>httpclient</artifactId>
      <version>4.5.1</version>
    </dependency>
  </dependencies>
</project>

Java code:

import org.apache.http.HttpHost;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

/**
 * How to send a request via proxy.
 *
 * @since 4.0
 */
public class ClientExecuteProxy {

    public static void main(String[] args) throws Exception {
        // try-with-resources closes the client even if the request fails
        try (CloseableHttpClient httpclient = HttpClients.createDefault()) {
            HttpHost target = new HttpHost("www.google.com", 80, "http");
            HttpHost proxy = new HttpHost("127.0.0.1", 8889, "http");

            // Route this request through the proxy
            RequestConfig config = RequestConfig.custom()
                    .setProxy(proxy)
                    .build();
            HttpGet request = new HttpGet("/");
            request.setConfig(config);

            System.out.println("Executing request " + request.getRequestLine() + " to " + target + " via " + proxy);

            // The response is closed automatically as well
            try (CloseableHttpResponse response = httpclient.execute(target, request)) {
                System.out.println("----------------------------------------");
                System.out.println(response.getStatusLine());
                System.out.println(EntityUtils.toString(response.getEntity()));
            }
        }
    }

}

Solution 2

The following differs from the other answers and works for me: set these system properties before opening the connection:

System.setProperty("http.proxySet", "true"); // legacy flag; not among the documented JDK networking properties
System.setProperty("http.proxyHost", "my.proxy.com");
System.setProperty("http.proxyPort", "8080"); // the port is a String, not an int

Then, open the URLConnection and try to download the file.
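A minimal sketch of this approach, assuming Java 8 or later: it sets the two documented proxy properties and then copies the URL's stream to a file. The class name `SystemProxyDownload`, its method names, and the command-line arguments are placeholders, not part of the original answer. These properties apply to the whole JVM, so this may not suit code that needs a different proxy per thread, and `https` URLs read `https.proxyHost` and `https.proxyPort` instead.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

final class SystemProxyDownload {

    /** Sets the JVM-wide HTTP proxy that URLConnection uses for http:// URLs. */
    static void useHttpProxy(String host, int port) {
        System.setProperty("http.proxyHost", host);
        System.setProperty("http.proxyPort", Integer.toString(port)); // must be a String
    }

    /** Copies whatever the URL serves into target and returns the number of bytes written. */
    static long download(URL url, Path target) throws IOException {
        try (InputStream in = url.openStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: SystemProxyDownload <url> <output-file>");
            return;
        }
        useHttpProxy("my.proxy.com", 8080); // placeholder proxy from the answer above
        long bytes = download(new URL(args[0]), Paths.get(args[1]));
        System.out.println("Saved " + bytes + " bytes to " + args[1]);
    }
}
```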

Solution 3

To set a proxy programmatically:

SocketAddress addr = new InetSocketAddress("my.proxy.com", 8080);
Proxy proxy = new Proxy(Proxy.Type.HTTP, addr);
URL url = new URL("http://my.real.url.com/");
URLConnection conn = url.openConnection(proxy);

Then you can use your code above with the URLConnection returned on the last line. You can also use a SOCKS proxy (Proxy.Type.SOCKS), or force a direct connection (Proxy.NO_PROXY), if you so desire.

This was taken (and slightly edited) from this Oracle documentation.
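The proxy variants mentioned in this answer (HTTP, SOCKS, or no proxy) can be sketched as small factory methods. Because the `Proxy` is passed per connection rather than set globally, each thread could use its own. The class `Proxies`, its method names, and the timeout values are hypothetical and only serve as an illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import java.net.URLConnection;

final class Proxies {

    /** An HTTP proxy, as in the snippet above. */
    static Proxy http(String host, int port) {
        return new Proxy(Proxy.Type.HTTP, new InetSocketAddress(host, port));
    }

    /** A SOCKS proxy; only the type differs. */
    static Proxy socks(String host, int port) {
        return new Proxy(Proxy.Type.SOCKS, new InetSocketAddress(host, port));
    }

    /** A direct connection that bypasses any global proxy settings. */
    static Proxy direct() {
        return Proxy.NO_PROXY;
    }

    /** Opens (but does not yet connect) a connection to url through the given proxy. */
    static URLConnection open(URL url, Proxy proxy) throws IOException {
        URLConnection conn = url.openConnection(proxy);
        conn.setConnectTimeout(10_000); // illustrative timeouts, in milliseconds
        conn.setReadTimeout(30_000);
        return conn;
    }
}
```

For example, `Proxies.open(new URL("http://my.real.url.com/"), Proxies.socks("my.proxy.com", 8080))` would return a connection that goes through a SOCKS proxy instead of an HTTP one.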

Solution 4

Another approach is to handle the proxy "inside" each HttpURLConnection instance. That is:

  1. Do not connect to the real URL you want. Instead, connect to the proxy's IP and port, but send an HTTP GET request that refers to the URL you want.
  2. Use setRequestProperty to set the Host header to your URL's host, plus any other headers you may need.

If it works, the proxy will transparently send the file to you.

I have some code that did this with plain sockets.

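import java.io.*;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// try-with-resources closes the socket and its streams when done
try (Socket sock = new Socket("10.0.241.1", 3128); // proxy IP and port
     InputStream is = sock.getInputStream();
     OutputStream os = sock.getOutputStream()) {
    String str = "GET http://www.uol.com.br HTTP/1.1\r\n"; // GET your site (full URL, since we talk to a proxy)
    str += "Host: www.uol.com.br\r\n"; // again, the host of your site
    str += "Proxy-Authorization: Basic ZWR1YXJkby5wb2NvOmM1NmQyMw==\r\n"; // only if a password is needed
    str += "Connection: close\r\n"; // ask for the connection to be closed, so read() eventually returns -1
    str += "\r\n";
    os.write(str.getBytes(StandardCharsets.US_ASCII));
    os.flush();
    byte[] bb = new byte[1024];
    int len;
    while ((len = is.read(bb)) != -1) {
        // bb[0..len) holds raw response bytes: the status line and headers come first,
        // so skip everything up to the first blank line before writing bytes to the file stream...
    }
} catch (Exception ex) {
    // exception handling...
}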

"Why would somebody use pure sockets when one could use HttpURLConnection?", you say. Well, at that time, I didn't know about HttpURLConnection.
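The socket approach can be sketched a little further, under some loud assumptions: the class `RawProxyGet` and its methods are hypothetical, the response status is not checked, and chunked transfer encoding is not decoded, so this only yields a usable file when the server sends the body as-is. It builds the proxy-style request described in the two steps of this answer and writes everything after the response headers to a file.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class RawProxyGet {

    /** Builds a proxy-style GET: the request line carries the absolute URL of the target. */
    static String buildRequest(String absoluteUrl, String host, String basicAuthOrNull) {
        StringBuilder sb = new StringBuilder();
        sb.append("GET ").append(absoluteUrl).append(" HTTP/1.1\r\n");
        sb.append("Host: ").append(host).append("\r\n");
        if (basicAuthOrNull != null) { // Base64 of "user:password", only if the proxy needs it
            sb.append("Proxy-Authorization: Basic ").append(basicAuthOrNull).append("\r\n");
        }
        sb.append("Connection: close\r\n\r\n"); // lets the read loop end at end of stream
        return sb.toString();
    }

    /** Skips the status line and headers, copies the rest to out, returns the body byte count. */
    static long copyBody(InputStream in, OutputStream out) throws IOException {
        InputStream buffered = new BufferedInputStream(in);
        final byte[] end = {'\r', '\n', '\r', '\n'}; // blank line that ends the headers
        int matched = 0;
        int b;
        while (matched < end.length && (b = buffered.read()) != -1) {
            if (b == end[matched]) {
                matched++;
            } else {
                matched = (b == '\r') ? 1 : 0; // a CR may start a new match
            }
        }
        long total = 0;
        byte[] buf = new byte[8192];
        int len;
        while ((len = buffered.read(buf)) != -1) {
            out.write(buf, 0, len);
            total += len;
        }
        return total;
    }

    /** Connects to the proxy, requests absoluteUrl, and saves the response body to file. */
    static long download(String proxyHost, int proxyPort, String absoluteUrl,
                         String host, Path file) throws IOException {
        try (Socket sock = new Socket(proxyHost, proxyPort);
             OutputStream fileOut = Files.newOutputStream(file)) {
            OutputStream os = sock.getOutputStream();
            os.write(buildRequest(absoluteUrl, host, null).getBytes(StandardCharsets.US_ASCII));
            os.flush();
            return copyBody(sock.getInputStream(), fileOut);
        }
    }
}
```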

Author: Exagon

Updated on June 08, 2022

Comments

  • Exagon
    Exagon almost 2 years

    I have a problem downloading a file from a URL like www.example.com/example.pdf via a proxy and saving it to the filesystem in Java. Does anybody have an idea how this could work? If I get the InputStream, I can simply save it to the filesystem like this:

    final ReadableByteChannel rbc = Channels.newChannel(httpUrlConnetion.getInputStream());    
    final FileOutputStream fos = new FileOutputStream(file);
    fos.getChannel().transferFrom(rbc, 0, Long.MAX_VALUE);
    fos.close();
    

    But how do I get the InputStream of a URL via a proxy? If I do it like this:

    SocketAddress addr = new InetSocketAddress("my.proxy.com", 8080);
    Proxy proxy = new Proxy(Proxy.Type.HTTP, addr);
    URL url = new URL("http://my.real.url.com/");
    URLConnection conn = url.openConnection(proxy);
    

    I get this exception:

    java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(Unknown Source)
        at java.net.SocketInputStream.read(Unknown Source)
        at java.io.BufferedInputStream.fill(Unknown Source)
        at java.io.BufferedInputStream.read1(Unknown Source)
        at java.io.BufferedInputStream.read(Unknown Source)
        at sun.net.www.http.HttpClient.parseHTTPHeader(Unknown Source)
        at sun.net.www.http.HttpClient.parseHTTP(Unknown Source)
        at sun.net.www.http.HttpClient.parseHTTP(Unknown Source)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown Source)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
        at app.model.mail.crawler.newimpl.FileLoader.getSourceOfSiteViaProxy(FileLoader.java:167)
        at app.model.mail.crawler.newimpl.FileLoader.process(FileLoader.java:220)
        at app.model.mail.crawler.newimpl.FileLoader.run(FileLoader.java:57)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
    

    Using this:

    final HttpURLConnection httpUrlConnetion = (HttpURLConnection) website.openConnection(proxy);
    httpUrlConnetion.setDoOutput(true);
    httpUrlConnetion.setDoInput(true);
    httpUrlConnetion.setRequestProperty("Content-type", "text/xml");
    httpUrlConnetion.setRequestProperty("Accept", "text/xml, application/xml");
    httpUrlConnetion.setRequestMethod("POST");
    httpUrlConnetion.connect();
    

    I am able to download the source of a site, which is HTML, but not a file. Maybe someone could help me with the properties I have to set for downloading a file.
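    One thing that may be worth trying, although the thread never confirms the cause: the snippet above sends a POST with setDoOutput(true) but never writes a request body, whereas fetching a file is normally a plain GET. A minimal sketch under that assumption, reusing the channel-based save from the question; the class `FileDownloadRequest`, its method names, the `Accept` value, and the timeouts are all hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.Proxy;
import java.net.URL;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class FileDownloadRequest {

    /** Configures (but does not connect) a GET request through the given proxy. */
    static HttpURLConnection prepare(URL url, Proxy proxy) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection(proxy);
        conn.setRequestMethod("GET");             // a download needs no request body
        conn.setDoOutput(false);                  // so output stays disabled
        conn.setRequestProperty("Accept", "*/*"); // accept any content type, not only XML
        conn.setConnectTimeout(10_000);           // illustrative timeouts, in milliseconds
        conn.setReadTimeout(30_000);
        return conn;
    }

    /** Saves the stream to file via NIO channels and returns the number of bytes written. */
    static long save(InputStream in, Path file) throws IOException {
        try (ReadableByteChannel rbc = Channels.newChannel(in);
             FileChannel out = FileChannel.open(file,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            return out.transferFrom(rbc, 0, Long.MAX_VALUE);
        }
    }

    /** Requests url through proxy and stores the response body in file. */
    static long download(URL url, Proxy proxy, Path file) throws IOException {
        HttpURLConnection conn = prepare(url, proxy);
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status + " for " + url);
            }
            return save(conn.getInputStream(), file);
        } finally {
            conn.disconnect();
        }
    }
}
```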

  • Exagon
    Exagon over 8 years
    If I do it like this, I get an exception. See my question again; I will edit it.
  • Eric Galluzzo
    Eric Galluzzo over 8 years
    Unfortunately it's difficult to tell why the connection would be reset in your case. Have you tried accessing the URL in a browser, with the same proxy settings, and ensured that it works there? Are you using the right type of proxy (SOCKS vs. HTTP)?
  • Exagon
    Exagon over 8 years
    I am using a SOCKS proxy. Yes, I did, and it worked... I have tried a lot of other sites now, but it never worked.
  • Eric Galluzzo
    Eric Galluzzo over 8 years
    Did you change the Proxy.Type.HTTP in the code to Proxy.Type.SOCKS? You might try both just in case.
  • Eric Galluzzo
    Eric Galluzzo over 8 years
    Hmmm, I'm not sure then. I assume you've verified your proxy host and port in your code. Other than that, I'm not sure what to suggest. :(
  • Exagon
    Exagon over 8 years
    I am using different proxies in different threads, so this won't work.
  • Exagon
    Exagon over 8 years
    I am getting a HTTP response code: 411, a read timeout or a connect timed out ... any ideas?
  • Marco Altieri
    Marco Altieri over 8 years
    @Exagon I have updated the code, because last time I used code that I wrote for an old version, using classes that have all been deprecated. I retested the code using fiddler2 as a proxy. It worked fine. If you get a timeout, it is probably a "networking" issue.
  • Marco Altieri
    Marco Altieri over 8 years
    By the way, the example is just a "copy and paste" of: hc.apache.org/httpcomponents-client-ga/httpclient/examples/org/…
  • Exagon
    Exagon over 8 years
    Sorry, but I am getting an error at request.setConfig(config); "The method setConfig(RequestConfig) is undefined for the type HttpGet"
  • Exagon
    Exagon over 8 years
    Could you show how to do this with all the properties, with some code?
  • Marco Altieri
    Marco Altieri over 8 years
    @exagon What version of the library are you using? If you do not want to use Maven, you can download the version that I used from: central.maven.org/maven2/org/apache/httpcomponents/httpclient/…
  • Exagon
    Exagon over 8 years
    The newest, 4.5.1. My IDE is Eclipse Mars and I am using Java 8.65.
  • Marco Altieri
    Marco Altieri over 8 years
    mmm I see... I am not on JDK 8. Let me check
  • Marco Altieri
    Marco Altieri over 8 years
    It worked for me on JDK 8. Is your error at runtime or compile time?
  • Exagon
    Exagon over 8 years
    It's a compile-time error... I don't know why... It also appears when I create a new project and just add the library and this class.
  • Marco Altieri
    Marco Altieri over 8 years
    HttpGet has the method setConfig since the beginning. As I said, the example is from the apache httpclient site so it has to work.
  • Eduardo Poço
    Eduardo Poço over 8 years
    Edited in the answer. This implementation is from the time when I didn't know about httpUrlConnection, so used sockets. Did the edit in a hurry, I think you can figure out the equivalent operations on a httpUrlConnection. If you need, I'll edit it again to fit a httpUrlConnection.