Read a URL into a String in a few lines of Java code


Solution 1

Now that some time has passed since the original answer was accepted, there's a better approach:

String out = new Scanner(new URL("http://www.google.com").openStream(), "UTF-8").useDelimiter("\\A").next();

If you want a slightly fuller implementation that closes the stream and handles an empty response (no longer a single line), do this:

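import java.io.IOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

public static String readStringFromURL(String requestURL) throws IOException
{
    try (Scanner scanner = new Scanner(new URL(requestURL).openStream(),
            StandardCharsets.UTF_8.toString()))
    {
        // "\\A" matches only the start of input, so the next token is the whole stream
        scanner.useDelimiter("\\A");
        return scanner.hasNext() ? scanner.next() : "";
    }
}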
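A self-contained sketch of the method above that also reports read errors. As one of the comments points out, Scanner does not propagate an IOException thrown by the underlying stream; it treats it as end of input and keeps it for ioException(), so a failed download can silently come back truncated. This version checks for that and rethrows. The class name UrlText is hypothetical, and UTF-8 is assumed, as in the original.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

public class UrlText {
    // Reads the whole resource behind the URL into a String, decoded as UTF-8.
    public static String readStringFromURL(String requestURL) throws IOException {
        try (InputStream in = new URL(requestURL).openStream();
             Scanner scanner = new Scanner(in, StandardCharsets.UTF_8.name())) {
            // "\\A" matches only the beginning of input, so the first token is everything.
            scanner.useDelimiter("\\A");
            String content = scanner.hasNext() ? scanner.next() : "";
            // Scanner swallows IOExceptions from the stream; surface them instead of
            // returning possibly truncated content.
            IOException failure = scanner.ioException();
            if (failure != null) {
                throw failure;
            }
            return content;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readStringFromURL(args[0]));
    }
}
```

Because openStream() also accepts file: URLs, the method can be tried locally without network access, e.g. readStringFromURL(Paths.get("notes.txt").toUri().toURL().toString()).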

Solution 2

This answer refers to an older version of Java. You may want to look at ccleve's answer.


Here is the traditional way to do this:

import java.net.*;
import java.io.*;

public class URLConnectionReader {
    public static String getText(String url) throws Exception {
        URL website = new URL(url);
        URLConnection connection = website.openConnection();
        BufferedReader in = new BufferedReader(
                                new InputStreamReader(
                                    connection.getInputStream()));

        StringBuilder response = new StringBuilder();
        String inputLine;

        try {
            // readLine() strips line terminators, so lines are concatenated without them
            while ((inputLine = in.readLine()) != null)
                response.append(inputLine);
        } finally {
            in.close(); // close the stream even if readLine() throws
        }

        return response.toString();
    }

    public static void main(String[] args) throws Exception {
        String content = URLConnectionReader.getText(args[0]);
        System.out.println(content);
    }
}
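Because readLine() drops line terminators, getText returns the lines glued together, which a comment below also points out. A sketch that keeps the content exactly as received reads characters in chunks instead of lines. The class name ExactReader is hypothetical, and decoding as UTF-8 is an assumption (the code above uses the platform default charset).

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class ExactReader {
    // Reads the resource in character chunks, so "\n" and "\r\n"
    // are preserved exactly as the server sent them.
    public static String getText(String url) throws IOException {
        URLConnection connection = new URL(url).openConnection();
        StringBuilder response = new StringBuilder();
        try (Reader in = new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8)) {
            char[] buffer = new char[8192];
            int n;
            // read() returns -1 at end of stream
            while ((n = in.read(buffer)) != -1) {
                response.append(buffer, 0, n);
            }
        }
        return response.toString();
    }
}
```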

As @extraneon has suggested, Apache Commons IO's IOUtils lets you do this in a concise, elegant way that's still in the Java spirit:

 InputStream in = new URL( "http://jakarta.apache.org" ).openStream();

 try {
   // pass the charset explicitly; the overload without one is deprecated
   System.out.println( IOUtils.toString( in, StandardCharsets.UTF_8 ) );
 } finally {
   IOUtils.closeQuietly(in);
 }

Solution 3

Or just use Apache Commons IO's IOUtils.toString(URL url), or, better, the variant that also accepts an encoding parameter (the single-argument version is deprecated).

Solution 4

There's an even better way as of Java 9:

URL u = new URL("http://www.example.com/");
try (InputStream in = u.openStream()) {
    return new String(in.readAllBytes(), StandardCharsets.UTF_8);
}

Like the original Groovy example, this assumes that the content is UTF-8 encoded. (If you need something more clever than that, you need to create a URLConnection and use it to figure out the encoding.)
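A minimal sketch of that suggestion, assuming the server declares the encoding in the charset parameter of its Content-Type header. The class name CharsetAwareReader, the simplified header parsing, the UTF-8 fallback, and the timeout values (taken from one of the comments) are all assumptions; it does not look at encodings declared inside the document itself, such as an HTML meta tag. It uses readAllBytes(), so it needs Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetAwareReader {
    // Extracts e.g. "ISO-8859-1" from "text/html; charset=ISO-8859-1".
    // Falls back to UTF-8 if no usable charset parameter is present.
    public static Charset charsetFromContentType(String contentType) {
        if (contentType != null) {
            for (String param : contentType.split(";")) {
                String p = param.trim();
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    String name = p.substring(8).trim().replace("\"", "");
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        break; // illegal or unsupported charset name
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    public static String getText(String url) throws IOException {
        URLConnection conn = new URL(url).openConnection();
        conn.setConnectTimeout(5000);  // milliseconds
        conn.setReadTimeout(25000);    // milliseconds
        Charset charset = charsetFromContentType(conn.getContentType());
        try (InputStream in = conn.getInputStream()) {
            // readAllBytes() reads to end of stream; decode with the detected charset
            return new String(in.readAllBytes(), charset);
        }
    }
}
```

Separating the header parsing into its own method keeps it testable without a server.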

Solution 5

Now that more time has passed, here's a way to do it in Java 8:

URLConnection conn = url.openConnection();
try (BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
    pageText = reader.lines().collect(Collectors.joining("\n"));
}
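A self-contained version of the Java 8 snippet, wrapped in a method so that url and pageText no longer need to be declared elsewhere. The class name Java8UrlReader is hypothetical. Note the behavior of this approach: BufferedReader.lines() drops line terminators and Collectors.joining("\n") inserts "\n" only between lines, so "\r\n" and "\r" both become "\n" and a trailing newline is lost.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

public class Java8UrlReader {
    // Reads all lines from the URL and joins them with "\n".
    // An I/O error while streaming lines surfaces as UncheckedIOException.
    public static String getText(URL url) throws IOException {
        URLConnection conn = url.openConnection();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            return reader.lines().collect(Collectors.joining("\n"));
        }
    }
}
```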

Author: Pomponius

Updated on February 23, 2021

Comments

  • Pomponius
    Pomponius about 3 years

    I'm trying to find Java's equivalent to Groovy's:

    String content = "http://www.google.com".toURL().getText();
    

    I want to read content from a URL into a string. I don't want to pollute my code with buffered streams and loops for such a simple task. I looked into Apache's HttpClient, but I also don't see a one- or two-line implementation.

    • Jonathan B
      Jonathan B over 13 years
      Why not just create a utility class that encapsulates all that "polluted" buffered streams and loops? You could also use that class to handle things like the socket closing before the stream completes and to handle I/O blocks over a slow connection. After all, this is OO - encapsulate the functionality and hide it from your main class.
    • matbrgz
      matbrgz over 13 years
      It cannot be done in one or two lines.
    • StevenWernerCS
      StevenWernerCS almost 3 years
      See ZhekaKozlov's three-line answer: tested, with no external dependencies.
  • Goran Jovic
    Goran Jovic over 13 years
    You could rename the main method to, say, getText, pass the URL string as a parameter, and have a one-liner: String content = URLConnectionReader.getText("http://www.yahoo.com/");
  • Marcelo
    Marcelo over 11 years
    Just don't forget you need to call Scanner#close() later.
  • M.C.
    M.C. about 11 years
    If the compiler gives a resource-leak warning, split the statement as shown here: stackoverflow.com/questions/11463327/…
  • Rune
    Rune about 11 years
    The regular expression \\A matches the beginning of input. This tells Scanner to tokenize the entire stream, from beginning to (illogical) next beginning.
  • gMale
    gMale almost 11 years
    +1 Thanks, this worked perfectly. One line of code AND it closes the stream! Note that IOUtils.toString(URL) is deprecated. IOUtils.toString(URL url, String encoding) is preferred.
  • Benoît Guédas
    Benoît Guédas over 10 years
    The string will not contain any line-termination characters (because of the use of BufferedReader.readLine(), which removes them), so it will not be exactly the content of the URL.
  • NateS
    NateS about 10 years
    Neat, but fails if the webpage returns no content (""). You need String result = scanner.hasNext() ? scanner.next() : ""; to handle that.
  • Matthias Ronge
    Matthias Ronge over 9 years
    Isn’t it necessary to close all of the resources properly?

      String s(URL u) throws IOException {
          HttpURLConnection c = null;
          InputStream i = null;
          Scanner s = null;
          try {
              c = (HttpURLConnection) u.openConnection();
              i = c.getInputStream();
              s = new Scanner(i, "UTF-8").useDelimiter("\\A");
              return s.hasNext() ? s.next() : "";
          } finally {
              if (s != null) s.close();
              if (i != null) try { i.close(); } catch (IOException e) {}
              if (c != null) c.disconnect();
          }
      }

    Perhaps you also want to set some timeouts: c.setConnectTimeout(5000); c.setReadTimeout(25000);
  • franckysnow
    franckysnow over 9 years
    Use IOUtils.toString(url, (Charset) null) to get a similar result.
  • gaal
    gaal almost 9 years
    The Guava docs say: "Note that even though these methods use URL parameters, they are usually not appropriate for HTTP or other non-classpath resources."
  • Ortomala Lokni
    Ortomala Lokni about 8 years
    When using this example on the http://www.worldcat.org/webservices/catalog/search/opensearch web service, I'm getting only the first two lines of XML.
  • Ortomala Lokni
    Ortomala Lokni about 8 years
    The 400 error is because you need a key to use this web service. The problem is that this web service sends a bit of XML, then takes several seconds to do some processing, and then sends the second part of the XML. The InputStream is closed during the interval, and not all content is consumed. I've solved the problem using the Apache HttpComponents library: hc.apache.org/httpcomponents-client-ga
  • Erik Humphrey
    Erik Humphrey almost 7 years
    @Marcelo What do you mean like this? Seems you would have to split it into multiple statements to close an unassigned value.
  • kiedysktos
    kiedysktos almost 7 years
    @ccleve it would be useful to add imports here, there are multiple Scanners and URLs in Java
  • user1788736
    user1788736 over 6 years
    @Benoit Guedas, so how do you keep the line breaks?
  • Jeffrey Blattman
    Jeffrey Blattman over 6 years
    One line of code, and tens of megabytes of extraneous class files that are now in your runtime. Including a gigantic library to avoid writing a few (actually, one) line of code is not a great decision.
  • Imaskar
    Imaskar over 6 years
    @ccleve can you update the link "This explains the \\A:"?
  • big data nerd
    big data nerd about 6 years
    @JeffreyBlattman if you are using it only once in your application, it's probably not such a smart decision, but if you use it more frequently, along with other things from the commons-io package, then it might be a smart decision again. It also depends on the application you are writing. If it's a mobile or desktop app, you might think twice about bloating the memory footprint with additional libraries. If it's a server application running on a 64 GB RAM machine, then just ignore these 10 MB: memory is cheap nowadays, and whether the basic footprint is 1.5% or 2% of your total memory doesn't matter.
  • Shihe Zhang
    Shihe Zhang almost 6 years
    the link is dead
  • Jon Chase
    Jon Chase almost 6 years
    This answer is dangerous, as Scanner will swallow IOExceptions produced by the underlying stream and you will not get the full content of the resource. In fact, you must call scanner.ioException() to see if there was an exception. From Scanner's docs: " If an invocation of the underlying readable's Readable.read(java.nio.CharBuffer) method throws an IOException then the scanner assumes that the end of the input has been reached. The most recent IOException thrown by the underlying readable can be retrieved via the ioException() method." This has bitten us before.
  • Brad Parks
    Brad Parks over 5 years
    This works with redirects too, for what it's worth ;-)
  • Crystalzord
    Crystalzord over 5 years
    Note: according to the documentation, the try-with-resources statement ensures that each resource is closed at the end of the statement.
  • rjh
    rjh almost 4 years
    Thanks, this was exactly what I was looking for. It can also be used with getClass().getResourceAsStream(...) to open text files inside the jar.
  • gouessej
    gouessej almost 4 years
    I use this source code in a CORS proxy; URLConnection lets you get the content encoding, which is helpful. @OrtomalaLokni I have a similar problem when I try to download a web page, whereas it works when it points to a file available online (an RSS file, for example). Thank you for the suggestion. I probably won't use this library, but it might be a good source of inspiration for solving my problem since it's open source.
  • Bostone
    Bostone over 3 years
    Nice but if you need to add a header this will not do
  • Sean Reilly
    Sean Reilly over 3 years
    @Bostone true, but the same thing is true for the original Groovy example in the question.
  • Gael
    Gael over 3 years
    I liked that solution... until I realised it doesn't follow redirection :(
  • Daniel Henao
    Daniel Henao about 3 years
    In terms of performance, is this the best option? If not, which one do you think is?