Read a URL to a string in a few lines of Java code
Solution 1
Now that some time has passed since the original answer was accepted, there's a better approach:
String out = new Scanner(new URL("http://www.google.com").openStream(), "UTF-8").useDelimiter("\\A").next();
If you want a slightly fuller implementation, which is not a single line, do this:
public static String readStringFromURL(String requestURL) throws IOException
{
    try (Scanner scanner = new Scanner(new URL(requestURL).openStream(),
            StandardCharsets.UTF_8.toString()))
    {
        scanner.useDelimiter("\\A");
        return scanner.hasNext() ? scanner.next() : "";
    }
}
Solution 2
This answer refers to an older version of Java. You may want to look at ccleve's answer.
Here is the traditional way to do this:
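import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class URLConnectionReader {

    public static String getText(String url) throws IOException {
        URLConnection connection = new URL(url).openConnection();
        StringBuilder response = new StringBuilder();
        // try-with-resources closes the reader even if reading fails.
        // Assumes the content is UTF-8 encoded.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
            String inputLine;
            // readLine() strips line terminators, so the result has no line breaks.
            while ((inputLine = in.readLine()) != null) {
                response.append(inputLine);
            }
        }
        return response.toString();
    }

    public static void main(String[] args) throws IOException {
        String content = URLConnectionReader.getText(args[0]);
        System.out.println(content);
    }
}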
As @extraneon has suggested, IOUtils from Apache Commons IO lets you do this in a very elegant way that's still in the Java spirit:
InputStream in = new URL("http://jakarta.apache.org").openStream();
try {
    System.out.println(IOUtils.toString(in));
} finally {
    IOUtils.closeQuietly(in);
}
Solution 3
Or just use Apache Commons IOUtils.toString(URL url), or the variant that also accepts an encoding parameter.
Solution 4
There's an even better way as of Java 9:
URL u = new URL("http://www.example.com/");
try (InputStream in = u.openStream()) {
    return new String(in.readAllBytes(), StandardCharsets.UTF_8);
}
Like the original Groovy example, this assumes that the content is UTF-8 encoded. (If you need something more clever than that, you need to create a URLConnection and use it to figure out the encoding.)
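For the "figure out the encoding" case, here is a minimal sketch of one way it might be done with a URLConnection. The class name CharsetAwareReader and the helper charsetOf are made up for illustration, and falling back to UTF-8 when the server declares no charset is an assumption, not something the answer prescribes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class CharsetAwareReader {

    // Hypothetical helper: pulls the charset parameter out of a Content-Type
    // header value such as "text/html; charset=ISO-8859-1".
    // Falls back to UTF-8 (an assumption) when nothing usable is declared.
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length()).trim().replace("\"", "");
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        // Unknown or malformed charset name: use the fallback below.
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    // Reads the whole body, then decodes it with the charset the server declared.
    static String read(String requestURL) throws IOException {
        URLConnection conn = new URL(requestURL).openConnection();
        try (InputStream in = conn.getInputStream()) {
            byte[] body = in.readAllBytes(); // Java 9+
            return new String(body, charsetOf(conn.getContentType()));
        }
    }
}
```

This only looks at the HTTP header; a page that declares its encoding solely in an HTML meta tag would still be decoded with the fallback.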
Solution 5
Now that more time has passed, here's a way to do it in Java 8:
URLConnection conn = url.openConnection();
try (BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
    pageText = reader.lines().collect(Collectors.joining("\n"));
}
Pomponius
Updated on February 23, 2021
Comments
-
Pomponius about 3 years
I'm trying to find Java's equivalent to Groovy's:
String content = "http://www.google.com".toURL().getText();
I want to read content from a URL into a string. I don't want to pollute my code with buffered streams and loops for such a simple task. I looked into Apache's HttpClient, but I also don't see a one- or two-line implementation.
-
Jonathan B over 13 years
Why not just create a utility class that encapsulates all those "polluted" buffered streams and loops? You could also use that class to handle things like the socket closing before the stream completes and to handle I/O blocks over a slow connection. After all, this is OO: encapsulate the functionality and hide it from your main class.
-
matbrgz over 13 years
It cannot be done in one or two lines.
-
StevenWernerCS almost 3 years
See ZhekaKozlov's 3-line answer: tested, and no external dependencies.
-
Goran Jovic over 13 years
You could rename the main method to, say, getText, pass the URL string as a parameter, and have a one-liner: String content = URLConnectionReader.getText("http://www.yahoo.com/");
-
Marcelo over 11 years
Just don't forget you need to call Scanner#close() later.
-
M.C. about 11 years
If the compiler gives a leak warning, you should split the statement, as described here: stackoverflow.com/questions/11463327/…
-
Rune about 11 years
The regular expression \\A matches the beginning of input. This tells Scanner to tokenize the entire stream, from beginning to (illogical) next beginning.
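To see the \\A delimiter at work without any network access, here is a small sketch that runs the same idiom over an in-memory string. The class name WholeInputScanner and the method readAll are illustrative only.

```java
import java.util.Scanner;

class WholeInputScanner {

    // Returns everything the scanner can read as a single token.
    // "\\A" only matches at the very start of the input, so no delimiter is
    // ever found after the first character and the token runs to the end.
    static String readAll(Scanner scanner) {
        scanner.useDelimiter("\\A");
        // hasNext() is false for empty input; next() would throw in that case.
        return scanner.hasNext() ? scanner.next() : "";
    }
}
```

For example, WholeInputScanner.readAll(new Scanner("first line\nsecond line\n")) gives back the full two-line string, line breaks included, and an empty source yields "".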
-
gMale almost 11 years
+1 Thanks, this worked perfectly. One line of code AND it closes the stream! Note that IOUtils.toString(URL) is deprecated. IOUtils.toString(URL url, String encoding) is preferred.
-
Benoît Guédas over 10 years
The string will not contain any line-termination characters (because BufferedReader.readLine() removes them), so it will not be exactly the content of the URL.
-
NateS about 10 years
Neat, but fails if the webpage returns no content (""). You need String result = scanner.hasNext() ? scanner.next() : ""; to handle that.
-
Matthias Ronge over 9 years
Isn't it necessary to close all of the resources properly?
String s(URL u) throws IOException {
    HttpURLConnection c = null;
    InputStream i = null;
    Scanner s = null;
    try {
        c = (HttpURLConnection) u.openConnection();
        i = c.getInputStream();
        s = new Scanner(i, "UTF-8").useDelimiter("\\A");
        return s.hasNext() ? s.next() : "";
    } finally {
        if (s != null) s.close();
        if (i != null) try { i.close(); } catch (IOException e) {}
        if (c != null) c.disconnect();
    }
}
Perhaps you also want to set some timeouts: c.setConnectTimeout(5000); c.setReadTimeout(25000);
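A sketch of how the timeout suggestion might look when combined with try-with-resources and Java 9's readAllBytes. The class name TimeoutReader is invented for this example, the timeout values are whatever the caller passes in, and UTF-8 is assumed as the encoding.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

class TimeoutReader {

    // Reads the resource behind the URL as UTF-8 text (an assumption), giving
    // up if connecting or reading stalls for longer than the given timeouts.
    static String read(URL url, int connectTimeoutMs, int readTimeoutMs) throws IOException {
        URLConnection conn = url.openConnection();
        // Timeouts must be set before the connection is actually opened.
        conn.setConnectTimeout(connectTimeoutMs);
        conn.setReadTimeout(readTimeoutMs);
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

With the values from the comment above, a call would be TimeoutReader.read(new URL("http://www.example.com/"), 5000, 25000). A stalled read then fails with a SocketTimeoutException instead of hanging.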
-
franckysnow over 9 years
Use IOUtils.toString(url, (Charset) null) to reach a similar result.
-
gaal almost 9 years
The Guava docs say: "Note that even though these methods use {@link URL} parameters, they are usually not appropriate for HTTP or other non-classpath resources."
-
Ortomala Lokni about 8 years
When using this example on the http://www.worldcat.org/webservices/catalog/search/opensearch webservice, I'm getting only the first two lines of XML.
-
Ortomala Lokni about 8 years
The 400 error is because you need a key to use this webservice. The problem is that this webservice sends a bit of XML, then takes several seconds to do some processing, and then sends the second part of the XML. The InputStream is closed during the interval and not all content is consumed. I've solved the problem using the Apache HttpComponents library: hc.apache.org/httpcomponents-client-ga
-
Erik Humphrey almost 7 years
@Marcelo What do you mean, like this? Seems you would have to split it into multiple statements to close an unassigned value.
-
kiedysktos almost 7 years
@ccleve It would be useful to add imports here; there are multiple Scanners and URLs in Java.
-
user1788736 over 6 years
@Benoit Guedas So how do you keep the line breaks?
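One possible way to keep the line breaks is to skip readLine() altogether and copy characters through a buffer, so nothing is stripped. This is only a sketch: the class name LineBreakPreservingReader is invented, and UTF-8 is assumed as the encoding.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

class LineBreakPreservingReader {

    // Copies the decoded characters as-is, so \n and \r\n survive untouched.
    static String readAll(InputStream in) throws IOException {
        StringBuilder out = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] buffer = new char[8192];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                out.append(buffer, 0, n);
            }
        }
        return out.toString();
    }
}
```

It can be fed from a URL with LineBreakPreservingReader.readAll(new URL(address).openStream()), where address is your own URL string.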
-
Jeffrey Blattman over 6 years
One line of code, and tens of megabytes of extraneous class files that are now in your runtime. Including a gigantic library to avoid writing a few (actually, one) line of code is not a great decision.
-
Imaskar over 6 years
@ccleve Can you update the link "This explains the \\A:"?
-
big data nerd about 6 years
@JeffreyBlattman If you are using it only once in your application, it's probably not such a smart decision, but if you use it more frequently, along with other things from the commons-io package, then it might be a smart decision again. It also depends on the application you are writing. If it's a mobile or desktop app, you might think twice about bloating the memory footprint with additional libraries. If it's a server application running on a 64 GB RAM machine, then just ignore this 10 MB: memory is cheap nowadays, and whether the basic footprint is 1.5% or 2% of your total memory doesn't matter.
-
Shihe Zhang almost 6 years
The link is dead.
-
Jon Chase almost 6 years
This answer is dangerous, as Scanner will swallow IOExceptions produced by the underlying stream and you will not get the full content of the resource. In fact, you must call scanner.ioException() to see if there was an exception. From Scanner's docs: "If an invocation of the underlying readable's Readable.read(java.nio.CharBuffer) method throws an IOException then the scanner assumes that the end of the input has been reached. The most recent IOException thrown by the underlying readable can be retrieved via the ioException() method." This has bitten us before.
-
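A sketch of how the Scanner approach could surface such a swallowed exception rather than silently returning truncated content. The class name CheckedScannerReader is hypothetical, and rethrowing the stored exception is just one possible policy.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Scanner;

class CheckedScannerReader {

    // Reads the whole stream with the \A idiom, then rethrows any IOException
    // that Scanner caught internally while reading.
    static String readAll(InputStream in) throws IOException {
        try (Scanner scanner = new Scanner(in, "UTF-8")) {
            scanner.useDelimiter("\\A");
            String text = scanner.hasNext() ? scanner.next() : "";
            IOException swallowed = scanner.ioException();
            if (swallowed != null) {
                // The text read so far is probably incomplete, so fail loudly.
                throw swallowed;
            }
            return text;
        }
    }
}
```

The check has to happen after next() has run, because that is when Scanner hits the failing read and records the exception.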
Brad Parks over 5 years
This works with redirects too, for what it's worth ;-)
-
Crystalzord over 5 years
Note: According to the documentation, the try-with-resources statement ensures that each resource is closed at the end of the statement.
-
rjh almost 4 years
Thanks, this was exactly what I was looking for. It can also be used with getClass().getResourceAsStream(...) to open text files inside the jar.
-
gouessej almost 4 years
I use this source code in a CORS proxy; URLConnection lets me get the content encoding, which is helpful. @OrtomalaLokni I have a similar problem when I try to download a web page, whereas it works when it points to a file available online (an RSS file, for example). Thank you for the suggestion. I probably won't use this library, but it might be a good source of inspiration to solve my problem, as it's open source.
-
Bostone over 3 years
Nice, but if you need to add a header this will not do.
-
Sean Reilly over 3 years
@Bostone True, but the same thing is true for the original Groovy example in the question.
-
Gael over 3 years
I liked that solution... until I realised it doesn't follow redirection :(
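If you need both custom headers and redirect handling, one option that none of the answers above use is the java.net.http.HttpClient added in Java 11. The following is a sketch under that assumption; the class name HttpClientReader, its method names, and the timeout values are invented for illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

class HttpClientReader {

    // Client that follows redirects (except from HTTPS down to HTTP).
    static HttpClient newClient() {
        return HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .connectTimeout(Duration.ofSeconds(5))
                .build();
    }

    // GET request carrying one caller-supplied header.
    static HttpRequest newRequest(String url, String headerName, String headerValue) {
        return HttpRequest.newBuilder(URI.create(url))
                .header(headerName, headerValue)
                .timeout(Duration.ofSeconds(25))
                .GET()
                .build();
    }

    // Sends the request and returns the response body as a string.
    static String read(String url, String headerName, String headerValue)
            throws IOException, InterruptedException {
        HttpResponse<String> response = newClient().send(
                newRequest(url, headerName, headerValue),
                HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

A call such as HttpClientReader.read("http://www.example.com/", "Accept", "text/html") returns the body whatever the status code is, so you may want to check response.statusCode() in real use.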
-
Daniel Henao about 3 years
In terms of performance, is this the best option? Or which one do you think it is?