Using JSoup to save the contents of this url: http://www.aw20.co.uk/images/logo.png to a file
Solution 1
You can use Jsoup to fetch any URL and get its data as bytes if you don't want to parse it as HTML, e.g.:
byte[] bytes = Jsoup.connect(imgUrl).ignoreContentType(true).execute().bodyAsBytes();
ignoreContentType(true) is set because otherwise Jsoup will throw an exception that the content is not HTML-parseable. That's OK in this case because we're using bodyAsBytes() to get the raw response body, rather than parsing it.
Check the Jsoup Connection API for more details.
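Put together, a minimal sketch of this approach (assuming the jsoup library is on the classpath; the target file name logo.png is just an example):

```java
import java.io.FileOutputStream;
import java.io.IOException;
import org.jsoup.Jsoup;

public class JsoupImageDownload {
    public static void main(String[] args) throws IOException {
        // Fetch the raw bytes; ignoreContentType(true) skips the HTML check
        byte[] bytes = Jsoup.connect("http://www.aw20.co.uk/images/logo.png")
                .ignoreContentType(true)
                .execute()
                .bodyAsBytes();
        // Write the bytes to disk unchanged
        try (FileOutputStream out = new FileOutputStream("logo.png")) {
            out.write(bytes);
        }
    }
}
```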
Solution 2
Jsoup isn't designed for downloading the content of a URL.
Since you are able to use a third-party library, you can try Apache Commons IO, which downloads the content of a given URL to a file using:
FileUtils.copyURLToFile(URL source, File destination);
It is only one line.
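For reference, a minimal sketch of that one-liner in context (assumes commons-io is on the classpath; the URL and destination file are just examples):

```java
import java.io.File;
import java.net.URL;
import org.apache.commons.io.FileUtils;

public class CommonsIoDownload {
    public static void main(String[] args) throws Exception {
        // Copies the bytes behind the URL straight to the destination file
        FileUtils.copyURLToFile(
                new URL("http://www.aw20.co.uk/images/logo.png"),
                new File("logo.png"));
    }
}
```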
Solution 3
This method does not always work well. Be careful when using it -- as noted in the comments, some binary files downloaded this way may come back corrupted.
byte[] bytes = Jsoup.connect(imgUrl).ignoreContentType(true).execute().bodyAsBytes();
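One possible cause of truncated or corrupted downloads (an assumption on my part, not stated in the original answer) is jsoup's default body-size limit, which silently cuts off responses larger than the cap; calling maxBodySize(0) on the Connection removes the limit:

```java
byte[] bytes = Jsoup.connect(imgUrl)
        .ignoreContentType(true)
        .maxBodySize(0) // 0 = unlimited; the default cap can truncate large binaries
        .execute()
        .bodyAsBytes();
```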
Solution 4
You can use these methods, or parts of them, to solve your problem. NOTE: IMAGE_HOME is an absolute path, e.g. /home/yourname/foldername
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import javax.imageio.ImageIO;
import org.jsoup.Jsoup;

public static String storeImageIntoFS(String imageUrl, String fileName, String relativePath) {
    String imagePath = null;
    try {
        // Fetch the raw image bytes with Jsoup
        byte[] bytes = Jsoup.connect(imageUrl).ignoreContentType(true).execute().bodyAsBytes();
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        String rootTargetDirectory = IMAGE_HOME + "/" + relativePath;
        imagePath = rootTargetDirectory + "/" + fileName;
        saveByteBufferImage(buffer, rootTargetDirectory, fileName);
    } catch (IOException e) {
        e.printStackTrace();
    }
    return imagePath;
}

public static void saveByteBufferImage(ByteBuffer imageDataBytes, String rootTargetDirectory, String savedFileName) {
    String uploadInputFile = rootTargetDirectory + "/" + savedFileName;
    File rootTargetDir = new File(rootTargetDirectory);
    if (!rootTargetDir.exists()) {
        boolean created = rootTargetDir.mkdirs();
        if (!created) {
            System.out.println("Error while creating directory for location - " + rootTargetDirectory);
        }
    }
    // Derive the image format from the file extension (e.g. "png")
    String[] fileNameParts = savedFileName.split("\\.");
    String format = fileNameParts[fileNameParts.length - 1];
    File file = new File(uploadInputFile);
    BufferedImage bufferedImage;
    InputStream in = new ByteArrayInputStream(imageDataBytes.array());
    try {
        // Decode the bytes and re-encode them into the target file
        bufferedImage = ImageIO.read(in);
        ImageIO.write(bufferedImage, format, file);
    } catch (IOException e) {
        e.printStackTrace();
    }
}
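Note that round-tripping through ImageIO decodes and re-encodes the image, which can fail or alter the bytes for formats ImageIO doesn't fully support. A hedged alternative sketch that writes the downloaded bytes verbatim with java.nio (the helper name saveRawBytes is my own, not from the original answer):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class RawImageSaver {
    // Writes the downloaded bytes verbatim: no ImageIO decode/re-encode,
    // so the file on disk is byte-for-byte what the server sent.
    public static Path saveRawBytes(byte[] data, String rootTargetDirectory, String fileName) throws IOException {
        Path target = Paths.get(rootTargetDirectory, fileName);
        Files.createDirectories(target.getParent()); // create missing directories
        return Files.write(target, data);
    }
}
```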
user1644544

Updated on June 14, 2022

Comments
-
user1644544 almost 2 years
I am try to use JSoup to get the contents of this url http://www.aw20.co.uk/images/logo.png, which is the image logo.png, and save it to a file. So far I have used JSoup to connect to http://www.aw20.co.uk and get a Document. I then went and found the absolute url for the image I am looking for, but now am not sure how to this to get the actual image. So I was hoping someone could point me in the right direction to do so? Also is there anyway I could use Jsoup.connect("http://www.aw20.co.uk/images/logo.png").get(); to get the image?
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JGet2 {
    public static void main(String[] args) {
        try {
            Document doc = Jsoup.connect("http://www.aw20.co.uk").get();
            Elements img = doc.getElementsByTag("img");
            for (Element element : img) {
                String src = element.absUrl("src");
                System.out.println("Image Found!");
                System.out.println("src attribute is: " + src);
                if (src.contains("logo.png") == true) {
                    System.out.println("Success");
                }
                getImages(src);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    private static void getImages(String src) throws IOException {
        int indexName = src.lastIndexOf("/");
        if (indexName == src.length()) {
            src = src.substring(1, indexName);
        }
        indexName = src.lastIndexOf("/");
        String name = src.substring(indexName, src.length());
        System.out.println(name);
    }
}
-
user1644544 over 11 years
I am not allowed to use the URL class to do this, which is why I was trying to use JSoup in the first place.
-
Hovercraft Full Of Eels over 11 years
@user1644544: regarding "I am not allowed to use the URL class to do this" -- what is the rationale for this crazy restriction? Why can't you use the most basic class to easily allow access to internet resources?
-
Reg about 10 years
Bad link... For Apache Commons IO: commons.apache.org/proper/commons-io and for FileUtils: commons.apache.org/proper/commons-io/apidocs/org/apache/commons/…
-
Lion about 9 years
As noted above, some binary files downloaded via this method might be corrupted.