Using JSoup to save the contents of this url: http://www.aw20.co.uk/images/logo.png to a file

java image jsoup


Solution 1

You can use Jsoup to fetch any URL and get the data as bytes, if you don't want to parse it as HTML. E.g.:

byte[] bytes = Jsoup.connect(imgUrl).ignoreContentType(true).execute().bodyAsBytes();

ignoreContentType(true) is needed because otherwise Jsoup throws an exception (UnsupportedMimeTypeException) saying the content type, here image/png, is not one it can parse. Ignoring the content type is fine in this case because bodyAsBytes() reads the raw response body instead of parsing it.

Check the Jsoup Connection API for more details.
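Solution 1 stops at the byte array, but the question asks to save the image to a file. A minimal sketch of that last step using only java.nio.file; the names ImageSaver and saveBytes are hypothetical and not part of Jsoup. The bytes are written unchanged, so the file is identical to what the server sent.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class ImageSaver {

    // Hypothetical helper: writes the downloaded bytes to `target` unchanged,
    // creating any missing parent directories first.
    // An existing file at `target` is overwritten.
    static Path saveBytes(byte[] bytes, Path target) throws IOException {
        Path parent = target.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent); // no-op if the directories already exist
        }
        return Files.write(target, bytes); // CREATE, TRUNCATE_EXISTING, WRITE by default
    }
}
```

Combined with the Jsoup call above, usage would look something like ImageSaver.saveBytes(Jsoup.connect(imgUrl).ignoreContentType(true).execute().bodyAsBytes(), Paths.get("images", "logo.png")).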

Solution 2

Jsoup is an HTML parser; it isn't designed for downloading arbitrary content from a URL.

Since you are able to use a third-party library, you can try Apache Commons IO to download the content of a given URL to a file:

FileUtils.copyURLToFile(URL source, File destination);

It is only one line.
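If java.net.URL itself is acceptable, the JDK alone can do the same job without Commons IO: open the URL's stream and copy it into a file. A minimal sketch; UrlDownloader and copyUrlToFile are hypothetical names. Note that it relies on java.net.URL, which the asker says in the comments they are not allowed to use.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class UrlDownloader {

    // Hypothetical helper: streams whatever `source` serves straight into `target`,
    // byte for byte, replacing the file if it already exists.
    // Returns the number of bytes copied.
    static long copyUrlToFile(URL source, Path target) throws IOException {
        Path parent = target.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        try (InputStream in = source.openStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

For the logo in the question this would be called as UrlDownloader.copyUrlToFile(URI.create("http://www.aw20.co.uk/images/logo.png").toURL(), Paths.get("logo.png")).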

Solution 3

This method does not always work well. Be careful when using it.

byte[] bytes = Jsoup.connect(imgUrl).ignoreContentType(true).execute().bodyAsBytes();
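One possible cause of bad downloads with this approach is Jsoup's maximum body size: responses larger than the limit are truncated, and Connection.maxBodySize(0) removes the limit. For PNG files such as logo.png, a cheap sanity check before saving is to look at the format itself: every PNG starts with a fixed 8-byte signature and ends with an empty IEND chunk, so a truncated body fails the trailer check and an HTML error page fails the signature check. A minimal sketch (PngCheck and looksLikeCompletePng are hypothetical names; this is not full PNG validation):

```java
import java.util.Arrays;

final class PngCheck {

    // The fixed 8-byte signature every PNG file begins with.
    private static final byte[] SIGNATURE = {
        (byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A
    };

    // The last 12 bytes of a PNG: a zero-length IEND chunk followed by its CRC.
    private static final byte[] IEND_TRAILER = {
        0, 0, 0, 0, 'I', 'E', 'N', 'D', (byte) 0xAE, 0x42, 0x60, (byte) 0x82
    };

    // True if the bytes start with the PNG signature and end with the IEND chunk.
    // Catches truncation and non-PNG responses; chunk CRCs in between are not checked.
    static boolean looksLikeCompletePng(byte[] bytes) {
        if (bytes == null || bytes.length < SIGNATURE.length + IEND_TRAILER.length) {
            return false;
        }
        byte[] head = Arrays.copyOfRange(bytes, 0, SIGNATURE.length);
        byte[] tail = Arrays.copyOfRange(bytes, bytes.length - IEND_TRAILER.length, bytes.length);
        return Arrays.equals(head, SIGNATURE) && Arrays.equals(tail, IEND_TRAILER);
    }
}
```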

Solution 4

You can use these methods, or parts of them, to solve your problem. Note: IMAGE_HOME is an absolute path, e.g. /home/yourname/foldername.

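import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import javax.imageio.ImageIO;
import org.jsoup.Jsoup;

// The wrapper class name is arbitrary; only the two methods matter.
public class ImageStore {

    // Absolute base directory for saved images (example value, adjust as needed)
    private static final String IMAGE_HOME = "/home/yourname/foldername";

    // Downloads imageUrl and saves it as IMAGE_HOME/relativePath/fileName.
    // Returns the intended target path, or null if the download failed.
    public static String storeImageIntoFS(String imageUrl, String fileName, String relativePath) {
        String imagePath = null;
        try {
            byte[] bytes = Jsoup.connect(imageUrl).ignoreContentType(true).execute().bodyAsBytes();
            ByteBuffer buffer = ByteBuffer.wrap(bytes);
            String rootTargetDirectory = IMAGE_HOME + "/" + relativePath;
            imagePath = rootTargetDirectory + "/" + fileName;
            saveByteBufferImage(buffer, rootTargetDirectory, fileName);
        } catch (IOException e) {
            e.printStackTrace();
        }
        return imagePath;
    }

    // Decodes the image bytes and re-encodes them with ImageIO, using the
    // file extension (e.g. "png") as the output format name.
    public static void saveByteBufferImage(ByteBuffer imageDataBytes, String rootTargetDirectory, String savedFileName) {
        String uploadInputFile = rootTargetDirectory + "/" + savedFileName;

        File rootTargetDir = new File(rootTargetDirectory);
        if (!rootTargetDir.exists() && !rootTargetDir.mkdirs()) {
            System.out.println("Error while creating directory for location- " + rootTargetDirectory);
            return;
        }
        String[] fileNameParts = savedFileName.split("\\.");
        String format = fileNameParts[fileNameParts.length - 1];

        File file = new File(uploadInputFile);

        try (InputStream in = new ByteArrayInputStream(imageDataBytes.array())) {
            // ImageIO.read returns null (instead of throwing) if no reader recognizes the data
            BufferedImage bufferedImage = ImageIO.read(in);
            if (bufferedImage == null) {
                System.out.println("Downloaded data is not a readable image: " + savedFileName);
                return;
            }
            // ImageIO.write returns false if no writer exists for this format
            if (!ImageIO.write(bufferedImage, format, file)) {
                System.out.println("No ImageIO writer for format: " + format);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}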
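Solution 4 picks the ImageIO output format from the file extension, and ImageIO.write returns false, rather than throwing, when no writer is registered for that format name. A sketch of checking this up front, before anything is downloaded; ImageFormats, formatName and canWrite are hypothetical helper names.

```java
import java.util.Locale;
import javax.imageio.ImageIO;

final class ImageFormats {

    // Hypothetical helper: returns the lower-case extension of fileName
    // (e.g. "png" for "logo.png"), or null if the name has no usable extension.
    static String formatName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot <= 0 || dot == fileName.length() - 1) {
            return null;
        }
        return fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    // True if the JDK's ImageIO has at least one writer registered for this format name.
    static boolean canWrite(String format) {
        return format != null && ImageIO.getImageWritersByFormatName(format).hasNext();
    }
}
```

Keep in mind that the ImageIO route decodes and re-encodes the image, so the saved file is generally not byte-for-byte identical to the download; writing the raw bytes returned by bodyAsBytes() directly avoids that, and ImageIO is mainly useful if you actually want to convert the format.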



Author: user1644544

Updated on June 14, 2022

Comments

user1644544 almost 2 years

I am trying to use JSoup to get the contents of this URL, http://www.aw20.co.uk/images/logo.png, which is the image logo.png, and save it to a file. So far I have used JSoup to connect to http://www.aw20.co.uk and get a Document. I then found the absolute URL for the image I am looking for, but I'm not sure how to use it to get the actual image. Could someone point me in the right direction? Also, is there any way I could use Jsoup.connect("http://www.aw20.co.uk/images/logo.png").get(); to get the image?

import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JGet2 {

    public static void main(String[] args) {
        try {
            Document doc = Jsoup.connect("http://www.aw20.co.uk").get();

            // Every <img> element on the page
            Elements img = doc.getElementsByTag("img");

            for (Element element : img) {
                // absUrl resolves a relative src against the page's base URL
                String src = element.absUrl("src");

                System.out.println("Image Found!");
                System.out.println("src attribute is: " + src);
                if (src.contains("logo.png")) {
                    System.out.println("Success");
                }
                getImages(src);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Prints the file name part of the image URL (the text after the last '/')
    private static void getImages(String src) {
        // Drop a trailing slash, if any, before looking for the last segment
        if (src.endsWith("/")) {
            src = src.substring(0, src.length() - 1);
        }

        int indexName = src.lastIndexOf("/");
        String name = src.substring(indexName + 1);

        System.out.println(name);
    }
}
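The getImages method in the question is extracting the file name from the image URL. A sketch of that step built on java.net.URI, which is a different class from java.net.URL and so may fit the restriction mentioned in the comments (that is an assumption); UrlNames and fileNameFromUrl are hypothetical names. Parsing the URL first means a query string or fragment such as ?v=2 does not end up in the file name.

```java
import java.net.URI;

final class UrlNames {

    // Hypothetical helper: returns the last path segment of an image URL,
    // e.g. "logo.png" for "http://www.aw20.co.uk/images/logo.png".
    // URI.getPath() excludes the query and fragment; a single trailing slash is dropped.
    // Returns "" when the URL has no path segment at all.
    static String fileNameFromUrl(String src) {
        String path = URI.create(src).getPath();
        if (path == null || path.isEmpty()) {
            return "";
        }
        if (path.endsWith("/")) {
            path = path.substring(0, path.length() - 1);
        }
        return path.substring(path.lastIndexOf('/') + 1);
    }
}
```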

user1644544 over 11 years

I am not allowed to use the URL class to do this, which is why I was trying to use JSoup in the first place.
Hovercraft Full Of Eels over 11 years

@user1644544: regarding, "I am not allowed to use the URL class to do this" -- what is the rationale for this crazy restriction? Why can't you use the most basic class to easily allow access to internet resources?
Reg about 10 years

Bad link. For Apache Commons IO: commons.apache.org/proper/commons-io, and for FileUtils: commons.apache.org/proper/commons-io/apidocs/org/apache/commons/…
Lion about 9 years

As noted above, some binary files downloaded via this method might be corrupted.