Using JSoup to save the contents of this url: http://www.aw20.co.uk/images/logo.png to a file

Solution 1

You can use Jsoup to fetch any URL and get the data as bytes, if you don't want to parse it as HTML. E.g.:

byte[] bytes = Jsoup.connect(imgUrl).ignoreContentType(true).execute().bodyAsBytes();

ignoreContentType(true) is set because otherwise Jsoup throws an exception (UnsupportedMimeTypeException) saying the content type cannot be parsed as HTML. Ignoring the content type is fine in this case because we read the response body with bodyAsBytes() rather than parsing it.

Check the Jsoup Connection API for more details.
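
The line above stops at the byte array, while the question also asks for the bytes to end up in a file. That last step needs only the JDK. The following is a minimal sketch, not part of the original answer: the class name SaveBytes, the method saveBytes, and the placeholder data are all illustrative assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SaveBytes {

    // Hypothetical helper: writes the downloaded bytes to 'target' unchanged,
    // creating any missing parent directories first.
    public static Path saveBytes(byte[] bytes, Path target) throws IOException {
        Path parent = target.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        // Files.write creates the file, or truncates and overwrites an existing one.
        return Files.write(target, bytes);
    }

    public static void main(String[] args) throws IOException {
        // In real use, 'bytes' would come from the Jsoup call shown above:
        // byte[] bytes = Jsoup.connect(imgUrl).ignoreContentType(true).execute().bodyAsBytes();
        byte[] bytes = {(byte) 0x89, 'P', 'N', 'G'}; // placeholder data for the sketch
        Path target = Files.createTempFile("logo", ".png");
        Path saved = saveBytes(bytes, target);
        System.out.println("Wrote " + Files.size(saved) + " bytes to " + saved);
    }
}
```

Writing the raw bytes keeps the file identical to what the server sent, with no decoding or re-encoding of the image.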

Solution 2

Jsoup isn't designed for downloading arbitrary content from a URL.

Since you are able to use a third-party library, you can try Apache Commons IO, which downloads the content of a given URL to a file using:

FileUtils.copyURLToFile(URL source, File destination);

It takes only one line.

Solution 3

This method is not reliable, so please be careful when using it. Some binary files downloaded this way may come out corrupted.

byte[] bytes = Jsoup.connect(imgUrl).ignoreContentType(true).execute().bodyAsBytes();
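
One documented way this call can return incomplete data is Jsoup's response size cap. By default, execute() reads only up to a maximum body size and truncates anything beyond it; maxBodySize(0) removes the cap. Whether that is the failure this answer ran into is an assumption. A sketch, assuming Jsoup is on the classpath; the class name JsoupBinaryDownload and the helpers binaryConnection and download are hypothetical:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import org.jsoup.Connection;
import org.jsoup.Jsoup;

public class JsoupBinaryDownload {

    // Hypothetical helper: configures a connection for binary content.
    // Building the connection does not touch the network; execute() does.
    public static Connection binaryConnection(String url) {
        return Jsoup.connect(url)
                .ignoreContentType(true) // accept non-HTML content types such as image/png
                .maxBodySize(0);         // 0 = no size limit, so large bodies are not truncated
    }

    // Hypothetical helper: downloads 'url' and writes the raw bytes to 'target'.
    public static void download(String url, Path target) throws IOException {
        byte[] bytes = binaryConnection(url).execute().bodyAsBytes();
        Files.write(target, bytes);
    }
}
```

A call such as JsoupBinaryDownload.download(imgUrl, Paths.get("logo.png")) would then save the image. With the limit removed, the whole body is held in memory, so this suits images better than very large files.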

Solution 4

You can use these methods, or parts of them, to solve your problem. NOTE: IMAGE_HOME is an absolute path, e.g. /home/yourname/foldername.

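import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import javax.imageio.ImageIO;
import org.jsoup.Jsoup;

// Downloads the image at imageUrl and stores it as IMAGE_HOME/relativePath/fileName.
// Returns the target path, or null if the download itself fails.
public static String storeImageIntoFS(String imageUrl, String fileName, String relativePath) {
    String imagePath = null;
    try {
        byte[] bytes = Jsoup.connect(imageUrl).ignoreContentType(true).execute().bodyAsBytes();
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        String rootTargetDirectory = IMAGE_HOME + "/" + relativePath;
        imagePath = rootTargetDirectory + "/" + fileName;
        saveByteBufferImage(buffer, rootTargetDirectory, fileName);
    } catch (IOException e) {
        e.printStackTrace();
    }
    return imagePath;
}

public static void saveByteBufferImage(ByteBuffer imageDataBytes, String rootTargetDirectory, String savedFileName) {
    String uploadInputFile = rootTargetDirectory + "/" + savedFileName;

    // Create the target directory if it does not exist yet.
    File rootTargetDir = new File(rootTargetDirectory);
    if (!rootTargetDir.exists()) {
        boolean created = rootTargetDir.mkdirs();
        if (!created) {
            System.out.println("Error while creating directory for location- " + rootTargetDirectory);
        }
    }
    // The file extension doubles as the ImageIO format name, e.g. "png".
    String[] fileNameParts = savedFileName.split("\\.");
    String format = fileNameParts[fileNameParts.length - 1];

    File file = new File(uploadInputFile);

    // Decode the bytes and re-encode them in the target format.
    try (InputStream in = new ByteArrayInputStream(imageDataBytes.array())) {
        BufferedImage bufferedImage = ImageIO.read(in);
        if (bufferedImage == null) {
            // ImageIO.read returns null when no registered reader can decode the data.
            System.out.println("Could not decode image data for " + uploadInputFile);
            return;
        }
        ImageIO.write(bufferedImage, format, file);
    } catch (IOException e) {
        e.printStackTrace();
    }
}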

Author by

user1644544

Updated on June 14, 2022

Comments

  • user1644544
    user1644544 almost 2 years

    I am trying to use JSoup to get the contents of this url http://www.aw20.co.uk/images/logo.png, which is the image logo.png, and save it to a file. So far I have used JSoup to connect to http://www.aw20.co.uk and get a Document. I then found the absolute url for the image I am looking for, but now I am not sure how to use it to get the actual image. So I was hoping someone could point me in the right direction? Also, is there any way I could use Jsoup.connect("http://www.aw20.co.uk/images/logo.png").get(); to get the image?

    import java.io.IOException;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;
    
    
    public class JGet2 {
    
    public static void main(String[] args) {
    
        try {
            Document doc = Jsoup.connect("http://www.aw20.co.uk").get();
    
            Elements img = doc.getElementsByTag("img");
    
            for (Element element : img) {
                String src = element.absUrl("src");
    
                System.out.println("Image Found!");
                System.out.println("src attribute is: " + src);
                if (src.contains("logo.png") == true) {
                    System.out.println("Success");     
                }
                getImages(src);
            }
        } 
    
        catch (IOException e) {
            e.printStackTrace();
        }
    }
    
    private static void getImages(String src) throws IOException {

        // Strip a trailing slash, if any, so the last path segment is the file name.
        if (src.endsWith("/")) {
            src = src.substring(0, src.length() - 1);
        }

        // Everything after the last "/" is the file name, e.g. "logo.png".
        int indexName = src.lastIndexOf("/");
        String name = src.substring(indexName + 1);

        System.out.println(name);
    }
    }
    
  • user1644544
    user1644544 over 11 years
    I am not allowed to use the URL class to do this, which is why I was trying to use JSoup in the first place.
  • Hovercraft Full Of Eels
    Hovercraft Full Of Eels over 11 years
    @user1644544: regarding, "I am not allowed to use the URL class to do this" -- what is the rationale for this crazy restriction? Why can't you use the most basic class to easily allow access to internet resources?
  • Lion
    Lion about 9 years
    As noted above, some binary files downloaded via this method might be corrupted.