Download all images of a website


Take a gander at How can I use HTML Agility Pack to retrieve all the images from a website?

That question uses a library called HTML Agility Pack to collect all the <img src="..."> tags on a website.

In case that topic somehow disappears, I'm putting the code up here for anyone who needs it but can't reach it.

// List that will hold every image URL found
public List<string> ImageList = new List<string>();

public void GetAllImages()
{
    // WebClient handles the HTTP request for us
    WebClient client = new WebClient();

    // Download the raw HTML source of the target URL
    string source = client.DownloadString(@"http://www.google.com");

    // Parse the downloaded source with HTML Agility Pack
    HtmlAgilityPack.HtmlDocument document = new HtmlAgilityPack.HtmlDocument();
    document.LoadHtml(source);

    // Walk every <img> tag in the document
    foreach (var img in document.DocumentNode.Descendants("img"))
    {
        // Store each src link found. You can store these however you want.
        var src = img.Attributes["src"];
        if (src != null)
        {
            ImageList.Add(src.Value);
        }
    }
}

Since you said you are rather new to this, note that you can add HTML Agility Pack easily with NuGet. To add it, right-click your project, click Manage NuGet Packages, search the Online tab on the left-hand side for HTML Agility Pack, and click Install. Then reference it in your code with using HtmlAgilityPack;
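If you'd rather use the Package Manager Console instead of the dialog, the same install is a single command (HtmlAgilityPack is the package id as published on NuGet):

```shell
Install-Package HtmlAgilityPack
```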

After all that, you should be fine writing a method that downloads every item contained in the ImageList list built above.
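As a sketch of that last step, here is one way such a download method could look; the target folder parameter and the numeric file-naming scheme are assumptions for illustration, not part of the original answer:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Net;

public class ImageDownloader
{
    public List<string> ImageList = new List<string>();

    // Sketch: download every URL collected in ImageList to a local folder.
    // The folder path and naming scheme below are placeholders.
    public void DownloadAllImages(string targetFolder)
    {
        Directory.CreateDirectory(targetFolder);

        using (WebClient client = new WebClient())
        {
            int index = 0;
            foreach (string url in ImageList)
            {
                // Reuse the extension from the URL (e.g. ".jpg"),
                // saving the files as 0.jpg, 1.png, and so on.
                string extension = Path.GetExtension(url);
                string target = Path.Combine(targetFolder, index + extension);
                client.DownloadFile(url, target);
                index++;
            }
        }
    }
}
```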

Good luck!

EDIT: Added comments explaining what each section does.

EDIT2: Updated snippet to reflect user comment.

Author: Erwin Schrödinger

Updated on June 13, 2022

Comments

  • Erwin Schrödinger, almost 2 years ago

    So I just started learning C# last night. The first project I started was a simple image downloader, which downloads all the images of a website using HtmlElementCollection.

    Here's what I got so far:

        private void dl_Click(object sender, EventArgs e)
        {
            System.Net.WebClient wClient = new System.Net.WebClient();

            // Every <img> element on the page currently rendered in the browser control
            HtmlElementCollection hecImages = Browser.Document.GetElementsByTagName("img");

            for (int i = 0; i < hecImages.Count; i++)
            {
                char[] ftype = new char[4];
                string gtype;

                try
                {
                    // File type: copy the last four characters of the src (e.g. ".jpg")
                    hecImages[i].GetAttribute("src").CopyTo(hecImages[i].GetAttribute("src").Length - 4, ftype, 0, 4);
                    gtype = new string(ftype);

                    // Copy the image to the local path (absPath holds the target folder)
                    wClient.DownloadFile(hecImages[i].GetAttribute("src"), absPath + i.ToString() + gtype);
                }
                catch (System.Net.WebException)
                {
                    expand_Exception_Log();
                    System.Threading.Thread.Sleep(50);
                }
            }
        }
    Basically it renders the page in advance and looks for the images. This works pretty well, but for some reason it only downloads the thumbnails, not the full (high-res) images.

    Additional Sources:

    Documentation on WebClient.DownloadFile: http://msdn.microsoft.com/en-us/library/ez801hhe(v=vs.110).aspx

    The DownloadFile method downloads the resource at the URI specified in the address parameter to a local file.
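    On a side note, slicing the last four characters of the src (as the code above does) breaks on extensions like .jpeg or on URLs with query strings. A safer sketch uses Path.GetExtension; the URL below is only an example:

    ```csharp
    using System;
    using System.IO;

    class ExtensionDemo
    {
        static void Main()
        {
            // Example URL; strip any query string before extracting the extension
            string src = "http://example.com/images/photo.jpeg?size=large";
            int query = src.IndexOf('?');
            string cleaned = query >= 0 ? src.Substring(0, query) : src;

            Console.WriteLine(Path.GetExtension(cleaned)); // ".jpeg"
        }
    }
    ```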

  • Erwin Schrödinger, over 9 years ago
    This basically stores the URI in image_links[] instead of the whole img context 'src=URI ...' in hecImages[], right? I'll see whether this package gives me a better result. Thanks already!
  • Brandon Palmer, over 9 years ago
    Well, using that image_links[] array you can run a simple foreach (string uri in image_links) loop in another function to download all the images.
  • Erwin Schrödinger, over 9 years ago
    My edit got rejected, so here's what I had to do to get it to work: (1) document.LoadHtml instead of document.Load, as the latter can't resolve the path (ArgumentException). (2) DocumentElement no longer exists in the current version of HAP; you have to use DocumentNode instead. (3) link["src"] doesn't work, because []-indexing is not allowed on an expression of type HAP.HtmlNode; you have to go through .Attributes: link.Attributes["src"]. To get a string, you call link.Attributes["src"].Value.
  • mrogunlana, over 6 years ago
    The above code is broken: the variable "x" is already used in a previous context. Updating the query to .Select(i => i.Attributes["src"]) will make it compile.