Automate daily csv file download from website button click

16,804

Your button most likely issues a POST request to the server. To find that request:

  1. Open the Network tab in Chrome developer tools.
  2. Navigate to the page and click the button.
  3. Find the request that triggered the file download, right-click it, and copy it as cURL.
  4. Run the copied cURL command in a terminal to confirm that it downloads the file.

Once the cURL command works, you can schedule the download with cron or Task Scheduler, depending on the operating system you are using.
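The replay step can also be scripted instead of shelling out to cURL. The following is a minimal sketch, assuming Node.js 18 or later (for the built-in `fetch`). The endpoint URL, the form field, and the output file name are placeholders, not values from the original page; the real ones should come from the request captured in the Network tab, including any cookies or tokens the site expects.

```javascript
const fs = require("node:fs/promises");

// Hypothetical values: replace with the URL and body from the request
// you copied as cURL in the Network tab.
const EXPORT_URL = "http://www.example.com/export";
const FORM_FIELDS = { format: "csv" };

// Build the options object for a form-encoded POST request.
function buildRequest(fields) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: new URLSearchParams(fields).toString(),
  };
}

// Send the request and write the response body to outPath.
async function downloadCsv(url, fields, outPath) {
  const response = await fetch(url, buildRequest(fields));
  if (!response.ok) {
    throw new Error("Download failed with HTTP status " + response.status);
  }
  await fs.writeFile(outPath, Buffer.from(await response.arrayBuffer()));
  return outPath;
}

// Only touch the network when asked explicitly: node download.js --run
if (process.argv.includes("--run")) {
  downloadCsv(EXPORT_URL, FORM_FIELDS, "export.csv")
    .then((file) => console.log("Saved " + file))
    .catch((err) => {
      console.error(err.message);
      process.exitCode = 1;
    });
}
```

Saved as, say, `download.js`, the script can be scheduled the same way as the cURL command: `node download.js --run`.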

user
Author by

user

Updated on June 04, 2022

Comments

  • user
    user almost 2 years

    I would like to automate the process of visiting a website, clicking a button, and saving the file. The only way to download the file on this site is to click a button. You can't navigate to the file using a url.

    I have been trying to use phantomjs and casperjs to automate this process, but haven't had any success.

    I recently tried brandon's solution from the question "Grab the resource contents in CasperJS or PhantomJS".

    Here is my code for that:

    var fs = require('fs');
    var cache = require('./cache');
    var mimetype = require('./mimetype');
    var casper = require('casper').create();
    
    casper.start('http://www.example.com/page_with_download_button', function() {
    
    });
    
    casper.then(function() {    
         this.click('#download_button');
     });
    
    casper.on('resource.received', function (resource) {
        "use strict";
        // Declare the loop counter; an undeclared `i` throws in strict mode.
        for (var i = 0; i < resource.headers.length; i++) {
            if (resource.headers[i]["name"] == "Content-Type" && resource.headers[i]["value"] == "text/csv; charset-UTF-8;") {
                cache.includeResource(resource);
            }
        }
    });
    
    casper.on('load.finished', function(status) {
        for (var i = 0; i < cache.cachedResources.length; i++) {
            var file = cache.cachedResources[i].cacheFileNoPath;
            // Index with the loop counter `i`; `index` was never defined.
            var ext = mimetype.ext[cache.cachedResources[i].mimetype];
            var finalFile = file.replace("." + cache.cacheExtension, "." + ext);
            fs.write('downloads/' + finalFile, cache.cachedResources[i].getContents(), 'b');
        }
    });
    
    casper.run();
    

    I think the problem could be caused by my cachePath being incorrect in cache.js

    exports.cachePath = 'C:/Users/username/AppData/Local/Ofi Labs/PhantomJS';
    

    Should I be using something in addition to the backslashes to define the path?

    When I try

     casperjs --disk-cache=true export_script.js
    

    nothing is downloaded. After a little debugging, I found that cache.cachedResources is always empty.
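One thing worth ruling out here is the exact string comparison on the Content-Type header, which only matches if the server sends precisely that value with that capitalization. The following is a minimal sketch of a more tolerant check, written in ES5 so it could run under PhantomJS. The helper name `isCsvResponse` is hypothetical, the header shape is assumed to match `resource.headers` in the script above, and this may not be the only reason the cache stays empty.

```javascript
// Return true when a header list (shaped like resource.headers) declares CSV content.
function isCsvResponse(headers) {
  for (var i = 0; i < headers.length; i++) {
    var name = String(headers[i].name).toLowerCase();
    var value = String(headers[i].value).toLowerCase();
    // Compare the media type only, ignoring parameters such as charset.
    if (name === "content-type" && value.split(";")[0].trim() === "text/csv") {
      return true;
    }
  }
  return false;
}

// Example with a hand-written header list (not real server output).
var sample = [{ name: "content-type", value: "text/csv; charset=UTF-8" }];
console.log(isCsvResponse(sample)); // true
```

Inside the `resource.received` handler, the loop could then be replaced by a single `if (isCsvResponse(resource.headers))` test.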

    I would also be open to solutions outside of phantomjs/casperjs.


    UPDATE

    I am no longer trying to accomplish this with CasperJS/PhantomJS. I am using the Chrome extension Tampermonkey, suggested by dandavis. Tampermonkey was extremely easy to figure out: I installed it, navigated to the page with the download link, clicked New Script under Tampermonkey, and added my JavaScript code.

    document.getElementById("download_button").click();
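For reference, a complete userscript built around that one-liner might look like the sketch below. The metadata block uses standard Tampermonkey keys; the script name is made up, the `@match` URL is the example page address used in the batch script, and the helper `clickDownload` is a hypothetical wrapper that avoids an error when the button is missing.

```javascript
// ==UserScript==
// @name         Auto-click download button
// @match        http://www.example.com/page-with-dl-button
// @grant        none
// ==/UserScript==

// Click the button with the given id; return false if it is not on the page.
function clickDownload(doc, buttonId) {
  var button = doc.getElementById(buttonId);
  if (!button) {
    return false;
  }
  button.click();
  return true;
}

// `document` only exists in the browser, so guard the call.
if (typeof document !== "undefined") {
  clickDownload(document, "download_button");
}
```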
    

    Now every time I navigate to the page in my browser, the file is downloaded. I then created a batch script that looks like this:

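    rem Build a YYYY_MM_DD stamp; the offsets assume the system date looks like "Sat 06/04/2022".
    rem A separate variable name keeps the built-in DATE value from being overwritten.
    set stamp=%DATE:~10,4%_%DATE:~4,2%_%DATE:~7,2%
    chrome "http://www.example.com/page-with-dl-button"
    rem Give the browser time to finish the download.
    timeout 10
    move "C:\Users\user\Downloads\export.csv" "C:\path\to\dir\export_%stamp%.csv"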
    

    I set that batch script to run nightly using the Windows Task Scheduler.

    Success!
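The date stamp in the batch script depends on how Windows formats the system date, so it can break under a different regional setting. As a hedged alternative, the rename-and-move step could be done in Node.js, where the stamp is built from numeric date parts. This is a sketch only: the function names `dateStamp` and `archiveExport` are hypothetical, and the paths are the ones from the batch script.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Format a Date as YYYY_MM_DD using local time, independent of system locale.
function dateStamp(date) {
  const pad = (n) => String(n).padStart(2, "0");
  return date.getFullYear() + "_" + pad(date.getMonth() + 1) + "_" + pad(date.getDate());
}

// Move srcFile into destDir as export_<stamp>.csv and return the new path.
function archiveExport(srcFile, destDir, date) {
  const target = path.join(destDir, "export_" + dateStamp(date) + ".csv");
  // Copy, then delete, because a plain rename fails across drives.
  fs.copyFileSync(srcFile, target);
  fs.unlinkSync(srcFile);
  return target;
}

// Paths are taken from the batch script; run with: node archive.js --run
if (process.argv.includes("--run")) {
  console.log(archiveExport("C:\\Users\\user\\Downloads\\export.csv", "C:\\path\\to\\dir", new Date()));
}
```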