Batch script to fetch an HTML page and parse its content (without wget, curl, or other external apps)

I've only ever used wget to fetch web content from a Windows batch script. Using an XHR via JScript was a fantastic idea!

But the script you're trying to plunder appears to be intended for checking whether a web server is responding, not for fetching content.

With some modifications, you can use it to fetch a web page and do whatever processing you need.

@if (@a==@b) @end /*

:: fetch.bat <url>
:: fetch a web page

@echo off
setlocal
if "%~1"=="" goto usage
echo "%~1" | findstr /i "https*://" >NUL || goto usage

set "URL=%~1"
for /f "delims=" %%I in ('cscript /nologo /e:jscript "%~f0" "%URL%"') do (
    rem process the HTML line-by-line
    echo(%%I
)
goto :EOF

:usage
echo Usage: %~nx0 URL
echo     for example: %~nx0 http://www.google.com/
echo;
echo The URL must be fully qualified, including the http:// or https://
goto :EOF

JScript */
var x=new ActiveXObject("Microsoft.XMLHTTP");
x.open("GET",WSH.Arguments(0),true);
x.setRequestHeader('User-Agent','XMLHTTP/1.0');
x.send('');
while (x.readyState!=4) {WSH.Sleep(50)};
WSH.Echo(x.responseText);
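
If only a couple of values are needed rather than the whole page, one option is to do the extraction in the JScript half and echo just the matches, so the `for /f` loop has less to parse. The following is a minimal sketch, assuming the values sit in markup such as `<span id="version">1.2.3</span>`. The `extractById` helper and the `version` id are hypothetical and not part of the original script, and a regular expression like this is not a real HTML parser.

```javascript
// Hypothetical helper: return the trimmed text inside the first element whose
// id attribute equals `id`, or null when no such element is found.
// Uses only ES3 features so it should also run under Windows Script Host JScript.
function extractById(html, id) {
    // Escape regex metacharacters in the id before building the pattern.
    var safeId = id.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    var re = new RegExp(
        '<([a-z][a-z0-9]*)[^>]*\\sid=["\']' + safeId + '["\'][^>]*>([\\s\\S]*?)</\\1>',
        'i'
    );
    var m = re.exec(html);
    // JScript has no String.prototype.trim, so trim with a regex instead.
    return m ? m[2].replace(/^\s+|\s+$/g, '') : null;
}

// In fetch.bat, the last line of the JScript section could then become, e.g.:
//   WSH.Echo(extractById(x.responseText, 'version'));
```

With that change, each `%%I` in the batch loop would hold an extracted value instead of a raw line of HTML.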

Author: peet

Updated on September 16, 2022

Comments

  • peet
    peet over 1 year

    I need to work with Windows cmd functionality only. I need two variables/strings from a website to use in the batch script to validate actions. To complicate matters, the website also requires authentication.

    I found this somewhere:

    @set @x=0 /*
    :: ChkHTTP.cmd
    @echo off
    setlocal
    set "URL=http://www.google.com"
    cscript /nologo /e:jscript "%~f0" %URL% | find "200" > nul
    if %ErrorLevel% EQU 0 (
    echo Web server ok % Put your code here %
    ) else (
    echo Web server error reported
    )
    goto :EOF
    
    JScript */
    var x=new ActiveXObject("Microsoft.XMLHTTP");
    x.open("GET",WSH.Arguments(0));x.send();
    while (x.ReadyState!=4) {WSH.Sleep(50)};
    WSH.Echo(x.status)
    

    But I'm not sure whether it's possible to get the site content this way instead of just the status code, and I also don't know how to add website authentication to it.

    The above code does not work correctly, as it always reports an error because of the pipe, but it seemed closest to what I need for parsing the content.

    • BDM
      BDM about 11 years
      Why on earth...?
    • David Ruhmann
      David Ruhmann about 11 years
      Do note that the script you listed uses JScript. Are you able to use any other scripting languages besides just Batch (Powershell, JScript, VBScript, Etc...)? Also +1 to Prof Pickle
  • peet
    peet about 11 years
    Very cool, rojo. Now I can find the string, strip it, and use it as a variable, but I need to authenticate to fetch the page. The site is protected with basic .htaccess authentication. Is it possible to implement this in the JScript part, please?
  • rojo
    rojo about 11 years
    The easiest way would be to pass the auth info via the URL, like http://username:password@host/etc. I'm not positive that will work. If it doesn't, then I can add some code for x.setRequestHeader('Authorization',etc). Or apparently the x.open method also supports two optional arguments to supply auth info. Let me know if you need me to pursue this.
  • peet
    peet about 11 years
    You're great. I had almost forgotten about this easy way of authenticating, as IE and, I guess, most other browsers don't support it anymore, but with the script it worked well for me. The only other interesting option would be whether one of the other methods allows using a hash instead of a clear-text password. Anyway, great work, thanks a lot, man.
  • peet
    peet about 11 years
    Hello rojo, I tried to parse the output but can't see the forest for the trees. I asked for help here: [link]stackoverflow.com/questions/15493297/parse-batch-line-by-line. Maybe you could show me how to parse the returned lines correctly, please?
  • peet
    peet about 11 years
    Could you take a look, please?
  • rojo
    rojo about 11 years
    @peet - Jeez, settle down. :) OK, as you wish. Incidentally, if you want to put a link in a comment, do [text to display](url of link) with no space between, where the text is enclosed in brackets and the URL in parentheses. To link as you wished in your previous comment, you would type [here I asked for help](http://stackoverflow.com/questions/15493297/).
  • peet
    peet about 11 years
    You are totally right. I sometimes struggle with the syntax here and will change my habits. I still haven't gotten markup to work inside code blocks the way the markup help pages show, or the way backticks work in comments. Could you explain what the first line, @a==@b, does? In the original code this was x, and since x is used in the JScript I can imagine that making some sense, but your code doesn't use a or b.
  • rojo
    rojo about 11 years
    @peet - The if (@a==@b) @end line is a valid if statement in both Windows batch language and JScript. It's valid, but it's intentionally false. The interesting bit of that line is the /* at the end. That begins a multiline comment in JScript, so JScript ignores everything after /* until it encounters a */. And since, in fact, @a does not equal @b, the Windows cmd interpreter does not bother trying to execute @end /*, but happily continues to process the next lines which JScript ignores as comments.
  • peet
    peet about 11 years
    Now I've got it, cool. Thank you very much for explaining. It's working now, except that I can't manage to pass the results back to the calling batch file. I tried different ways (except returning an errorlevel) but I'm missing something. Maybe you could shed some light on this?
  • JaseC
    JaseC about 10 years
    This is awesome. So much better than wget.
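
The comments above mention two alternatives to putting credentials in the URL: sending an Authorization header, or passing the user name and password to x.open. Below is a minimal sketch of the header approach, assuming HTTP Basic authentication and ASCII-only credentials. The `base64Encode` and `basicAuthHeader` helpers are hypothetical names, hand-rolled here because Windows Script Host JScript has no built-in `btoa`. Note that Base64 is a reversible encoding, not a hash, so the password is still effectively sent in clear text unless the connection uses https.

```javascript
// Hypothetical helper: Base64-encode a string of single-byte (ASCII) characters.
// Written in ES3 so it should also run under Windows Script Host JScript.
function base64Encode(s) {
    var chars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
    var out = '';
    for (var i = 0; i < s.length; i += 3) {
        var has2 = i + 1 < s.length;
        var has3 = i + 2 < s.length;
        var b1 = s.charCodeAt(i) & 255;
        var b2 = has2 ? s.charCodeAt(i + 1) & 255 : 0;
        var b3 = has3 ? s.charCodeAt(i + 2) & 255 : 0;
        // Split three 8-bit bytes into four 6-bit groups, padding with '='.
        out += chars.charAt(b1 >> 2);
        out += chars.charAt(((b1 & 3) << 4) | (b2 >> 4));
        out += has2 ? chars.charAt(((b2 & 15) << 2) | (b3 >> 6)) : '=';
        out += has3 ? chars.charAt(b3 & 63) : '=';
    }
    return out;
}

// Hypothetical helper: build the value of a Basic Authorization header.
function basicAuthHeader(user, password) {
    return 'Basic ' + base64Encode(user + ':' + password);
}

// In the JScript section of fetch.bat this could be used as follows
// ('user' and 'secret' are placeholders):
//   x.open("GET", WSH.Arguments(0), true);
//   x.setRequestHeader('Authorization', basicAuthHeader('user', 'secret'));
// Alternatively, open() accepts optional user and password arguments:
//   x.open("GET", WSH.Arguments(0), true, 'user', 'secret');
```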