CURL + POST + multipart/form-data


Solution 1

Set CURLOPT_POST to true in your options array:

CURLOPT_POST => true

Then fill your POST fields into an array, like this 'setup':

$postfields = array();
$postfields['field1'] = 'value1';
$postfields['field2'] = 'value2';
CURLOPT_POSTFIELDS => $postfields

As the PHP manual says of CURLOPT_POSTFIELDS: "If value is an array, the Content-Type header will be set to multipart/form-data."
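
A minimal sketch of this setup, assuming a hypothetical helper name (build_post_options) and a placeholder URL; neither appears in the original answer. It only builds the option array, so nothing is sent until the array is handed to curl_setopt_array() and curl_exec() is called.

```php
<?php
// Hypothetical helper (not part of the original answer): builds the cURL
// options for a multipart/form-data POST.
function build_post_options(string $url, array $postfields): array
{
    return array(
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_POST           => true,
        // Passing an array (not a string) makes cURL encode the body as
        // multipart/form-data and set the Content-Type header itself.
        CURLOPT_POSTFIELDS     => $postfields,
    );
}

$postfields = array();
$postfields['field1'] = 'value1';
$postfields['field2'] = 'value2';

// 'https://example.com/form' is a placeholder URL.
$options = build_post_options('https://example.com/form', $postfields);

// To send the request:
// $ch = curl_init();
// curl_setopt_array($ch, $options);
// $data = curl_exec($ch);
// curl_close($ch);
```

Note that wrapping $postfields in http_build_query() would turn it into a string, and the body would then be sent as application/x-www-form-urlencoded instead.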

Solution 2

Yes, $post_string needs to be an array.

Also set CURLOPT_POST to true.
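
If the POST data already exists as a URL-encoded string, one way to get the array this answer asks for is PHP's parse_str(). A small sketch follows; the string contents are made-up placeholder values, since the real $post_string in the question is elided as "XXXX".

```php
<?php
// Placeholder data: the real $post_string from the question is not shown.
$post_string = 'field1=value1&field2=value+with+spaces';

// parse_str() URL-decodes the string into an associative array.
$postfields = array();
parse_str($post_string, $postfields);

// $postfields can now be passed as CURLOPT_POSTFIELDS, together with
// CURLOPT_POST => true, so that cURL sends multipart/form-data.
```

Be aware that parse_str() rewrites dots and spaces in field names to underscores, so field names of that kind are better written into the array by hand.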

Author: user3352382

Updated on April 10, 2020

Comments

  • user3352382
    user3352382 about 4 years

    I am trying to scrape a website using PHP, cURL and the POST method, in order to submit a form before scraping the resulting page. The problem I am experiencing is connected with the POST method: no data is submitted to the server, so the scraped webpage doesn't contain what I am looking for.

    I am quite sure the problem is connected with the form type: enctype="multipart/form-data". How can I manage this POST request, considering that the form is multipart/form-data? Do I have to encode the post_string in a special way?

    Here's the code I'm using:

    function curl($url) {

        // POST string
        $post_string = "XXXX";

        $options = array(
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_FOLLOWLOCATION => true,
            CURLOPT_AUTOREFERER => true,
            CURLOPT_CONNECTTIMEOUT => 120,
            CURLOPT_TIMEOUT => 120,
            CURLOPT_MAXREDIRS => 10,
            CURLOPT_USERAGENT => "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1a2pre) Gecko/2008073000 Shredder/3.0a2pre ThunderBrowse/3.2.1.8",
            CURLOPT_URL => $url,
            CURLOPT_CAINFO => dirname(__FILE__)."/cacert.pem",

            CURLOPT_POSTFIELDS => $post_string,
        );

        $ch = curl_init();
        curl_setopt_array($ch, $options);
        $data = curl_exec($ch);
        curl_error($ch);
        curl_close($ch);
        return $data;
    }

    $scraped_page = curl("XXXURLXXX");
    echo $scraped_page;
    

    Thank you!

  • user3352382
    user3352382 about 10 years
    Well, I've added these lines: CURLOPT_POST => TRUE, CURLOPT_POSTFIELDS => http_build_query($postfields), and filled post fields with your setup, but it still doesn't work: Firebug confirms that no POST is executed...
  • KoalaBear
    KoalaBear about 10 years
    Firebug will not show the real POST, because it happens 'inside' the PHP execution on the server, not inside the 'output' of the page that reaches the browser. Add var_dump(curl_getinfo($ch)); to your PHP, right after $data = curl_exec($ch);, and see what that shows.
  • user3352382
    user3352382 about 10 years
    Ah, ok, thank you, but the page which is "echoed" still shows the empty form and no results..
  • KoalaBear
    KoalaBear about 10 years
    Hmm. Some people recommend passing the array directly, without the http_build_query function: CURLOPT_POSTFIELDS => $postfields. And I found out why: http_build_query() returns a URL-encoded string, whereas nl1.php.net/curl_setopt says: "If value is an array, the Content-Type header will be set to multipart/form-data."
  • KoalaBear
    KoalaBear about 10 years
    I'm only saying what PHP will do. If this does not work, and we cannot test this code locally, it's the best thing we can do :) :|
  • user3352382
    user3352382 about 10 years
    I've edited the code as you suggest: now, if I put CURLOPT_POST = true AFTER CURLOPT_POSTFIELDS => $postfields, I receive "Bad Request (Invalid Number)"...while, if I put CURLOPT_POST = true BEFORE CURLOPT_POSTFIELDS => $postfields, nothing special happens (the form page is echoed)...
  • KoalaBear
    KoalaBear about 10 years
    It looks like there is a difference indeed. Maybe try to NOT specify CURLOPT_POST and only setting CURLOPT_POSTFIELDS? Worth a try. stackoverflow.com/questions/7979854/…
  • user3352382
    user3352382 about 10 years
    Without CURLOPT_POST no error messages, but the scraped webpage contains only the form and no results. May I show you the code, so you can test it locally? Here it is (please, do not paste the scraped URL, it's quite confidential): pastebin.com/p4CeZVuQ
  • KoalaBear
    KoalaBear about 10 years
    OK. I see this example redirects (HTTP code 302) because its URL is HTTP and not HTTPS. So I used HTTPS and 'bypassed' the SSL checks to make it work. Check it out, with this one I do get results back: pastebin.com/64NV22fV (the pastebin will expire in 2 weeks). You can remove CURLOPT_HEADER, which I used for debugging, and also the var_dump(curl_getinfo($ch));
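
The debugging steps from this thread can be sketched roughly as follows. The helper names to_https and describe_transfer are hypothetical, and the sample array is made-up data in the shape returned by curl_getinfo(). Requesting the HTTPS URL directly avoids the 302 redirect described above; when cURL follows a 301 or 302 redirect it switches from POST to GET by default, which may be why the form came back empty.

```php
<?php
// Hypothetical helper: rewrite an http:// URL to https:// so the request
// goes straight to the final address instead of being redirected.
function to_https(string $url): string
{
    return preg_replace('#^http://#i', 'https://', $url);
}

// Hypothetical helper: pick the fields of curl_getinfo() that matter here.
function describe_transfer(array $info): string
{
    return sprintf(
        'HTTP %d, %d redirect(s), final URL %s',
        $info['http_code'],
        $info['redirect_count'],
        $info['url']
    );
}

// Made-up sample in the shape of curl_getinfo($ch); in real code use:
//   $data = curl_exec($ch);
//   $info = curl_getinfo($ch);
$info = array(
    'url'            => 'https://example.com/results',
    'http_code'      => 200,
    'redirect_count' => 1,
);

echo describe_transfer($info), "\n";
echo to_https('http://example.com/form'), "\n";
```

A redirect_count above zero on a POST request is a hint that the body may have been dropped along the way. Keeping certificate verification on (for example via CURLOPT_CAINFO, as in the question) is generally a safer choice than bypassing the SSL checks once the HTTPS URL is in use.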