How to set the Referer header before loading a page with Ruby mechanize?


Solution 1

The docs say:

get(uri, parameters = [], referer = nil, headers = {}) { |page| ... }

So, for example, once the agent has already loaded a page (agent.page is nil before the first request):

agent.get 'http://www.google.com/', [], agent.page.uri, {'foo' => 'bar'}

Alternatively, you can set custom headers on the agent itself, so that they are sent with every request:

agent.request_headers = {'foo' => 'bar'}
agent.get url
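To make the argument order concrete, here is a minimal sketch that wraps the positional call. The get_with_referer helper and the RecordingAgent stand-in are hypothetical names made up for illustration; the stand-in only mirrors the get signature quoted from the docs so the example runs without a network connection or the gem.

```ruby
# Hypothetical helper: works with any object that exposes the
# get(uri, parameters, referer, headers) signature quoted above.
def get_with_referer(agent, url, referer, headers = {})
  # The empty array fills the parameters slot so that referer lands third.
  agent.get(url, [], referer, headers)
end

# Stand-in agent (not part of Mechanize) that records its arguments,
# so the sketch can be run offline.
class RecordingAgent
  attr_reader :calls

  def initialize
    @calls = []
  end

  def get(uri, parameters = [], referer = nil, headers = {})
    @calls << { uri: uri, parameters: parameters, referer: referer, headers: headers }
  end
end

agent = RecordingAgent.new
get_with_referer(agent, 'http://www.you.com/index_login/', 'http://www.you.com/')
p agent.calls.first[:referer]
```

With a real agent the call would presumably look the same: get_with_referer(Mechanize.new, url, referer).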

Solution 2

You misunderstood the code you were copying. There was a newline in the example, but it disappeared in the formatting as it wasn't tagged as code. $agent contains nil since you're trying to use it before it has been initialized. You must initialize the object and then use it. Just try this:

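require 'mechanize'
$agent = Mechanize.new  # initialize the agent before attaching hooks
# Mechanize 2.x calls pre-connect hooks with (agent, request), not a single hash
$agent.pre_connect_hooks << lambda { |_agent, request| request['Referer'] = 'https://wwws.mysite.com/cgi-bin/apps/Main' }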
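As a rough offline check of what such a hook does, the sketch below applies a hook to a plain Net::HTTP request. It assumes Mechanize 2.x invokes pre-connect hooks with the agent and a Net::HTTP request object; the one-argument p[:request] form in the question suggests an older calling convention. The constant names REFERER and SET_REFERER are made up for illustration.

```ruby
require 'net/http'
require 'uri'

# Illustrative constant; substitute the page you want to appear to come from.
REFERER = 'https://wwws.mysite.com/cgi-bin/apps/Main'

# Hook in the two-argument shape (agent, request). Assumption: the request
# is a Net::HTTP request, whose headers are set with []=.
SET_REFERER = lambda { |_agent, request| request['Referer'] = REFERER }

# Exercise the hook against a plain Net::HTTP request; nothing is sent.
request = Net::HTTP::Get.new(URI('http://www.google.com/'))
SET_REFERER.call(nil, request)
puts request['Referer']
```

If the assumption holds, the same lambda could be attached with $agent.pre_connect_hooks << SET_REFERER.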

Solution 3

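For this question I noticed people seem to use the call below. Note, however, that with the get signature quoted in Solution 1 the hash lands in the parameters slot, so it may be sent as a query string rather than as a Referer header: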

page = agent.get("http://www.you.com/index_login/", :referer => "http://www.you.com/")

As an aside, now that I have tested this answer, it seems the Referer was not the cause of my actual problem. Every visit to the site I'm scraping requires going through the login sequence again, even seconds after the first logged-in visit, despite the fact that I always load and save the complete cookie jar in YAML format. But that would lead to another question, of course.

Author: Marcos

Updated on June 24, 2022

Comments

  • Marcos
    Marcos almost 2 years

    Is there a straightforward way to set custom headers with Mechanize 2.3?

    I tried a former solution but get:

    $agent = Mechanize.new
    $agent.pre_connect_hooks << lambda { |p|
      p[:request]['Referer'] = 'https://wwws.mysite.com/cgi-bin/apps/Main'
    } 
    
    # ./mech.rb:30:in `<main>': undefined method `pre_connect_hooks' for nil:NilClass (NoMethodError)
    
  • Guilherme Y. Hatano
    Guilherme Y. Hatano over 4 years
    You shouldn't accept your own answer. Furthermore, your answer is not right; you should choose pguardiario's answer.
  • Marcos
    Marcos over 4 years
    As you suggest, I switched answers (now in 2019), although I have neither confirmed nor tested it; that environment is long gone. Back in 2012, my eventual solution was the only one that worked for me. There is a lot to be said about libraries and even APIs changing their behavior and nuances over time, i.e. what used to be "right" nearly eight years ago...