How to set the Referer header before loading a page with Ruby mechanize?
Solution 1
The docs say:
get(uri, parameters = [], referer = nil, headers = {}) { |page| ... }
so for example:
agent.get 'http://www.google.com/', [], agent.page.uri, {'foo' => 'bar'}
Alternatively, you can set the headers on the agent itself:
agent.request_headers = {'foo' => 'bar'}
agent.get url
Solution 2
You misunderstood the code you were copying. There was a newline in the example, but it disappeared in the formatting because it wasn't tagged as code. $agent contains nil since you're trying to use it before it has been initialized. You must initialize the object first and then use it. Just try this:
$agent = Mechanize.new
$agent.pre_connect_hooks << lambda { |p| p[:request]['Referer'] = 'https://wwws.mysite.com/cgi-bin/apps/Main' }
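To see why the missing newline matters, here is a minimal sketch that reproduces the error without Mechanize installed. `FakeAgent`, `$broken_agent`, `$working_agent`, `hook` and `one_line_error` are hypothetical names introduced only for illustration; `FakeAgent` stands in for Mechanize and models nothing but the constructor and the hook list. When both statements sit on one line, Ruby treats everything after `new` as the constructor's argument, so the global is read while it is still nil.

```ruby
# Hypothetical stand-in for Mechanize so the sketch runs without the gem;
# only the constructor and the hook list are modelled.
class FakeAgent
  attr_reader :pre_connect_hooks

  def initialize(*_ignored)
    @pre_connect_hooks = []
  end
end

# Hook body copied from the answer; it is only stored here, never called.
hook = lambda { |p| p[:request]['Referer'] = 'https://wwws.mysite.com/cgi-bin/apps/Main' }

# One line: Ruby reads everything after `new` as its argument, so
# $broken_agent is still nil when `.pre_connect_hooks` is evaluated.
one_line_error =
  begin
    $broken_agent = FakeAgent.new $broken_agent.pre_connect_hooks << hook
    nil
  rescue NoMethodError => e
    e
  end

# Two lines: the agent exists before its hook list is touched.
$working_agent = FakeAgent.new
$working_agent.pre_connect_hooks << hook

puts one_line_error.class
puts $working_agent.pre_connect_hooks.size
```

The one-line form fails the same way as the `nil:NilClass` error quoted in the question, while the two-line form registers the hook.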
Solution 3
For this question I noticed people seem to use:
page = agent.get("http://www.you.com/index_login/", :referer => "http://www.you.com/")
As an aside, now that I have tested this answer, it seems the Referer was not the cause of my actual problem: every visit to a site I'm scraping requires going through the login sequence pages again, even seconds after the first logged-in visit, although I always load and save the complete cookie jar in YAML format. But that would lead to another question, of course.
Marcos
Updated on June 24, 2022
Comments
-
Marcos almost 2 years
Is there a straightforward way to set custom headers with Mechanize 2.3?
I tried a former solution but get:
$agent = Mechanize.new $agent.pre_connect_hooks << lambda { |p| p[:request]['Referer'] = 'https://wwws.mysite.com/cgi-bin/apps/Main' }
# ./mech.rb:30:in `<main>': undefined method `pre_connect_hooks' for nil:NilClass (NoMethodError)
-
Guilherme Y. Hatano over 4 years
You shouldn't accept your own answer. Furthermore, your answer is not right; you should choose pguardiario's answer.
-
Marcos over 4 years
As you suggest, I switched answers (now in 2019), although I haven't confirmed or tested it; that environment is long gone. Back in 2012, my eventual solution was the only one that worked for me among others, if any. There is a lot to be said about the effects of libraries and even APIs changing their behavior and nuances over time, i.e. what used to be "right" nearly eight years ago...