Add http(s) to URL if it's not there?

Solution 1

Use a before_validation callback to add the protocol if it is not there:

before_validation :smart_add_url_protocol

protected

def smart_add_url_protocol
  unless url[/\Ahttp:\/\//] || url[/\Ahttps:\/\//]
    self.url = "http://#{url}"
  end
end

Leave your existing validation in place; that way, if they make a typo, they can correct the protocol.
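
As a rough illustration outside Rails, the same check can be written as a plain function. The name with_default_protocol is made up for this sketch, and it assumes that nil and empty values should be left untouched. It uses one case-insensitive regex for both schemes:

```ruby
# Hypothetical helper, not part of the answer above: prepend http:// unless
# the string already starts with http:// or https://.
def with_default_protocol(url)
  return url if url.nil? || url.empty? # leave missing values alone
  url.match?(%r{\Ahttps?://}i) ? url : "http://#{url}"
end

puts with_default_protocol("example.com")
puts with_default_protocol("https://example.com")
```

Inside the model, the before_validation callback would do the same thing to self.url.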

Solution 2

Don't do this with a regex. Use URI.parse to pull the URL apart and then see whether it has a scheme:

require 'uri'

u = URI.parse('/pancakes')
if u.scheme.nil?
  # prepend http:// and try again
elsif %w[http https].include?(u.scheme)
  # you're okay
else
  # you've been given some other kind of
  # URL and might want to complain about it
end

Using the URI library for this also makes it easy to clean up any stray nonsense (such as userinfo) that someone might try to put into a URL.
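
One way to fill in the branches is sketched below. The helper name normalize_web_url is an assumption, as are the choices to return nil for anything that is not http(s) or cannot be parsed, and to strip leading slashes before prepending the scheme (as suggested in the comments):

```ruby
require 'uri'

# Hypothetical helper: returns an http(s) URL string, or nil when the input
# uses another scheme or cannot be parsed at all.
def normalize_web_url(raw)
  text = raw.to_s.strip
  return nil if text.empty?

  uri = URI.parse(text)
  if uri.scheme.nil?
    # No scheme: drop any leading slashes, prepend http:// and parse again.
    uri = URI.parse("http://#{text.sub(%r{\A/+}, '')}")
  elsif !%w[http https].include?(uri.scheme.downcase)
    return nil # some other kind of URL, e.g. ftp: or mailto:
  end

  uri.to_s
rescue URI::InvalidURIError
  nil
end

puts normalize_web_url("www.example.com")
puts normalize_web_url("//www.example.com/a")
```

Returning nil leaves it to the caller (for example a validation) to decide how to complain about the value.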

Solution 3

The accepted answer is quite okay. But if the field (url) is optional, it may raise an error such as undefined method + for nil class. The following should resolve that:

def smart_add_url_protocol
  if self.url && !url_protocol_present?
    self.url = "http://#{self.url}"
  end
end

def url_protocol_present?
  self.url[/\Ahttp:\/\//] || self.url[/\Ahttps:\/\//]
end

Solution 4

Preface, justification and how it should be done

I hate it when people change a model in a before_validation hook. If someday the model has to be persisted with save(validate: false), a filter that was supposed to always run on assigned fields does not get run. Sure, invalid data is usually something you want to avoid, but that option would not exist if nobody used it. Another problem is that these modifications also take place every time you ask a model whether it is valid. That merely asking whether a model is valid can modify the model is unexpected, perhaps even unwanted.

Therefore, if I had to choose a hook, I would go for before_save. However, that won't do for me, since we provide preview views for our models, and the URIs in the preview would be broken because the hook would never get called. So I decided it is best to separate the concept into a module or concern and provide a nice way to apply a "monkey patch" which ensures that changing the field's value always runs through a filter that adds a default protocol if it is missing.

The module

# app/models/helpers/uri_field.rb
module Helpers::URIField
  def ensure_valid_protocol_in_uri(field, default_protocol = "http", protocols_matcher = "https?")
    # Keep the generated writer reachable under a new name.
    alias_method "original_#{field}=", "#{field}="
    define_method "#{field}=" do |new_uri|
      # \A anchors at the start of the string; the group keeps alternations
      # such as "https?|ftp" from splitting away from the "://" part.
      if new_uri.present? && new_uri !~ /\A(?:#{protocols_matcher}):\/\//
        new_uri = "#{default_protocol}://#{new_uri}"
      end
      send("original_#{field}=", new_uri)
    end
  end
end

In your model

extend Helpers::URIField
ensure_valid_protocol_in_uri :url
#Should you wish to default to https or support other protocols e.g. ftp, it is
#easy to extend this solution to cover those cases as well
#e.g. with something like this
#ensure_valid_protocol_in_uri :url, "https", "https?|ftp"
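
To try the same idea without Rails or Mongoid, here is a plain-Ruby sketch. URIFieldSketch and Bookmark are made-up names, attr_accessor stands in for the ORM-generated writer, and a String/empty? check stands in for ActiveSupport's present?:

```ruby
# Plain-Ruby stand-in for the module above (illustrative names only).
module URIFieldSketch
  def ensure_valid_protocol_in_uri(field, default_protocol = "http", protocols_matcher = "https?")
    pattern = /\A(?:#{protocols_matcher}):\/\//
    # Keep the original writer reachable, then wrap it.
    alias_method "original_#{field}=", "#{field}="
    define_method("#{field}=") do |new_uri|
      if new_uri.is_a?(String) && !new_uri.empty? && !new_uri.match?(pattern)
        new_uri = "#{default_protocol}://#{new_uri}"
      end
      send("original_#{field}=", new_uri)
    end
  end
end

class Bookmark
  attr_accessor :url
  extend URIFieldSketch
  ensure_valid_protocol_in_uri :url, "https", "https?|ftp"
end

bookmark = Bookmark.new
bookmark.url = "example.com"
puts bookmark.url
```

Because the filter lives in the writer, it runs on every assignment, whether or not the object is ever validated or saved.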

As a concern

If, for some reason, you would rather use the Rails Concern pattern, it is easy to convert the above module into a concern. It is used in exactly the same way, except that you use include Concerns::URIField:

# app/models/concerns/uri_field.rb
module Concerns::URIField
  extend ActiveSupport::Concern

  included do
    def self.ensure_valid_protocol_in_uri(field, default_protocol = "http", protocols_matcher = "https?")
      # Keep the generated writer reachable under a new name.
      alias_method "original_#{field}=", "#{field}="
      define_method "#{field}=" do |new_uri|
        # \A anchors at the start of the string; the group keeps alternations
        # such as "https?|ftp" from splitting away from the "://" part.
        if new_uri.present? && new_uri !~ /\A(?:#{protocols_matcher}):\/\//
          new_uri = "#{default_protocol}://#{new_uri}"
        end
        send("original_#{field}=", new_uri)
      end
    end
  end
end

P.S. The above approaches were tested with Rails 3 and Mongoid 2.
P.P.S. If you find this method redefinition and aliasing too magical, you could opt not to override the method and use the virtual field pattern instead. Much like password (virtual, mass assignable) and encrypted_password (persisted, not mass assignable), you would have sanitize_url (virtual, mass assignable) and url (persisted, not mass assignable).
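
A minimal plain-Ruby sketch of that virtual field idea follows. Page is a hypothetical class; persistence and mass-assignment protection are not modelled, and blank input is assumed to mean "no URL":

```ruby
# Illustrative only: url stands in for the persisted field,
# sanitize_url for the virtual field a form would submit.
class Page
  attr_reader :url

  # Virtual writer: cleans the input and stores the result in url.
  def sanitize_url=(value)
    value = value.to_s.strip
    value = "http://#{value}" unless value.empty? || value.match?(%r{\Ahttps?://}i)
    @url = value.empty? ? nil : value
  end

  # Virtual reader so a form can show the stored value again.
  def sanitize_url
    url
  end
end

page = Page.new
page.sanitize_url = " example.com "
puts page.url
```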

Solution 5

Based on mu's answer, here's the code I'm using in my model. It runs whenever :link is assigned, without the need for model filters. super is required to call the default attribute writer.

def link=(_link)
  u = URI.parse(_link)

  # Prepend a default scheme only when the parsed URL has none.
  link = u.scheme ? _link : "http://#{_link}"
  super(link)
end
Admin
Author by

Admin

Updated on July 22, 2022

Comments

  • Admin
    Admin almost 2 years

    I'm using this regex in my model to validate a URL submitted by the user. I don't want to force the user to type the http part, but would like to add it myself if it's not there.

    validates :url, :format => { :with => /^((http|https):\/\/)?[a-z0-9]+([-.]{1}[a-z0-9]+).[a-z]{2,5}(:[0-9]{1,5})?(\/.)?$/ix, :message => " is not valid" }
    

    Any idea how I could do that? I have very little experience with validation and regex.

  • d11wtq
    d11wtq over 12 years
    We use Addressable for exactly this. It's a little weird if there's no scheme in the URL, however, since it considers the host to be the path.
  • Tony Beninate
    Tony Beninate about 12 years
    unless self.url[/^http?s:\/\//] wasn't quite working for me. I had to do unless self.url[/^http:\/\//] || self.url[/^https:\/\//]
  • koffeinfrei
    koffeinfrei over 10 years
    This shouldn't be a validation, but merely a before_save hook. The purpose of validations is to invalidate the instance (preventing it from saving), which is not the case here.
  • Douglas F Shearer
    Douglas F Shearer over 10 years
    You're getting confused. This isn't a validation. It's a method that's run before the validations are run.
  • mu is too short
    mu is too short over 10 years
    @d11wtq I just switched to Addressable to get sane and consistent UTF-8 support in URLs.
  • Timo
    Timo about 10 years
    I must advise against this dirty trick. It "works" with simple paths like /pancakes, but why would anybody want to enforce a protocol on paths? However, if we're talking about "web addresses" as normal human beings understand and write them, then URI will not parse them correctly. That is because most people leave the double forward slashes, which indicate the beginning of the authority, out of "web addresses". In fact, I believe many think they belong to the protocol, but they do not. To be continued... (sorry for the super long double post)
  • Timo
    Timo about 10 years
    When "URLs" like these are then parsed by a specification-abiding component like Ruby's URI, the result has no host in the URI object; the whole string is treated as a mere path. This is unlikely to be what is intended, and it gives the false impression that you have a properly parsed URI object at your disposal, but if someone were to modify any of its components the results would be surprising. To properly parse a "web address" as a URI, you should always first ensure that the forward slashes are in place.
  • mu is too short
    mu is too short about 10 years
    @TimoLehto: I don't get your point. Do you have an example where this fails?
  • Timo
    Timo about 10 years
    @muistooshort Well your example lacks some details, but I can see two obvious implementations for the "# prepend http:// and try again" part. The more natural solution u.scheme = "http"; u.to_s == "http:www.example.com" #Ooops, what happened? or you prepend it to the original string and parse it again. u = "http://#{orig_string}". This however fails if anybody gives you a protocol relative URI as then you'll end up with "http:////www.example.com" and after reparsing u.host == nil && u.path == "//www.example.com", which is not right and that's why I consider this quite dangerous.
  • mu is too short
    mu is too short about 10 years
    @TimoLehto: Fair enough, but simply replacing "prepend http:// and try again" with "remove leading slashes, prepend http://, and try again" would handle that. One big advantage of going through URI over simple regex wrangling is that it is easier to strip out almost-always-nefarious things such as userinfo, normalize case, ... And modifying the scheme using URI doesn't do anything useful; the library really shouldn't offer a mutator for that property, but that's a separate issue.
  • Timo
    Timo about 10 years
    @muistooshort Yes, I'm not saying your solution is all horrible or anything. It just looks like a much better solution than it actually is, and that's why I consider it dangerous and deceptive. I don't understand what you mean when you say "modifying scheme doesn't do anything useful". Why shouldn't it offer a mutator for that? It works like a charm so long as you feed it proper URLs: u = URI.parse("http://www.example.com/"); u.scheme = "https"; u.to_s == "https://www.example.com/"; u.path = "/index.html"; u.to_s == "https://www.example.com/index.html"
  • mu is too short
    mu is too short about 10 years
    @TimoLehto: u = URI.parse('/pancakes'); u.scheme = 'http'; u.class vs URI.parse('http://example.com/').class. The design of the URI library conflates scheme and class but changing the scheme cannot change the class.
  • Alter Lagos
    Alter Lagos over 9 years
    I agree with @koffeinfrei in the sense that this should be applied after the field is already validated, not before. For instance, with that solution something like validates :url, :presence => true will never fail, because the field will always have at least the http:// value
  • Nuno Silva
    Nuno Silva over 9 years
    does not work with rails 4 due to attribute methods not being defined until api.rubyonrails.org/classes/ActiveModel/… happens. related: stackoverflow.com/questions/16727976/…
  • Earl Jenkins
    Earl Jenkins about 5 years
    As an aside, I'd like to point out that interpolation is always more performant in Ruby than concatenation. Thus, it should be link = "http://#{_link}".
  • ryaz
    ryaz about 4 years
    \Ahttp(s)?:\/\/ one regexp to catch http and https
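
The parsing behaviour debated in the comments above can be checked with Ruby's standard URI library. This is only a small sketch; the expected values in the comments are for the default parser:

```ruby
require 'uri'

# A bare "web address" has no scheme and no host; everything lands in the path.
bare = URI.parse("www.example.com")
p bare.scheme   # nil
p bare.host     # nil
p bare.path     # "www.example.com"

# Setting the scheme afterwards neither adds "//" nor changes the object's class.
bare.scheme = "http"
p bare.to_s     # "http:www.example.com"
p bare.class    # URI::Generic

# A protocol-relative address does carry a host, so blindly prepending
# "http://" to it would double the slashes.
relative = URI.parse("//www.example.com")
p relative.host # "www.example.com"
```

This is why prepending to the original string (after removing leading slashes) and parsing again is safer than mutating the scheme of an already parsed object.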