Ruby String split with regex

Solution 1

I think this would do it:

a.split(/\.(?=[\w])/)

I don't know how much you know about regex, but the (?=[\w]) is a lookahead that says "only match the dot if the next character is a word character (a letter, digit, or underscore)". A lookahead doesn't consume the text it matches; it just "looks", so that character stays in the next piece. The result is exactly what you're looking for:

> a.split(/\.(?=[\w])/)
 => ["foo", "bar", "size", "split('.')", "last"] 
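
To see what "just looks" means in practice, here is a small sketch comparing the lookahead with a pattern that consumes the following character (the variable names are only illustrative):

```ruby
a = "foo.bar.size.split('.').last"

# Lookahead: the word character is checked but stays in the next piece.
with_lookahead = a.split(/\.(?=\w)/)
# => ["foo", "bar", "size", "split('.')", "last"]

# No lookahead: the character after the dot becomes part of the separator and is lost.
without_lookahead = a.split(/\.\w/)
# => ["foo", "ar", "ize", "plit('.')", "ast"]

# The lookahead still splits inside a string argument that starts with a dot.
dot_prefixed_arg = "foo.bar.size.split('.bar').last".split(/\.(?=\w)/)
# => ["foo", "bar", "size", "split('", "bar')", "last"]
```

The last case shows that the pattern only inspects the character after the dot; it has no notion of being inside quotes.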

Solution 2

I'm afraid that regular expressions won't take you very far. Consider, for example, the following expressions (which are also valid Ruby):

"(foo.bar.size.split( '.' )).last"
"(foo.bar.size.split '.').last"
"(foo.bar.size.split '( . ) . .(). .').last"

The problem is that the list of calls is actually a tree of calls. The easiest solution is probably to use a Ruby parser and transform the parse tree according to your needs. In this example we descend through the call tree from the last call to the first, gathering the calls into a list:

# gem install ruby_parser
# gem install awesome_print
require 'ruby_parser'
require 'ap'

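# Assumes every node is s(:call, receiver, :name, s(:arglist, ...)), the layout
# the output below reflects; other ruby_parser releases may place the arguments
# directly in the :call node.
def calls_as_list(code)
    t = RubyParser.new.parse(code)
    calls = []

    while t
        # gather arguments if present
        args = nil
        if t[3] && t[3][0] == :arglist
            args = t[3][1..-1].to_a
        end
        # append the method name and its arguments to our list
        calls << [t[2].to_s, args]
        # descend to the receiver, i.e. the previous call in the chain
        t = t[1]
    end

    # calls were collected from last to first
    calls.reverse
end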

p calls_as_list "foo.bar.size.split('.').last"
#=> [["foo", []], ["bar", []], ["size", []], ["split", [[:str, "."]]], ["last", []]]
p calls_as_list "puts 3, 4"
#=> [["puts", [[:lit, 3], [:lit, 4]]]]

And to show the parse tree of any input:

ap RubyParser.new.parse("puts 3, 4")

Solution 3

a = "foo.bar.size.split('.').last"
p a.split(/(?<!')\.(?!')/)

#=> ["foo", "bar", "size", "split('.')", "last"]

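You are looking for lookahead and lookbehind assertions: http://www.regular-expressions.info/lookaround.html. Note that lookbehind requires Ruby 1.9 or later; the regex engine in Ruby 1.8 supports lookahead only.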
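
As a quick check of what the two negative assertions do and do not cover, here is a small sketch (variable names are illustrative, and it assumes a Ruby version with lookbehind support):

```ruby
pattern = /(?<!')\.(?!')/

# A dot that touches a single quote on either side is skipped.
quoted_dot_only = "foo.bar.size.split('.').last".split(pattern)
# => ["foo", "bar", "size", "split('.')", "last"]

# A dot in the middle of a longer string literal touches no quote, so it still splits.
dot_inside_longer_string = "foo.bar.size.split('o.b').last".split(pattern)
# => ["foo", "bar", "size", "split('o", "b')", "last"]
```

The assertions only look at the characters directly next to the dot, so they work for the exact example but not for arbitrary string arguments.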

Solution 4

I don't have a Ruby environment here, so I tried it with Python's re.split().

In : re.split("(?<!')\.(?!')",a)
Out: ['foo', 'bar', 'size', "split('.')", 'last']

The regex above uses both a negative lookbehind and a negative lookahead, so a dot that directly follows or directly precedes a single quote is not treated as a separator.

Of course, for the example you gave, either the lookbehind or the lookahead alone is sufficient. Choose whichever suits your requirements.

Author: Haris Krajina

I am a software and product engineer that is very passionate about building innovative and simple products.

Updated on September 10, 2020

Comments

  • Haris Krajina
    Haris Krajina over 3 years

    This is Ruby 1.8.7, but it should be the same for 1.9.x.

    I am trying to split a string, for example:

    a = "foo.bar.size.split('.').last"
    # trying to split into ["foo", "bar","split('.')","last"]
    

    Basically, I want to split it into the commands it represents. I am trying to do this with a Regexp but am not sure how. My idea was to use this regexp:

    a.split(/[a-z\(\)](\.)[a-z\(\)]/)
    

    Here I am trying to use the group (\.) as the separator, but this does not seem to be a good approach.

    • sawa
      sawa over 11 years
      It is not as easy as you think.
    • iconoclast
      iconoclast over 9 years
      @sawa: you closed a question because you think it's too hard?
    • sawa
      sawa over 9 years
      @iconoclast I don't remember, but not because of the reason you think.
    • iconoclast
      iconoclast over 9 years
      @sawa I see no legitimate reason to close this question. What am I missing?
    • sawa
      sawa over 9 years
      @iconoclast It is not constructive to do such thing. See Matt's answer and comments under Jason Swett's answer. But the reason is not that either.
    • iconoclast
      iconoclast over 9 years
      How does that justify closing the question? How is it constructive to shutdown all attempts to solve difficult problems? The question is clearly not an opinion-based question. The main thing that is opinionated is your claim that this is not a good idea.
    • reducing activity
      reducing activity over 7 years
      @sawa - "It is not constructive to do such thing". Maybe in this particular example. But this is top result for googling "Ruby split with regexp" (see duckduckgo.com/?q=Ruby+split+with+regexp). I see no reason whatsoever to close this question.
  • Haris Krajina
    Haris Krajina over 11 years
    Wow, excellent, and thank you for the info about lookahead. No, I did not know that; it is an excellent thing to learn and seems very useful.
  • sawa
    sawa over 11 years
    This will split a string like "foo.bar.size.split('.bar').last" into ["foo", "bar", "size", "split('", "bar')", "last"].
  • sawa
    sawa over 11 years
    As you may have noticed, this will not work correctly for ["foo", "bar", "size", "split('o.b')", "last"].
  • Jason Swett
    Jason Swett over 11 years
    Good point. Hats off to the person who can figure out how to make this work with any argument inside the split - it's beyond my skill level.
  • Haris Krajina
    Haris Krajina over 11 years
    That is true @sawa. As you assume, this is for metaprogramming purposes, so I will run into problems down the road. I am looking for a way to make it ' aware.
  • Jason Swett
    Jason Swett over 11 years
    Not just ' but probably "! Might want to account for the whole split(...) depending on what your requirements are.
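
One way to make the split aware of both quote styles and of the whole split(...) argument list, without a full parser, is to scan the string character by character and split only on dots that sit outside any quotes and parentheses. The following is a minimal sketch, not a complete solution: split_top_level_dots is a hypothetical helper name, and the code deliberately ignores floats (1.5), ranges (1..3), blocks, brackets, and %-literals.

```ruby
# Hypothetical helper: splits a call chain on dots that are outside
# quotes and outside parentheses.
def split_top_level_dots(source)
  parts = [String.new]
  depth = 0        # current parenthesis nesting level
  quote = nil      # the open quote character, or nil when outside a string
  escaped = false  # true right after a backslash inside a string

  source.each_char do |ch|
    if quote
      # Inside a string literal: copy everything, watch for the closing quote.
      if escaped
        escaped = false
      elsif ch == "\\"
        escaped = true
      elsif ch == quote
        quote = nil
      end
      parts.last << ch
    elsif ch == "'" || ch == '"'
      quote = ch
      parts.last << ch
    elsif ch == "("
      depth += 1
      parts.last << ch
    elsif ch == ")"
      depth -= 1
      parts.last << ch
    elsif ch == "." && depth.zero?
      # A top-level dot ends the current piece and starts a new one.
      parts << String.new
    else
      parts.last << ch
    end
  end

  parts
end

p split_top_level_dots("foo.bar.size.split('o.b').last")
#=> ["foo", "bar", "size", "split('o.b')", "last"]
p split_top_level_dots('foo.bar.size.split(".").last')
#=> ["foo", "bar", "size", "split(\".\")", "last"]
```

A parenthesized receiver such as "(foo.bar.size.split '.').last" comes back as a single piece followed by "last", which is the tree-of-calls problem from Solution 2 showing up again; a real parser is still the safer choice for arbitrary Ruby.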