Case-insensitive Lua pattern-matching

Solution 1

Try something like this:

function case_insensitive_pattern(pattern)

  -- find an optional '%' (group 1) followed by any character (group 2)
  local p = pattern:gsub("(%%?)(.)", function(percent, letter)

    if percent ~= "" or not letter:match("%a") then
      -- if the '%' matched, or `letter` is not a letter, return "as is"
      return percent .. letter
    else
      -- else, return a case-insensitive character class of the matched letter
      return string.format("[%s%s]", letter:lower(), letter:upper())
    end

  end)

  return p
end

print(case_insensitive_pattern("xyz = %d+ or %% end"))

which prints:

[xX][yY][zZ] = %d+ [oO][rR] %% [eE][nN][dD]
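
The converted pattern can be handed to any of the standard string functions. Here is a minimal sketch of a grep-style filter built on it. The function from above is repeated so the snippet runs on its own; the `grep_lines` helper and the sample lines are made up for illustration and are not part of the original answer.

```lua
-- Same function as above, repeated so this snippet is self-contained.
function case_insensitive_pattern(pattern)
  local p = pattern:gsub("(%%?)(.)", function(percent, letter)
    if percent ~= "" or not letter:match("%a") then
      return percent .. letter
    else
      return string.format("[%s%s]", letter:lower(), letter:upper())
    end
  end)
  return p
end

-- Hypothetical helper: collect the lines that match `pattern`, ignoring case.
function grep_lines(lines, pattern)
  local ci = case_insensitive_pattern(pattern)
  local hits = {}
  for _, line in ipairs(lines) do
    if line:find(ci) then
      hits[#hits + 1] = line
    end
  end
  return hits
end

local sample = { "Error 42: disk full", "warning: low battery", "ERROR 7: timeout" }
for _, line in ipairs(grep_lines(sample, "error %d+")) do
  print(line)   -- prints the first and third sample lines
end
```

Note that `%d` survives untouched, so character classes keep their meaning while the literal letters become two-character sets.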

Solution 2

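Using LPeg's re module (Lua 5.1, LPeg v0.12):

local re = require("re")  -- regex-like pattern module that ships with LPeg
do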
    local p = re.compile([[
        pattern  <- ( {b} / {escaped} / brackets / other)+
        b        <- "%b" . .
        escaped  <- "%" .
        brackets <- { "[" ([^]%]+ / escaped)* "]" }
        other    <- [^[%]+ -> cases
    ]], {
        cases = function(str) return (str:gsub('%a',function(a) return '['..a:lower()..a:upper()..']' end)) end
    })
    local pb = re.compile([[
        pattern  <- ( {b} / {escaped} / brackets / other)+
        b        <- "%b" . .
        escaped  <- "%" .
        brackets <- {: {"["} ({escaped} / bcases)* {"]"} :}
        bcases   <- [^]%]+ -> bcases
        other    <- [^[%]+ -> cases
    ]], {
        cases = function(str) return (str:gsub('%a',function(a) return '['..a:lower()..a:upper()..']' end)) end
        , bcases = function(str) return (str:gsub('%a',function(a) return a:lower()..a:upper() end)) end
    })
    function iPattern(pattern,brackets)
        ('sanity check'):find(pattern)  -- raises an error if the pattern is malformed
        return table.concat({re.match(pattern, brackets and pb or p)})
    end
end

local test                  = '[ab%c%]d%%]+ o%%r %bnm'
print(iPattern(test))       -- [ab%c%]d%%]+ [oO]%%[rR] %bnm
print(iPattern(test,true))  -- [aAbB%c%]dD%%]+ [oO]%%[rR] %bnm
print(('qwe [%D]% O%r n---m asd'):match(iPattern(test, true))) -- %D]% O%r n---m

Pure Lua version:

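Every character of the pattern has to be examined, because Lua patterns have no alternation operator like the (abc|something) of regular expressions. Each letter therefore has to be replaced individually by a two-character set such as [aA], while escapes, %b items and (optionally) the contents of sets are copied through.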

function iPattern(pattern, brackets)
    ('sanity check'):find(pattern)      -- raises an error if the pattern is malformed
    local tmp = {}
    local i=1
    while i <= #pattern do              -- 'while', because a numeric 'for' cannot change its counter
        local char = pattern:sub(i,i)   -- current char
        if char == '%' then
            tmp[#tmp+1] = char          -- add to tmp table
            i=i+1                       -- next char position
            char = pattern:sub(i,i)
            tmp[#tmp+1] = char
            if char == 'b' then         -- '%bxy' - add next 2 chars
                tmp[#tmp+1] = pattern:sub(i+1,i+2)
                i=i+2
            end
        elseif char=='[' then           -- brackets
            tmp[#tmp+1] = char
            i = i+1
            while i <= #pattern do
                char = pattern:sub(i,i)
                if char == '%' then     -- no '%bxy' inside brackets
                    tmp[#tmp+1] = char
                    tmp[#tmp+1] = pattern:sub(i+1,i+1)
                    i = i+1
                elseif char:match("%a") then    -- letter
                    tmp[#tmp+1] = not brackets and char or char:lower()..char:upper()
                else                            -- something else
                    tmp[#tmp+1] = char
                end
                if char==']' then break end -- close bracket
                i = i+1
            end
        elseif char:match("%a") then    -- letter
            tmp[#tmp+1] = '['..char:lower()..char:upper()..']'
        else
            tmp[#tmp+1] = char          -- something else
        end
        i=i+1
    end
    return table.concat(tmp)
end

local test                  = '[ab%c%]d%%]+ o%%r %bnm'
print(iPattern(test))       -- [ab%c%]d%%]+ [oO]%%[rR] %bnm
print(iPattern(test,true))  -- [aAbB%c%]dD%%]+ [oO]%%[rR] %bnm
print(('qwe [%D]% O%r n---m asd'):match(iPattern(test, true))) -- %D]% O%r n---m
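
One caveat when brackets is true: a range inside a set gets widened instead of duplicated. Tracing the loop above, iPattern('[a-c]', true) yields '[aA-cC]', which Lua reads as 'a', the range 'A' to 'c', and 'C'. That range also covers the punctuation that sits between 'Z' and 'a' in ASCII. The sketch below only compares the two sets by hand; the `by_hand` pattern is one possible manual replacement and is not produced by the function.

```lua
local widened = "[aA-cC]"   -- what iPattern('[a-c]', true) produces
local by_hand = "[a-cA-C]"  -- hand-written set with both ranges spelled out

print(("_"):match(widened))  -- _    (unintended: '_' lies between 'A' and 'c')
print(("_"):match(by_hand))  -- nil
print(("B"):match(by_hand))  -- B
```

If your patterns use ranges inside sets, it is probably safer to leave brackets off and write the upper-case range yourself.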

Author: Nubbychadnezzar

Updated on September 15, 2022

Comments

  • Nubbychadnezzar over 1 year ago

    I'm writing a grep utility in Lua for our mobile devices running Windows CE 6/7, but I've run into some issues implementing case-insensitive match patterns. The obvious solution of converting everything to uppercase (or lower) does not work so simply due to the character classes.

    The only other thing I can think of is converting the literals in the pattern itself to uppercase.

    Here's what I have so far:

    function toUpperPattern(instr)
        -- Check first character
        if string.find(instr, "^%l") then
            instr = string.upper(string.sub(instr, 1, 1)) .. string.sub(instr, 2)
        end
        -- Check the rest of the pattern
        while 1 do
            local a, b, str = string.find(instr, "[^%%](%l+)")
            if not a then break end
            if str then
                instr = string.sub(instr, 1, a) .. string.upper(string.sub(instr, a+1, b)) .. string.sub(instr, b + 1)
            end
        end
        return instr
    end
    

    I hate to admit how long it took to get even that far, and I can already see there are going to be problems with things like escaped percent signs ('%%').

    I figured this must be a fairly common issue, but I can't seem to find much on the topic. Are there any easier (or at least complete) ways to do this? I'm starting to go crazy here... Hoping you Lua gurus out there can enlighten me!

  • Mud almost 12 years ago
    Awesome. I was drawing a blank. BTW: you can say pattern:gsub just as you said letter:lower. You could even say ('[%s%s]'):format, but that's a little weirder.
  • Bart Kiers almost 12 years ago
    Yeah, string.format(...) looks more familiar than ('[%s%s]'):format(...), but I like the pattern:gsub(...) better! Thanks.
  • Nubbychadnezzar almost 12 years ago
    Incredible. But one question... How does that not convert something like %%test to %%[tT]est? Is that match skipped because the previous iteration would have matched both '%%'? Maybe my brain is just a little fried today :/
  • Bart Kiers almost 12 years ago
    @Nubbychadnezzar, %%test gets converted to %%[tT][eE][sS][tT]. Once a part of the string has been matched, it will never be part of another match. So %%test has 5 matches: %%, t, e, s and t. %% remains the same, and the letters are converted to [tT], [eE], ...
  • Nubbychadnezzar almost 12 years ago
    @Bart "Once a pattern has matched, it will never be a part of another match." That's exactly what I needed to hear. Failing to grasp that was the primary source of my frustration! Thanks!
  • Stomp over 11 years ago
    Note: This function can't currently handle patterns that contain square brackets. For example, "The answer is [ABCD]".
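
Two of the points raised in the comments can be checked directly. The sketch below first lists the pieces that the gsub pattern "(%%?)(.)" from Solution 1 visits in "%%test", which shows that the second '%' is consumed together with the first and never pairs up with the 't'. It then feeds Stomp's example to the Solution 1 function to show what happens to a set. The function is repeated so the snippet runs on its own; `visited_pieces` is a hypothetical helper written only for this illustration.

```lua
-- Same function as in Solution 1, repeated so this snippet is self-contained.
function case_insensitive_pattern(pattern)
  local p = pattern:gsub("(%%?)(.)", function(percent, letter)
    if percent ~= "" or not letter:match("%a") then
      return percent .. letter
    else
      return string.format("[%s%s]", letter:lower(), letter:upper())
    end
  end)
  return p
end

-- Hypothetical helper: record each piece that gsub visits, left to right.
function visited_pieces(subject)
  local pieces = {}
  subject:gsub("(%%?)(.)", function(percent, ch)
    pieces[#pieces + 1] = percent .. ch
  end)
  return pieces
end

print(table.concat(visited_pieces("%%test"), " | "))
-- %% | t | e | s | t

-- The limitation with sets: each letter inside [...] gets its own set.
local broken = case_insensitive_pattern("The answer is [ABCD]")
print(broken)
-- [tT][hH][eE] [aA][nN][sS][wW][eE][rR] [iI][sS] [[aA][bB][cC][dD]]
print(("The answer is B"):find(broken))
-- nil
```

The converted set no longer means "one of A, B, C or D", which is why Solution 2 treats everything between '[' and ']' separately.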