How to implement a SQL-like 'LIKE' operator in Java?

130,208

Solution 1

In regular expressions, .* matches any sequence of characters, including an empty one.

I think the Java syntax would be

"digital".matches(".*ital.*");

And for the single character match just use a single dot.

"digital".matches(".*gi.a.*");

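And to match an actual dot, escape it with a backslash:

\.

Inside a Java string literal the backslash itself has to be doubled, so it is written as "\\.".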
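A small runnable sketch of the three cases above (any run of characters, a single character, and a literal dot). The class name WildcardDemo and the sample string "v1.0" are only illustrative and do not come from the answer.

```java
public class WildcardDemo {
    public static void main(String[] args) {
        // ".*" on both sides behaves like SQL "%ital%"
        System.out.println("digital".matches(".*ital.*")); // true

        // a single "." stands for exactly one character
        System.out.println("digital".matches(".*gi.a.*")); // true

        // "\\." in Java source is the regex \. and matches only a real dot
        System.out.println("v1.0".matches("v1\\.0")); // true
        System.out.println("v1x0".matches("v1\\.0")); // false

        // an unescaped dot would accept any character in that position
        System.out.println("v1x0".matches("v1.0")); // true
    }
}
```

Note that String.matches() has to match the whole input, which is why the leading and trailing .* are needed for a "contains" style check.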

Solution 2

Yes, this could be done with a regular expression. Keep in mind that Java's regular expressions have different syntax from SQL's "like". Instead of "%", you would have ".*", and instead of "?", you would have ".".

What makes it somewhat tricky is that you would also have to escape any characters that Java treats as special. Since you're trying to make this analogous to SQL, I'm guessing that ^$[]{}\ shouldn't appear in the regex string. But you will have to replace "." with "\\." before doing any other replacements. (Edit: Pattern.quote(String) escapes everything by surrounding the string with "\Q" and "\E", which will cause everything in the expression to be treated as a literal (no wildcards at all). So you definitely don't want to use it.)
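To make the Pattern.quote point concrete, here is a minimal sketch. The class QuoteDemo and the helper quoteThenReplace are hypothetical names used only for this illustration; Pattern.quote itself is the standard java.util.regex method.

```java
import java.util.regex.Pattern;

public class QuoteDemo {
    // Hypothetical helper: quote the whole expression first, then try to
    // turn '?' into the regex wildcard '.' afterwards.
    static String quoteThenReplace(String expr) {
        return Pattern.quote(expr).replace("?", ".");
    }

    public static void main(String[] args) {
        // Pattern.quote wraps the text in \Q ... \E
        System.out.println(Pattern.quote("a.b?")); // \Qa.b?\E

        String regex = quoteThenReplace("a.b?");
        System.out.println(regex); // \Qa.b.\E

        // The dot that replaced '?' still sits inside \Q...\E, so it is a
        // literal dot and not a wildcard.
        System.out.println("a.bc".matches(regex)); // false
        System.out.println("a.b.".matches(regex)); // true
    }
}
```

Anything placed between \Q and \E is literal, so wildcards have to be inserted outside the quoted region, not substituted into it.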

Furthermore, as Dave Webb says, you also need to ignore case.

With that in mind, here's a sample of what it might look like:

public static boolean like(String str, String expr) {
    expr = expr.toLowerCase(); // ignoring locale for now
    expr = expr.replace(".", "\\."); // "\\" is escaped to "\" (thanks, Alan M)
    // ... escape any other potentially problematic characters here
    expr = expr.replace("?", ".");
    expr = expr.replace("%", ".*");
    str = str.toLowerCase();
    return str.matches(expr);
}

Solution 3

Regular expressions are the most versatile. However, some LIKE functions can be formed without regular expressions. e.g.

String text = "digital";
text.startsWith("dig"); // like "dig%"
text.endsWith("tal"); // like "%tal"
text.contains("gita"); // like "%gita%"
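The three calls above each cover a pattern with a single leading or trailing %. For patterns with several % signs, such as "%this%string%", one possible regex-free approach is to split on % and look for the pieces in order. The sketch below is an assumption about how that could be done, not part of the original answer: SimpleLike and likePercentOnly are hypothetical names, the match is case-sensitive, and the single-character wildcard and escaping are not supported.

```java
public class SimpleLike {
    // Hypothetical helper: supports only the '%' wildcard.
    static boolean likePercentOnly(String text, String pattern) {
        // limit -1 keeps trailing empty segments, so "dig%" -> ["dig", ""]
        String[] parts = pattern.split("%", -1);
        if (parts.length == 1) {
            return text.equals(pattern); // no wildcard at all
        }
        String first = parts[0];
        String last = parts[parts.length - 1];

        // The first segment must be a prefix of the text.
        if (!text.startsWith(first)) {
            return false;
        }
        int pos = first.length();

        // Middle segments must appear in order, each after the previous one.
        for (int i = 1; i < parts.length - 1; i++) {
            int found = text.indexOf(parts[i], pos);
            if (found < 0) {
                return false;
            }
            pos = found + parts[i].length();
        }

        // The last segment must be a suffix that does not overlap
        // the part of the text already consumed.
        return text.length() - last.length() >= pos && text.endsWith(last);
    }

    public static void main(String[] args) {
        System.out.println(likePercentOnly("digital", "%ital%")); // true
        System.out.println(likePercentOnly("digital", "tal%"));   // false
        System.out.println(likePercentOnly(
            "I like apples but not oranges", "%oranges%apples%")); // false
    }
}
```

Taking the leftmost occurrence of each middle segment is enough here, because it leaves the most room for the segments that follow.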

Solution 4

Every SQL reference I can find says the "any single character" wildcard is the underscore (_), not the question mark (?). That simplifies things a bit, since the underscore is not a regex metacharacter. However, you still can't use Pattern.quote() for the reason given by mmyers. I've got another method here for escaping regexes when I might want to edit them afterward. With that out of the way, the like() method becomes pretty simple:

// requires: import java.util.regex.Pattern;
public static boolean like(final String str, final String expr)
{
  String regex = quotemeta(expr);
  regex = regex.replace("_", ".").replace("%", ".*?");
  Pattern p = Pattern.compile(regex,
      Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
  return p.matcher(str).matches();
}

public static String quotemeta(String s)
{
  if (s == null)
  {
    throw new IllegalArgumentException("String cannot be null");
  }

  int len = s.length();
  if (len == 0)
  {
    return "";
  }

  StringBuilder sb = new StringBuilder(len * 2);
  for (int i = 0; i < len; i++)
  {
    char c = s.charAt(i);
    if ("[](){}.*+?$^|#\\".indexOf(c) != -1)
    {
      sb.append("\\");
    }
    sb.append(c);
  }
  return sb.toString();
}

If you really want to use ? for the wildcard, your best bet would be to remove it from the list of metacharacters in the quotemeta() method. Replacing its escaped form -- replace("\\?", ".") -- wouldn't be safe because there might be backslashes in the original expression.

And that brings us to the real problems: most SQL flavors seem to support character classes in the forms [a-z] and [^j-m] or [!j-m], and they all provide a way to escape wildcard characters. The latter is usually done by means of an ESCAPE keyword, which lets you define a different escape character every time. As you can imagine, this complicates things quite a bit. Converting to a regex is probably still the best option, but parsing the original expression will be much harder--in fact, the first thing you would have to do is formalize the syntax of the LIKE-like expressions themselves.
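As a rough illustration of what parsing the expression could look like, the sketch below walks the LIKE expression one character at a time and honors a caller-supplied escape character. It is only a sketch under stated assumptions: LikeWithEscape, toPattern and flush are hypothetical names, character classes such as [a-z] are not handled, and literal runs are protected with Pattern.quote instead of the quotemeta() method above.

```java
import java.util.regex.Pattern;

public class LikeWithEscape {
    // Hypothetical helper: translate a LIKE expression to a regex.
    // '%' -> ".*", '_' -> ".", and the escape character makes the
    // following character literal. Character classes are NOT supported.
    static Pattern toPattern(String expr, char escape) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (int i = 0; i < expr.length(); i++) {
            char c = expr.charAt(i);
            if (c == escape && i + 1 < expr.length()) {
                literal.append(expr.charAt(++i)); // take next char literally
            } else if (c == '%' || c == '_') {
                flush(literal, regex);
                regex.append(c == '%' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        flush(literal, regex);
        return Pattern.compile(regex.toString(),
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    }

    // Quote the pending literal run so regex metacharacters in it are inert.
    private static void flush(StringBuilder literal, StringBuilder regex) {
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
            literal.setLength(0);
        }
    }

    static boolean like(String str, String expr, char escape) {
        return toPattern(expr, escape).matcher(str).matches();
    }

    public static void main(String[] args) {
        System.out.println(like("100%", "100!%", '!'));    // true
        System.out.println(like("100x", "100!%", '!'));    // false
        System.out.println(like("digital", "%GI_A%", '!')); // true
    }
}
```

Because the wildcards are emitted outside the quoted runs, there is no need to edit an already-escaped string afterwards. An escape character at the very end of the expression is treated as a literal here, which is a design choice of this sketch and may differ from what a given database does.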

Solution 5

To implement SQL's LIKE in Java you don't need regular expressions in simple cases. The common forms can be obtained as:

String text = "apple";
text.startsWith("app"); // like "app%"
text.endsWith("le"); // like "%le"
text.contains("ppl"); // like "%ppl%"
Author by

Chris

Updated on July 09, 2022

Comments

  • Chris
    Chris almost 2 years

    I need a comparator in java which has the same semantics as the sql 'like' operator. For example:

    myComparator.like("digital","%ital%");
    myComparator.like("digital","%gi?a%");
    myComparator.like("digital","digi%");
    

    should evaluate to true, and

    myComparator.like("digital","%cam%");
    myComparator.like("digital","tal%");
    

    should evaluate to false. Any ideas how to implement such a comparator or does anyone know an implementation with the same semantics? Can this be done using a regular expression?

  • Chris
    Chris about 15 years
    Yeah, thanks! But what if the pattern isn't as simple as "%dig%" and the string needs some escaping? Is there anything already existing? What about the '?'?
  • Chris
    Chris about 15 years
    What about "%this%string%"? Split on the '%' sign, iterate over the array, and then check every entry? I think this could be done better ...
  • Bob
    Bob about 15 years
    I edited my answer for the question mark operator. I am a little confused by the rest of your comment though. Are you saying the string is coming to you in sql syntax and you want to evaluate it as is? If that is the case I think you will need to replace to sql syntax manually.
  • Chris
    Chris about 15 years
    What if the string used as a search pattern contains grouping characters like '(' or ')'? Escape them too? How many other characters need escaping?
  • Bob
    Bob about 15 years
    I think that will depend on how many options you are allowing.
  • Chris
    Chris about 15 years
    Is there a method that escapes every character with special meaning in a Java regex?
  • palantus
    palantus about 15 years
    Yes, Pattern.quote (java.sun.com/javase/6/docs/api/java/util/regex/… ) will do it. For some reason, I thought that might cause a problem, but now I don't know why I didn't include it in the answer.
  • palantus
    palantus about 15 years
    Oh yes, now I remember. It's because ? is a special regex character, so it would be escaped before we could replace it. I suppose we could instead use Pattern.quote and then expr = expr.replace("\\?", ".");
  • GreenieMeanie
    GreenieMeanie about 15 years
    Just beware that .* is greedy (.*? might be more appropriate). I don't think .* in regex has exactly the same semantics as % in SQL.
  • Alan Moore
    Alan Moore almost 15 years
    Your inner split() and loop replaces any \? sequence with a dot--I don't get that. Why single out that sequence, only to replace it with a dot just like a lone question mark?
  • tommyL
    tommyL almost 15 years
    It replaces the '?' with a '.' because '?' is a placeholder for a single arbitrary character. I know '\\\\\\?' looks strange, but I tested it and it seems to work for my tests.
  • tommyL
    tommyL almost 15 years
    Do you know if Hibernate supports this feature? I mean, filtering objects currently in memory using such an expression?
  • True Soft
    True Soft almost 12 years
    You can add also expr = expr.replaceAll("(?<!\\\\)_", ".");, because "\_" can be escaped in SQL, and should not be replaced with "." in this case. (I used _ instead of ? for one character.)
  • True Soft
    True Soft almost 12 years
    Also, for %, this replacement would be better: expr = expr.replaceAll("(?<!\\\\)%", ".*");
  • Leo
    Leo over 7 years
    if(s == null) throw new IllegalArgumentException("String cannot be null"); else if(s.isEmpty()) return "";
  • Pang
    Pang about 7 years
    This is essentially just a repeat of the existing answers posted many years ago.
  • Christian
    Christian about 4 years
    Oh really? And what if the text was "I like apples but not oranges" and the search was something like "%oranges%apples%"?