How to write a regex pattern in Lucene?

Two problems with your regex (assuming, based on your previous questions, that your test string is indexed without any tokenization, as a StringField for instance):

  1. The regex must match a whole term. Without any analysis, as we're assuming, that means it must match the whole field. In this case, you need to append .* to the pattern to match the rest of the field.

  2. Anchors are not supported. Since the regex has to match the whole field anyway, they would be redundant, so get rid of the ^ at the beginning.

So the regex that should work is (using a literal space instead of \s, which Lucene's regex syntax does not support):

[a-z0-9 ]{6}[^*] *(program-id)\..*
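
To see why the trailing .* matters, here is a minimal sketch using java.util.regex as a stand-in. It is not Lucene's regex engine, but Pattern.matches also requires the whole input to match, which is roughly how a Lucene regexp behaves against an untokenized term. The class and method names are made up for illustration, and a literal space is used in place of \s because the comments note that Lucene does not support that escape.

```java
import java.util.regex.Pattern;

// Illustration only: java.util.regex standing in for Lucene's whole-term matching.
final class WholeTermMatchDemo {
    // The test string from the question: seven leading spaces, then the paragraph name.
    static final String FIELD = "       program-id.  acinstal.";

    // Returns true only if the regex matches the entire field value,
    // not just a prefix or a substring of it.
    static boolean matchesWholeField(String regex, String field) {
        return Pattern.matches(regex, field);
    }

    public static void main(String[] args) {
        // Stops after "program-id.", so the rest of the field is left unmatched.
        System.out.println(matchesWholeField("[a-z0-9 ]{6}[^*] *(program-id)\\.", FIELD));    // false
        // The trailing .* consumes the remainder of the field.
        System.out.println(matchesWholeField("[a-z0-9 ]{6}[^*] *(program-id)\\..*", FIELD));  // true
    }
}
```

In Lucene itself the pattern string would typically be passed to a RegexpQuery built from a Term; the field name you use there depends on how you indexed the document.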
Author by

rocky

Updated on June 04, 2022

Comments

  • rocky
    rocky almost 2 years

    I want to match a string with a regexp query in Lucene.

    Test String:

           program-id.  acinstal.
    

    Regex pattern in Java:

    ^[a-z0-9 ]{6}[^*]\s*(program-id)\.
    

    How would I write this regex specifically for a Lucene regexp query so that it matches the string?

  • rocky
    rocky about 8 years
    Is there any way to run multiple regexes in Lucene by grouping them with AND/OR operators?
  • femtoRgon
    femtoRgon about 8 years
    @rocky - Yes, you can combine multiple queries with BooleanQuery.
  • rocky
    rocky about 8 years
    [A-Z0-9 ]{6}[^*]\senvironment\s+division\.. It's not working. Can you please tell me what's wrong with this pattern?
  • femtoRgon
    femtoRgon about 8 years
    The \s escape sequence is not supported. See the docs for what regex syntax is available in lucene.
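
Pulling the points from the answer and comments together, here is a rough sketch of a helper that rewrites a Java-style pattern into something closer to what Lucene accepts. The helper is hypothetical (it is not part of Lucene), it does plain text replacement rather than real regex parsing, and it assumes the only whitespace in the field is ordinary spaces.

```java
// Hypothetical helper, not a Lucene API: naive textual rewrite of a Java-style pattern.
final class LuceneRegexRewrite {
    static String rewrite(String javaPattern) {
        String p = javaPattern;
        // Anchors are not supported; the pattern already has to match the whole term.
        if (p.startsWith("^")) {
            p = p.substring(1);
        }
        // Replace the \s escape with a literal space.
        // Assumption: the indexed field only uses plain spaces as whitespace.
        p = p.replace("\\s", " ");
        // Let the pattern consume the rest of the field.
        // Naive check: a pattern ending in an escaped dot followed by * would fool it.
        if (!p.endsWith(".*")) {
            p = p + ".*";
        }
        return p;
    }

    public static void main(String[] args) {
        // Prints: [a-z0-9 ]{6}[^*] *(program-id)\..*
        System.out.println(rewrite("^[a-z0-9 ]{6}[^*]\\s*(program-id)\\."));
        // Prints: [A-Z0-9 ]{6}[^*] environment +division\..*
        System.out.println(rewrite("[A-Z0-9 ]{6}[^*]\\senvironment\\s+division\\."));
    }
}
```

Each rewritten pattern could then be wrapped in its own RegexpQuery and combined through BooleanQuery.Builder, using Occur.MUST clauses for AND and Occur.SHOULD clauses for OR.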