Regular expression for a JIRA identifier


Solution 1

Use alternation to require that the character before the pattern is either whitespace or the beginning of the string. Similarly, use a lookahead to require that the pattern is followed by either whitespace or the end of the string.

You can use this regex:

my ( $id ) = ( $line =~ /(?:\s|^)([A-Z]+-[0-9]+)(?=\s|$)/ );
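As a rough illustration of the same idea outside Perl, here is a minimal JavaScript sketch. The function name findJiraIds and the sample strings are made up for this example; the pattern is the one above, with the g flag added so that every ID on a line is collected rather than only the first.

```javascript
// Hypothetical helper: collect every whitespace-delimited JIRA-style ID in a line.
function findJiraIds(line) {
  // Same pattern as the Perl one-liner, plus the g flag for repeated matching.
  const pattern = /(?:\s|^)([A-Z]+-[0-9]+)(?=\s|$)/g;
  // matchAll yields one match object per hit; index 1 holds the captured ID.
  return Array.from(line.matchAll(pattern), (m) => m[1]);
}

// The ID embedded in blah_blah_ABC-123 is skipped.
console.log(findJiraIds("ABC-123 fixes blah_blah_ABC-123 and DEF-7")); // [ 'ABC-123', 'DEF-7' ]
// A trailing colon is neither whitespace nor end of string, so nothing matches.
console.log(findJiraIds("see ABC-123: details")); // []
```

The second call shows a limitation of requiring whitespace on both sides: an ID followed directly by punctuation is not found, so the lookahead may need widening for real-world text.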

Solution 2

Official JIRA ID Regex (Java):

Atlassian themselves have a couple of webpages floating around that suggest a good (Java) regex is this:

((?<!([A-Z]{1,10})-?)[A-Z]+-\d+)

(Source: https://confluence.atlassian.com/display/STASHKB/Integrating+with+custom+JIRA+issue+key)

Test String:
"BF-18 abc-123 X-88 ABCDEFGHIJKL-999 abc XY-Z-333 abcDEF-33 ABC-1"

Matches:
BF-18, X-88, ABCDEFGHIJKL-999, DEF-33, ABC-1

Improved JIRA ID Regex (Java):

But, I don't really like it because it will match the "DEF-33" from "abcDEF-33", whereas I prefer to ignore "abcDEF-33" altogether. So in my own code I'm using:

((?<!([A-Za-z]{1,10})-?)[A-Z]+-\d+)

Notice how "DEF-33" is no longer matched:

Test String:
"BF-18 abc-123 X-88 ABCDEFGHIJKL-999 abc XY-Z-333 abcDEF-33 ABC-1"

Matches:
BF-18, X-88, ABCDEFGHIJKL-999, ABC-1

Improved JIRA ID Regex (JavaScript):

I also needed this regex in JavaScript. Unfortunately, at the time JavaScript did not support lookbehind (?<!a)b (it was only added in ES2018), and so I had to port it to lookahead a(?!b) and reverse everything:

var jira_matcher = /\d+-[A-Z]+(?!-?[a-zA-Z]{1,10})/g

This means the string to be matched needs to be reversed ahead of time, too:

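// Helper: JavaScript strings have no built-in reverse()
function reverse(str) {
    return str.split("").reverse().join("");
}

var s = "BF-18 abc-123 X-88 ABCDEFGHIJKL-999 abc XY-Z-333 abcDEF-33 ABC-1";
s = reverse(s);
var m = s.match(jira_matcher) || [];  // match() returns null when nothing matches

// Also need to reverse all the results!
for (var i = 0; i < m.length; i++) {
    m[i] = reverse(m[i]);
}
m.reverse();  // restore the original left-to-right order
console.log(m);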

// Output:
[ 'BF-18', 'X-88', 'ABCDEFGHIJKL-999', 'ABC-1' ]
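For what it's worth, lookbehind assertions were added to JavaScript in ES2018, so on engines that implement them the reversal should not be needed. The sketch below assumes such an engine (for example a recent Node.js); the variable names are illustrative only. It also tries a rewrite using two fixed-width lookbehinds. That rewrite should be equivalent, because the original lookbehind only ever needs to inspect the one or two characters before the candidate (a letter, or a letter followed by a hyphen), and it may be useful in engines that insist on fixed-width lookbehind.

```javascript
const sample = "BF-18 abc-123 X-88 ABCDEFGHIJKL-999 abc XY-Z-333 abcDEF-33 ABC-1";

// Direct port of the "improved" Java regex (variable-width lookbehind, ES2018+).
const variableWidth = /(?<![A-Za-z]{1,10}-?)[A-Z]+-\d+/g;

// Assumed-equivalent form: reject a preceding letter, or a preceding letter + hyphen.
const fixedWidth = /(?<![A-Za-z])(?<![A-Za-z]-)[A-Z]+-\d+/g;

// match() with the g flag returns all full matches, or null when there are none.
const fromVariable = sample.match(variableWidth) || [];
const fromFixed = sample.match(fixedWidth) || [];

console.log(fromVariable); // [ 'BF-18', 'X-88', 'ABCDEFGHIJKL-999', 'ABC-1' ]
console.log(fromFixed);    // [ 'BF-18', 'X-88', 'ABCDEFGHIJKL-999', 'ABC-1' ]
```

Both forms give the same result as the reverse-and-match approach on this test string; other inputs have not been checked here.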

Solution 3

If you include sample data with your question, you get the best shot at answers from those who might not have Jira, etc.

Here's another take on it:

my $matcher = qr/ (?: (?<=\A) | (?<=\s) )
                  ([A-Z]{1,4}-[1-9][0-9]{0,6})
                  (?=\z|\s|[[:punct:]]) /x;

while ( <DATA> )
{
    chomp;
    my @matches = /$matcher/g;
    printf "line: %s\n\tmatches: %s\n",
        $_,
        @matches ? join(", ", @matches) : "none";
}

__DATA__
JIRA-001 is not valid but JIRA-1 is and so is BIN-10000,
A-1, and TACO-7133 but why look for BIN-10000000 or BINGO-1?

Remember that [0-9]+ will match 0001 and friends, which you probably don't want. I think, but can't verify, that Jira truncates issue prefixes to 4 characters max, so the regex only allows 1-4 capital letters; that's easy to change if wrong. Just under 10 million tickets (up to seven digits) seems like a reasonably high top end for issue numbers. I also allowed for trailing punctuation; with wild data you may have to season that kind of thing to taste. If a string could contain more than one issue id, you need the g flag and must capture into an array instead of a scalar.

Output:

line: JIRA-001 is not valid but JIRA-1 is and so is BIN-10000,
        matches: JIRA-1, BIN-10000
line: A-1, and TACO-7133 but why look for BIN-10000000 or BINGO-1?
        matches: A-1, TACO-7133
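A possible JavaScript port of the same pattern is sketched below, assuming an engine with ES2018 lookbehind support. JavaScript has no [[:punct:]] class, so the ASCII punctuation ranges are spelled out by hand, which is an assumption about what the Perl class is meant to cover here. The function name findIds is invented for the example.

```javascript
// Port of the Perl pattern: start of string or whitespace before the ID,
// then end of string, whitespace, or ASCII punctuation after it.
// The explicit ranges stand in for [[:punct:]] (ASCII only, by assumption).
const matcher = /(?<=^|\s)([A-Z]{1,4}-[1-9][0-9]{0,6})(?=$|\s|[!-\/:-@\[-`{-~])/g;

// Hypothetical helper: return all IDs in a line, or an empty array.
function findIds(line) {
  return line.match(matcher) || [];
}

const lines = [
  "JIRA-001 is not valid but JIRA-1 is and so is BIN-10000,",
  "A-1, and TACO-7133 but why look for BIN-10000000 or BINGO-1?",
];

for (const line of lines) {
  const matches = findIds(line);
  console.log(`line: ${line}\n\tmatches: ${matches.length ? matches.join(", ") : "none"}`);
}

// A hyphen counts as punctuation, so an ID followed by "-foo" still matches.
console.log(findIds("ABZ-123-foo")); // [ 'ABZ-123' ]
```

The last call illustrates the trade-off of allowing trailing punctuation: ABZ-123 is reported even though it is part of a longer hyphenated token.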

Author: DaveG

Senior IT manager with significant experience in a wide-range of technologies across systems/network administration, software development and service delivery within global blue-chip companies. Particular expertise in JIRA, Confluence, Subversion, ClearCase, Nexus and other development technologies.

Updated on June 12, 2022

Comments

  • DaveG
    DaveG almost 2 years

    I'm trying to extract a JIRA identifier from a line of text.

    JIRA identifiers are of the form [A-Z]+-[0-9]+ - I have the following pattern:

    foreach my $line ( @textBlock ) {
        my ( $id ) = ( $line =~ /[\s|]?([A-Z]+-[0-9]+)[\s:|]?/ );
        push @jiraIDs, $id if ( defined $id && $id !~ /^$/ );
    }
    

    This doesn't cope if someone specifies some text which contains the pattern inside another string - for example blah_blah_ABC-123 would match upon ABC-123. I don't want to mandate that there must be a space or other delimiter in front of the match as that would fail if the identifier were at the start of the line.

    Can anyone suggest the necessary runes?

    Thanks.

  • DaveG
    DaveG over 10 years
    That doesn't quite work, because the lookbehind is variable length (one character [\s] or none [^]), which causes a "Variable length lookbehind not implemented in regex" error.
  • Rohit Jain
    Rohit Jain over 10 years
    @DaveG Fixed it. Thanks :)
  • DaveG
    DaveG over 10 years
    Good point about the [0-9] matching 0001. I'll re-use the [1-9][0-9] aspect of your regex. Wouldn't your use of [:punct:] mean that you would match "ABZ-123-foo"? FYI: JIRA doesn't truncate prefixes though - for example, one of our projects has a key of INCIDENT.
  • Julius Musseau
    Julius Musseau almost 9 years
    JIRA doesn't care about leading zeroes: issues.apache.org/jira/browse/CODEC-0000000069
  • grayaii
    grayaii over 6 years
    Any idea how to do it in Python? The "Official JIRA ID Regex" and the "Improved JIRA ID Regex" cause a Python error, "look-behind requires fixed-width pattern". The "Improved JIRA ID Regex" in Python seems to be the best bet, but it matches things like 'INXX-2222s'[::-1]. Maybe this is worth a standalone question, rather than a comment?
  • rafasoares
    rafasoares over 6 years
    @grayaii Ruby has the same problem, I solved it with the JavaScript method (reverse, match, reverse back). However, I prefer the official one (just remove the lowercase a-z), as it adds some tolerance to formatting errors (let's say the commit message was supposed to be "Fixed this\nABC-123", but, for some reason, you got "Fixed thisABC-123"). I bet that's the reasoning behind the official regex.
  • rominf
    rominf over 5 years
    Works in Python too! Thank you!
  • rominf
    rominf over 5 years
    For matching project keys use: my ( $id ) = ( $line =~ /(?:\s|^)([A-Z0-9_]+)(?=\s|$)/ );
  • Rounder
    Rounder over 5 years
    I had an issue with this because my JIRA Issue ID key was like this: ABC1-123 where it had a number after the letters to the left of the dash. I ended up with this regex that worked: ((?<!([A-Z])-?)[A-Za-z0-9_]+-\d+)
  • Kit Grose
    Kit Grose about 2 years
    The regex in Jira itself changed, such that the "correct" regex for a Jira issue is apparently [A-Z][A-Z0-9]+-[0-9]+ (the product code must be at least two characters long, must start with a letter and must be all uppercase or digits).
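
To get a feel for what the pattern quoted in the last comment accepts, here is a small JavaScript sketch that anchors it with ^ and $ to validate a whole string. The anchoring and the function name isIssueKey are choices made for this example, not part of the quoted pattern.

```javascript
// Pattern from the comment, anchored so the entire candidate must be an issue key.
const issueKey = /^[A-Z][A-Z0-9]+-[0-9]+$/;

// Hypothetical helper: true when the whole string looks like an issue key.
function isIssueKey(candidate) {
  return issueKey.test(candidate);
}

for (const candidate of ["ABC-123", "ABC1-123", "X-88", "1ABC-5", "abc-123"]) {
  console.log(candidate, isIssueKey(candidate));
}
// ABC-123 true
// ABC1-123 true
// X-88 false    (project key shorter than two characters)
// 1ABC-5 false  (project key must start with a letter)
// abc-123 false (lowercase)
```

Note that a single-letter key such as X-88, which the earlier patterns accept, is rejected by this one.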