Regular expression for a JIRA identifier
Solution 1
Use alternation to require that the character before your pattern is either whitespace or the beginning of the string. Similarly, require that the pattern is followed by either whitespace or the end of the string.
You can use this regex:
my ( $id ) = ( $line =~ /(?:\s|^)([A-Z]+-[0-9]+)(?=\s|$)/ );
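For comparison, here is a minimal JavaScript sketch of the same boundary idea. The helper name extractFirstJiraId is made up for illustration, and the pattern is a direct transliteration of the Perl one above. One difference to keep in mind: without the m flag, JavaScript's $ matches only at the very end of the input.

```javascript
// Hypothetical helper: returns the first whitespace-delimited JIRA-style ID
// in a line, or null when there is none.
function extractFirstJiraId(line) {
  // (?:\s|^)  the ID must start the string or follow whitespace (consumed, not captured)
  // (?=\s|$)  the ID must be followed by whitespace or the end of the string
  const match = /(?:\s|^)([A-Z]+-[0-9]+)(?=\s|$)/.exec(line);
  return match ? match[1] : null;
}

console.log(extractFirstJiraId("Fixed ABC-123 today")); // ABC-123
console.log(extractFirstJiraId("blah_blah_ABC-123"));   // null
```

The second call shows the case from the question: an ID embedded in a longer token is skipped because nothing whitespace-like precedes it.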
Solution 2
Official JIRA ID Regex (Java):
Atlassian has a couple of web pages floating around that suggest this is a good (Java) regex:
((?<!([A-Z]{1,10})-?)[A-Z]+-\d+)
(Source: https://confluence.atlassian.com/display/STASHKB/Integrating+with+custom+JIRA+issue+key)
Test String:
"BF-18 abc-123 X-88 ABCDEFGHIJKL-999 abc XY-Z-333 abcDEF-33 ABC-1"
Matches:
BF-18, X-88, ABCDEFGHIJKL-999, DEF-33, ABC-1
Improved JIRA ID Regex (Java):
But, I don't really like it because it will match the "DEF-33" from "abcDEF-33", whereas I prefer to ignore "abcDEF-33" altogether. So in my own code I'm using:
((?<!([A-Za-z]{1,10})-?)[A-Z]+-\d+)
Notice how "DEF-33" is no longer matched:
Test String:
"BF-18 abc-123 X-88 ABCDEFGHIJKL-999 abc XY-Z-333 abcDEF-33 ABC-1"
Matches:
BF-18, X-88, ABCDEFGHIJKL-999, ABC-1
Improved JIRA ID Regex (JavaScript):
I also needed this regex in JavaScript. Unfortunately, JavaScript did not support lookbehind, (?<!a)b, at the time (it was only added in ES2018), so I had to port it to lookahead, a(?!b), and reverse everything:
var jira_matcher = /\d+-[A-Z]+(?!-?[a-zA-Z]{1,10})/g
This means the string to be matched needs to be reversed ahead of time, too:
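// Helper: JavaScript has no built-in string reverse.
function reverse(str) {
  return str.split("").reverse().join("");
}

var s = "BF-18 abc-123 X-88 ABCDEFGHIJKL-999 abc XY-Z-333 abcDEF-33 ABC-1";
s = reverse(s);
var m = s.match(jira_matcher) || []; // match() returns null when nothing matches
// Also need to reverse all the results!
for (var i = 0; i < m.length; i++) {
  m[i] = reverse(m[i]);
}
m.reverse();
console.log(m);
// Output:
// [ 'BF-18', 'X-88', 'ABCDEFGHIJKL-999', 'ABC-1' ]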
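Newer JavaScript engines (ES2018 onward) do accept lookbehind assertions, so where that support can be assumed, the reverse trick is not needed. The following is only a sketch under that assumption. The names jiraIdPattern and findJiraIds are invented for illustration, and the pattern is the "improved" Java regex with its capture groups dropped.

```javascript
// Assumes an engine with ES2018 lookbehind support (e.g. a current Node.js or browser).
// Same idea as the "improved" Java regex: reject a key that is directly preceded
// by up to 10 letters, optionally followed by a hyphen.
const jiraIdPattern = /(?<![A-Za-z]{1,10}-?)[A-Z]+-\d+/g;

function findJiraIds(text) {
  // String.prototype.match with a /g regex returns null when nothing matches.
  return text.match(jiraIdPattern) || [];
}

const sample = "BF-18 abc-123 X-88 ABCDEFGHIJKL-999 abc XY-Z-333 abcDEF-33 ABC-1";
console.log(findJiraIds(sample));
// [ 'BF-18', 'X-88', 'ABCDEFGHIJKL-999', 'ABC-1' ]
```

If the code has to run on older engines as well, the reverse-and-lookahead version above remains the safer choice.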
Solution 3
If you include sample data with your question, you get the best shot at answers from those who might not have Jira, etc.
Here's another take on it:
my $matcher = qr/ (?: (?<=\A) | (?<=\s) )
                  ([A-Z]{1,4}-[1-9][0-9]{0,6})
                  (?=\z|\s|[[:punct:]]) /x;

while ( <DATA> )
{
    chomp;
    my @matches = /$matcher/g;
    printf "line: %s\n\tmatches: %s\n",
        $_,
        @matches ? join(", ", @matches) : "none";
}
__DATA__
JIRA-001 is not valid but JIRA-1 is and so is BIN-10000,
A-1, and TACO-7133 but why look for BIN-10000000 or BINGO-1?
Remember that [0-9] will match 0001 and friends, which you probably don't want. I think, but can't verify, that Jira truncates issue prefixes to 4 characters max, so the regex only allows 1-4 capital letters; that is easy to change if wrong. 10 million tickets seems like a reasonably high top end for issue numbers. I also allowed for trailing punctuation. With wild data, you may have to season that kind of thing to taste. You need the g flag, and you need to capture to an array instead of a scalar, if you're matching strings that could have more than one issue id.
line: JIRA-001 is not valid but JIRA-1 is and so is BIN-10000,
	matches: JIRA-1, BIN-10000
line: A-1, and TACO-7133 but why look for BIN-10000000 or BINGO-1?
	matches: A-1, TACO-7133
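A rough JavaScript port of the same stricter pattern might look like the sketch below. The names strictJiraId and findStrictJiraIds are hypothetical. Because JavaScript has no [[:punct:]] class, ASCII punctuation is written out as explicit ranges, and the leading boundary is a consumed (?:^|\s) rather than a lookbehind.

```javascript
// JavaScript has no [[:punct:]] class, so ASCII punctuation is spelled out
// as explicit ranges here (an assumption: non-ASCII punctuation is ignored).
const strictJiraId =
  /(?:^|\s)([A-Z]{1,4}-[1-9][0-9]{0,6})(?=$|\s|[!-\/:-@\[-\x60{-~])/g;

function findStrictJiraIds(line) {
  // matchAll yields one match object per hit; index 1 is the captured ID.
  return Array.from(line.matchAll(strictJiraId), (m) => m[1]);
}

console.log(findStrictJiraIds("JIRA-001 is not valid but JIRA-1 is and so is BIN-10000,"));
// [ 'JIRA-1', 'BIN-10000' ]
console.log(findStrictJiraIds("A-1, and TACO-7133 but why look for BIN-10000000 or BINGO-1?"));
// [ 'A-1', 'TACO-7133' ]
```

The 1-4 letter and 7-digit limits are carried over unchanged from the Perl version, so they inherit the same caveats.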
DaveG
Updated on June 12, 2022
Comments
-
DaveG almost 2 years
I'm trying to extract a JIRA identifier from a line of text.
JIRA identifiers are of the form [A-Z]+-[0-9]+ and I have the following pattern:
foreach my $line ( @textBlock ) {
    my ( $id ) = ( $line =~ /[\s|]?([A-Z]+-[0-9]+)[\s:|]?/ );
    push @jiraIDs, $id if ( defined $id && $id !~ /^$/ );
}
This doesn't cope if someone specifies some text which contains the pattern inside another string. For example, blah_blah_ABC-123 would match upon ABC-123. I don't want to mandate that there must be a space or other delimiter in front of the match, as that would fail if the identifier were at the start of the line. Can anyone suggest the necessary runes?
Thanks.
-
DaveG over 10 years
That doesn't quite work, because the lookbehind is variable length (one character [\s] or none [^]), which causes a "Variable length lookbehind not implemented in regex" error.
-
Rohit Jain over 10 years
@DaveG Fixed it. Thanks :)
-
DaveG over 10 years
Good point about the [0-9] matching 0001. I'll re-use the [1-9][0-9] aspect of your regex. Wouldn't your use of [:punct:] mean that you would match "ABZ-123-foo"? FYI: JIRA doesn't truncate prefixes though. For example, one of our projects has a key of INCIDENT.
-
Julius Musseau almost 9 years
JIRA doesn't care about leading zeroes: issues.apache.org/jira/browse/CODEC-0000000069
-
grayaii over 6 years
Any idea how to do it in Python? The "Official JIRA ID Regex" and the "Improved JIRA ID Regex" cause a Python error, "look-behind requires fixed-width pattern". The "Improved JIRA ID Regex" in Python seems to be the best bet, but it matches things like 'INXX-2222s'[::-1]. Maybe this is worth a standalone question, rather than a comment?
-
rafasoares over 6 years
@grayaii Ruby has the same problem; I solved it with the JavaScript method (reverse, match, reverse back). However, I prefer the official one (just remove the lowercase a-z), as it adds some tolerance to formatting errors (let's say the commit message was supposed to be "Fixed this\nABC-123", but, for some reason, you got "Fixed thisABC-123"). I bet that's the reasoning behind the official regex.
-
rominf over 5 years
Works in Python too! Thank you!
-
rominf over 5 years
For matching project keys use:
my ( $id ) = ( $line =~ /(?:\s|^)([A-Z0-9_]+)(?=\s|$)/ );
-
Rounder over 5 years
I had an issue with this because my JIRA issue key looked like ABC1-123, with a number after the letters to the left of the dash. I ended up with this regex, which worked:
((?<!([A-Z])-?)[A-Za-z0-9_]+-\d+)
-
Kit Grose about 2 years
The regex in Jira itself changed, such that the "correct" regex for a Jira issue is apparently
[A-Z][A-Z0-9]+-[0-9]+
(the product code must be at least two characters long, must start with a letter and must be all uppercase or digits).
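As a rough illustration of that newer key format in JavaScript, the sketch below wraps the quoted pattern in \b word boundaries. The boundaries are an added assumption, not part of the quoted regex, and the names newStyleJiraId and findNewStyleJiraIds are invented here.

```javascript
// Sketch only: the key pattern is the one quoted in the comment above;
// the \b word boundaries are an added assumption to skip keys embedded in
// longer words such as "blah_blah_ABC-123" or "abcDEF-33".
const newStyleJiraId = /\b[A-Z][A-Z0-9]+-[0-9]+\b/g;

function findNewStyleJiraIds(text) {
  // match() with a /g regex returns null when nothing matches.
  return text.match(newStyleJiraId) || [];
}

console.log(findNewStyleJiraIds("BF-18 X-88 ABC1-123 blah_blah_ABC-123 abcDEF-33 AB-7"));
// [ 'BF-18', 'ABC1-123', 'AB-7' ]
```

Note that X-88 is no longer matched, since this format requires at least two characters before the dash. Also, \b treats a hyphen as a boundary, so a string like "XY-AB-12" would still yield AB-12.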