Regular expression to extract a JSON array


Solution 1

If the number of items in the array is bounded (and manageable), you can spell the pattern out with a finite number of optional items. This one accepts a maximum of 5 items:

"category":\["([^"]*)"(?:,"([^"]*)"(?:,"([^"]*)"(?:,"([^"]*)"(?:,"([^"]*)")?)?)?)?

regex101 example here.
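
As a rough illustration, the bounded pattern above could be applied from Java as follows. This is only a sketch: the class name BoundedCategoryExtractor and the method extract are made up for the example, and it assumes the items contain no escaped quotes. Optional groups that did not take part in the match come back as null, so they are skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class BoundedCategoryExtractor {
    // Same pattern as above: one mandatory item plus up to four optional, nested items.
    private static final Pattern CATEGORY = Pattern.compile(
        "\"category\":\\[\"([^\"]*)\""
        + "(?:,\"([^\"]*)\"(?:,\"([^\"]*)\"(?:,\"([^\"]*)\"(?:,\"([^\"]*)\")?)?)?)?");

    static List<String> extract(String json) {
        List<String> items = new ArrayList<>();
        Matcher m = CATEGORY.matcher(json);
        if (m.find()) {
            // Groups 1..5 hold the items; unused optional groups are null.
            for (int g = 1; g <= m.groupCount(); g++) {
                String item = m.group(g);
                if (item != null) {
                    items.add(item);
                }
            }
        }
        return items;
    }
}
```

Calling BoundedCategoryExtractor.extract on the string from the question returns the two items Jebb and Bush. Note the limits of this approach: an empty array does not match at all, and a sixth item would be silently dropped.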

Regards.

Solution 2

Does this match your needs? It should match the category array regardless of its size.

"category":(\[.*?\])

regex101 example
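
If individual items are needed rather than the raw array text, one option is a second pass over the captured group. The Java sketch below shows the idea; the class name CategoryExtractor, the method extract, and the ITEM pattern are assumptions added for the example, and it assumes items contain neither escaped quotes nor a ] character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class CategoryExtractor {
    // Step 1: the pattern from this answer, capturing the whole array literal.
    private static final Pattern ARRAY = Pattern.compile("\"category\":(\\[.*?\\])");
    // Step 2: one quoted item per find() call.
    private static final Pattern ITEM = Pattern.compile("\"([^\"]*)\"");

    static List<String> extract(String json) {
        List<String> items = new ArrayList<>();
        Matcher array = ARRAY.matcher(json);
        if (!array.find()) {
            return items; // no category key in the input
        }
        Matcher item = ITEM.matcher(array.group(1));
        while (item.find()) {
            items.add(item.group(1));
        }
        return items;
    }
}
```

Unlike a fixed number of optional groups, this handles any number of items, including an empty array, at the cost of running two patterns instead of one.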

Solution 3

JSON is not a regular language. Since it allows arbitrary nesting of balanced
delimiters, it must be at least context-free.

For example, consider an array of arrays of arrays:

[ [ [ 1, 2], [2, 3] ] , [ [ 3, 4], [ 4, 5] ] ]

Clearly you couldn't parse that with true regular expressions.

See this topic, which may be helpful for you: Regex for parsing single key: values out of JSON in Javascript
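
To make the point concrete, here is a small Java sketch (the class NestedArrayDemo and both method names are invented for illustration). It contrasts a lazy bracket pattern, which stops at the first closing bracket, with an explicit depth counter, which is the kind of bookkeeping a true regular expression cannot do for unbounded nesting.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
class NestedArrayDemo {
    // Returns what a lazy bracket pattern captures: everything up to the first ']'.
    static String lazyMatch(String text) {
        Matcher m = Pattern.compile("\\[.*?\\]").matcher(text);
        return m.find() ? m.group() : null;
    }

    // Tracks bracket depth explicitly and returns the first balanced array.
    // This sketch does not treat brackets inside string literals specially.
    static String balancedMatch(String text) {
        int start = text.indexOf('[');
        if (start < 0) {
            return null;
        }
        int depth = 0;
        for (int i = start; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '[') {
                depth++;
            } else if (c == ']' && --depth == 0) {
                return text.substring(start, i + 1);
            }
        }
        return null; // unbalanced input
    }
}
```

On the nested example above, lazyMatch returns only the fragment [ [ [ 1, 2], while balancedMatch returns the whole array. For flat arrays of strings, as in the question, the two agree, which is why the regex-based answers still work for that input.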

Solution 4

Using a set of non-capturing groups, you can extract a predefined JSON array.

Regex answer: (?:\"category\":)(?:\[)(.*?)(?:\"\])

That expression matches "category":["Jebb","Bush"], so access the first group to extract the array contents. The lazy .*? stops at the first "], so a later non-empty array in the string is not swallowed; note that an empty category array does not match. Sample Java code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;
// assertThat(...) and is(...) are assumed to be Hamcrest static imports.

Pattern pattern = Pattern.compile("(?:\"category\":)(?:\\[)(.*?)(?:\"\\])");
String body = "{\"device_types\":[\"smartphone\"],\"isps\":[\"a\",\"B\"],\"network_types\":[],\"countries\":[],\"category\":[\"Jebb\",\"Bush\"],\"carriers\":[],\"exclude_carriers\":[]}";
Matcher matcher = pattern.matcher(body);
assertThat(matcher.find(), is(true));
// Group 1 is "Jebb","Bush (no trailing quote): strip the quotes, then split on commas.
String[] categories = matcher.group(1).replaceAll("\"","").split(",");

assertThat(categories.length, is(2));
assertThat(categories[0], is("Jebb"));
assertThat(categories[1], is("Bush"));
Author by

GGGforce

Updated on July 29, 2022

Comments

  • GGGforce
    GGGforce over 1 year

    I'm trying to use a PCRE regular expression to extract some JSON. I'm using a version of MariaDB which does not have JSON functions but does have REGEX functions.

    My string is:

    {"device_types":["smartphone"],"isps":["a","B"],"network_types":[],"countries":[],"category":["Jebb","Bush"],"carriers":[],"exclude_carriers":[]}

    I want to grab the contents of category. I'd like a matching group that contains 2 items, Jebb and Bush (or however many items are in the array).

    I've tried this pattern but it only matches the first occurrence: /(?<=category":\[).([^"]*).*?(?=\])/g