Regex for parsing single key: values out of JSON in Javascript


I would strongly discourage you from doing this. JSON is not a regular language as clearly stated here: https://cstheory.stackexchange.com/questions/3987/is-json-a-regular-language

To quote from the above post:

For example, consider an array of arrays of arrays:

[ [ [ 1, 2], [2, 3] ] , [ [ 3, 4], [ 4, 5] ] ] 

Clearly you couldn't parse that with true regular expressions.

I'd recommend converting your JSON to an object (JSON.parse) & implementing a find function to traverse the structure.
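
As a rough illustration of that suggestion, here is a minimal sketch of a recursive find function. The name findKey and the sample document are hypothetical, not taken from any library; it simply returns the first value stored under the requested key, searching depth-first.

```javascript
// Hypothetical helper: depth-first search for the first property named `key`.
function findKey(node, key) {
  if (node === null || typeof node !== "object") {
    return undefined; // primitives have no properties to search
  }
  if (!Array.isArray(node) && Object.prototype.hasOwnProperty.call(node, key)) {
    return node[key];
  }
  // Object.values yields array elements as well as object property values.
  for (const child of Object.values(node)) {
    const found = findKey(child, key);
    if (found !== undefined) {
      return found;
    }
  }
  return undefined;
}

// Sample document (an assumption for this sketch, modeled on the question).
const doc = JSON.parse(
  '{"Name":"Humpty","Posts":[{"Title":"How I fell","Comments":[{"User":"Fairy God Mother"}]}]}'
);

console.log(findKey(doc, "Name"));
console.log(findKey(doc, "Comments"));
```

Because the structure is already parsed, nesting depth is not a problem here; the cost is that the whole string has to be parsed first.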

Other than that, you can take a look at the guts of Douglas Crockford's json2.js parse method. Perhaps an altered version would allow you to search through the JSON string and return just the particular object you are looking for, without converting the entire structure to an object. This is only useful if you never retrieve any other data from your JSON. If you do, you might as well convert the whole thing to begin with.
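
To give a feel for what such a targeted search might look like, here is a hedged sketch that is not based on json2.js at all. The hypothetical extractRawValue function scans the string by hand, tracking bracket depth and string state, and returns the raw text of the first value stored under a key. It assumes the input is valid JSON and returns the first textual match at any nesting level.

```javascript
// Hypothetical helper: returns the raw JSON text of the first value stored
// under `key`, or null if the key is not found. Assumes valid JSON input.
function extractRawValue(json, key) {
  const needle = JSON.stringify(key); // the key, quoted and escaped
  let from = 0;
  while (true) {
    const at = json.indexOf(needle, from);
    if (at === -1) return null;
    let i = at + needle.length;
    while (i < json.length && /\s/.test(json[i])) i++;
    if (json[i] !== ":") {
      from = at + 1; // matched a string value, not a key; keep looking
      continue;
    }
    i++;
    while (i < json.length && /\s/.test(json[i])) i++;
    const start = i;
    let depth = 0;
    let inString = false;
    for (; i < json.length; i++) {
      const ch = json[i];
      if (inString) {
        if (ch === "\\") {
          i++; // skip the escaped character
        } else if (ch === '"') {
          inString = false;
          if (depth === 0) return json.slice(start, i + 1); // string value
        }
      } else if (ch === '"') {
        inString = true;
      } else if (ch === "[" || ch === "{") {
        depth++;
      } else if (ch === "]" || ch === "}") {
        // At depth 0 this closes the enclosing container, so a scalar ends here.
        if (depth === 0) return json.slice(start, i).trim();
        depth--;
        if (depth === 0) return json.slice(start, i + 1); // array or object value
      } else if (ch === "," && depth === 0) {
        return json.slice(start, i).trim(); // number, true, false, or null
      }
    }
    return json.slice(start).trim();
  }
}

// Sample document (an assumption for this sketch, modeled on the question).
const sample = JSON.stringify({
  Name: "Humpty",
  Age: "18",
  Siblings: ["Dracula", "Snow White", "Merlin"],
  Posts: [
    {
      Title: "How I fell",
      Comments: [
        { User: "Fairy God Mother", Comment: "Ha, can't say I didn't see it coming" }
      ]
    }
  ]
});

console.log(extractRawValue(sample, "Name"));
console.log(extractRawValue(sample, "Posts"));
```

Counting depth is exactly the context a single regular expression lacks, which is why this needs a loop with state rather than a pattern.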

EDIT

Just to further show how regex breaks down, here's a regex that attempts to parse JSON.

If you plug it into http://regexpal.com/ with "Dot Matches All" checked, you'll find that it can match some elements nicely, like:

Regex

"Comments"[ :]+((?=\[)\[[^]]*\]|(?=\{)\{[^\}]*\}|\"[^"]*\") 

JSON Matched

"Comments": [
                { 
                    "User":"Fairy God Mother",
                    "Comment": "Ha, can't say I didn't see it coming"
                }
            ]

Regex

"Name"[ :]+((?=\[)\[[^]]*\]|(?=\{)\{[^\}]*\}|\"[^"]*\")

JSON Matched

"Name": "Humpty"

However, as soon as you start querying for higher-level structures like "Posts", which contains nested arrays, you'll find that you cannot correctly return the structure, since the regex has no way of knowing which "]" is the designated end of the structure.

Regex

"Posts"[ :]+((?=\[)\[[^]]*\]|(?=\{)\{[^\}]*\}|\"[^"]*\")

JSON Matched

"Posts": [
  {
      "Title": "How I fell",
      "Comments": [
          { 
              "User":"Fairy God Mother",
              "Comment": "Ha, can't say I didn't see it coming"
          }
      ]
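
The same behaviour can be reproduced in JavaScript with a short sketch. The matchKey and isValidJSON helpers and the sample document are hypothetical. One assumption to flag: the "]" inside the character class is escaped here ([^\]]), because JavaScript reads an unescaped [^] as "any character" and the pattern would otherwise behave differently than in the tester.

```javascript
// Hypothetical helper: applies the answer's pattern for a given key and
// returns the captured value text, or null if nothing matches.
// Assumption: "]" is escaped inside the character class for JavaScript.
function matchKey(key, json) {
  const pattern = new RegExp(
    '"' + key + '"[ :]+((?=\\[)\\[[^\\]]*\\]|(?=\\{)\\{[^\\}]*\\}|"[^"]*")'
  );
  const m = json.match(pattern);
  return m ? m[1] : null;
}

// Hypothetical helper: true when the text parses as JSON.
function isValidJSON(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (e) {
    return false;
  }
}

// Sample document (an assumption for this sketch, modeled on the question).
const json = JSON.stringify(
  {
    Name: "Humpty",
    Posts: [
      {
        Title: "How I fell",
        Comments: [
          { User: "Fairy God Mother", Comment: "Ha, can't say I didn't see it coming" }
        ]
      }
    ]
  },
  null,
  2
);

console.log(isValidJSON(matchKey("Name", json)));     // flat string value
console.log(isValidJSON(matchKey("Comments", json))); // array with no nested array
console.log(isValidJSON(matchKey("Posts", json)));    // cut off at the first "]"
```

The "Posts" capture stops at the "]" that closes "Comments", so the captured text is a fragment rather than a complete array.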
AshHeskes
Author by

AshHeskes

Updated on July 09, 2022

Comments

  • AshHeskes
    AshHeskes almost 2 years

    I'm trying to see if it's possible to look up individual keys in a JSON string in JavaScript and return their values with regex. Sort of like building a JSON search tool.

    Imagine the following JSON

    "{
        "Name": "Humpty",
        "Age": "18",
        "Siblings" : ["Dracula", "Snow White", "Merlin"],
        "Posts": [
            {
                "Title": "How I fell",
                "Comments": [
                    { 
                        "User":"Fairy God Mother",
                        "Comment": "Ha, can't say I didn't see it coming"
                    }
                ]
            }
        ]
    }"
    

    I want to be able to search through the JSON string and only pull out individual properties.

    Let's assume it's already a function; it would look something like this:

    function getPropFromJSON(prop, JSONString){
        // Obviously this regex will only match keys that have
        // string values.
        var exp = new RegExp("\"" + prop + "\":[^,}]*");
        var match = JSONString.match(exp);
        // match() returns null when the key is absent.
        return match ? match[0].replace("\"" + prop + "\":", "") : null;
    }
    

    It would return the substring of the Value for the Key.

    e.g.

    getPropFromJSON("Comments", JSONString)
    
    > "[
        { 
            "User":"Fairy God Mother",
            "Comment": "Ha, can't say I didn't see it coming"
        }
    ]"
    

    If you're wondering why I want to do this instead of using JSON.parse(), I'm building a JSON document store around localStorage. localStorage only supports key/value pairs, so I'm storing a JSON string of the entire Document under a unique Key. I want to be able to run a query on the documents, ideally without the overhead of calling JSON.parse() on the entire Collection of Documents and then recursing over the Keys/nested Keys to find a match.

    I'm not the best at regex so I don't know how to do this, or if it's even possible with regex alone. This is only an experiment to find out if it's possible. Any other ideas as a solution would be appreciated.

  • AshHeskes
    AshHeskes over 12 years
    I had a look at the json2.js parse method earlier. It doesn't really do any kind of parsing. It just replaces a lot of bad/dangerous/escaped characters/content/scripts so the JSON is clean, then passes the clean string to eval(). I think you're right about using regex alone. I'm going to try a combination of JS and regex. For my use case, I disagree about converting the whole thing and traversing it. It would be far too intensive on large collections || documents, not to mention searching and matching on multiple properties.
  • Brandon Boone
    Brandon Boone over 12 years
    Fair enough. Only other thing I could recommend (and I'm not an expert in this field) is to use a format that is relational data friendly. I'm assuming Ms-Sql, MySql, & Oracle have optimal ways of storing the data so reading, writing, comparing, & joining data is super fast (and I doubt it's stored as JSON). Just a thought.
  • JAAulde
    JAAulde over 12 years
    You should follow the advice in this answer and avoid doing this via any method other than properly deserializing the JSON and searching through the resulting structure.
  • Paul
    Paul almost 11 years
    If you put a finite, fixed limit on the nesting depth of your JSON, it becomes a regular language. However, the regex would be very ugly unless your limit is only 1 or 2.