How to get the JSON STRING from the given string

8,681

Here are a few options:

  1. Use grep with the -o flag to print only the matching part of the line and filter with head to get the first match only:

    grep -o '"accountHeader[^}]*}' file.json | head -n1 
    

    The regular expression matches "accountHeader, then as many non-} characters as possible, and stops at the first }. The solutions below use essentially the same regex.

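  2. Use sed with -n to suppress default printing and the p flag to print only when the substitution succeeds. A single substitution with a leading .* is greedy and would keep the last accountHeader on the line, so use two steps: the first drops everything after the first match, the second drops everything before it:

    sed -n 's/\("accountHeader[^}]*}\).*/\1/;s/.*"accountHeader/"accountHeader/p' file.json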
    
  3. Use Perl. The -l adds a newline to each print call, -n means "process the input file line by line", and -e supplies the script to apply. The regex has no leading .*, so it matches the first occurrence on the line:

    perl -lne '/("accountHeader[^}]*})/ && print $1' file.json
    
  4. Use awk. This assumes that the ** in your example input are only there to highlight the part you're interested in and don't actually exist in the real data. If so, this should work:

    awk -F'},' '{print $2"}"}' file.json
    

If the ** are actually there, things are even simpler: just use them as field delimiters. The second field then already ends with the closing }, so nothing needs to be appended:

    awk -F'[*][*]' '{print $2}' file.json

or

    perl -F"\*\*" -alne 'print $F[1]' file.json
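
To check which occurrence a given pattern selects, here is a minimal sketch that runs two sed variants on a shortened, made-up version of the sample line (without the ** markers). The temporary file and the variable names are only for illustration. It relies on two properties of the regex matching: an unanchored pattern matches at the leftmost possible position, while a greedy .* in front of the group pushes the match to the last occurrence.

```shell
# Hypothetical, shortened sample line with two accountHeader objects.
sample=$(mktemp)
cat > "$sample" <<'EOF'
[Source System Request  : {"Info":{"Holder": {"nameData": {"shortName": "McIntosh"}},"accountHeader": {"a": "Y","draft": "P"},"forms": {"maskSSN": "N"},"accountHeader": {"a": "Y","d": "HWA"}}} ], null]
EOF

# Two steps: cut everything after the first match, then everything before it.
first=$(sed -n 's/\("accountHeader[^}]*}\).*/\1/;s/.*"accountHeader/"accountHeader/p' "$sample")

# One step with a greedy leading .* : the group ends up on the last match.
last=$(sed -n 's/.*\("accountHeader[^}]*}\).*/\1/p' "$sample")

printf 'first: %s\n' "$first"
printf 'last:  %s\n' "$last"
rm -f "$sample"
```

On this sample, first should hold the header containing "draft" and last the one containing "d": "HWA".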
Author: ahamedirshad123

Updated on September 18, 2022

Comments

  • ahamedirshad123
    ahamedirshad123 almost 2 years

    I have a string given below. I just need to get the first accountHeader (bolded) JSON string.

    <START AdditionalInfo#:>[FormsGenerationServiceImpl::,  accountNumber:ABC07667 ,  [Source System Request  : {"Info":{"Holder": {"nameData": {"shortName": "McIntosh"}},**"accountHeader": {"a": "Y","b": "1","c": "4","draft": "P","e": "Y0000DU9","f": "T","g": "1"}**,"forms": {"maskSSN": "N","deliverForms": "G","selectedForms": {"T5": ["F10"],"T1": ["F1403"],"T2": ["F100001401"]}},"accountHeader": {"a": "Y","b": "1","c": "4","d": "HWA","draftRequestType": "P","e": "Y0000DU9","f": "T","g": "1"}}} ], null]<AdditionalInfo#: END>
    

    My output should be

    "accountHeader": {"a": "Y","b": "1","c": "4","draft": "P","e": "Y0000DU9","f": "T","g": "1"}
    
    • terdon
      terdon over 10 years
      Are the ** actually in the data or are you using them to highlight the part you're interested in?
    • terdon
      terdon over 10 years
      Yes they are actually in the data? OK, then everything in my answer except the awk one should work.
  • ahamedirshad123
    ahamedirshad123 over 10 years
    1) these are the available options for grep: -hblcnsviw
  • terdon
    terdon over 10 years
    @ahamedirshad123 1) you mean you have no -o? What operating system are you using? I guess OSX since that lacks the -o. Please remember to mention your OS in your questions. 2) The sed works fine using the example you posted (at least on my system) but there might be a difference on OSX. The perl one should be the most portable. Does that work?
  • Warren Young
    Warren Young over 10 years
    Parsing with regexes is brittle.
  • terdon
    terdon over 10 years
    @WarrenYoung 1) this is not [XH]TML, the opening and closing tags are irrelevant 2) this is a single line, implementing a whole script with a language parser seems a little overkill don't you think? 3) There is a huge difference between parsing an entire file containing structured data and extracting only a single pattern from it. Regex is the perfect tool for this, what kind of language parser would you suggest for what the OP wants?
  • Warren Young
    Warren Young over 10 years
    Your answers will work for this one line, today. Next week, the line contains "accountHeader": {"a": "{boom}", ...
  • terdon
    terdon over 10 years
    @WarrenYoung yes, but that was the question asked. The OP wanted the 1st account header. If the end objective is more complex, then I would have provided a different answer. If you want to post a more complete solution, I would be happy to read and upvote it. As it stands, my answer(s) solve the problem posed and I don't see how you could write a more generalized solution without a lot more information on the format used by the OP. What if tomorrow the string of interest does not start with accountHeader? I ask again, what language parser would you use for this that warrants downvoting?
  • Warren Young
    Warren Young over 10 years
    If this file format has a formal, well-defined grammar, there is a single, unambiguous way to parse the file, with no risk of breakage. Given the grammar, you can construct a parser using any of dozens of parser generators. As for belittling the risk: The question's data has clearly been sanitized. (a, b, c...) You don't really know what is in the source data. And since JSON allows almost anything in a string, the OP is saying that almost anything may be legal.
  • ahamedirshad123
    ahamedirshad123 over 10 years
    @WarrenYoung I can assure you the format of the first account header in the JSON String will never change. I just made it as a,b,c instead of original data.
  • ahamedirshad123
    ahamedirshad123 over 10 years
    @terdon I thought ** would bold the string inside the code too :) I will try sed, and accept your answer if it works well. I pasted a single JSON string. My file contains more than 10000 such data. What I will do is just cut the field (totally 17 fields) using delimiter (pipe in my case) and send it as an input to the expression you provided. Is that okay? Any other way it can be done?
  • Warren Young
    Warren Young over 10 years
    @ahamedirshad123: I'm not worried about the headers changing. I'm worried that the content of the JSON strings could contain }, which terdon's answer treats as a "stop" character. Generally speaking, regexes cannot cope with nested delimiters.
  • ahamedirshad123
    ahamedirshad123 over 10 years
    @WarrenYoung Accountheader will always be like this only (no inner JSON strings): {"a": "Y","b": "1","c": "4","draft": "P","e": "Y0000DU9","f": "T","g": "1"} It just has basic info like account number, the type of account. I need few data from the string for metrics scorecard
  • Warren Young
    Warren Young over 10 years
    Okay, I've done my Cassandra bit. Go forth and regex the world.
  • terdon
    terdon over 10 years
    @WarrenYoung lol :). I'm not saying you're wrong (you're not) but given the lack of formal syntax and info here, regexes seem to be the way to go.
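
For the pipe-delimited file mentioned in the comments, a separate cut step is probably not needed: awk can select the field and apply the same pattern in one pass. This is a rough sketch under stated assumptions: the two records are invented, the JSON-bearing text is assumed to sit in field 3 (the real file reportedly has 17 fields), and the data is assumed to contain no | of its own. The second record also shows the limitation raised above, because a } inside a string value ends the match early.

```shell
# Hypothetical pipe-delimited records; field 3 is assumed to carry the JSON text.
records=$(mktemp)
cat > "$records" <<'EOF'
id1|2014-01-01|{"Info":{"accountHeader": {"a": "Y","g": "1"},"forms": {"x": "N"},"accountHeader": {"a": "Z"}}}
id2|2014-01-02|{"Info":{"accountHeader": {"a": "{boom}","g": "2"},"forms": {"x": "N"}}}
EOF

# match() finds the leftmost match in field 3 and sets RSTART and RLENGTH,
# which substr() then uses to print just that part.
out=$(awk -F'|' 'match($3, /"accountHeader[^}]*}/) { print substr($3, RSTART, RLENGTH) }' "$records")

printf '%s\n' "$out"
rm -f "$records"
```

The first record yields its first accountHeader object in full. The second is cut off right after {boom}, which is the case where a real JSON parser would be the safer tool.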