How to get the JSON STRING from the given string
Here are a few options:

Use grep with the -o flag to print only the matching part of the line, and filter with head to keep the first match only:

grep -o '"accountHeader[^}]*}' file.json | head -n1

The regular expression looks for "accountHeader, followed by as many non-} characters as possible, up to the first }. The other solutions below use essentially the same regex.

Use sed with -n to suppress the default printing and p after the substitution to print only if the substitution succeeded. The first substitution cuts the line off right after the first match, and the second removes everything before it:

sed -n 's/\("accountHeader[^}]*}\).*/\1/; s/.*\("accountHeader\)/\1/p' file.json

A single s/.*\("accountHeader[^}]*}\).*/\1/p looks simpler, but its leading .* is greedy, so on a line with several accountHeader objects it keeps the last one rather than the first.
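Use Perl. The -l flag adds a newline to each print call, -n means "process each line of the input file", and -e gives the script to apply:

perl -lne '/("accountHeader[^}]*})/ && print $1' file.json

The pattern has no leading .* on purpose: it would be greedy and make the regex capture the last accountHeader on the line instead of the first.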
Use awk. This assumes that the ** in your example input are there to highlight the part you're interested in and don't actually exist in the real data. If so, this should work, since splitting on }, makes the accountHeader object (minus its closing brace) the second field:

awk -F'},' '{print $2"}"}' file.json

If the ** are actually there, things are even simpler: just use them as field delimiters. The second field then already ends with the closing }, so nothing needs to be appended:

awk -F'[*][*]' '{print $2}' file.json

or

perl -F"\*\*" -alne 'print $F[1]' file.json
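
The grep variant above keeps only the first match of the whole file because of head -n1. The comments mention a file with more than 10000 such records, so here is a minimal sketch, assuming one record per line, that prints the first accountHeader of every line using awk's match() function. The function name first_header_per_line and the shortened sample records are made up for illustration.

```shell
#!/bin/sh
# Sketch: print the first "accountHeader" object found on each input line.
# Assumes one record per line and no } inside the object's string values.
first_header_per_line() {
    # match() sets RSTART/RLENGTH to the leftmost match; lines without a
    # match are skipped. [}] is a literal closing brace.
    awk 'match($0, /"accountHeader[^}]*[}]/) { print substr($0, RSTART, RLENGTH) }'
}

# Hypothetical, shortened records modelled on the question's data.
printf '%s\n' \
    '{"Info":{"accountHeader": {"a": "Y","draft": "P"},"forms": {"maskSSN": "N"},"accountHeader": {"a": "Y","d": "HWA"}}}' \
    '{"Info":{"accountHeader": {"a": "N","draft": "Q"},"forms": {"maskSSN": "Y"}}}' |
    first_header_per_line
# prints:
# "accountHeader": {"a": "Y","draft": "P"}
# "accountHeader": {"a": "N","draft": "Q"}
```

Reading from standard input means the function can sit at the end of a pipeline, for example after cut has selected the field that holds the JSON.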
ahamedirshad123
Updated on September 18, 2022
Comments
-
ahamedirshad123 almost 2 years
I have a string given below. I just need to get the first accountHeader (bolded) JSON string.
<START AdditionalInfo#:>[FormsGenerationServiceImpl::, accountNumber:ABC07667 , [Source System Request : {"Info":{"Holder": {"nameData": {"shortName": "McIntosh"}},**"accountHeader": {"a": "Y","b": "1","c": "4","draft": "P","e": "Y0000DU9","f": "T","g": "1"}**,"forms": {"maskSSN": "N","deliverForms": "G","selectedForms": {"T5": ["F10"],"T1": ["F1403"],"T2": ["F100001401"]}},"accountHeader": {"a": "Y","b": "1","c": "4","d": "HWA","draftRequestType": "P","e": "Y0000DU9","f": "T","g": "1"}}} ], null]<AdditionalInfo#: END>
My output should be
"accountHeader": {"a": "Y","b": "1","c": "4","draft": "P","e": "Y0000DU9","f": "T","g": "1"}
-
terdon over 10 years
Are the ** actually in the data or are you using them to highlight the part you're interested in?
-
terdon over 10 years
Yes they are actually in the data? OK, then everything in my answer except the awk one should work.
-
ahamedirshad123 over 10 years
1) these are the available options for grep: -hblcnsviw
-
terdon over 10 years
@ahamedirshad123 1) You mean you have no -o? What operating system are you using? I guess OSX since that lacks the -o. Please remember to mention your OS in your questions. 2) The sed works fine using the example you posted (at least on my system) but there might be a difference on OSX. The perl one should be the most portable. Does that work?
-
Warren Young over 10 years
Parsing with regexes is brittle.
-
terdon over 10 years
@WarrenYoung 1) This is not [XH]TML, the opening and closing tags are irrelevant. 2) This is a single line, implementing a whole script with a language parser seems a little overkill, don't you think? 3) There is a huge difference between parsing an entire file containing structured data and extracting only a single pattern from it. Regex is the perfect tool for this, what kind of language parser would you suggest for what the OP wants?
-
Warren Young over 10 years
Your answers will work for this one line, today. Next week, the line contains "accountHeader": {"a": "{boom}", ...
-
terdon over 10 years
@WarrenYoung Yes, but that was the question asked. The OP wanted the 1st account header. If the end objective is more complex, then I would have provided a different answer. If you want to post a more complete solution, I would be happy to read and upvote it. As it stands, my answer(s) solve the problem posed and I don't see how you could write a more generalized solution without a lot more information on the format used by the OP. What if tomorrow the string of interest does not start with accountHeader? I ask again, what language parser would you use for this that warrants downvoting?
-
Warren Young over 10 years
If this file format has a formal, well-defined grammar, there is a single, unambiguous way to parse the file, with no risk of breakage. Given the grammar, you can construct a parser using any of dozens of parser generators. As for belittling the risk: The question's data has clearly been sanitized. (a, b, c...) You don't really know what is in the source data. And since JSON allows almost anything in a string, the OP is saying that almost anything may be legal.
-
ahamedirshad123 over 10 years
@WarrenYoung I can assure you the format of the first account header in the JSON string will never change. I just made it a, b, c instead of the original data.
-
ahamedirshad123 over 10 years
@terdon I thought ** would bold the string inside the code too :) I will try sed, and accept your answer if it works well. I pasted a single JSON string. My file contains more than 10000 such data. What I will do is just cut the field (17 fields in total) using the delimiter (pipe in my case) and send it as input to the expression you provided. Is that okay? Any other way it can be done?
-
Warren Young over 10 years
@ahamedirshad123: I'm not worried about the headers changing. I'm worried that the content of the JSON strings could contain }, which terdon's answer treats as a "stop" character. Generally speaking, regexes cannot cope with nested delimiters.
-
ahamedirshad123 over 10 years
@WarrenYoung The accountHeader will always be like this only (no inner JSON strings): {"a": "Y","b": "1","c": "4","draft": "P","e": "Y0000DU9","f": "T","g": "1"} It just has basic info like the account number and the type of account. I need a few values from the string for a metrics scorecard.
-
Warren Young over 10 years
Okay, I've done my Cassandra bit. Go forth and regex the world.
-
terdon over 10 years
@WarrenYoung lol :). I'm not saying you're wrong (you're not) but given the lack of formal syntax and info here, regexes seem to be the way to go.
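
The concern raised in this thread, a } inside a string value, can be reproduced with a short sketch. It compares the pattern from the answer with a string-aware variant that treats each quoted string (including backslash escapes) as one unit. The function names and the sample line are hypothetical, grep needs to support -o and -E (the asker's grep apparently does not), and the variant is still not a JSON parser: a nested object inside accountHeader would defeat it.

```shell
#!/bin/sh
# naive_header: the pattern from the answer; stops at the first } it sees,
# even when that } sits inside a quoted string.
naive_header() {
    grep -o '"accountHeader[^}]*}' | head -n1
}

# string_aware_header: between the braces, accept either a character that is
# neither " nor }, or a complete "..." string in which \x escapes are allowed.
# Nested objects inside accountHeader are NOT handled.
string_aware_header() {
    grep -oE '"accountHeader": *[{]([^"}]|"([^"\\]|\\.)*")*[}]' | head -n1
}

# Hypothetical input modelled on the "{boom}" example from the comments.
line='{"Info":{"accountHeader": {"a": "{boom}","b": "1"},"forms": {"maskSSN": "N"}}}'

printf '%s\n' "$line" | naive_header
# prints: "accountHeader": {"a": "{boom}
printf '%s\n' "$line" | string_aware_header
# prints: "accountHeader": {"a": "{boom}","b": "1"}
```

For data that is guaranteed to look like the sample in the question, both functions return the same result, so the simpler pattern remains a reasonable choice there.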