Parse JavaScript with jsoup
Since jsoup isn't a JavaScript library, you have two ways to solve this:
A. Use a JavaScript library
Pro:
- Full JavaScript support
Con:
- Additional library / dependencies
B. Use jsoup + manual parsing
Pro:
- No extra libraries required
- Enough for simple tasks
Con:
- Not as flexible as a JavaScript library
Here's an example of how to get the key with jsoup and some "manual" code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

Document doc = ...
Element script = doc.select("script").first(); // get the first <script> element
Pattern p = Pattern.compile("(?is)key=\"(.+?)\""); // regex capturing the value assigned to key
Matcher m = p.matcher(script.html()); // use html() here and NOT text(): text() drops the script content
while (m.find()) {
    System.out.println(m.group());  // the whole match (key="value")
    System.out.println(m.group(1)); // value only
}
Output (using your html part):
key="pqRjnA"
pqRjnA
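For readers who want to try the regex part without jsoup on the classpath, here is a minimal sketch that works on the script source as a plain string. The class name KeyExtractor and the method extractKey are hypothetical names chosen for illustration, and the pattern is a slightly loosened variant of the one above (it assumes optional whitespace around the equals sign and either quote style), not the exact regex from the answer.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeyExtractor {
    // Assumption: the variable is literally named "key" and is assigned a quoted string.
    // Allows optional whitespace around '=' and single or double quotes;
    // \\1 requires the closing quote to match the opening one.
    private static final Pattern KEY_PATTERN =
            Pattern.compile("\\bkey\\s*=\\s*([\"'])(.*?)\\1", Pattern.DOTALL);

    /** Returns the string assigned to 'key' in the given script source, if any. */
    public static Optional<String> extractKey(String scriptSource) {
        Matcher m = KEY_PATTERN.matcher(scriptSource);
        return m.find() ? Optional.of(m.group(2)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Stand-in for the string that script.html() would return in the jsoup example.
        String scriptSource = " key=\"pqRjnA\"; ";
        System.out.println(extractKey(scriptSource).orElse("<not found>")); // prints pqRjnA
    }
}
```

Returning an Optional keeps the "no such variable" case explicit instead of relying on a loop that silently prints nothing.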
Author: ravi
Updated on November 15, 2021
Comments
-
ravi over 2 years
In an HTML page, I want to pick up the value of a JavaScript variable. Below is a snippet of the HTML page:
<input id="hidval" value="" type="hidden">
<form method="post" style="padding: 0px;margin: 0px;" name="profile" autocomplete="off">
<input name="pqRjnA" id="pqRjnA" value="" type="hidden">
<script type="text/javascript"> key="pqRjnA"; </script>
My aim is to read the value of the variable key from this page using jsoup. Is it possible with jsoup? If yes, then how?
-
Anil Kumar Pandey almost 10 years
Hey, jsoup + manual parsing is a very good solution for this, but it breaks when the JS variable is an array, e.g. keyArray = [1, 2, 3]. Can you please give me a solution for this?
-
ollo almost 10 years
You can use this regex instead: (?s)(keyArray)\\s??=\\s??\\[(.*?)\\]
It defines two groups: group 1 = variable name, group 2 = value (the part within [ ]).
-
user79307 over 9 years
And what if I have something like abc.xyz.init({requiredJsonObjectAsAnArgument}); inside script tags and I want to parse only requiredJsonObjectAsAnArgument? Can you suggest a suitable regex for this case?
-
ollo over 9 years
Please try (?s)\\.init\\(\\{(.+?)\\}\\); - group #1 contains the requiredJsonObjectAsAnArgument.
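
The two regexes suggested in the comments can be tried out with a small standalone sketch like the one below. The class name ScriptRegexExamples, the method names, and the sample script strings are all assumptions made for illustration. The patterns are the ones from the comments, written as Java string literals. Array elements are returned as trimmed strings, and the simple comma split assumes the elements themselves contain no commas.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ScriptRegexExamples {
    // Regex from the comments: group 1 = variable name, group 2 = text between [ and ].
    private static final Pattern ARRAY_PATTERN =
            Pattern.compile("(?s)(keyArray)\\s??=\\s??\\[(.*?)\\]");

    // Regex from the comments: group 1 = text between .init({ and });
    private static final Pattern INIT_PATTERN =
            Pattern.compile("(?s)\\.init\\(\\{(.+?)\\}\\);");

    /** Returns the trimmed elements of keyArray, or an empty list if nothing matches. */
    public static List<String> extractArrayValues(String scriptSource) {
        List<String> values = new ArrayList<>();
        Matcher m = ARRAY_PATTERN.matcher(scriptSource);
        if (m.find() && !m.group(2).trim().isEmpty()) {
            // Naive split: assumes no element contains a comma itself.
            for (String part : m.group(2).split(",")) {
                values.add(part.trim());
            }
        }
        return values;
    }

    /** Returns the text inside the braces of the first .init({...}); call, if any. */
    public static Optional<String> extractInitArgument(String scriptSource) {
        Matcher m = INIT_PATTERN.matcher(scriptSource);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical script contents, standing in for script.html().
        System.out.println(extractArrayValues("keyArray = [1, 2, 3];")); // prints [1, 2, 3]
        System.out.println(
                extractInitArgument("abc.xyz.init({\"id\": 7, \"name\": \"demo\"});")
                        .orElse("<not found>")); // prints "id": 7, "name": "demo"
    }
}
```

Note that group 1 of the init regex excludes the surrounding braces, so they would need to be added back before handing the text to a JSON parser.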