Convert HTML source code to a JSON object


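jsonD = json.dumps(htmlContent.text) converts the raw HTML content into a JSON string representation, escaping its quotes, backslashes and newlines. jsonL = json.loads(jsonD) parses that JSON string back into a regular string/unicode object. Together the two calls are a no-op: any escaping done by dumps() is reverted by loads(), so jsonL contains exactly the same data as htmlContent.text. When jsonL is then concatenated into the hand-built string, the double quotes and newlines in the HTML end up unescaped inside a JSON string literal, which is why JSONLint rejects the result.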
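To see why the dumps/loads pair does not help, here is a minimal sketch that uses a short, made-up HTML string in place of htmlContent.text (the sample markup and the names html, round_trip, hand_built and valid are hypothetical). It checks that the round trip returns the original text unchanged and that pasting that text into a hand-built JSON string produces something json.loads rejects.

```python
import json

# Hypothetical stand-in for htmlContent.text: contains quotes and a newline
html = '<div class="x">Hello\n"world"</div>'

# dumps() escapes the quotes and newline; loads() removes that escaping again
round_trip = json.loads(json.dumps(html))
print(round_trip == html)  # True: the pair of calls changes nothing

# Concatenating the unescaped text into a hand-built JSON string breaks it,
# because the first embedded double quote ends the JSON string early
hand_built = '{"page_content" : "' + round_trip + '"}'
try:
    json.loads(hand_built)
    valid = True
except json.JSONDecodeError as exc:
    valid = False
    print("invalid JSON:", exc.msg)
```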

Try to use json.dumps to generate your final JSON instead of building the JSON by hand:

import json

# Let json.dumps build the whole document so every value is escaped correctly
ContentUrl = json.dumps({
    'url': str(urls),
    'uid': str(uniqueID),
    'page_content': htmlContent.text,
    'date': finalDate
})
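A self-contained version of the snippet above could look like this, assuming placeholder values for urls, uniqueID, finalDate and the page HTML (all hypothetical here, since they come from the asker's surrounding code). The indent=2 argument is optional and only pretty-prints the output.

```python
import json

# Hypothetical values standing in for urls, uniqueID, htmlContent.text and finalDate
urls = "https://example.com/page"
uniqueID = 42
page_html = '<p class="note">Line one\nLine "two"</p>'
finalDate = "2020-08-22"

# json.dumps handles all quoting and escaping of the embedded HTML
ContentUrl = json.dumps({
    'url': str(urls),
    'uid': str(uniqueID),
    'page_content': page_html,
    'date': finalDate,
}, indent=2)

print(ContentUrl)

# The result is valid JSON, and the HTML comes back unchanged when parsed
parsed = json.loads(ContentUrl)
print(parsed['page_content'] == page_html)  # True
```

Because json.dumps escapes the embedded quotes and newlines itself, the output is valid JSON and json.loads returns the original HTML. If you only need the data inside Python, you can keep the dictionary and call json.dumps once when writing it out.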
Author: Umesh Kaushik

Updated on August 22, 2020

Comments

  • Umesh Kaushik
    over 3 years

    I am fetching the HTML source code of many pages from one website, and I need to convert it into a JSON object and combine it with other elements in a JSON document. I have seen many questions on the same topic, but none of them were helpful.

    My code:

    import json
    import requests

    url = "https://totalhash.cymru.com/analysis/?1ce201cf28c6dd738fd4e65da55242822111bd9f"
    htmlContent = requests.get(url, verify=False)
    data = htmlContent.text
    print("data",data)
    jsonD = json.dumps(htmlContent.text)
    jsonL = json.loads(jsonD)
    
    ContentUrl='{ \"url\" : \"'+str(urls)+'\" ,'+"\n"+' \"uid\" : \"'+str(uniqueID)+'\" ,\n\"page_content\" : \"'+jsonL+'\" , \n\"date\" : \"'+finalDate+'\"}'
    

    The above code gives me a unicode object; however, when I put that output into JSONLint, it reports an invalid JSON error. Can somebody help me understand how I can convert the complete HTML into a JSON object?

  • Umesh Kaushik
    about 7 years
    It worked like a charm. Thanks for improving my understanding as well. I clicked on accept answer, but I have no idea why it is not working.