Convert an HTML Table to JSON

35,475

Your data probably looks something like this:

html_data = """
<table>
  <tr>
    <td>Card balance</td>
    <td>$18.30</td>
  </tr>
  <tr>
    <td>Card name</td>
    <td>NAMEn</td>
  </tr>
  <tr>
    <td>Account holder</td>
    <td>NAME</td>
  </tr>
  <tr>
    <td>Card number</td>
    <td>1234</td>
  </tr>
  <tr>
    <td>Status</td>
    <td>Active</td>
  </tr>
</table>
"""

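From this, you can get the rows as a list of lists using this code:

from bs4 import BeautifulSoup

# Name the parser explicitly; calling a soup or tag like a function is shorthand for find_all().
soup = BeautifulSoup(html_data, "html.parser")
table_data = [[cell.text for cell in row("td")] for row in soup("tr")]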
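If BeautifulSoup is not available, the same list of rows can be built with the standard library alone. The following is only a minimal sketch of that alternative, not the method used in this answer: the class name `TableRows` and the one-row `sample_html` string are made up for illustration, and it assumes a simple table with no nested tables.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []          # one list of cell strings per <tr>
        self._in_cell = False   # True while the parser is inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        # Text inside nested tags such as <strong> is appended to the current cell.
        if self._in_cell:
            self.rows[-1][-1] += data


# Illustrative input only; substitute your own HTML string.
sample_html = "<table><tr><td>Status</td><td><strong>Active</strong></td></tr></table>"

parser = TableRows()
parser.feed(sample_html)
parser.close()

# Strip surrounding whitespace from each cell.
table_rows = [[cell.strip() for cell in row] for row in parser.rows]
print(table_rows)
```

The resulting `table_rows` has the same shape as `table_data` above, so the JSON steps that follow apply to it unchanged.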

To convert the result to JSON, if you don't care about the order:

import json
print(json.dumps(dict(table_data)))

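Result (pretty-printed here for readability; the key order is not guaranteed on Python versions before 3.7):

{
    "Status": "Active",
    "Card name": "NAMEn",
    "Account holder": "NAME",
    "Card number": "1234",
    "Card balance": "$18.30"
}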
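Note that `dict(table_data)` only works when every row holds exactly two cells; any other length raises a `ValueError`. A row made up only of `<th>` cells, for instance, would come out of the extraction above as an empty list. A minimal sketch of one way to guard against that, using a made-up `table_data` with two malformed rows, might look like this:

```python
import json

# Hypothetical extraction result: an empty row (e.g. a header row with only <th> cells)
# and a row with an extra cell, mixed in with well-formed key/value rows.
table_data = [
    [],
    ["Card balance", "$18.30"],
    ["Card name", "NAME", "extra"],
    ["Status", "Active"],
]

# Keep only the rows that can be read as a (key, value) pair.
pairs = [row for row in table_data if len(row) == 2]

result_json = json.dumps(dict(pairs))
print(result_json)
```

Whether dropping such rows silently is acceptable depends on your data; raising an error or logging the skipped rows may be the better choice.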

If you need the same order, use this:

from collections import OrderedDict
import json
print(json.dumps(OrderedDict(table_data)))

Which gives you:

{
    "Card balance": "$18.30",
    "Card name": "NAMEn",
    "Account holder": "NAME",
    "Card number": "1234",
    "Status": "Active"
}
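On Python 3.7 and later, a plain `dict` preserves insertion order as a language guarantee, so `OrderedDict` is not strictly needed there. The sketch below assumes such an interpreter and writes `table_data` out by hand so that it runs on its own; passing `indent=4` to `json.dumps` produces the multi-line layout shown above instead of a single line.

```python
import json

# The rows from the example table, written out literally so the snippet is self-contained.
table_data = [
    ["Card balance", "$18.30"],
    ["Card name", "NAMEn"],
    ["Account holder", "NAME"],
    ["Card number", "1234"],
    ["Status", "Active"],
]

# dict() keeps the row order on Python 3.7+; indent=4 pretty-prints the output.
pretty = json.dumps(dict(table_data), indent=4)
print(pretty)
```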
Author by

declanjscott

Updated on July 09, 2022

Comments

  • declanjscott
    declanjscott almost 2 years

    I'm trying to convert a table I have extracted via BeautifulSoup into JSON.

    So far I've managed to isolate all the rows, though I'm not sure how to work with the data from here. Any advice would be very much appreciated.

    [<tr><td><strong>Balance</strong></td><td><strong>$18.30</strong></td></tr>, 
    <tr><td>Card name</td><td>Name</td></tr>, 
    <tr><td>Account holder</td><td>NAME</td></tr>, 
    <tr><td>Card number</td><td>1234</td></tr>, 
    <tr><td>Status</td><td>Active</td></tr>]
    

    (Line breaks mine for readability)

    This was my attempt:

    result = []
    allrows = table.tbody.findAll('tr')
    for row in allrows:
        result.append([])
        allcols = row.findAll('td')
        for col in allcols:
            thestrings = [unicode(s) for s in col.findAll(text=True)]
            thetext = ''.join(thestrings)
            result[-1].append(thetext)
    

    which gave me the following result:

    [
     [u'Card balance', u'$18.30'],
     [u'Card name', u'NAMEn'],
     [u'Account holder', u'NAME'],
     [u'Card number', u'1234'],
     [u'Status', u'Active']
    ]
    
  • declanjscott
    declanjscott over 10 years
    Thanks a lot. I was getting an error caused by the encoding of some characters in the server's response; once I figured that out, your answer worked perfectly. Thanks again, and have a great day.