'Expected String or Unicode' when reading JSON with Pandas
If you write the JSON string to a file,
content = osm.text
with open('/tmp/out', 'w', encoding='utf-8') as f:
    f.write(content)
you'll see something like this:
{
  "version": 0.6,
  "generator": "Overpass API",
  "osm3s": {
    "timestamp_osm_base": "2014-07-20T07:52:02Z",
    "copyright": "The data included in this document is from www.openstreetmap.org. The data is made available under ODbL."
  },
  "elements": [
    {
      "type": "node",
      "id": 536694,
      "lat": 50.9849256,
      "lon": 13.6821776,
      "tags": {
        "highway": "bus_stop",
        "name": "Niederhäslich Bergmannsweg"
      }
    },
...]}
If the JSON string were converted to a Python object, it would be a dict whose 'elements' key is a list of dicts. The vast majority of the data is inside this list of dicts.
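As a rough illustration, here is that structure parsed with the standard json module. The text is a trimmed copy of the sample output above (one element only, 'osm3s' omitted), so the values are just the ones already shown:

```python
import json

# Trimmed copy of the Overpass output shown above (one element, no 'osm3s').
text = '''
{
  "version": 0.6,
  "generator": "Overpass API",
  "elements": [
    {"type": "node", "id": 536694, "lat": 50.9849256, "lon": 13.6821776,
     "tags": {"highway": "bus_stop", "name": "Niederhäslich Bergmannsweg"}}
  ]
}
'''

data = json.loads(text)          # top level: a dict
elements = data["elements"]      # the bulk of the data: a list of dicts
first = elements[0]              # each element is itself a dict...
print(sorted(data))              # top-level keys
print(first["tags"]["name"])     # ...containing a nested 'tags' dict
```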
This JSON string is not directly convertible to a Pandas object: what would be the index, and what would be the columns?
Surely you don't want ['elements', 'version', 'osm3s', 'generator'] to be the columns, since almost all the information is in the 'elements' list of dicts.
But if you want the DataFrame to consist only of the data in the 'elements' list of dicts, you have to specify that, since Pandas can't make that assumption for you.
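This is also the root of the error in the question: pd.read_json expects JSON text (a string, path or file-like object), but osm.json() has already parsed the response into a dict, hence "TypeError: Expected String or Unicode". A minimal sketch, assuming a reasonably recent pandas (which prefers literal JSON wrapped in io.StringIO), that serializes only the 'elements' list and hands that to read_json. The dict below is a hand-made stand-in for osm.json(), not real API output:

```python
import io
import json

import pandas as pd

# Hand-made stand-in for osm.json(): already-parsed Python objects, not text.
osmdata = {
    "version": 0.6,
    "elements": [
        {"type": "node", "id": 536694, "lat": 50.9849256, "lon": 13.6821776,
         "tags": {"highway": "bus_stop", "name": "Niederhäslich Bergmannsweg"}},
    ],
}

# read_json wants text, so turn just the list of records back into JSON.
records_json = json.dumps(osmdata["elements"])
df = pd.read_json(io.StringIO(records_json), orient="records")

print(df.columns.tolist())        # one column per key of each element
print(type(df.loc[0, "tags"]))    # the nested 'tags' dict stays a single cell
```

When the dict is already in memory, pd.DataFrame(osmdata["elements"]) gives the same frame without the round trip through JSON text; either way the 'tags' column still holds whole dicts.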
Further complicating things, each dict in 'elements' is itself a nested dict. Consider the first dict in 'elements':
{
  "type": "node",
  "id": 536694,
  "lat": 50.9849256,
  "lon": 13.6821776,
  "tags": {
    "highway": "bus_stop",
    "name": "Niederhäslich Bergmannsweg"
  }
}
Should ['lat', 'lon', 'type', 'id', 'tags'] be the columns?
That seems plausible, except that the 'tags' column would end up being a column of dicts, which is usually not very useful. It would perhaps be nicer if the keys inside the 'tags' dict were made into columns. We can do that, but again we have to code it ourselves, since Pandas has no way of knowing that's what we want.
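As an aside, newer pandas versions (1.0 and later) provide pd.json_normalize, which flattens nested dicts into prefixed column names. You still have to pick out the 'elements' list yourself, so it does not remove the need to decide; it only saves writing the loop. A sketch on the sample element, where stripping the "tags_" prefix is an optional, hypothetical cleanup step:

```python
import pandas as pd

# Sample element copied from the output above.
elements = [
    {"type": "node", "id": 536694, "lat": 50.9849256, "lon": 13.6821776,
     "tags": {"highway": "bus_stop", "name": "Niederhäslich Bergmannsweg"}},
]

# Nested keys become columns joined with `sep`: 'tags_highway', 'tags_name'.
flat = pd.json_normalize(elements, sep="_")

# Optional: drop the 'tags_' prefix so the columns match the manual approach.
# Careful: a tag key equal to an element key (e.g. 'type') would then collide.
prefix = "tags_"
flat = flat.rename(columns=lambda c: c[len(prefix):] if c.startswith(prefix) else c)
print(flat[["lat", "lon", "name"]])
```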
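import pandas as pd
import requests
# Bottom left
minLat = 50.9549
minLon = 13.55232
# Top right
maxLat = 51.1390
maxLon = 13.89873
# Overpass QL: all bus-stop nodes in the bounding box (south, west, north, east)
osmrequest = {'data': '[out:json][timeout:25];(node["highway"="bus_stop"](%s,%s,%s,%s););out body;>;out skel qt;' % (minLat, minLon, maxLat, maxLon)}
osmurl = 'http://overpass-api.de/api/interpreter'
osm = requests.get(osmurl, params=osmrequest)
osm.raise_for_status()  # fail early on an HTTP error rather than on bad JSON
osmdata = osm.json()             # parse the response body into a Python dict
osmdata = osmdata['elements']    # keep only the list of element dicts
for dct in osmdata:
    # move every key of the nested 'tags' dict up into the element itself
    # (pop with a default so elements without tags don't raise KeyError)
    for key, val in dct.pop('tags', {}).items():
        dct[key] = val
osmdataframe = pd.DataFrame(osmdata)
print(osmdataframe[['lat', 'lon', 'name']].head())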
yields
         lat        lon                        name
0  50.984926  13.682178  Niederhäslich Bergmannsweg
1  51.123623  13.782789                Sagarder Weg
2  51.065752  13.895734     Weißig, Einkaufszentrum
3  51.007140  13.698498          Stuttgarter Straße
4  51.010199  13.701411          Heilbronner Straße
Balzer82
Updated on June 05, 2022
Comments
-
Balzer82 almost 2 years
I am trying to read a JSON string returned by the OpenStreetMap API, which is valid.
I am using the following code:
import pandas as pd
import requests
# Bottom left
minLat = 50.9549
minLon = 13.55232
# Top right
maxLat = 51.1390
maxLon = 13.89873
osmrequest = {'data': '[out:json][timeout:25];(node["highway"="bus_stop"](%s,%s,%s,%s););out body;>;out skel qt;' % (minLat, minLon, maxLat, maxLon)}
osmurl = 'http://overpass-api.de/api/interpreter'
osm = requests.get(osmurl, params=osmrequest)
osmdata = osm.json()
osmdataframe = pd.read_json(osmdata)
which throws the following error:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-66-304b7fbfb645> in <module>()
----> 1 osmdataframe = pd.read_json(osmdata)

/Users/paul/anaconda/lib/python2.7/site-packages/pandas/io/json.pyc in read_json(path_or_buf, orient, typ, dtype, convert_axes, convert_dates, keep_default_dates, numpy, precise_float, date_unit)
    196         obj = FrameParser(json, orient, dtype, convert_axes, convert_dates,
    197                           keep_default_dates, numpy, precise_float,
--> 198                           date_unit).parse()
    199
    200     if typ == 'series' or obj is None:

/Users/paul/anaconda/lib/python2.7/site-packages/pandas/io/json.pyc in parse(self)
    264
    265         else:
--> 266             self._parse_no_numpy()
    267
    268         if self.obj is None:

/Users/paul/anaconda/lib/python2.7/site-packages/pandas/io/json.pyc in _parse_no_numpy(self)
    481         if orient == "columns":
    482             self.obj = DataFrame(
--> 483                 loads(json, precise_float=self.precise_float), dtype=None)
    484         elif orient == "split":
    485             decoded = dict((str(k), v)

TypeError: Expected String or Unicode
How can I modify the request or the Pandas read_json call to avoid this error? And, by the way, what is the problem?
-
Balzer82 almost 10 years
Great explanation!! I understand this perfectly, but couldn't solve the riddle on my own. Thanks! The line
osmdata = json.loads(osm.read())
might not be correct, because I get: AttributeError: 'Response' object has no attribute 'read'
-
Balzer82 almost 10 years
It has to be
osmdata = json.loads(osm.content)
-
unutbu almost 10 years
@Balzer82: My version of requests was too old; the API has changed since then. I've updated my requests and will edit the code above.