Convert string to dict, then access key:values??? How to access data in a <class 'dict'> for Python?
Solution 1
Just ran into this problem. My solution:
import ast
import pandas as pd
df = pd.DataFrame(["{u'type': u'Point', u'coordinates': [-43,144]}","{u'type': u'Point', u'coordinates': [-34,34]}","{u'type': u'Point', u'coordinates': [-102,344]}"],columns=["Coordinates"])
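# Make sure every value is a string before parsing
df = df["Coordinates"].astype('str')
# Safely evaluate each string into a dict (the u'' prefixes are valid literals)
df = df.apply(ast.literal_eval)
# Expand each dict into columns, one column per key
df = df.apply(pd.Series)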
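The traceback in the question ends with `'bool' object is not subscriptable`, which means `ast.literal_eval` returned a boolean for at least one row, so not every value is a dict-like string. As a minimal sketch, assuming you would rather get `None` for such rows than an exception, a defensive helper could look like this. The name `parse_coordinates` and the sample `rows` are hypothetical and not part of the original answer.

```python
import ast

def parse_coordinates(value):
    """Return the 'coordinates' entry of a dict-like string, or None."""
    if isinstance(value, dict):
        parsed = value  # already a dict, nothing to parse
    elif isinstance(value, str):
        try:
            parsed = ast.literal_eval(value)
        except (ValueError, SyntaxError):
            return None  # the string is not a valid Python literal
    else:
        return None  # NaN, bool, or any other non-string value
    if not isinstance(parsed, dict):
        return None  # e.g. the string "False" parses to a bool
    return parsed.get("coordinates")

# Hypothetical sample rows: one good value and three problem values
rows = [
    "{u'type': u'Point', u'coordinates': [-43, 144]}",
    "False",
    float("nan"),
    "not a dict",
]
results = [parse_coordinates(r) for r in rows]
print(results)
```

With pandas, the same helper could be applied as `df["Coordinates"].apply(parse_coordinates)`, and the rows that come back as `None` are the ones worth inspecting.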
Solution 2
My first instinct is to use json.loads to cast the strings into dicts. But the example you've posted does not follow the JSON standard, since it uses single instead of double quotes and carries Python 2 u prefixes. So you have to convert the strings first.
A second option is to just use regex to parse the strings. If the dict strings in your actual DataFrame do not exactly match my examples, I expect the regex method to be more robust since lat/long coords are fairly standard.
import json
import re
import pandas as pd
df = pd.DataFrame(data={'Coordinates':["{u'type': u'Point', u'coordinates': [-43.30175, 123.45]}",
"{u'type': u'Point', u'coordinates': [-51.17913, 123.45]}"],
'idx': [130, 278]})
##
# Solution 1- use json.loads
##
def string_to_dict(dict_string):
    # Convert to proper JSON format: double quotes, no u prefix
    dict_string = dict_string.replace("'", '"').replace('u"', '"')
    return json.loads(dict_string)

# Use bracket assignment: attribute-style assignment does not create a new column
df['CoordDicts'] = df.Coordinates.apply(string_to_dict)
df['CoordDicts'][0]['coordinates']
#>>> [-43.30175, 123.45]
##
# Solution 2 - use regex
##
def get_lat_lon(dict_string):
    # Get the coordinates string with regex (raw string avoids invalid-escape warnings)
    rs = re.search(r"(-?\d+(\.\d+)?),\s*(-?\d+(\.\d+)?)", dict_string).group()
    # Cast to floats
    coords = [float(x) for x in rs.split(',')]
    return coords

# Use bracket assignment: attribute-style assignment does not create a new column
df['Coords'] = df.Coordinates.apply(get_lat_lon)
df['Coords'][0]
#>>> [-43.30175, 123.45]
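To illustrate the JSON point made in this answer, the sketch below uses only the standard library and one of the sample strings from above. It shows that `json.loads` rejects the raw string, accepts it after the same quote conversion used in `string_to_dict`, and that the result matches what `ast.literal_eval` returns for the unmodified string. The variable names are illustrative only.

```python
import ast
import json

raw = "{u'type': u'Point', u'coordinates': [-43.30175, 123.45]}"

# The raw string is a Python literal, not JSON, so json.loads rejects it
try:
    json.loads(raw)
    json_ok = True
except json.JSONDecodeError:
    json_ok = False

# Same conversion as string_to_dict: double quotes, no u prefix
converted = raw.replace("'", '"').replace('u"', '"')
from_json = json.loads(converted)

# ast.literal_eval parses the original string without any rewriting
from_literal = ast.literal_eval(raw)

print(json_ok, from_json == from_literal)
```

Note that the quote replacement is a plain text substitution, so it would also alter apostrophes or a trailing `u` inside string values; for data like that, `ast.literal_eval` is likely the safer choice.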
Solution 3
Assuming you start with a Series of dicts, you can use the .tolist() method to create a list of dicts and use this as input for a DataFrame. This approach will map each distinct key to a column.
You can filter by keys on creation by setting the columns argument in pd.DataFrame(), giving you the neat one-liner below. Hope that helps.
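import pandas as pd
# Starting assumption:
data = ["{'coordinates': [-43.301755, -22.990065], 'type': 'Point', 'elevation': 1000}",
"{'coordinates': [-51.17913026, -30.01201896], 'type': 'Point'}"]
# eval runs arbitrary code, so only use it on trusted data (ast.literal_eval is a safer alternative)
s = pd.Series(data).apply(eval)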
# Create a DataFrame with a list of dicts with a selection of columns
pd.DataFrame(s.tolist(), columns=['coordinates'])
Out[1]:
coordinates
0 [-43.301755, -22.990065]
1 [-51.17913026, -30.01201896]
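As a sketch of how the key-to-column mapping behaves when the dicts do not share the same keys, the example below builds the DataFrame without the columns filter. It swaps `eval` for `ast.literal_eval`, which is a substitution made here for safety and not what the answer above uses; the variable name `full` is illustrative.

```python
import ast
import pandas as pd

data = [
    "{'coordinates': [-43.301755, -22.990065], 'type': 'Point', 'elevation': 1000}",
    "{'coordinates': [-51.17913026, -30.01201896], 'type': 'Point'}",
]

# Parse each string into a dict (literal_eval used in place of eval)
s = pd.Series(data).apply(ast.literal_eval)

# No columns argument: every distinct key becomes a column
full = pd.DataFrame(s.tolist())

print(full.columns.tolist())
# The second dict has no 'elevation' key, so that cell is filled with NaN
print(full["elevation"].isna().tolist())
```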
Linwoodc3
Updated on July 05, 2022
Comments
-
Linwoodc3 almost 2 years
I am having issues accessing data inside a dictionary.
Sys: Macbook 2012
Python: Python 3.5.1 :: Continuum Analytics, Inc.
I am working with a dask.dataframe created from a csv.
Edit Question
How I got to this point
Assume I start out with a Pandas Series:
df.Coordinates
130     {u'type': u'Point', u'coordinates': [-43.30175...
278     {u'type': u'Point', u'coordinates': [-51.17913...
425     {u'type': u'Point', u'coordinates': [-43.17986...
440     {u'type': u'Point', u'coordinates': [-51.16376...
877     {u'type': u'Point', u'coordinates': [-43.17986...
1313    {u'type': u'Point', u'coordinates': [-49.72688...
1734    {u'type': u'Point', u'coordinates': [-43.57405...
1817    {u'type': u'Point', u'coordinates': [-43.77649...
1835    {u'type': u'Point', u'coordinates': [-43.17132...
2739    {u'type': u'Point', u'coordinates': [-43.19583...
2915    {u'type': u'Point', u'coordinates': [-43.17986...
3035    {u'type': u'Point', u'coordinates': [-51.01583...
3097    {u'type': u'Point', u'coordinates': [-43.17891...
3974    {u'type': u'Point', u'coordinates': [-8.633880...
3983    {u'type': u'Point', u'coordinates': [-46.64960...
4424    {u'type': u'Point', u'coordinates': [-43.17986...
The problem is, this is not a true dataframe of dictionaries. Instead, it's a column full of strings that LOOK like dictionaries. Running this shows it:
df.Coordinates.apply(type)
130     <class 'str'>
278     <class 'str'>
425     <class 'str'>
440     <class 'str'>
877     <class 'str'>
1313    <class 'str'>
1734    <class 'str'>
1817    <class 'str'>
1835    <class 'str'>
2739    <class 'str'>
2915    <class 'str'>
3035    <class 'str'>
3097    <class 'str'>
3974    <class 'str'>
3983    <class 'str'>
4424    <class 'str'>
My Goal: Access the coordinates key and value in the dictionary. That's it. But it's a str.
I converted the strings to dictionaries using eval.
new = df.Coordinates.apply(eval)
130     {'coordinates': [-43.301755, -22.990065], 'typ...
278     {'coordinates': [-51.17913026, -30.01201896], ...
425     {'coordinates': [-43.17986794, -22.91000096], ...
440     {'coordinates': [-51.16376782, -29.95488677], ...
877     {'coordinates': [-43.17986794, -22.91000096], ...
1313    {'coordinates': [-49.72688407, -29.33757253], ...
1734    {'coordinates': [-43.574057, -22.928059], 'typ...
1817    {'coordinates': [-43.77649254, -22.86940539], ...
1835    {'coordinates': [-43.17132318, -22.90895217], ...
2739    {'coordinates': [-43.1958313, -22.98755333], '...
2915    {'coordinates': [-43.17986794, -22.91000096], ...
3035    {'coordinates': [-51.01583481, -29.63593292], ...
3097    {'coordinates': [-43.17891379, -22.96476163], ...
3974    {'coordinates': [-8.63388008, 41.14594453], 't...
3983    {'coordinates': [-46.64960938, -23.55902666], ...
4424    {'coordinates': [-43.17986794, -22.91000096], ...
Next I test the type of object and get:
130     <class 'dict'>
278     <class 'dict'>
425     <class 'dict'>
440     <class 'dict'>
877     <class 'dict'>
1313    <class 'dict'>
1734    <class 'dict'>
1817    <class 'dict'>
1835    <class 'dict'>
2739    <class 'dict'>
2915    <class 'dict'>
3035    <class 'dict'>
3097    <class 'dict'>
3974    <class 'dict'>
3983    <class 'dict'>
4424    <class 'dict'>
If I try to access my dictionaries: new.apply(lambda x: x['coordinates'])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-71-c0ad459ed1cc> in <module>()
----> 1 dfCombined.Coordinates.apply(coord_getter)

/Users/linwood/anaconda/envs/dataAnalysisWithPython/lib/python3.5/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
   2218         else:
   2219             values = self.asobject
-> 2220             mapped = lib.map_infer(values, f, convert=convert_dtype)
   2221
   2222         if len(mapped) and isinstance(mapped[0], Series):

pandas/src/inference.pyx in pandas.lib.map_infer (pandas/lib.c:62658)()

<ipython-input-68-748ce2d8529e> in coord_getter(row)
      1 import ast
      2 def coord_getter(row):
----> 3     return (ast.literal_eval(row))['coordinates']

TypeError: 'bool' object is not subscriptable
It's some type of class, because when I run dir I get this for one object:
new.apply(lambda x: dir(x))[130]
130    __class__
130    __contains__
130    __delattr__
130    __delitem__
130    __dir__
130    __doc__
130    __eq__
130    __format__
130    __ge__
130    __getattribute__
130    __getitem__
130    __gt__
130    __hash__
130    __init__
130    __iter__
130    __le__
130    __len__
130    __lt__
130    __ne__
130    __new__
130    __reduce__
130    __reduce_ex__
130    __repr__
130    __setattr__
130    __setitem__
130    __sizeof__
130    __str__
130    __subclasshook__
130    clear
130    copy
130    fromkeys
130    get
130    items
130    keys
130    pop
130    popitem
130    setdefault
130    update
130    values
Name: Coordinates, dtype: object
My Problem: I just want to access the dictionary. But the object is <class 'dict'>. How do I convert this to a regular dict or just access the key:value pairs? Any ideas??
-
Linwoodc3 almost 8 years
Thanks for the help @piRSquared, but that gave me the same error. I added more information above. When I run dir on the objects, it's some type of class. Any suggestions?
-
andrew almost 8 years
@Linwoodc3, FYI, on my system, your method of using eval works with my example DataFrame. I am using Python 2.7. Despite the version differences, I expect the regex solution to still work.
-
Linwoodc3 almost 8 years
Sorry, just came back. Will check!
-
Linwoodc3 almost 8 years
Got an error again. "TypeError: expected string or bytes-like object"
-
fpersyn over 4 years
Note - The dicts in your list do not need to be of the same length for this to work. Dicts may miss multiple keys that are present in other dicts and vice versa. For example, when you run pd.DataFrame(s.tolist()) you will notice that elevation is set to NaN in the second row.
-
szeitlin over 2 years
So the string.replace for the quotes, followed by json.loads, works in my case. However, I think this shouldn't happen - in my case the original data was formatted correctly as dictionaries, and only got coerced to strings after I wrote it out to CSV and read it back in.
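The CSV round trip described in this comment can be reproduced in a few lines. The sketch below assumes a single hypothetical `Coordinates` column and an in-memory buffer in place of a real file. It shows the dicts coming back as strings, and one possible way to parse them again while reading, using the `converters` argument of `pd.read_csv`.

```python
import ast
import io
import pandas as pd

# Hypothetical frame whose column holds real dicts
df = pd.DataFrame({"Coordinates": [{"type": "Point", "coordinates": [-43.3, -22.9]}]})

# Write to an in-memory CSV; each dict is stored as its string representation
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Reading back without help gives plain strings
buffer.seek(0)
as_text = pd.read_csv(buffer)
print(type(as_text.loc[0, "Coordinates"]))

# converters applies a function to every cell of the named column while parsing
buffer.seek(0)
as_dict = pd.read_csv(buffer, converters={"Coordinates": ast.literal_eval})
print(as_dict.loc[0, "Coordinates"]["coordinates"])
```

CSV has no notion of nested objects, so this coercion is expected; a format such as JSON may be a better fit if the column has to keep its structure.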