Remove duplicate JSON objects from list in Python
Solution 1
You can remove entries that share the same Name with a dictionary comprehension, since a dictionary cannot contain duplicate keys:
te = [
    {
        "Name": "Bala",
        "phone": "None"
    },
    {
        "Name": "Bala",
        "phone": "None"
    },
    {
        "Name": "Bala",
        "phone": "None"
    },
    {
        "Name": "Bala",
        "phone": "None"
    },
    {
        "Name": "Bala1",
        "phone": "None"
    }
]
unique = list({each['Name']: each for each in te}.values())
print(unique)
Output (Python 3.7+, where dicts preserve insertion order):
[{'Name': 'Bala', 'phone': 'None'}, {'Name': 'Bala1', 'phone': 'None'}]
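As a rough illustration of what this comprehension keeps: keyed on Name alone, a later dict replaces an earlier one with the same name, even when other fields differ. The sketch below contrasts that with a key built from the whole dict via json.dumps(..., sort_keys=True). It assumes every value is JSON-serialisable; the records list and the names by_name and by_content are made up for the example.

```python
import json

# Hypothetical sample data: two "Bala" entries that differ in phone.
records = [
    {"Name": "Bala", "phone": "None"},
    {"Name": "Bala", "phone": "1234"},
    {"Name": "Bala", "phone": "None"},
    {"Name": "Bala1", "phone": "None"},
]

# Keyed on "Name" only: later dicts replace earlier ones with the same name.
by_name = list({d["Name"]: d for d in records}.values())

# Keyed on the whole content: only exact duplicates collapse.
# sort_keys=True makes the key independent of key order within each dict.
by_content = list({json.dumps(d, sort_keys=True): d for d in records}.values())

print(by_name)
print(by_content)
```

Here by_name ends up with one entry per name, so the dict with phone "1234" is dropped, while by_content keeps it and only merges the two identical "Bala" entries.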
Solution 2
This fails because you can't add a dict to a set. From this question:
You're trying to use a dict as a key to another dict or in a set. That does not work because the keys have to be hashable. As a general rule, only immutable objects (strings, integers, floats, frozensets, tuples of immutables) are hashable (though exceptions are possible).
>>> foo = dict()
>>> bar = set()
>>> bar.add(foo)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
TypeError: unhashable type: 'dict'
>>>
Instead, since you're already using if x not in seen, just use a list:
>>> te = [
...     {
...         "Name": "Bala",
...         "phone": "None"
...     },
...     {
...         "Name": "Bala",
...         "phone": "None"
...     },
...     {
...         "Name": "Bala",
...         "phone": "None"
...     },
...     {
...         "Name": "Bala",
...         "phone": "None"
...     }
... ]
>>> def removeduplicate(it):
...     seen = []
...     for x in it:
...         if x not in seen:
...             yield x
...             seen.append(x)
...
>>> removeduplicate(te)
<generator object removeduplicate at 0x7f3578c71ca8>
>>> list(removeduplicate(te))
[{'phone': 'None', 'Name': 'Bala'}]
>>>
Solution 3
You can still use a set for duplicate detection; you just need to convert the dictionary into something hashable, such as a tuple. Your dictionaries can be converted to tuples by tuple(d.items()), where d is a dictionary. Applying that to your generator function:
def removeduplicate(it):
    seen = set()
    for x in it:
        t = tuple(x.items())
        if t not in seen:
            yield x
            seen.add(t)
>>> for d in removeduplicate(te):
...     print(d)
{'phone': 'None', 'Name': 'Bala'}
>>> te.append({'Name': 'Bala', 'phone': '1234567890'})
>>> te.append({'Name': 'Someone', 'phone': '1234567890'})
>>> for d in removeduplicate(te):
...     print(d)
{'phone': 'None', 'Name': 'Bala'}
{'phone': '1234567890', 'Name': 'Bala'}
{'phone': '1234567890', 'Name': 'Someone'}
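One caveat worth checking with the tuple approach: tuple(x.items()) depends on the order in which the dict yields its items, so on Python versions where dicts preserve insertion order, two equal dicts built with their keys in a different order produce different tuples and are not treated as duplicates. A possible variant, sketched below, uses frozenset(x.items()) as the key instead. It assumes all values are hashable (nested lists or dicts would still fail), and the name removeduplicate_unordered is hypothetical.

```python
def removeduplicate_unordered(it):
    """Yield each dict once, treating dicts with equal items as duplicates
    regardless of the order in which their keys were inserted."""
    seen = set()
    for x in it:
        # frozenset ignores item order; the values must still be hashable.
        key = frozenset(x.items())
        if key not in seen:
            seen.add(key)
            yield x


a = {"Name": "Bala", "phone": "None"}
b = {"phone": "None", "Name": "Bala"}  # same content, different key order

# Order-sensitive comparison of the tuple keys used above.
print(tuple(a.items()) == tuple(b.items()))

# Only the first of the two equal dicts is yielded.
print(list(removeduplicate_unordered([a, b])))
```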
A set provides faster lookup (average O(1)) than a "seen" list (O(n)). Whether it is worth the extra computation of converting every dict into a tuple depends on the number of dictionaries that you have and how many duplicates there are. If most of the dictionaries are distinct, a "seen" list will grow quite large, and testing whether a dict has already been seen could become an expensive operation. This might justify the tuple conversion; you would have to test/profile it.
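A minimal way to run that comparison is sketched below with the standard timeit module. The data set (1000 dicts, 250 of them distinct) and the function names dedupe_list and dedupe_set are made up for illustration, and the timings will vary with your own data and machine, so treat the numbers as indicative only.

```python
import timeit


def dedupe_list(it):
    """List-based version: membership test is a linear scan."""
    seen = []
    for x in it:
        if x not in seen:
            seen.append(x)
            yield x


def dedupe_set(it):
    """Set-based version: each dict is converted to a tuple of its items."""
    seen = set()
    for x in it:
        t = tuple(x.items())
        if t not in seen:
            seen.add(t)
            yield x


# Hypothetical test data: 1000 dicts, 250 distinct.
data = [{"Name": "user%d" % (i % 250), "phone": "None"} for i in range(1000)]

for func in (dedupe_list, dedupe_set):
    # Time three full passes over the data for each implementation.
    seconds = timeit.timeit(lambda: list(func(data)), number=3)
    print(func.__name__, round(seconds, 4))
```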
Tony Roczz
Updated on November 28, 2020
Comments
-
Tony Roczz over 3 years
I have a list of dicts in which a particular value is repeated multiple times, and I would like to remove the duplicates.
My list:
te = [ { "Name": "Bala", "phone": "None" }, { "Name": "Bala", "phone": "None" }, { "Name": "Bala", "phone": "None" }, { "Name": "Bala", "phone": "None" } ]
Function to remove duplicate values:
def removeduplicate(it):
    seen = set()
    for x in it:
        if x not in seen:
            yield x
            seen.add(x)
When I call this function I get a generator object:
<generator object removeduplicate at 0x0170B6E8>
When I try to iterate over the generator I get
TypeError: unhashable type: 'dict'
Is there a way to remove the duplicate values or to iterate over the generator?
-
Thomas Guyot-Sionnest over 8 years
Really nice, I'll keep that in my back pocket. OTOH, please note this is not exactly the same as the OP's function: the OP checks the full dict, whereas in your case you'll discard any dict that has the same Name, whether the rest differs or not.
-
Thomas Guyot-Sionnest over 8 years
Actually, after testing, this would be more like it:
unique = { repr(each): each for each in te }.values()
-
mhawke over 8 years
The OP has accepted it, but I am not sure that this answer is correct, considering that it replaces (from list te) previous dicts with later dicts, i.e. it loses data. E.g. if te contained another dict {'Name': 'Bala', 'phone': '1234'}, only the last item in te with name Bala will be retained.