Count how many times a dictionary value is found with more than one key


Solution 1

If I understand correctly, you want to count the counts of dictionary values. If the values can be counted by collections.Counter (that is, they are hashable), you just need to call Counter on the dictionary's values and then again on the first counter's values. Here is an example using a dictionary whose keys are range(100) and whose values are random integers between 0 and 10, stored as strings:

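import random
from collections import Counter

# 100 keys (0..99), each mapped to a random integer from 0 to 10 as a string
d = dict(enumerate([str(random.randint(0, 10)) for _ in range(100)]))
counter = Counter(d.values())               # value -> number of keys holding it
counts_counter = Counter(counter.values())  # number of keys -> number of values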

EDIT:

After the sample dictionary was added to the question, the first count needs to be done slightly differently, because each value is now a list of names rather than a single item (d is the dictionary in the question):

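from collections import Counter
c = Counter()
for v in d.values():  # use d.itervalues() on Python 2 to avoid building a list
    c.update(set(v))  # set() so a name repeated under one key is counted once
Counter(c.values())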
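As a rough check, the loop above can be run against the sample dictionary from the question. This is only a sketch: the function name count_key_multiplicity is made up for illustration, and the long run of repeated 'Mycobacterium' entries is written with list multiplication, which makes no difference to the result because set() collapses duplicates within a key.

```python
from collections import Counter

def count_key_multiplicity(d):
    """Return a Counter mapping 'number of keys' -> 'number of distinct values'.

    Hypothetical helper name; assumes every value of d is a list of hashables.
    """
    keys_per_value = Counter()
    for values in d.values():
        # set() ensures a name repeated under one key is counted only once
        keys_per_value.update(set(values))
    return Counter(keys_per_value.values())

# Sample dictionary from the question
sample = {
    '0': ['Pyrobaculum'],
    '1': ['Mycobacterium'] * 14,  # same name repeated under one key
    '2': ['Helicobacter', 'Mycobacterium'],
    '3': ['Thermoanaerobacter', 'Thermoanaerobacter'],
    '4': ['Helicobacter'],
    '5': ['Thermoanaerobacter', 'Thermoanaerobacter'],
    '6': ['Gelria'],
    '7': ['Syntrophomonas'],
    '8': ['Syntrophomonas'],
    '9': ['Campylobacter', 'Campylobacter'],
    '10': ['Desulfitobacterium', 'Mycobacterium'],
}

result = count_key_multiplicity(sample)
for n_keys, n_values in sorted(result.items()):
    print("{}: {}".format(n_keys, n_values))
# prints:
# 1: 4
# 2: 3
# 3: 1
```

This matches the output the question asks for: 4 names under one key, 3 names under two keys, and 1 name under three keys.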

Solution 2

You could use a Counter:

>>> from collections import Counter
>>> d = dict(((1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 3), (7, 3)))
>>> d
{1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 3}
>>> Counter(d.values())
Counter({1: 3, 2: 2, 3: 2})
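The Counter above gives, for each value, the number of keys it appears under. To get the summary the question asks for (how many values appear under 1 key, 2 keys, and so on), one more Counter pass over those counts should be enough. A minimal sketch reusing the same dictionary, with variable names chosen only for illustration:

```python
from collections import Counter

d = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 3}

# First pass: value -> number of keys holding that value
per_value = Counter(d.values())

# Second pass: number of keys -> how many values have that many keys
count_of_counts = Counter(per_value.values())

for n_keys, n_values in sorted(count_of_counts.items()):
    print("{}: {}".format(n_keys, n_values))
# prints:
# 2: 2
# 3: 1
```

Here two values (2 and 3) appear under two keys each, and one value (1) appears under three keys.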
Author by

Jen

Learning python, ni! And now also learning Rrrrrrrrrr (And now some perl too!)

Updated on September 04, 2020

Comments

  • Jen
    Jen over 3 years

    I'm working in Python. Is there a way to count how many times values in a dictionary are found with more than one key, and then return a count?

    So if for example I had 50 values and I ran a script to do this, I would get a count that would look something like this:

    1: 23  
    2: 15  
    3: 7  
    4: 5  
    

    The above would be telling me that 23 values appear in 1 key, 15 values appear in 2 keys, 7 values appear in 3 keys and 5 values appear in 4 keys.

    Also, would this question change if there were multiple values per key in my dictionary?

    Here is a sample of my dictionary (it's bacteria names):

    {'0': ['Pyrobaculum'], '1': ['Mycobacterium', 'Mycobacterium', 'Mycobacterium', 'Mycobacterium', 'Mycobacterium', 'Mycobacterium', 'Mycobacterium', 'Mycobacterium', 'Mycobacterium', 'Mycobacterium', 'Mycobacterium', 'Mycobacterium', 'Mycobacterium', 'Mycobacterium'], '3': ['Thermoanaerobacter', 'Thermoanaerobacter'], '2': ['Helicobacter', 'Mycobacterium'], '5': ['Thermoanaerobacter', 'Thermoanaerobacter'], '4': ['Helicobacter'], '7': ['Syntrophomonas'], '6': ['Gelria'], '9': ['Campylobacter', 'Campylobacter'], '8': ['Syntrophomonas'], '10': ['Desulfitobacterium', 'Mycobacterium']}

    So from this sample, there are 8 unique values, and the ideal output I would get would be:

    1:4
    2:3
    3:1
    

    So 4 bacteria names are found in only one key, 3 names are found in two keys and 1 name is found in three keys.