Get a random sample of a dict

Solution 1

Given your example of:

dy = {'a':1, 'b':2, 'c':3, 'd':4, 'e':5}

Then the sum of all the values is more simply put as:

s = sum(dy.values())

If copying the values into a list is not memory-prohibitive, you can sample with:

import random

values = list(dy.values())
s = sum(random.sample(values, 2))

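Or sample the keys and look the values up. random.sample requires a sequence (passing a set or a dict view was deprecated in Python 3.9 and raises TypeError from 3.11), so convert the keys to a list first:

from operator import itemgetter
import random

s = sum(itemgetter(*random.sample(list(dy), 2))(dy))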

Or just use:

s = sum(dy[k] for k in random.sample(list(dy), 2))

An alternative is to use heapq with a random sort key, which iterates over the values without copying them into a list, e.g.:

import heapq
import random

s = sum(heapq.nlargest(2, dy.values(), key=lambda L: random.random()))
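
The heapq one-liner above works because every value is ranked by an independent random number, so the k highest-ranked values form a uniform random subset. A minimal sketch of the same idea wrapped in a function follows; the name sample_values and the seed parameter are illustrative assumptions, not part of the original answer:

```python
import heapq
import random

def sample_values(d, k, seed=None):
    """Return k values of d chosen at random, without copying d.values()."""
    rng = random.Random(seed)  # hypothetical seed parameter, for reproducible runs
    # Rank each value by a fresh random number and keep the k highest-ranked ones.
    return heapq.nlargest(k, d.values(), key=lambda _: rng.random())

dy = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
picked = sample_values(dy, 2, seed=0)
s = sum(picked)
```

Passing the same seed gives the same sample again, which can help when debugging code that runs on the sample.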

Solution 2

import random

def sample_from_dict(d, sample=10):
    # random.sample needs a sequence, so list the keys first
    keys = random.sample(list(d), sample)
    return {k: d[k] for k in keys}
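
A short usage sketch with the toy dict from the question may help. It assumes the function is defined as above; the clamping with min() is an added precaution, since random.sample raises ValueError when asked for more items than the population holds:

```python
import random

def sample_from_dict(d, sample=10):
    # Pick `sample` distinct keys, then rebuild a smaller dict from them.
    keys = random.sample(list(d), sample)
    return {k: d[k] for k in keys}

dy = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}

subset = sample_from_dict(dy, sample=2)   # a dict with 2 of the 5 entries
total = sum(subset.values())              # the same task, run on the sample

# The default sample=10 is larger than len(dy), so clamp it (assumed precaution).
safe = sample_from_dict(dy, sample=min(10, len(dy)))
```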

Solution 3

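Replace range(10) with a random sample of indices, for example from NumPy (numpy.random.default_rng().choice(len(rows), size=10, replace=False)):

keys = list(rows)  # build the key list once, not once per index
{keys[k]: rows[keys[k]] for k in range(10)}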

Solution 4

This should be quicker than creating a new dict and checking if the keys are part of the sample:

import random

sample_n = 1000
# random.sample needs a sequence, so materialize the items view first
output_dict = dict(random.sample(list(input_dict.items()), sample_n))
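
For comparison, here is a sketch of both approaches side by side: sampling the (key, value) pairs directly, and the alternative this answer refers to (sampling keys, then filtering the dict). The contents of input_dict are made up for illustration, and no timing is claimed here:

```python
import random

input_dict = {i: i * i for i in range(10_000)}  # hypothetical data
sample_n = 1000

# Approach from this answer: sample (key, value) pairs directly.
output_dict = dict(random.sample(list(input_dict.items()), sample_n))

# The alternative: sample keys, then scan the whole dict and keep the matches.
chosen = set(random.sample(list(input_dict), sample_n))
filtered = {k: v for k, v in input_dict.items() if k in chosen}
```

Both results hold sample_n entries of input_dict; the second visits every item of the original dict, which is the extra work the answer aims to avoid.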

Author: user2988577

Updated on June 04, 2022

Comments

  • user2988577 almost 2 years

    I'm working with a big dictionary and for some reason I also need to work on small random samples from that dictionary. How can I get this small sample (for example of length 2)?

    Here is a toy-model:

    dy={'a':1, 'b':2, 'c':3, 'd':4, 'e':5}
    

    I need to perform some task on dy which involves all the entries. Let us say, to simplify, I need to sum together all the values:

    s=0
    for key in dy.keys():
        s=s+dy[key]
    

    Now, I also need to perform the same task on a random sample of dy; for that I need a random sample of the keys of dy. The simple solution I can imagine is

    sam=list(dy.keys())[:2]
    

    In that way I have a list of two keys of the dictionary which are somehow random. So, going back to my task, the only change I need in the code is:

    s=0
    for key in sam:
        s=s+dy[key]
    

    The point is that I do not fully understand how dy.keys() is constructed, so I can't foresee any future issues.

  • Ismael Padilla over 4 years
    Thank you for this code snippet, which might provide some limited, immediate help. A proper explanation would greatly improve its long-term value by showing why this is a good solution to the problem and would make it more useful to future readers with other, similar questions. Please edit your answer to add some explanation, including the assumptions you’ve made.