Pandas: create new column in df with random integers from range

104,516

Solution 1

One solution is to use numpy.random.randint:

import numpy as np
df1['randNumCol'] = np.random.randint(1, 6, df1.shape[0])
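As a quick check of the half-open range, here is a minimal sketch. The 10-row df1 and its "id" column are made up for illustration; the check confirms that every generated value falls between 1 and 5:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real 50k-row frame
df1 = pd.DataFrame({"id": range(10)})

# low=1 is inclusive, high=6 is exclusive, so values are drawn from 1..5
df1["randNumCol"] = np.random.randint(1, 6, df1.shape[0])

# Confirm that no value falls outside 1..5
print(df1["randNumCol"].between(1, 5).all())
```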

If the values are not consecutive, you can use numpy.random.choice instead, although it is slower:

df1['randNumCol'] = np.random.choice([1, 9, 20], df1.shape[0])
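If the values should not be equally likely, numpy.random.choice also accepts a p argument with one probability per value. The sketch below assumes a made-up 1000-row frame and illustrative weights; neither comes from the original answer:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in frame
df1 = pd.DataFrame({"id": range(1000)})

values = [1, 9, 20]
weights = [0.5, 0.3, 0.2]  # illustrative probabilities; they must sum to 1

# Sample with replacement from the listed values, one draw per row
df1["randNumCol"] = np.random.choice(values, size=len(df1), p=weights)

# Show how often each value was drawn
print(df1["randNumCol"].value_counts())
```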

To make the results reproducible, set the seed with numpy.random.seed (e.g. np.random.seed(42)) before generating the numbers.
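A minimal sketch of what seeding buys you, assuming a made-up row count n: resetting the same seed replays the same sequence. The second half shows NumPy's newer Generator interface (np.random.default_rng) as an alternative that avoids global state; it is not part of the original answer.

```python
import numpy as np

n = 5  # hypothetical row count

np.random.seed(42)
first = np.random.randint(1, 6, n)

np.random.seed(42)  # resetting the seed replays the same sequence
second = np.random.randint(1, 6, n)

print(np.array_equal(first, second))

# Alternative: a local Generator keeps the seed out of global state
rng = np.random.default_rng(42)
third = rng.integers(1, 6, size=n)  # high is exclusive here as well
```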

Solution 2

To add a column of random integers, use np.random.randint(low, high, size), where low is inclusive and high is exclusive. There's no need to allocate range(low, high) first; that could take a lot of memory if high is large.

df1['randNumCol'] = np.random.randint(1, 6, size=len(df1))  # high is exclusive, so this yields 1 through 5
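To illustrate the memory point, the sketch below draws from a very wide range without ever materializing it. The frame, the bounds, and the explicit dtype are assumptions made for the example:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in frame
df1 = pd.DataFrame({"id": range(1000)})

low, high = 0, 10**12  # illustrative bounds; high is exclusive

# Only len(df1) integers are allocated, not the full range.
# dtype=np.int64 is set explicitly because the platform default integer
# may be 32-bit, which cannot hold these values.
df1["randNumCol"] = np.random.randint(low, high, size=len(df1), dtype=np.int64)

print(df1["randNumCol"].min() >= low, df1["randNumCol"].max() < high)
```

By contrast, np.random.choice(np.arange(low, high), ...) would have to build the whole array of candidates first.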


Author: screechOwl

https://financenerd.blog/blog/

Updated on July 16, 2021

Comments

  • screechOwl, almost 3 years ago

    I have a pandas data frame with 50k rows. I'm trying to add a new column that is a randomly generated integer from 1 to 5.

    If I want 50k random numbers I'd use:

    df1['randNumCol'] = random.sample(range(50000), len(df1))
    

    but for this I'm not sure how to do it.

    Side note: in R, I'd do:

    sample(1:5, 50000, replace = TRUE)
    

    Any suggestions?