Pandas - Generate Unique ID based on row values
Solution 1
You can try using Python's built-in hash function.
df['id'] = df[['first', 'last']].sum(axis=1).map(hash)
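Please note that the hash is an integer that is usually longer than 10 digits and can be negative. It is not guaranteed to be unique, and because Python randomizes string hashes per process (unless PYTHONHASHSEED is set), the same name can produce a different value on the next run.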
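If the ID has to stay the same across runs and be exactly 10 digits long, one option is to hash the row values with hashlib and fold the digest into the 10-digit range. The following is a minimal sketch, not part of the original answer: the helper name stable_id, the "|" separator and the choice of SHA-256 are assumptions made for illustration. Folding into a 10-digit range means collisions are possible (for 50k distinct rows the birthday estimate is roughly 10 to 15 percent that at least one pair collides), so the sketch counts them explicitly.

```python
import hashlib

import pandas as pd


def stable_id(*fields, digits=10):
    """Map field values to a repeatable integer with exactly `digits` digits.

    Hypothetical helper for illustration only.
    """
    # Join with a separator so ("ab", "c") and ("a", "bc") hash differently.
    key = "|".join(str(f) for f in fields).encode("utf-8")
    # SHA-256 yields the same digest in every Python process, unlike hash().
    number = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    low = 10 ** (digits - 1)
    # Fold into [low, 10**digits) so the result never has a leading zero.
    return low + number % (9 * low)


df = pd.DataFrame({
    "first": ["peter", "john", "adam", "john", "jenny"],
    "last": ["jones", "doe", "smith", "doe", "fast"],
    "dob": [20000101, 19870105, 19441212, 19870105, 19640822],
})

df["id"] = [
    stable_id(f, l, d) for f, l, d in zip(df["first"], df["last"], df["dob"])
]

# Distinct people can still collide in a 10-digit space, so count collisions.
n_people = len(df.drop_duplicates(["first", "last", "dob"]))
collisions = n_people - df["id"].nunique()
```

If collisions is ever non-zero on real data, a longer ID or a stored lookup table would be needed instead.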
Solution 2
Here's a way of doing it using numpy:
import numpy as np
np.random.seed(1)
# create a list of unique names
names = df[['first', 'last']].agg(' '.join, axis=1).unique().tolist()
# generate random 10-digit ids (random draws are not guaranteed to be distinct)
ids = np.random.randint(low=10**9, high=10**10, size=len(names), dtype=np.int64)
# map each name to its id
maps = dict(zip(names, ids))
# add new id column
df['id'] = df[['first', 'last']].agg(' '.join, axis=1).map(maps)
   index  first   last       dob          id
0      0  peter  jones  20000101  9176146523
1      1   john    doe  19870105  8292931172
2      2   adam  smith  19441212  4108641136
3      3   john    doe  19870105  8292931172
4      4  jenny   fast  19640822  6385979058
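Independent random draws can, in principle, hand the same ID to two different names. A variation that rules this out is to sample without replacement. The sketch below swaps in the standard library's random.sample over a range object in place of np.random.randint; that substitution and the variable names are assumptions for illustration, not the answer's own method.

```python
import random

import pandas as pd

df = pd.DataFrame({
    "first": ["peter", "john", "adam", "john", "jenny"],
    "last": ["jones", "doe", "smith", "doe", "fast"],
    "dob": [20000101, 19870105, 19441212, 19870105, 19640822],
})

# One key per row, and the distinct keys in order of first appearance.
names = df[["first", "last"]].agg(" ".join, axis=1)
unique_names = names.unique().tolist()

# Sampling without replacement from the 10-digit range guarantees distinct ids.
# A range object is sampled lazily, so no 9-billion-element list is built.
rng = random.Random(1)
ids = rng.sample(range(10**9, 10**10), k=len(unique_names))

# Map each name to its id and attach the column.
maps = dict(zip(unique_names, ids))
df["id"] = names.map(maps)
```

As with the seeded numpy version, the IDs depend on the order in which names first appear, so the maps dictionary would have to be saved and reused if rows are added or reordered later.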
Author: swifty
Updated on August 05, 2022
Comments
-
swifty almost 2 years
I would like to generate an integer-based unique ID for users (in my df).
Let's say I have:
index  first  last   dob
0      peter  jones  20000101
1      john   doe    19870105
2      adam   smith  19441212
3      john   doe    19870105
4      jenny  fast   19640822
I would like to generate an ID column like so:
index  first  last   dob       id
0      peter  jones  20000101  1244821450
1      john   doe    19870105  1742118427
2      adam   smith  19441212  1841181386
3      john   doe    19870105  1742118427
4      jenny  fast   19640822  1687411973
The ID should be 10 digits long and based on the values of the fields (identical row values, such as the two john doe rows, get the same ID).
I've looked into hashing, encrypting and UUIDs, but can't find much related to this specific non-security use case. It's just about generating an internal identifier.
- I can't use groupby/cat code type methods in case the order of the rows changes.
- The dataset won't grow beyond 50k rows.
- Safe to assume there won't be a first, last, dob duplicate.
Feel like I may be tackling this the wrong way as I can't find much literature on it!
Thanks
-
Jon Clements over 4 years
Does something like:
df.groupby(['first', 'last', 'dob'], sort=False).ngroup().apply('{:010}'.format)
do what you want?
-
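To make the trade-off of the groupby suggestion concrete, here is a small sketch that applies it to the sample rows from the question and then again after reordering them. The helper name group_ids is hypothetical. The result is a zero-padded string rather than a 10-digit integer, and the number a person receives depends on the order in which rows appear.

```python
import pandas as pd

df = pd.DataFrame({
    "first": ["peter", "john", "adam", "john", "jenny"],
    "last": ["jones", "doe", "smith", "doe", "fast"],
    "dob": [20000101, 19870105, 19441212, 19870105, 19640822],
})

keys = ["first", "last", "dob"]


def group_ids(frame):
    # Number each distinct (first, last, dob) group in order of appearance,
    # then left-pad the number with zeros to a width of 10 characters.
    return frame.groupby(keys, sort=False).ngroup().apply("{:010}".format)


original = df.assign(id=group_ids(df))

# Same rows, different order: the group numbers are assigned afresh.
reordered = df.sort_values("first")
reordered = reordered.assign(id=group_ids(reordered))

# john doe (row label 1) gets a different id in each ordering.
same_id = original.loc[1, "id"] == reordered.loc[1, "id"]
```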
Mahendra Singh over 4 years
You can follow this thread to learn more about hashing: stackoverflow.com/questions/16008670/…
-
swifty over 4 years
This is pretty nice, though I'm getting some 9-digit IDs mixed in.
-
RockStar over 4 years
Can you share a couple of strings for which 9-digit IDs were generated?
-
Umar.H over 4 years
Would you need to use seed to make the generation consistent?
-
swifty over 4 years
Sarah Wood, Tom Almond
-
RockStar over 4 years
I have tested this in multiple environments, and it generates 10 digits only. Check this link: onlinegdb.com/ByUhl5z48
-
RockStar over 4 years
@swifty Please add some code. You can use, test and modify the same snippet.
-
swifty over 4 years
This is bad code, but it should demonstrate the problem: onlinegdb.com/rJ6o_qGNU
-
RockStar over 4 years
@swifty I tested your code with the updated function in my answer, and it works properly. Check: onlinegdb.com/B1tucqfN8
-
RockStar over 4 years
@swifty Did it help?