How to assign a unique ID number to each group of identical values in a column
Solution 1
How about this?
df2 <- transform(df, id = as.numeric(factor(sample)))
I think this (cribbed from Add ID column by group) should be slightly more efficient, although perhaps a little harder to remember:
df3 <- transform(df, id=match(sample, unique(sample)))
all.equal(df2,df3) ## TRUE
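The two base R approaches agree on the example data because sample is already sorted and contains no NA values. A minimal sketch of where they can differ, using a small hypothetical vector (toy, id_factor and id_match are illustrative names, not part of the original answer):

```r
# Hypothetical toy data: unsorted values with one NA
toy <- data.frame(sample = c(7L, 7L, 5L, NA, 6L, 5L))

# factor() sorts its levels, so IDs follow the sorted values; NA stays NA
id_factor <- as.numeric(factor(toy$sample))

# match() against unique() numbers groups by first appearance;
# NA is matched like any other value and gets its own ID
id_match <- match(toy$sample, unique(toy$sample))

print(data.frame(toy, id_factor, id_match))
```

Which behaviour is preferable depends on whether the IDs should reflect the sorted values or the order in which groups first appear, and on whether NA should count as a group.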
If you want to do this in the tidyverse:
library(dplyr)
df %>% group_by(sample) %>% mutate(id=cur_group_id())
Solution 2
Here's a data.table solution:
library(data.table)
setDT(df)[, id := .GRP, by = sample]
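As a rough illustration of how .GRP numbers the groups, here is a sketch on a small hypothetical table (dt is an illustrative name; the data.table package is assumed to be installed). With by =, groups are counted in order of first appearance rather than in sorted order:

```r
library(data.table)

# Hypothetical toy data: unsorted group values
dt <- data.table(sample = c(7L, 7L, 5L, 6L, 5L))

# .GRP is a simple group counter; := adds the column by reference
dt[, id := .GRP, by = sample]

print(dt)
```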
Author by
jjulip
Updated on July 09, 2022
Comments
-
jjulip almost 2 years
I have a data frame with a number of columns. I would like to create a new column called “id” that gives a unique id number to each group of identical values in the “sample” column.
Example data:
# dput(df)
df <- structure(list(index = 1:30,
    val = c(14L, 22L, 1L, 25L, 3L, 34L, 35L, 36L, 24L, 35L, 33L, 31L, 30L, 30L, 29L, 28L, 26L, 12L, 41L, 36L, 32L, 37L, 56L, 34L, 23L, 24L, 28L, 22L, 10L, 19L),
    sample = c(5L, 6L, 6L, 7L, 7L, 7L, 8L, 9L, 10L, 11L, 11L, 12L, 13L, 14L, 14L, 15L, 15L, 15L, 16L, 17L, 18L, 18L, 19L, 19L, 19L, 20L, 21L, 22L, 23L, 23L)),
    .Names = c("index", "val", "sample"), class = "data.frame", row.names = c(NA, -30L))
head(df)
  index val sample
1     1  14      5
2     2  22      6
3     3   1      6
4     4  25      7
5     5   3      7
6     6  34      7
What I would like to end up with:
  index val sample id
1     1  14      5  1
2     2  22      6  2
3     3   1      6  2
4     4  25      7  3
5     5   3      7  3
6     6  34      7  3
-
Carl Witthoft about 10 years
Love it: a use for factors that I can understand. :-)
-
David Arenburg about 8 years
Just a small note here: the as.numeric(factor(sample)) method will only number the groups in order of appearance if sample is already ordered, because factor() sorts its levels.
-
Will T-E over 7 years
The nice thing about the factor() solution is that it ignores NA values.
-
Alex over 3 years
@Ben Bolker, thanks! Can you write your code with dplyr?
-
Ben Bolker over 3 years
Did you see the comment above stackoverflow.com/questions/24119599/… ?
-
Alex over 3 years
@Ben Bolker, I assumed I can write your code using dplyr