How to assign a unique ID number to each group of identical values in a column

Solution 1

How about

df2 <- transform(df,id=as.numeric(factor(sample)))

?

I think this (cribbed from Add ID column by group) should be slightly more efficient, although perhaps a little harder to remember:

df3 <- transform(df, id=match(sample, unique(sample)))
all.equal(df2,df3)  ## TRUE
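
The two versions agree here because sample is already sorted and has no missing values. A minimal sketch of where they can differ, using a made-up vector x (hypothetical, not the question's data): factor() numbers groups by the sorted distinct values and leaves NA as NA, while match() numbers groups by first appearance and gives NA its own id.

```r
# Hypothetical unsorted vector with a missing value (not the question's data)
x <- c(7L, 5L, 7L, NA, 6L, 5L)

# factor(): ids follow the sorted order of the distinct values; NA stays NA
id_factor <- as.numeric(factor(x))

# match(): ids follow the order of first appearance; NA gets its own id
id_match <- match(x, unique(x))

print(id_factor)  # 3 1 3 NA 2 1
print(id_match)   # 1 2 1 3 4 2
```

Which behaviour is preferable depends on whether the ids should reflect the sort order or the row order of the data.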

If you want to do this in the tidyverse:

library(dplyr)
df %>% group_by(sample) %>% mutate(id=cur_group_id())
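
A small sketch of two details that may matter with the dplyr version, assuming dplyr 1.0.0 or later (where cur_group_id() is available) and a made-up data frame df_small (hypothetical, not the question's df): the result stays grouped unless ungroup() is called, and cur_group_id() numbers groups by their sorted keys. If ids in order of first appearance are wanted instead, the match() idiom can be used inside mutate().

```r
library(dplyr)

# Hypothetical unsorted data (not the question's df)
df_small <- data.frame(sample = c(7L, 5L, 7L, 6L))

out <- df_small %>%
  group_by(sample) %>%
  mutate(id_group = cur_group_id()) %>%              # ids follow the sorted group keys
  ungroup() %>%                                      # drop the grouping afterwards
  mutate(id_first = match(sample, unique(sample)))   # ids follow first appearance

print(out$id_group)  # 3 1 3 2
print(out$id_first)  # 1 2 1 3
```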

Solution 2

Here's a data.table solution

library(data.table)
setDT(df)[, id := .GRP, by = sample]
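
A minimal sketch of how .GRP behaves, using a made-up table dt (hypothetical, not the question's df): with by =, groups are numbered in order of first appearance, and := adds the column by reference, so no reassignment is needed. Note that setDT() likewise converts df to a data.table in place.

```r
library(data.table)

# Hypothetical unsorted data (not the question's df)
dt <- data.table(sample = c(7L, 5L, 7L, 6L))

# .GRP is the running group counter; by = keeps groups in order of first appearance
dt[, id := .GRP, by = sample]

print(dt$id)  # 1 2 1 3
```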
Author by

jjulip

Updated on July 09, 2022

Comments

  • jjulip
    jjulip almost 2 years

    I have a data frame with a number of columns. I would like to create a new column called “id” that gives a unique id number to each group of identical values in the “sample” column.

    Example data:

    # dput(df)
    df <- structure(list(index = 1:30, val = c(14L, 22L, 1L, 25L, 3L, 34L, 
    35L, 36L, 24L, 35L, 33L, 31L, 30L, 30L, 29L, 28L, 26L, 12L, 41L, 
    36L, 32L, 37L, 56L, 34L, 23L, 24L, 28L, 22L, 10L, 19L), sample = c(5L, 
    6L, 6L, 7L, 7L, 7L, 8L, 9L, 10L, 11L, 11L, 12L, 13L, 14L, 14L, 
    15L, 15L, 15L, 16L, 17L, 18L, 18L, 19L, 19L, 19L, 20L, 21L, 22L, 
    23L, 23L)), .Names = c("index", "val", "sample"), class = "data.frame", 
    row.names = c(NA, -30L))
    
    head(df)
      index val sample 
    1     1  14      5  
    2     2  22      6  
    3     3   1      6  
    4     4  25      7  
    5     5   3      7  
    6     6  34      7  
    

    What I would like to end up with:

      index val sample id
    1     1  14      5  1
    2     2  22      6  2
    3     3   1      6  2
    4     4  25      7  3
    5     5   3      7  3
    6     6  34      7  3
    
  • Carl Witthoft
    Carl Witthoft about 10 years
    Love it: a use for factors that I can understand. :-)
  • David Arenburg
    David Arenburg about 8 years
    Just a small note here: the as.numeric(factor(sample)) method will only number the groups in order of appearance (1, 2, 3, ...) if sample is already sorted.
  • Will T-E
    Will T-E over 7 years
    The nice thing about the factor() solution is that it leaves NA values as NA instead of assigning them an id.
  • Alex
    Alex over 3 years
    @Ben Bolker, thanks! can you write your code with dplyr?
  • Ben Bolker
    Ben Bolker over 3 years
    did you see the comment above stackoverflow.com/questions/24119599/… ?
  • Alex
    Alex over 3 years
    @Ben Bolker, I assumed I can write your code using dplyr