Create an ID (row number) column


Solution 1

You could use cbind:

d <- data.frame(V1=c(23, 45, 56), V2=c(45, 45, 67))

## enter id here, you could also use 1:nrow(d) instead of rownames
id <- rownames(d)
d <- cbind(id=id, d)

## set colnames to OP's wishes
colnames(d) <- paste0("V", 1:ncol(d))

EDIT: Here is a comparison with @dickoa's suggestion. d$id <- seq_len(nrow(d)) is slightly faster, but the column order differs (id ends up as the last column), and reordering the columns afterwards seems to be slower than using cbind:

library("microbenchmark")

set.seed(1)
d <- data.frame(V1=rnorm(1e6), V2=rnorm(1e6))

cbindSeqLen <- function(x) {
  return(cbind(id=seq_len(nrow(x)), x))
}

dickoa <- function(x) {
  x$id <- seq_len(nrow(x))
  return(x)
}

dickoaReorder <- function(x) {
  x$id <- seq_len(nrow(x))
  nc <- ncol(x)
  x <- x[, c(nc, 1:(nc-1))]
  return(x)
}

microbenchmark(cbindSeqLen(d), dickoa(d), dickoaReorder(d), times=100)

# Unit: milliseconds
#             expr      min       lq   median       uq      max neval
#   cbindSeqLen(d) 23.00683 38.54196 40.24093 42.60020 47.73816   100
#        dickoa(d) 10.70718 36.12495 37.58526 40.22163 72.92796   100
# dickoaReorder(d) 19.25399 68.46162 72.45006 76.51468 88.99620   100
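
As a sanity check next to the timings, here is a small sketch (base R only, on the three-row example rather than the 1e6-row one) confirming where each function puts the id column. The variable names res_cbind, res_dickoa and res_reorder are just for illustration:

```r
# Same three functions as in the benchmark, written compactly
cbindSeqLen <- function(x) cbind(id = seq_len(nrow(x)), x)
dickoa <- function(x) {
  x$id <- seq_len(nrow(x))
  x
}
dickoaReorder <- function(x) {
  x$id <- seq_len(nrow(x))
  nc <- ncol(x)
  x[, c(nc, 1:(nc - 1))]
}

d <- data.frame(V1 = c(23, 45, 56), V2 = c(45, 45, 67))

res_cbind   <- cbindSeqLen(d)    # id is the first column
res_dickoa  <- dickoa(d)         # id is the last column
res_reorder <- dickoaReorder(d)  # id moved to the front

names(res_cbind)
names(res_dickoa)
names(res_reorder)
```

So cbindSeqLen and dickoaReorder should end up with the same layout, and only dickoa leaves id at the end.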

Solution 2

Two alternatives (using sgibb's example data):

tibble::rowid_to_column(d, "ID")

which gives:

  ID V1 V2
1  1 23 45
2  2 45 45
3  3 56 67

Or:

dplyr::mutate(d, ID = dplyr::row_number())

which gives:

  V1 V2 ID
1 23 45  1
2 45 45  2
3 56 67  3

As you can see, rowid_to_column() adds the new column in front of the others, while the mutate() and row_number() combination adds it after them.


And another base R alternative:

d$ID <- seq_along(d[,1])
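
If you want the base R version to put ID first as well, one possible sketch (not part of the original answer) is to reorder by name afterwards. It assumes the new column is called ID:

```r
d <- data.frame(V1 = c(23, 45, 56), V2 = c(45, 45, 67))

# Add the ID at the end, as in the line above
d$ID <- seq_along(d[, 1])

# Move ID to the front, keeping the other columns in their current order
d <- d[c("ID", setdiff(names(d), "ID"))]

names(d)
```

Selecting by name avoids hard-coding column positions, so it keeps working if more columns are added later.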

Solution 3

Many presented their ideas, but I think this is the shortest and simplest code for this task:

data$ID <- 1:nrow(data)

One line. The one and only.
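
One caveat raised in the comments: when the data frame can have 0 rows (for example inside a function that receives arbitrary input), 1:nrow(data) counts down from 1 to 0 instead of being empty. A minimal sketch of the difference, using a made-up empty data frame called empty:

```r
# Hypothetical zero-row data frame with the same columns as the example
empty <- data.frame(V1 = numeric(0), V2 = numeric(0))

bad_ids  <- 1:nrow(empty)         # 1:0 counts down: two values, 1 and 0
good_ids <- seq_len(nrow(empty))  # zero-length integer vector

# Assigning the 1:nrow() version fails because the lengths do not match
res <- tryCatch({
  tmp <- empty
  tmp$ID <- 1:nrow(tmp)
  "ok"
}, error = function(e) "error")

# The seq_len() version works and keeps the data frame at 0 rows
empty$ID <- good_ids
```

For data that always has at least one row both forms give the same result, so this only matters when empty input is possible.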

Solution 4

You could also do this using dplyr:

DF <- dplyr::mutate(DF, id = rownames(DF))  # note: id will be a character column
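
Note that this relies on the row names being "1", "2", ..., n, which is not guaranteed (as pointed out in the comments). A short base R sketch of what can go wrong after subsetting; the name sub is just for illustration:

```r
d <- data.frame(V1 = c(23, 45, 56), V2 = c(45, 45, 67))

# Subsetting keeps the original row names
sub <- d[d$V1 > 30, ]

rn <- rownames(sub)            # "2" "3": character, and not starting at 1
sub$id <- seq_len(nrow(sub))   # 1 2: integer and always sequential

sub
```

If the goal is a sequential number rather than a copy of the row labels, seq_len(nrow(...)) is the safer choice.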

Solution 5

data.table solution

Concise syntax and fast, since := adds the column by reference instead of copying the table:

library(data.table)

dt <- data.table(V1=c(23, 45, 56), V2=c(45, 45, 67))

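setnames(dt, c("V2", "V3")) # rename the existing columns V1, V2 to V2, V3
dt[, V1 := .I]              # add the row number as a new column V1 (appended last)
setcolorder(dt, "V1")       # move V1 to the front to match the requested layout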
Author by

user10745

Updated on July 09, 2022

Comments

  • user10745
    user10745 almost 2 years

    I need to create a column with a unique ID, basically adding the row number as its own column. My current data frame looks like this:

       V1  V2
    1  23  45
    2  45  45
    3  56  67
    

    How to make it look like this:

    V1  V2  V3
    1  23  45
    2  45  45
    3  56  67
    

    ?

    Many thanks

  • dickoa
    dickoa about 11 years
    Why not d$id <- seq_len(nrow(d)) and then colnames(d) <- paste0("V", 1:ncol(d))
  • sgibb
    sgibb about 11 years
    @dickoa: I just have not thought of it. Please see my edit. Your solution is a bit faster but doesn't preserve the order of the columns (but this isn't important in most cases).
  • James Hirschorn
    James Hirschorn almost 6 years
    Curiously, the mutate and seq_along solutions do not work for data.table.
  • Eric Lino
    Eric Lino almost 6 years
    True, but if your data has 0 rows, then I guess you have no data at all. Therefore, why would you need to create an ID for it?
  • James Hirschorn
    James Hirschorn almost 6 years
    In my case, it was inside a function call where the dataframe is passed as an argument and is not known in advance. Could have 10 rows one time, 0 the next.
  • Admin
    Admin almost 5 years
    This worked perfectly for me. Used arrange() first, and then applied 1:nrow() creating a new variable of sequential IDs. Thank you for this simple solution.
  • Jaap
    Jaap over 4 years
    @JamesHirschorn Besides the method as shown by @altabq (which is the preferred one for data.table), you could do seq_along(d[[1]]) when d is a data.table.
  • PolII
    PolII almost 3 years
    data <- cbind(data, 1:nrow(data)) and then followed by names(data)[names(data)=="1:nrow(data)"] <- "ID" would be the Wikibooks way of doing it.
  • zx8754
    zx8754 over 2 years
    There is a big assumption that rownames are numeric 1:n.
  • hyman
    hyman about 2 years
    Suggestion: relocate(id) is more concise than the select statement.