How does one reorder columns in a data frame?

Solution 1

Your data frame has four columns, like so: df[,c(1,2,3,4)]. Note that the first comma means keep all the rows, and 1,2,3,4 refers to the columns.

To change the order as in the question, do df2 <- df[,c(1,3,2,4)]

If you want to output this file as a csv, do write.csv(df2, file="somedf.csv")
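
As a minimal sketch of the steps above, assuming the four-column example data from the question (the names df and out_file are chosen here for illustration, and a temporary file stands in for somedf.csv):

```r
# Example data from the question; the name df is an assumption for this sketch
df <- data.frame(Time = c(1, 2), In = c(2, 3), Out = c(3, 4), Files = c(4, 5))

# Empty row index keeps all rows; the column index swaps columns 2 and 3
df2 <- df[, c(1, 3, 2, 4)]
names(df2)
# [1] "Time"  "Out"   "In"    "Files"

# Write the reordered data to a temporary csv file, without the row names
out_file <- tempfile(fileext = ".csv")
write.csv(df2, file = out_file, row.names = FALSE)
```

Passing row.names = FALSE is optional; without it, write.csv adds a leading column of row names to the file.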

Solution 2

# reorder by column name
data <- data[, c("A", "B", "C")] # leave the row index blank to keep all rows

# reorder by column index
data <- data[, c(1,3,2)] # leave the row index blank to keep all rows
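
When there are too many columns to list by hand, one possible variation on the name-based form is to name only the columns that should come first and let setdiff() supply the rest. This is a sketch on a hypothetical five-column data frame; the names data and front are illustrative only:

```r
# Hypothetical data frame with more columns than you would want to type out
data <- data.frame(A = 1:2, B = 3:4, C = 5:6, D = 7:8, E = 9:10)

# Columns to move to the front, in the desired order
front <- c("D", "B")

# setdiff() returns the remaining names in their original order
data <- data[, c(front, setdiff(names(data), front))]

names(data)
# [1] "D" "B" "A" "C" "E"
```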

Solution 3

You can also use the subset function:

data <- subset(data, select=c(3,2,1))

It is better to use the [] operator, as in the other answers, but it may be useful to know that you can subset rows and reorder columns in a single command.
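
A minimal sketch of that single-command form, assuming the example data from the question (the name result is illustrative):

```r
data <- data.frame(Time = c(1, 2), In = c(2, 3), Out = c(3, 4), Files = c(4, 5))

# Keep only the rows where Time > 1 and put Out before In, in one call
result <- subset(data, Time > 1, select = c(Time, Out, In, Files))
result
```

Note that the help page for subset describes it as a convenience function intended for interactive use, which is one reason to prefer [] inside functions and scripts.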

Update:

You can also use the select function from the dplyr package:

data <- data %>% select(Time, Out, In, Files)

I am not sure about the efficiency, but thanks to dplyr's syntax this solution should be more flexible, especially if you have a lot of columns. For example, the following will put the columns of the mtcars dataset in reverse order:

mtcars %>% select(carb:mpg)

And the following will reorder only some columns, and discard others:

mtcars %>% select(mpg:disp, hp, wt, gear:qsec, starts_with('carb'))

Read more about dplyr's select syntax.
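
To move a few columns without discarding the rest, the commenters point to everything() and, from dplyr 1.0.0 on, relocate(). A short sketch, assuming dplyr (version 1.0.0 or later) is installed; the result names front and moved are illustrative:

```r
library(dplyr)

# Move wt and gear to the front and keep every other column after them
front <- mtcars %>% select(wt, gear, everything())

# relocate() moves the named column and leaves the others in place
moved <- mtcars %>% relocate(carb, .before = mpg)

names(front)[1:3]
names(moved)[1:3]
```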

Solution 4

As mentioned in this comment, the standard suggestions for re-ordering columns in a data.frame are generally cumbersome and error-prone, especially if you have a lot of columns.

This function lets you rearrange columns by position: specify a variable name and its desired position, and the remaining columns fill the other positions in their original order.

##arrange df vars by position
##'vars' must be a named vector, e.g. c("var.name"=1)
arrange.vars <- function(data, vars){
    ##stop if not a data.frame (but should work for matrices as well)
    stopifnot(is.data.frame(data))

    ##sort out inputs
    data.nms <- names(data)
    var.nr <- length(data.nms)
    var.nms <- names(vars)
    var.pos <- vars
    ##sanity checks
    stopifnot( !any(duplicated(var.nms)), 
               !any(duplicated(var.pos)) )
    stopifnot( is.character(var.nms), 
               is.numeric(var.pos) )
    stopifnot( all(var.nms %in% data.nms) )
    stopifnot( all(var.pos > 0), 
               all(var.pos <= var.nr) )

    ##prepare output
    out.vec <- character(var.nr)
    out.vec[var.pos] <- var.nms
    out.vec[-var.pos] <- data.nms[ !(data.nms %in% var.nms) ]
    stopifnot( length(out.vec)==var.nr )

    ##re-arrange vars by position
    data <- data[ , out.vec]
    return(data)
}

Now the OP's request becomes as simple as this:

table <- data.frame(Time=c(1,2), In=c(2,3), Out=c(3,4), Files=c(4,5))
table
##  Time In Out Files
##1    1  2   3     4
##2    2  3   4     5

arrange.vars(table, c("Out"=2))
##  Time Out In Files
##1    1   3  2     4
##2    2   4  3     5

To additionally swap Time and Files columns you can do this:

arrange.vars(table, c("Out"=2, "Files"=1, "Time"=4))
##  Files Out In Time
##1     4   3  2    1
##2     5   4  3    2

Solution 5

A dplyr solution (part of the tidyverse package set) is to use select:

select(table, "Time", "Out", "In", "Files") 

# or

select(table, Time, Out, In, Files)

Author: Catherine

Updated on July 08, 2022

Comments

  • Catherine
    Catherine almost 2 years

    How would one change this input (with the sequence: time, in, out, files):

    Time   In    Out  Files
    1      2     3    4
    2      3     4    5
    

    To this output (with the sequence: time, out, in, files)?

    Time   Out   In  Files
    1      3     2    4
    2      4     3    5
    

    Here's the dummy R data:

    table <- data.frame(Time=c(1,2), In=c(2,3), Out=c(3,4), Files=c(4,5))
    table
    ##  Time In Out Files
    ##1    1  2   3     4
    ##2    2  3   4     5
    
  • Herman Toothrot
    Herman Toothrot over 10 years
    This is OK when you have a limited number of columns, but what if you have, for example, 50 columns? It would take too much time to type all the column numbers or names. What would be a quicker solution?
  • dalloliogm
    dalloliogm about 10 years
    @user4050: in that case you can use the ":" syntax, e.g. df[,c(1,3,2,4,5:50)].
  • kasterma
    kasterma almost 10 years
    to put the columns in idcols at the start: idcols <- c("name", "id2", "start", "duration"); cols <- c(idcols, names(cts)[-which(names(cts) %in% idcols)]); df <- df[cols]
  • MERose
    MERose over 9 years
    There are some reasons not to use subset(), see this question.
  • dalloliogm
    dalloliogm over 9 years
    Thank you. In any case I would now use the select function from the dplyr package, instead of subset.
  • Bram Vanroy
    Bram Vanroy over 9 years
    Question as a beginner, can you combine ordering by index and by name? E.g. data <- data[c(1,3,"Var1", 2)]?
  • Terry Brown
    Terry Brown over 9 years
    @BramVanroy nope, c(1,3,"Var1", 2) will be read as c("1","3","Var1", "2") because vectors can contain data of only one type, so types are promoted to the most general type present. Because there are no columns with the character names "1", "3", etc. you'll get "undefined columns". list(1,3,"Var1", 2) keeps values without type promotion, but you can't use a list in the above context.
  • guyabel
    guyabel about 9 years
    When you want to bring a couple of columns to the left hand side and not drop the others, I find everything() particularly awesome; mtcars %>% select(wt, gear, everything())
  • landroni
    landroni over 8 years
    Why does the mtcars[c(1,3,2)] subsetting work? I would have expected an error relating to incorrect dimensions or similar... Shouldn't it be mtcars[,c(1,3,2)]?
  • petermeissner
    petermeissner over 8 years
    data.frames are lists under the hood with columns as first order items
  • arekolek
    arekolek about 8 years
    @user4050: you can also use df[,c(1,3,2,4:ncol(df))] when you don't know how many columns there are.
  • landroni
    landroni about 8 years
    @user4050 This answer proposes a solution that should be more convenient (and less error-prone) when dealing with large numbers of columns. It lets you specify the desired position of chosen variables without worrying about the remaining variables, which will automatically be slotted into the remaining positions.
  • CoderGuy123
    CoderGuy123 almost 8 years
    Very nice function. I added a modified version of this function to my personal package.
  • Chris
    Chris almost 8 years
    You can also use dput(colnames(df)), it prints column names in R character format. You can then rearrange the names.
  • richiemorrisroe
    richiemorrisroe over 7 years
    @landroni that is a really good answer. It's a little verbose (I would pre-filter at the repl and use that), and in general, I think that df` > names(.) > grep "some_col_name_pattern >> df(names %in% .)" (untested) is more elegant. But nonetheless, your answer is more general (but more obscure) so thank you for making this answer better :)
  • landroni
    landroni over 7 years
    @richiemorrisroe Thanks for the feedback. I've now slightly simplified the answer, which should make it more readable.
  • Arthur Yip
    Arthur Yip almost 7 years
    Here is another way to use the everything() select_helper function to rearrange the columns to the right/end. stackoverflow.com/a/44353144/4663008 github.com/tidyverse/dplyr/issues/2838 Seems like you will need to use 2 select()'s to move some columns to the right end and others to the left.
  • Garini
    Garini almost 6 years
    The best option for me. Even if I had to install it, it is clearly the clearest possibility.
  • Zachary Ryan Smith
    Zachary Ryan Smith almost 6 years
    !! WARNING !! data.table turns TARGET into an int vector: TARGET <- TARGET[ , order(colnames(TARGET), decreasing=TRUE)] to fix that: TARGET <- as.data.frame(TARGET) TARGET <- TARGET[ , order(colnames(TARGET), decreasing=TRUE)]
  • Paul Rougieux
    Paul Rougieux almost 6 years
    Tidyverse (dplyr in fact) also has the option to select groups of columns, for example to move the Species variable to the front: select(iris, Species, everything()). Also note that quotes are not needed.
  • Mike Dolan Fliss
    Mike Dolan Fliss over 5 years
    Herman - if you've got 50 columns and you want to custom reorder them, use a helper csv file with a new column order, e.g. name_df$new_order (which you could construct by write_csv(data.frame(old_order = names(df)), "name_df.csv")). Then mess with the order out of R and read it back in. Now you can df_reordered = df[, name_df$new_order]. Referencing columns by position number doesn't scale well as the number of columns goes up.
  • divibisan
    divibisan about 5 years
    It's important to note that this will drop all columns which are not explicitly specified unless you include everything() as in PaulRougieux's comment
  • Mrmoleje
    Mrmoleje almost 5 years
    This is really useful - it's going to save me a lot of time when I just want to move one column from the end of a really wide tibble to the beginning
  • Triamus
    Triamus almost 5 years
    pls state the library you take the function setcolorder from.
  • David Tonhofer
    David Tonhofer over 4 years
    dplyr's group will also rearrange the variables, so watch out when using that in a chain.
  • Arthur Yip
    Arthur Yip about 4 years
    new function dplyr::relocate is exactly for this. see H 1 's answer below
  • otteheng
    otteheng over 3 years
    As of dplyr version 1.0.0 they added a relocate() function that's intuitive and easy to read. It's especially helpful if you just want to add columns after or before a specific column.
  • Sandy
    Sandy almost 3 years
    That's a very neat solution. Thanks!
  • Dominique Paul
    Dominique Paul almost 2 years
    This is probably the most flexible and simple solution. Thanks!
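
Two points from the comments above, mixing positions with names and indexing without the comma, can be sketched in base R as follows. The data frame and the names mixed, idx and reordered are hypothetical, chosen only for illustration:

```r
# Hypothetical data frame in which Var1 is the fourth column
data <- data.frame(A = 1:2, B = 3:4, C = 5:6, Var1 = 7:8)

# Mixing numbers and a string in c() coerces every element to character,
# so this vector cannot be used to index columns by position
mixed <- c(1, 3, "Var1", 2)
class(mixed)
# [1] "character"

# One workaround: turn the name into its position with match()
idx <- c(1, 3, match("Var1", names(data)), 2)
reordered <- data[, idx]
names(reordered)
# [1] "A"    "C"    "Var1" "B"

# A data frame is a list of columns, so list-style indexing
# without the comma selects the same columns
names(data[idx])
```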