How does one reorder columns in a data frame?

r sorting dataframe r-faq

710,285

Solution 1

Your data frame has four columns, like so: df[,c(1,2,3,4)]. Note that the first comma means keep all the rows, and 1,2,3,4 refers to the columns.

To change the order as in the question above, do df2 <- df[,c(1,3,2,4)]

If you want to output this file as a csv, do write.csv(df2, file="somedf.csv")
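As a minimal sketch of the steps above, using the sample data from the question (the names df, df2 and out_file, and the use of a temporary file, are only illustrative assumptions):

```r
# Sample data from the question
df <- data.frame(Time = c(1, 2), In = c(2, 3), Out = c(3, 4), Files = c(4, 5))

# An empty row index keeps all rows; the column index swaps columns 2 and 3
df2 <- df[, c(1, 3, 2, 4)]
print(names(df2))

# Write the result to a temporary file; row.names = FALSE leaves out
# the extra column of row numbers that write.csv adds by default
out_file <- tempfile(fileext = ".csv")
write.csv(df2, file = out_file, row.names = FALSE)
```

Note that the original df is left unchanged; only df2 has the new column order.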

Solution 2

# reorder by column name
data <- data[, c("A", "B", "C")] # leave the row index blank to keep all rows

# reorder by column index
data <- data[, c(1,3,2)] # leave the row index blank to keep all rows
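When only a few columns need to move, the remaining names can be computed instead of typed out. The following is a rough sketch with made-up data (the columns A to D and the helper vector front are assumptions for illustration), using base R's setdiff():

```r
# Hypothetical example data with four columns
data <- data.frame(A = 1:2, B = 3:4, C = 5:6, D = 7:8)

# Columns to bring to the front, in the desired order
front <- c("C", "A")

# setdiff() returns the remaining names in their original order
data <- data[, c(front, setdiff(names(data), front))]
print(names(data))
```

This keeps every column, so nothing is dropped by accident when the data frame is wide.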

Solution 3

You can also use the subset function:

data <- subset(data, select=c(3,2,1))

It is generally better to use the [] operator, as in the other answers, but it may be useful to know that subset() lets you filter rows and reorder columns in a single command.
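A short sketch of that combined operation, assuming the sample data from the question (the row condition Time > 1 and the name res are chosen only for illustration):

```r
# Sample data from the question
table <- data.frame(Time = c(1, 2), In = c(2, 3), Out = c(3, 4), Files = c(4, 5))

# Keep only rows where Time > 1 and reorder the columns in the same call
res <- subset(table, Time > 1, select = c(Time, Out, In, Files))
print(res)
```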

Update:

You can also use the select function from the dplyr package:

data <- data %>% select(Time, Out, In, Files)

I am not sure about the efficiency, but thanks to dplyr's syntax this solution should be more flexible, especially if you have a lot of columns. For example, the following will reverse the column order of the mtcars dataset:

mtcars %>% select(carb:mpg)

And the following will reorder only some columns, and discard others:

mtcars %>% select(mpg:disp, hp, wt, gear:qsec, starts_with('carb'))
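To move a few columns to the front without discarding the rest, the everything() helper can be appended to the selection. A small sketch, assuming dplyr is installed (the name moved is illustrative):

```r
library(dplyr)

# wt and gear come first; everything() fills in the remaining columns
# in their original order
moved <- mtcars %>% select(wt, gear, everything())
print(names(moved))
```

All eleven mtcars columns are kept; only their order changes.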

Solution 4

As mentioned in this comment, the standard suggestions for re-ordering columns in a data.frame are generally cumbersome and error-prone, especially if you have a lot of columns.

This function lets you rearrange columns by position: specify a variable name and its desired position, and don't worry about the other columns.

##arrange df vars by position
##'vars' must be a named vector, e.g. c("var.name"=1)
arrange.vars <- function(data, vars){
    ##stop if not a data.frame (but should work for matrices as well)
    stopifnot(is.data.frame(data))

    ##sort out inputs
    data.nms <- names(data)
    var.nr <- length(data.nms)
    var.nms <- names(vars)
    var.pos <- vars
    ##sanity checks
    stopifnot( !any(duplicated(var.nms)), 
               !any(duplicated(var.pos)) )
    stopifnot( is.character(var.nms), 
               is.numeric(var.pos) )
    stopifnot( all(var.nms %in% data.nms) )
    stopifnot( all(var.pos > 0), 
               all(var.pos <= var.nr) )

    ##prepare output
    out.vec <- character(var.nr)
    out.vec[var.pos] <- var.nms
    out.vec[-var.pos] <- data.nms[ !(data.nms %in% var.nms) ]
    stopifnot( length(out.vec)==var.nr )

    ##re-arrange vars by position
    data <- data[ , out.vec]
    return(data)
}

Now the OP's request becomes as simple as this:

table <- data.frame(Time=c(1,2), In=c(2,3), Out=c(3,4), Files=c(4,5))
table
##  Time In Out Files
##1    1  2   3     4
##2    2  3   4     5

arrange.vars(table, c("Out"=2))
##  Time Out In Files
##1    1   3  2     4
##2    2   4  3     5

To additionally swap Time and Files columns you can do this:

arrange.vars(table, c("Out"=2, "Files"=1, "Time"=4))
##  Files Out In Time
##1     4   3  2    1
##2     5   4  3    2

Solution 5

A dplyr solution (part of the tidyverse package set) is to use select:

select(table, "Time", "Out", "In", "Files") 

# or

select(table, Time, Out, In, Files)
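With dplyr 1.0.0 or later, relocate() moves individual columns and leaves the others in place, so the full column list does not have to be typed. A minimal sketch, assuming the sample data from the question (the name reordered is illustrative):

```r
library(dplyr)

# Sample data from the question
table <- data.frame(Time = c(1, 2), In = c(2, 3), Out = c(3, 4), Files = c(4, 5))

# Move Out so that it sits immediately before In
reordered <- relocate(table, Out, .before = In)
print(names(reordered))
```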


Author by

Catherine

Updated on July 08, 2022

Comments

Catherine almost 2 years

How would one change this input (with the sequence: time, in, out, files):

Time   In    Out  Files
1      2     3    4
2      3     4    5

To this output (with the sequence: time, out, in, files)?

Time   Out   In  Files
1      3     2    4
2      4     3    5

Here's the dummy R data:

table <- data.frame(Time=c(1,2), In=c(2,3), Out=c(3,4), Files=c(4,5))
table
##  Time In Out Files
##1    1  2   3     4
##2    2  3   4     5

Herman Toothrot over 10 years

This is ok when you have a limited number of columns, but what if you have for example 50 columns, it would take too much time to type all column numbers or names. What would be a quicker solution?
dalloliogm about 10 years

@user4050: in that case you can use the ":" syntax, e.g. df[,c(1,3,2,4,5:50)].
kasterma almost 10 years

to put the columns in idcols at the start: idcols <- c("name", "id2", "start", "duration"); cols <- c(idcols, names(cts)[-which(names(cts) %in% idcols)]); df <- df[cols]
MERose over 9 years

There are some reasons not to use subset(), see this question.
dalloliogm over 9 years

Thank you. In any case I would now use the select function from the dplyr package, instead of subset.
Bram Vanroy over 9 years

Question as a beginner, can you combine ordering by index and by name? E.g. data <- data[c(1,3,"Var1", 2)]?
Terry Brown over 9 years

@BramVanroy nope, c(1,3,"Var1", 2) will be read as c("1","3","Var1", "2") because vectors can contain data of only one type, so types are promoted to the most general type present. Because there are no columns with the character names "1", "3", etc. you'll get "undefined columns". list(1,3,"Var1", 2) keeps values without type promotion, but you can't use a list in the above context.
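The coercion described here can be checked directly. The sketch below uses a made-up data frame (columns a, b, c and Var1 are assumptions) and shows one possible workaround: converting positions to names first, so that the whole index is a character vector.

```r
# Hypothetical data frame for illustration
data <- data.frame(a = 1, b = 2, c = 3, Var1 = 4)

# Mixing numbers and strings in c() coerces everything to character
idx <- c(1, 3, "Var1", 2)
print(class(idx))

# Indexing with idx fails, because no columns are named "1", "3" or "2"
failed <- inherits(try(data[idx], silent = TRUE), "try-error")

# Workaround: translate the positions into names before combining
cols <- c(names(data)[c(1, 3)], "Var1", names(data)[2])
data2 <- data[cols]
print(names(data2))
```
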
guyabel about 9 years

When you want to bring a couple of columns to the left hand side and not drop the others, I find everything() particularly awesome; mtcars %>% select(wt, gear, everything())
landroni over 8 years

Why does the mtcars[c(1,3,2)] subsetting work? I would have expected an error relating to incorrect dimensions or similar... Shouldn't it be mtcars[,c(1,3,2)]?
petermeissner over 8 years

data.frames are lists under the hood with columns as first order items
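A quick way to see this for oneself, as a sketch using the built-in mtcars data (the names by_list and by_matrix are illustrative):

```r
# A data frame is a list whose elements are its columns
print(is.list(mtcars))

# Single-bracket indexing without a comma selects list elements (columns)
by_list <- mtcars[c(1, 3, 2)]

# Indexing with an empty row index selects the same columns
by_matrix <- mtcars[, c(1, 3, 2)]

print(all.equal(by_list, by_matrix))
```
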
arekolek about 8 years

@user4050: you can also use df[,c(1,3,2,4:ncol(df))] when you don't know how many columns there are.
landroni about 8 years

@user4050 This answer proposes a solution that should be more convenient (and less error-prone) when dealing with large numbers of columns. It allows to specify the desired position of chosen variables, and not worry about the remaining variables, which will automatically be slotted in the remaining positions.
CoderGuy123 almost 8 years

Very nice function. I added a modified version of this function to my personal package.
Chris almost 8 years

You can also use dput(colnames(df)), it prints column names in R character format. You can then rearrange the names.
richiemorrisroe over 7 years

@landroni that is a really good answer. It's a little verbose (I would pre-filter at the repl and use that), and in general, I think that df` > names(.) > grep "some_col_name_pattern >> df(names %in% .)" (untested) is more elegant. But nonetheless, your answer is more general (but more obscure) so thank you for making this answer better :)
landroni over 7 years

@richiemorrisroe Thanks for the feedback. I've now simplified slightly the answer which should make it more readable.
Arthur Yip almost 7 years

Here is another way to use the everything() select_helper function to rearrange the columns to the right/end. stackoverflow.com/a/44353144/4663008 github.com/tidyverse/dplyr/issues/2838 Seems like you will need to use 2 select()'s to move some columns to the right end and others to the left.
Garini almost 6 years

The best option for me. Even if I had to install it, it is clearly the clearest possibility.
Zachary Ryan Smith almost 6 years

!! WARNING !! data.table turns TARGET into an int vector: TARGET <- TARGET[ , order(colnames(TARGET), decreasing=TRUE)] to fix that: TARGET <- as.data.frame(TARGET) TARGET <- TARGET[ , order(colnames(TARGET), decreasing=TRUE)]
Paul Rougieux almost 6 years

Tidyverse (dplyr in fact) also has the option to select groups of columns, for example to move the Species variable to the front: select(iris, Species, everything()). Also note that quotes are not needed.
Mike Dolan Fliss over 5 years

Herman - if you've got 50 columns and you want to custom reorder them, use a helper csv file with a new column order, e.g. name_df$new_order (which you could construct by write_csv(data.frame(old_order = names(df)), "name_df.csv")). Then mess with the order out of R and read it back in. Now you can df_reordered = df[, name_df$new_order]. Referencing columns by position number doesn't scale well as the number of columns goes up.
divibisan about 5 years

It's important to note that this will drop all columns which are not explicitly specified unless you include everything() as in PaulRougieux's comment
Mrmoleje almost 5 years

This is really useful - it's going to save me a lot of time when I just want to move one column from the end of a really wide tibble to the beginning
Triamus almost 5 years

pls state the library you take the function setcolorder from.
David Tonhofer over 4 years

dplyr's group will also rearrange the variables, so watch out when using that in a chain.
Arthur Yip about 4 years

new function dplyr::relocate is exactly for this. see H 1 's answer below
otteheng over 3 years

As of dplyr version 1.0.0 they added a relocate() function that's intuitive and easy to read. It's especially helpful if you just want to add columns after or before a specific column.
Sandy almost 3 years

That's a very neat solution. Thanks!
Dominique Paul almost 2 years

This is probably the most flexible and simple solution. Thanks!