For each row in an R dataframe

Solution 1

You can try this, using the apply() function:

> d
  name plate value1 value2
1    A    P1      1    100
2    B    P2      2    200
3    C    P3      3    300

> f <- function(x, output) {
 # apply() passes each row as a vector, so fields are indexed by position
 wellName <- x[1]
 plateName <- x[2]
 wellID <- 1
 print(paste(wellID, x[3], x[4], sep=","))
 cat(paste(wellID, x[3], x[4], sep=","), file = output, append = TRUE, fill = TRUE)
}

> apply(d, 1, f, output = 'outputfile')
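
One caveat with this approach: apply() first converts the data frame to a matrix, so with mixed column types every field reaches the function as a character string. The sketch below rebuilds the example data d (stringsAsFactors = FALSE is an assumption) and indexes the fields by column name rather than by position:

```r
# Rebuild the example data from above.
d <- data.frame(name = c("A", "B", "C"),
                plate = c("P1", "P2", "P3"),
                value1 = 1:3,
                value2 = c(100, 200, 300),
                stringsAsFactors = FALSE)

# Each row arrives as a named character vector, so fields can be
# picked out by column name instead of by position.
row_lines <- apply(d, 1, function(x) {
  paste(x["name"], x["value1"], x["value2"], sep = ",")
})

# Numeric columns have been coerced to strings along the way.
row_types <- apply(d, 1, function(x) class(x))
```

If the numeric values are needed as numbers inside the function, they have to be converted back with as.numeric(), or a row-wise approach that keeps the data frame structure can be used instead.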

Solution 2

You can use the by() function:

by(dataFrame, seq_len(nrow(dataFrame)), function(row) dostuff)

But iterating over the rows directly like this is rarely what you want; you should try to vectorize instead. Can I ask what the actual work in the loop is doing?
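
To make the function(row) dostuff placeholder concrete, here is a minimal sketch. The data frame d and the work done on each row are made up for illustration; the point is that row is a one-row data frame, so columns are reached with $ and keep their types:

```r
d <- data.frame(name = c("A", "B", "C"),
                value1 = 1:3,
                value2 = c(100, 200, 300),
                stringsAsFactors = FALSE)

# Each call receives a one-row data frame; value1 and value2 are still numeric.
res <- by(d, seq_len(nrow(d)), function(row) {
  paste0(row$name, ": ", row$value1 + row$value2)
})

# Collect the per-row results into a plain character vector.
labels <- vapply(seq_len(nrow(d)), function(i) res[[i]], character(1))
```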

Solution 3

First, Jonathan's point about vectorizing is correct. If your getWellID() function is vectorized, then you can skip the loop and just use cat or write.csv:

write.csv(data.frame(wellid=getWellID(well$name, well$plate), 
         value1=well$value1, value2=well$value2), file=outputFile)

If getWellID() isn't vectorized, then Jonathan's recommendation of using by or knguyen's suggestion of apply should work.
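
If getWellID() only handles one well at a time, mapply() can apply it across the name and plate columns so that the write.csv call above still works. In this sketch the lookup table wellTable, the IDs, and the body of getWellID() are hypothetical stand-ins, since the real function is not shown:

```r
# Hypothetical list-of-lists lookup: plate -> well name -> ID.
wellTable <- list(plate67 = list(H1 = 101L, H2 = 102L),
                  plate68 = list(H1 = 201L))

# Hypothetical, non-vectorized getWellID(): handles one name/plate pair per call.
getWellID <- function(name, plate) wellTable[[plate]][[name]]

well <- data.frame(name = c("H1", "H2", "H1"),
                   plate = c("plate67", "plate67", "plate68"),
                   value1 = c(1, 2, 3),
                   value2 = c(100, 200, 300),
                   stringsAsFactors = FALSE)

# mapply() calls getWellID() once per row, pairing up the two columns.
ids <- mapply(getWellID, well$name, well$plate, USE.NAMES = FALSE)

outputFile <- tempfile(fileext = ".csv")
write.csv(data.frame(wellid = ids, value1 = well$value1, value2 = well$value2),
          file = outputFile, row.names = FALSE)
```

This still calls the lookup once per row, so it is a convenience rather than a speed-up; a truly vectorized lookup would avoid the per-row calls altogether.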

Otherwise, if you really want to use for, you can do something like this:

for(i in seq_len(nrow(dataFrame))) {
    row <- dataFrame[i,]
    # do stuff with row
}

You can also try to use the foreach package, although it requires you to become familiar with that syntax. Here's a simple example:

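library(foreach)
library(iterators)  # provides iter()
d <- data.frame(x=1:10, y=rnorm(10))
# %dopar% needs a registered parallel backend; without one it warns and runs sequentially
s <- foreach(d=iter(d, by='row'), .combine=rbind) %dopar% d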

A final option is to use a function out of the plyr package, in which case the convention will be very similar to the apply function.

library(plyr)
ddply(dataFrame, .(x), function(x) {
    # do stuff
})

Solution 4

I think the best way to do this with base R is:

for( i in rownames(df) )
   print(df[i, "column1"])

The advantage over the for (i in 1:nrow(df)) approach is that you do not get into trouble if df is empty: when nrow(df) is 0, 1:nrow(df) evaluates to c(1, 0) and the loop body runs twice.
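
A small sketch with a made-up zero-row df illustrates the point; seq_len(nrow(df)) is another index that stays empty in this case:

```r
df <- data.frame(column1 = character(0))  # zero rows

bad_index  <- 1:nrow(df)          # 1:0 counts down, giving two values: 1 and 0
safe_index <- seq_len(nrow(df))   # empty, so a loop over it never runs
by_name    <- rownames(df)        # also empty

iterations <- 0
for (i in safe_index) iterations <- iterations + 1
```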

Solution 5

I use this simple utility function:

rows = function(tab) lapply(
  seq_len(nrow(tab)),
  function(i) unclass(tab[i,,drop=FALSE])
)

Or a faster, less clear form:

rows = function(x) lapply(seq_len(nrow(x)), function(i) lapply(x,"[",i))

This function just splits a data.frame into a list of rows. Then you can run a normal "for" loop over this list:

tab = data.frame(x = 1:3, y=2:4, z=3:5)
for (A in rows(tab)) {
    print(A$x + A$y * A$z)
}        

Your code from the question will work with a minimal modification:

for (well in rows(dataFrame)) {
  wellName <- well$name    # string like "H1"
  plateName <- well$plate  # string like "plate67"
  wellID <- getWellID(wellName, plateName)
  cat(paste(wellID, well$value1, well$value2, sep=","), file=outputFile)
}
Author: Carl Coryell-Martin

Updated on December 15, 2020

Comments

  • Carl Coryell-Martin
    Carl Coryell-Martin over 3 years

    I have a dataframe, and for each row in that dataframe I have to do some complicated lookups and append some data to a file.

    The dataFrame contains scientific results for selected wells from 96 well plates used in biological research so I want to do something like:

    for (well in dataFrame) {
      wellName <- well$name    # string like "H1"
      plateName <- well$plate  # string like "plate67"
      wellID <- getWellID(wellName, plateName)
      cat(paste(wellID, well$value1, well$value2, sep=","), file=outputFile)
    }
    

    In my procedural world, I'd do something like:

    for (row in dataFrame) {
        #look up stuff using data from the row
        #write stuff to the file
    }
    

    What is the "R way" to do this?

    • Dirk Eddelbuettel
      Dirk Eddelbuettel over 14 years
      What is your question here? A data.frame is a two-dimensional object and looping over the rows is a perfectly normal way of doing things as rows are commonly sets of 'observations' of the 'variables' in each column.
    • Carl Coryell-Martin
      Carl Coryell-Martin over 14 years
      what I end up doing is: for (index in 1:nrow(dataFrame)) { row = dataFrame[index, ]; # do stuff with the row } which never seemed very pretty to me.
    • Shane
      Shane over 14 years
      Does getWellID call a database or anything? Otherwise, Jonathan is probably right and you could vectorize this.
  • Carl Coryell-Martin
    Carl Coryell-Martin over 14 years
    Shane, thank you. I'm not sure how to write a vectorized getWellID. What I need to do right now is to dig into an existing list of lists to look it up or pull it out of a database.
  • Shane
    Shane over 14 years
    Feel free to post the getWellID question (i.e. can this function be vectorized?) separately, and I'm sure I (or someone else) will answer it.
  • Jonathan Chang
    Jonathan Chang over 14 years
    Even if getWellID is not vectorized, I think you should go with this solution, and replace getWellID with mapply(getWellID, well$name, well$plate).
  • Shane
    Shane over 14 years
    Even if you pull it from a database, you can pull them all at once and then filter the result in R; that will be faster than an iterative function.
  • Darren Cook
    Darren Cook over 12 years
    Be careful, as the dataframe is converted to a matrix, and what you end up with (x) is a vector. This is why the above example has to use numeric indexes; the by() approach gives you a data.frame, which makes your code more robust.
  • Josh Bode
    Josh Bode over 11 years
    +1 for foreach - I'm going to use the hell out of that one.
  • sds
    sds about 11 years
    this will not work well if the data frame has 0 rows because 1:0 is not empty
  • Jim
    Jim almost 10 years
    Easy fix for the 0 row case is to use seq_len(), insert seq_len(nrow(dataFrame)) in place of 1:nrow(dataFrame).
  • Zahy
    Zahy almost 10 years
    did not work for me. The apply function treated every x given to f as a character value and not a row.
  • founddrama
    founddrama over 9 years
    Note too that you can refer to the columns by name. So: wellName <- x[1] could also be wellName <- x["name"].
  • SmallChess
    SmallChess over 8 years
    When Darren mentioned robust, he meant something like shifting the orders of the columns. This answer would not work whereas the one with by() would still work.
  • uh_big_mike_boi
    uh_big_mike_boi about 8 years
    How do you actually implement (row)? Is it dataframe$column? dataframe[somevariableNamehere]? How do you actually say its a row. The pseudocode "function(row) dostuff" how would that actually look?
  • Ł Łaniewski-Wołłk
    Ł Łaniewski-Wołłk about 8 years
    It's faster to access a straight list than a data.frame.
  • Ł Łaniewski-Wołłk
    Ł Łaniewski-Wołłk about 8 years
    Just realized it's even faster to make the same thing with double lapply: rows = function(x) lapply(seq_len(nrow(x)), function(i) lapply(x,function(c) c[i]))
  • Aaron McDaid
    Aaron McDaid about 8 years
    So the inner lapply iterates over the columns of the entire dataset x, giving each column the name c, and then extracting the ith entry from that column vector. Is this correct?
  • Steve Pitchers
    Steve Pitchers over 7 years
    Very nice! In my case, I had to convert from "factor" values to the underlying value: wellName <- as.character(well$name).
  • pwilcox
    pwilcox about 7 years
    @Mike, change dostuff in this answer to str(row) You'll see multiple lines printed in the console beginning with " 'data.frame': 1 obs of x variables." But be careful, changing dostuff to row does not return a data.frame object for the outer function as a whole. Instead it returns a list of one row data-frames.
  • stephanmg
    stephanmg over 3 years
    Not everything should be vectorized. But in this case it would make sense I guess.
  • Ken Williams
    Ken Williams over 3 years
    I fixed the issue noted by sds and Jim with an edit.