For each row in an R dataframe

r dataframe rows

416,484

Solution 1

You can try this, using the apply() function:

> d
  name plate value1 value2
1    A    P1      1    100
2    B    P2      2    200
3    C    P3      3    300

> f <- function(x, output) {
 wellName <- x[1]
 plateName <- x[2]
 wellID <- 1
 print(paste(wellID, x[3], x[4], sep=","))
 cat(paste(wellID, x[3], x[4], sep=","), file = output, append = TRUE, fill = TRUE)
}

> apply(d, 1, f, output = 'outputfile')
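
One caveat worth illustrating: apply() converts the data frame to a matrix first, so with mixed column types every value in x arrives as a character string. The sketch below uses a stand-in data frame shaped like d above (the values are made up) and suggests picking columns by name rather than position, converting numeric columns back before doing arithmetic:

```r
# Stand-in data frame with the same shape as `d` above (illustrative values)
d <- data.frame(name = c("A", "B", "C"),
                plate = c("P1", "P2", "P3"),
                value1 = 1:3,
                value2 = c(100, 200, 300),
                stringsAsFactors = FALSE)

# apply() coerces d to a character matrix, so each x is a character vector
types <- apply(d, 1, function(x) class(x))

# Index by column name instead of position, and convert back to numeric
sums <- apply(d, 1, function(x) as.numeric(x["value1"]) + as.numeric(x["value2"]))
```

If the column types matter, a row-wise approach that keeps each row as a data.frame avoids the conversion altogether.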

Solution 2

You can use the by() function:

by(dataFrame, seq_len(nrow(dataFrame)), function(row) dostuff)

But iterating over the rows directly like this is rarely what you want; you should try to vectorize instead. Can I ask what the actual work in the loop is doing?
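
To make the dostuff placeholder concrete: inside the function, row is a one-row data.frame, so columns are available by name with $ and keep their types. A minimal sketch, assuming a made-up data frame:

```r
# Illustrative data; not from the original question
dataFrame <- data.frame(name = c("A", "B", "C"),
                        plate = c("P1", "P2", "P3"),
                        value1 = 1:3,
                        value2 = c(100, 200, 300),
                        stringsAsFactors = FALSE)

# Each `row` is a one-row data.frame, so columns keep their names and types
res <- by(dataFrame, seq_len(nrow(dataFrame)), function(row) {
  paste(row$name, row$plate, row$value1 + row$value2, sep = ",")
})

# Strip the "by" class and attributes to get a plain character vector
lines <- as.vector(res)
```

Replacing the function body with str(row) is a quick way to inspect what each call receives.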

Solution 3

First, Jonathan's point about vectorizing is correct. If your getWellID() function is vectorized, then you can skip the loop and just use cat or write.csv:

write.csv(data.frame(wellid=getWellID(well$name, well$plate), 
         value1=well$value1, value2=well$value2), file=outputFile)

If getWellID() isn't vectorized, then Jonathan's recommendation of using by or knguyen's suggestion of apply should work.
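
Another possibility, if getWellID() only accepts one name and one plate at a time, is to call it pairwise with mapply() and still write everything in one go. In this sketch getWellID is a made-up stand-in (the real lookup is not shown in the question), and the well data and temporary output file are assumptions for illustration:

```r
# Hypothetical stand-in for the real lookup; handles one well at a time
getWellID <- function(wellName, plateName) {
  stopifnot(length(wellName) == 1, length(plateName) == 1)
  paste(plateName, wellName, sep = "_")
}

# Illustrative data; not from the original question
well <- data.frame(name = c("H1", "H2"),
                   plate = c("plate67", "plate67"),
                   value1 = c(1, 2),
                   value2 = c(100, 200),
                   stringsAsFactors = FALSE)

# mapply() walks the two columns in parallel, one call per row
ids <- mapply(getWellID, well$name, well$plate, USE.NAMES = FALSE)

outputFile <- tempfile(fileext = ".csv")
write.csv(data.frame(wellid = ids, value1 = well$value1, value2 = well$value2),
          file = outputFile, row.names = FALSE)
```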

Otherwise, if you really want to use for, you can do something like this:

for(i in seq_len(nrow(dataFrame))) {
    row <- dataFrame[i,]
    # do stuff with row
}

You can also try to use the foreach package, although it requires you to become familiar with that syntax. Here's a simple example:

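library(foreach)
library(iterators)  # provides iter(); load it explicitly in case foreach does not attach it
d <- data.frame(x=1:10, y=rnorm(10))
# without a registered parallel backend, %dopar% runs sequentially and warns
s <- foreach(d=iter(d, by='row'), .combine=rbind) %dopar% d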

A final option is to use a function out of the plyr package, in which case the convention will be very similar to the apply function.

library(plyr)
ddply(dataFrame, .(x), function(x) {
  # do stuff
})

Solution 4

I think the best way to do this with basic R is:

for( i in rownames(df) )
   print(df[i, "column1"])

The advantage over the for( i in 1:nrow(df))-approach is that you do not get into trouble if df is empty and nrow(df)=0.
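
The empty-data-frame pitfall is easy to see in isolation. A small sketch, using an empty data frame made up for the demonstration:

```r
# An empty data frame with one column (illustrative)
df <- data.frame(column1 = character(0))

# 1:nrow(df) is 1:0, which counts down from 1 to 0, so a loop over it runs twice
bad <- 1:nrow(df)

# Both of these are empty, so a loop over them is skipped entirely
by_name <- rownames(df)
by_index <- seq_len(nrow(df))

# Count how many times the rownames() loop body executes
iterations <- 0
for (i in rownames(df)) iterations <- iterations + 1
```

Note that seq_len(nrow(df)) gives the same protection when an integer index is preferred over row names.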

Solution 5

I use this simple utility function:

rows = function(tab) lapply(
  seq_len(nrow(tab)),
  function(i) unclass(tab[i,,drop=FALSE])
)

Or a faster, less clear form:

rows = function(x) lapply(seq_len(nrow(x)), function(i) lapply(x,"[",i))

This function just splits a data.frame into a list of rows. Then you can write a normal "for" loop over this list:

tab = data.frame(x = 1:3, y=2:4, z=3:5)
for (A in rows(tab)) {
    print(A$x + A$y * A$z)
}

Your code from the question will work with a minimal modification:

for (well in rows(dataFrame)) {
  wellName <- well$name    # string like "H1"
  plateName <- well$plate  # string like "plate67"
  wellID <- getWellID(wellName, plateName)
  cat(paste(wellID, well$value1, well$value2, sep=","), file=outputFile)
}


Carl Coryell-Martin

Updated on December 15, 2020

Comments

Carl Coryell-Martin over 3 years
I have a dataframe, and for each row in that dataframe I have to do some complicated lookups and append some data to a file.

The dataFrame contains scientific results for selected wells from 96 well plates used in biological research so I want to do something like:
```
for (well in dataFrame) {
  wellName <- well$name    # string like "H1"
  plateName <- well$plate  # string like "plate67"
  wellID <- getWellID(wellName, plateName)
  cat(paste(wellID, well$value1, well$value2, sep=","), file=outputFile)
}
```
In my procedural world, I'd do something like:
```
for (row in dataFrame) {
    #look up stuff using data from the row
    #write stuff to the file
}
```
What is the "R way" to do this?
- Dirk Eddelbuettel over 14 years
  
  What is your question here? A data.frame is a two-dimensional object and looping over the rows is a perfectly normal way of doing things as rows are commonly sets of 'observations' of the 'variables' in each column.
- Carl Coryell-Martin over 14 years
  
  what I end up doing is: for (index in 1:nrow(dataFrame)) { row = dataFrame[index, ]; # do stuff with the row } which never seemed very pretty to me.
- Shane over 14 years
  
  Does getWellID call a database or anything? Otherwise, Jonathan is probably right and you could vectorize this.
Carl Coryell-Martin over 14 years

Shane, thank you. I'm not sure how to write a vectorized getWellID. What I need to do right now is to dig into an existing list of lists to look it up or pull it out of a database.
Shane over 14 years

Feel free to post the getWellID question (i.e. can this function be vectorized?) separately, and I'm sure I (or someone else) will answer it.
Jonathan Chang over 14 years

Even if getWellID is not vectorized, I think you should go with this solution, and replace getWellID with mapply(getWellID, well$name, well$plate).
Shane over 14 years

Even if you pull it from a database, you can pull them all at once and then filter the result in R; that will be faster than an iterative function.
Darren Cook over 12 years

Be careful, as the dataframe is converted to a matrix, and what you end up with (x) is a vector. This is why the above example has to use numeric indexes; the by() approach gives you a data.frame, which makes your code more robust.
Josh Bode over 11 years

+1 for foreach - I'm going to use the hell out of that one.
sds about 11 years

this will not work well if the data frame has 0 rows because 1:0 is not empty
Jim almost 10 years

Easy fix for the 0 row case is to use seq_len(), insert seq_len(nrow(dataFrame)) in place of 1:nrow(dataFrame).
Zahy almost 10 years

did not work for me. The apply function treated every x given to f as a character value and not a row.
founddrama over 9 years

Note too that you can refer to the columns by name. So: wellName <- x[1] could also be wellName <- x["name"].
SmallChess over 8 years

When Darren mentioned robust, he meant something like shifting the orders of the columns. This answer would not work whereas the one with by() would still work.
uh_big_mike_boi about 8 years

How do you actually implement (row)? Is it dataframe$column? dataframe[somevariableNamehere]? How do you actually say it's a row? The pseudocode "function(row) dostuff": how would that actually look?
Ł Łaniewski-Wołłk about 8 years

It's faster to access a straight list than a data.frame.
Ł Łaniewski-Wołłk about 8 years

Just realized it's even faster to make the same thing with double lapply: rows = function(x) lapply(seq_len(nrow(x)), function(i) lapply(x,function(c) c[i]))
Aaron McDaid about 8 years

So the inner lapply iterates over the columns of the entire dataset x, giving each column the name c, and then extracting the ith entry from that column vector. Is this correct?
Steve Pitchers over 7 years

Very nice! In my case, I had to convert from "factor" values to the underlying value: wellName <- as.character(well$name).
pwilcox about 7 years

@Mike, change dostuff in this answer to str(row) You'll see multiple lines printed in the console beginning with " 'data.frame': 1 obs of x variables." But be careful, changing dostuff to row does not return a data.frame object for the outer function as a whole. Instead it returns a list of one row data-frames.
stephanmg over 3 years

Not everything should be vectorized. But in this case it would make sense I guess.
Ken Williams over 3 years

I fixed the issue noted by sds and Jim with an edit.