For each row in an R dataframe
Solution 1
You can try this, using the apply() function:
> d
  name plate value1 value2
1    A    P1      1    100
2    B    P2      2    200
3    C    P3      3    300
> f <- function(x, output) {
    # apply() hands each row over as a character vector: x[1] is name, x[2] plate, ...
    wellName <- x[1]
    plateName <- x[2]
    wellID <- 1  # placeholder; replace with your own lookup
    print(paste(wellID, x[3], x[4], sep=","))
    cat(paste(wellID, x[3], x[4], sep=","), file = output, append = TRUE, fill = TRUE)
  }
> apply(d, 1, f, output = 'outputfile')
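To see what apply() actually passes to f, the sketch below rebuilds d and inspects each row. Because apply() first coerces a data frame with as.matrix(), a mix of character and numeric columns becomes a character matrix, so each row arrives as a named character vector. Indexing by column name keeps the code working if the columns are reordered, but numeric values have to be converted back with as.numeric().

```r
# Rebuild the example data frame from above so this runs on its own
d <- data.frame(name = c("A", "B", "C"), plate = c("P1", "P2", "P3"),
                value1 = c(1, 2, 3), value2 = c(100, 200, 300))

# apply() converts d to a character matrix, so every row is a character vector
row_classes <- apply(d, 1, function(x) class(x))

# Look columns up by name and turn the numbers back into numbers
totals <- apply(d, 1, function(x) as.numeric(x["value1"]) + as.numeric(x["value2"]))

print(row_classes)
print(totals)
```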
Solution 2
You can use the by() function:
by(dataFrame, seq_len(nrow(dataFrame)), function(row) dostuff)
But iterating over the rows directly like this is rarely what you want; you should try to vectorize instead. Can I ask what the actual work in the loop is doing?
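As a concrete version of the by() call above, the sketch below uses a small made-up data frame. Grouping by seq_len(nrow(d)) puts each row in its own group, so the function receives a one-row data.frame whose columns keep their original types (unlike with apply()). When every call returns a single value, by() returns an array-like "by" object, which as.vector() flattens to a plain vector.

```r
d <- data.frame(name = c("A", "B", "C"), value1 = c(1, 2, 3), value2 = c(100, 200, 300))

# Each group is one row index, so each call sees a one-row data.frame
row_is_df <- as.vector(by(d, seq_len(nrow(d)), function(row) is.data.frame(row)))

# Columns keep their numeric type, so no conversion is needed
res <- by(d, seq_len(nrow(d)), function(row) row$value1 + row$value2)
sums <- as.vector(res)  # drop the "by" class and dimnames

print(sums)
```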
Solution 3
First, Jonathan's point about vectorizing is correct. If your getWellID() function is vectorized, then you can skip the loop and just use cat or write.csv:
write.csv(data.frame(wellid=getWellID(well$name, well$plate),
value1=well$value1, value2=well$value2), file=outputFile)
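A minimal sketch of the vectorized route, assuming getWellID() accepts whole vectors. The getWellID() defined here is hypothetical (it just pastes plate and well name together) and only stands in for the real lookup; the sample data are invented, and the output goes to a temporary file.

```r
# Hypothetical vectorized lookup, for illustration only
getWellID <- function(name, plate) paste(plate, name, sep = "_")

well <- data.frame(name = c("H1", "H2"), plate = c("plate67", "plate67"),
                   value1 = c(1.5, 2.5), value2 = c(10, 20))

outputFile <- tempfile(fileext = ".csv")

# One call covers every row; there is no explicit loop
write.csv(data.frame(wellid = getWellID(well$name, well$plate),
                     value1 = well$value1, value2 = well$value2),
          file = outputFile, row.names = FALSE)

written <- read.csv(outputFile)
print(written)
```

row.names = FALSE stops write.csv() from writing the row index as an extra unnamed first column; leave it out if you want that column.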
If getWellID() isn't vectorized, then Jonathan's recommendation of using by or knguyen's suggestion of apply should work.
Otherwise, if you really want to use a for loop, you can do something like this:
for(i in seq_len(nrow(dataFrame))) {
row <- dataFrame[i,]
# do stuff with row
}
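If you do use a for loop, a common pattern is to preallocate the result and fill it by index. The sketch below (with an invented dataFrame) uses seq_len(nrow(dataFrame)) rather than 1:nrow(dataFrame), so the loop body is skipped entirely when the data frame has no rows.

```r
dataFrame <- data.frame(name = c("H1", "H2", "H3"), value1 = c(1, 2, 3))

# Preallocate instead of growing the result inside the loop
out <- character(nrow(dataFrame))
for (i in seq_len(nrow(dataFrame))) {  # yields nothing when there are 0 rows
  row <- dataFrame[i, ]                # a one-row data.frame
  out[i] <- paste(row$name, row$value1 * 10, sep = ",")
}
print(out)
```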
You can also try the foreach package, although it requires you to become familiar with its syntax. Here's a simple example:
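library(foreach)
library(iterators)  # provides iter()
d <- data.frame(x=1:10, y=rnorm(10))
# %do% runs sequentially; %dopar% needs a registered parallel backend (e.g. doParallel)
s <- foreach(row=iter(d, by='row'), .combine=rbind) %do% row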
A final option is to use a function from the plyr package, in which case the convention will be very similar to the apply function.
library(plyr)
ddply(dataFrame, .(x), function(sub) {  # sub: the rows of dataFrame sharing one value of x
  # do stuff
})
Solution 4
I think the best way to do this with basic R is:
for( i in rownames(df) )
print(df[i, "column1"])
The advantage over the for( i in 1:nrow(df))-approach is that you do not get into trouble if df is empty and nrow(df)=0.
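The difference is easy to check on an empty data frame (the single column1 column here is just a placeholder). 1:nrow(df) counts down from 1 to 0 and yields two indices, while both seq_len(nrow(df)) and rownames(df) are empty, so a loop over them never runs.

```r
df <- data.frame(column1 = character(0))  # a data frame with 0 rows

print(1:nrow(df))         # 1:0 is c(1, 0): a loop over it would run twice
print(seq_len(nrow(df)))  # integer(0): the loop body never runs
print(rownames(df))       # character(0): also safe

visited <- 0
for (i in rownames(df)) visited <- visited + 1
print(visited)
```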
Solution 5
I use this simple utility function:
rows = function(tab) lapply(
  seq_len(nrow(tab)),
  function(i) unclass(tab[i, , drop = FALSE])  # one row as a plain named list
)
Or a faster, less clear form:
rows = function(x) lapply(seq_len(nrow(x)), function(i) lapply(x,"[",i))
This function just splits a data.frame into a list of rows. Then you can write a normal for loop over this list:
tab = data.frame(x = 1:3, y=2:4, z=3:5)
for (A in rows(tab)) {
print(A$x + A$y * A$z)
}
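For a calculation like the one above, the loop is not actually needed: the same arithmetic works on whole columns. The sketch compares the row-by-row result from rows() (the faster lapply form) with the vectorized expression on the same tab.

```r
rows <- function(x) lapply(seq_len(nrow(x)), function(i) lapply(x, "[", i))

tab <- data.frame(x = 1:3, y = 2:4, z = 3:5)

# Row by row, using the list of rows
looped <- numeric(0)
for (A in rows(tab)) looped <- c(looped, A$x + A$y * A$z)

# The same computation, vectorized over whole columns
vectorized <- tab$x + tab$y * tab$z

print(looped)      # 7 14 23
print(vectorized)  # 7 14 23
```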
Your code from the question will work with a minimal modification:
for (well in rows(dataFrame)) {
  wellName <- well$name    # string like "H1"
  plateName <- well$plate  # string like "plate67"
  wellID <- getWellID(wellName, plateName)
  # append = TRUE (plus "\n") so each row adds a new line instead of overwriting the file
  cat(paste(wellID, well$value1, well$value2, sep=","), "\n", sep="", file=outputFile, append=TRUE)
}
Carl Coryell-Martin
Updated on December 15, 2020
Comments
-
Carl Coryell-Martin over 3 years
I have a dataframe, and for each row in that dataframe I have to do some complicated lookups and append some data to a file.
The dataFrame contains scientific results for selected wells from 96-well plates used in biological research, so I want to do something like:
for (well in dataFrame) {
  wellName <- well$name    # string like "H1"
  plateName <- well$plate  # string like "plate67"
  wellID <- getWellID(wellName, plateName)
  cat(paste(wellID, well$value1, well$value2, sep=","), file=outputFile)
}
In my procedural world, I'd do something like:
for (row in dataFrame) {
  # look up stuff using data from the row
  # write stuff to the file
}
What is the "R way" to do this?
-
Dirk Eddelbuettel over 14 years
What is your question here? A data.frame is a two-dimensional object and looping over the rows is a perfectly normal way of doing things as rows are commonly sets of 'observations' of the 'variables' in each column.
-
Carl Coryell-Martin over 14 years
What I end up doing is: for (index in 1:nrow(dataFrame)) { row = dataFrame[index, ]; # do stuff with the row }, which never seemed very pretty to me.
-
Shane over 14 years
Does getWellID call a database or anything? Otherwise, Jonathan is probably right and you could vectorize this.
-
Carl Coryell-Martin over 14 years
Shane, thank you. I'm not sure how to write a vectorized getWellID. What I need to do right now is to dig into an existing list of lists to look it up or pull it out of a database.
-
Shane over 14 years
Feel free to post the getWellID question (i.e. can this function be vectorized?) separately, and I'm sure I (or someone else) will answer it.
-
Jonathan Chang over 14 years
Even if getWellID is not vectorized, I think you should go with this solution, and replace getWellID with mapply(getWellID, well$name, well$plate).
-
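A sketch of that mapply() suggestion, assuming a getWellID() that only handles one name/plate pair at a time. Both getWellID() and the list of lists it searches are hypothetical stand-ins for the real lookup. mapply() calls the function once per position, pairing well$name[i] with well$plate[i].

```r
# Hypothetical non-vectorized lookup into a list of lists (illustration only)
lookup <- list(plate67 = list(H1 = 101L, H2 = 102L))
getWellID <- function(name, plate) lookup[[plate]][[name]]

well <- data.frame(name = c("H1", "H2"), plate = c("plate67", "plate67"))

# One call per row: getWellID(well$name[i], well$plate[i])
ids <- mapply(getWellID, well$name, well$plate)
print(ids)
```

Because well$name is an unnamed character vector, mapply() uses it to name the result; pass USE.NAMES = FALSE if you do not want that. If the columns are factors (the data.frame() default before R 4.0.0), convert them with as.character() first, since indexing with a factor uses its integer codes rather than its labels.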
Shane over 14 years
Even if you pull it from a database, you can pull them all at once and then filter the result in R; that will be faster than an iterative function.
-
Darren Cook over 12 years
Be careful, as the dataframe is converted to a matrix, and what you end up with (x) is a vector. This is why the above example has to use numeric indexes; the by() approach gives you a data.frame, which makes your code more robust.
-
Josh Bode over 11 years
+1 for foreach - I'm going to use the hell out of that one.
-
sds about 11 years
This will not work well if the data frame has 0 rows, because 1:0 is c(1, 0), not an empty sequence.
-
Jim almost 10 years
Easy fix for the 0-row case is to use seq_len(): insert seq_len(nrow(dataFrame)) in place of 1:nrow(dataFrame).
-
Zahy almost 10 years
Did not work for me. The apply function treated every x given to f as a character value and not a row.
-
founddrama over 9 years
Note too that you can refer to the columns by name. So wellName <- x[1] could also be wellName <- x["name"].
-
SmallChess over 8 years
When Darren mentioned robust, he meant something like changing the order of the columns: this answer (which uses numeric indexes) would break, whereas the one with by() would still work.
-
uh_big_mike_boi about 8 years
How do you actually implement (row)? Is it dataframe$column? dataframe[somevariableNamehere]? How do you actually say it's a row? The pseudocode "function(row) dostuff": how would that actually look?
-
Ł Łaniewski-Wołłk about 8 years
It's faster to access a plain list than a data.frame.
-
Ł Łaniewski-Wołłk about 8 years
Just realized it's even faster to make the same thing with double lapply: rows = function(x) lapply(seq_len(nrow(x)), function(i) lapply(x,function(c) c[i]))
-
Aaron McDaid about 8 years
So the inner lapply iterates over the columns of the entire dataset x, giving each column the name c, and then extracting the ith entry from that column vector. Is this correct?
-
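That reading is consistent with how lapply() treats a data frame: as a list of its columns. The small check below (with an invented two-column x) runs the inner lapply() for one index i and then the full double-lapply rows() helper from the comment above.

```r
x <- data.frame(a = c(10, 20, 30), b = c("p", "q", "r"))
i <- 2

# Inner lapply: visit each column c of x and take its i-th element
row_i <- lapply(x, function(c) c[i])
str(row_i)

# Outer lapply: repeat that for every row index
rows <- function(x) lapply(seq_len(nrow(x)), function(i) lapply(x, function(c) c[i]))
all_rows <- rows(x)
```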
Steve Pitchers over 7 years
Very nice! In my case, I had to convert from "factor" values to the underlying value: wellName <- as.character(well$name).
-
pwilcox about 7 years
@Mike, change dostuff in this answer to str(row). You'll see multiple lines printed in the console beginning with " 'data.frame': 1 obs of x variables." But be careful: changing dostuff to row does not return a data.frame object for the outer function as a whole. Instead, it returns a list of one-row data frames.
-
stephanmg over 3 years
Not everything should be vectorized. But in this case it would make sense I guess.
-
Ken Williams over 3 years
I fixed the issue noted by sds and Jim with an edit.