Mutate multiple columns in a dataframe
Solution 1
Update (as of the 18th of March, 2019)
There has been a change. We have been using funs()
in .funs
(funs(name = f(.)
). But this is changed (dplyr 0.8.0 above). Instead of funs
, now we use list
(list(name = ~f(.))
). See the following new examples.
bankdata %>%
mutate_at(.funs = list(toAsset = ~./totass), .vars = vars(cash:loans))
bankdata %>%
mutate_at(.funs = list(toAsset = ~./totass), .vars = c("cash", "bond", "loans"))
bankdata %>%
mutate_at(.funs = list(toAsset = ~./totass), .vars = 5:7)
Update (as of the 2nd of December, 2017)
Since I answered this question, I have realized that some SO users have been checking this answer. The dplyr package has changed since then. Therefore, I leave the following update. I hope this will help some R users to learn how to use mutate_at()
.
mutate_each()
is now deprecated. You want to use mutate_at()
, instead. You can specify which columns you want to apply your function in .vars
. One way is to use vars()
. Another is to use a character vector containing column names, which you want to apply your custom function in .fun
. The other is to specify columns with numbers (e.g., 5:7 in this case). Note that, if you use a column for group_by()
, you need to change the numbers of column positions. Have a look of this question.
bankdata %>%
mutate_at(.funs = funs(toAsset = ./totass), .vars = vars(cash:loans))
bankdata %>%
mutate_at(.funs = funs(toAsset = ./totass), .vars = c("cash", "bond", "loans"))
bankdata %>%
mutate_at(.funs = funs(toAsset = ./totass), .vars = 5:7)
# bankname bankid year totass cash bond loans cash_toAsset bond_toAsset loans_toAsset
#1 Bank A 1 1881 244789 7250 20218 29513 0.02961734 0.082593581 0.12056506
#2 Bank B 2 1881 195755 10243 185151 2800 0.05232561 0.945830247 0.01430359
#3 Bank C 3 1881 107736 13357 177612 NA 0.12397899 1.648585431 NA
#4 Bank D 4 1881 170600 35000 20000 5000 0.20515826 0.117233294 0.02930832
#5 Bank E 5 1881 32000000 351266 314012 NA 0.01097706 0.009812875 NA
I purposely gave toAsset
to the custom function in .fun
since this will help me to arrange new column names. Previously, I used rename()
. But I think it is much easier to clean up column names with gsub()
in the present approach. If the above result is saved as out
, you want to run the following code in order to remove _
in the column names.
names(out) <- gsub(names(out), pattern = "_", replacement = "")
Original answer
I think you can save some typing in this way with dplyr. The downside is you overwrite cash, bond, and loans.
bankdata %>%
group_by(bankname) %>%
mutate_each(funs(whatever = ./totass), cash:loans)
# bankname bankid year totass cash bond loans
#1 Bank A 1 1881 244789 0.02961734 0.082593581 0.12056506
#2 Bank B 2 1881 195755 0.05232561 0.945830247 0.01430359
#3 Bank C 3 1881 107736 0.12397899 1.648585431 NA
#4 Bank D 4 1881 170600 0.20515826 0.117233294 0.02930832
#5 Bank E 5 1881 32000000 0.01097706 0.009812875 NA
If you prefer your expected outcome, I think some typing is necessary. The renaming part seems to be something you gotta do.
bankdata %>%
group_by(bankname) %>%
summarise_each(funs(whatever = ./totass), cash:loans) %>%
rename(cashtoAsset = cash, bondtoAsset = bond, loanstoAsset = loans) -> ana;
ana %>%
merge(bankdata,., by = "bankname")
# bankname bankid year totass cash bond loans cashtoAsset bondtoAsset loanstoAsset
#1 Bank A 1 1881 244789 7250 20218 29513 0.02961734 0.082593581 0.12056506
#2 Bank B 2 1881 195755 10243 185151 2800 0.05232561 0.945830247 0.01430359
#3 Bank C 3 1881 107736 13357 177612 NA 0.12397899 1.648585431 NA
#4 Bank D 4 1881 170600 35000 20000 5000 0.20515826 0.117233294 0.02930832
#5 Bank E 5 1881 32000000 351266 314012 NA 0.01097706 0.009812875 NA
Solution 2
Apply
and cbind
cbind(bankdata,apply(bankdata[,5:7],2, function(x) x/bankdata$totass))
names(bankdata)[8:10] <- paste0(names(bankdata)[5:7], 'toAssest’)
> bankdata
bankname bankid year totass cash bond loans cashtoAssest bondtoAssest loanstoAssest
1 Bank A 1 1881 244789 7250 20218 29513 0.02961734 0.082593581 0.12056506
2 Bank B 2 1881 195755 10243 185151 2800 0.05232561 0.945830247 0.01430359
3 Bank C 3 1881 107736 13357 177612 NA 0.12397899 1.648585431 NA
4 Bank D 4 1881 170600 35000 20000 5000 0.20515826 0.117233294 0.02930832
5 Bank E 5 1881 32000000 351266 314012 NA 0.01097706 0.009812875 NA
Solution 3
Here is a data.table
solution.
library(data.table)
setDT(bankdata)
bankdata[, paste0(names(bankdata)[5:7], "toAsset") :=
lapply(.SD, function(x) x/totass), .SDcols=5:7]
bankdata
# bankname bankid year totass cash bond loans cashtoAsset bondtoAsset loanstoAsset
# 1: Bank A 1 1881 244789 7250 20218 29513 0.02961734 0.082593581 0.12056506
# 2: Bank B 2 1881 195755 10243 185151 2800 0.05232561 0.945830247 0.01430359
# 3: Bank C 3 1881 107736 13357 177612 0 0.12397899 1.648585431 0.00000000
# 4: Bank D 4 1881 170600 35000 20000 5000 0.20515826 0.117233294 0.02930832
# 5: Bank E 5 1881 32000000 351266 314012 0 0.01097706 0.009812875 0.00000000
Solution 4
This is one of the big downsides of dplyr
: as far as I'm aware, there is no straightforward way to use it programmatically rather than interactively without some kind of "hack" like the deplorable eval(parse(text=foo))
idiom.
The simplest approach is the same as in the Stata method, but string manipulation is a little more verbose in R than in Stata (or in any other scripting language, for that matter).
for (x in c("cash", "bond", "loans")) {
bankdata[sprintf("%stoAsset", x)] <- bankdata[x] / bankdata$totass # or, equivalently, bankdata["totass"] for a consistent "look"
## can also replace `sprintf("%stoAsset", x)` with `paste0(c(x, "toAsset"))` or even `paste(x, "toAsset", collapse="") depending on what makes more sense to you.
}
To make the whole thing more Stata-like, you can wrap the whole thing in within
like so:
bankdata <- within(bankdata, for (x in c("cash", "bond", "loans")) {
assign(x, get(x) / totass)
})
but this entails some hacking with the get
and assign
functions which aren't as safe to use in general, although in your case it's probably not a big deal. I wouldn't recommend trying similar tricks with dplyr
, for instance, because dplyr
abuses R's nonstandard evaluation features and it's probably more trouble than it's worth. For a faster and probably superior solution, check out the data.table
package which (I think) would allow you to use the Stata-like looping syntax but with dplyr
-like speed. Check out the package vignette on CRAN.
Also, are you really, really sure you want to reassign NA
entries to 0?
![H Park](https://lh5.googleusercontent.com/-p5BhwYa8Quw/AAAAAAAAAAI/AAAAAAAAAK8/W16a70MLWeA/photo.jpg?sz=256)
Comments
-
H Park almost 2 years
I have a data set that looks like this.
bankname bankid year totass cash bond loans Bank A 1 1881 244789 7250 20218 29513 Bank B 2 1881 195755 10243 185151 2800 Bank C 3 1881 107736 13357 177612 NA Bank D 4 1881 170600 35000 20000 5000 Bank E 5 1881 3200000 351266 314012 NA
and I want to compute some ratios based on bank balance sheets. and I want the dataset to look like this
bankname bankid year totass cash bond loans CashtoAsset BondtoAsset LoanstoAsset Bank A 1 1881 2447890 7250 202100 951300 0.002 0.082 0.388 Bank B 2 1881 195755 10243 185151 2800 0.052 0.945 0.014 Bank C 3 1881 107736 13357 177612 NA 0.123 1.648585431 NA Bank D 4 1881 170600 35000 20000 5000 0.205 0.117 0.029 Bank E 5 1881 32000000 351266 314012 NA 0.0109 0.009 NA
Here is the code to replicate the data
bankname <- c("Bank A","Bank B","Bank C","Bank D","Bank E") bankid <- c( 1, 2, 3, 4, 5) year<- c( 1881, 1881, 1881, 1881, 1881) totass <- c(244789, 195755, 107736, 170600, 32000000) cash<-c(7250,10243,13357,35000,351266) bond<-c(20218,185151,177612,20000,314012) loans<-c(29513,2800,NA,5000,NA) bankdata<-data.frame(bankname, bankid,year,totass, cash, bond, loans)
First, I got rid of NAs in balance sheets.
cols <- c("totass", "cash", "bond", "loans") bankdata[cols][is.na(bankdata[cols])] <- 0
Then I compute ratios
library(dplyr) bankdata<-mutate(bankdata,CashtoAsset = cash/totass) bankdata<-mutate(bankdata,BondtoAsset = bond/totass) bankdata<-mutate(bankdata,loanstoAsset =loans/totass)
But, instead of computing all these ratios line by line, I want to create a look to do this all at once. In Stata, I would do
foreach x of varlist cash bond loans { by bankid: gen `x'toAsset = `x'/ totass }
How would I do this?
-
H Park over 9 yearsHi, I am trying all different options posted here. When I tried your codes, I got. `Error: object 'ana' not found. Would you explain to me what is going on? Thanks.
-
jazzurro over 9 years@HPark I am assigning an output to the object, ana in the piping process. If this approach does not work for you, you could do,
ana <- bankdata %>% group_by(bankname) %>% summarise_each(funs(whatever = ./totass), cash:loans) %>% rename(cashtoAsset = cash, bondtoAsset = bond, loanstoAsset = loans); ana %>% merge(bankdata,., by = "bank name")
-
Skurup over 5 yearsIn the code cbind(bankdata,apply(bankdata[,5:7],2, function(x) x/bankdata$totass)), what does the 2 signify?
-
hvollmeier over 5 years@Skurup, "2" is the "margin" argument within the
apply
function. It means that the function will be applied to the columns vector. "1" instead would apply the function to rows. ( check?apply
) -
Lukas about 2 yearsReminder from 2022: all of the above have been deprecated; the state of the art would be
dplyr::mutate(dplyr::across(.cols = c(cash:loans), .fns=~.x/totass, .names="{.col}ToAsset"))
. Notable changes: there is no need to uservars()
anymore, a simplec()
will do; and there is no need to overwrite columns anymore.