Sum rows in data.frame or matrix
Solution 1
You can use rowSums:
rowSums(data)
This should give you what you want.
Solution 2
The rowSums function (as Greg mentions) will do what you want, but you are mixing subsetting techniques in your code. Do not use "$" together with "[]"; your code should look more like:
data$new <- rowSums( data[,43:167] )
If you want to use a function other than sum, then look at ?apply for applying general functions across rows or columns.
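As a minimal sketch of both ideas, the example below uses a small made-up data frame (the column names m1 to m5 and the range 2:4 are assumptions standing in for the real marker columns 43:167). It sums a range of columns per row with rowSums, then applies a different row-wise function with apply.

```r
# Hypothetical toy data: 4 observations, 5 binary marker columns
data <- data.frame(
  m1 = c(1, 0, 1, 0),
  m2 = c(1, 1, 0, 0),
  m3 = c(0, 1, 1, 0),
  m4 = c(1, 1, 1, 0),
  m5 = c(0, 0, 1, 1)
)

# Sum columns 2 to 4 for each row (stand-in for columns 43:167)
data$new <- rowSums(data[, 2:4])

# Any other row-wise function can go through apply(); here, the row maximum
data$row_max <- apply(data[, 2:4], 1, max)

# Sort observations so those with the most markers come first
sorted <- data[order(data$new, decreasing = TRUE), ]
print(sorted)
```

The same pattern should carry over to the real data by replacing 2:4 with 43:167.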
Solution 3
I came here hoping to find a way to get the sum across all columns in a data table and ran into issues implementing the above solutions. One way to add a column with the sum across all columns is to use the cbind function:
cbind(data, total = rowSums(data))
This method adds a total column to the data and avoids the alignment issue that arises when trying to sum across ALL columns using the above solutions (see the post below for a discussion of this issue).
Adding a new column to matrix error
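A minimal sketch of the cbind approach on a plain matrix, where assigning a column with "$" is not available. The matrix m and the name m_total are made up for illustration.

```r
# Hypothetical 2 x 3 matrix; rows are (1, 3, 5) and (2, 4, 6)
m <- matrix(1:6, nrow = 2)

# Bind the row sums on as an extra column named "total"
m_total <- cbind(m, total = rowSums(m))
print(m_total)
```

Because cbind returns a new object, the original m is left unchanged.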
Solution 4
Here is an example comparing the elapsed time of each method:
mat = matrix(runif(4e6), ncol = 50)
Comparison between the apply function and rowSums:
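library(microbenchmark)

# Row sums via the general-purpose apply()
apply_func <- function(x) {
  apply(x, 1, sum)
}
# Row sums via the specialised rowSums()
r_sum <- function(x) {
  rowSums(x)
}
# Compare the methods
microbenchmark(
  apply_func = apply_func(mat),
  r_sum = r_sum(mat), times = 100
)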
------ output -- in milliseconds --------
expr min lq mean median uq max neval
apply_func 207.84661 260.34475 280.14621 279.18782 294.85119 354.1821 100
r_sum 10.76534 11.53194 13.00324 12.72792 14.34045 16.9014 100
As you can see, the mean time for rowSums is about 21 times smaller than the mean time for apply. The difference in elapsed time may be even larger if the matrix has many columns.
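Before relying on the faster function, it can be worth checking that both approaches return the same values. A small sketch, using a made-up 40 x 5 matrix rather than the benchmark matrix:

```r
# Hypothetical small matrix so the check runs instantly
set.seed(42)
small <- matrix(runif(200), ncol = 5)

via_apply <- apply(small, 1, sum)
via_rowsums <- rowSums(small)

# all.equal() compares within a numeric tolerance, which is safer than ==
# for floating-point sums
same <- isTRUE(all.equal(via_apply, via_rowsums))
print(same)
```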
Solution 5
You can also use the adorn_totals function from the janitor package. You can sum the columns or the rows depending on the value you give to the where argument.
Example:
library(magrittr)  # provides the %>% pipe

tibble::tibble(
  a = 10:20,
  b = 55:65,
  c = 2010:2020,
  d = c(LETTERS[1:11])) %>%
  janitor::adorn_totals(where = "col") %>%  # "col" adds a Total column of row sums
  tibble::as_tibble()
Result:
# A tibble: 11 x 5
a b c d Total
<int> <int> <int> <chr> <dbl>
1 10 55 2010 A 2065
2 11 56 2011 B 2067
3 12 57 2012 C 2069
4 13 58 2013 D 2071
5 14 59 2014 E 2073
6 15 60 2015 F 2075
7 16 61 2016 G 2077
8 17 62 2017 H 2079
9 18 63 2018 I 2081
10 19 64 2019 J 2083
11 20 65 2020 K 2085
Admin
Updated on January 06, 2022
Comments
-
Admin over 2 years
I have a very large dataframe with rows as observations and columns as genetic markers. I would like to create a new column that contains the sum of a select number of columns for each observation using R.
If I have 200 columns and 100 rows, then I would like to create a new column that has 100 rows with the sum of, say, columns 43 through 167. The columns contain either 1 or 0. With the new column that contains the sum of each row, I will be able to sort the individuals who have the most genetic markers.
I feel it is something close to:
data$new=sum(data$[,43:167])
-
Marek over 13 years
And for the OP's problem:
data$new <- rowSums(data[43:167])
-
Augustin over 8 years
To save someone's time, perhaps: avoid confusion with the function rowsum, which does something else!
-
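To illustrate the difference on a made-up matrix (x and the grouping vector are assumptions for this sketch): rowSums adds across the columns of each row, while rowsum adds rows together within groups, column by column.

```r
# Hypothetical 3 x 2 matrix; rows are (1, 4), (2, 5), (3, 6)
x <- matrix(1:6, nrow = 3)

# rowSums: one total per row
per_row <- rowSums(x)

# rowsum: rows collapsed by group, one row of column totals per group
per_group <- rowsum(x, group = c("a", "a", "b"))

print(per_row)
print(per_group)
```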
munmunbb over 6 years
I am not sure why I got this error: Error in rowSums(incomeData) : 'x' must be numeric
-
Greg Snow over 6 years
@munmunbb, you received that error because incomeData is not numeric. Use something like str(incomeData) to see what it is, then possibly convert it to a numeric matrix.
-
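A sketch of how that diagnosis and fix might look. The contents of incomeData here are invented for illustration; the real data will differ.

```r
# Hypothetical data: one text column, one numeric column stored as text
incomeData <- data.frame(
  name = c("a", "b"),
  income = c("100", "250"),
  bonus = c(10, 20),
  stringsAsFactors = FALSE
)

# Inspect the column types; rowSums(incomeData) would fail on this as-is
str(incomeData)

# Option 1: sum only the columns that are already numeric
num_cols <- sapply(incomeData, is.numeric)
totals_numeric_only <- rowSums(incomeData[, num_cols, drop = FALSE])

# Option 2: convert the text column to numeric first, then sum
incomeData$income <- as.numeric(incomeData$income)
totals <- rowSums(incomeData[, c("income", "bonus")])
print(totals)
```

Which option fits depends on whether the non-numeric columns are genuinely text or just numbers stored as strings.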
Paul Rougieux over 5 years
See also dplyr::mutate_all.
-
Hamzah over 2 years
The main goal is the idea, regardless of the dataset I am working on; what applies to a small matrix will usually apply to a large benchmark as well.
-
Hamzah over 2 years
Thanks for your suggestion. I ran it with times = 100.