Dplyr summarize multiple columns

11/28/2023

My question involves summing up values across multiple columns of a data frame and creating a new column corresponding to this summation using dplyr. The data entries in the columns are binary (0 or 1). I am thinking of a row-wise analog of the summarise_each or mutate_each function of dplyr. Below is a minimal example of the data frame:

library(dplyr)

I could use something like:

df %>% mutate(sumrow = x1 + x2 + x3 + x4 + x5)

but this would involve writing out the names of each of the columns. In addition, the column names change at different iterations of the loop in which I want to implement this operation, so I would like to try to avoid having to give any column names. Any assistance would be greatly appreciated.

In newer versions of dplyr you can use rowwise() along with c_across() to perform row-wise aggregation for functions that do not have specific row-wise variants. If a row-wise variant does exist (e.g. rowSums, rowMeans), it should be faster than using rowwise().

Since rowwise() is just a special form of grouping and changes the way verbs work, you'll likely want to pipe the result to ungroup() after doing your row-wise operation.

To select a range of columns by name:

df %>%
  rowwise() %>%
  mutate(sumrange = sum(c_across(x1:x5), na.rm = TRUE))
# %>% ungroup() # you'll likely want to ungroup after using rowwise()

To select by type:

df %>%
  rowwise() %>%
  mutate(sumnumeric = sum(c_across(where(is.numeric)), na.rm = TRUE))

You can use any number of tidy selection helpers like starts_with, ends_with, contains, etc.:

df %>%
  rowwise() %>%
  mutate(sum_startswithx = sum(c_across(starts_with("x")), na.rm = TRUE))

rowwise() will work for any summary function. However, in your specific case a row-wise variant exists (rowSums), so you can do the following (note the use of pick() instead of c_across()), which will be faster:

df %>%
  mutate(sumrow = rowSums(pick(x1:x5), na.rm = TRUE))

rowwise() makes a pipe chain very readable and works fine for smaller data frames. However, it is inefficient. For this example, the row-wise variant rowSums is much faster:

library(microbenchmark)
microbenchmark(
  df %>% rowwise() %>% mutate(sumrange = sum(c_across(x1:x5), na.rm = TRUE)),
  df %>% mutate(sumrow = rowSums(pick(x1:x5), na.rm = TRUE))
)

Large data frame without a row-wise variant function

If there isn't a row-wise variant for your function and you have a large data frame, consider a long format, which is more efficient than rowwise(). Though there are probably faster non-tidyverse options, here is a tidyverse option (using tidyr::pivot_longer):

library(tidyr)
... %>%
  pivot_longer(cols = starts_with("x")) %>%
  ...
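To check that the rowwise()/c_across() route and the rowSums()/pick() route give the same totals, here is a minimal sketch. The original example data isn't shown, so the values in df below are hypothetical binary columns x1 to x5 with a few NAs. The sketch assumes dplyr 1.1.0 or later, because pick() was added in that release.

```r
library(dplyr)

# Hypothetical example data (not the original): binary columns x1..x5 with a few NAs
df <- data.frame(
  x1 = c(1, 0, NA),
  x2 = c(1, 1, 0),
  x3 = c(0, NA, 1),
  x4 = c(1, 0, 0),
  x5 = c(0, 1, 1)
)

# General approach: rowwise() + c_across() works with any summary function
res_rowwise <- df %>%
  rowwise() %>%
  mutate(sumrange = sum(c_across(x1:x5), na.rm = TRUE)) %>%
  ungroup()  # drop the row-wise grouping again

# Faster approach: the vectorised row-wise variant applied to the picked columns
res_rowsums <- df %>%
  mutate(sumrow = rowSums(pick(x1:x5), na.rm = TRUE))

# Both approaches skip NAs and should produce identical per-row totals
stopifnot(isTRUE(all.equal(unname(res_rowwise$sumrange),
                           unname(res_rowsums$sumrow))))
print(res_rowsums)
```

For this made-up data, the row totals are 3, 2 and 2. Only the rowSums() version avoids evaluating sum() once per row, which is why it scales better.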
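Only a fragment of the long-format pipeline survives above. Here is a minimal sketch of how the full pipeline could look, under several assumptions. It uses sd() as the summary function, since base R has no row-wise sd. It runs on the same kind of hypothetical data as before. The helper column rn, the output name row_sd and the join-back step are illustrative choices rather than the original answer's exact code.

```r
library(dplyr)
library(tidyr)

# Hypothetical example data (not the original): binary columns x1..x5 with a few NAs
df <- data.frame(
  x1 = c(1, 0, NA),
  x2 = c(1, 1, 0),
  x3 = c(0, NA, 1),
  x4 = c(1, 0, 0),
  x5 = c(0, 1, 1)
)

# Tag each row so the long-format results can be matched back to it
df_rn <- df %>% mutate(rn = row_number())

# One row per (original row, x column) pair, then an ordinary grouped summary
row_stats <- df_rn %>%
  pivot_longer(cols = starts_with("x")) %>%  # creates columns `name` and `value`
  group_by(rn) %>%
  summarize(row_sd = sd(value, na.rm = TRUE), .groups = "drop")

# Join the per-row statistic back onto the wide data and drop the helper column
result <- df_rn %>%
  left_join(row_stats, by = "rn") %>%
  select(-rn)

print(result)
```

The grouped summarize() runs one vectorised pass per group rather than re-entering R for every row, which is the efficiency argument for the long format. Joining on rn, instead of binding columns by position, keeps the result aligned even if the summary comes back in a different order.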
