I guess part of it is that there’s some ‘non-locality’ in the pipeline: the grouping can be relatively distant from the operation that acts on the grouped data. Similarly, you have to worry about things like grouping data that is already grouped.
I quite like the PRQL solution, which is ‘structured grouping’: you have to delimit the part of the pipeline that operates on grouped data. It might still lead to bad edits in complex queries, though.
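To make the ‘structured grouping’ idea concrete, here is a rough sketch in plain Python (not PRQL or dplyr syntax). The names `group` and `top_earner` and the sample rows are made up for illustration. The point is only that the per-group pipeline is passed in explicitly, so grouping cannot leak into later steps:

```python
from collections import defaultdict
from typing import Callable

Row = dict
Rows = list[Row]


def group(rows: Rows, key: str, pipeline: Callable[[Rows], Rows]) -> Rows:
    """Run `pipeline` on each group separately, then return flat, ungrouped rows."""
    buckets: dict = defaultdict(list)
    for row in rows:
        buckets[row[key]].append(row)
    out: Rows = []
    # Dicts keep insertion order, so groups come out in first-seen key order.
    for members in buckets.values():
        out.extend(pipeline(members))
    return out


def top_earner(rows: Rows) -> Rows:
    """Hypothetical per-group step: keep the single highest-paid row."""
    return sorted(rows, key=lambda r: r["salary"], reverse=True)[:1]


# Made-up sample data.
employees = [
    {"name": "ann", "dept": "eng", "salary": 120},
    {"name": "bob", "dept": "eng", "salary": 100},
    {"name": "cat", "dept": "ops", "salary": 90},
    {"name": "dan", "dept": "ops", "salary": 95},
]

# The grouped part of the pipeline is delimited by the call itself;
# whatever comes after `group(...)` sees ordinary ungrouped rows.
result = group(employees, "dept", top_earner)
names = [r["name"] for r in result]
print(names)
```

Since the result is always flat, there is no ‘already grouped’ state to forget about further down the pipeline, which is the part I find appealing.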
https://news.ycombinator.com/item?id=30067406
And here in the docs:
https://dplyr.tidyverse.org/reference/dplyr_by.html
And maybe also:
https://www.tidyverse.org/blog/2023/02/dplyr-1-1-0-per-opera...