Trying to find a good reference for you. There’s this from the author of dplyr:

https://news.ycombinator.com/item?id=30067406

And here in the docs:

https://dplyr.tidyverse.org/reference/dplyr_by.html

And maybe also:

https://www.tidyverse.org/blog/2023/02/dplyr-1-1-0-per-opera...

I guess part of it is that there’s some ‘non-locality’ in the pipeline: the grouping can sit a long way from the operation that acts on the grouped data. Similarly, you have to worry about things like grouping data that is already grouped.
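To make the non-locality point concrete, here is a rough toy model in plain Python. It is not dplyr and not how dplyr is implemented; `Table`, `add_share` and the `by` argument are made-up names that only mimic the idea of persistent grouping versus per-operation grouping.

```python
from collections import defaultdict


class Table:
    """Toy table (hypothetical): row dicts plus optional persistent group keys."""

    def __init__(self, rows, groups=()):
        self.rows = list(rows)
        self.groups = tuple(groups)

    def group_by(self, *keys):
        # Persistent grouping: the keys travel with the table from here on.
        return Table(self.rows, keys)

    def ungroup(self):
        return Table(self.rows)

    def add_share(self, col, by=None):
        """Add 'share' = col / sum(col), where the sum is taken per group."""
        if by is not None and self.groups:
            # Toy rule: refuse to mix the two grouping styles.
            raise ValueError("table is already grouped; ungroup() first")
        keys = tuple(by) if by is not None else self.groups
        totals = defaultdict(float)
        for r in self.rows:
            totals[tuple(r[k] for k in keys)] += r[col]
        out = [
            dict(r, share=r[col] / totals[tuple(r[k] for k in keys)])
            for r in self.rows
        ]
        # Per-operation grouping leaves no grouping behind;
        # persistent grouping is carried forward to the next step.
        return Table(out, () if by is not None else self.groups)


rows = [{"g": "a", "x": 1.0}, {"g": "a", "x": 3.0}, {"g": "b", "x": 4.0}]

grouped = Table(rows).group_by("g")
# ... imagine many pipeline steps here ...
persistent = grouped.add_share("x")            # per group, but the call site doesn't say so
local = Table(rows).add_share("x", by=["g"])   # grouping is visible right at the call
plain = Table(rows).add_share("x")             # same-looking call, whole-table result
```

The `persistent` and `plain` calls look identical where they are written but compute different things, which is the kind of surprise I mean. With `by=` the grouping is stated where it is used and doesn’t leak into later steps.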

I quite like the PRQL solution, which is ‘structured grouping’: you have to delimit the part of the pipeline that operates on grouped data. It might still lead to bad edits in complex queries, though.
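As a sketch of what I mean by structured grouping, again a toy in plain Python rather than actual PRQL: the grouped part of the pipeline is passed in as its own delimited sub-pipeline, so nothing outside it can accidentally run on grouped data. `group` and `top_by_x` are hypothetical names for illustration.

```python
def group(rows, keys, pipeline):
    """Run `pipeline` on each group's rows separately, then concatenate."""
    buckets = {}
    for r in rows:
        buckets.setdefault(tuple(r[k] for k in keys), []).append(r)
    out = []
    for part in buckets.values():  # dicts keep first-seen group order
        out.extend(pipeline(part))
    return out


def top_by_x(part):
    # Sub-pipeline: only ever sees the rows of a single group.
    return sorted(part, key=lambda r: r["x"], reverse=True)[:1]


rows = [{"g": "a", "x": 1}, {"g": "a", "x": 3}, {"g": "b", "x": 4}]

# The grouped scope starts and ends with this one call.
result = group(rows, ["g"], top_by_x)
```

Here `result` holds the row with the largest `x` in each group, and whatever comes after the `group(...)` call is back to working on ungrouped rows.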