There are a fair number of papers (start with DART/dropout and BART, i.e. Bayesian sampling of the whole GBM), but they start to look like global optimization problems, and part of the reason trees work so well is that the local greedy optimization can be made super fast on modern CPU caches.

So even if you can fit a more compact forest that performs well through clever regularization, it's usually better/faster in practice to grow a larger number of simple trees with more randomization and let the overfitting average out.
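
To make the "let overfitting average out" part concrete, here is a toy sketch in pure Python. It is an assumption-heavy illustration, not how any real GBM or forest library works: the trees use completely random split thresholds (no split search at all), the data is a made-up noisy sine curve, and names like `fit_random_tree` are hypothetical. For squared error the averaged prediction can never be worse than the mean error of the individual trees (that is just convexity); how much better it gets depends on how decorrelated the trees are.

```python
import math
import random

def fit_random_tree(points, depth, rng):
    """Grow a 1-D regression tree using uniformly random thresholds (no split search)."""
    xs = [x for x, _ in points]
    lo, hi = min(xs), max(xs)
    leaf = sum(y for _, y in points) / len(points)  # leaf value: mean target
    if depth == 0 or lo == hi:
        return leaf
    t = rng.uniform(lo, hi)
    left = [p for p in points if p[0] <= t]
    right = [p for p in points if p[0] > t]
    if not left or not right:
        return leaf
    return (t, fit_random_tree(left, depth - 1, rng),
               fit_random_tree(right, depth - 1, rng))

def predict(tree, x):
    """Walk from the root to a leaf; internal nodes are (threshold, left, right) tuples."""
    while isinstance(tree, tuple):
        t, left, right = tree
        tree = left if x <= t else right
    return tree

rng = random.Random(0)

def sample(n):
    """Hypothetical toy data: y = sin(x) plus Gaussian noise."""
    data = []
    for _ in range(n):
        x = rng.uniform(0, 6)
        data.append((x, math.sin(x) + rng.gauss(0, 0.3)))
    return data

train, test = sample(200), sample(200)

# Many cheap, heavily randomized trees, each fit on a bootstrap resample.
trees = []
for _ in range(100):
    boot = [rng.choice(train) for _ in train]
    trees.append(fit_random_tree(boot, depth=8, rng=rng))

def mse(pred_fn):
    return sum((pred_fn(x) - y) ** 2 for x, y in test) / len(test)

single_mses = [mse(lambda x, t=t: predict(t, x)) for t in trees]
mean_single_mse = sum(single_mses) / len(single_mses)
ensemble_mse = mse(lambda x: sum(predict(t, x) for t in trees) / len(trees))

print("mean single-tree test MSE:", mean_single_mse)
print("averaged-ensemble test MSE:", ensemble_mse)
```

Each tree on its own is a noisy, overfit step function; none of them is tuned or regularized, and the only "trick" is that there are a lot of them and their mistakes don't line up.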