To me, intellect has two parts: "creativity" and "correctness". From this perspective, a random sampler is infinitely "creative" - over (infinite) time it can come up with an answer to any given problem. And from this perspective it does feel natural that base models are more "creative" (because that's what's being measured in the paper), while RL models are more "correct" (that's the slope of the curve from the paper).
If you use a lot of Arc<Mutex<Box<T>>>, languages with a proper runtime (like Go or Java) are probably going to be more performant - in the end, they are built with those abstractions in mind. So the question isn't only how much this is the nature of the problem, but also how common the problem is, and whether Rust is the right way to solve it.
If you use a lot of Arc<Mutex<Box<T>>>, you should probably just learn to use Rust properly and use Arc<Mutex<T>> instead, because it pretty much never makes sense to have a Box inside an Arc...
I say that as someone who thinks Rust's learning curve is the main reason it rarely makes economic sense to use it.
If GPU demand growth continues to outpace GPU production growth, that is necessarily going to change. Older GPUs may not be cost-competitive to operate compared with newer GPUs, but when the alternative is no GPU...
> Applications do not talk to the filesystem directly,
Sometimes, they do. For instance, BTRFS_IOC_CLONE to do a copy-on-write clone of a file's contents (now promoted to other filesystems as FICLONE, but many other ioctl operation codes are still btrfs-specific; and other filesystems have their own filesystem-specific operations).
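A minimal sketch of issuing such a filesystem-specific operation from userspace, using Python's fcntl module (the helper name reflink is mine; fcntl.FICLONE exists on Python 3.12+, and the raw opcode fallback assumes Linux):

```python
# FICLONE asks the kernel to make dest share src's extents copy-on-write.
# Only some filesystems support it (btrfs, XFS with reflink, ...), which is
# exactly the point: the application is talking to the filesystem directly.
import errno
import fcntl

# fcntl.FICLONE was added in Python 3.12; the fallback is the Linux opcode.
FICLONE = getattr(fcntl, "FICLONE", 0x40049409)

def reflink(src_path, dest_path):
    """Try a copy-on-write clone; return False if the fs can't do it."""
    with open(src_path, "rb") as src, open(dest_path, "wb") as dest:
        try:
            fcntl.ioctl(dest.fileno(), FICLONE, src.fileno())
            return True
        except OSError as e:
            # Unsupported filesystem, cross-device clone, etc.
            if e.errno in (errno.EOPNOTSUPP, errno.ENOTTY,
                           errno.EINVAL, errno.EXDEV):
                return False
            raise
```

On ext4 or tmpfs this falls back to False; on btrfs it succeeds and the two files share storage until one is modified.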
> even with 0.0 temperature due to MOE models routing at a batch level, and you're very unlikely to get a deterministic batch.
I don’t think this is correct - MoE routing happens on a per-token basis. It can be non-deterministic and batch-related if you try to balance expert load within a batch, but that’s a performance optimization (just like everything else in the blog post), not the way the models are trained to work.
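A toy sketch of what per-token top-k routing looks like (plain Python, illustrative names, not any particular model's implementation):

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route_token(router_logits, k=2):
    """Pick the top-k experts for ONE token from its own router logits."""
    probs = softmax(router_logits)
    topk = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # Renormalize the selected experts' weights so they sum to 1.
    total = sum(probs[i] for i in topk)
    return [(i, probs[i] / total) for i in topk]

# Routing depends only on each token's own logits: which experts a token
# is sent to doesn't change with what else happens to be in the batch.
batch = [[0.1, 2.0, -1.0, 0.5], [1.5, 0.0, 0.2, -0.3]]
assignments = [route_token(t) for t in batch]
```

Batch-level effects only enter once you cap expert capacity or rebalance load across a batch for throughput, which is a serving optimization layered on top of this.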
Torch.compile sits at both the level of the computation graph and the level of GPU kernels, and can fuse your operations using the Triton compiler. I think something similar applies to JAX and TensorFlow by way of XLA, but I’m not 100% sure.
Good point. But the overall point about Mojo offering a different level of abstraction compared to Python still stands: I imagine that no amount of magic/operator-fusion/etc in `torch.compile()` would let one get reasonable performance for an implementation of, say, flash-attn. One would have to use CUDA/Triton/Mojo/etc.
But Python is already operating on an entirely different level of abstraction - you mention Triton yourself, and there is a new Python CUDA API too (the one similar to Triton). More to the point - Flash Attention 4 is actually written in Python.
Somehow Python has managed to be both a high-level and a low-level language for GPUs…
IIUC, Triton uses Python syntax, but it has a separate compiler (which is kind of what Mojo is doing, except Mojo's syntax is a superset of Python's instead of a subset, like Triton's). I think it's fair to describe it as a different language (otherwise, we'd also have to describe Mojo as "Python"). Triton's website and repo describe it as "the Triton language and compiler" (as opposed to, I dunno, "write GPU kernels in Python").
Also, flash attention is at v3-beta right now? [0] And it requires one of CUDA/Triton/ROCm?
From this perspective, PyTorch is a separate language too, at least as soon as you start using torch.compile (only a subset of PyTorch Python is compilable). That’s a strength of Python - it’s great for describing things and later analyzing them (and compiling them, for example).
Just to be clear here - you use Triton from plain Python; it runs the compilation inside.
Just like I’m pretty sure not all of Mojo can be used to write kernels? I might be wrong here, but it would be very hard to fit general-purpose code into kernels (and, to be frank, pointless - constraints bring speed).
Notice the “nation” part, not “president”. Tariff power in the US is vested in Congress, and Congress created laws which regulate it. What Trump is doing is outside his legal powers, regardless of any conceptual reasoning about why countries can impose retaliatory tariffs.