To me, intellect has two parts: "creativity" and "correctness". From this perspective, a random sampler is infinitely "creative" - over (infinite) time it can come up with an answer to any given problem. And from this perspective it does feel natural that base models are more "creative" (because that's what's being measured in the paper), while RL models are more "correct" (that's the slope of the curve from the paper).
If you use a lot of Arc<Mutex<Box<T>>>, languages with a proper runtime (like Go or Java) are probably going to be more performant - in the end, they are built with those abstractions in mind. So the question isn't only how much this is the nature of the problem, but also how common the problem is, and whether Rust is the right way to solve it.
If you use a lot of Arc<Mutex<Box<T>>>, you should probably just learn to use Rust properly and use Arc<Mutex<T>> instead, because it pretty much never makes sense to have a Box inside an Arc...
I say that as someone who thinks Rust's learning curve is the main reason it rarely makes economic sense to use it.
If GPU demand growth continues to outpace GPU production growth, that is necessarily going to change. Older GPUs may not be cost-competitive to operate compared with newer GPUs, but when the alternative is no GPU...
> Applications do not talk to the filesystem directly,
Sometimes, they do. For instance, BTRFS_IOC_CLONE to do a copy-on-write clone of a file's contents (now promoted to other filesystems as FICLONE, but many other ioctl operation codes are still btrfs-specific; and other filesystems have their own filesystem-specific operations).
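A minimal sketch of issuing such a filesystem-specific operation from userspace, using Python's fcntl module (the helper name reflink is mine; fcntl.FICLONE exists on Python 3.12+, and the raw opcode fallback assumes Linux):

```python
# FICLONE asks the kernel to make dest share src's extents copy-on-write.
# Only some filesystems support it (btrfs, XFS with reflink, ...), which is
# exactly the point: the application is talking to the filesystem directly.
import errno
import fcntl

# fcntl.FICLONE was added in Python 3.12; the fallback is the Linux opcode.
FICLONE = getattr(fcntl, "FICLONE", 0x40049409)

def reflink(src_path, dest_path):
    """Try a copy-on-write clone; return False if the fs can't do it."""
    with open(src_path, "rb") as src, open(dest_path, "wb") as dest:
        try:
            fcntl.ioctl(dest.fileno(), FICLONE, src.fileno())
            return True
        except OSError as e:
            # Unsupported filesystem, cross-device clone, etc.
            if e.errno in (errno.EOPNOTSUPP, errno.ENOTTY,
                           errno.EINVAL, errno.EXDEV):
                return False
            raise
```

On ext4 or tmpfs this falls back to False; on btrfs it succeeds and the two files share storage until one is modified.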
> even with 0.0 temperature due to MOE models routing at a batch level, and you're very unlikely to get a deterministic batch.
I don’t think this is correct - MoE routing happens on a per-token basis. It can be non-deterministic and batch-related if you try to balance expert load within a batch, but that’s a performance optimization (just like everything else in the blog post), not the way the models are trained to work.
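A toy sketch of what per-token top-k routing looks like (plain Python, illustrative names, not any particular model's implementation):

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route_token(router_logits, k=2):
    """Pick the top-k experts for ONE token from its own router logits."""
    probs = softmax(router_logits)
    topk = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # Renormalize the selected experts' weights so they sum to 1.
    total = sum(probs[i] for i in topk)
    return [(i, probs[i] / total) for i in topk]

# Routing depends only on each token's own logits: which experts a token
# is sent to doesn't change with what else happens to be in the batch.
batch = [[0.1, 2.0, -1.0, 0.5], [1.5, 0.0, 0.2, -0.3]]
assignments = [route_token(t) for t in batch]
```

Batch-level effects only enter once you cap expert capacity or rebalance load across a batch for throughput, which is a serving optimization layered on top of this.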
Torch.compile sits at both the level of the computation graph and the level of GPU kernels, and can fuse your operations using the Triton compiler. I think something similar applies to JAX and TensorFlow by way of XLA, but I’m not 100% sure.
Good point. But the overall point about Mojo offering a different level of abstraction compared to Python still stands: I imagine that no amount of magic/operator-fusion/etc in `torch.compile()` would let one get reasonable performance for an implementation of, say, flash-attn. One would have to use CUDA/Triton/Mojo/etc.
But Python is already operating on an entirely different level of abstraction - you mention Triton yourself, and there is a new Python CUDA API too (the one similar to Triton). More to the point - Flash Attention 4 is actually written in Python.
Somehow Python has managed to be both a high-level and a low-level language for GPUs…
IIUC, Triton uses Python syntax, but it has a separate compiler (which is kind of what Mojo is doing, except Mojo's syntax is a superset of Python's instead of a subset, like Triton's). I think it's fair to describe it as a different language (otherwise, we'd also have to describe Mojo as "Python"). Triton's website and repo describe it as "the Triton language and compiler" (as opposed to, I dunno, "write GPU kernels in Python").
Also, flash attention is at v3-beta right now? [0] And it requires one of CUDA/Triton/ROCm?
From this perspective, PyTorch is a separate language too, at least as soon as you start using torch.compile (only a subset of PyTorch Python is compilable). That’s a strength of Python - it’s great for describing things and later analyzing them (and compiling them, for example).
Just to be clear here - you use Triton from plain Python; it runs the compilation inside.
Just like I’m pretty sure not all of Mojo can be used to write kernels? I might be wrong here, but it would be very hard to fit general-purpose code into kernels (and, to be frank, pointless - constraints bring speed).
Notice the “nation” part, not “president”. Tariff power in the US is vested in Congress, and Congress created laws which regulate it. What Trump is doing is outside his legal powers, regardless of any conceptual reasoning about why countries can impose retaliatory tariffs.