Compilers massively outperform humans when the human has to write the entire program in assembly. Even if a human could manage a sizable program that way, it would be subpar compared to what a compiler would emit.
This is true.
However, that doesn't mean that looking at the generated asm, or even writing some, is useless! Just because you can't globally outperform the compiler doesn't mean you can't do it locally. If you know where the bottleneck is and make those few functions great, that's a force multiplier for you and your program.
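To make the "know where the bottleneck is first" part concrete, here's a rough sketch of the workflow using Python's cProfile (the function names are made up for illustration; the point is that profiling tells you which few functions are even worth hand-optimising or inspecting the asm of):

```python
# Minimal sketch: profile first, then hand-optimise only the hot spots.
import cProfile
import io
import pstats

def hot_function(data):
    # Imagine this is the 1% of code that takes 90% of the time.
    return sum(x * x for x in data)

def cold_function(data):
    return max(data)

def program(data):
    total = hot_function(data)
    peak = cold_function(data)
    return total, peak

profiler = cProfile.Profile()
profiler.enable()
result = program(list(range(100_000)))
profiler.disable()

# Sort by cumulative time to see which few functions dominate; those are
# the only candidates worth dropping to intrinsics/asm for.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

Once the profile names a function, that's where checking the compiler's output (or replacing it by hand) actually pays off.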
It’s absolutely not useless, I do it often as a way to diagnose various kinds of problems. But it’s extremely rare that a handwritten version actually performs better.
It's a bit of a generic complaint, but quite apt for the subject matter. Mission creep kills projects, and that's true across a broad range of activities.
More specifically in the case of software, egos kill projects, and expanding the scope of your project to include broader economic or social causes usually does the same.
This correlates with a huge change in nerd culture: pseudonymity used to be much more common and encouraged, with people's real-life identities or views not really taken into account. ("On the internet, nobody knows you're a dog.")
Social media happened, and now most people use their real-world identities and carry their real-life worldview into the internet.
This greatly increased internet toxicity and eroded interpersonal trust, and Eich is a good example of that: auxiliary things being dredged up about someone and used as a cudgel against them for their real or perceived transgressions.
The end result is that effective project management has become a rare breed and we see all these colossal failures like Firefox...
Please explain how it adds to the discussion about different ways to broaden supported Rust target architectures. Because both have the word Rust in them?
A preprocessor step mostly solves this one. No one said that the shader source has to go into the GPU API 1:1.
Basically do what most engines do - have preprocessor constants and use different paths based on what attributes you need.
I also don't see how separate pipeline stages conflict with this - you already have this functionality in existing APIs, where you can swap different stages individually. Some changes might need a fixup on the driver side, but nothing that can't be added in this proposed API's `gpuSetPipeline` implementation...
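To illustrate the "preprocessor step" idea, here's a trivial sketch: you inject `#define` lines into GLSL-style shader source before handing it to the GPU API, and the shader compiler's own preprocessor picks the matching path. The feature names and shader snippet are hypothetical:

```python
# Hedged sketch of engine-side shader preprocessing: prepend #defines,
# let the GPU-side compiler's preprocessor select the code path.
SHADER_TEMPLATE = """\
#ifdef HAS_NORMALS
    vec3 n = normalize(in_normal);
#else
    vec3 n = vec3(0.0, 0.0, 1.0);
#endif
"""

def preprocess(source: str, features: dict) -> str:
    # One #define per enabled feature, prepended to the source.
    defines = "".join(f"#define {name} {value}\n"
                      for name, value in features.items())
    return defines + source

with_normals = preprocess(SHADER_TEMPLATE, {"HAS_NORMALS": 1})
without = preprocess(SHADER_TEMPLATE, {})
```

Nothing about this requires the source string in your repo to be what the API ultimately sees.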
Most of it has been said by the other replies and they're really good, adding a few things onto it:
- Would lead to reduced memory usage on the driver side by eliminating all the state tracking for "legacy" APIs and all the PSO/shader duplication for the "modern" APIs. (Who doesn't like using less memory? It won't show up on a microbenchmark, but a reduced working set leads to globally increased performance in most cases, thanks to a higher cache hit rate.)
- A much reduced cost per API operation. I don't just mean drawcalls but everything else too. And allowing more asynchrony without the "here's 5 types of fences and barriers" kind of mess. As the article says, right now you can choose between mostly implicit sync (OpenGL, DX11) and tracking all your resources yourself (Vulkan), then feeding all that data into an API which mostly ignores it.
This one wouldn't really speed up existing applications so much as unlock new possibilities: for example, massively improving scene variety with cheap drawcalls, and doing more procedural objects/materials instead of the standard PBR pipeline. Yes, drawindirect and friends exist, but they aren't exactly straightforward to use and require you to structure your problem in a specific way.
It's actually not that low-level! It doesn't really get into hardware specifics that much (other than showing what's possible across different HW) or stuff like what's optimal where.
And it's quite a bit simpler than what we have in the "modern" GPU APIs atm.
Not GP but I can strongly relate to it. Most of the programming I do is related to me making a game.
I follow WET principles (write everything twice at least) because the abstraction penalty is huge, both in performance and in design: a bad abstraction makes all subsequent content much slower to build, which I can't afford as a small developer.
Same with most other "clean code" principles. My codebase is ~70K LoC right now, and I can keep most of it in my head. I used to try to make more functional, more isolated and encapsulated code, but it was hard to work with and most importantly, hard to modify. I replaced most of it with global variables, shit works so much better.
I do use partial classes pretty heavily though - helps LLMs not go batshit insane from context overload whenever they try to read "the entire file".
Models sometimes try to institute these clean code practices but it almost always just makes things worse.
OK, I can follow WET before you DRY, to me that's just a non-zealous version of Don't Repeat Yourself.
I think if you're writing code where you know the entire code base, a lot of the clean principles seem less important. But once you get someone who doesn't (and that can be you coming back to the project in three months), suddenly they have value.
The good old "studios don't play their own games" strikes again :P
Games would be much better if all the people making them were forced to spend a few days each month playing the game on middle-of-the-road hardware. That would quickly teach them the value of fixing stuff like this and optimising the game in general.
I've worked in games for close to 15 years, and at every studio I've worked at we've played the game very regularly. On my current team, every person plays the game at least once a week, and more often as we get closer to builds.
In my last project, the gameplay team played every single day.
> Games would be much better if all people making them were forced to spend a few days each month playing the game on middle-of-the-road hardware
How would playing on middle-of-the-road hardware have caught this? The fix here was to benchmark the load time on the absolute bottom end of hardware, with and without the duplicated logic. And you'd only do that once you already suspect that changing it is going to be faster...
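The A/B benchmark being described looks roughly like this sketch (the two load functions are hypothetical stand-ins; the actual duplicated logic from the game isn't shown in the thread):

```python
# Hedged sketch: benchmark load time with and without duplicated work.
import timeit

ITEMS = list(range(5_000))

def load_with_duplication():
    # Re-derives the same value once per loaded item ("duplicated logic").
    return [sum(ITEMS) for _ in ITEMS[:50]]

def load_deduplicated():
    # Hoist the shared work out of the loop; same result, less work.
    total = sum(ITEMS)
    return [total for _ in ITEMS[:50]]

t_dup = timeit.timeit(load_with_duplication, number=20)
t_dedup = timeit.timeit(load_deduplicated, number=20)
print(f"duplicated: {t_dup:.4f}s  deduplicated: {t_dedup:.4f}s")
```

You only write this benchmark after profiling or a hunch points you at the duplication, which is the parent's point: playtesting alone doesn't surface it.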
They could have been lying I guess but I listened to a great podcast about the development of Helldivers 2 (I think it was gamemakers notebook) and one thing that was constantly brought up was as they iterated they forced a huge chunk of the team to sit down and play it. That’s how things like diving from a little bit too high ended up with you faceplanting and rag-dolling, tripping when jet packing over a boulder that you get a little too close to, etc. They found that making it comically realistic in some areas led to more unexpected/emergent gameplay that was way more entertaining. Turrets and such not caring if you’re in the line of fire was brought up I believe.
That’s how we wound up with this game where your friends are as much of a liability as your enemies.
Okay, but it gives you a mostly good answer! Unlike many other sorts, where interrupting before the last step gives you total nonsense.
It's basically asymptotically approaching the correct (sorted) list instead of shuffling the list in weird ways until it's all magically correct at the end.
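You can actually see this "asymptotic approach" by interrupting bubble sort after k passes and counting inversions (pairs out of order) - each pass can only reduce the count, never increase it. A quick sketch:

```python
# Sketch: bubble sort interrupted after k passes is monotonically closer
# to sorted, measured by the number of inversions.
import random

def inversions(a):
    # Count pairs (i, j) with i < j but a[i] > a[j].
    return sum(1 for i in range(len(a))
                 for j in range(i + 1, len(a)) if a[i] > a[j])

def bubble_passes(a, k):
    a = list(a)
    for _ in range(k):
        for i in range(len(a) - 1):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a

random.seed(0)
data = [random.randrange(1000) for _ in range(200)]
counts = [inversions(bubble_passes(data, k)) for k in (0, 5, 20, 200)]
print(counts)  # non-increasing, ending at 0 once fully sorted
```

Every adjacent swap removes exactly one inversion, which is why the partial state keeps improving instead of jumping around.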
> Unlike many other sorts where if you interrupt it before the last step, you get total nonsense.
which ones do you have in mind? and doesn't "nonsense" depend on the scoring criteria?
selection sort would give you a sorted beginning, cocktail shaker would have both ends sorted
quick sort would give coarse separation by value range ("small values on one side, big on the other"), and block-merge algorithms create sorted subarrays
in my view those qualities are much more useful for partial state than the "number of pairs of elements out of order" metric, which smells of CS-complexity talk
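The selection sort example is easy to demonstrate: interrupted after k steps, the first k positions hold the k smallest elements, already in order - a guarantee you can rely on even if the sort never finishes. A small sketch:

```python
# Sketch: selection sort's partial state is a fully sorted prefix of
# the k smallest elements.
import random

def selection_steps(a, k):
    a = list(a)
    for i in range(min(k, len(a))):
        # Find the minimum of the unsorted tail and swap it into place.
        m = min(range(i, len(a)), key=a.__getitem__)
        a[i], a[m] = a[m], a[i]
    return a

random.seed(1)
data = [random.randrange(1000) for _ in range(50)]
partial = selection_steps(data, 10)
prefix = partial[:10]  # the 10 smallest values, sorted
```

That's the "useful partial state" point in concrete form: the prefix is trustworthy, whatever the tail looks like.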
>I asked it about how to go about solving an issue and it completely hallucinated a solution suggesting to use a non-existing function that was supposedly exposed by a library.
Yeah, that's a huge pain point with LLMs. Personally, I'm way less impacted by it because my codebase depends only minimally on libraries (by surface area), so if something doesn't exist or whatever, I can just tell the LLM to also implement the thing it hallucinated :P
These hallucinations are usually a good sign of "this logically should exist but it doesn't exist yet" as opposed to pure bs.