Well... optimized addition isn't simple. But any beginner can implement a naive ripple-carry adder on a breadboard (or in Verilog on an FPGA) and get a functional adder (maybe even an ALU) in a weekend.
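As a rough sketch of how small the naive version is, here is a software model in Python (picked for illustration, it is not Verilog) of an adder that chains one-bit full adders so each carry ripples into the next bit. The names `full_adder` and `ripple_carry_add` and the 8-bit default width are made up for this example.

```python
def full_adder(a, b, cin):
    """One-bit full adder expressed with XOR/AND/OR, as you would wire it from gates."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(x, y, width=8):
    """Add two unsigned integers one bit at a time.

    Returns (sum modulo 2**width, carry_out). Each bit has to wait for the
    carry from the bit below it, which is why this design is slow in hardware.
    """
    carry = 0
    result = 0
    for i in range(width):
        a = (x >> i) & 1
        b = (y >> i) & 1
        s, carry = full_adder(a, b, carry)
        result |= s << i
    return result, carry
```

In real hardware the loop would be `width` physical full adders wired in a chain, so the worst-case delay grows linearly with the number of bits.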
But you're right. To optimize addition (in particular, the propagation of carries), you need something like a Kogge-Stone adder, a parallel-prefix design that is pretty sophisticated (https://en.wikipedia.org/wiki/Kogge%E2%80%93Stone_adder). And that's from my Bachelor's degree: I'm sure there are newer designs that are more sophisticated than Kogge-Stone.
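To give a feel for the idea, here is a minimal Python model of the Kogge-Stone scheme. It is a software sketch of the generate/propagate prefix network, not a hardware description, and the function name `kogge_stone_add`, the 8-bit default width and the lack of a carry-in are assumptions made for the example. Each pass of the loop stands for one layer of the network, so an 8-bit add needs 3 layers instead of 8 ripple steps.

```python
def kogge_stone_add(x, y, width=8):
    """Model a Kogge-Stone adder on unsigned integers, no carry-in.

    Returns (sum modulo 2**width, carry_out).
    """
    mask = (1 << width) - 1
    g = x & y & mask         # generate: this bit produces a carry by itself
    p = (x ^ y) & mask       # propagate: this bit passes an incoming carry along
    half_sum = p             # keep the per-bit XOR for the final sum

    dist = 1
    while dist < width:
        # Combine each position with the group ending `dist` bits below it.
        # The new g must be computed from the old p, so update g first.
        g = (g | (p & (g << dist))) & mask
        p = (p & (p << dist)) & mask
        dist *= 2

    carries = (g << 1) & mask            # carry into bit i is the group generate of bits 0..i-1
    total = half_sum ^ carries
    carry_out = (g >> (width - 1)) & 1
    return total, carry_out
```

The number of layers grows with log2 of the width, which is the whole point: the price is a lot more wiring and gates than the ripple-carry version.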
There are a lot of issues at play here at the electrical level, e.g. fanout. Every component can only supply so much current (amps). For example, a transistor may only be able to feed 5 to 10 other transistors, so building "buffers" to increase the current so that you can actually turn on all the transistors is a thing.
Each CMOS transistor has gate capacitance, so a certain amount of charge has to build up on the gate before the transistor turns on. Given a level of current (amps), this means you need to wait for enough charge to get onto the input before the transistor responds.
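A back-of-the-envelope way to see the fanout and capacitance points together is to treat the driver as a constant current source charging the combined gate capacitance of everything it feeds, so t ≈ C·V/I. The Python below is only that first-order estimate. The function name and every number in it (1 fF per gate, a 1 V swing, 10 µA of drive) are illustrative assumptions, not figures from any real process.

```python
def charge_time_seconds(gate_capacitance_farads, fanout, voltage_swing_volts, drive_current_amps):
    """First-order estimate of how long a driver needs to charge its load.

    Assumes a constant drive current and ignores wire capacitance and
    resistance, so treat the result as an order-of-magnitude figure only.
    """
    load_capacitance = gate_capacitance_farads * fanout
    return load_capacitance * voltage_swing_volts / drive_current_amps


# Hypothetical numbers: 1 fF per gate input, 1 V swing, 10 uA of drive current.
one_load = charge_time_seconds(1e-15, 1, 1.0, 10e-6)
eight_loads = charge_time_seconds(1e-15, 8, 1.0, 10e-6)
print(f"fanout 1: {one_load * 1e12:.0f} ps, fanout 8: {eight_loads * 1e12:.0f} ps")
```

With these made-up values the delay scales linearly with fanout, which is why a buffer with more drive current is inserted once a signal has to feed many inputs.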
Doing all of this with minimal energy usage and maximum speed (minimum latency) is surely complicated. But the "naive" version suitable for beginner play is pretty simple. It's like raytracers: you can write a raytracer in just 100 lines of code. It won't be as fast as a professional raytracer, nor have as many features, but you'd get the gist in a single weekend project: https://github.com/matt77hias/smallpt
---------
In any case, once you get to the final decoded instruction, it's "just" a circuit. Sure, Kogge-Stone adders and Wallace tree multipliers need a bit of study before you understand them. But from the outside they're simple black boxes. Stick the bits onto the correct wires and, a few hundred picoseconds later, you've got the result coming out of another set of wires.
I had a big response typed up and lost it, and I don’t have the time to replicate it. But thank you both for such thoughtful discussion. This is giving me a really good broad sense of the topic. I’ve been implementing a Game Boy emulator for fun and it’s raised so many questions.
I found that getting timing correct was the hardest part (at university, where everything was done by hand; later I worked in a fab where we could make that easier for people using tools).