An Apple Mac Studio with 512GB of unified memory costs around $10,000. If you really need that much power on your home computer, and you have that much money to spend, this could be the easiest option.
You probably don't need fp16. Most models can be quantized down to q8 with minimal loss of quality. Models can usually be quantized to q4 or even below and run reasonably well, depending on what you expect out of them.
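As a rough sanity check, the weight footprint scales linearly with bits per parameter. The sketch below assumes a 235B-parameter model (inferred from the ~235GB q8 figure; the parameter count is an assumption for illustration) and ignores the extra memory needed for the KV cache and activations, so real requirements are somewhat higher:

```python
# Rough memory footprint of model weights at different quantization levels.
# PARAMS is an assumption: a 235B-parameter model, inferred from the ~235GB
# q8 figure. Actual files are larger (embeddings, metadata, KV cache).

PARAMS = 235e9  # parameters

def weight_gb(bits_per_param: float) -> float:
    """GB needed for the weights alone at a given quantization level."""
    return PARAMS * bits_per_param / 8 / 1e9

for name, bits in [("fp16", 16), ("q8", 8), ("q4", 4)]:
    print(f"{name}: ~{weight_gb(bits):.0f} GB")
```

At fp16 that's roughly 470GB of weights, which is why quantization matters so much here: q8 halves it and q4 halves it again.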
Even at q8, you'll need around 235GB of memory. An Nvidia RTX 5090 has 32GB of VRAM and has an official price of about $2000, but usually retails for more. If you can find them at that price, you'd need eight of them to run a 235GB model entirely in VRAM, and that doesn't include a motherboard and CPU that can handle eight GPUs. You could look for old mining rigs built from RTX 3090s or P40s. Otherwise, I don't see much prospect for fitting this much data into VRAM on consumer GPUs for under $10k.
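The GPU arithmetic above is easy to verify; using the optimistic $2000-per-card price (an assumption, since street prices run higher):

```python
import math

MODEL_GB = 235   # q8 weights, from the estimate above
VRAM_GB = 32     # RTX 5090
PRICE_USD = 2000 # optimistic list price; street prices are higher

gpus_needed = math.ceil(MODEL_GB / VRAM_GB)  # 235 / 32 rounds up to 8 cards
total_cost = gpus_needed * PRICE_USD         # $16,000 for the GPUs alone
print(gpus_needed, total_cost)
```

Even before adding a motherboard, CPU, PSU, and cooling that can feed eight cards, the GPUs alone blow past the $10k budget.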
Without NVLink, you're going to take a massive performance hit running a model distributed over several computers. It can be done, and there's research into optimizing distributed models, but the throughput is a significant bottleneck. For now, you really want to run on a single machine.
You can get pretty good performance out of a CPU. The key is memory bandwidth. Look at server- or workstation-class CPUs with many DDR5 memory channels supporting a high MT/s rate. For example, an AMD Ryzen Threadripper 7965WX has eight DDR5 memory channels at up to 5200 MT/s and retails for about $2500. Depending on your needs, this might give you acceptable performance.
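To see why memory bandwidth is the key number, you can put an upper bound on generation speed: each generated token requires reading (roughly) every weight once for a dense model, so peak bandwidth divided by model size bounds tokens per second. A back-of-the-envelope sketch, assuming the eight-channel 5200 MT/s configuration above:

```python
# Upper bound on dense-model token generation from memory bandwidth.
# Numbers are illustrative assumptions based on the Threadripper example.

CHANNELS = 8
TRANSFERS_PER_S = 5200e6   # 5200 MT/s
BYTES_PER_TRANSFER = 8     # each DDR5 channel is 64 bits wide

bandwidth_gbs = CHANNELS * TRANSFERS_PER_S * BYTES_PER_TRANSFER / 1e9
# -> roughly 333 GB/s theoretical peak

MODEL_GB = 235  # q8 weights
tokens_per_s = bandwidth_gbs / MODEL_GB
print(f"{bandwidth_gbs:.0f} GB/s peak, ~{tokens_per_s:.1f} tok/s upper bound")
```

That's only about 1.4 tokens/s as a theoretical ceiling for a dense 235GB model, and real-world throughput will be lower. If the model is a mixture-of-experts, only the active parameters are read per token, which can raise this bound considerably.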
Lastly, I'd question whether you really need to run this at home. Obviously, this depends on your situation and what you need it for. Any investment you put into hardware is going to depreciate significantly in just a few years. $10k of credits in the cloud will take you a long way.