I've been working on a hobbyist project to analyze a ROM for an architecture that wasn't covered by Ghidra, and let me just say: I had a hellish time trying to work with Sleigh, the language you use to define new architectures for Ghidra to analyze. There just isn't a ton of great info out there about it, outside of the Sleigh documentation itself. I was able to find a few guides online, but none were quite at the level of detail I was looking for.
I ended up getting lucky and finding somebody else's project for the same CPU, which I was able to build on to make something that worked. And by doing that I was eventually able to figure out why I couldn't even get off the ground.
I'm also writing a processor module, and reading this encourages me to write about it once it's finished.
Getting off the ground wasn't the hardest part so far. You can just pick the skeleton module that already comes with Ghidra, then look up some existing simpler modules like the one for the Z80 to figure out how instructions are put together. You also have the script `DebugSleighInstructionParse` to check how bits are being decoded, which is very useful when you screw up some instruction definitions.
Unfortunately, you bump into a lot of jargon-heavy error messages. The first time you see "Interior ellipsis in pattern", you have no idea what that's about. Now repeat that experience for several messages.
Then the hardest challenge is how to even test the module outside of some quick disassemblies. There's `pcodetest` but the setup is cumbersome and it seems more about validating instruction decoding rather than semantics. I might just write my own validation using pcode emulation and compare the register state against another emulator's instruction trace...
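For what it's worth, the comparison half of that idea needs nothing Ghidra-specific. A minimal sketch, assuming both the p-code emulation and the reference emulator can dump one register snapshot per executed instruction as a dict; the function name, the register names and the trace format are all made up for illustration, and getting the snapshots out of either emulator is not shown:

```python
from typing import Dict, List, Optional, Tuple

# One snapshot per executed instruction: register name -> value.
# This trace format is an assumption, not something Ghidra produces as-is.
RegState = Dict[str, int]
Mismatch = Tuple[int, str, Optional[int], Optional[int]]


def first_divergence(ours: List[RegState], reference: List[RegState]) -> Optional[Mismatch]:
    """Return (step, register, our_value, reference_value) for the first
    mismatch between the two traces, or None if they agree."""
    for step, (a, b) in enumerate(zip(ours, reference)):
        # Check registers in a stable order so the report is deterministic.
        for reg in sorted(set(a) | set(b)):
            if a.get(reg) != b.get(reg):
                return (step, reg, a.get(reg), b.get(reg))
    if len(ours) != len(reference):
        # One trace ended early: report the lengths instead of a register.
        return (min(len(ours), len(reference)), "<trace length>", len(ours), len(reference))
    return None


# Hypothetical two-instruction traces with made-up registers A and PC.
ours = [{"A": 1, "PC": 0x100}, {"A": 2, "PC": 0x102}]
reference = [{"A": 1, "PC": 0x100}, {"A": 3, "PC": 0x102}]
print(first_divergence(ours, reference))
```

Stopping at the first mismatch is deliberate: once one instruction's semantics are wrong, every later snapshot tends to be wrong too, so only the first difference points at the definition to fix.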
Pcodetest is more about validating the implementation of the instruction. Sure, it has to decode, but the benefit is mostly a base-level set of logic that can be emulated. And I'm definitely not a fan of the setup to get it going (it's also only helpful if you have a semi-recent C compiler).
Oh nice, it wasn't clear from the test suite if that was the case, I'll give it a closer look.
Judging from the Python scripts, it seems to expect a whole binutils toolchain (so not just a compiler but also objdump, readelf...) and that would be a blocker for me.
The compiler (gcc) and maybe the assembler (as) are used. I think the other binutils executables are unused but still built into their logic. Due to its age and its removal from gcc, I was unable to cleanly set up pcodetest for 80960 (I had to hack it all together and scripted their Java portion to work with the hack), but it was super useful for improving TriCore (pcodetest wasn't released when I submitted the original PR) and for writing RISC-V.