Is that the best TCC could have done here? It's easy to write this in C, which is one obvious reason for TCC to do it, but it's not at all clear this is the best way.
What TCC is actually storing in TokenSym (which you've named "Node" in this context) is four pointers to other information about symbols, a unique ID, and a string.
If we used a compact inline string (like CompactString), we could squeeze all this into 60 bytes on a 64-bit target: 32 for the four pointers, 4 for the ID, and 24 for the string. The exception is any name longer than 24 bytes, such as TCC's own warn_implicit_function_declaration or is_compatible_unqualified_types, which would still need a heap allocation for the string itself, like they have today.
If we do that, we can build a linear-probed, open-addressed hash table and avoid paying for the extra indirection whenever a symbol name is 24 bytes or shorter. I'd expect this to be markedly better.
But it's a lot of work in C, so it isn't a surprise TCC doesn't do it.
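To make the layout concrete, here is a rough sketch of what I have in mind, written in C. Everything in it is my own invention for illustration, not TCC's code: the names Entry, Name, intern, and the info[4] placeholder for the four pointers are hypothetical, the table is a fixed size with no growth or deletion, and the tag-byte trick is a simplified take on what CompactString does. It assumes a 64-bit target and that names are valid UTF-8 with no NUL bytes.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define INLINE_CAP 24
#define HEAP_TAG   0xFF   /* 0xFF never occurs in valid UTF-8, so it can mark "on heap" */
#define TABLE_SIZE 1024   /* power of two; fixed, since this sketch never grows */

/* 24-byte name: either zero-padded inline bytes or a tagged heap pointer. */
typedef union {
    unsigned char inl[INLINE_CAP];
    struct { char *ptr; size_t len; unsigned char pad[7]; unsigned char tag; } heap;
} Name;

/* One slot of the table. The payload is 60 bytes; alignment padding
   typically rounds the struct up to 64 on a 64-bit target. */
typedef struct {
    void *info[4];        /* placeholder for the four per-symbol pointers */
    int   tok;            /* unique ID; 0 marks an empty slot */
    Name  name;
} Entry;

static void name_set(Name *n, const char *s, size_t len) {
    memset(n, 0, sizeof *n);
    if (len <= INLINE_CAP) {
        memcpy(n->inl, s, len);           /* short name: stored in the slot */
    } else {
        n->heap.ptr = malloc(len);        /* long name: one heap allocation */
        if (!n->heap.ptr) abort();
        memcpy(n->heap.ptr, s, len);
        n->heap.len = len;
        n->heap.tag = HEAP_TAG;
    }
}

static int name_eq(const Name *n, const char *s, size_t len) {
    if (n->heap.tag == HEAP_TAG)
        return n->heap.len == len && memcmp(n->heap.ptr, s, len) == 0;
    if (len > INLINE_CAP) return 0;
    /* Inline: the bytes match and whatever follows is zero padding. */
    return memcmp(n->inl, s, len) == 0 &&
           (len == INLINE_CAP || n->inl[len] == 0);
}

/* FNV-1a, chosen only because it is short; any decent hash would do. */
static uint32_t hash_bytes(const char *s, size_t len) {
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) { h ^= (unsigned char)s[i]; h *= 16777619u; }
    return h;
}

static Entry table[TABLE_SIZE];
static int next_tok = 1;

/* Find the entry for a name, inserting it if absent. Linear probing:
   on a collision, step to the next slot. Assumes the table never fills. */
static Entry *intern(const char *s, size_t len) {
    size_t i = hash_bytes(s, len) & (TABLE_SIZE - 1);
    while (table[i].tok != 0) {
        if (name_eq(&table[i].name, s, len)) return &table[i];
        i = (i + 1) & (TABLE_SIZE - 1);
    }
    table[i].tok = next_tok++;
    name_set(&table[i].name, s, len);
    return &table[i];
}
```

With this layout, intern("main", 4) compares the name without leaving the slot it probes, while a long name like is_compatible_unqualified_types still follows one pointer to the heap. The parts left out here (resizing, a load-factor policy, freeing the heap names) are a good share of the work I was referring to.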