They kinda already are - source code for highly complex open source software is already in the training datasets. The problem is that tutorials are much more descriptive (why the code is doing something, how a particular function works, etc. - down to the level of a single line of code), which probably makes them much easier for LLMs to interpret, and therefore weighted more heavily in responses.
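To make the contrast concrete, here's a made-up illustration (not code from any actual dataset): the same algorithm tends to look like the first function in a production repo and like the second in a tutorial, where nearly every line comes with a natural-language explanation the model can latch onto.

```python
# Production-style code: terse, assumes the reader already knows the idea.
def lcs_len(a, b):
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[-1][-1]

# Tutorial-style code: the same logic, but every step is narrated.
def lcs_len_tutorial(a, b):
    # dp[i][j] holds the length of the longest common subsequence of the
    # first i characters of a and the first j characters of b.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if x == y:
                # The characters match, so we can extend the best answer
                # for the two shorter prefixes by one.
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                # No match: carry forward the better of the two answers
                # that drop one character from either string.
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    # The bottom-right cell covers both full strings.
    return dp[-1][-1]
```

Both functions compute the same thing, but the second pairs every decision with prose, which is exactly the kind of text-to-code alignment that training on tutorials provides.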