Aren't they technically the same? GPT picks the next token given the state of the current context, based on probabilities and a random factor. That is mathematically equivalent to a Markov chain, isn't it?
Markov chains condition on the current state only; they don't account for the full history of how the chain got there. A GPT-style model conditions on everything in its context window, and while all LLMs do have a context length, that is more a practical limitation based on resources than anything implicit in the model.
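To make the distinction concrete, here is a minimal sketch contrasting a first-order Markov chain (next token depends only on the previous token) with a toy sampler that conditions on the whole context. The corpus, `markov_sample`, and `lm_sample` are invented for illustration; the weighting inside `lm_sample` is a stand-in for a real model's forward pass, not how GPT actually scores tokens.

```python
import random
from collections import Counter, defaultdict

random.seed(0)

corpus = "the cat sat on the mat and the dog sat on the rug".split()
vocab = sorted(set(corpus))

# First-order Markov chain: the next-token distribution is a function of the
# previous token alone.
transitions = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    transitions[prev][nxt] += 1

def markov_sample(prev_token):
    counts = transitions[prev_token] or Counter(vocab)  # fall back to uniform
    tokens, weights = zip(*counts.items())
    return random.choices(tokens, weights=weights)[0]

# Toy context-conditioned sampler: the next-token distribution is a function
# of the entire context. Here a crude repetition penalty lets tokens far back
# in the window influence the choice; a real GPT computes this dependence with
# a transformer, but the dependence on the whole window is the same in kind.
def lm_sample(context):
    history = Counter(context)
    weights = [1.0 / (1 + history[w]) for w in vocab]
    return random.choices(vocab, weights=weights)[0]

# Two contexts ending in the same token: the Markov chain must assign them
# identical next-token distributions; the context-conditioned sampler need not.
ctx_a = ["the", "cat", "sat", "on", "the"]
ctx_b = ["the", "dog", "sat", "on", "the"]
print(markov_sample(ctx_a[-1]), markov_sample(ctx_b[-1]))  # same distribution
print(lm_sample(ctx_a), lm_sample(ctx_b))                  # can differ
```

You can recover a Markov chain from the second sampler only by taking the entire context window as the "state", which is where the finite context length enters as a practical bound rather than a modelling assumption.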