This is rarely the correct thing to do. Users don't particularly like it if you refuse to process a document because it has an error somewhere in there.
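To make the alternative concrete, here is a minimal sketch of lenient decoding in Python. The helper name `decode_lenient` is made up for illustration; the only real API used is the built-in `bytes.decode` with `errors="replace"`, which substitutes U+FFFD for bytes it cannot decode instead of raising.

```python
def decode_lenient(raw: bytes) -> str:
    # Hypothetical helper: keep processing the document and mark bad bytes
    # with U+FFFD (the replacement character) instead of raising an error.
    return raw.decode("utf-8", errors="replace")


# A stray 0xFF byte in otherwise valid UTF-8 is replaced, not fatal.
text = decode_lenient(b"caf\xc3\xa9 \xff ok")
print(text)
```

Whether replacement or something like `errors="ignore"` is the better policy depends on the application; the point is only that one bad byte need not reject the whole document.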

Even for identifiers, you probably want to do all kinds of normalization that go well beyond the UTF-8 level, so overlong sequences and other encoding errors are not really an inherent security issue.
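As a rough sketch of what that could look like, assuming NFKC plus case folding is an acceptable policy for your identifiers (that choice, and the name `normalize_identifier`, are my assumptions, not something every application wants):

```python
import unicodedata


def normalize_identifier(raw: bytes) -> str:
    # Hypothetical helper. Decode leniently first: malformed input, including
    # overlong encodings, becomes U+FFFD rather than the character it imitates.
    text = raw.decode("utf-8", errors="replace")
    # Then fold compatibility forms (e.g. fullwidth letters) and case, so
    # visually equivalent spellings compare equal.
    return unicodedata.normalize("NFKC", text).casefold()


# Fullwidth "Ａ" (U+FF21) followed by "dmin" compares equal to plain "admin".
print(normalize_identifier("\uff21dmin".encode("utf-8")) == "admin")
# An overlong encoding of "/" (0xC0 0xAF) does not come out as a slash.
print("/" in normalize_identifier(b"\xc0\xaf"))
```

Since every identifier is compared only after this step, a malformed byte sequence cannot produce a string that bypasses a check done on the normalized form.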