As long as you are just using static strings and native types, it amounts to a pointer/index bump and a load/store per item. Let's imagine you have the format string, priority number, system id, and 7 pieces of data in the payload. That would be 10 items, so something like 40 cycles at roughly 4 cycles per item? I can see the 18ns the paper gets.
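To make the "bump and store" idea concrete, here is a rough sketch in C of what I have in mind. It is not the paper's code: `log_buf`, `log_record`, the slot count, and the one-64-bit-slot-per-item layout are all made up for illustration, and it ignores threading and the background thread that would do the actual formatting later.

```c
#include <stddef.h>
#include <stdint.h>

enum { RING_SLOTS = 4096, PAYLOAD_ITEMS = 7 };

/* Hypothetical staging buffer: a fixed array of 64-bit slots plus a write index. */
typedef struct {
    uint64_t slots[RING_SLOTS];
    size_t   head;              /* index of the next free slot */
} log_buf;

/* Append one record: format-string pointer, priority, system id, 7 values.
 * Returns the number of slots written (10), or 0 if there is no room.
 * No formatting happens here; only the pointer to the static string is stored. */
static size_t log_record(log_buf *b, const char *fmt, uint64_t priority,
                         uint64_t system_id,
                         const uint64_t payload[PAYLOAD_ITEMS])
{
    const size_t need = 3 + PAYLOAD_ITEMS;   /* 10 items total */
    if (RING_SLOTS - b->head < need)
        return 0;                            /* assumed policy: drop when full */

    uint64_t *p = &b->slots[b->head];
    *p++ = (uint64_t)(uintptr_t)fmt;         /* static string: store the address */
    *p++ = priority;
    *p++ = system_id;
    for (size_t i = 0; i < PAYLOAD_ITEMS; i++)
        *p++ = payload[i];                   /* one store and one bump per item */

    b->head += need;                         /* publish by bumping the index */
    return need;
}
```

The hot path is one bounds check, ten stores, and one index update, which is where my 10 items comes from. The ~4 cycles per item is my guess, not a measurement; at a clock somewhere between 2 and 3 GHz, 40 cycles works out to roughly 13 to 20 ns.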
I have no doubt the 7ns number is heavily cooked, though.