Indeed, if you use only 48 bits per limb, you could also parallelize using the FP hardware: a double's mantissa is 52 bits, so 48-bit limbs leave 4 bits of headroom, i.e. 16 rounds of addition before you have to propagate carries. That is much less headroom than the 16 (or even 13) spare bits of the integer approach, but on processors that have distinct FP and integer adders and can issue to both in parallel, you might get a speed boost.
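
To make the headroom arithmetic concrete, here is a rough Python sketch (Python floats are IEEE 754 doubles). The function names are made up for illustration, and it only shows that 16 carry-free additions of 48-bit limbs stay exact in a double. It says nothing about actual speed, which depends on the processor.

```python
LIMB_BITS = 48
LIMB_BASE = 1 << LIMB_BITS      # 2**48
LIMB_MASK = LIMB_BASE - 1
MAX_LAZY_ADDS = 16              # 4 spare bits -> 2**4 additions before carrying


def to_limbs(n, count):
    """Split a non-negative int into `count` little-endian 48-bit limbs, stored as floats."""
    return [float((n >> (LIMB_BITS * i)) & LIMB_MASK) for i in range(count)]


def lazy_add(acc, limbs):
    """Limb-wise floating-point add; no carries are propagated here."""
    return [a + b for a, b in zip(acc, limbs)]


def propagate_carries(acc):
    """Bring every limb back below 2**48, pushing the overflow into the next limb."""
    out, carry = [], 0
    for limb in acc:
        value = int(limb) + carry          # exact: limb is an integer below 2**53
        out.append(float(value & LIMB_MASK))
        carry = value >> LIMB_BITS
    out.append(float(carry))               # final carry becomes a new top limb
    return out


def from_limbs(limbs):
    """Recombine normalized limbs into a Python int."""
    return sum(int(limb) << (LIMB_BITS * i) for i, limb in enumerate(limbs))


# Worst case: 16 operands whose limbs are all ones (2**48 - 1 each).
operands = [(1 << 96) - 1] * MAX_LAZY_ADDS
acc = [0.0, 0.0]
for x in operands:
    acc = lazy_add(acc, to_limbs(x, 2))

result = from_limbs(propagate_carries(acc))
```

In the worst case each accumulated limb is 16 * (2**48 - 1), which is still below 2**52, so nothing is rounded and `result` equals the exact integer sum of the operands.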