It's amazing that the 17 instructions at the end that tidy up all the carries, which look like they have to run serially, still come out faster. But I guess each register's outgoing carry bits don't depend on the carry it receives from the previous (less significant) register, so all the carries could be extracted up front and applied at once:
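mov W, E    ; copy each limb so its carry can be extracted independently
mov V, D
mov U, C
mov T, B
shr W, 51   ; carry out of each limb = bits 51 and up
shr V, 51
shr U, 51
shr T, 51
and E, M    ; M: a register assumed to hold 2^51 - 1 (not in the original)
and D, M    ; clear the old carry bits before the incoming carry is added
and C, M
and B, M
add D, W    ; no add depends on another add, so all four can issue together
add C, V
add B, U
add A, T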
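To sanity-check the independence argument, here is a small Python model of the same pass. It assumes five radix-2^51 limbs ordered [A, B, C, D, E], with A most significant and E least significant (the direction the carries flow in the asm above). The names `value`, `carry_parallel` and `carry_serial` are made up for this sketch, and since Python ints never overflow, any 64-bit wraparound in A is not modeled.

```python
MASK = (1 << 51) - 1  # 2^51 - 1: the low 51 bits of a limb
U64 = (1 << 64) - 1   # largest value a 64-bit register can hold


def value(limbs):
    """Integer represented by [A, B, C, D, E] in radix 2^51, A most significant."""
    total = 0
    for limb in limbs:
        total = (total << 51) + limb
    return total


def carry_parallel(limbs):
    """One carry pass where every carry is read from the ORIGINAL limbs."""
    a, b, c, d, e = limbs
    # The four shifts only read the inputs, so they are independent of each other.
    w, v, u, t = e >> 51, d >> 51, c >> 51, b >> 51
    # Mask before adding, so the incoming carry is not cut off.
    e, d, c, b = e & MASK, d & MASK, c & MASK, b & MASK
    # The four adds are also independent of each other.
    return [a + t, b + u, c + v, d + w, e]


def carry_serial(limbs):
    """The same pass done as a dependent chain, least significant limb first."""
    a, b, c, d, e = limbs
    d += e >> 51
    e &= MASK
    c += d >> 51
    d &= MASK
    b += c >> 51
    c &= MASK
    a += b >> 51
    b &= MASK
    return [a, b, c, d, e]


x = [1, U64, U64, U64, U64]  # limbs that have overflowed well past 51 bits
assert value(carry_parallel(x)) == value(x)
assert value(carry_serial(x)) == value(x)
```

Both passes preserve the value. The difference is that the serial chain leaves B through E fully within 51 bits, while the parallel version can leave B, C and D at 2^51 - 1 plus an incoming carry (still under 2^52), because a carry landing on a limb is not itself pushed onward. Whether that one bit of slack is acceptable depends on what the surrounding code does next.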