This rewrite fixes deficiencies in the existing ldexp, which computed
incorrect results for both f16 and f80.
It would be nice to add a fast multiplication-based fallback in the
future for targets that have a hardware FPU, but this implementation
should be much faster than the existing one on targets without an FPU.
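
For reference, a multiplication-based ldexp could look roughly like the
sketch below. It is written in C for f64 only and is not part of this
change; the names `ldexp_mul` and `pow2` are hypothetical. The idea is to
build 2^n directly from its bit pattern and multiply, pre-scaling in steps
when n falls outside the normal exponent range so that only the final
multiply can round into the subnormal range.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: builds 2^n from raw IEEE-754 binary64 bits.
 * Only valid for n in [-1022, 1023] (normal exponents). */
static double pow2(int n) {
    uint64_t bits = (uint64_t)(0x3ff + n) << 52;
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Sketch of a multiplication-based ldexp for f64: returns x * 2^n. */
double ldexp_mul(double x, int n) {
    if (n > 1023) {
        /* Too large for one factor: peel off 2^1023 up to twice. */
        x *= 0x1p1023;
        n -= 1023;
        if (n > 1023) {
            x *= 0x1p1023;
            n -= 1023;
            if (n > 1023)
                n = 1023; /* result has already overflowed or is 0/NaN */
        }
    } else if (n < -1022) {
        /* Scale by 2^-969 (= 2^-1022 * 2^53) so the intermediate keeps
         * full precision and only the last multiply can round. */
        x *= 0x1p-1022 * 0x1p53;
        n += 1022 - 53;
        if (n < -1022) {
            x *= 0x1p-1022 * 0x1p53;
            n += 1022 - 53;
            if (n < -1022)
                n = -1022; /* result has already underflowed */
        }
    }
    return x * pow2(n);
}
```

On a target with a hardware FPU this costs at most three multiplies and
no loop over mantissa bits. On a soft-float target each multiply is
itself expensive, so it would likely be the slower choice there.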