Improve GCM performance by a factor of 2-3 by shifting full 32/64-bit words
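
For illustration only, here is a minimal sketch of the idea, assuming the shift in question is the one-bit right shift used in GHASH's GF(2^128) multiplication. The function names (gcm_shift_bytes, gcm_shift_words, load_be64, store_be64) are hypothetical and not taken from this change. Only the 64-bit case is shown; the first function shifts a block one byte at a time, the second does the same work with two 64-bit words.

```c
#include <stdint.h>

/* Reduction byte for x^128 + x^7 + x^2 + x + 1 in GCM's bit-reflected
   representation (bit 0 of the field element is the MSB of byte 0). */
#define GCM_REDUCE 0xE1

/* Hypothetical baseline: shift the 128-bit block right by one bit,
   one byte at a time (16 shift/or steps per block). */
static void gcm_shift_bytes(uint8_t x[16])
{
    uint8_t carry = 0;
    for (int i = 0; i < 16; i++) {
        uint8_t next = x[i] & 1;                 /* bit that falls out of this byte */
        x[i] = (uint8_t)((x[i] >> 1) | (carry << 7));
        carry = next;
    }
    if (carry)                                   /* bit shifted out of the block */
        x[0] ^= GCM_REDUCE;
}

/* Portable big-endian load/store, so the sketch works on any host. */
static uint64_t load_be64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

static void store_be64(uint8_t *p, uint64_t v)
{
    for (int i = 7; i >= 0; i--) {
        p[i] = (uint8_t)v;
        v >>= 8;
    }
}

/* Hypothetical word-based variant: the same shift done on two 64-bit
   words (two shift/or steps per block instead of sixteen). */
static void gcm_shift_words(uint8_t x[16])
{
    uint64_t hi = load_be64(x);                  /* bytes 0..7  */
    uint64_t lo = load_be64(x + 8);              /* bytes 8..15 */
    uint64_t carry = lo & 1;                     /* bit shifted out of the block */

    lo = (lo >> 1) | (hi << 63);                 /* low bit of hi moves into lo */
    hi >>= 1;
    if (carry)
        hi ^= (uint64_t)GCM_REDUCE << 56;        /* lands in byte 0 */

    store_be64(x, hi);
    store_be64(x + 8, lo);
}
```

Both functions produce the same block. A real implementation would likely keep the state in words across iterations rather than loading and storing on every shift, which this sketch does only to stay self-contained.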