chapoly: Process two Poly1305 blocks in parallel in SSSE3 driver
author Martin Willi <martin@revosec.ch>
Tue, 7 Apr 2015 09:28:51 +0000 (11:28 +0200)
committer Martin Willi <martin@revosec.ch>
Sun, 12 Jul 2015 11:25:50 +0000 (13:25 +0200)
commit fe5d6eaa9f53513e0d4ae335a51bbb31a0d81c7f
tree e2c0e3e0645d20db02bb733fbea196f0aee8ec1e
parent b499777cbf9be346fda52c6c449040f6bfb24e6b

Precomputing the derived key r^2 lets us unroll the loop to handle two
blocks per iteration and make slightly better use of SIMD instructions,
which improves performance.
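
The identity behind the two-block loop can be sketched in scalar C. This is
an illustration only, not the driver code: it assumes a toy modulus of
2^31 - 1 instead of Poly1305's 2^130 - 5 so that plain 64-bit arithmetic
suffices, and the names toy_poly_seq and toy_poly_pair are hypothetical.
Two sequential steps h = (h + m1) * r, h = (h + m2) * r expand to
h = (h + m1) * r^2 + m2 * r, where the two products are independent of each
other:

```c
#include <stddef.h>
#include <stdint.h>

/* Toy modulus (assumption for illustration); real Poly1305 uses 2^130 - 5. */
#define TOY_P 2147483647ULL

/* One block per iteration: h = (h + m[i]) * r mod p. */
static uint64_t toy_poly_seq(const uint64_t *m, size_t n, uint64_t r)
{
    uint64_t h = 0;
    size_t i;

    r %= TOY_P;
    for (i = 0; i < n; i++)
    {
        h = ((h + m[i] % TOY_P) % TOY_P) * r % TOY_P;
    }
    return h;
}

/* Two blocks per iteration using the derived key r^2. */
static uint64_t toy_poly_pair(const uint64_t *m, size_t n, uint64_t r)
{
    uint64_t h = 0, r2, a, b;
    size_t i = 0;

    r %= TOY_P;
    r2 = r * r % TOY_P;
    for (; i + 1 < n; i += 2)
    {
        /* The two products do not depend on each other. */
        a = ((h + m[i] % TOY_P) % TOY_P) * r2 % TOY_P;
        b = (m[i + 1] % TOY_P) * r % TOY_P;
        h = (a + b) % TOY_P;
    }
    if (i < n)
    {
        /* Odd trailing block: fall back to a single step with r. */
        h = ((h + m[i] % TOY_P) % TOY_P) * r % TOY_P;
    }
    return h;
}
```

Both functions return the same value for any input; the paired form exposes
two independent multiplications per iteration, which is the kind of work a
SIMD unit can do side by side.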

Overall ChaCha20-Poly1305 performance increases by ~12%.

Converting integers to/from our 5-word representation in SSE does not seem
to pay off, so we work on individual words.
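
For reference, a word-by-word conversion can be sketched as follows. This
assumes the 5-word representation means five 26-bit limbs held in 32-bit
words, as is common in Poly1305 implementations; the commit message does not
state the limb width, and the helper names are hypothetical, not taken from
the driver:

```c
#include <stdint.h>

/* Load/store a 32-bit little-endian word byte by byte. */
static uint32_t toy_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void toy_put32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Split a 128-bit little-endian integer into limbs of 26 bits each
 * (the top limb holds the remaining 24 bits). */
static void toy_to_limbs(const uint8_t in[16], uint32_t l[5])
{
    uint32_t t0 = toy_le32(in), t1 = toy_le32(in + 4);
    uint32_t t2 = toy_le32(in + 8), t3 = toy_le32(in + 12);

    l[0] = t0 & 0x3ffffff;
    l[1] = ((t0 >> 26) | (t1 << 6)) & 0x3ffffff;
    l[2] = ((t1 >> 20) | (t2 << 12)) & 0x3ffffff;
    l[3] = ((t2 >> 14) | (t3 << 18)) & 0x3ffffff;
    l[4] = t3 >> 8;
}

/* Inverse: pack fully reduced limbs (l[4] < 2^24) back into 16 bytes. */
static void toy_from_limbs(const uint32_t l[5], uint8_t out[16])
{
    toy_put32(out, l[0] | (l[1] << 26));
    toy_put32(out + 4, (l[1] >> 6) | (l[2] << 20));
    toy_put32(out + 8, (l[2] >> 12) | (l[3] << 14));
    toy_put32(out + 12, (l[3] >> 18) | (l[4] << 8));
}
```

Each limb straddles 32-bit word boundaries at a different offset, so the
shifts differ per limb, which may be one reason a vectorized conversion
gains little.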
src/libstrongswan/plugins/chapoly/chapoly_drv_ssse3.c