CryptoDB
Erdem Alkim
Publications
Year
Venue
Title
2022
TCHES
Multi-Parameter Support with NTTs for NTRU and NTRU Prime on Cortex-M4
Abstract
We propose NTT implementations with each supporting at least one parameter of NTRU and one parameter of NTRU Prime. Our implementations are based on size-1440, size-1536, and size-1728 convolutions without algebraic assumptions on the target polynomial rings. We also propose several improvements for the NTT computation. Firstly, we introduce dedicated radix-(2, 3) butterflies combining Good–Thomas FFT and vector-radix FFT. In general, there are six dedicated radix-(2, 3) butterflies and they together support implicit permutations. Secondly, for odd prime radices, we show that the multiplications for one output can be replaced with additions/subtractions. We demonstrate the idea for radix-3 and show how to extend it to any odd prime. Our improvement also applies to radix-(2, 3) butterflies. Thirdly, we implement an incomplete version of Good–Thomas FFT for addressing potential code size issues. For NTRU, our polynomial multiplications outperform the state-of-the-art by 2.8%−10.3%. For NTRU Prime, our polynomial multiplications are slower than the state-of-the-art. However, the SotA exploits the specific structure of coefficient rings or polynomial moduli, while our NTT-based multiplications exploit neither and apply across different schemes. This reduces the engineering effort, including testing and verification.
2020
TCHES
ISA Extensions for Finite Field Arithmetic: Accelerating Kyber and NewHope on RISC-V
📺
Abstract
We present and evaluate a custom extension to the RISC-V instruction set for finite field arithmetic. The result serves as a very compact approach to software-hardware co-design of PQC implementations in the context of small embedded processors such as smartcards. The extension provides instructions that implement finite field operations with subsequent reduction of the result. As small finite fields are used in various PQC schemes, such instructions can provide a considerable speedup for an otherwise software-based implementation. Furthermore, we create a prototype implementation of the presented instructions for the extendable VexRiscv core, integrate the result into a chip design, and evaluate the design on two different FPGA platforms. The effectiveness of the extension is evaluated by using the instructions to optimize the Kyber and NewHope key-encapsulation schemes. To that end, we also present an optimized software implementation for the standard RISC-V instruction set for the polynomial arithmetic underlying those schemes, which serves as basis for comparison. Both variants are tuned on an assembler level to optimally use the processor pipelines of contemporary RISC-V CPUs. The result shows a speedup for the polynomial arithmetic of up to 85% over the basic software implementation. Using the custom instructions drastically reduces the code and data size of the implementation without introducing runtime-performance penalties at a small cost in circuit size. When used in the selected schemes, the custom instructions can be used to replace a full general purpose multiplier to achieve very compact implementations.
2020
TCHES
Cortex-M4 optimizations for {R,M} LWE schemes
📺
Abstract
This paper proposes various optimizations for lattice-based key encapsulation mechanisms (KEM) using the Number Theoretic Transform (NTT) on the popular ARM Cortex-M4 microcontroller. Improvements come in the form of a faster code using more efficient modular reductions, optimized small-degree polynomial multiplications, and more aggressive layer merging in the NTT, but also in the form of reduced stack usage. We test our optimizations in software implementations of Kyber and NewHope, both round 2 candidates in the NIST post-quantum project, and also NewHope-Compact, a recently proposed variant of NewHope with smaller parameters. Our software is the first implementation of NewHope-Compact on the Cortex-M4 and shows speed improvements over previous high-speed implementations of Kyber and NewHope. Moreover, it gives a common framework to compare those schemes with the same level of optimization. Our results show that NewHope- Compact is the fastest scheme, followed by Kyber, and finally NewHope, which seems to suffer from its large modulus and error distribution for small dimensions.
2020
TCHES
Polynomial Multiplication in NTRU Prime: Comparison of Optimization Strategies on Cortex-M4
📺
Abstract
This paper proposes two different methods to perform NTT-based polynomial multiplication in polynomial rings that do not naturally support such a multiplication. We demonstrate these methods on the NTRU Prime key-encapsulation mechanism (KEM) proposed by Bernstein, Chuengsatiansup, Lange, and Vredendaal, which uses a polynomial ring that is, by design, not amenable to use with NTT. One of our approaches is using Good’s trick and focuses on speed and supporting more than one parameter set with a single implementation. The other approach is using a mixed radix NTT and focuses on the use of smaller multipliers and less memory. On a ARM Cortex-M4 microcontroller, we show that our three NTT-based implementations, one based on Good’s trick and two mixed radix NTTs, provide between 32% and 17% faster polynomial multiplication. For the parameter-set ntrulpr761, this results in between 16% and 9% faster total operations (sum of key generation, encapsulation, and decapsulation) and requires between 15% and 39% less memory than the current state-of-the-art NTRU Prime implementation on this platform, which is using Toom-Cook-based polynomial multiplication.
Program Committees
- PKC 2019
Coauthors
- Erdem Alkim (4)
- Yusuf Alper Bilgin (1)
- Murat Cenk (1)
- Dean Yun-Li Cheng (1)
- Chi-Ming Marvin Chung (1)
- Hülya Evkan (2)
- François Gérard (1)
- Leo Wei-Lun Huang (1)
- Vincent Hwang (2)
- Norman Lahr (1)
- Ching-Lin Trista Li (1)
- Ruben Niederhagen (2)
- Richard Petri (1)
- Cheng-Jhih Shih (1)
- Julian Wälde (1)
- Bo-Yin Yang (2)