Rui Hou

CryptoDB

Rui Hou

Publications

Year

Venue

Title

2024

TCHES

Faster NTRU-based Bootstrapping in less than 4 ms Abstract

Zhihao Li Xianhui Lu Zhiwei Wang Ruida Wang Ying Liu Yinhang Zheng Lutan Zhao Kunpeng Wang Rui Hou

Bootstrapping is a critical technique in constructing fully homomorphic encryption (FHE), which serves to refresh the noise in FHE ciphertexts, enabling an arbitrary number of homomorphic operations. Among published results, the TFHE-rs library [Zam22] offers the fastest bootstrapping implementation on CPU platforms by taking advantage of AVX-512 instructions.In this paper, we improve the efficiency of the bootstrapping algorithm based on the NTRU problem. First, we introduce the approximate gadget decomposition method tailored for NTRU ciphertext, reducing the number of NTT operations required for external products. Second, by integrating the approximate decomposition and key unrolling techniques, we improve the performance of CMux-based blind rotation. Third, for the automorphism-based blind rotation method, we present a hybrid window size technique that reduces the number of automorphisms by 34% compared to recent work [XZD+23](in Crypto23).Subsequently, we implement the proposed bootstrapping algorithm on the CPU platform with AVX instructions. Experimental results demonstrate that our method only takes 3.8ms, which achieves a 1.8× speedup compared to the TFHE-rs library. Finally, we propose an efficient FPGA accelerator based on the CMux method, which not only achieves the best performance but also exhibits high throughput advantages. Our accelerator can improve performance by 2x compared to state-of-the-art FPGA implementations (e.g., FPT).

2024

TCHES

Elastic MSM: A Fast, Elastic and Modular Preprocessing Technique for Multi-Scalar Multiplication Algorithm on GPUs Abstract

Xudong Zhu Haoqi He Zhengbang Yang Yi Deng Lutan Zhao Rui Hou

Zero-knowledge proof (ZKP) is a cryptographic primitive that enables a prover to convince a verifier that a statement is true, without revealing any other information beyond the correctness of the statement itself. Due to its powerful capabilities, its most practical type, called zero-knowledge Succinct Non-interactive ARgument of Knowledge (zkSNARK), has been widely deployed in various privacypreserving applications such as cryptocurrencies and verifiable computation. Although state-of-the-art zkSNARKs are highly efficient for the verifier, the computational overhead for the prover is still orders of magnitude too high to warrant use in many applications. This overhead arises from several time-consuming operations, including large-scale matrix-vector multiplication (MUL), number-theoretic transform (NTT), and especially the multi-scalar multiplication (MSM) which constitutes the largest proportion. Therefore, further efficiency improvements are needed.In this paper, we focus on comprehensive optimization of running time and storage space required by the MSM algorithm on GPUs. Specifically, we propose a novel, modular and adaptive parameter configuration technique—elastic MSM to enable us to adjust the scale of MSM according to our own wishes by performing a corresponding amount of preprocessing. This technique enables us to fully unleash the potential of various efficient parallel MSM algorithms. We have implemented and tested elastic MSM over three prevailing parallel Pippenger algorithms on GPUs. Across various preprocessing space limitations (across various MSM scales), our constructions achieve up to about 1.90×, 1.08× and 1.36× (2.58×, 1.39× and 1.91×) speedup versus three state-of-the-art parallel Pippenger algorithms on GPUs, respectively.From another perspective, elastic MSM could also be regarded as a preprocessing technique over the well-known Pippenger algorithm, which is modular and could be used to accelerate almost all the most advanced parallel Pippenger algorithms on GPUs. Meanwhile, elastic MSM provides an adaptive trade-off between the running time and the extra storage space needed by parallel Pippenger algorithms on GPUs. This is the first preprocessing technique to retain the improved MSM computation brought by preprocessing under varying storage space limitations. Specifically, across various preprocessing space limitations (across various MSM scales), our constructions achieve up to about 192× and 223× (159× and 174×) speedup versus two state-ofthe- art preprocessing parallel Pippenger algorithms on GPUs, respectively.