International Association for Cryptologic Research

CryptoDB

GPU Acceleration for FHEW/TFHE Bootstrapping

Authors:
Yu Xiao
Feng-Hao Liu
Yu-Te Ku
Ming-Chien Ho
Chih-Fan Hsu
Ming-Ching Chang
Shih-Hao Hung
Wei-Chao Chen
DOI: 10.46586/tches.v2025.i1.314-339
URL: https://tches.iacr.org/index.php/TCHES/article/view/11931
Abstract: Fully Homomorphic Encryption (FHE) allows computations to be performed directly on encrypted data without decryption. Despite its great theoretical potential, its computational overhead remains a major obstacle for practical applications. To address this challenge, hardware acceleration has emerged as a promising approach, aiming to achieve real-time computation across a wider range of scenarios. In line with this, our research focuses on designing and implementing a Graphics Processing Unit (GPU)-based accelerator for the third-generation FHEW/TFHE bootstrapping scheme, which, compared to the other generations, features smaller parameters and bootstrapping keys that are particularly well suited to GPU architectures.

In summary, our accelerator offers improved efficiency, scalability, and flexibility for extensions, e.g., functional bootstrapping (Liu et al., Asiacrypt 2022), compared to current state-of-the-art solutions. We evaluate our implementation and demonstrate substantial speedups: in the single-GPU setting, our bootstrapping achieves an 18x to 20x speedup compared to a 64-thread server-class CPU; by using 8 GPUs, the throughput can be further improved by 7x compared to the single-GPU implementation, confirming the scalability of our design. Furthermore, compared to TFHE-rs, the state-of-the-art (SoTA) GPU solution, we achieve a maximum speedup of 1.69x in AND gate evaluation.
Finally, we benchmark several private machine learning applications, showing real-time solutions for (1) encrypted neural network inference on MNIST in 0.04 seconds per image, which is the fastest implementation to our knowledge, and (2) private decision trees on the Iris dataset in 0.38 seconds, whereas the prior 16-core CPU implementation (Lu et al., IEEE S&P 2021) required 1.87 seconds. These results highlight the effectiveness and efficiency of our GPU acceleration in real-world applications.

As a technical highlight, we design a novel parallelization strategy tailored for FHEW/TFHE bootstrapping, allowing an automated optimization that partitions bootstrapping across multiple GPU thread blocks. This is necessary for FHEW/TFHE bootstrapping with scalable parameters, where the whole bootstrapping process may not fit into a single thread block. With this, our accelerator can support a broader range of parameters, making it well suited to upcoming privacy-preserving applications.
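To make the AND gate evaluation mentioned in the abstract more concrete, here is a minimal Python sketch of the linear step that precedes gate bootstrapping in the standard TFHE gate construction (Chillotti et al.). It is not code from this paper's GPU accelerator. Everything here is an illustrative assumption: a toy, insecure LWE dimension of 16, a torus discretized to 32 bits, bits encoded as +Q/8 or -Q/8, small uniform noise, and the hypothetical helper names `keygen`, `encrypt`, `decrypt`, and `and_linear_step`. In the real scheme, the sign test inside `decrypt` is instead evaluated homomorphically by bootstrapping (blind rotation with the bootstrapping key), which outputs a fresh, low-noise encryption of the result; that bootstrapping is the operation the paper accelerates on GPUs. The sketch simply decrypts to check the result.

```python
import random

# Toy parameters (assumptions for illustration only; NOT secure).
Q = 2**32        # ciphertext modulus: the torus discretized to 32 bits
N_LWE = 16       # toy LWE dimension, far too small for real security
MU = Q // 8      # bit 1 is encoded as +Q/8, bit 0 as -Q/8


def keygen(rng):
    """Binary LWE secret key (hypothetical helper)."""
    return [rng.randint(0, 1) for _ in range(N_LWE)]


def encrypt(rng, s, bit):
    """LWE encryption: b = <a, s> + mu + e (mod Q)."""
    a = [rng.randrange(Q) for _ in range(N_LWE)]
    e = rng.randint(-2**10, 2**10)  # small noise, tiny compared with Q/8
    mu = MU if bit else -MU
    b = (sum(ai * si for ai, si in zip(a, s)) + mu + e) % Q
    return a, b


def phase(s, ct):
    """Noisy plaintext b - <a, s> (mod Q)."""
    a, b = ct
    return (b - sum(ai * si for ai, si in zip(a, s))) % Q


def decrypt(s, ct):
    # Sign test on the torus: phase in (0, Q/2) decodes to 1, otherwise 0.
    # Gate bootstrapping evaluates this same test homomorphically.
    return 1 if 0 < phase(s, ct) < Q // 2 else 0


def and_linear_step(ct1, ct2):
    """Compute (0, -Q/8) + ct1 + ct2, the input to AND gate bootstrapping.

    Resulting phase (ignoring noise): (1,1) -> +Q/8, (1,0) or (0,1) -> -Q/8,
    (0,0) -> -3Q/8, so only (1,1) lands in the positive half of the torus.
    """
    a1, b1 = ct1
    a2, b2 = ct2
    a = [(x + y) % Q for x, y in zip(a1, a2)]
    b = (b1 + b2 - MU) % Q
    return a, b


if __name__ == "__main__":
    rng = random.Random(0)
    s = keygen(rng)
    for x in (0, 1):
        for y in (0, 1):
            ct = and_linear_step(encrypt(rng, s, x), encrypt(rng, s, y))
            print(x, y, decrypt(s, ct))
```

The demo prints the AND truth table. Note that the noise of the summed ciphertext grows with every addition, which is why real TFHE bootstraps after each gate: bootstrapping resets the noise so that gates can be chained indefinitely.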
BibTeX
@article{tches-2024-34873,
  title={GPU Acceleration for FHEW/TFHE Bootstrapping},
  journal={IACR Transactions on Cryptographic Hardware and Embedded Systems},
  publisher={Ruhr-Universität Bochum},
  volume={2025},
  number={1},
  pages={314--339},
  url={https://tches.iacr.org/index.php/TCHES/article/view/11931},
  doi={10.46586/tches.v2025.i1.314-339},
  author={Yu Xiao and Feng-Hao Liu and Yu-Te Ku and Ming-Chien Ho and Chih-Fan Hsu and Ming-Ching Chang and Shih-Hao Hung and Wei-Chao Chen},
  year={2024}
}