CryptoDB
Maxim Deryabin
ORCID: 0000-0002-6761-3667
Publications
Year
Venue
Title
2025
TCHES
REED: Chiplet-based Accelerator for Fully Homomorphic Encryption
Abstract
Fully Homomorphic Encryption (FHE) enables privacy-preserving computation and has many applications. However, its practical implementation faces massive computation and memory overheads. To address this bottleneck, several Application-Specific Integrated Circuit (ASIC) FHE accelerators have been proposed. All these prior works put every component needed for FHE onto one chip (monolithic), hence offering high performance. However, they encounter common challenges associated with large-scale chip design, such as inflexibility, low yield, and high manufacturing costs. In this paper, we present the first-of-its-kind multi-chiplet-based FHE accelerator ‘REED’ for overcoming the limitations of prior monolithic designs. To utilize the advantages of multi-chiplet structures while matching the performance of larger monolithic systems, we propose and implement several novel strategies in the context of FHE. These include a scalable chiplet design approach, an effective framework for workload distribution, a custom inter-chiplet communication strategy, and advanced pipelined Number Theoretic Transform and automorphism design to enhance performance.Our instruction-set and power simulations experiments with a prelayout netlist indicate that REED 2.5D microprocessor consumes 96.7mm2 chip area, 49.4Waverage power in 7nm technology. It could achieve a remarkable speedup of up to 2,991x compared to a CPU (24-core 2xIntel X5690) and offer 1.9x better performance, along with a 50% reduction in development costs when compared to state-of-the-art ASIC FHE accelerators. Furthermore, our work presents the first instance of benchmarking an encrypted deep neural network (DNN) training. Overall, the REED architecture design offers a highly effective solution for accelerating FHE, thereby significantly advancing the practicality and deployability of FHE in real-world applications.
2024
CRYPTO
Exploring the Advantages and Challenges of Fermat NTT in FHE Acceleration
Abstract
Recognizing the importance of a fast and resource-efficient polynomial multiplication in homomorphic encryption, in this paper, we design a \emph{multiplier-less} number theoretic transform using a Fermat number as an auxiliary modulus. To make this algorithm scalable with the degree of polynomial, we apply a univariate to multivariate polynomial ring transformation.
We develop an accelerator architecture for fully homomorphic encryption using these algorithmic techniques for efficient multivariate polynomial multiplication. For practical homomorphic encryption application benchmarks, the hardware accelerator achieves a 1,200$\times$ speed-up compared to software implementations. Finally, we conclude the paper by discussing the advantages and limitations of the proposed polynomial multiplication method.
2023
EUROCRYPT
Efficient FHEW Bootstrapping with Small Evaluation Keys, and Applications to Threshold Homomorphic Encryption
Abstract
There are two competing approaches to bootstrap the FHEW fully homomorphic encryption scheme (Ducas and Micciancio, Eurocrypt 2015) and its variants: the original AP/FHEW method, which supports arbitrary secret key distributions, and the improved GINX/TFHE method, which uses much smaller evaluation keys, but is directly applicable only to binary secret keys, restricting the scheme's applicability.
In this paper, we present a new bootstrapping procedure for FHEW-like encryption schemes that achieves the best features of both methods: support for arbitrary secret key distributions at no additional runtime costs, while using small evaluation keys. (Support for arbitrary secret keys is critical in a number of important applications, like threshold and some multi-key homomorphic encryption schemes.) As an added benefit, our new bootstrapping procedure results in smaller noise growth than both AP and GINX, regardless of the key distribution.
Our improvements are both theoretically significant (offering asymptotic savings, up to a $O(log n)$ multiplicative factor, either on the running time or public evaluation key size), and practically relevant. For example, for a concrete 128-bit target security level, we show how to decrease the evaluation key size of the best previously known scheme by more than 30\%, while also slightly reducing the running time. We demonstrate the practicality of the proposed methods by building a prototype implementation within the PALISADE/OpenFHE open-source homomorphic encryption library. We provide optimized parameter sets and implementation results showing that the proposed algorithm has the best performance among all known FHEW bootstrapping methods in terms of runtime and key size. We illustrate the benefits of our method by sketching a simple construction of threshold homomorphic encryption based on FHEW.
2023
TCHES
ModHE: Modular Homomorphic Encryption Using Module Lattices: Potentials and Limitations
Abstract
The promising field of homomorphic encryption enables functions to be evaluated on encrypted data and produce results for the same computations done on plaintexts. It, therefore, comes as no surprise that many ventures at constructing homomorphic encryption schemes have come into the limelight in recent years. Most popular are those that rely on the hard lattice problem, called the Ring Learning with Errors problem (RLWE). One major limitation of these homomorphic encryption schemes is that in order to securely increase the maximum multiplicative depth, they need to increase the polynomial-size (degree of the polynomial ring) thereby also ncreasing the complexity of the design. We aim to bridge this gap by proposing a homomorphic encryption (HE) scheme based on the Module Learning with Errors problem (MLWE), ModHE that allows us to break the big computations into smaller ones. Given the popularity of module lattice-based post-quantum schemes, it is an evidently interesting research endeavor to also formulate module lattice-based homomorphic encryption schemes. While our proposed scheme is general, as a case study, we port the well-known RLWE-based CKKS scheme to the MLWE setting. The module version of the scheme completely stops the polynomial-size blowups when aiming for a greater circuit depth. Additionally, it presents greater opportunities for designing flexible, reusable, and parallelizable hardware architecture. A hardware implementation is provided to support our claims. We also acknowledge that as we try to decrease the complexity of computations, the amount of computations (such as relinearizations) increases. We hope that the potential and limitations of using such a hardware-friendly scheme will spark further research.
Coauthors
- Aikata Aikata (3)
- Rakyong Choi (1)
- Maxim Deryabin (4)
- Jieun Eom (1)
- HyungChul Kang (1)
- Andrey Kim (2)
- Sunmin Kwon (3)
- Yongwoo Lee (2)
- Ahmet Can Mert (3)
- Daniele Micciancio (1)
- Anisha Mukherjee (2)
- Sujoy Sinha Roy (3)
- Donghoon Yoo (1)