Efficient FPGA-based multipliers for F_{3^97} and F_{3^{6*97}}
In this work we present a new structure for multiplication in finite fields. This structure is based on a digit-level LFSR (Linear Feedback Shift Register) multiplier in which the area of the digit-multipliers is reduced using the Karatsuba method. We compare our results with other works in the literature for F_{3^97}. We also propose new formulas for multiplication in F_{3^{6*97}}. These new formulas reduce the number of F_{3^97}-multiplications from 18 to 15. The fields F_{3^97} and F_{3^{6*97}} are relevant in the context of pairing-based cryptography.
💡 Research Summary
The paper addresses the need for fast arithmetic in the ternary extension fields F_{3^97} and F_{3^{6*97}}, which are essential for pairing-based cryptographic protocols. The authors propose a hardware architecture that combines a digit-level Linear Feedback Shift Register (LFSR) multiplier with the Karatsuba multiplication algorithm to reduce the area of the digit-level multipliers.
In the first part, the authors describe the representation of elements in F_{3^97} using a polynomial basis with the irreducible polynomial f(x) = x⁹⁷ + x¹⁶ + 2. Each ternary coefficient is encoded as a two-bit vector, allowing addition, multiplication, and negation to be implemented with only two LUTs per operation. The digit-level LFSR multiplier processes the operands by splitting them into "digits" of D coefficients each. In each clock cycle the most significant remaining digit of the second operand is multiplied by all digits of the first operand, the accumulated partial product is shifted by D coefficient positions (equivalent to multiplication by xᴰ), and the result is reduced modulo f(x) through the feedback network.
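The digit-serial scheme described above can be modelled in software. The following is a minimal sketch, not the authors' hardware design: it assumes coefficients are stored as plain integers 0, 1, 2 rather than two-bit vectors, uses the polynomial f(x) quoted in the summary, and the function names (`poly_mul`, `reduce_mod_f`, `digit_serial_mul`) are hypothetical. It checks that processing the second operand D coefficients at a time, most significant digit first, gives the same result as a full schoolbook product followed by reduction.

```python
import random

M = 97  # extension degree of F_{3^97}
# f(x) = x^97 + x^16 + 2 over F_3 (taken from the text), so
# x^97 = -x^16 - 2 = 2*x^16 + 1 (mod 3).


def poly_mul(a, b):
    """Schoolbook product of two coefficient lists over F_3 (index i <-> x^i)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % 3
    return out


def reduce_mod_f(c):
    """Reduce a coefficient list modulo f(x); returns exactly M coefficients."""
    c = list(c) + [0] * max(0, M - len(c))
    for i in range(len(c) - 1, M - 1, -1):
        t = c[i]
        if t:
            c[i] = 0
            c[i - M + 16] = (c[i - M + 16] + 2 * t) % 3  # 2*t*x^(i-97+16)
            c[i - M] = (c[i - M] + t) % 3                # t*x^(i-97)
    return c[:M]


def digit_serial_mul(a, b, D):
    """Most-significant-digit-first product a*b mod f, D coefficients per step."""
    steps = -(-M // D)                      # ceil(M / D) loop iterations
    b = list(b) + [0] * (steps * D - M)     # pad b to a whole number of digits
    acc = [0] * M
    for k in range(steps - 1, -1, -1):
        digit = b[k * D:(k + 1) * D]
        shifted = [0] * D + acc             # acc * x^D
        partial = poly_mul(a, digit) + [0]  # a * digit, padded to M + D entries
        acc = reduce_mod_f([(s + p) % 3 for s, p in zip(shifted, partial)])
    return acc


random.seed(0)
a = [random.randrange(3) for _ in range(M)]
b = [random.randrange(3) for _ in range(M)]
reference = reduce_mod_f(poly_mul(a, b))
for D in (2, 4, 7, 14):
    assert digit_serial_mul(a, b, D) == reference
```

In this software model each loop iteration stands in for one clock cycle of the multiplier, and the inner call `poly_mul(a, digit)` is the part that the hardware would spread across parallel digit-multipliers.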
The novelty lies in replacing the naïve O(D²) digit-multiplier with a Karatsuba-based implementation. For two-term polynomials the Karatsuba formula reduces the number of base field multiplications from four to three, and recursive application yields O(D^1.59) complexity (the exponent is log₂ 3). The authors explore several digit sizes (D = 2, 4, 7, 14) and various hybrid combinations of classical (C) and Karatsuba (K) methods, denoted KC, KKC, etc. Table 1 reports the synthesis results on a Xilinx XC2VP20-6FF896 device: the best trade-off (D = 7, KKC) uses 4006 slices, runs at 72 MHz, and needs only seven clock cycles. Compared with the prior work
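The four-to-three reduction and the hybrid classical/Karatsuba idea can be illustrated with a short sketch over F_3. This is an assumption-laden software model, not the authors' circuit: the `levels` parameter (number of Karatsuba splits before falling back to the classical method) is only a loose analogue of labels such as KC or KKC, and the names `classical_mul`, `karatsuba_mul`, and `COUNT` are hypothetical.

```python
COUNT = {"mul": 0}  # number of coefficient multiplications in F_3


def classical_mul(a, b):
    """Schoolbook product over F_3; costs len(a) * len(b) multiplications."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % 3
            COUNT["mul"] += 1
    return out


def karatsuba_mul(a, b, levels):
    """Product of equal-length lists: `levels` Karatsuba splits, then classical."""
    n = len(a)
    if levels == 0 or n < 2:
        return classical_mul(a, b)
    h = (n + 1) // 2                        # size of the low half
    a0, a1 = a[:h], a[h:] + [0] * (2 * h - n)
    b0, b1 = b[:h], b[h:] + [0] * (2 * h - n)
    p0 = karatsuba_mul(a0, b0, levels - 1)  # a0 * b0
    p2 = karatsuba_mul(a1, b1, levels - 1)  # a1 * b1
    sa = [(x + y) % 3 for x, y in zip(a0, a1)]
    sb = [(x + y) % 3 for x, y in zip(b0, b1)]
    p1 = karatsuba_mul(sa, sb, levels - 1)  # (a0 + a1) * (b0 + b1)
    mid = [(m - x - y) % 3 for m, x, y in zip(p1, p0, p2)]  # a0*b1 + a1*b0
    out = [0] * (4 * h - 1)
    for i in range(2 * h - 1):
        out[i] = (out[i] + p0[i]) % 3
        out[i + h] = (out[i + h] + mid[i]) % 3
        out[i + 2 * h] = (out[i + 2 * h] + p2[i]) % 3
    return out[:2 * n - 1]                  # drop zero padding for odd n


# (1 + 2x)(2 + x) = 2 + 5x + 2x^2 = 2 + 2x + 2x^2 over F_3
COUNT["mul"] = 0
assert karatsuba_mul([1, 2], [2, 1], levels=1) == [2, 2, 2]
assert COUNT["mul"] == 3   # three multiplications instead of four
```

For four-coefficient operands, two levels of splitting in this model use 9 coefficient multiplications against 16 for the classical method; the price is the extra additions for `sa`, `sb`, and `mid`, which is why mixing the two methods can give a better area trade-off than using Karatsuba all the way down.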