On the Performance of GPU Public-Key Cryptography



Graphics processing units (GPUs) have become increasingly popular in recent years as a cost-effective means of accelerating computationally intensive tasks. We study the particular case of modular exponentiation, the crucial operation behind most modern public-key cryptography algorithms. We focus on the NVIDIA GT200 architecture, currently one of the most popular for general-purpose GPU computation.

We report our efforts to run modular exponentiation faster than any other GPU method we are aware of. Part of our performance advantage comes from a different interleaving of the Montgomery multiplication, one that previous literature neglected. The rest comes from carefully exploring general techniques such as loop unrolling and inline PTX assembly. Our throughput, at over 20000 RSA-1024 decryptions per second or 41426 512-bit modular exponentiations per second, is a significant speedup over previous GPU implementations, with no significant latency penalty.
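As background for readers unfamiliar with the operation, the following is a minimal Python sketch of modular exponentiation built on a word-interleaved Montgomery multiplication, in which each loop iteration folds in one word of the multiplier and immediately reduces by one word. It is not the interleaving or the CUDA kernel described in the paper, which the abstract does not specify. The names `montgomery_setup`, `mont_mul`, and `mont_pow`, the 32-bit word size, and the plain left-to-right square-and-multiply loop are all assumptions made for illustration, and the sketch is not constant-time.

```python
def montgomery_setup(n, word_bits=32):
    """Precompute Montgomery constants for an odd modulus n (illustrative helper)."""
    if n <= 0 or n % 2 == 0:
        raise ValueError("Montgomery reduction needs a positive odd modulus")
    num_words = (n.bit_length() + word_bits - 1) // word_bits
    word_base = 1 << word_bits
    # n_prime = -n^{-1} mod 2^w, used to cancel the lowest word at each step.
    n_prime = (-pow(n, -1, word_base)) % word_base
    r = 1 << (word_bits * num_words)  # Montgomery radix R = 2^(w * s)
    return {"n": n, "w": word_bits, "s": num_words,
            "n_prime": n_prime, "r": r, "r2": (r * r) % n}


def mont_mul(a, b, ctx):
    """Return a * b * R^{-1} mod n for 0 <= a, b < n, one word of b per iteration."""
    n, w, s, n_prime = ctx["n"], ctx["w"], ctx["s"], ctx["n_prime"]
    mask = (1 << w) - 1
    t = 0
    for i in range(s):
        b_i = (b >> (w * i)) & mask           # i-th word of b
        t += a * b_i                          # multiply-accumulate step
        m = ((t & mask) * n_prime) & mask     # multiple of n that clears the low word
        t = (t + m * n) >> w                  # exact division by 2^w
    # t stays below 2n throughout, so one conditional subtraction suffices.
    return t - n if t >= n else t


def mont_pow(base, exponent, n, word_bits=32):
    """Compute base**exponent mod n with left-to-right square-and-multiply."""
    ctx = montgomery_setup(n, word_bits)
    x = mont_mul(base % n, ctx["r2"], ctx)    # convert base to Montgomery form
    acc = ctx["r"] % n                        # 1 in Montgomery form
    for bit in bin(exponent)[2:]:
        acc = mont_mul(acc, acc, ctx)
        if bit == "1":
            acc = mont_mul(acc, x, ctx)
    return mont_mul(acc, 1, ctx)              # convert back out of Montgomery form
```

The result can be checked against Python's built-in three-argument `pow`, for example `mont_pow(7, 65537, 1000003) == pow(7, 65537, 1000003)`. A GPU implementation would store operands as fixed-size word arrays and distribute the word-level work across threads; arbitrary-precision integers are used here only to keep the sketch short.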

Lastly, we evaluate our results in light of several popular metrics, namely performance/price and performance/watt ratios. We find that, while current GPUs generally perform better than CPUs, they show worse performance/watt ratios.


Keywords: CUDA, GPGPU, Montgomery multiplication, modular exponentiation, RSA




International Conference on Application-Specific Systems, Architectures and Processors, January 2011


Cited by

Year 2015 : 4 citations

 Moon Sung Lee, Yongje Lee, Jung Hee Cheon, Yunheung Paek. "Accelerating bootstrapping in FHEW using GPUs." In IEEE 26th International Conference on Application-specific Systems, Architectures and Processors (ASAP), 2015. pp. 128--135.

 Emmart, N and Weems, C. "Pushing the Performance Envelope of Modular Exponentiation Across Multiple Generations of GPUs." In IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2015. pp. 166--176.

 Yang Yang, Zhi Guan, Huiping Sun, Zhong Chen. "Accelerating RSA with Fine-Grained Parallelism Using GPU." In Information Security Practice and Experience - 11th International Conference, ISPEC 2015, Beijing, China, May 5-8, 2015, Proceedings, pp. 454-468.

 Ryosuke Sakai, Koji Nakano, and Yasuaki Ito. "Accelerating RSA encryption using GPUs." In Bulletin of Networking, Computing, Systems, and Software. Volume 4, Number 1, pages 69–73, January 2015.

Year 2014 : 3 citations

 Mohammed Fadhil, Heba; Issam Younis, Mohammed. "Parallelizing RSA Algorithm on Multicore CPU and GPU." International Journal of Computer Applications, vol. 87, issue 6, pp. 15-22.

 F Zheng, W Pan, J Lin, J Jing, Y Zhao, Exploiting the Floating-Point Computing Power of GPUs for RSA, Information Security, 2014

 Fangyu Zheng, Wuqiong Pan, Jingqiang Lin, Jiwu Jing, Yuan Zhao. "Exploiting the Potential of GPUs for Modular Multiplication in ECC." In Information Security Applications - 15th International Workshop, WISA 2014, Jeju Island, Korea, August 25-27, 2014. Revised Selected Papers.

Year 2013 : 2 citations

 Gorawski, Marcin, Michal Lorek, and Anna Gorawska. "CUDA Powered User-Defined Types and Aggregates." In 2013 27th International Conference on Advanced Information Networking and Applications Workshops (WAINA), 2013.

 N Emmart, C Weems, Toward Automatic Optimized Code Generation for Multiprecision Modular Exponentiation on a GPU, Proceedings of the 2013 IEEE 27th International …, 2013

Year 2012 : 1 citation

 Henry, Ryan, and Ian Goldberg. "Solving discrete logarithms in smooth-order groups with CUDA." In Workshop Record of SHARCS, pp. 101-118. 2012.