



# A High-Speed Unified Hardware Architecture for AES and the SHA-3 Candidate Grøstl

#### Marcin Rogawski, Kris Gaj

Cryptographic Engineering Research Group (CERG) http://cryptography.gmu.edu Department of ECE, Volgenau School of Engineering, George Mason University, Fairfax, VA, USA

15th EUROMICRO Conference on Digital System Design – DSD'12





















## Motivation: SHA-3 Contribution

- SHA-3 Competition and evaluation criteria (security, software and hardware performance, flexibility)
- Final round candidates: Blake, Grøstl, JH, Keccak, Skein

NIST webpage:

"NIST also plans to (...) select the SHA-3 winner later in 2012."






### Motivation: Application to IPSec

| Protocol | Security Service Provided | Supported Algorithms |
|----------|---------------------------|----------------------|
| ESP      | confidentiality, integrity, and data origin authentication | e.g.: (AES-CTR or AES-CBC) and (HMAC-SHA or AES-XCBC-MAC) |
| AH       | integrity and data origin authentication | e.g.: HMAC-SHA or AES-XCBC-MAC |
| IKE      | negotiates connection parameters | e.g.: DH and AES-PRNG |






# Motivation: SHA-3 Competition (round 3) - Comprehensive studies

- Software benchmarking
  - General CPUs: eBASH Bernstein and Lange
  - Microcontrollers: XBX Wenzel-Benner and Gräf


#### Hardware benchmarking

- ASICs: Guo et al. DSD'11, DATE'12, Gürkaynak et al. SHA-3'12
- **FPGA high-speed:** Homsirikamol et al. CHES'11, Mahboob et al. SHA-3'12
- FPGA high-speed: (embedded-resources) Sharif et al. ECRYPT II'11, Shahid et al. FPT'11
- **FPGA low-area:** Kerckhof et al. CARDIS'11, Kaps et al. Indocrypt'11, Jungk et al. ReConFig'11






#### Motivation: SHA-3 candidates - Unique Features

#### Hash function mode of operation:

• Skein in tree hash mode: Schorr et al. ReConFig'10

#### Hash function and block cipher in a single core:

- Skein and Threefish: At et al. NTMS'12
- Fugue (round 2) and AES: Järvinen SHA-3'10
- Grøstl-0 (round 2) and AES: Järvinen SHA-3'10
- Grøstl (round 3) and AES: [This work]






# Methodology 1/2

- Starting points: Grøstl from Homsirikamol et al. (CHES'11); AES from Cryptographic Engineering, ch. 10
- http://cryptography.gmu.edu/athena/index.php?id=source_codes
- A Coprocessor for authenticated encryption based on HMAC-Grøstl and AES-CTR mode
- FIFO-based interface
- Long-message analysis: comparison to existing designs
- Short-message analysis: application to IPSec (messages up to 1536 bytes)






# Methodology 2/2

- High-speed FPGA devices: Altera (Stratix III and Stratix IV) and Xilinx (Virtex-5 and Virtex-6)
- Low-cost 65 nm Altera Cyclone III (for comparison with previous work)
- Altera Quartus 11.1 and Xilinx ISE 13.1

#### ATHENa - Automated Tool for Hardware EvaluatioN

http://cryptography.gmu.edu/athena



Benchmarking open-source tool, written in Perl, aimed at an AUTOMATED generation of OPTIMIZED results for MULTIPLE hardware platforms

Currently under development at George Mason University.






#### Block cipher in Counter mode
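The counter-mode data flow can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_block_encrypt` is a hypothetical stand-in built from SHA-256 (it is not AES and is not secure), and the counter is treated as a plain 128-bit big-endian integer, which may differ from the counter layout used in the coprocessor.

```python
import hashlib
from typing import Callable

BLOCK_BYTES = 16  # AES block size: 128 bits


def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Hypothetical stand-in for one AES-128 block encryption (NOT AES, NOT secure)."""
    return hashlib.sha256(key + block).digest()[:BLOCK_BYTES]


def ctr_process(
    key: bytes,
    initial_counter: int,
    data: bytes,
    block_encrypt: Callable[[bytes, bytes], bytes] = toy_block_encrypt,
) -> bytes:
    """Encrypt or decrypt `data` in counter mode (the two operations are identical)."""
    out = bytearray()
    for offset in range(0, len(data), BLOCK_BYTES):
        # Each keystream block depends only on the key and the counter value,
        # never on previous ciphertext, so blocks can be computed in parallel.
        counter = (initial_counter + offset // BLOCK_BYTES) % (1 << 128)
        keystream = block_encrypt(key, counter.to_bytes(BLOCK_BYTES, "big"))
        chunk = data[offset:offset + BLOCK_BYTES]
        out.extend(c ^ k for c, k in zip(chunk, keystream))
    return bytes(out)
```

Because no keystream block depends on another, counter mode is a non-feedback mode, which is what allows several AES instances to work side by side on one wide datapath.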








#### Hash-based message authentication code (HMAC)
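The HMAC construction can be sketched as follows. Grøstl is not available in Python's standard library, so this sketch assumes SHA-256 as a stand-in hash; it shares the 512-bit (64-byte) block size of Grøstl-256, so the key-padding steps are the same. The function name `hmac_sketch` is illustrative only.

```python
import hashlib

BLOCK_BYTES = 64  # 512-bit block, as in Grøstl-256 and in the SHA-256 stand-in


def hmac_sketch(key: bytes, message: bytes, hash_fn=hashlib.sha256) -> bytes:
    """HMAC built from two hash passes; SHA-256 stands in for Grøstl-256."""
    if len(key) > BLOCK_BYTES:
        key = hash_fn(key).digest()          # long keys are hashed first
    key = key.ljust(BLOCK_BYTES, b"\x00")    # then zero-padded to one block

    ipad = bytes(b ^ 0x36 for b in key)      # first key injection (inner pass)
    opad = bytes(b ^ 0x5C for b in key)      # second key injection (outer pass)

    inner = hash_fn(ipad + message).digest()  # first hashing result
    return hash_fn(opad + inner).digest()     # hashed again under the outer key
```

The two key blocks and the re-hashing of the inner result are fixed costs per message, which is why HMAC overhead matters most for short messages.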








#### AES and Grøstl rounds



#### Tweaks on Grøstl (P and Q differences):

http://www.groestl.info/Round3Mods.pdf






#### Grøstl's hardware architectures

[Figure panels: Parallel Architecture; Basic Iterative Architecture; Järvinen 2010; Quasi-Pipelined Architecture]








#### AES-128 and Grøstl-256

|                   | Grøstl-256       | AES-128             |
|-------------------|------------------|---------------------|
| functionality     | hash function    | block cipher        |
| rounds            | 10               | 10                  |
| block size (bits) | 512              | 128 <sup>1</sup>    |
| finalization      | yes              | no <sup>2</sup>     |
| data pipes        | double (P and Q) | single <sup>3</sup> |

#### Comments:

- <sup>1</sup> Four instances of AES in parallel are needed (non-feedback modes only).
- <sup>2</sup> Grøstl's output transformation (finalization) will affect short-message throughput.
- <sup>3</sup> Grøstl's quasi-pipelined architecture has to be used.






#### AES/Grøstl together









### AES/Grøstl together - SubBytes









### AES/Grøstl together - ShiftRows/ShiftBytes
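The kinship between AES ShiftRows and Grøstl ShiftBytes can be illustrated with one row-rotation routine and different offset tables. This is a software sketch, not the hardware datapath from the slides: the helper name `rotate_rows_left` is hypothetical, and the state is assumed to be stored as a list of rows. The offsets are those of AES and of the round-3 Grøstl-256 permutations P and Q.

```python
from typing import List, Sequence

AES_OFFSETS = [0, 1, 2, 3]                        # AES ShiftRows, 4x4 state
GROESTL256_P_OFFSETS = [0, 1, 2, 3, 4, 5, 6, 7]   # Grøstl-256 P, 8x8 state
GROESTL256_Q_OFFSETS = [1, 3, 5, 7, 0, 2, 4, 6]   # Grøstl-256 Q, 8x8 state


def rotate_rows_left(state: List[List[int]], offsets: Sequence[int]) -> List[List[int]]:
    """Rotate row i of `state` cyclically to the left by offsets[i] positions."""
    if len(state) != len(offsets):
        raise ValueError("one offset per state row is required")
    return [
        row[off % len(row):] + row[:off % len(row)]
        for row, off in zip(state, offsets)
    ]


# Usage: the same routine serves both algorithms, only the table changes.
aes_state = [[4 * r + c for c in range(4)] for r in range(4)]
groestl_state = [[8 * r + c for c in range(8)] for r in range(8)]
aes_shifted = rotate_rows_left(aes_state, AES_OFFSETS)
q_shifted = rotate_rows_left(groestl_state, GROESTL256_Q_OFFSETS)
```

In hardware the rotations are fixed wiring, so supporting several offset tables mainly costs multiplexers rather than logic depth.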








### AES/Grøstl together - AddRoundKey/AddConstant








#### AES/Grøstl together - MixColumn/MixBytes








#### AES/Grøstl together - AES last round








### AES/Grøstl together - 3rd pipeline stage








#### AES/Grøstl together - The Counter from AES-CTR








### AES/Grøstl together - The AES Key Expansion Unit








### AES/Grøstl together - Proposed Design








# Inner Pipelining (Receiver side)








### Core Interface








### High-level scheduling






# Throughput for long messages

$$\mathit{throughput} = \frac{\mathit{blocksize}}{T \cdot \left(Time_{HE}(N+1) - Time_{HE}(N)\right)} \tag{1}$$

$$\mathit{throughput}_{long} = \frac{\mathit{blocksize}}{\mathit{cycles} \cdot T} \tag{2}$$

#### HMAC-Grøstl-256 and AES-128-CTR core parameters:

- block size = 512 bits, cycles = 31
- $Time_{HE}(i)$ - time for the Hash/Encryption process of the *i*-th block of data
- $T$ - clock period
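As a worked example of Equation (2), the snippet below evaluates the long-message throughput for the stated core parameters. The 200 MHz clock is a hypothetical value chosen only to make the arithmetic concrete; it is not a result from this work.

```python
def throughput_long(block_size_bits: int, cycles: int, clock_period_s: float) -> float:
    """Equation (2): bits processed per second for long messages."""
    return block_size_bits / (cycles * clock_period_s)


BLOCK_SIZE_BITS = 512      # from the core parameters
CYCLES_PER_BLOCK = 31      # from the core parameters
ASSUMED_CLOCK_HZ = 200e6   # hypothetical clock frequency, for illustration only

rate_bps = throughput_long(BLOCK_SIZE_BITS, CYCLES_PER_BLOCK, 1.0 / ASSUMED_CLOCK_HZ)
print(f"{rate_bps / 1e6:.1f} Mbit/s at {ASSUMED_CLOCK_HZ / 1e6:.0f} MHz")
```

Since block size and cycle count are fixed by the architecture, the long-message throughput scales linearly with the clock frequency achieved on each device.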






#### HMAC-Grøstl - Throughput for short messages



#### Five additional blocks come from:

Two HMAC-Key injections [1-2], first hashing result [3], Grøstl finalization operation [4-5].
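The effect of the five additional blocks can be estimated with a rough model. This sketch rests on assumptions that are not taken from the slides: each additional block is assumed to cost the same 31 cycles as a message block, the message is assumed to fill a whole number of 512-bit blocks (padding is ignored), and the function name is hypothetical.

```python
def hmac_short_efficiency(message_blocks: int, extra_blocks: int = 5) -> float:
    """Fraction of long-message throughput reached for a short message.

    Assumes every extra block takes as long as one message block.
    """
    if message_blocks < 1:
        raise ValueError("message must contain at least one block")
    return message_blocks / (message_blocks + extra_blocks)


# 64 bytes = 1 block, 576 bytes = 9 blocks, 1536 bytes = 24 blocks
for size_bytes in (64, 576, 1536):
    blocks = size_bytes // 64
    print(f"{size_bytes:5d} bytes: {hmac_short_efficiency(blocks):.0%} of long-message rate")
```

Under these assumptions the fixed five-block overhead dominates for single-block messages and shrinks steadily as messages approach the 1536-byte upper bound considered for IPSec.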






#### Comparison to Järvinen on Cyclone III - Area








#### Comparison to Järvinen on Cyclone III - Frequency








#### Comparison to Järvinen on Cyclone III - Throughput








#### Comparison to Järvinen on Cyclone III - Throughput/Area








## High-Speed FPGA - Area








### High-Speed FPGA - Frequency








# High-Speed FPGA - Throughput for long messages








#### HMAC-Grøstl for short messages








# Conclusions

- The coprocessor with 3 pipeline stages pays a relatively small penalty in extra circuitry (+31% on Virtex-5) and in throughput (-29% on Virtex-5)
- The proposed coprocessor can be used directly for IPSec: HMAC-Grøstl with AES-CTR, and possibly any other non-feedback mode
- It can be implemented even on the smallest device in the high-speed families from Altera (Stratix III, Stratix IV) and Xilinx (Virtex-5, Virtex-6)
- The Altera Cyclone III-based coprocessor outperforms the alternative design by 57% and 11% for authenticated encryption (ESP) and authentication (AH), respectively
- Grøstl's similarity to AES will be beneficial in hardware if Grøstl is selected as SHA-3 ("SHA-3 ← Grøstl")










#### Our resources:

- CERG: https://cryptography.gmu.edu
- ATHENa: https://cryptography.gmu.edu/athena
- ATHENaDB: https://cryptography.gmu.edu/athenadb



# Backup slides




Backup slides from here






#### AES/Grøstl together - Shared MixBytes





#### Comparison to Grøstl and 2xAES-CTR Separately on Cyclone III - Throughput








#### Comparison to Grøstl and 2xAES-CTR Separately on Cyclone III - Area








#### Comparison to Grøstl and 2xAES-CTR Separately on Cyclone III - Throughput/Area

