Efficient Formally Secure Compilers to a Tagged Architecture

Cătălin Hrițcu
Inria Paris
Computers are insecure

• devastating low-level vulnerabilities

• programming languages, compilers, and hardware architectures
  – designed in an era of scarce hardware resources
  – too often trade off security for efficiency

• the world has changed (2016 vs 1972)
  – security matters, hardware resources abundant
  – time to revisit some tradeoffs
Hardware architectures

• Today’s processors are mindless bureaucrats
  –
  –
  –

• Software bears most of the burden for security

• Manufacturers have started looking for solutions
  – 2015: Intel Memory Protection Extensions (MPX) and Intel Software Guard Extensions (SGX)
  – 2016: Oracle Silicon Secured Memory (SSM)
Unsafe low-level languages

• C (1972) and C++ **undefined behavior**
  – including buffer overflows, checks too expensive
  – compilers optimize aggressively assuming undefined behavior will simply not happen

• **Programmers bear the burden for security**
  – just write secure code ... all of it

---

**[PATCH] CVE-2015-7547 — glibc getaddrinfo() stack-based buffer overflow**

- **Date:** Tue, 16 Feb 2016
- **Subject:** [PATCH] CVE-2015-7547 — glibc getaddrinfo() stack-based buffer overflow
- **Authentication-results:** sourceware.org; auth=none

The glibc project thanks the Google Security Team and Red Hat for reporting the security impact of this issue, and Robert Holiday of Ciena for reporting the related bug 18665.
Safer high-level languages

- **memory safe** (at a cost)

- **useful abstractions** for writing secure code:
  - GC, type abstraction, modules, immutability, ...

- **not immune to low-level attacks**
  - large runtime systems, in C++ for efficiency
  - **unsafe interoperability with low-level code**
    - libraries often have large parts written in C/C++
    - enforcing abstractions all the way down too expensive
Efficient Secure Compilation to Micro-Policies

2nd part of this talk (more speculative)

1. Secure semantics for low-level languages
2. Secure interoperability with lower-level code
   • Formally: fully abstract compilation
     – holy grail, enforcing abstractions all the way down
     – currently this would be way too expensive

• Key enabling technology: micro-policies
  – hardware-accelerated tag-based monitoring

1st part of this talk
MICRO-POLICIES
Micro-Policies team

- **Formal methods & architecture & systems**
- **Current team:**
  - *Inria*: Cătălin Hrițcu, Yannis Juglaret
  - *UPenn*: Arthur Azevedo de Amorim, André DeHon, Benjamin Pierce, Nick Roessler, Antal Spector-Zabusky
  - *Portland State*: Andrew Tolmach
  - *MIT*: Howard E. Shrobe, Stelios Sidiropoglou-Douskos
  - *Industry*: Draper Labs, Bluespec Inc
- **Spinoff of past project:**
  DARPA CRASH/SAFE (2011-2014)
Micro-policies

• add **large tag** to each machine word

<table>
<thead>
<tr>
<th>word</th>
<th>tag</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td></td>
</tr>
</tbody>
</table>

(protected metadata)

<table>
<thead>
<tr>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

• words in memory and registers are all tagged

<table>
<thead>
<tr>
<th>pc</th>
<th>tag</th>
</tr>
</thead>
<tbody>
<tr>
<td>r0</td>
<td>tag</td>
</tr>
<tr>
<td>r1</td>
<td>tag</td>
</tr>
<tr>
<td>r2</td>
<td>tag</td>
</tr>
<tr>
<td>mem[0]</td>
<td>tag</td>
</tr>
<tr>
<td>mem[1]</td>
<td>tag</td>
</tr>
<tr>
<td>mem[2]</td>
<td>tag</td>
</tr>
<tr>
<td>mem[3]</td>
<td>tag</td>
</tr>
</tbody>
</table>

*Conceptual model, our hardware implements this efficiently*
Tag-based instruction-level monitoring

<table>
<thead>
<tr>
<th>pc</th>
<th>tpc</th>
</tr>
</thead>
<tbody>
<tr>
<td>r0</td>
<td>tr0</td>
</tr>
<tr>
<td>r1</td>
<td>tr1</td>
</tr>
<tr>
<td>r2</td>
<td>tr2</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>mem[0]</th>
<th>tm0</th>
</tr>
</thead>
<tbody>
<tr>
<td>mem[1]</td>
<td>tm1</td>
</tr>
<tr>
<td>mem[2]</td>
<td>tm2</td>
</tr>
<tr>
<td>mem[3]</td>
<td>tm3</td>
</tr>
</tbody>
</table>

decode(mem[1]) = add r0 r1 r2

add

monitor

allow (tpc', tr0')
Tag-based instruction-level monitoring

decode(mem[1]) = store r0 r1

bad action stopped!
Efficiently executing micro-policies

lookup → zero overhead hits!

hardware cache
Efficiently executing micro-policies

<table>
<thead>
<tr>
<th>op</th>
<th>tpc</th>
<th>t1</th>
<th>t2</th>
<th>t3</th>
<th>tci</th>
</tr>
</thead>
<tbody>
<tr>
<td>tpc’</td>
<td>tr</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

lookup → misses trap to software

<table>
<thead>
<tr>
<th>op</th>
<th>tpc</th>
<th>t1</th>
<th>t2</th>
<th>t3</th>
<th>tci</th>
</tr>
</thead>
<tbody>
<tr>
<td>tpc’</td>
<td>tr</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>tpc’</td>
<td>tr</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>tpc’</td>
<td>tr</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>tpc’</td>
<td>tr</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

hardware cache
Micro-policies are cool!

- **low level + fine grained**: unbounded per-word metadata, checked & propagated on each instruction
- **efficient**: accelerated using hardware caching
- **expressive**: can enforce large number of policies
- **flexible**: tags and monitor defined by software
- **secure**: simple enough to formally verify security
- **real**: FPGA implementation on top of RISC-V CPU

DRAPER  bluespec®
Expressiveness

- information flow control (IFC)
- monitor self-protection
- compartmentalization
- dynamic sealing
- heap memory safety
- code-data separation
- control-flow integrity (CFI)
- taint tracking
- ...

Verified (in Coq)

Evaluated (<10% runtime overhead)
Flexibility (by example)

• **Heap memory safety** micro-policy prevents
  – **spatial violations**: reading/writing out of bounds
  – **temporal violations**: use after free, invalid free
  – for **heap-allocated data**

• **Pointers become unforgeable capabilities**
  – can only obtain a valid pointer to a heap region
    • by allocating that region or
    • by copying/offsetting an existing pointer to that region
Memory safety micro-policy

\[ p \leftarrow \text{malloc} \ k \]
\begin{align*}
\text{fresh } & \ c \\
\text{(e.g. } & \ ++c) \\
\end{align*}
\[ p = \text{A8F0}@\text{ptr}(c) \]
\[ q \leftarrow p + k \]
\[ \text{free } p \]

\[ T_v ::= i \mid \text{ptr}(c) \]
\[ T_m ::= M(c, T_v) \mid F \]

\begin{tabular}{|c|c|}
\hline
0 & \(0@M(c,i)\) \\
1 & \(0@M(c,i)\) \\
2 & \ldots \ \\
k-1 & \(0@M(c,i)\) \\
k & \(7@M(c',i)\) \\
\hline
\end{tabular}
## Memory safety micro-policy

### Micro-Policies
- + can adapt to new threats
- + expressive enough for efficient secure compilation

### Oracle Silicon Secured Memory (2016)
- similar, but with only 16 colors

### Intel MPX cannot detect this

### Tags on values
\[ T_v ::= i \mid \text{ptr}(c) \]

### Tags on memory
\[ T_m ::= M(c,T_v) \mid F \]

### Example
- Freeing a region

<table>
<thead>
<tr>
<th>0</th>
<th>1</th>
<th>( \cdots )</th>
<th>k-1</th>
<th>k</th>
</tr>
</thead>
<tbody>
<tr>
<td>( \text{7@F} )</td>
<td>( \text{0@F} )</td>
<td>( \cdots )</td>
<td>( \text{0@F} )</td>
<td>( \text{7@M(c',i)} )</td>
</tr>
</tbody>
</table>

- Accessing a region

\[ q \leftarrow p + k \]

- Using an invalid pointer

\[ c \neq c' \]

\[ !q \leq 42 \]

- Out of bounds

- Use after free

- Intel MPX cannot detect this
SECURE COMPILATION
Joint work with Yannis Juglaret
Secure compilation

• **Goal**: to build the *first efficient secure compilers for realistic programming languages*

1. Secure semantics for low-level languages
   – C with memory safety and compartmentalization

2. Secure interoperability with lower-level code
   – ASM, C, ML, and F* (verification system for ML)
   – problems are quite different at different levels

• Formally: **fully abstract compilation**
   – enforcing abstractions all the way down
Benefits: can reason about security in the source language; forget about compiler, linker, loader, runtime system, and (to some extent) low-level libraries
Very long term vision

- **F***
  - **F*** component
  - **SecF***

- **ML**
  - **ML component**
  - **SecML**

- **C variants**
  - **safe C component**
  - **legacy C component**

- **ASM (RISC-V+ P)**
  - **compiled F*** component**
  - **compiled ML component**
  - **compiled safe C component**
  - **compiled legacy C comp**
  - **ASM component**

**compartmentalization boundaries**
Low-level compartmentalization

• Break up software into **mutually distrustful components** running with **minimal privileges** & interacting only via **well-defined interfaces**
• **Limit the damage** of control hijacking attacks to just the C or ASM components where they occur
• Not a new idea, already deployed in practice:
  – process-level privilege separation
  – software-fault isolation
• Micro-policies can give us **better interaction model**
• We also aim to **show security formally**
Compartmentalized C

- Want to add components with typed interfaces to C
- Compiler (e.g. CompCert), linker, loader propagate interface information to low-level memory tags
  - 
  - 
- Micro-policy enforcing:
  - component isolation
  - procedure call discipline (entry points)
  - stack discipline for returns (linear return capabilities)
  - type safety on cross-component interaction

Compartmentalization micro-policy

memory

<table>
<thead>
<tr>
<th></th>
<th>Jal r</th>
<th>...</th>
<th>...</th>
<th>@EntryPoint</th>
</tr>
</thead>
<tbody>
<tr>
<td>C₁</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

| Store rₐ ⋆rₘ |       |     |     |             |
| C₂            |       |     |     |             |
| Load ⋆rₘ rₐ |       |     |     |             |
| Jump rₐ      |       |     |     |             |

registers

<table>
<thead>
<tr>
<th>pc</th>
<th>...</th>
<th>r</th>
</tr>
</thead>
</table>

<table>
<thead>
<tr>
<th>changed color</th>
</tr>
</thead>
<tbody>
<tr>
<td>@Ret n</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>pc</th>
<th>rₐ</th>
<th>...</th>
</tr>
</thead>
<tbody>
<tr>
<td>@Ret n</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>pc</th>
<th>rₐ</th>
<th>rₘ</th>
</tr>
</thead>
<tbody>
<tr>
<td>@(n+1)</td>
<td>pc</td>
<td>rₐ</td>
</tr>
<tr>
<td>@Ret n</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>pc</th>
<th>rₐ</th>
<th>rₘ</th>
</tr>
</thead>
<tbody>
<tr>
<td>@(n+1)</td>
<td>pc</td>
<td>rₐ</td>
</tr>
<tr>
<td>@Ret n</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>pc</th>
<th>rₐ</th>
<th>rₘ</th>
</tr>
</thead>
<tbody>
<tr>
<td>@n</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

PC stack level linear return capability

<table>
<thead>
<tr>
<th>pc</th>
<th>rₐ</th>
<th>...</th>
</tr>
</thead>
<tbody>
<tr>
<td>@n</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

\(\text{invariant:} \) at most one return capability per call stack level

<p>| | | |</p>
<table>
<thead>
<tr>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>@Ret n</td>
<td></td>
</tr>
</tbody>
</table>

cross-component call only allowed at EntryPoint

increment

\(\text{cross-component return only allowed via return capability} \)

loads and stores to the same component always allowed
Secure compartmentalization property

∀ compromise scenarios.

∀ low-level attack from compromised $C_2$, $C_4$, $C_5$
∃ high-level attack from some fully defined $A_2$, $A_4$, $A_5$

[Beyond full abstraction, Juglaret, Hritcu, et al, draft’16]
Protecting higher-level abstractions

• **ML abstractions we want to enforce with micro-policies**
  – types, value immutability, opaqueness of closures, parametricity (dynamic sealing), GC vs malloc/free, ...

• **F*: enforcing full specifications using micro-policies**
  – some can be turned into **contracts**, checked dynamically
  – fully abstract compilation of F* to ML trivial for ML interfaces (because F* allows and tracks effects, as opposed to Coq)

• **Limits of purely-dynamic enforcement**
  – functional purity, termination, relational reasoning
  – push these limits further and combine with static analysis
Secure compilation

• Solving conceptual challenges
  – Secure micro-policy composition
  – Higher-level micro-policies (for C and ML)
  – Formalizing security properties (i.e. attacker models)

• Building the first efficient secure compilers for realistic programming languages
  – C (CompCert): memory safety & compartmentalization
  – ML and F*: protecting higher-level abstractions

• Measuring & lowering the cost of secure compilation
  – better hardware, hybrid enforcement (static + dynamic), weaker properties (robust compilation)

• Showing formally that these compilers are indeed secure
Conclusion

• There is a pressing practical need for ...
  – more secure languages providing strong abstractions
  – more secure compiler chains protecting these abstractions
  – more secure hardware making the cost of all this acceptable
  – clear attacker models & strong formal security guarantees

• Building the first efficient secure compilers for realistic programming languages (C, ML, F*)

• Targeting micro-policies = new mechanism for hardware-accelerated tag-based monitors

Thank you!