Traditionally, a pointer is simply a number containing the address of a location in memory. CHERI capabilities augment this by adding several pieces of metadata:
- bounds, the range within which the pointer is allowed to be moved
- permissions, how the pointer is allowed to be used (read, write, execute)
- object type
- validity tag, a single bit which controls whether the pointer can be used
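
To make the metadata above concrete, here is a minimal software model of a capability. It is only a sketch: the struct name `Capability`, its field names, and its methods are hypothetical, the object type is left out, and real CHERI hardware uses a compressed encoding and additional rules that this model ignores.

```rust
/// Illustrative model only. This is NOT the real CHERI encoding, and the
/// object type field is omitted for brevity.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Capability {
    pub address: u64,      // where the pointer currently points
    pub base: u64,         // lower bound (inclusive)
    pub top: u64,          // upper bound (exclusive)
    pub can_read: bool,    // permissions
    pub can_write: bool,
    pub can_execute: bool,
    pub tag: bool,         // validity tag
}

impl Capability {
    /// Check whether a read of `len` bytes at the current address is allowed.
    pub fn check_read(&self, len: u64) -> Result<(), &'static str> {
        if !self.tag {
            return Err("tag cleared");
        }
        if !self.can_read {
            return Err("no read permission");
        }
        let end = self.address.checked_add(len).ok_or("address overflow")?;
        if self.address < self.base || end > self.top {
            return Err("out of bounds");
        }
        Ok(())
    }

    /// Move the address while keeping bounds, permissions, and tag.
    pub fn with_address(self, address: u64) -> Capability {
        Capability { address, ..self }
    }

    /// A plain integer carries no metadata, so a capability built from one
    /// has empty bounds, no permissions, and a cleared tag in this model.
    pub fn from_address(address: u64) -> Capability {
        Capability {
            address,
            base: 0,
            top: 0,
            can_read: false,
            can_write: false,
            can_execute: false,
            tag: false,
        }
    }
}
```

In this model a read succeeds only while the tag is set, the permission is present, and the access stays inside the bounds. A capability rebuilt from a bare address fails the first of those checks.
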
Rust defines a type, usize, which is an integer type the same size as a pointer.
Because capabilities significantly change the representation and behavior of pointers, any Rust port to a CHERI-enabled architecture will have to change the definition of usize.
The usize type serves a number of uses:
- array indexing (data[4])
- representing sizes of objects (std::mem::size_of::<T>())
- storing addresses for more complex pointer arithmetic (pointer as usize & !1)
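
The three uses listed above can be sketched in ordinary Rust as follows. The function names are hypothetical and exist only for illustration.

```rust
use std::mem::size_of;

/// Use 1: array and slice indexing takes a `usize`.
pub fn element_at(data: &[u32], index: usize) -> u32 {
    data[index]
}

/// Use 2: object sizes are reported as `usize`.
pub fn size_in_bytes<T>() -> usize {
    size_of::<T>()
}

/// Use 3: an address held as an integer for bit-level arithmetic.
/// This clears the lowest bit of the address.
pub fn address_without_low_bit<T>(pointer: *const T) -> usize {
    pointer as usize & !1
}
```

Only the third use depends on usize being able to hold an address. The first two need nothing more than an integer large enough to count bytes or elements.
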
There are two reasonable approaches to redefining usize for CHERI:
- make usize 128 bits wide to match the in-memory size of a capability
- make usize 64 bits wide to match the range of addresses a capability can reference
Solution 1 was explored by Nicholas Sim as part of a master's project, with the conclusion that it led to compromised performance and was not ideal.
In use cases 1 and 2 it doubles the amount of storage used for no gain, increasing memory overhead and wasting processor time.
In use case 3 it allows “round trip” pointer casts (pointer as usize as *const T) to work, which is nice, but may result in strange behavior in any code that assumes that a pointer cast to usize only contains an address.
Solution 2 breaks the assumption that round trip casts can be performed safely (the resulting capability will be invalid), but has the advantage of avoiding all of the pitfalls of solution 1 (performance impact and broken assumptions in existing code).
Our fork of the Rust compiler implements solution 2.
Pointer types (*const T, *mut T) are 128 bits wide and represented using CHERI capabilities.
usize is a 64-bit unsigned integer type.
Casting a capability to usize, &data as *const _ as usize, will get the address of the memory being pointed to, and discard any metadata.
Casting a usize to a capability, 0xdead_beef as *const T, will produce an invalid capability (the validity tag will be unset, and dereferencing it will trigger an exception).
Discussion about how best to solve this problem in Rust proper is ongoing. In August 2023 we advanced a “pre-RFC” proposing our approach as a solution, which led to some interesting discussion, especially on Zulip. This issue has previously been talked about on a number of occasions (the pre-RFC includes a list of previous discussions).
One recurring source of concern is the potential for breakage of existing Rust programs. Likely sources of bugs identified so far include:
- casts from usize to pointer result in run-time crashes, not compile-time errors
- various pieces of code, including some in the compiler itself, assume size_of::<usize>() == size_of::<*const T>()
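
The size assumption in the second item can be checked directly. The sketch below uses hypothetical function names. It contrasts the assumption itself with the safer habit of sizing pointer storage from the pointer type rather than from usize.

```rust
use std::mem::size_of;

/// The assumption in question. It holds on conventional targets, but not
/// when pointers are 128-bit capabilities and usize is 64 bits wide.
pub fn pointer_is_usize_sized() -> bool {
    size_of::<usize>() == size_of::<*const u8>()
}

/// Code that needs room for a pointer can ask for the pointer's own size
/// instead of assuming it matches usize.
pub fn pointer_size_bytes() -> usize {
    size_of::<*const u8>()
}
```
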
To help address some of this concern, we organised an experiment using the Rust community's Crater tool.
Crater compiles a very large number of Rust projects (about 380,000) from around the community using a given build of the Rust compiler.
Our experiment used this to apply an existing lint, fuzzy_provenance_casts, which detects casts from usize to pointer types that could later result in exceptions.
We found that fewer than 1% of the projects tested had the potential to generate CHERI exceptions due to casting.
The full analysis is available on GitHub.