For whatever reason1, I wanted to maintain a Vec
of raw pointers to the values of entries in a HashMap<K, Mutex<MaybeUninit<T>>>
. However, occasionally the program would crash with a segmentation fault, or some other error, when I'd try to write some data to the MaybeUninit
value. The gist of what I was doing was something like this:
let mut map = new;
let mut ptrs = Vec new;
for idx in 0.100
for in ptrs.iter .enumerate
You might already see what the issue is with this snippet. I didn't.
Running this code produced a few different results:
- Poisoning
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: PoisonError { .. }', src/main.rs:259:14
- Use after free
program(18501,0x100afc580) malloc: Incorrect checksum for freed object 0x14d6069b8: probably modified after being freed.
Corrupt value: 0x14d606b70
program(18501,0x100afc580) malloc: *** set a breakpoint in malloc_error_break to debug
[1] 18501 abort cargo run
- Segmentation fault
[1] 18159 segmentation fault cargo run
Unfortunately, it wasn't immediately obvious why my unsafe block was unsound. Fortunately, it is easy enough to validate that a pointer points to the correct location in memory. Modifying the code above, I came up with:
for idx in 0.100
for in ptrs.iter .enumerate
And the output:
thread 'main' panicked at 'left = 5349861848, right = 5351977768', src/main.rs:263:9
So I have a list of potentially dangling pointers. It took me a while to work out why my original pointers were invalid. And then I remembered that Vec
has a with_capacity
function so that you can minimize allocations by creating a vec with enough capacity to hold some amount of data. The Capacity and reallocation section in the docs state:
The capacity of a vector is the amount of space allocated for any future elements that will be added onto the vector. This is not to be confused with the length of a vector, which specifies the number of actual elements within the vector. If a vector’s length exceeds its capacity, its capacity will automatically be increased, but its elements will have to be reallocated.
And
Vec does not guarantee any particular growth strategy when reallocating when full, nor when reserve is called. The current strategy is basic and it may prove desirable to use a non-constant growth factor. Whatever strategy is used will of course guarantee O(1) amortized push.
A few lines of code revealed how capacity increases. There's probably documentation about this somewhere, but I coudn't find it.
let mut list = Vec new;
let mut capacity = list.capacity;
println!;
for i in 0..10000
The above yields:
0
4
8
16
32
64
128
256
512
1024
2048
4096
8192
16384
So capacity increases exponentially. There is an explanation in the source for why the capacity starts at 4 once allocated:
// Tiny Vecs are dumb. Skip to:
// - 8 if the element size is 1, because any heap allocators is likely
// to round up a request of less than 8 bytes to at least 8 bytes.
// - 4 if elements are moderate-sized (<= 1 KiB).
// - 1 otherwise, to avoid wasting too much space for very short Vecs.
For a HashMap:
0
3
7
14
28
56
112
224
448
896
1792
3584
7168
14336
Maybe similar reasons?
In the first snippet above, I'm inserting 100 entries. So the map is reallocating 5 times after the first allocation. If I had a HashMap with the correct initial capacity, that should solve the problem:
let mut map = with_capacity;
let mut ptrs = Vec new;
for idx in 0.100
for in ptrs.iter .enumerate
That doesn't panic. But this only helps if I already know how many items would be inserted. In the code I was working on, there was no way of knowing.
Box
is "the simplest form of heap allocation in Rust." And is also the solution to my problem. Box is a pointer to a heap-allocated value. If I had a HashMap<K, Box<Mutex<MaybeUninit<T>>>
, then I can get a raw pointer from the Box to the value, instead of a pointer to an item in the map. This would be less efficient because every insert is now an allocation, however.
This also doesn't panic:
let mut map = new;
let mut ptrs = Vec new;
for idx in range
for in ptrs.iter .enumerate
In this snippet, instead of getting a pointer to the value in the map, I have to dereference the value in the map twice so that I can get the location of the boxed value.
To fix the original code, I only needed to change the type being pushed to the HashMap:
let mut map = new;
let mut ptrs = Vec new;
for idx in range
for in ptrs.iter .enumerate
There was a real reason ↩