A simple, arena-backed, generic dynamic array for C

habibur · on Oct 6, 2023

And if you are not using arena -- realloc() is your best bet. And no, it won't move your memory around into larger areas every time you call it. Most of the time it will simply bump up the allocation end point. This happens more frequently than you think.

In my code I don't use allocated_length and used_length seperately in dynamic vectors. Simply realloc() every time one element is added. No measurable difference in speed, and code is much more clean.

abainbridge · on Oct 6, 2023

> no, it won't move your memory around into larger areas every time you call it. Most of the time it will simply bump up the allocation end point. This happens more frequently than you think.

I just tried some simple "I want to grow my block of memory" test cases. A malloc of 1000 bytes couldn't be increased by a single byte without being moved to a new location. An allocation of 1 million bytes couldn't be increased by 1% without being moved to a new location.

https://godbolt.org/z/Gr4bchxv5

habibur · on Oct 6, 2023

    #include <stdio.h>
    #include <stdlib.h>

    int main(){
    int same=0;
    int moved=0;
    char* temp=malloc(16);
    for(int i=0; i<10000; i++){
        char* temp2=realloc(temp,(i+2)*16);
        if(temp!=temp2){
            moved++;
            temp=temp2;
        }
        else same++;
    }
    printf("ptr moved %d, reused %d times\n",moved,same);
    }

Tried the above. Ptr moved 7 times and is reused 9993 times.

But, what you want to test is what you identify as your need.

No point in explaining more.

nwellnhof · on Oct 6, 2023

> No measurable difference in speed, and code is much more clean.

This only works with some malloc implementations. There are platforms where this will result in quadratic behavior, so don't do this if you want to write portable code.

jandrese · on Oct 6, 2023

Yeah, I was just thinking that line of thought was notorious in the past for some serious performance issues. It's one reason you sometimes see arena allocators doubling the memory every time they have to grow. Might be something to benchmark on your system before using. It's also possible those badly behaving allocators are in the past and this is fine to do now.

kevin_thibedeau · on Oct 6, 2023

They're not in the past. There are systems that don't even implement realloc().

gkfasdfasdf · on Oct 6, 2023

One of the benefits of the arena that the OP mentions though is that old pointers are still valid as long as the arena is still valid. That's not necessarily the case with realloc.

Interesting idea to skip explicitly tracking capacity with realloc though. That does seem convenient, and if the allocator is fast, sure, why not?

Someone · on Oct 6, 2023

> realloc() is your best bet. And no, it won't move your memory around into larger areas every time you call it.

That’s true.

> Most of the time it will simply bump up the allocation end point

That depends on your allocator and usage patterns. I would think that, in real world usage, it will reallocate fairly frequently, but don’t have and can’t find data on that.

Reason is that I would expect the allocator to, normally, allocate blocks tight together and most reallocations to grow the size by more than the granularity of the memory allocator (which typically is in the order of 16 bytes)

The latter often applies too if you “don't use allocated_length and used_length seperately in dynamic vectors. Simply realloc() every time one element is added”

Or am I wrong? How large is your initial alloc?

Edit: slab allocators may show such good behavior.

habibur · on Oct 6, 2023

> Or am I wrong? How large is your initial alloc?

Zero. Most of the time. When I don't know the exact requirement before hand.

> Reason is that I would expect the allocator to, normally, allocate blocks tight together and most reallocations to grow the size by more than the granularity of the memory allocator

Now look at it like this. Even if I wrote my own allocator with malloc(), free() and realloc(), I would have written it such a way that the first time I found a pointer getting realloc()-ed I would have added an extra 60% at its end assuming this pointer has a trend to grow. And for subsequent small increment calls use space at the end.

I don't see why C library's allocator writers wouldn't have thought the same. Even if on some platform it's not done for some reason, that can be replaced with your own implementation.

Add with it the fact that OS is also optimizing your memory usage, your allocs are virtual pages mapped to actual physical ram. And who knows how deep these caches go at processor level. You are basically caching over caching and so on.

This is similar to "reading a file in memory" because "accessing from disk is so slow". That's so much 90s thought. Your file will be cached in memory by the OS whenever you access it the first time and you have sufficient free RAM available for cache.

But one doesn't need to assume the above to be true. Simply measure and check for yourself. If your findings don't match your understandings, there are alternate explanations.

jstimpfle · on Oct 6, 2023

> Even if on some platform it's not done for some reason, that can be replaced with your own implementation.

Isn't this exactly what is done in the submitted post?

> But one doesn't need to assume the above to be true. Simply measure and check for yourself. If your findings don't match your understandings, there are alternate explanations.

The problem with this is that this costs extra work to assert, and to assert that it stays true over time.

I know it's the right thing to integrate tightly within a platform, especially when the application has to coexist with others (like in Desktop apps, unlike in Games). But when one doesn't exactly know the platform that the code is going to be running on, or when it needs to run on a multitude of platforms, this integration can be painful.

That's why I often prefer interfaces that give me a lot of control over what happens, or that specify exactly what the implementation does, including the performance characteristics.

lelanthran · on Oct 6, 2023

> And if you are not using arena -- realloc() is your best bet. And no, it won't move your memory around into larger areas every time you call it. Most of the time it will simply bump up the allocation end point. This happens more frequently than you think.

And, as far as I can tell on the most common implementations, `realloc` just does a pointer adjustment when you shrink[1] the array, so the next `realloc` to expand the array again simply does the pointer bump thing.

Over time[2], you effectively get a arena allocator from that single realloc()ed block just by expanding and shrinking the array.

[1] Just don't call `realloc (0, ...)`

[2] For my next project I'm going to make an arena-allocator interface that is just a thin wrapper around realloc.

PhilipRoman · on Oct 6, 2023

Would be interesting to check how often realloc actually reuses pointers (and for what allocation sizes this happens) in real world applications.

asicsp · on Oct 6, 2023

See also: An easy-to-implement, arena-friendly hash map https://nullprogram.com/blog/2023/09/30/

Discussion: https://news.ycombinator.com/item?id=37720976

fovc · on Oct 6, 2023

What’s the difference between this and other dynamic array implementations? I thought they were all data/length/capacity headers with exponential resizing.

I know when using realloc the old buffer is recycled unlike with this arena, but that seems like more of an arena vs. Malloc distinction, no?

In contrast the hash map discussion was more interesting to me because it’s a different data structure that’s explicitly arena friendly

fovc · on Oct 6, 2023

> On average they consume about twice the memory of a fixed array of the same size.

Not to split hairs, but that’s just the “unfreed” memory overhead.

If you have 2^n+1 elements, you’ve provisioned 2^(n+1) slots and leaked ~2^(n+1) on top, making the actual overhead there closer to 4x.

I think that means the average overhead is more like 3x (sometimes 4x, sometimes 2x).

yobertz · on Oct 6, 2023

I think I would make grow() like this to prevent the mistake of accidentally laying out the array fields in the wrong order.

  void grow(void **ptr, ptrdiff_t *cap, ptrdiff_t *len, arena *a) ...

  #define push(s, a) ... grow((void **)&s->data, &s->cap, &s->len)

jstimpfle · on Oct 6, 2023

I don't think you're allowed to access/modify a typed pointer through a void **.

jstimpfle · on Oct 6, 2023

The memcpy() from the "specific" slice to the "generic" slice in grow() is technically UB I think. Those slice structures are not the same types.

Furthermore, a void pointer can be implemented "differently" than a typed pointer. The size of pointers can (theoretically) be different for different types. Any typed pointer roundtrips safely to a void pointer and back, but the inverse doesn't necessarily hold.

Then again, not sure which implementations might make problems here. For example, keeping a typed version of a void pointer returned by malloc() and using it to free() later is common practice.

PhilipRoman · on Oct 6, 2023

Memcpy between structures is quite safe (as opposed to simply casting between pointers which is what usually triggers the type based aliasing rules)

jstimpfle · on Oct 6, 2023

Casting to a void pointer is perfectly fine though, so maybe that should be the preferred way to pass the pointer to the allocation. (I would also be interested to know how casting the void pointer returned from generic allocation routines to some typed pointer is justified).

PhilipRoman · on Oct 6, 2023

C (at least c11, probably 99 as well) has a concept called "effective type". Quoting 6.5, paragraph 6:

If a value is store into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value. (the same applies to memcpy and memmove) For all other accesses to an object having no declared type, the effective type of the object is simply the type of the lvalue used for the access.

Paragraph 7 then lists all valid ways that an object can be accessed. TL;DR the lvalue must be [qualified/signed/unsigned] effective type of the object OR char OR union including the correct type.

Something does feel missing though as while this explains memory obtained from routines like malloc which initially has no effective type, I cannot find any rule saying it is permissible to treat char arrays as structures (their effective type is an array of char, so they cannot be said to "have no declared type").

amalcon · on Oct 6, 2023

Memcpy is IIRC one of two ways to do type-punning (reinterpretation of bytes as a different type) without undefined behavior in C, and the only one in C++. The other one in C is using a union type (which results in undefined behavior in C++).

The behavior is implementation-defined, for obvious reasons, but that's not the same as undefined.

maccard · on Oct 6, 2023

I appreciate the simplicity of this, and it's a great way to learn how memory management works under the hood. It's also a fun series.

That said, everything in this article is why I prefer C++ to C. All of the scoping is automatic, you get actual type safety, you remove a whole class of bugs around passing in the wrong arena.