-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add valgrind memcheck hooks #27
Conversation
Damnit it turns out there's a bug in Valgrind that hasn't been fixed for over a decade (!!!): https://bugs.kde.org/show_bug.cgi?id=233298 Some discussion here: https://stackoverflow.com/questions/30895018/custom-allocator-valgrind-shows-7-allocs-0-frees-no-leaks I might just fork valgrind and apply the suggested patch |
Codecov Report
@@ Coverage Diff @@
## main #27 +/- ##
==========================================
+ Coverage 69.09% 69.32% +0.22%
==========================================
Files 3 3
Lines 233 238 +5
Branches 56 56
==========================================
+ Hits 161 165 +4
- Misses 54 55 +1
Partials 18 18
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR adds Valgrind hooks to our library to allow Valgrind to track blocks allocated by our pool allocator. I include <valgrind/*.h>
header files, which defines macros that communicate with Valgrind if running under a Valgrind-managed context. This works by inserting magic nop instruction sequences in the machine code that no known compiler generates. Since these are nops, they do nothing when the code is run normally; however, the Valgrind runtime scans for them and dynamically translates them into client requests to talk to Valgrind services. We can opt out of this by compiling this library with -DNVALGRIND
.
Valgrind has a bunch of features, but the core features we're using it for is its memcheck tool, which checks for bad memory accesses and tracks memory leaks.
To track memory errors, Valgrind distinguishes between three memory states: defined, undefined, and inaccessible. Reads and writes on defined memory are ok; reads on undefined memory is considered an error, while writes on undefined memory is ok and promotes it to defined memory; neither reads nor writes on inaccessible memory are ok. These states can be manually adjusted using the VALGRIND_MAKE_MEM_*
macros.
To track memory leaks, Valgrind must also know about what constitutes an allocation. For these, I use the VALGRIND_{MALLOC,FREE}LIKE_BLOCK
macros. These tell Valgrind that a block of memory is being allocated or freed. At the end of execution, Valgrind's leak-check will tell me if there are any allocated blocks that haven't been freed. For instance, if I comment out an SSM_DROP
, they will still tell me that I'm leaking memory. I play some small tricks with the memory states, manually invoking the VALGRIND_MAKE_MEM_*
macros, to get around the fact that the allocator actually relies on the buffers being pre-defined.
I previously attempted using the VALGRIND_MEMPOOL_
macros, which are specialized for memory pools like ours. After fussing with them for a bit I gave up because there were some bugs which I documented here (https://sourceforge.net/p/valgrind/mailman/message/37608944). But even if those are fixed promptly, I don't expect the updated Valgrind version to be widely available for a little while. Instead, the VALGRIND_*LIKE_BLOCK
macros are less complicated, though less specialized toward memory pools. Perhaps they will give worse diagnostics, but they do the job so far.
@@ -49,15 +49,15 @@ void ssm_throw(enum ssm_error reason, const char *file, int line, | |||
} | |||
|
|||
#define MAX_PAGES 2048 | |||
static void *pages[MAX_PAGES]; | |||
static uint8_t *pages[SSM_MEM_PAGE_SIZE][MAX_PAGES]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mem pool pages are now allocated out of a static megapool now, so we don't have to worry about freeing them afterwards.
*/ | ||
void ssm_mem_init(void *(*alloc_page_handler)(void), | ||
void *(*alloc_mem_handler)(size_t), | ||
void (*free_mem_handler)(void *, size_t)); | ||
|
||
/** @brief Tears down the underlying allocator system. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this endpoint to give us the opportunity to deallocate pages in the future.
We might not be able to do this very efficiently, but that's ok. In my view, this exists primarily for testing purposes.
// Tell Valgrind that we have allocated m as a chunk of pool. | ||
// | ||
// The memory range [m..m+size] is now considered DEFINED. | ||
VALGRIND_MALLOCLIKE_BLOCK(m, size, 0, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is how we tell Valgrind that we have allocated something.
This appears to also work even if the underlying page is malloc'ed (at least leak-checking still seems to work). Valgrind seems to get confused over memory permissions, but we manage that manually (using VALGRIND_MAKE_MEM_
macros).
ssm_time_t stop_at = stop_at_s == 0 ? SSM_NEVER : stop_at_s * SSM_SECOND; | ||
printf("%s: simulating up to %lu seconds\n", prog, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this print statement here to spook printf
into "leaking" memory earlier; under the hood, it actually keeps around a 1024B buffer that it doesn't free until the very end, so we make sure it gets counted by the leak checkpoint take during ssm_mem_init()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you saying that printf will allocate a buffer and not free it until the program terminates? Since you are calling printf here, it will allocate a buffer, and then before it has a chance to free its buffer, you will check for memory leaks, which will include printf's buffer. Is this correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi John, LGTM! Also added a couple of questions.
ssm_time_t stop_at = stop_at_s == 0 ? SSM_NEVER : stop_at_s * SSM_SECOND; | ||
printf("%s: simulating up to %lu seconds\n", prog, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you saying that printf will allocate a buffer and not free it until the program terminates? Since you are calling printf here, it will allocate a buffer, and then before it has a chance to free its buffer, you will check for memory leaks, which will include printf's buffer. Is this correct?
@@ -133,8 +136,7 @@ int main(int argc, char *argv[]) { | |||
|
|||
ssm_program_exit(); | |||
|
|||
for (size_t p = 0; p < allocated_pages; p++) | |||
free(pages[p]); | |||
ssm_mem_destroy(free_page); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now ssm_mem_destroy doesn't free any memory, right? Is this okay because the pools of pages are statically allocated/don't need to be freed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ssm-runtime's allocator doesn't actually care how the pools are allocated (they could also be dynamically allocated).
ssm_mem_destroy
is supposed to handle freeing these pages, but I haven't implemented that just yet.
Exactly |
Addresses #18
Some documentation I followed:
#include
valgrind headersThis is still a work-in-progress. I need to validate that everything is working---at first glance there appear to be some errors (perhaps false positives?) reported from existing test cases. But will be pretty epic if we can get this working.