DISCLAIMER: I originally posted this article on Microblink.
For all of the losses it has inflicted, this pandemic has at least made us more conscious about our personal hygiene.
We’re spraying spaces, surfaces and our hands way more often, so why not sanitize our code while we’re at it? After all, software runs the world, and bugs that cause programs to malfunction can cause serious damage – much like their viral counterparts.
If you’re developing in C or C++, you know this all too well. It’s easy to allocate a piece of memory and forget to free it later, or to accidentally write past the end of a buffer. These issues are extremely hard to find without proper tools and often cause sporadic, sudden crashes.
Using sanitizers as you’re building and testing your program can help you catch a great deal of issues in your source code early on, including memory leaks, buffer overflows and undefined behavior.
Today, we’ll be taking a look at three types of Clang sanitizers, how they’re used and what bugs they can help us nip in the bud.
Let’s spray away!
Cleaning up your address space with AddressSanitizer (ASan)
AddressSanitizer (ASan for short) is used for detecting use-after-free, double-free, buffer (stack, heap and global buffer) overflows and underflows, along with other memory errors.
It consists of both a compiler instrumentation module and a run-time library that inserts red zones around each set of bytes allocated with the malloc function. It also poisons the freed bytes and keeps track of the call stack for each malloc/free pair.
This is what our code looks like without ASan:
And here’s what ASan does to it in order to detect address related bugs:
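Conceptually, every memory access gets a check in front of it that asks whether the address is poisoned. Here's a toy model of that idea, not ASan's actual implementation: the `ToyHeap` class and all of its members are made up for this sketch, it tracks one flag per byte (real ASan packs 8 application bytes into 1 shadow byte), and it throws an exception where ASan would print a report.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Toy model only (hypothetical names): one "poisoned" flag per byte and an
// exception instead of an ASan report.
class ToyHeap
{
public:
    explicit ToyHeap( std::size_t size ) : bytes_( size, 0 ), poisoned_( size, true ) {}

    // Stand-in for malloc: unpoisons the requested bytes. Everything around
    // them stays poisoned and acts as a red zone.
    void allocate( std::size_t offset, std::size_t count )
    {
        assert( offset + count <= bytes_.size() );
        for ( std::size_t i{ offset }; i < offset + count; ++i ) poisoned_[ i ] = false;
    }

    // Stand-in for free: poisons the bytes again.
    void release( std::size_t offset, std::size_t count )
    {
        assert( offset + count <= bytes_.size() );
        for ( std::size_t i{ offset }; i < offset + count; ++i ) poisoned_[ i ] = true;
    }

    // "Instrumented" accesses: check the shadow state first, then touch memory.
    void write( std::size_t index, char value )
    {
        check( index );
        bytes_[ index ] = value;
    }

    char read( std::size_t index ) const
    {
        check( index );
        return bytes_[ index ];
    }

private:
    void check( std::size_t index ) const
    {
        if ( index >= poisoned_.size() || poisoned_[ index ] )
            throw std::out_of_range( "access to poisoned memory" );
    }

    std::vector< char > bytes_;
    std::vector< bool > poisoned_;
};
```

Writing one byte past an allocated range lands in a red zone, and reading a released range hits poisoned bytes; both end up in `check`, which is the part the real instrumentation inlines before each load and store.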
Using ASan is as simple as adding -fsanitize=address as both a compiler and a linker flag. You can also set a large number of run-time options via the ASAN_OPTIONS environment variable (here is a full list of ASan flags you can use).
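In practice that boils down to one extra flag on the command line. The sketch below shows a typical invocation in its comments (the file name `example.cpp` is just a placeholder), together with a small, optional helper for checking at compile time whether the sanitizer is switched on. `builtWithASan` is a name made up for this example; `__has_feature( address_sanitizer )` is the check Clang offers, and `__SANITIZE_ADDRESS__` is the macro GCC defines.

```cpp
// Build and run (placeholder file name):
//   clang++ -g -fno-omit-frame-pointer -fsanitize=address example.cpp -o example
//   ASAN_OPTIONS=detect_leaks=1 ./example

#ifndef __has_feature
#define __has_feature( x ) 0 // for compilers that don't provide __has_feature
#endif

// Hypothetical helper: true if this translation unit was compiled with
// -fsanitize=address.
constexpr bool builtWithASan()
{
#if __has_feature( address_sanitizer ) || defined( __SANITIZE_ADDRESS__ )
    return true;
#else
    return false;
#endif
}
```

A helper like this can be handy in tests, for example to relax a timing assertion when the instrumented build is running.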
To truly grasp how useful this tool is, here’s a little test for you. Try finding a bug in the code snippet below:
char const * src{ "Hello world!" };
auto const dst{ std::make_unique< char[] >( std::strlen( src ) ) };
std::strcpy( dst.get(), src );
std::puts( dst.get() );
It’s hard, isn’t it? Now, imagine this bug was part of a much larger codebase. It’d take us ages to debug it by hand, whereas with ASan turned on, we can identify the hidden heap buffer overflow in seconds (see the demo):
==1==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60200000001c at pc 0x00000044b754 bp 0x7ffc9ea586d0 sp 0x7ffc9ea57e80
WRITE of size 13 at 0x60200000001c thread T0
#0 0x44b753 (/app/output.s+0x44b753)
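The culprit is that std::strlen doesn't count the terminating null character, so the buffer is one byte shorter than the 13 bytes std::strcpy writes (hence the WRITE of size 13 in the report). Here's a minimal sketch of one possible fix, wrapped in a hypothetical helper called `duplicate` that isn't part of the original snippet:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical helper: copies a C string into a freshly allocated buffer
// that also has room for the terminating '\0'.
std::unique_ptr< char[] > duplicate( char const * src )
{
    assert( src != nullptr );
    std::size_t const size{ std::strlen( src ) + 1 }; // +1 for the terminator
    auto dst{ std::make_unique< char[] >( size ) };
    std::memcpy( dst.get(), src, size ); // copies the terminator as well
    return dst;
}
```

With this in place, `std::puts( duplicate( src ).get() )` prints the string without touching memory outside the allocation.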
ASan will also warn you if your program continues to use a pointer after it’s been freed (a common security vulnerability called a use-after-free error). Here’s an example program containing the bug:
constexpr std::size_t bodyOffset{ 96u };
char * message{ new char[ 1024 ] };
// fill in the message
char * bodyBegin{ message + bodyOffset };
delete [] message;
// many lines of code ...
std::puts( bodyBegin ); // heap-use-after-free bug
And here’s a report you get from ASan after testing the code:
==1==ERROR: AddressSanitizer: heap-use-after-free on address 0x6190000000e0 at pc 0x000000470a69 bp 0x7ffe047e6b90 sp 0x7ffe047e6340
READ of size 929 at 0x6190000000e0 thread T0
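One way to sidestep this kind of bug is to stop holding raw pointers into a buffer that might be freed first, and to copy the part we need into an object that owns its memory. Here's a minimal sketch under that assumption; `extractBody` and its explicit `length` parameter are made up for this example:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

constexpr std::size_t bodyOffset{ 96u };

// Hypothetical helper: returns an owning copy of the message body, so the
// result stays valid after the original buffer has been released.
std::string extractBody( char const * message, std::size_t length )
{
    assert( message != nullptr && length >= bodyOffset );
    return std::string( message + bodyOffset, length - bodyOffset );
}
```

If the body is extracted before `delete [] message`, a later `std::puts( body.c_str() )` reads from memory the string owns rather than from the freed buffer.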
Now, you might be asking yourself: how much does this cost? Well, nothing comes free in today’s world, and unfortunately, the same is true for ASan.
Still, if you consider the amount of time this tool could save you, it’s not that expensive at all. The average overhead is ~2x both performance-wise and on memory usage.
Speaking of performance, ASan’s main competitor, Valgrind, will slow your program down by 10 - 100x. That’s a solid improvement in our book.
Now that we have cleaned up our address space, let’s do some more sanitization of our memory.
Detecting uninitialized memory reads with MemorySanitizer (MSan)
You might think that AddressSanitizer covers all memory-related bugs, but that’s not the case. It doesn’t handle uninitialized memory reads, which is where another sanitizer called MemorySanitizer (MSan for short) comes in.
MSan will let you copy uninitialized memory and perform simple logical and arithmetical operations on it, but when you try to use it in a decision-making statement, you’ll get a red flag. In order to understand this better, take a look at the following code snippet (demo):
int * src{ new int[ 5 ] };
int dst[ 5 ];
std::memcpy( dst, src, sizeof( dst ) ); // copying uninitialized memory, OK
if ( src[ 0 ] ) // MSan warning: use-of-uninitialized-value
{
    // more code ...
}
As you can see, MSan will warn us of any uninitialized value in use. That memory leak you might’ve noticed won’t get flagged up because that’s a part of what ASan deals with — it’s important that we don’t get confused by this.
To get to the origin of this value, use the -fsanitize-memory-track-origins flag along with -fsanitize=memory.
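Once the origin is known, the fix is usually to make sure the memory is initialized before it's read. Here's a minimal sketch, assuming that starting from zeros is acceptable; `makeZeroed` is a hypothetical helper, and the build line in the comment uses a placeholder file name:

```cpp
// Build (placeholder file name):
//   clang++ -g -fsanitize=memory -fsanitize-memory-track-origins example.cpp -o example
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical helper: std::make_unique< int[] >( count ) value-initializes
// the elements, so every int starts out as 0 and reading it is well-defined.
std::unique_ptr< int[] > makeZeroed( std::size_t count )
{
    assert( count > 0 );
    return std::make_unique< int[] >( count );
}
```

Using a smart pointer here has a side benefit: the array is released automatically, so the leak from the earlier snippet goes away too.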
Now, let’s talk a bit about how MSan actually works. It implements a bit-to-bit shadow mapping, as shown in the figure below, where 1 means a “poisoned”, or uninitialized, bit. This allows for very efficient computation of the shadow memory address. Given the application memory address ProductAddr, the computed ShadowAddr is ProductAddr & ShadowMask, where ShadowMask is a platform-specific constant.
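The mapping really is a single bitwise operation. In the small sketch below, the mask value is an assumption picked purely for illustration (the real constant depends on the platform and the MSan version), and `shadowAddress` is a made-up name:

```cpp
#include <cstdint>

// Assumed mask, for illustration only: clears bit 46 of a 64-bit address.
constexpr std::uint64_t shadowMask{ ~( std::uint64_t{ 1 } << 46 ) };

// ShadowAddr = ProductAddr & ShadowMask
constexpr std::uint64_t shadowAddress( std::uint64_t productAddr )
{
    return productAddr & shadowMask;
}
```

Under that assumed mask, the application address 0x7fff00001234 maps to the shadow address 0x3fff00001234. No lookup table is involved, which is what keeps the computation cheap.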
Whenever access to one of the poisoned bits has any side effect (e.g. in branching), a warning will be raised. This additional bit introduces a 2.5x CPU and 2x memory overhead. The overhead will be a bit higher if memory origins are tracked too, 5x on CPU and 3x on memory, to be specific.
Catching undefined behavior with UndefinedBehaviorSanitizer (UBSan)
Last but not least, UndefinedBehaviorSanitizer (UBSan for short).
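UBSan will catch signed integer overflow, use of null pointers, division by zero and other undefined behavior as you’re executing your program. Apart from the -fsanitize=undefined compiler flag, which enables most of the checks at once, there are many additional flags that can be helpful in finding more specific bugs. Note that unsigned integer overflow is well-defined in C++, so -fsanitize=unsigned-integer-overflow isn’t part of -fsanitize=undefined and has to be enabled separately. The individual flags include: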
-fsanitize=bounds
-fsanitize=vptr
-fsanitize=enum
-fsanitize=signed-integer-overflow
-fsanitize=null
-fsanitize=unsigned-integer-overflow
-fsanitize=return
-fsanitize=integer-divide-by-zero
-fsanitize=unreachable
-fsanitize=alignment
As you can see, UBSan deals with simple bugs like the one shown in the following example quite well (demo):
#include <limits>

int main()
{
    int m = std::numeric_limits< int >::max();
    return m + 1;
}
And here’s the report we get:
runtime error: signed integer overflow: 2147483647 + 1 cannot be represented in type 'int'
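To avoid the overflow rather than just detect it, the addition can be checked before it's performed. Here's a minimal sketch using only the standard library; `wouldOverflow` and `checkedAdd` are hypothetical helpers, not something UBSan provides:

```cpp
#include <cassert>
#include <limits>

// Hypothetical helper: true if a + b can't be represented in an int.
// The comparisons are arranged so that they never overflow themselves.
constexpr bool wouldOverflow( int a, int b )
{
    return ( b > 0 && a > std::numeric_limits< int >::max() - b )
        || ( b < 0 && a < std::numeric_limits< int >::min() - b );
}

// Hypothetical helper: adds two ints, asserting that the sum is representable.
inline int checkedAdd( int a, int b )
{
    assert( !wouldOverflow( a, b ) );
    return a + b;
}
```

In the example above, `wouldOverflow( m, 1 )` is true, so the caller can report an error instead of executing the undefined `m + 1`.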
UBSan will demonstrate its power best in day-to-day development as well as in large codebases. Regarding performance, there’s roughly a 1.25x CPU overhead and no impact on memory usage at all (UBSan doesn’t affect address space layout).
Downsides to consider
Now that we’ve seen how sanitizers can help us build better code, we still need to mention a couple of downsides of using them.
First, unlike Valgrind, which does all of its checks without recompiling the source code, sanitizers require your code to be recompiled. This might take some time, especially if you work on a large codebase.
The second (and arguably more important) downside is that ASan and MSan can’t be enabled in the same build. This means that you’ll need to build and run your tests once per sanitizer, which, again, can take quite some time. Maybe this is the reason why sanitizers still don’t get much love from the C++ community.
Despite their drawbacks, we strongly recommend using sanitizers. They will catch issues that may look safe to competing tools and they won’t break your workflow with hefty slowdowns.
It makes perfect sense then to integrate them with your ‘fast’ development processes like continuous integration or pull request pipelines.
Let me know what you think about sanitizers! How often do you use them? Do you find them helpful?
Please put your answers in the comment section below… and see you in the next blog post!