Stack cookies and RETGUARD

Stack cookies were developed in 1997, and published in 1998, as part of Immunix, in StackGuard: Automatic Adaptive Detection and Prevention of Buffer-Overflow Attacks, by Crispin Cowan, Calton Pu, Dave Maier, Heather Hinton, Jonathan Walpole, Peat Bakke, Steve Beattie, Aaron Grier, Perry Wagle and Qian Zhang. Implemented properly, they kill linear stack-based buffer overflows.

One year later, Aaron Grier, one of the authors of that paper, suggested encrypting/decrypting the return address.

In July 1999, Vendicator released Stack Shield, with a shadow stack called Global Ret Stack, and a Ret Range Check, validating the return address against the one stored in the shadow stack.

In 2000, Hiroaki Etoh from IBM Japan published ProPolice, which added cool things like reordering the local variables in the stack frame depending on their types. This was later kind of upstreamed in gcc, in 2006.

In 2002, Microsoft added support for stack cookies, in Visual Studio .NET 2002, via the /GS flag.

In May 2003, OpenBSD 3.3 was released, with ProPolice enabled by default. Six months later, in November 2003, it was also enabled in kernel-land, in OpenBSD 3.4.

In August 2004, Microsoft released Windows XP SP2, with stack cookies enabled system-wide.

In 2005, Richard Henderson from Red Hat implemented stack cookies in GCC, based on IBM’s stack smashing protection patch.

Stack cookies can be surprisingly hard to get right, as seen in:

glibc’s CVE-2010-3192
Microsoft’s Windows having its SEH under the stack, leading to trivial bypasses
Microsoft’s Windows XP having static cookies in kernel-land, used in metasploit’s MS06-40 exploit
glibc storing the cookie in the TLS, at %fs:0x28, which is mapped somewhere adjacent to the thread stack, thus making it overwritable via large-enough stack-based buffer overflows. This technique has been known for more than a decade in the CTF community, and was published (at least) in 2018. As of 2023, this still hasn’t been fixed. llvm’s libc also stores the cookie in the TLS, but at least the TLS isn’t mapped near the stack/heap.

Code-Pointer Integrity proposed in 2014 an instrumentation which was merged into LLVM in 2015, namely -fsanitize=safe-stack in clang. The pass moves stack objects which are not guaranteed (mainly via ScalarEvolution) to be stack-smashing free into a separate stack.

In 2015, pipacs released RAP, providing amongst other things backward-edge protection, by keeping the secret used to encrypt the return address in a register, meaning that two leaks are needed. Moreover, the cookie changes regularly in kernel-land, making it even harder to leak.

On the 30th of July 2017, a fun bug was fixed. In Theo de Raadt’s words:

A few optimizations later, a security requirement has been removed.

The issue was that compilers are trying to be clever, (rightly) assuming that a const static object will always be zero, even when placed in a .openbsd.randomdata segment, resulting in useless stack cookies and setjmp/longjmp checks.

Amusingly, the issue was introduced on the 1st of September 2016, and went unnoticed for almost one year!

clang (and newer gcc at high -O) are unaware that objects placed in strange sections, such as __attribute__((section(".openbsd.randomdata"))), may be non-zero. In combination with “const” or “static” the compiler becomes even more sure nothing can influence the object and assumes the value will be 0. A few optimizations later, a security requirement has been removed.

In August 2017, Theo de Raadt announced that Todd Mortimer implemented return address protection, and called it RETGUARD:

The mechanism is like a userland ‘stackghost’ in the function prologue and epilogue. The preamble XOR’s the return address at top of stack with the stack pointer value itself. This perturbs by introducing bits from ASLR. The function epilogue undoes the transform immediately before the RET instruction.

Unfortunately, this means that if you’ve got a read primitive on the heap, since there are usually pointers to the code and to the stack in there, you defeat both ASLR and stack cookies. Moreover, in kernel-land, if you’re able to leak stack and kernel addresses via side-channels, you get a stack-cookie bypass for free. The scheme is also vulnerable to partial overwrites.
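To make the weakness concrete, here is a minimal sketch of the 2017 transform as described in the quote above, modelled on plain integers rather than real registers. The function names `xor_with_sp` and `forge_slot` and any addresses used with them are hypothetical, chosen only for illustration. It shows that XOR is its own inverse, and that an attacker who knows the stack pointer can pre-mask any target so that the epilogue happily decodes it.

```cpp
#include <cassert>
#include <cstdint>

// Prologue and epilogue of the 2017 scheme, as described in the text:
// XOR the saved return address with the stack pointer value.
// Applying the function twice with the same sp gives back the input.
constexpr std::uint64_t xor_with_sp(std::uint64_t value, std::uint64_t sp) {
    return value ^ sp;
}

// Attacker side (illustrative assumption: sp has been leaked):
// compute the value to write into the return slot so that the
// epilogue's XOR produces exactly `target`.
constexpr std::uint64_t forge_slot(std::uint64_t target, std::uint64_t leaked_sp) {
    return target ^ leaked_sp;
}
```

The only secret in this scheme is the stack pointer itself, so a single stack address leak is enough to forge a return slot; no separate cookie has to be recovered.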

One year later, RETGUARD was improved, again by Todd Mortimer: the new version was added to -current in June 2018 and released in OpenBSD 6.5 in April 2019, while still being like ProPolice, with some stuff on top of it. De Raadt gave a talk at the Calgary Unix Users Group (Alberta, Canada) on May 28, 2019, and slide 18 is a nice exposé of the history of OpenBSD’s stack cookies, introducing RETGUARD.

This is what it looks like in OpenBSD 6.5:

┌ (fcn) sym.siop_pci_attach
│  0xffffffff811d35c0      4c8b1d29e2a6.  mov r11, qword [obj.__retguard_3111]
│  0xffffffff811d35c7      4c331c24       xor r11, qword [rsp]
│  0xffffffff811d35cb      55             push rbp
│  0xffffffff811d35cc      4889e5         mov rbp, rsp
│  0xffffffff811d35cf      57             push rdi
│  0xffffffff811d35d0      56             push rsi
│  0xffffffff811d35d1      52             push rdx
│  0xffffffff811d35d2      57             push rdi
│  0xffffffff811d35d3      4153           push r11
│  0xffffffff811d35d5      4156           push r14
│  0xffffffff811d35d7      4989f6         mov r14, rsi
│  0xffffffff811d35da      488dbeb00200.  lea rdi, qword [rsi + 0x2b0] ; 688
│  0xffffffff811d35e1      48c7c150dc0f.  mov rcx, -0x7ef023b0
│  0xffffffff811d35e8      e863917a00     call sym.siop_pci_attach_common
│  0xffffffff811d35ed      85c0           test eax, eax
│  0xffffffff811d35ef      740d           je 0xffffffff811d35fe
│  0xffffffff811d35f1      4c89f7         mov rdi, r14
│  0xffffffff811d35f4      415e           pop r14
│  0xffffffff811d35f6      415b           pop r11
│  0xffffffff811d35f8      c9             leave
│  0xffffffff811d35f9      e9a292f2ff     jmp sym.siop_attach
│  0xffffffff811d35fe      415e           pop r14
│  0xffffffff811d3600      415b           pop r11
│  0xffffffff811d3602      c9             leave
│  0xffffffff811d3603      4c331c24       xor r11, qword [rsp]
│  0xffffffff811d3607      4c3b1de2e1a6.  cmp r11, qword [obj.__retguard_3111]
│  0xffffffff811d360e      740f           je 0xffffffff811d361f
│  0xffffffff811d3610      cc             int3
│  0xffffffff811d3611      cc             int3
│  0xffffffff811d3612      cc             int3
│  0xffffffff811d3613      cc             int3
│  0xffffffff811d3614      cc             int3
│  0xffffffff811d3615      cc             int3
│  0xffffffff811d3616      cc             int3
│  0xffffffff811d3617      cc             int3
│  0xffffffff811d3618      cc             int3
│  0xffffffff811d3619      cc             int3
│  0xffffffff811d361a      cc             int3
│  0xffffffff811d361b      cc             int3
│  0xffffffff811d361c      cc             int3
│  0xffffffff811d361d      cc             int3
│  0xffffffff811d361e      cc             int3
└  0xffffffff811d361f      c3             ret

In the prologue of the function, static random data is moved into r11, which is then xored with the return address at [rsp]. In the epilogue, r11 is xored with [rsp] again, and compared to the random data. If they’re not equal, the execution will land on an int3. If they are, the jump over the trap sled is taken, and the function will return. Keeping the result in r11 instead of xoring the return address in place is likely done to avoid murdering the performance of the CPU’s return predictor.
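The check performed by the listing above can be modelled in a few lines. This is only a sketch on plain integers, assuming the behaviour read off the disassembly; the `Frame` type and the function names are hypothetical and do not come from OpenBSD.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of one protected stack frame.
struct Frame {
    std::uint64_t return_address;  // value at [rsp] on function entry
    std::uint64_t saved_r11;       // r11 as pushed by the prologue
};

// Prologue: mov r11, [__retguard_N]; xor r11, [rsp]; ...; push r11
inline void prologue(Frame& f, std::uint64_t cookie) {
    f.saved_r11 = cookie ^ f.return_address;
}

// Epilogue: pop r11; xor r11, [rsp]; cmp r11, [__retguard_N]
// Returns true when the `je` to `ret` would be taken,
// false when execution would fall into the int3 padding.
inline bool epilogue_ok(const Frame& f, std::uint64_t cookie) {
    return (f.saved_r11 ^ f.return_address) == cookie;
}
```

Changing only `return_address` after the prologue makes `epilogue_ok` return false, which is the linear-overflow case the mitigation is designed to catch.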

Since r11 is spilled on the stack (except sometimes in leaf functions, since March 2019), arbitrary reads allow for trivial bypasses: an attacker with arbitrary reads at arbitrary times can just leak the cookie and the xored value spilled on the stack to be able to return wherever they want. Or, since r11 is spilled on the stack, leaking the cookie value combined with a linear stack overflow works too. But even if r11 wasn’t spilled, leaking the stack pointer’s value and part of the stack would be enough, as highlighted by Alexis Gacel, Ndeye Khady Ngom and Kai Lüke in IS561 Project Report ParisDakarTech: Backward-edge Protection: Improvements on SafeStack and RETGUARD. Amusingly, the paper suggests improvements to RETGUARD that make it closer to PaX’ RAP.
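The leak-plus-linear-overflow case can be sketched as follows. This is a toy model on integers, under the assumption that the attacker has read the spilled r11 and already knows the legitimate return address (for instance from a code pointer leak); all function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Value the prologue spills on the stack: cookie XOR return address.
constexpr std::uint64_t spilled(std::uint64_t cookie, std::uint64_t ret) {
    return cookie ^ ret;
}

// Attacker side: one read of the spilled value plus knowledge of the
// legitimate return address gives the per-function cookie back.
constexpr std::uint64_t recover_cookie(std::uint64_t leaked_spill,
                                       std::uint64_t known_ret) {
    return leaked_spill ^ known_ret;
}

// Epilogue comparison, identical for honest and forged frames.
constexpr bool check_passes(std::uint64_t spill, std::uint64_t ret,
                            std::uint64_t cookie) {
    return (spill ^ ret) == cookie;
}
```

With the recovered cookie, a linear overflow that rewrites both the spilled value (as `spilled(cookie, target)`) and the return address (as `target`) satisfies the epilogue comparison for any `target`.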

Support for powerpc and powerpc64 in RETGUARD was added in October 2020.

Clang has ShadowCallStack since 2018, and gcc since February 2022 in gcc 12.0, on aarch64 only, x86 having been deemed broken by design. It’s basically a second stack stored somewhere in memory, only referenced by the register x18. Amusingly, the return address is also stored on the regular stack for compatibility with unwinders and speculation, but is otherwise unused. While reasonably strong for C programs, in C++ programs there will often be interesting objects on the data stack that could be overwritten: for example if an upper stack frame stores a std::string, one could partially overwrite the string’s data pointer and turn it into a read or write primitive. Anyway, OpenBSD doesn’t make use of this mitigation.
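As a rough illustration of the idea, and not of the actual compiler instrumentation, here is a toy model in which every call records the return address on two stacks and every return trusts only the shadow copy. The class `ShadowCallStackModel` and its methods are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model: the data stack is attacker-writable, the shadow stack is
// assumed to live at an address the attacker cannot find or write.
class ShadowCallStackModel {
public:
    // Call: the return address goes on both stacks.
    void call(std::uint64_t return_address) {
        data_stack_.push_back(return_address);
        shadow_stack_.push_back(return_address);
    }

    // Simulated stack smash of the innermost frame's return address.
    // Precondition: at least one call is active.
    void smash_top(std::uint64_t value) { data_stack_.back() = value; }

    // Return: the copy on the data stack is discarded unread, the
    // shadow copy decides where execution continues.
    // Precondition: at least one call is active.
    std::uint64_t ret() {
        const std::uint64_t target = shadow_stack_.back();
        shadow_stack_.pop_back();
        data_stack_.pop_back();
        return target;
    }

private:
    std::vector<std::uint64_t> data_stack_;
    std::vector<std::uint64_t> shadow_stack_;
};
```

In this model a smashed return address is simply ignored, which is why the remaining attack surface is the other data living on the regular stack, as noted above.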

In 2023, mortimer@ added RETGUARD around syscalls, likely to “protect” the ret instructions at the end of the syscall stubs.

In August 2023, thanks to Dave Hansen from Intel, Linux gained support for Intel’s hardware-based shadow stack in userland. While it isn’t perfect, and bypasses will likely be found and plugged, it’s not vulnerable to arbitrary reads, and has a way better granularity than RETGUARD.

In September 2023, De Raadt posted a long email on openbsd-tech@ about “Viable ROP-free roadmap for i386/armv8/riscv64/alpha/sparc64”:

On variable-sized instruction architectures, polymorphic RET and other control flow instructions can and will surface, but the available RET gadgets are seriously reduced and exploitation may not be possible.

Unfortunately, ROP is still a single arbitrary read away…

A few years ago some speculation reseachers I talked to pointed out that the stack-protector generated instructions to do the call into __stack_smash_handler(), and even many instructions inside the function itself, are fetched, decoded, issued, and their results are discarded. That’s a waste of cpu resources. It might be a slowdown because those execution slots are not used exclusively for straight-line speculation following the RET. Modern cpus also have complicated branch-target caches which may not be heuristically tuned to the stack protector approach.

On the other hand the RETGUARD approach uses an illegal instruction (of some sort), which is a speculation barrier. That prevents the cpu from heading off into an alternative set of weeds. It will go decode more instructions along the post-RET execution path.

He attached a patch for clang and gcc 4 (last released upstream in 2014), the latter being the default compiler for OpenBSD on some architectures.

His list of unfinished work includes “Expose the Linux pepole to this surprising change. They would probably appreciate the performance increase, but try the ROP-free changes.”, which is hilarious since:

Every single Linux distribution has had -fstack-protector-strong enabled by default for years. OpenSUSE has had it since 2006.
The performance gains from using a trap instead of a proper annotation telling the compiler that the branch is unlikely to be taken are likely negligible. A better move would be to switch to a compiler that isn’t 10 years old, to get a ton of modern optimizations, including removing stack cookies where they aren’t needed.
On “Linux”, with glibc, a stack cookie violation calls abort, which is likely optimized by the compiler into an hlt or equivalent. On musl, it was a trap (hlt on x86_64) from the beginning, when stack cookie support was added in 2012.

For people interested in micro-optimisations of security invariants in a post-Spectre/Meltdown world, Pawel Wieczorkiewicz from grsecurity published a nice article on the topic on the 8th of March 2022.

It’s interesting to note that RETGUARD’s canaries are per-function, and not per-task: their values for a given function will always be the same between threads in userland, or during the whole runtime in kernel-land. This allows an attacker able to leak the cookie of a single function to call it again and again, without needing another leak.

A possible improvement for kernel-land would be to take inspiration from PaX’ RAP probabilistic back-edge control flow integrity: store the cookie in a register, change its value between syscalls, between tasks, and in selected infinite loops like event handlers, and don’t spill it on the stack. As for user-land, pax-future.txt 2.d.2 provides some hints.

It’s worth noting that since OpenBSD is storing cookies in a dedicated segment, they’re not bypassable via TCB overwrite, and they’re not overwriteable either, since the segment is read-only.

There are at most 4000 different cookies:

bool runOnFunction(Function &F) override {
	if (F.hasFnAttribute("ret-protector")) {
		// Create a symbol for the cookie
		Module *M = F.getParent();
		std::hash<std::string> hasher;
		std::string cookiename = "__retguard_" + std::to_string(hasher((M->getName() + F.getName()).str()) % 4000);
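The naming logic in this excerpt can be reproduced outside of LLVM. The sketch below is standalone and uses `std::string` arguments in place of the LLVM `Module` and `Function` objects; the helper names `cookie_index` and `cookie_symbol` and the constant `kCookiePoolSize` are hypothetical. The exact index for a given name depends on the standard library's `std::hash` implementation, so no particular output is claimed.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Size of the cookie pool, taken from the `% 4000` in the excerpt.
constexpr std::size_t kCookiePoolSize = 4000;

// Hash of "module name + function name", reduced to the pool size.
inline std::size_t cookie_index(const std::string& module_name,
                                const std::string& function_name) {
    return std::hash<std::string>{}(module_name + function_name) % kCookiePoolSize;
}

// Symbol name of the cookie that the function would reference.
inline std::string cookie_symbol(const std::string& module_name,
                                 const std::string& function_name) {
    return "__retguard_" + std::to_string(cookie_index(module_name, function_name));
}
```

Because the index is reduced modulo 4000, any program with more than 4000 protected functions necessarily has functions sharing a cookie, and the `obj.__retguard_3111` symbol in the disassembly above is one such pool entry.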

The whole int3 padding has an interesting justification:

The verification routine is constructed such that the binary space immediately before each ret instruction is padded with int3 instructions, which makes these return instructions difficult to use in ROP gadgets.

Unfortunately, as highlighted by pipacs in June 2018:

it adds a branch mis-prediction for every function
those ret instructions are used by the program, so there must be at least one valid path to them, making them usable in at least one gadget. Also, if an attacker can point rip to an arbitrary location, odds are that they already have an arbitrary read anyway.
See the ROP gadgets removal section for why this isn’t a great idea.

Twiz (@lazytyped) seems to agree:

Is it just me, or OBSD RETGUARD is just an infoleak away from a reliable bypass? (someone knows more?)

And Halvar Flake replied:

I think most people agree, but the discussion around mitigations has long left the realm of rationality.

Also, on OpenBSD’s improving stackghost, Twiz said:

Reminds of Stackghost, but stackghost is effective because the xoring is done at a higher privilege level (I’ve worked on a similar defense)

On a fun side note, -fstack-protector-strong has been disabled for arm since 2014 in OpenBSD’s version of gcc, and llvm’s safe-stack on arm was completely useless until recently.

RETGUARD is only a small improvement over stack cookies: it adds cookie diversity, and makes the cookies read-only. Otherwise, it is roughly subject to the same limitations.