Stack cookies and RETGUARD
Stack cookies were developed in 1997, and published in StackGuard: Automatic Adaptive Detection and Prevention of Buffer-Overflow Attacks, by Crispan Cowan, Calton Pu, Dave Maier, Heather Hinton, Jonathan Walpole, Peat Bakke, Steve Beattie, Aaron Grier, Perry Wagle and Qian Zhang in 1998, as part of Immunix. Implemented properly, they kill linear stack-based buffer overflows.
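To make the idea concrete, here is a minimal sketch of a StackGuard-style canary check. It is a simulation, not compiler output: the frame is a byte array with an assumed layout (buffer, then canary, then return address), and `ToyFrame`, `prologue`, `linear_copy` and `epilogue_ok` are hypothetical names. A linear overflow has to go through the canary before reaching the return address, and the epilogue notices.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Toy frame with an assumed StackGuard-like layout:
// [ buffer (16 bytes) | canary (8 bytes) | return address (8 bytes) ]
// Simulated in a byte array so "overflowing" the buffer stays well-defined C++.
struct ToyFrame {
    static constexpr std::size_t kBufSize = 16;
    static constexpr std::size_t kCanaryOff = kBufSize;
    static constexpr std::size_t kRetOff = kCanaryOff + sizeof(std::uint64_t);
    unsigned char bytes[kRetOff + sizeof(std::uint64_t)];
};

// Prologue: place the canary between the buffer and the return address.
void prologue(ToyFrame &f, std::uint64_t canary, std::uint64_t ret) {
    std::memset(f.bytes, 0, sizeof f.bytes);
    std::memcpy(f.bytes + ToyFrame::kCanaryOff, &canary, sizeof canary);
    std::memcpy(f.bytes + ToyFrame::kRetOff, &ret, sizeof ret);
}

// Unchecked linear copy into the buffer, like a misused memcpy/strcpy.
void linear_copy(ToyFrame &f, const unsigned char *src, std::size_t n) {
    assert(n <= sizeof f.bytes);  // keeps the simulation itself in bounds
    std::memcpy(f.bytes, src, n);
}

// Epilogue: returning is only allowed if the canary is untouched.
bool epilogue_ok(const ToyFrame &f, std::uint64_t canary) {
    std::uint64_t stored;
    std::memcpy(&stored, f.bytes + ToyFrame::kCanaryOff, sizeof stored);
    return stored == canary;
}
```

Copying 16 bytes leaves the check intact, while a 32-byte copy of `'A'` reaches the return address only after clobbering the canary, so `epilogue_ok` returns false.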
One year later, Aaron Grier, one of the paper's authors, suggested encrypting/decrypting the return address.
In July 1999, Vendicator released Stack Shield, with a shadow stack called Global Ret Stack, and a Ret Range Check, validating the return address against the one stored in the shadow stack.
In 2000, Hiroaki Etoh from IBM Japan published ProPolice, which added cool things like reordering the local variables in the stack frame depending on their types. This was later (kind of) upstreamed into gcc, in 2006.
In 2002, Microsoft added support for stack cookies in Visual Studio .NET 2002, via the /GS flag.
In May 2003, OpenBSD 3.3 was released, with Propolice enabled by default. Six months later, in November 2003, it was also enabled in kernel-land, in OpenBSD 3.4.
In August 2004, Microsoft released Windows XP SP2, with stack cookies enabled system-wide.
In 2005, Richard Henderson from Red Hat implemented stack cookies in GCC, based on IBM’s stack smashing protection patch.
Stack cookies can be surprisingly hard to get right, as seen with:
- glibc’s CVE-2010-3192
- Microsoft Windows storing its SEH records on the stack, where overflows can reach them, leading to trivial bypasses
- Microsoft Windows XP having static cookies in kernel-land, used in Metasploit’s MS06-040 exploit
- glibc storing the cookie in the TLS, at %fs:0x28, which is mapped somewhere adjacent to the thread stack, thus making it overwritable via large-enough stack-based buffer overflows. This technique has been known for more than a decade in the CTF community, and was published (at least) in 2018. As of 2023, this still hasn’t been fixed. llvm’s libc also stores the cookie in the TLS, but at least there the TLS isn’t mapped near the stack/heap.
The Code-Pointer Integrity paper proposed, in 2014, an instrumentation that was merged into LLVM in 2015, namely -fsanitize=safe-stack in clang. The pass moves stack objects that are not proven (mainly via ScalarEvolution) to be free from stack smashing into a separate stack.
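A rough sketch of the split, under heavy simplification: the two stacks are modeled as two `std::vector`s, and which objects go where is decided by hand rather than by the pass's analysis. `SplitStacks` and its members are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of SafeStack: return addresses (and locals proven safe) stay on
// the "safe" stack, while objects that might be overflowed, such as char
// buffers, are moved to a separate "unsafe" stack. In reality both are
// distinct memory regions; here they are two independent vectors.
struct SplitStacks {
    std::vector<std::uint64_t> safe;    // return addresses, safe scalars
    std::vector<unsigned char> unsafe;  // buffers that may be overflowed

    // "Call": push the return address on the safe stack and reserve
    // buf_size bytes for the callee's buffer on the unsafe stack.
    std::size_t call(std::uint64_t ret_addr, std::size_t buf_size) {
        safe.push_back(ret_addr);
        std::size_t base = unsafe.size();
        unsafe.resize(base + buf_size);
        return base;  // offset of the callee's buffer in the unsafe stack
    }

    // Unchecked write: it can clobber anything on the unsafe stack,
    // but the safe stack is a separate object and stays intact.
    void overflow(std::size_t base, std::size_t n, unsigned char byte) {
        if (unsafe.size() < base + n) unsafe.resize(base + n);
        for (std::size_t i = 0; i < n; ++i) unsafe[base + i] = byte;
    }

    // "Return": release the callee's buffer and pop the safe return address.
    std::uint64_t ret(std::size_t base) {
        unsafe.resize(base);
        std::uint64_t r = safe.back();
        safe.pop_back();
        return r;
    }
};
```

However large the overflow of the unsafe buffer, the return address comes back unchanged, which is the property SafeStack aims for.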
In 2015, pipacs released RAP, providing, amongst other things, backward-edge protection by keeping the secret used to encrypt return addresses in a register, meaning that two leaks are needed. Moreover, the cookie changes regularly in kernel-land, making it even harder to obtain.
On the 30th of July 2017, a fun bug was fixed, in Theo de Raadt’s words:
A few optimizations later, a security requirement has been removed.
The issue was that compilers were trying to be clever, (rightly) assuming that a const static object with no initializer will always be zero, even when placed in a .openbsd.randomdata segment, resulting in useless stack cookies and setjmp/longjmp checks.
Amusingly, the issue was introduced on the 1st of September 2016, and went unnoticed for almost a year!
clang (and newer gcc at high-O) are unaware that objects placed in strange sections, such as __attribute__((section(".openbsd.randomdata"))), may be non-zero. In combination with “const” or “static” the compiler becomes even more sure nothing can influence the object and assumes the value will be 0. A few optimizations later, a security requirement has been removed.
In August 2017, Theo de Raadt announced that Todd Mortimer implemented return address protection, and called it RETGUARD:
The mechanism is like a userland ‘stackghost’ in the function prologue and epilogue. The preamble XOR’s the return address at top of stack with the stack pointer value itself. This perturbs by introducing bits from ASLR. The function epilogue undoes the transform immediately before the RET instruction.
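The quoted scheme fits in a few lines, with addresses as plain integers (the helper names are hypothetical). It also shows why a single leaked stack pointer is enough: the XOR key is the stack pointer itself, so forging a stored value for an arbitrary target is trivial.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the 2017 scheme quoted above: the prologue XORs the return
// address with the stack pointer, the epilogue undoes it right before RET.
std::uint64_t encode_ret(std::uint64_t ret, std::uint64_t sp) { return ret ^ sp; }
std::uint64_t decode_ret(std::uint64_t stored, std::uint64_t sp) { return stored ^ sp; }

// An attacker who leaked the stack pointer (e.g. a stack pointer lying
// around on the heap) can compute the value to write for any target.
std::uint64_t forge(std::uint64_t target, std::uint64_t leaked_sp) {
    return target ^ leaked_sp;
}
```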
Unfortunately, this means that if you’ve got a read primitive on the heap, since there are usually pointers to both the code and the stack there, you defeat both ASLR and stack cookies. Moreover, in kernel-land, if you’re able to leak stack and kernel addresses via side-channels, you get a stack-cookies bypass for free. It is also vulnerable to partial overwrites.
One year later, RETGUARD was improved, again by Todd Mortimer; the new version was added to -current in June 2018 and released in OpenBSD 6.5 in April 2019, while still being like Propolice, with some stuff on top of it. De Raadt gave a talk at the Calgary Unix Users Group on May 28, 2019, in Alberta, Canada, and slide 18 is a nice exposé on the history of OpenBSD’s stack cookies, introducing RETGUARD.
This is what it looks like in OpenBSD 6.5:
┌ (fcn) sym.siop_pci_attach
│ 0xffffffff811d35c0 4c8b1d29e2a6. mov r11, qword [obj.__retguard_3111]
│ 0xffffffff811d35c7 4c331c24 xor r11, qword [rsp]
│ 0xffffffff811d35cb 55 push rbp
│ 0xffffffff811d35cc 4889e5 mov rbp, rsp
│ 0xffffffff811d35cf 57 push rdi
│ 0xffffffff811d35d0 56 push rsi
│ 0xffffffff811d35d1 52 push rdx
│ 0xffffffff811d35d2 57 push rdi
│ 0xffffffff811d35d3 4153 push r11
│ 0xffffffff811d35d5 4156 push r14
│ 0xffffffff811d35d7 4989f6 mov r14, rsi
│ 0xffffffff811d35da 488dbeb00200. lea rdi, qword [rsi + 0x2b0] ; 688
│ 0xffffffff811d35e1 48c7c150dc0f. mov rcx, -0x7ef023b0
│ 0xffffffff811d35e8 e863917a00 call sym.siop_pci_attach_common
│ 0xffffffff811d35ed 85c0 test eax, eax
│ 0xffffffff811d35ef 740d je 0xffffffff811d35fe
│ 0xffffffff811d35f1 4c89f7 mov rdi, r14
│ 0xffffffff811d35f4 415e pop r14
│ 0xffffffff811d35f6 415b pop r11
│ 0xffffffff811d35f8 c9 leave
│ 0xffffffff811d35f9 e9a292f2ff jmp sym.siop_attach
│ 0xffffffff811d35fe 415e pop r14
│ 0xffffffff811d3600 415b pop r11
│ 0xffffffff811d3602 c9 leave
│ 0xffffffff811d3603 4c331c24 xor r11, qword [rsp]
│ 0xffffffff811d3607 4c3b1de2e1a6. cmp r11, qword [obj.__retguard_3111]
│ 0xffffffff811d360e 740f je 0xffffffff811d361f
│ 0xffffffff811d3610 cc int3
│ 0xffffffff811d3611 cc int3
│ 0xffffffff811d3612 cc int3
│ 0xffffffff811d3613 cc int3
│ 0xffffffff811d3614 cc int3
│ 0xffffffff811d3615 cc int3
│ 0xffffffff811d3616 cc int3
│ 0xffffffff811d3617 cc int3
│ 0xffffffff811d3618 cc int3
│ 0xffffffff811d3619 cc int3
│ 0xffffffff811d361a cc int3
│ 0xffffffff811d361b cc int3
│ 0xffffffff811d361c cc int3
│ 0xffffffff811d361d cc int3
│ 0xffffffff811d361e cc int3
└ 0xffffffff811d361f c3 ret
In the prologue of the function, static random data (the __retguard_ cookie) is moved into r11, which is then xored with the return address at [rsp]. In the epilogue, r11 is restored from the stack, xored again with [rsp], and compared to the random data. If they’re not equal, execution falls through into the int3 trapsled. If they are, the jump over the trapsled is taken, and the function returns.
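The same check can be modeled in C++, mirroring the instructions above. This is only a sketch: `RetguardFrame`, `rg_prologue` and `rg_epilogue` are hypothetical, the cookie is passed as a parameter instead of being read from a read-only `__retguard_*` symbol, and the stack is reduced to the two slots that matter.

```cpp
#include <cassert>
#include <cstdint>

// The two stack slots involved in the listing above.
struct RetguardFrame {
    std::uint64_t ret;   // return address at [rsp]
    std::uint64_t slot;  // where r11 gets spilled (push r11)
};

// Prologue: mov r11, [cookie]; xor r11, [rsp]; ... push r11
void rg_prologue(RetguardFrame &f, std::uint64_t cookie) {
    std::uint64_t r11 = cookie;
    r11 ^= f.ret;
    f.slot = r11;
}

// Epilogue: pop r11; xor r11, [rsp]; cmp r11, [cookie]; je <ret>, else int3.
// Returns true when the check passes, i.e. when RET would be reached.
bool rg_epilogue(const RetguardFrame &f, std::uint64_t cookie) {
    std::uint64_t r11 = f.slot;
    r11 ^= f.ret;
    return r11 == cookie;
}
```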
Using r11 instead of xoring the return address at [rsp] in place is likely done to avoid murdering the CPU’s return predictor’s performance.
Since r11 is spilled on the stack (except sometimes in leaf functions, since March 2019), arbitrary reads allow for trivial bypasses: an attacker with arbitrary reads at an arbitrary time can just leak the cookie and the spilled xored value to be able to return wherever they want. Or, since r11 is spilled on the stack, leaking the cookie value combined with a linear stack overflow works too.
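Reusing the same kind of toy model (again with hypothetical names), the linear-overflow variant looks like this: once the cookie is known, the payload only has to write a consistent pair, the spilled r11 value set to `cookie ^ target` and the return address set to `target`, and the epilogue check passes.

```cpp
#include <cassert>
#include <cstdint>

// The spilled r11 sits between the locals and the return address,
// so a linear overflow reaches it before reaching [rsp].
struct SpilledFrame {
    std::uint64_t slot;  // spilled r11 = cookie ^ return address
    std::uint64_t ret;   // return address
};

// The epilogue check from the listing: (r11 ^ [rsp]) == cookie.
bool epilogue_passes(const SpilledFrame &f, std::uint64_t cookie) {
    return (f.slot ^ f.ret) == cookie;
}

// With a leaked cookie, the overflow writes a pair that satisfies the check.
void forge_frame(SpilledFrame &f, std::uint64_t leaked_cookie, std::uint64_t target) {
    f.slot = leaked_cookie ^ target;
    f.ret = target;
}
```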
But even if r11 weren’t spilled, leaking the stack pointer’s value and part of the stack would be enough, as highlighted by Alexis Gacel, Ndeye Khady Ngom and Kai Lüke in IS561 Project Report ParisDakarTech: Backward-edge Protection: Improvements on SafeStack and RETGUARD. Amusingly, the paper suggests improvements to RETGUARD that make it closer to PaX’ RAP.
Support for powerpc and powerpc64 in RETGUARD was added in October 2020.
Clang has had ShadowCallStack since 2018, and gcc since February 2022 in gcc 12.0, on aarch64 only, x86 having been deemed broken by design. It’s basically a second stack stored somewhere in memory, only referenced by the register x18. Amusingly, the return address is also stored on the regular stack for compatibility with unwinders and speculation, but is otherwise unused.
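A toy model of the mechanism, assuming nothing beyond what is described above: the shadow stack is a separate container standing in for the x18-addressed region, and `ToyScs` is a hypothetical name. Smashing the regular stack's copy of the return address has no effect on where the function returns.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// On call, the return address goes both to the regular stack (kept for
// unwinders) and to the shadow stack; on return, only the shadow copy is used.
struct ToyScs {
    std::vector<std::uint64_t> regular;  // ordinary stack copy, attacker-reachable
    std::vector<std::uint64_t> shadow;   // stands in for the x18-addressed stack

    void call(std::uint64_t ret_addr) {
        regular.push_back(ret_addr);
        shadow.push_back(ret_addr);
    }

    // The regular copy is popped but ignored; the shadow copy decides.
    std::uint64_t ret() {
        regular.pop_back();
        std::uint64_t r = shadow.back();
        shadow.pop_back();
        return r;
    }
};
```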
While reasonably strong for C programs, in C++ programs there will often be interesting objects on the data stack that could be overwritten: for example, if an upper stack frame stores a std::string, one could partially overwrite the string’s data pointer and turn it into a read or write primitive. Anyway, OpenBSD doesn’t make use of this mitigation.
In 2023, mortimer@ added RETGUARD around syscalls, likely to “protect” the ret instructions at the end of the syscall stubs.
In August 2023, thanks to Dave Hansen from Intel, Linux gained support for Intel’s hardware-based shadow stack in userland. While it isn’t perfect, and bypasses will likely be found and plugged, it’s not vulnerable to arbitrary reads, and has much better granularity than RETGUARD.
In September 2023, De Raadt posted a long email on openbsd-tech@ about “Viable ROP-free roadmap for i386/armv8/riscv64/alpha/sparc64”:
On variable-sized instruction architectures, polymorphic RET and other control flow instructions can and will surface, but the available RET gadgets are seriously reduced and exploitation may not be possible.
Unfortunately, ROP is still a single arbitrary read away…
A few years ago some speculation reseachers I talked to pointed out that the stack-protector generated instructions to do the call into __stack_smash_handler(), and even many instructions inside the function itself, are fetched, decoded, issued, and their results are discarded. That’s a waste of cpu resources. It might be a slowdown because those execution slots are not used exclusively for straight-line speculation following the RET. Modern cpus also have complicated branch-target caches which may not be heuristically tuned to the stack protector approach. On the other hand the RETGUARD approach uses an illegal instruction (of some sort), which is a speculation barrier. That prevents the cpu from heading off into an alternative set of weeds. It will go decode more instructions along the post-RET execution path.
He attached a patch for clang and gcc 4 (whose last upstream release was in 2014), the latter being the default compiler for OpenBSD on some architectures.
His list of unfinished work includes “Expose the Linux pepole to this surprising change. They would probably appreciate the performance increase, but try the ROP-free changes.”, which is hilarious since:
- Every single Linux distribution has had -fstack-protector-strong enabled by default for years. OpenSUSE has had stack protection enabled since 2006.
- The performance gains from using a trap instead of a proper annotation telling the compiler that the branch is unlikely to be taken are likely negligible. A better move would be to switch to a compiler that isn’t 10 years old, to get a ton of modern optimizations, including removing stack cookies where they aren’t needed.
- On “Linux”, with glibc a stack cookie violation calls abort, which is likely optimized away by the compiler to an hlt or equivalent. On musl, it was a trap (hlt on x86_64) from the beginning, when stack cookie support was added in 2012.
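For illustration, here is a hand-written imitation of such a check using two GCC/Clang builtins: `__builtin_expect` marks the failure branch as unlikely (the "proper annotation" mentioned above), and `__builtin_trap` ends it in a trapping instruction. This is not what any compiler emits for -fstack-protector; `checked_return` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative canary check: the hot path falls straight through to the
// return, while the cold path traps instead of calling a reporting function.
inline std::uint64_t checked_return(std::uint64_t saved_canary,
                                    std::uint64_t expected_canary,
                                    std::uint64_t value) {
    if (__builtin_expect(saved_canary != expected_canary, 0)) {
        __builtin_trap();  // emits ud2 on x86_64 with GCC and Clang; never returns
    }
    return value;
}
```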
For people interested in micro-optimisations of security invariants in a post-Spectre/Meltdown world, Pawel Wieczorkiewicz from grsecurity published a nice article on the topic on the 8th of March 2022.
It’s interesting to note that RETGUARD’s canaries are per-function, not per-task: their value for a given function will always be the same across threads in userland, and during the whole runtime in kernel-land. This allows an attacker who leaked the cookie of a single function to reuse it again and again, without needing another leak.
A possible improvement for kernel-land would be to take inspiration from PaX’ RAP probabilistic backward-edge control-flow integrity: store the cookie in a register, change its value on every syscall and task switch, and in selected infinite loops like event handlers, and never spill it on the stack. As for user-land, pax-future.txt 2.d.2 provides some hints.
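A minimal sketch of the re-keying idea, as a generic keyed-XOR model rather than PaX's actual RAP code: the key is held in an object member (standing in for a reserved register that is never spilled), `rekey()` stands in for the refresh at syscall or task boundaries, and `std::mt19937_64` is only a placeholder for a real CSPRNG. `RekeyedGuard` and its methods are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

class RekeyedGuard {
public:
    explicit RekeyedGuard(std::uint64_t seed) : rng_(seed) { rekey(); }

    // Called e.g. on every syscall entry or task switch.
    void rekey() { key_ = rng_(); }

    std::uint64_t encrypt(std::uint64_t ret) const { return ret ^ key_; }
    std::uint64_t decrypt(std::uint64_t stored) const { return stored ^ key_; }

    // Models an attacker's leak; a real design never exposes the key.
    std::uint64_t leak_key_for_demo() const { return key_; }

private:
    std::mt19937_64 rng_;    // placeholder, NOT a cryptographic RNG
    std::uint64_t key_ = 0;  // stands in for the never-spilled register
};
```

A key leaked before `rekey()` no longer produces a working forgery afterwards, which is what makes a single leak much less valuable than with RETGUARD's fixed per-function cookies.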
It’s worth noting that since OpenBSD stores cookies in a dedicated segment, they’re not bypassable via TCB overwrite, and they’re not overwritable either, since the segment is read-only.
There are at most 4000 different cookies:
bool runOnFunction(Function &F) override {
  if (F.hasFnAttribute("ret-protector")) {
    // Create a symbol for the cookie
    Module *M = F.getParent();
    std::hash<std::string> hasher;
    std::string cookiename = "__retguard_" + std::to_string(hasher((M->getName() + F.getName()).str()) % 4000);
    // ...
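The naming scheme can be reproduced outside of LLVM to see what the modulo implies. This standalone sketch mimics the snippet above with `std::hash<std::string>`, whose results are implementation-defined, so the exact names differ from a real OpenBSD build; `retguard_cookie_name` and `distinct_cookies` are hypothetical helpers. By the pigeonhole principle, any binary with more than 4000 protected functions has functions sharing a cookie, and by the birthday paradox collisions show up much earlier than that.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <string>

// Mimics the pass: hash(module name + function name) % 4000.
std::string retguard_cookie_name(const std::string &module, const std::string &function) {
    std::hash<std::string> hasher;
    return "__retguard_" + std::to_string(hasher(module + function) % 4000);
}

// Count how many distinct cookies nfunctions functions end up sharing.
std::size_t distinct_cookies(const std::string &module, std::size_t nfunctions) {
    std::set<std::string> names;
    for (std::size_t i = 0; i < nfunctions; ++i)
        names.insert(retguard_cookie_name(module, "fn" + std::to_string(i)));
    return names.size();
}
```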
The whole int3 padding has an interesting justification:
The verification routine is constructed such that the binary space immediately before each ret instruction is padded with int3 instructions, which makes these return instructions difficult to use in ROP gadgets.
Unfortunately, as highlighted by pipacs in June 2018:
- it adds a branch mis-prediction for every function
- those ret instructions are used by the program, so there must be at least one valid path to them, making them usable in at least one gadget. Also, if an attacker can point rip to an arbitrary location, odds are that they already have an arbitrary read anyway.
- See the ROP gadgets removal section for why this isn’t a great idea.
Twiz (@lazytyped) seems to agree:
Is it just me, or OBSD RETGUARD is just an infoleak away from a reliable bypass? (someone knows more?)
And Halvar Flake replied:
I think most people agree, but the discussion around mitigations has long left the realm of rationality.
Also, on OpenBSD improving on StackGhost, Twiz said:
Reminds of Stackghost, but stackghost is effective because the xoring is done at a higher privilege level (I’ve worked on a similar defense)
On a fun side note, -fstack-protector-strong has been disabled for arm since 2014 in OpenBSD’s version of gcc, and llvm’s safe-stack on arm was completely useless until recently.
RETGUARD is only a small improvement over stack cookies: it adds cookie diversity and makes the cookies read-only. Otherwise, it is roughly subject to the same limitations.