pinsyscall

In February 2023, De Raadt announced pinsyscall:

Well, you can’t do #3 as easily anymore. I have introduced pinsyscall(2), which lets ld.so [dynamic programs] or crt0 [static programs] tell the kernel where the SPECIFIC execve entry point is, and any other entry point is invalid and kills the program.

Now the attacker must precisely know where that specific system call nstruction [sic] is.

It is very cheap code relative to the hurdle it provides.

In his 2023 CanSecWest talk, De Raadt said that the goal of this mitigation was to make “poping shells harder” when an attacker knows where libc is mapped but doesn’t know where the execve stub is.

In December 2023, he added:

These changes attempt to disrupt methodologies commonly used in attacks. I make no claim these changes stop all methods. Combined with other behaviours we have (like libc random relinking), they will require an attacker to use other methods, which are hopefully more fragile. Increasing the unknown and requiring specific entry points increases the fragility and difficulty. Another benefit is that it requires unique methodology for OpenBSD, which requirements investment.

This change might annoy a beginner CTF player for 10 minutes, but absolutely doesn’t “disrupt methodologies commonly used in attacks” at all.

But he didn’t stop here:

A few years ago immediately after msyscall(2), nayden@ asked me if it was possible for the kernel to know validate the locations of system call, and I proceeded to tell him a bunch of reasons this was impossible, mostly relating to information not known, and cost complexity. But it hung around in my lower brain and I eventually had to do it.

A system call stub generally looks something like this:
xx:   b8 05 00 00 00          mov    $0x5,%eax
xx:   0f 05                   syscall 
This means “perform operation #5, which is open(2)”

Inside the kernel, we know the system call # and the address of the syscall instruction.

I add a non-LOAD ELF extension (program header and section header) called “openbsd.syscalls”. This is found in ld.so(1) and the libc.so library and in the system call stubs as .o files in libc.a for static binaries, and also in static binaries that are linked against this new libc.a.

(There is no new risk from having this (unmapped non-LOAD) information in the libc.so file, because an attacker with access to the file can already use a debugger to find the specific offsets. This format is just easier for the kernel and ld.so to handle)

It is an array of { offset, system call # }. For static binaries and ld.so(1), the kernel parses this array and creates a new array attached to the process which is indexed by the system call number, which has values: 0 (system call not allowed), 1 (allowed, and we don’t care about the address), or a specific offset inside the ELF binary where the system call instruction is for that specific system call number.

Like with msyscall(2) before, ld.so(1) does the same job of parsing the “openbsd.syscalls” in libc.so, and uses a new pinsyscall(2) system call to tell the kernel where the system calls are allowed to enter form.

Like msyscal(2) before, this results in 4 places that system calls can come from:

in the text of a static binary, because the kernel loaded a table for ONLY the system calls linked into the binary. It’s important to realize what this means, by example. The ping(1) binary does not call execve(2) or fork(2). So now you can’t ever call fork() or execve() because there is no “syscall” instruction for those two system calls. It also cannot call accept(2).

in the signal trampoline, we only accept sigreturn(2). sigreturn(2) never occurs anwhere else. This is a 2nd layer of SROP mitigation.

The syscall instructions inside ld.so(1) text can only call the system calls it has stubs for, and each stub can only call the specific system call it is intended to call.

in libc.so’s table, all the system call stub “syscall instructions” are registered.

There is no “2nd layer of SROP mitigation”: it doesn’t change anything there, since an attacker can just jump into the signal trampoline to call sigreturn.
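To make the quoted design concrete, here is a toy Python model of the per-process table. It is not OpenBSD’s code. It scans a byte blob for the `mov $imm32,%eax; syscall` pattern from the quoted stub (`b8 imm32 0f 05`) to produce `{ offset, system call # }` entries. It then folds those entries into a table indexed by system call number and checks a (number, instruction offset) pair against it. Several details are my own assumptions for the sketch: the `NSYSCALL` bound, the `Pin` sentinels (used instead of raw 0/1 so they can’t collide with real offsets), and the rule that a number with several distinct stubs degrades to “any address”. The real entries come from the openbsd.syscalls section, not from pattern matching.

```python
from enum import Enum

NSYSCALL = 512  # hypothetical bound on system call numbers for this toy model


class Pin(Enum):
    NOT_ALLOWED = 0  # "0 (system call not allowed)"
    ANY_ADDRESS = 1  # "1 (allowed, and we don't care about the address)"


def find_stubs(text: bytes) -> list[tuple[int, int]]:
    """Toy scanner: return (offset of the syscall instruction, syscall number)
    for every `mov $imm32,%eax; syscall` sequence (b8 imm32 0f 05) in `text`."""
    entries = []
    for i in range(len(text) - 6):
        if text[i] == 0xB8 and text[i + 5:i + 7] == b"\x0f\x05":
            sysno = int.from_bytes(text[i + 1:i + 5], "little")
            entries.append((i + 5, sysno))
    return entries


def build_pin_table(entries: list[tuple[int, int]]) -> list:
    """Fold { offset, system call # } entries into a table indexed by number."""
    table: list = [Pin.NOT_ALLOWED] * NSYSCALL
    for offset, sysno in entries:
        # Reject out-of-range numbers instead of writing past the table.
        if not 0 <= sysno < NSYSCALL:
            raise ValueError(f"system call number out of range: {sysno}")
        if table[sysno] is Pin.NOT_ALLOWED:
            table[sysno] = offset
        elif table[sysno] != offset:
            # Assumption: several distinct stubs for one number => any address.
            table[sysno] = Pin.ANY_ADDRESS
    return table


def check_syscall(table: list, sysno: int, insn_offset: int) -> bool:
    """True if system call `sysno` issued from `insn_offset` would be accepted."""
    if not 0 <= sysno < NSYSCALL:
        return False
    pin = table[sysno]
    if pin is Pin.NOT_ALLOWED:
        return False
    if pin is Pin.ANY_ADDRESS:
        return True
    return pin == insn_offset  # any other address: the process would be killed


# Two stubs, open (5) and execve (59), separated by padding and a ret.
text = (b"\x90" * 3
        + b"\xb8\x05\x00\x00\x00\x0f\x05"
        + b"\xc3"
        + b"\xb8\x3b\x00\x00\x00\x0f\x05")
table = build_pin_table(find_stubs(text))
```

In this model, the check only sees where the `syscall` instruction lives, not how execution got there. That matches the quoted description, where the kernel knows only the system call number and the instruction’s address. So jumping straight into the genuine execve stub (`check_syscall(table, 59, 16)`) is accepted exactly like a legitimate call.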

To fill the openbsd.syscalls tables, since all the syscalls have exported wrappers, OpenBSD can simply use the linker to enumerate the used ones and pin them on a TOFU basis on first use. So instead of using a pop eax; ret + int 0x80; ret gadget, an attacker now has to jump to the correct syscall wrapper, which neither complicates nor hinders anything.
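For static binaries, the set of pinned system calls falls out of the link itself. Each stub object in libc.a carries its own openbsd.syscalls entry, so only the stubs the linker actually pulls in end up in the table. The sketch below models that as a reachability walk over a made-up, drastically simplified libc.a symbol graph. The members, their dependencies, and the `linked_syscalls` helper are all hypothetical, and real `ld` resolution is more involved. It uses a ping-like program to reproduce De Raadt’s example, where fork(2), execve(2) and accept(2) are simply absent.

```python
from collections import deque

# Hypothetical libc.a: each member defines one symbol and references others.
LIBC_A = {
    "printf": {"write"},
    "getaddrinfo": {"socket", "connect", "read", "close"},
    "system": {"fork", "execve", "wait4"},
    "write": set(), "socket": set(), "connect": set(), "read": set(),
    "close": set(), "sendto": set(), "recvfrom": set(),
    "fork": set(), "execve": set(), "wait4": set(), "accept": set(),
}
# Members that are system call stubs (each would carry an openbsd.syscalls entry).
SYSCALL_STUBS = {"write", "socket", "connect", "read", "close", "sendto",
                 "recvfrom", "fork", "execve", "wait4", "accept"}


def linked_syscalls(undefined: set[str]) -> set[str]:
    """Stubs a static link would pull in, starting from the program's
    undefined symbols (transitive closure over archive members)."""
    seen: set[str] = set()
    queue = deque(undefined)
    while queue:
        sym = queue.popleft()
        if sym in seen or sym not in LIBC_A:
            continue  # already linked, or not provided by this archive
        seen.add(sym)
        queue.extend(LIBC_A[sym])
    return seen & SYSCALL_STUBS


ping_like = linked_syscalls({"printf", "getaddrinfo", "sendto", "recvfrom"})
```

Nothing in `ping_like` lets the binary reach fork, execve or accept, because no stub for them was ever linked. Conversely, a binary that references `system` gets all three of its (hypothetical) stubs pinned, which is why this is mostly interesting for small static binaries.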

Also, can you spot what looks like an arbitrary kernel write with a partially controlled offset in the diff? Amusingly, this mistake was used as a challenge for the 6th RealWorldCTF:

ICYMI, there was a CTF challenge in @RealWorldCTF where you’re asked to exploit Theo De Raadt’s embarrassing mistakes in his first draft of @openbsd syscall pinning exploit mitigation.

with the challenge available here if you want to give it a try.

In 2024, De Raadt sent an email to tech@ announcing that pinsyscalls(2) was ready to land in production, concluding with:

Together with library relinking, this makes some specific low-level attack methods unfeasable on OpenBSD, which will force the use of other methods. Hopefuly those other methods are more difficult, or also harmed by library relinking and other changes we’ve made.

This is all about removing avenues, and forcing attackers to use other methods which are hopefully more challenging.

without giving any details on which methods or attacks he’s talking about.

Fuchsia does something similar, exposing its syscalls only via the vDSO, to control which of them are exposed to a given process.

For static binaries, it’s a nice way to reduce the attack surface, a bit like pledge but without having to write rules, although it can’t be changed or reduced during the process’s lifetime.

Except for static binaries, where it could be useful, this whole idea is pretty useless:

An attacker able to perform ROP doesn’t need execve to do nasty things.
An attacker able to perform ROP won’t have any trouble crafting a read primitive to find where the execve stub is, either by walking the whole memory range or, more likely, by simply reading the relocation/symbol tables, …
An attacker able to perform ROP can simply use the libc stub instead of issuing raw syscalls, …
An attacker able to figure out where libc is mapped is likely able to figure out where the execve stub is.
013b59c36c7a6ab130d8af1e790b4cac017c6d6744d93519e8c27c7a63118d62

There are other easy bypasses, left as an exercise to the reader.

As Saagar Jha, a member of the Shellphish CTF team, put it:

My understanding is that they’re trying to annoy people who can influence control flow such that they can create a controlled state for an arbitrary syscall by making it so that the legitimate syscall instructions in the address space are tied to their number. Personally I don’t think this is particularly worth the trouble but I get the impression that their threat model is that the last time they tried to remove syscall instructions from outside of libc people told them it was easy enough to ROP there. I am still not entirely clear what this is trying to protect against because presumably all the syscall wrappers that libc has still exist in the address space? The “number” field that this is trying to protect does not look very valuable. Like, the goal seems to be “oh the attacker cannot put an arbitrary thing in rax and jump to a syscall instruction”. But like libc by design has to have gadgets that load it up with basically any valid syscall number…