Miggo

A few weeks ago I wrote about detecting CopyFail and DirtyFrag by thinking outside the box.

The thesis there was simple: you don’t catch a kernel LPE by chasing the root shell at the end of it — you catch it by recognizing the one abnormal pattern the exploit cannot avoid producing, and you do it with high confidence and near-zero false positives.

This is part two. Same job, different beast.

This time the target is CVE-2026-23111, a use-after-free in the Linux kernel’s nf_tables subsystem, the engine behind nftables, the modern replacement for iptables. It is reachable by any unprivileged user who can create a user and network namespace on an affected kernel, and it is the kind of bug that ends in a root shell or a container escape.

I went all the way with this one. I built a full, self-contained, fully autonomous exploit that takes an unprivileged user to uid=0 - leaking every kernel address it needs at runtime, no cheats. I'm publishing that exploit and the deep research in my own personal repository. But, exactly like last time, we can talk about the discovery freely, and we can talk about how we detect it - which is the part that matters for everyone running production Linux.

There is a lot of material to cover. We’ll go through:

The nf_tables building blocks: chains, the use counter, verdict maps, catchall elements, and transactions.
The bug — a single inverted character, and the delicious irony of where it came from.
From a reference-count off-by-one to a root shell (the short version).
Why the obvious detections fall apart.
How we catch it anyway — by watching the control-plane ritual instead of the payload.

The building blocks

nftables lets you build firewall rules out of a few primitives. You need four of them to follow this bug.

Chains hold rules. A chain carries a reference counter, chain->use, that counts how many things point at it - rules that goto/jump to it, map entries that resolve to it, and so on. The kernel refuses to delete a chain while use > 0. Hold that thought: this counter is the entire ballgame.
Verdict maps map a key to a verdict, and a verdict can be goto <chain> or jump <chain>. Taking such a reference bumps the target chain's use; dropping it decrements it.
Catchall elements are the wildcard entry in a set or map — * : goto some_chain. Here is the important detail: the pipapo set backend stores its normal elements in its own data structure, but catchall elements live somewhere else entirely, on a separate catchall_list. That separation is exactly where the bug hides.
Transactions. Every nftables change is a netlink batch that is staged and then either committed or aborted as a single unit. If anything in the batch fails, the kernel walks back every staged operation on the abort path and undoes it, so the ruleset ends up exactly as it was. Deactivating a catchall element during a staged delete has to be undone - reactivated - on abort. That undo is the line of code that's wrong.

The bug: one inverted character

On the abort path, catchall map elements are reactivated by nft_map_catchall_activate(). It is supposed to mirror its non-catchall sibling: skip the elements that are already active, and process the inactive ones (reactivating them and restoring the chain reference they hold). The vulnerable version does the exact opposite. The whole fix is the removal of a single !:

/* vulnerable */ if (!nft_set_elem_active(ext, genmask)) continue;
/* fixed      */ if ( nft_set_elem_active(ext, genmask)) continue;

Because the abort path skips the inactive (just-deactivated) catchall element instead of restoring it, the call that would bump the chain’s reference back never runs. The chain’s use counter is now permanently one too low - there is a live, uncounted reference to it.

The map still points at the chain. The kernel just believes one fewer reference exists than actually does. Drive that counter to zero and you can delete — and free — a chain that something is still pointing at. That is the use-after-free.
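
To make the inversion concrete, here is a toy userspace model in C. It is a sketch under loud assumptions, not kernel code: the toy_* structs and functions are invented for illustration and only mimic the shape of the real logic (an active flag per catchall element and a use counter on the target chain).

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; these are NOT the kernel's real types. */
struct toy_chain {
    int use;                    /* reference count, like chain->use */
};

struct toy_catchall {
    bool active;                /* false once a staged delete deactivated it */
    struct toy_chain *target;   /* chain the element's verdict points at */
};

/* Staged delete: deactivate the element and drop its chain reference. */
static void toy_deactivate(struct toy_catchall *e)
{
    e->active = false;
    e->target->use--;
}

/* Abort-path undo. `inverted` selects the vulnerable predicate. */
static void toy_abort_reactivate(struct toy_catchall *elems, size_t n,
                                 bool inverted)
{
    for (size_t i = 0; i < n; i++) {
        /* fixed: skip active elements; vulnerable: skip inactive ones */
        bool skip = inverted ? !elems[i].active : elems[i].active;
        if (skip)
            continue;
        elems[i].active = true;
        elems[i].target->use++;  /* restore the reference */
    }
}

/* Returns the chain's use count after a staged delete is aborted. */
static int toy_use_after_abort(bool inverted)
{
    struct toy_chain victim = { .use = 2 };  /* one rule + one catchall */
    struct toy_catchall e = { .active = true, .target = &victim };

    toy_deactivate(&e);                      /* use: 2 -> 1 */
    toy_abort_reactivate(&e, 1, inverted);   /* the undo step */
    return victim.use;
}
```

Calling toy_use_after_abort(false) models the fixed predicate and restores the count; toy_use_after_abort(true) models the vulnerable one and leaves the count one short, even though the element still points at the chain.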

Here’s the part I love. This bug is not some ancient dusty corner of the kernel. It is a regression introduced by a security fix!

The very commit that added nft_map_catchall_activate() - 628bd3e49cba, "drop map element references from preparation phase" - was the remediation for an earlier nftables refcount bug, CVE-2023-4244. Fixing one reference-counting flaw quietly planted another, and it rode along into mainline (~v6.6) and got backported across the stable LTS branches. It sat there from late 2023 until a one-character patch landed in February 2026.

From a reference-count off-by-one to root (the short version)

I’ll keep this part high-level — the full chain and the source live in my personal repo — but the shape matters for understanding why the detection works.

Inside an unprivileged user + network namespace, build a table, a base chain, a victim chain, and a verdict map with a catchall * : goto victim_chain. Add a goto rule targeting the victim chain. The victim chain's use is now 2.
In one batch, delete the map (this deactivates the catchall and drops use to 1) and then deliberately fail a later operation. The kernel aborts - and thanks to the bug, use is not restored. It stays at 1.
A benign transaction commits the undercount.
Cleanly delete a second map. use drops to 0 while the goto rule still references the chain.
Delete the chain. use == 0, so the kernel frees it - leaving the rule's verdict pointing at freed memory. Use-after-free.
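
The counter arithmetic in those five steps can be replayed with a small C model. This is a simplified sketch, not the exploit: toy_state, toy_delchain and toy_run are hypothetical names, real_refs is a bookkeeping field that exists only in the model, and step 4 is reduced to "a clean delete drops the map-side reference".

```c
#include <errno.h>
#include <stdbool.h>

struct toy_state {
    int  use;          /* what the kernel believes (chain->use) */
    int  real_refs;    /* references that actually exist (model only) */
    bool chain_freed;
};

/* The kernel refuses to delete a chain while use > 0. */
static int toy_delchain(struct toy_state *s)
{
    if (s->use > 0)
        return -EBUSY;
    s->chain_freed = true;
    return 0;
}

/* Replays steps 1-5; returns the DELCHAIN result and the final state. */
static int toy_run(bool buggy_abort, struct toy_state *out)
{
    /* step 1: goto rule + catchall both reference the victim chain */
    struct toy_state s = { .use = 2, .real_refs = 2, .chain_freed = false };

    /* step 2: staged map delete drops use, then the batch aborts */
    s.use--;
    if (!buggy_abort)
        s.use++;       /* a correct abort path restores the reference */

    /* step 3: a benign commit changes nothing in this model */

    /* step 4: a clean map delete really removes the map-side reference */
    s.use--;
    s.real_refs--;

    /* step 5: try to delete the chain */
    int ret = toy_delchain(&s);
    *out = s;
    return ret;
}
```

With buggy_abort set, the model ends with ret == 0, chain_freed set and real_refs still at 1: the rule's reference outlives the chain. With the correct abort path the same sequence ends in -EBUSY and nothing is freed.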

From there it’s a heap game: the freed chain object gets reclaimed with attacker-controlled data, and a packet that traverses the dangling verdict drives execution through a forged chain.

My exploit turns that into a KASLR leak, an arbitrary read, a walk of live kernel structures to recover the addresses it needs, and finally a short ROP chain that calls commit_creds(init_cred) and returns cleanly to userspace as root. All of it is leaked at runtime, with nothing hardcoded that a real attacker couldn't obtain.

One honest aside worth its own post: affected is not the same as exploitable. The bug is reachable on a huge range of kernels (./exploit check confirms it instantly), but turning it into a root shell depends on how deterministic the heap is (CONFIG_RANDOM_KMALLOC_CACHES randomizes which kmalloc cache an allocation lands in) and on forward-edge control-flow integrity (CONFIG_X86_KERNEL_IBT). A kernel can be fully vulnerable and still resist a given weaponization. That distinction is going to matter in a second, because it tells you where a detector should live.

Credit where it’s due: the vulnerability and the original exploitation strategy are the work of Exodus Intelligence and FuzzingLabs, and the fix is the upstream maintainers’. The functional, self-contained exploit is my own build.

Why the obvious detections fall apart

Now put on the defender hat. How do you catch this at runtime, on a production fleet, without a wall of false alarms?

The instinct is to detect the outcome. Catch the root shell. Catch the commit_creds. Catch the modprobe_path overwrite. Watch for the suspicious binary.

Every one of those is a trap:

The payload is infinitely variable. The ROP chain, the gadgets, the post-exploitation command, the target distro - all attacker-controlled and all different on every kernel build.

The FuzzingLabs writeup even says the quiet part out loud: gadgets change between builds, so "we will not go into detail on how to craft a ROP chain … and leave this exercise to the reader." If the published research won't pin the payload down, your detector certainly can't. We had to do it to detect it properly.

The heap grooming is invisible from where you'd want to watch. The spray is just a flood of ordinary allocations. There is no syscall that says "I am now reclaiming a freed nft_chain."
commit_creds / cred changes are noisy and late. By the time creds flip, you're detecting success, not the attack - and plenty of legitimate machinery touches creds.
Signaturing the exploit binary is pointless. It's a few hundred lines of C anyone can rewrite, rename, or fold into another process.

And here's the kicker from the section above: because affected ≠ exploitable, a payload-centric detector will also miss the attempts that matter most - the recon and the failed escalations on hardened kernels, where the bug is being poked but the root shell never materializes.

Those are exactly the events a defender wants to see first!

Thinking outside the box, again! Detect the ritual, not the result!

So we do what we did for CopyFail and DirtyFrag. We ignore the payload completely and ask: what does the vulnerability force the attacker to do, every single time, that normal software never does? For this bug, the answer is a very specific control-plane ritual.

To even reach the UAF, the attacker must:

Force an nftables abort - deliberately fail a batch - to trigger the buggy reactivation path.
Successfully delete a set/map shortly after, to push the corrupted reference count down to zero.
Successfully delete the chain while it's still referenced - the step that frees the live object.

That ordering - abort → DELSET → DELCHAIN, from the same process, in a tight time window - is not cosmetic. It is load-bearing. You cannot remove it and still trigger the bug. No ROP chain, no gadget choice, no payload obfuscation changes it. It is the one invariant.

And - this is the whole point - it has no benign equivalent:

A failed nftables batch on its own? Routine. nft, firewalld, libvirt all produce aborted transactions.
A chain deletion on its own? Routine. Tearing down rules happens constantly.
A failed batch, immediately followed by both a successful set deletion and a successful chain deletion, all from one task, inside a few seconds?

No production nftables frontend does that under normal operation. That sequence means "I errored on purpose and am now tearing a table apart", which is the exploit's fingerprint, not a firewall manager's behavior.

We don't need to know which chain, which map, which namespace, or what the attacker plans to do with root. We only need to see the ritual.

How the detection works

The detector is a small eBPF program. It hooks three nf_tables control-plane functions and runs a per-task state machine over them.

Hook	Used in exploit	Normal usage	Detection role
`nf_tables_abort`	Yes – forced via a bad batch	Failed batches	Start the window; count the abort
`nf_tables_delset`	Yes – DELSET after the abort	Normal set deletion	Sequence step 2 (only if `ret == 0`)
`nf_tables_delchain`	Yes – DELCHAIN after DELSET	Normal chain removal	Alert if a qualifying DELSET preceded it

The logic, in pseudocode (intentionally trimmed - see the disclaimer):

/* keyed per task leader; window is a few seconds */
on nf_tables_abort():
    if (now - last_abort > WINDOW) abort_count = 1; /* stale -> reset */
    else                           abort_count++;
    last_abort = now;

on nf_tables_delset(ret):
    if (ret != 0) return;                    /* only successful deletes */
    if (now - last_abort > WINDOW) return;   /* must follow a recent abort */
    last_delset = now;

on nf_tables_delchain(ret):
    if (ret != 0) return;
    if (now - last_abort  > WINDOW) return;
    if (now - last_delset > WINDOW) return;  /* the DELSET must have happened */
    ALERT("nft catchall UAF attempt");
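
For readers who want to poke at the logic without loading anything into a kernel, the same state machine can be written as plain userspace C. This is a sketch of the trimmed pseudocode above, not the production rule: the 5-second WINDOW_NS, the struct layout and the function names are assumptions, and timestamps are passed in by the caller rather than read from a clock. A timestamp of 0 is treated as "never seen".

```c
#include <stdbool.h>
#include <stdint.h>

#define WINDOW_NS (5ULL * 1000000000ULL)  /* assumed: "a few seconds" */

/* Per-task-leader state: two timestamps and a counter. */
struct task_state {
    uint64_t last_abort;    /* 0 = never seen */
    uint64_t last_delset;   /* 0 = never seen */
    uint32_t abort_count;
};

/* True if `then` was recorded and is no older than the window. */
static bool within(uint64_t now, uint64_t then)
{
    return then != 0 && now >= then && now - then <= WINDOW_NS;
}

static void on_abort(struct task_state *s, uint64_t now)
{
    if (within(now, s->last_abort))
        s->abort_count++;
    else
        s->abort_count = 1;          /* stale -> reset */
    s->last_abort = now;
}

static void on_delset(struct task_state *s, uint64_t now, int ret)
{
    if (ret != 0)
        return;                      /* only successful deletes */
    if (!within(now, s->last_abort))
        return;                      /* must follow a recent abort */
    s->last_delset = now;
}

/* Returns true when the abort -> DELSET -> DELCHAIN ritual completes. */
static bool on_delchain(struct task_state *s, uint64_t now, int ret)
{
    if (ret != 0)
        return false;
    if (!within(now, s->last_abort))
        return false;
    if (!within(now, s->last_delset))
        return false;                /* the DELSET must have happened */
    return true;
}
```

Feeding it on_abort, then on_delset with ret 0, then on_delchain with ret 0 inside the window returns true. Drop any one of the three, fail either delete, or let the window lapse, and it stays quiet.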

A few choices in there are deliberate.

We only count successful deletes.
A DELSET or DELCHAIN that fails (ret != 0) doesn't advance the exploit - and on a patched kernel the critical DELCHAIN returns -EBUSY because the reference count is correct. The return value is, quite literally, the difference between "this kernel is vulnerable and the chain just got freed" and "this kernel is fine." Watching it for free is a gift.
Everything is keyed and time-boxed per task leader.
A distant unrelated abort never contributes to a later alert; the window resets. The state we keep per task is tiny - a couple of timestamps and a counter - so the overhead is proportional to how often these (rare) control-plane operations actually happen, which on a normal box is "almost never."
Why kretprobes instead of fexit?
This one is a thinking-outside-the-box detail in its own right. The most specific function in the public writeups, nft_map_catchall_activate(), isn't BTF-visible on current kernels, so we hook the stable command callbacks instead.
But even those have a portability trap: on Debian and Ubuntu, nf_tables is a loadable module; on RHEL and others it's built into vmlinux.
fexit hooks need BTF-visible signatures and libbpf has to find the function in either vmlinux or a module BTF file - and when nf_tables is a module, that lookup silently comes up empty and the hook never attaches.
kretprobe resolves any kernel symbol listed in kallsyms, module or not. We give up typed parameters, but we don't need them - all we want is the integer return value of DELSET/DELCHAIN, and ABORT doesn't use a return value at all.
The result is one detector that works identically whether nf_tables is baked in or loaded as a module. (We also had to teach our symbol resolver that module functions show up as name [module] in available_filter_functions - strip the suffix and both forms match.)
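
The suffix handling is small enough to show. Below is a minimal C sketch of that one step, assuming each line of available_filter_functions is either "name" or "name [module]"; symbol_name and line_matches are hypothetical helpers, not our resolver's actual code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the bare symbol name from one available_filter_functions line
 * into `out`. The name ends at the first whitespace, so a trailing
 * " [module]" suffix (or the newline) is dropped.
 */
static bool symbol_name(const char *line, char *out, size_t out_len)
{
    size_t n = strcspn(line, " \t\n");
    if (n == 0 || n >= out_len)
        return false;            /* empty line or name too long */
    memcpy(out, line, n);
    out[n] = '\0';
    return true;
}

/* True if the line names exactly the function we want to hook. */
static bool line_matches(const char *line, const char *want)
{
    char name[128];
    return symbol_name(line, name, sizeof name) && strcmp(name, want) == 0;
}
```

Comparing the whole extracted name, rather than doing a prefix match, keeps nf_tables_delset from accidentally matching a longer symbol that merely starts with the same characters.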

The payoff is the same one we got with CopyFail and DirtyFrag: the detector is blind to the payload and immune to it changing. Rewrite the ROP chain, swap the gadget, change the distro, obfuscate the binary, run it from inside a container:

NONE of it touches the abort → DELSET → DELCHAIN ritual.

And because we key on the control plane and the return codes, we light up on the attempt - including the failed escalations on hardened kernels that a payload-chasing tool would never see.

Conclusion (and the usual disclaimer)

CVE-2026-23111 is a beautiful little bug - a single inverted character, planted by the fix for a previous bug, turning a reference count into a use-after-free and a use-after-free into root.

The exploitation is deep and build-specific. The detection is not!

Because we refused to look at the exploitation at all. We looked at the one thing the vulnerability makes mandatory - a failed transaction followed by a set delete and a chain delete from a single task in a tight window - and that pattern simply does not occur in legitimate nftables usage.

As with part one, the detection code shown here is intentionally incomplete. It's a teaching sketch, not the production rule. A real-world version tightens this further: correlate the DELCHAIN to a chain that was actually a goto/jump target, accumulate aborts with a leaky bucket instead of a single timestamp to handle multi-round triggers, key on the network namespace as well as the task, and treat the -EBUSY return on DELCHAIN as a positive signal that a patched kernel just refused the exploit - useful telemetry in its own right. I'll leave those as the reader's exercise, the same way the public research left the ROP chain as ours.
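
As a starting point for one of those exercises, here is one way a leaky bucket for aborts could look in C. It is purely illustrative and not the production rule: the one-token-per-second leak rate, the cap of 8 and every name in it are assumptions, and timestamps are expected to be monotonic.

```c
#include <stdint.h>

#define LEAK_INTERVAL_NS 1000000000ULL  /* assumed: one token leaks per second */
#define BUCKET_CAP       8u             /* assumed upper bound on the level */

struct abort_bucket {
    uint64_t last_ns;   /* time up to which leakage has been accounted */
    uint32_t level;     /* aborts currently "in the bucket" */
};

/* Drain one token per elapsed interval since last_ns. */
static void bucket_leak(struct abort_bucket *b, uint64_t now)
{
    uint64_t leaked = (now - b->last_ns) / LEAK_INTERVAL_NS;

    if (leaked >= b->level) {
        b->level = 0;
        b->last_ns = now;
    } else {
        b->level -= (uint32_t)leaked;
        /* keep the partial interval so slow leaks still add up */
        b->last_ns += leaked * LEAK_INTERVAL_NS;
    }
}

/* Record one abort; returns the bucket level after adding it. */
static uint32_t bucket_add(struct abort_bucket *b, uint64_t now)
{
    bucket_leak(b, now);
    if (b->level < BUCKET_CAP)
        b->level++;
    return b->level;
}
```

A burst of aborts raises the level quickly, while aborts spaced far apart drain away before the next one arrives, so a later stage can require a minimum level instead of a single recent timestamp.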

The full self-contained exploit and the complete technical write-up - the leaks, the walk, the IBT-aware finale, and the honest caveats about reliability and about affected versus exploitable - are in my personal repository. Here, as always, we kept the part that helps defenders: the bug, why the obvious detections lose, and the one pattern that wins.

Special thanks to Exodus Intelligence and FuzzingLabs for the original vulnerability research and exploitation strategy, and to the upstream netfilter maintainers for the one-character fix.

‍
