In the Linux kernel, the following vulnerability has been resolved:
netfilter: nf_tables: do not defer rule destruction via call_rcu
nf_tables_chain_destroy() can sleep, so it can't be used from call_rcu callbacks.
Moreover, nf_tables_rule_release() is only safe for error unwinding, while the transaction mutex is held and the to-be-destroyed rule has not been exposed to either the dataplane or dumps, because it deactivates and frees the rule without the required synchronize_rcu() in between.
nft_rule_expr_deactivate() callbacks change the ->use counters of other chains/sets (see e.g. the nft_lookup .deactivate callback), so they must be serialized via the transaction mutex.
Also add a few lockdep asserts to make this more explicit.
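The ordering constraint described above (deactivate under the transaction mutex, wait for a grace period, only then free) can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel implementation: every model_* name is hypothetical, the "mutex" is a plain flag, the lockdep-style check is an ordinary assert(), and the grace-period wait is only recorded rather than actually performed. The model also assumes the caller holds the mutex for the whole destroy sequence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_set {
    int use;                    /* reference count, like a ->use counter */
};

struct model_rule {
    struct model_set *set;      /* set referenced by a lookup-like expression */
    bool active;                /* rule still holds its reference on ->set */
    bool grace_elapsed;         /* a grace period has passed since deactivation */
};

static bool commit_mutex_held;  /* stands in for the transaction mutex */

static void model_lock(void)   { commit_mutex_held = true; }
static void model_unlock(void) { commit_mutex_held = false; }

/* Stand-in for a lockdep assertion: the caller must hold the mutex. */
static void model_assert_held(void) { assert(commit_mutex_held); }

static struct model_rule *model_rule_new(struct model_set *set)
{
    struct model_rule *r;

    model_assert_held();
    r = calloc(1, sizeof(*r));
    if (!r)
        return NULL;
    r->set = set;
    r->active = true;
    set->use++;                 /* the rule now pins the set */
    return r;
}

/* Changes another object's use counter, so it must be serialized. */
static void model_rule_deactivate(struct model_rule *r)
{
    model_assert_held();
    if (r->active) {
        r->set->use--;
        r->active = false;
    }
}

/* Stand-in for synchronize_rcu(); this model only records that it ran. */
static void model_wait_grace_period(struct model_rule *r)
{
    r->grace_elapsed = true;
}

/* Freeing is only valid after deactivation and a grace period. */
static void model_rule_free(struct model_rule *r)
{
    assert(!r->active && r->grace_elapsed);
    free(r);
}

/* Deactivate, wait, then free, in that order. */
static void model_rule_destroy(struct model_rule *r)
{
    model_assert_held();
    model_rule_deactivate(r);
    model_wait_grace_period(r);
    model_rule_free(r);
}
```

In this model, calling model_rule_free() directly after model_rule_deactivate() trips the assertion, which mirrors the "deactivate + free without a grace period in between" pattern the commit message warns about.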
Calling synchronize_rcu() isn't ideal, but fixing this without it is hard and way more intrusive. As-is, we can get:
WARNING: .. net/netfilter/nf_tables_api.c:5515 nft_set_destroy+0x..
Workqueue: events nf_tables_trans_destroy_work
RIP: 0010:nft_set_destroy+0x3fe/0x5c0
Call Trace:
 <TASK>
 nf_tables_trans_destroy_work+0x6b7/0xad0
 process_one_work+0x64a/0xce0
 worker_thread+0x613/0x10d0
In case the synchronize_rcu becomes an issue, we can explore alternatives.
One way would be to allocate nft_trans_rule objects plus one nft_trans_chain object, deactivate the rules and the chain, and then defer the freeing to the nft destroy workqueue. We'd still need to keep the synchronize_rcu() path as a fallback to handle -ENOMEM corner cases, though.
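The shape of that alternative, deferring the free when a transaction object can be allocated and falling back to a blocking wait when it cannot, might look roughly like the userspace model below. It is a simplified sketch, not a proposed patch: all model_* names are hypothetical, the "workqueue" is a plain linked list drained by hand, the allocator is injectable only so the -ENOMEM path can be exercised, and the decision is made per object rather than for a whole chain at once. The model assumes a grace period has elapsed before the worker runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_obj {
    int id;                         /* an already deactivated rule or chain */
};

struct model_trans {                /* one queued "free this later" request */
    struct model_obj *obj;
    struct model_trans *next;
};

static struct model_trans *destroy_queue;   /* stands in for the destroy workqueue */
static int sync_fallbacks;                  /* times the blocking path was taken */
static int freed;                           /* objects released so far */

static void model_free_obj(struct model_obj *o)
{
    free(o);
    freed++;
}

/* Stand-in for synchronize_rcu(); a real implementation would block here. */
static void model_wait_grace_period(void)
{
    sync_fallbacks++;
}

/* Injectable allocator so allocation failure can be simulated. */
typedef void *(*model_alloc_fn)(size_t);

static void model_release(struct model_obj *o, model_alloc_fn alloc)
{
    struct model_trans *t = alloc(sizeof(*t));

    if (!t) {
        /* Allocation failed: fall back to waiting, then free inline. */
        model_wait_grace_period();
        model_free_obj(o);
        return;
    }

    /* Allocation succeeded: defer the free to the worker. */
    t->obj = o;
    t->next = destroy_queue;
    destroy_queue = t;
}

/* Worker body: frees everything that was queued for deferred destruction. */
static void model_destroy_work(void)
{
    while (destroy_queue) {
        struct model_trans *t = destroy_queue;

        destroy_queue = t->next;
        model_free_obj(t->obj);
        free(t);
    }
}

/* Allocator that always fails, to simulate -ENOMEM. */
static void *model_failing_alloc(size_t n)
{
    (void)n;
    return NULL;
}
```

Passing malloc as the allocator exercises the deferred path, while model_failing_alloc forces the fallback. The design point is that the blocking wait stays in the code, but is only taken when the deferral bookkeeping cannot be allocated.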