In the Linux kernel, the following vulnerability has been resolved:
tcp: prevent concurrent execution of tcp_sk_exit_batch
It's possible that two threads call tcp_sk_exit_batch() concurrently: once from the cleanup_net workqueue, and once from a task that failed to clone a new netns. In the latter case, error unwinding calls the exit handlers in reverse order for the 'failed' netns.
tcp_sk_exit_batch() calls tcp_twsk_purge(). The problem is that since commit b099ce2602d8 ("net: Batch inet_twsk_purge"), this function picks up twsk in any dying netns, not just the one passed in via the exit_batch list.
This means that the error unwind of setup_net() can "steal" and destroy timewait sockets belonging to the exiting netns.
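To make the difference in purge scope concrete, here is a minimal userspace sketch in C. It is not kernel code: struct model_netns, purge_exit_list_only() and purge_any_dying() are hypothetical names, and the list-scoped variant is only an illustrative contrast, not a claim about how inet_twsk_purge() behaved before commit b099ce2602d8. It shows how a purge that scans for any dying netns empties a namespace that is not on the caller's exit list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a network namespace. */
struct model_netns {
	bool dying;   /* namespace is being dismantled */
	int tw_socks; /* timewait sockets still hashed for it */
};

/* List-scoped purge (illustrative contrast only): touches just the
 * namespaces on the caller's own exit list. */
static int purge_exit_list_only(struct model_netns **exit_list, size_t n)
{
	int killed = 0;

	assert(exit_list != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		killed += exit_list[i]->tw_socks;
		exit_list[i]->tw_socks = 0;
	}
	return killed;
}

/* Batched purge: kills the twsk of *every* dying namespace, whether or
 * not it is on the caller's exit list. */
static int purge_any_dying(struct model_netns *all, size_t n)
{
	int killed = 0;

	assert(all != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (all[i].dying) {
			killed += all[i].tw_socks;
			all[i].tw_socks = 0;
		}
	}
	return killed;
}
```

For example, if the error-unwind path's exit list holds only the failed netns while another dying netns still has three timewait sockets, purge_exit_list_only() kills none of them, whereas purge_any_dying() kills all three. That is the "steal" described above.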
This allows the netns exit worker to proceed to call
WARN_ON_ONCE(!refcount_dec_and_test(&net->ipv4.tcp_death_row.tw_refcount));
without the expected 1 -> 0 transition, which then splats.
At the same time, the error unwind path, which is also running inet_twsk_purge(), will splat as well:
WARNING: .. at lib/refcount.c:31 refcount_warn_saturate+0x1ed/0x210
...
 refcount_dec include/linux/refcount.h:351 [inline]
 inet_twsk_kill+0x758/0x9c0 net/ipv4/inet_timewait_sock.c:70
 inet_twsk_deschedule_put net/ipv4/inet_timewait_sock.c:221
 inet_twsk_purge+0x725/0x890 net/ipv4/inet_timewait_sock.c:304
 tcp_sk_exit_batch+0x1c/0x170 net/ipv4/tcp_ipv4.c:3522
 ops_exit_list+0x128/0x180 net/core/net_namespace.c:178
 setup_net+0x714/0xb40 net/core/net_namespace.c:375
 copy_net_ns+0x2f0/0x670 net/core/net_namespace.c:508
 create_new_namespaces+0x3ea/0xb10 kernel/nsproxy.c:110
... because refcount_dec() on tw_refcount unexpectedly dropped it to 0.
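The two splats can be replayed deterministically with a simplified counter model. This sketch assumes, consistent with the WARN_ON_ONCE() above, that tw_refcount holds one base reference plus one reference per live timewait socket, and that refcount_dec() treats reaching 0 as an error. struct model_refcount and the replay_*() functions are hypothetical userspace stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of tw_refcount: 1 base reference + 1 per twsk. */
struct model_refcount {
	int refs;
	int warnings; /* number of splats this counter would trigger */
};

/* Mirrors refcount_dec(): decrementing to 0 is treated as a bug. */
static void model_refcount_dec(struct model_refcount *r)
{
	assert(r->refs > 0);
	if (--r->refs == 0)
		r->warnings++;
}

/* Mirrors refcount_dec_and_test(): true only on the 1 -> 0 transition. */
static bool model_refcount_dec_and_test(struct model_refcount *r)
{
	assert(r->refs > 0);
	return --r->refs == 0;
}

/* Racy order for one exiting netns with one stolen twsk. */
static int replay_racy_interleaving(void)
{
	struct model_refcount tw_refcount = { .refs = 1 + 1, .warnings = 0 };

	/* The error-unwind purge has unhashed the twsk but not yet dropped
	 * its reference, so the exit worker finds nothing to purge and runs
	 * the final check: it expects 1 -> 0 but sees 2 -> 1. */
	if (!model_refcount_dec_and_test(&tw_refcount))
		tw_refcount.warnings++; /* WARN_ON_ONCE() fires */
	/* The error-unwind path now drops the twsk reference: 1 -> 0 in
	 * refcount_dec(), which warns as well. */
	model_refcount_dec(&tw_refcount);
	return tw_refcount.warnings;
}

/* Same events, but the twsk reference is dropped before the final check. */
static int replay_ordered_interleaving(void)
{
	struct model_refcount tw_refcount = { .refs = 1 + 1, .warnings = 0 };

	model_refcount_dec(&tw_refcount); /* twsk killed: 2 -> 1, no warning */
	if (!model_refcount_dec_and_test(&tw_refcount))
		tw_refcount.warnings++; /* branch not taken: 1 -> 0 as expected */
	return tw_refcount.warnings;
}
```

replay_racy_interleaving() records two warnings (the 2 -> 1 final check, then the 1 -> 0 refcount_dec()), while replay_ordered_interleaving(), where the stolen socket's reference is dropped first, records none. No socket is lost in either order; only the point at which the counter reaches 0 differs.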
This doesn't seem like an actual bug (no tw sockets got lost and I don't see a use-after-free), but rather an erroneous trigger of a debug check.
Add a mutex to force strict ordering: the task that calls tcp_twsk_purge() blocks the other task from doing the final refcount_dec_and_test() until the mutex owner has removed all tw sockets of the dying netns.
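The ordering this mutex enforces can be sketched in userspace with a pthread mutex. This is a hedged model, not the actual patch: model_exit_batch(), model_exit_batch_mutex and the counters are hypothetical, and plain ints stand in for refcount_t because every access happens under the lock. Whichever task takes the mutex first purges the timewait sockets of every dying netns and drops their references before any task reaches its final 1 -> 0 check.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified netns: only what the exit path touches. */
struct model_netns {
	bool dying;
	int tw_socks;    /* hashed timewait sockets */
	int tw_refcount; /* 1 base reference + 1 per twsk */
};

struct exit_batch_args {
	struct model_netns *all;   /* every namespace (what the purge scans) */
	size_t n_all;
	struct model_netns **list; /* this caller's own exit list */
	size_t n_list;
};

static pthread_mutex_t model_exit_batch_mutex = PTHREAD_MUTEX_INITIALIZER;
static int model_warnings; /* only touched under the mutex or with no threads running */

/* Kills twsk in *any* dying netns, like the batched purge. */
static void model_purge_any_dying(struct model_netns *all, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		while (all[i].dying && all[i].tw_socks > 0) {
			all[i].tw_socks--;             /* unhash the twsk */
			if (--all[i].tw_refcount == 0) /* refcount_dec() hitting 0 */
				model_warnings++;
		}
	}
}

/* Purge and final check run as one critical section. */
static void *model_exit_batch(void *arg)
{
	struct exit_batch_args *a = arg;

	pthread_mutex_lock(&model_exit_batch_mutex);
	model_purge_any_dying(a->all, a->n_all);
	for (size_t i = 0; i < a->n_list; i++) {
		/* final dec_and_test: must be the 1 -> 0 transition */
		if (--a->list[i]->tw_refcount != 0)
			model_warnings++;
	}
	pthread_mutex_unlock(&model_exit_batch_mutex);
	return NULL;
}

/* Runs the exit worker and the error-unwind path concurrently once and
 * returns the number of warnings recorded, or -1 if a thread failed. */
static int run_concurrent_exit_once(void)
{
	struct model_netns all[2] = {
		{ .dying = true, .tw_socks = 1, .tw_refcount = 2 }, /* exiting netns */
		{ .dying = true, .tw_socks = 0, .tw_refcount = 1 }, /* failed netns */
	};
	struct model_netns *worker_list[] = { &all[0] };
	struct model_netns *unwind_list[] = { &all[1] };
	struct exit_batch_args worker = { all, 2, worker_list, 1 };
	struct exit_batch_args unwind = { all, 2, unwind_list, 1 };
	pthread_t t1, t2;

	model_warnings = 0;
	if (pthread_create(&t1, NULL, model_exit_batch, &worker) != 0)
		return -1;
	if (pthread_create(&t2, NULL, model_exit_batch, &unwind) != 0) {
		pthread_join(t1, NULL);
		return -1;
	}
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	assert(all[0].tw_socks == 0 && all[0].tw_refcount == 0);
	assert(all[1].tw_refcount == 0);
	return model_warnings;
}
```

Each call to run_concurrent_exit_once() should return 0 whichever thread wins the lock (build with -pthread). The approach is coarse but simple: rather than making the final check aware of purges still in flight in another task, it serializes the whole batch exit so that the purge and the final check can no longer interleave.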