In the Linux kernel, the following vulnerability has been resolved:
smb: client: fix TCP timers deadlock after rmmod
Commit ef7134c7fc48 ("smb: client: Fix use-after-free of network namespace.") fixed a netns UAF by manually enabling socket refcounting (sk->sk_net_refcnt = 1 and sock_inuse_add(net, 1)).
The patch worked for that bug because we now hold references to the netns (get_net_track() takes a reference internally), and those references are properly released (internally, in __sk_destruct()). This only works because sk->sk_net_refcnt was set.
Problem (this happens regardless of CONFIG_NET_NS_REFCNT_TRACKER, and regardless of whether the namespace is init_net or another one):
Setting sk->sk_net_refcnt = 1 manually, after socket creation, is not only outside cifs's scope but also technically wrong. The field is set conditionally at creation time: 1 for user sockets and 0 for kernel sockets. The net/ implementations appear to base their user vs. kernel socket handling on it.
For example, when a TCP socket is closed, the TCP timers are not cleared because sk->sk_net_refcnt = 1 (cf. commit 151c9c724d05 ("tcp: properly terminate timers for kernel sockets")):
net/ipv4/tcp.c:

    void tcp_close(struct sock *sk, long timeout)
    {
            lock_sock(sk);
            __tcp_close(sk, timeout);
            release_sock(sk);
            if (!sk->sk_net_refcnt)
                    inet_csk_clear_xmit_timers_sync(sk);
            sock_put(sk);
    }
This triggers a lockdep warning and then, as expected, a deadlock in tcp_write_timer().
One way to reproduce this is to run the reproducer from ef7134c7fc48 and then run 'rmmod cifs'. A few seconds later, the lockdep warning and the deadlock show up.
Fix: we shouldn't mess with socket internals ourselves, so do not set sk_net_refcnt manually.
Also change __sock_create() to sock_create_kern() for explicitness.
For non-init_net network namespaces, we handle it as best we can: hold an extra netns reference for server->ssocket and drop it when the socket is released. This ensures that the netns still exists whenever we need to create or destroy server->ssocket, while the reference is not tied directly to the socket's internals.