In the Linux kernel, the following vulnerability has been resolved:
bpf: reject unhashed sockets in bpf_sk_assign
The semantics for bpf_sk_assign are as follows:
    sk = some_lookup_func()
    bpf_sk_assign(skb, sk)
    bpf_sk_release(sk)
That is, the sk is not consumed by bpf_sk_assign. The function therefore needs to make sure that sk lives long enough to be consumed from __inet_lookup_skb.
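In BPF program terms, the pattern looks roughly like the following TC ingress sketch (illustrative only: libbpf-style annotations are assumed and the tuple setup is elided):

    /* Sketch of the lookup/assign/release pattern at TC ingress. */
    #include <linux/bpf.h>
    #include <linux/pkt_cls.h>
    #include <bpf/bpf_helpers.h>

    SEC("tc")
    int assign_sk(struct __sk_buff *skb)
    {
        struct bpf_sock_tuple tuple = {};
        struct bpf_sock *sk;

        /* Fill tuple from the packet headers (omitted for brevity). */

        sk = bpf_sk_lookup_tcp(skb, &tuple, sizeof(tuple.ipv4),
                               BPF_F_CURRENT_NETNS, 0);
        if (!sk)
            return TC_ACT_OK;

        /* bpf_sk_assign does not consume the reference taken by the
         * lookup ... */
        bpf_sk_assign(skb, sk, 0);
        /* ... so the program must still release it. */
        bpf_sk_release(sk);

        return TC_ACT_OK;
    }

    char _license[] SEC("license") = "GPL";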
The path through the stack for a TCPv4 packet is roughly:

    netif_receive_skb_core: takes RCU read lock
      __netif_receive_skb_core:
        sch_handle_ingress:
          tcf_classify:
            bpf_sk_assign()
        deliver_ptype_list_skb:
          deliver_skb:
            ip_packet_type->func == ip_rcv:
              ip_rcv_core:
                ip_rcv_finish_core:
                  dst_input:
                    ip_local_deliver:
                      ip_local_deliver_finish:
                        ip_protocol_deliver_rcu:
                          tcp_v4_rcv:
                            __inet_lookup_skb:
                              skb_steal_sock
The existing helper takes advantage of the fact that everything happens in the same RCU critical section: for sockets with SOCK_RCU_FREE set, bpf_sk_assign never takes a reference. skb_steal_sock then checks SOCK_RCU_FREE again and does a sock_put if necessary.
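On the consuming side, the logic is roughly the following (a simplified sketch of skb_steal_sock from include/net/sock.h; details elided):

    /* Simplified sketch of skb_steal_sock: tells the caller whether it
     * owes a sock_put() on the socket it just took from the skb. */
    static inline struct sock *skb_steal_sock(struct sk_buff *skb,
                                              bool *refcounted)
    {
        struct sock *sk = skb->sk;

        if (!sk) {
            *refcounted = false;
            return NULL;
        }

        *refcounted = true;
        if (skb_sk_is_prefetched(skb))
            /* Assigned by bpf_sk_assign: a SOCK_RCU_FREE socket was
             * attached without a reference, so no sock_put() is due. */
            *refcounted = sk_is_refcounted(sk);

        skb->destructor = NULL;
        skb->sk = NULL;
        return sk;
    }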
This approach assumes that SOCK_RCU_FREE is never set on a sk between bpf_sk_assign and skb_steal_sock, but this invariant is violated by unhashed UDP sockets. A new UDP socket is created in TCP_CLOSE state but without SOCK_RCU_FREE set. That flag is only added in udp_lib_get_port(), which happens when a socket is bound.
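The relevant lines in udp_lib_get_port() (net/ipv4/udp.c) look roughly like this (condensed; reuseport and error handling omitted):

    /* Condensed from udp_lib_get_port(): the socket only becomes
     * SOCK_RCU_FREE once it is bound and inserted into the UDP hash. */
    if (sk_unhashed(sk)) {
        sock_set_flag(sk, SOCK_RCU_FREE);
        sk_add_node_rcu(sk, &hslot->head);
        /* ... */
    }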
When bpf_sk_assign was added it wasn't possible to access unhashed UDP sockets from BPF, so this wasn't a problem. This changed in commit 0c48eefae712 ("sock_map: Lift socket state restriction for datagram sockets"), but the helper wasn't adjusted accordingly. The following sequence of events will therefore lead to a refcount leak:

1. Add an unbound UDP socket to a SOCKMAP.
2. Pull the socket out of the SOCKMAP and bpf_sk_assign it. Since SOCK_RCU_FREE is not set, the helper takes a reference.
3. Concurrently bind() or connect() the socket, which sets SOCK_RCU_FREE via udp_lib_get_port().
4. skb_steal_sock now sees SOCK_RCU_FREE and reports the socket as not refcounted, so the receive path skips sock_put() and the reference taken in step 2 leaks.
Fix the problem by rejecting unhashed sockets in bpf_sk_assign(). This matches the behaviour of __inet_lookup_skb, which is ultimately the goal of bpf_sk_assign().
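The shape of the fix is an extra guard in the helper, along these lines (a condensed sketch of bpf_sk_assign() in net/core/filter.c; the exact placement and error code of the new check may differ from the final patch):

    BPF_CALL_3(bpf_sk_assign, struct sk_buff *, skb, struct sock *, sk,
               u64, flags)
    {
        if (!sk || flags != 0)
            return -EINVAL;
        if (!skb_at_tc_ingress(skb))
            return -EOPNOTSUPP;
        if (unlikely(dev_net(skb->dev) != sock_net(sk)))
            return -ENETUNREACH;
        /* New check: an unhashed socket may gain SOCK_RCU_FREE after the
         * reference is taken (or skipped) here, breaking the contract
         * with skb_steal_sock. __inet_lookup_skb would never return such
         * a socket, so reject it outright. */
        if (sk_unhashed(sk))
            return -EOPNOTSUPP;
        if (sk_is_refcounted(sk) &&
            unlikely(!refcount_inc_not_zero(&sk->sk_refcnt)))
            return -ENOENT;

        skb_orphan(skb);
        skb->sk = sk;
        skb->destructor = sock_pfree;

        return 0;
    }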