In the Linux kernel, the following vulnerability has been resolved:
tcp_bpf: Fix the sk_mem_uncharge logic in tcp_bpf_sendmsg
The current sk memory accounting logic in __SK_REDIRECT pre-uncharges tosend bytes, where tosend is either msg->sg.size or the smaller value apply_bytes.
Potential problems with this strategy are as follows:
- If the number of bytes actually sent is smaller than tosend, some bytes have to be charged back, as in line 487. This works, but it is not clean.
- When tosend is set to apply_bytes, as in line 417, and (ret < 0), we may miss uncharging (msg->sg.size - apply_bytes) bytes.
[...]
415 tosend = msg->sg.size;
416 if (psock->apply_bytes && psock->apply_bytes < tosend)
417   tosend = psock->apply_bytes;
[...]
443 sk_msg_return(sk, msg, tosend);
444 release_sock(sk);
446 origsize = msg->sg.size;
447 ret = tcp_bpf_sendmsg_redir(sk_redir, redir_ingress,
448                             msg, tosend, flags);
449 sent = origsize - msg->sg.size;
[...]
454 lock_sock(sk);
455 if (unlikely(ret < 0)) {
456   int free = sk_msg_free_nocharge(sk, msg);
458   if (!cork)
459     *copied -= free;
460 }
[...]
487 if (eval == __SK_REDIRECT)
488   sk_mem_charge(sk, tosend - sent);
[...]
When running the selftest test_txmsg_redir_wait_sndmem with txmsg_apply, the following warning is reported:
------------[ cut here ]------------
WARNING: CPU: 6 PID: 57 at net/ipv4/af_inet.c:156 inet_sock_destruct+0x190/0x1a0
Modules linked in:
CPU: 6 UID: 0 PID: 57 Comm: kworker/6:0 Not tainted 6.12.0-rc1.bm.1-amd64+ #43
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
Workqueue: events sk_psock_destroy
RIP: 0010:inet_sock_destruct+0x190/0x1a0
RSP: 0018:ffffad0a8021fe08 EFLAGS: 00010206
RAX: 0000000000000011 RBX: ffff9aab4475b900 RCX: ffff9aab481a0800
RDX: 0000000000000303 RSI: 0000000000000011 RDI: ffff9aab4475b900
RBP: ffff9aab4475b990 R08: 0000000000000000 R09: ffff9aab40050ec0
R10: 0000000000000000 R11: ffff9aae6fdb1d01 R12: ffff9aab49c60400
R13: ffff9aab49c60598 R14: ffff9aab49c60598 R15: dead000000000100
FS: 0000000000000000(0000) GS:ffff9aae6fd80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffec7e47bd8 CR3: 00000001a1a1c004 CR4: 0000000000770ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
? __warn+0x89/0x130
? inet_sock_destruct+0x190/0x1a0
? report_bug+0xfc/0x1e0
? handle_bug+0x5c/0xa0
? exc_invalid_op+0x17/0x70
? asm_exc_invalid_op+0x1a/0x20
? inet_sock_destruct+0x190/0x1a0
__sk_destruct+0x25/0x220
sk_psock_destroy+0x2b2/0x310
process_scheduled_works+0xa3/0x3e0
worker_thread+0x117/0x240
? __pfx_worker_thread+0x10/0x10
kthread+0xcf/0x100
? __pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x40
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
---[ end trace 0000000000000000 ]---
In __SK_REDIRECT, a cleaner approach is to delay the uncharge until the number of sent bytes is final, and then uncharge exactly that amount. When (ret < 0), sk_msg_free should be invoked.
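The accounting difference on a failed redirect can be illustrated with a small userspace model. This is only a sketch, not kernel code: struct toy_sock and the functions residual_pre_uncharge and residual_deferred_uncharge are hypothetical names made up for illustration, and the model assumes the whole message was charged to the socket before the verdict ran.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy state; this is NOT the kernel's struct sock or sk_msg. */
struct toy_sock {
    size_t charged;  /* bytes currently charged to the socket */
    size_t msg_size; /* bytes still held by the message */
};

/* Mirrors the tosend selection: apply_bytes only wins when it is
 * non-zero and smaller than the message size. */
static size_t pick_tosend(size_t size, size_t apply_bytes)
{
    return (apply_bytes && apply_bytes < size) ? apply_bytes : size;
}

/* Old strategy, failed redirect: uncharge tosend up front, then drop
 * the message without uncharging. Returns the bytes left charged. */
size_t residual_pre_uncharge(size_t size, size_t apply_bytes, size_t sent)
{
    struct toy_sock sk = { .charged = size, .msg_size = size };
    size_t tosend = pick_tosend(size, apply_bytes);

    assert(sent <= tosend);
    sk.charged -= tosend; /* pre-uncharge before sending */
    sk.msg_size -= sent;  /* redirect consumed 'sent' bytes, then failed */
    sk.msg_size = 0;      /* free the rest without touching 'charged' */
    return sk.charged;
}

/* Proposed strategy, failed redirect: uncharge only what was sent, then
 * free the rest with uncharging. Returns the bytes left charged. */
size_t residual_deferred_uncharge(size_t size, size_t apply_bytes, size_t sent)
{
    struct toy_sock sk = { .charged = size, .msg_size = size };
    size_t tosend = pick_tosend(size, apply_bytes);

    assert(sent <= tosend);
    sk.msg_size -= sent;
    sk.charged -= sent;        /* uncharge the finalized sent bytes */
    sk.charged -= sk.msg_size; /* free the remainder and uncharge it */
    sk.msg_size = 0;
    return sk.charged;
}
```

In this model, a 4096-byte message with apply_bytes set to 1024 leaves 3072 bytes charged under the old strategy, which matches the (msg->sg.size - apply_bytes) term above, and 0 bytes under the deferred one.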
The same thing happens in case __SK_DROP: when tosend is set to apply_bytes, we may miss uncharging (msg->sg.size - apply_bytes) bytes. The same warning is reported in the selftest.
[...]
468 case __SK_DROP:
469 default:
470   sk_msg_free_partial(sk, msg, tosend);
471   sk_msg_apply_bytes(psock, tosend);
472   *copied -= (tosend + delta);
473   return -EACCES;
[...]
So instead of sk_msg_free_partial we can call sk_msg_free here.