In the Linux kernel, the following vulnerability has been resolved:

bpf: fix ktls panic with sockmap

[ 2172.936997] ------------[ cut here ]------------
[ 2172.936999] kernel BUG at lib/iov_iter.c:629!
......
[ 2172.944996] PKRU: 55555554
[ 2172.945155] Call Trace:
[ 2172.945299]  <TASK>
[ 2172.945428]  ? die+0x36/0x90
[ 2172.945601]  ? do_trap+0xdd/0x100
[ 2172.945795]  ? iov_iter_revert+0x178/0x180
[ 2172.946031]  ? iov_iter_revert+0x178/0x180
[ 2172.946267]  ? do_error_trap+0x7d/0x110
[ 2172.946499]  ? iov_iter_revert+0x178/0x180
[ 2172.946736]  ? exc_invalid_op+0x50/0x70
[ 2172.946961]  ? iov_iter_revert+0x178/0x180
[ 2172.947197]  ? asm_exc_invalid_op+0x1a/0x20
[ 2172.947446]  ? iov_iter_revert+0x178/0x180
[ 2172.947683]  ? iov_iter_revert+0x5c/0x180
[ 2172.947913]  tls_sw_sendmsg_locked.isra.0+0x794/0x840
[ 2172.948206]  tls_sw_sendmsg+0x52/0x80
[ 2172.948420]  ? inet_sendmsg+0x1f/0x70
[ 2172.948634]  __sys_sendto+0x1cd/0x200
[ 2172.948848]  ? find_held_lock+0x2b/0x80
[ 2172.949072]  ? syscall_trace_enter+0x140/0x270
[ 2172.949330]  ? __lock_release.isra.0+0x5e/0x170
[ 2172.949595]  ? find_held_lock+0x2b/0x80
[ 2172.949817]  ? syscall_trace_enter+0x140/0x270
[ 2172.950211]  ? lockdep_hardirqs_on_prepare+0xda/0x190
[ 2172.950632]  ? ktime_get_coarse_real_ts64+0xc2/0xd0
[ 2172.951036]  __x64_sys_sendto+0x24/0x30
[ 2172.951382]  do_syscall_64+0x90/0x170
......

After calling bpf_exec_tx_verdict(), the size of msg_pl->sg may increase,
e.g., when the BPF program executes bpf_msg_push_data().

If the BPF program sets cork_bytes and sg.size is smaller than cork_bytes,
it will return -ENOSPC and attempt to roll back to the non-zero copy
logic. However, during rollback, msg->msg_iter is reset, but since
msg_pl->sg.size has been increased, subsequent executions will exceed the
actual size of msg_iter.
'''
iov_iter_revert(&msg->msg_iter, msg_pl->sg.size - orig_size);
'''

The changes in this commit are based on the following considerations:

1. When cork_bytes is set, rolling back to non-zero copy logic is
pointless and can directly go to zero-copy logic.

2. We cannot calculate the correct number of bytes to revert msg_iter.

Assume the original data is "abcdefgh" (8 bytes), and after 3 pushes
by the BPF program, it becomes 11-byte data: "abc?de?fgh?".
Then, we set cork_bytes to 6, which means the first 6 bytes have been
processed, and the remaining 5 bytes "?fgh?" will be cached until the
length meets the cork_bytes requirement.

However, some data in "?fgh?" is not within 'sg->msg_iter'
(but in msg_pl instead), especially the data "?" we pushed.

So it doesn't seem as simple as just reverting through an offset of
msg_iter.

3. For non-TLS sockets in tcp_bpf_sendmsg, when a "cork" situation occurs,
the user-space send() doesn't return an error, and the returned length is
the same as the input length parameter, even if some data is cached.

Additionally, I saw that the current non-zero-copy logic for handling
corking is written as:
'''
line 1177
else if (ret != -EAGAIN) {
	if (ret == -ENOSPC)
		ret = 0;
	goto send_end;
'''

So it's ok to just return 'copied' without error when a "cork" situation
occurs.
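
The rollback problem described in consideration 2 can be sketched with a
small userspace model. This is only an illustration, not kernel code:
struct model_iter, model_revert(), rollback_len() and demo_rollback() are
hypothetical names, the model tracks just a "bytes consumed" counter in
place of the real msg_iter, and it assumes the record was empty before the
send (orig_size of 0). Under those assumptions it shows why a revert length
derived from the grown sg.size can exceed what was taken from the iterator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for msg->msg_iter; not struct iov_iter. */
struct model_iter {
	size_t consumed; /* bytes already copied out of the user buffer */
};

/*
 * Simplified stand-in for iov_iter_revert(): rewinding by more bytes than
 * were consumed is refused here, where the kernel hits BUG() instead.
 */
static bool model_revert(struct model_iter *it, size_t unroll)
{
	if (unroll > it->consumed)
		return false;
	it->consumed -= unroll;
	return true;
}

/* Revert length as computed by the pre-fix rollback path. */
static size_t rollback_len(size_t sg_size, size_t orig_size)
{
	return sg_size - orig_size;
}

/* Walk through the "abcdefgh" example; returns true if the revert is valid. */
static bool demo_rollback(size_t user_bytes, size_t pushed_bytes)
{
	struct model_iter it = { .consumed = user_bytes };
	size_t orig_size = 0;                       /* assumed: empty record before this send */
	size_t sg_size = user_bytes + pushed_bytes; /* grown by the BPF program's pushes */

	assert(sg_size >= orig_size);
	return model_revert(&it, rollback_len(sg_size, orig_size));
}
```

With these assumptions, demo_rollback(8, 3) asks to rewind 11 bytes from an
iterator that only consumed 8 and is refused, while demo_rollback(8, 0)
(no pushed data) rewinds cleanly.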