In the Linux kernel, the following vulnerability has been resolved:
bpf: fix ktls panic with sockmap
[ 2172.936997] ------------[ cut here ]------------
[ 2172.936999] kernel BUG at lib/iov_iter.c:629!
......
[ 2172.944996] PKRU: 55555554
[ 2172.945155] Call Trace:
[ 2172.945299]  <TASK>
[ 2172.945428]  ? die+0x36/0x90
[ 2172.945601]  ? do_trap+0xdd/0x100
[ 2172.945795]  ? iov_iter_revert+0x178/0x180
[ 2172.946031]  ? iov_iter_revert+0x178/0x180
[ 2172.946267]  ? do_error_trap+0x7d/0x110
[ 2172.946499]  ? iov_iter_revert+0x178/0x180
[ 2172.946736]  ? exc_invalid_op+0x50/0x70
[ 2172.946961]  ? iov_iter_revert+0x178/0x180
[ 2172.947197]  ? asm_exc_invalid_op+0x1a/0x20
[ 2172.947446]  ? iov_iter_revert+0x178/0x180
[ 2172.947683]  ? iov_iter_revert+0x5c/0x180
[ 2172.947913]  tls_sw_sendmsg_locked.isra.0+0x794/0x840
[ 2172.948206]  tls_sw_sendmsg+0x52/0x80
[ 2172.948420]  ? inet_sendmsg+0x1f/0x70
[ 2172.948634]  __sys_sendto+0x1cd/0x200
[ 2172.948848]  ? find_held_lock+0x2b/0x80
[ 2172.949072]  ? syscall_trace_enter+0x140/0x270
[ 2172.949330]  ? __lock_release.isra.0+0x5e/0x170
[ 2172.949595]  ? find_held_lock+0x2b/0x80
[ 2172.949817]  ? syscall_trace_enter+0x140/0x270
[ 2172.950211]  ? lockdep_hardirqs_on_prepare+0xda/0x190
[ 2172.950632]  ? ktime_get_coarse_real_ts64+0xc2/0xd0
[ 2172.951036]  __x64_sys_sendto+0x24/0x30
[ 2172.951382]  do_syscall_64+0x90/0x170
......
After calling bpf_exec_tx_verdict(), the size of msg_pl->sg may increase, e.g., when the BPF program executes bpf_msg_push_data().
If the BPF program sets cork_bytes and sg.size is smaller than cork_bytes, it will return -ENOSPC and attempt to roll back to the non-zero-copy logic. However, during rollback, msg->msg_iter is reset, but since msg_pl->sg.size has been increased, executing the revert below will exceed the actual size of msg_iter.
'''
iov_iter_revert(&msg->msg_iter, msg_pl->sg.size - orig_size);
'''
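The overshoot described above can be illustrated with a small user-space model. This is only a sketch: `toy_iter`, `toy_advance`, `toy_revert_ok`, and `rollback_len` are hypothetical names invented for illustration and are not kernel APIs. The model merely reports an invalid revert, whereas the trace above shows the kernel hitting a BUG inside iov_iter_revert().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an iov_iter: it tracks only how many bytes
 * have been taken from the caller's buffer and how many are left. */
struct toy_iter {
    size_t consumed;   /* bytes already copied out of the user buffer */
    size_t remaining;  /* bytes still left in the user buffer */
};

/* Advance the toy iterator, as a copy from user space would. */
static void toy_advance(struct toy_iter *it, size_t n)
{
    assert(n <= it->remaining);
    it->consumed += n;
    it->remaining -= n;
}

/* A revert is only valid if it does not rewind past the start of the
 * buffer, i.e. past the bytes that were actually consumed. */
static bool toy_revert_ok(const struct toy_iter *it, size_t unroll)
{
    return unroll <= it->consumed;
}

/* Revert length as computed on the rollback path quoted above:
 * sg.size - orig_size. */
static size_t rollback_len(size_t sg_size, size_t orig_size)
{
    return sg_size - orig_size;
}
```

With an 8-byte send and orig_size of 0, `rollback_len(8, 0)` is 8 and the revert is valid. If the BPF program has pushed 3 extra bytes so that sg.size is 11, `rollback_len(11, 0)` is 11, which is more than the 8 bytes the iterator ever consumed, so `toy_revert_ok` reports false.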
The changes in this commit are based on the following considerations:
1. When cork_bytes is set, rolling back to the non-zero-copy logic is pointless; the code can go directly to the zero-copy logic.
2. We cannot calculate the correct number of bytes by which to revert msg_iter.
Assume the original data is "abcdefgh" (8 bytes), and after 3 pushes by the BPF program, it becomes 11-byte data: "abc?de?fgh?". Then, we set cork_bytes to 6, which means the first 6 bytes have been processed, and the remaining 5 bytes "?fgh?" will be cached until the length meets the cork_bytes requirement.
However, some data in "?fgh?" is not within 'sg->msg_iter' (but in msg_pl instead), especially the data "?" we pushed.
So it doesn't seem as simple as just reverting through an offset of msg_iter.
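The "abcdefgh" example can be replayed in user space to see why no single offset works. The sketch below is an illustration only: `toy_msg`, `toy_init`, `toy_push`, and `toy_user_bytes_from` are hypothetical names, and `toy_push` inserts one byte at a position, which only loosely mirrors what bpf_msg_push_data() does and is not that helper's signature.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_CAP 32

/* Hypothetical model of a message whose bytes remember their origin. */
struct toy_msg {
    char          data[TOY_CAP];      /* payload bytes */
    unsigned char from_user[TOY_CAP]; /* 1: from the user buffer, 0: pushed */
    size_t        len;
};

/* Fill the message with bytes that all come from the user buffer. */
static void toy_init(struct toy_msg *m, const char *user)
{
    size_t n = strlen(user);

    assert(n <= TOY_CAP);
    memcpy(m->data, user, n);
    memset(m->from_user, 1, n);
    m->len = n;
}

/* Insert one pushed byte at position pos; returns 0 on success. */
static int toy_push(struct toy_msg *m, size_t pos, char c)
{
    if (pos > m->len || m->len + 1 > TOY_CAP)
        return -1;
    memmove(m->data + pos + 1, m->data + pos, m->len - pos);
    memmove(m->from_user + pos + 1, m->from_user + pos, m->len - pos);
    m->data[pos] = c;
    m->from_user[pos] = 0;
    m->len++;
    return 0;
}

/* Count the bytes at or after start that came from the user buffer. */
static size_t toy_user_bytes_from(const struct toy_msg *m, size_t start)
{
    size_t n = 0;

    for (size_t i = start; i < m->len; i++)
        n += m->from_user[i];
    return n;
}
```

Pushing '?' at positions 3, 6, and 10 of "abcdefgh" yields "abc?de?fgh?". With cork_bytes of 6, the cached tail "?fgh?" is 5 bytes long, but `toy_user_bytes_from(&m, 6)` counts only 3 bytes that ever came from the user buffer, so the tail length is not a usable revert count for msg_iter.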
Additionally, I saw that the current non-zero-copy logic for handling corking is written as:
'''
line 1177
else if (ret != -EAGAIN) {
	if (ret == -ENOSPC)
		ret = 0;
	goto send_end;
}
'''
So it's ok to just return 'copied' without error when a "cork" situation occurs.
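The effect on the value seen by the caller can be sketched as follows. `toy_send_result` is a hypothetical helper written for illustration, and the final "return the byte count if anything was copied, otherwise the error" step is an assumption about how the send path reports its result, not a quotation of the kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>

/* Hypothetical helper: fold a verdict result and the number of bytes
 * accepted so far into the value a send call would return. */
static ssize_t toy_send_result(int ret, ssize_t copied)
{
    assert(copied >= 0);

    /* A cork means the data is cached, not lost, so it is not an error. */
    if (ret == -ENOSPC)
        ret = 0;

    /* Assumed convention: report bytes accepted, else the error code. */
    return copied > 0 ? copied : ret;
}
```

Under these assumptions, `toy_send_result(-ENOSPC, 8)` yields 8, so user space sees a successful 8-byte send even though part of the data is still corked, while a genuine error with nothing copied, such as `toy_send_result(-EPIPE, 0)`, is still reported as -EPIPE.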