In the Linux kernel, the following vulnerability has been resolved:
fs/netfs/fscache_cookie: add missing "n_accesses" check
This fixes a NULL pointer dereference bug due to a data race which looks like this:
  BUG: kernel NULL pointer dereference, address: 0000000000000008
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: 0000 [#1] SMP PTI
  CPU: 33 PID: 16573 Comm: kworker/u97:799 Not tainted 6.8.7-cm4all1-hp+ #43
  Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/17/2018
  Workqueue: events_unbound netfs_rreq_write_to_cache_work
  RIP: 0010:cachefiles_prepare_write+0x30/0xa0
  Code: 57 41 56 45 89 ce 41 55 49 89 cd 41 54 49 89 d4 55 53 48 89 fb 48 83 ec 08 48 8b 47 08 48 83 7f 10 00 48 89 34 24 48 8b 68 20 <48> 8b 45 08 4c 8b 38 74 45 49 8b 7f 50 e8 4e a9 b0 ff 48 8b 73 10
  RSP: 0018:ffffb4e78113bde0 EFLAGS: 00010286
  RAX: ffff976126be6d10 RBX: ffff97615cdb8438 RCX: 0000000000020000
  RDX: ffff97605e6c4c68 RSI: ffff97605e6c4c60 RDI: ffff97615cdb8438
  RBP: 0000000000000000 R08: 0000000000278333 R09: 0000000000000001
  R10: ffff97605e6c4600 R11: 0000000000000001 R12: ffff97605e6c4c68
  R13: 0000000000020000 R14: 0000000000000001 R15: ffff976064fe2c00
  FS: 0000000000000000(0000) GS:ffff9776dfd40000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000008 CR3: 000000005942c002 CR4: 00000000001706f0
  Call Trace:
   <TASK>
   ? __die+0x1f/0x70
   ? page_fault_oops+0x15d/0x440
   ? search_module_extables+0xe/0x40
   ? fixup_exception+0x22/0x2f0
   ? exc_page_fault+0x5f/0x100
   ? asm_exc_page_fault+0x22/0x30
   ? cachefiles_prepare_write+0x30/0xa0
   netfs_rreq_write_to_cache_work+0x135/0x2e0
   process_one_work+0x137/0x2c0
   worker_thread+0x2e9/0x400
   ? __pfx_worker_thread+0x10/0x10
   kthread+0xcc/0x100
   ? __pfx_kthread+0x10/0x10
   ret_from_fork+0x30/0x50
   ? __pfx_kthread+0x10/0x10
   ret_from_fork_asm+0x1b/0x30
   </TASK>
  Modules linked in:
  CR2: 0000000000000008
  ---[ end trace 0000000000000000 ]---
This happened because fscache_cookie_state_machine() was slow and was still running while another process invoked fscache_unuse_cookie(). That led to a fscache_cookie_lru_do_one() call, which set the FSCACHE_COOKIE_DO_LRU_DISCARD flag. The still-running fscache_cookie_state_machine() picked up the flag and withdrew the cookie via cachefiles_withdraw_cookie(), clearing cookie->cache_priv.
At the same time, yet another process invoked cachefiles_prepare_write(), which found a NULL pointer in this code line:
  struct cachefiles_object *object = cachefiles_cres_object(cres);
The next line then dereferences that NULL pointer and crashes:
  struct cachefiles_cache *cache = object->volume->cache;
During cachefiles_prepare_write(), the "n_accesses" counter is non-zero (via fscache_begin_operation()). The cookie must not be withdrawn until it drops to zero.
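The invariant stated here (no withdrawal while an access is in flight) can be illustrated with a small user-space sketch. This is not kernel code: the struct and the access_model_* helpers are hypothetical names invented for the illustration, and C11 atomics stand in for the kernel's atomic_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a cookie's access counter (not the kernel struct). */
struct access_model {
	atomic_int n_accesses;
};

/* Models the start of an I/O operation: pin the cookie. */
static void access_model_begin(struct access_model *m)
{
	atomic_fetch_add(&m->n_accesses, 1);
}

/*
 * Models the end of an I/O operation: unpin the cookie.
 * Returns true if this was the last outstanding access, i.e. the point
 * at which deferred state changes could be re-examined.
 */
static bool access_model_end(struct access_model *m)
{
	int old = atomic_fetch_sub(&m->n_accesses, 1);

	assert(old > 0); /* an end without a matching begin is a bug */
	return old == 1;
}

/* Withdrawal is only permissible while nothing is pinning the cookie. */
static bool access_model_may_withdraw(struct access_model *m)
{
	return atomic_load(&m->n_accesses) == 0;
}
```

In this model, a writer that has called access_model_begin() keeps access_model_may_withdraw() false until its matching access_model_end() runs.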
The counter is checked by fscache_cookie_state_machine() before switching to FSCACHE_COOKIE_STATE_RELINQUISHING and FSCACHE_COOKIE_STATE_WITHDRAWING (in "case FSCACHE_COOKIE_STATE_FAILED"), but not for FSCACHE_COOKIE_STATE_LRU_DISCARDING ("case FSCACHE_COOKIE_STATE_ACTIVE").
This patch adds the missing check. With a non-zero access counter, the function returns without discarding, and the next fscache_end_cookie_access() call will queue another fscache_cookie_state_machine() call to handle the still-pending FSCACHE_COOKIE_DO_LRU_DISCARD.
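The behaviour described above can be approximated with a single-threaded user-space model. It is a sketch under stated assumptions, not the actual patch: every cookie_model_* name and the COOKIE_* enum are hypothetical, the state machine is reduced to the ACTIVE case, and the model calls the state machine directly where the kernel would queue work.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced model of the cookie states involved. */
enum cookie_model_state { COOKIE_ACTIVE, COOKIE_LRU_DISCARDING };

struct cookie_model {
	atomic_int n_accesses;      /* in-flight operations pinning the cookie */
	atomic_bool do_lru_discard; /* models the pending LRU-discard flag */
	enum cookie_model_state state;
	void *cache_priv;           /* cleared when the cookie is withdrawn */
};

/* One pass of the reduced state machine; returns true if it withdrew. */
static bool cookie_model_step(struct cookie_model *c)
{
	if (c->state != COOKIE_ACTIVE)
		return false;

	/* The added check: test the counter first so the flag stays pending. */
	if (atomic_load(&c->n_accesses) != 0)
		return false;

	/* Consume the flag only once no access is in flight. */
	if (!atomic_exchange(&c->do_lru_discard, false))
		return false;

	c->state = COOKIE_LRU_DISCARDING;
	c->cache_priv = NULL; /* models the withdrawal clearing cache_priv */
	return true;
}

/* Dropping the last access re-runs the state machine (the kernel queues it). */
static void cookie_model_end_access(struct cookie_model *c)
{
	int old = atomic_fetch_sub(&c->n_accesses, 1);

	assert(old > 0);
	if (old == 1)
		cookie_model_step(c);
}
```

With one access outstanding and the discard flag set, cookie_model_step() leaves cache_priv intact. The withdrawal happens only when cookie_model_end_access() drops the counter to zero and triggers another pass.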