In the Linux kernel, the following vulnerability has been resolved:
amd/amdkfd: enhance kfd process check in switch partition
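The current partition switch only checks whether kfd_processes_table is empty. However, a kfd_processes_table entry is deleted in kfd_process_notifier_release, while the kfd_process teardown happens later in kfd_process_wq_release, so an empty table does not mean that every process has finished tearing down.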
Consider two processes:
Process A (workqueue) -> kfd_process_wq_release -> access kfd_node member
Process B (switch partition) -> amdgpu_xcp_pre_partition_switch -> amdgpu_amdkfd_device_fini_sw -> kfd_node teardown
Processes A and B may race, as shown in the dmesg log.
This patch resolves the race by adding an atomic kfd_process counter, kfd_processes_count. It is incremented when a kfd process is created and decremented when kfd_process_wq_release finishes.
v2: Make kfd_processes_count per kfd_dev, move the decrement to kfd_process_destroy_pdds, and fix a bug. (Philip Yang)
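The counter scheme described above can be sketched in user-space C. This is only an illustration under stated assumptions, not the kernel patch: `struct demo_dev` and every `demo_*` function are hypothetical names, C11 `stdatomic.h` stands in for the kernel's atomic helpers, and the locking that the real driver needs between the check and the teardown is not modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical stand-in for per-device state; not the kernel's struct kfd_dev. */
struct demo_dev {
    /* Processes that were created and have not finished teardown yet. */
    atomic_int processes_count;
};

static void demo_dev_init(struct demo_dev *dev)
{
    atomic_init(&dev->processes_count, 0);
}

/* Called when a process is created on this device. */
static void demo_process_create(struct demo_dev *dev)
{
    atomic_fetch_add(&dev->processes_count, 1);
}

/*
 * Called at the very end of deferred teardown, after the last access to
 * device state, so the counter only reaches zero once nothing can touch
 * the device anymore.
 */
static void demo_process_teardown_done(struct demo_dev *dev)
{
    int prev = atomic_fetch_sub(&dev->processes_count, 1);

    assert(prev > 0); /* a teardown must pair with an earlier create */
}

/*
 * Partition switch gate: refuse while any process is still alive or still
 * tearing down. Returns 0 when the switch may proceed, -EBUSY otherwise.
 */
static int demo_switch_partition(struct demo_dev *dev)
{
    if (atomic_load(&dev->processes_count) != 0)
        return -EBUSY;

    /* Device state could be torn down and rebuilt here. */
    return 0;
}
```

The point of the design is that the counter covers the window between removing the table entry and completing the workqueue teardown, which an "is the table empty" check cannot see.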
[3966658.307702] divide error: 0000 [#1] SMP NOPTI
[3966658.350818] i10nmedac
[3966658.356318] CPU: 124 PID: 38435 Comm: kworker/124:0 Kdump: loaded Tainted
[3966658.356890] Workqueue: kfd_process_wq kfd_process_wq_release [amdgpu]
[3966658.362839] nfit
[3966658.366457] RIP: 0010:kfd_get_num_sdma_engines+0x17/0x40 [amdgpu]
[3966658.366460] Code: 00 00 e9 ac 81 02 00 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 48 8b 4f 08 48 8b b7 00 01 00 00 8b 81 58 26 03 00 99 <f7> be b8 01 00 00 80 b9 70 2e 00 00 00 74 0b 83 f8 02 ba 02 00 00
[3966658.380967] x86pkgtempthermal
[3966658.391529] RSP: 0018:ffffc900a0edfdd8 EFLAGS: 00010246
[3966658.391531] RAX: 0000000000000008 RBX: ffff8974e593b800 RCX: ffff888645900000
[3966658.391531] RDX: 0000000000000000 RSI: ffff888129154400 RDI: ffff888129151c00
[3966658.391532] RBP: ffff8883ad79d400 R08: 0000000000000000 R09: ffff8890d2750af4
[3966658.391532] R10: 0000000000000018 R11: 0000000000000018 R12: 0000000000000000
[3966658.391533] R13: ffff8883ad79d400 R14: ffffe87ff662ba00 R15: ffff8974e593b800
[3966658.391533] FS: 0000000000000000(0000) GS:ffff88fe7f600000(0000) knlGS:0000000000000000
[3966658.391534] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[3966658.391534] CR2: 0000000000d71000 CR3: 000000dd0e970004 CR4: 0000000002770ee0
[3966658.391535] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[3966658.391535] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[3966658.391536] PKRU: 55555554
[3966658.391536] Call Trace:
[3966658.391674] deallocate_sdma_queue+0x38/0xa0 [amdgpu]
[3966658.391762] process_termination_cpsch+0x1ed/0x480 [amdgpu]
[3966658.399754] intelpowerclamp
[3966658.402831] kfd_process_dequeue_from_all_devices+0x5b/0xc0 [amdgpu]
[3966658.402908] kfd_process_wq_release+0x1a/0x1a0 [amdgpu]
[3966658.410516] coretemp
[3966658.434016] process_one_work+0x1ad/0x380
[3966658.434021] worker_thread+0x49/0x310
[3966658.438963] kvmintel
[3966658.446041] ? process_one_work+0x380/0x380
[3966658.446045] kthread+0x118/0x140
[3966658.446047] ? __kthread_bind_mask+0x60/0x60
[3966658.446050] ret_from_fork+0x1f/0x30
[3966658.446053] Modules linked in: kpatch20765354(OEK)
[3966658.455310] kvm
[3966658.464534] mptcpdiag xskdiag rawdiag unixdiag afpacketdiag netlinkdiag udpdiag actpedit actmirred actvlan clsflower kpatch21951273(OEK) kpatch18424469(OEK) kpatch19749756(OEK)
[3966658.473462] idxdmdev
[3966658.482306] kpatch17971294(OEK) schingress xtconntrack amdgpu(OE) amdxcp(OE) amddrmbuddy(OE) amdsched(OE) amdttm(OE) amdkcl(OE) intelifs iptablemangle tcmloop targetcorepscsi tcpdiag targetcorefile inetdiag targetcoreiblock targetcoreuser targetcoremod coldpgs kpatch18383292(OEK) ip6tablenat ip6tablefilter ip6tables ipsethashipportip ipsethashipportnet ipsethashipport ipsetbitmapport xtcomment iptablenat nfnat iptablefilter iptables ipset ipvssh ipvswrr ipvsrr ipvs nfconntrack nfdefragipv6 nfdefragipv4 sncoreodd(OE) i40e overlay binfmt_misc tun bonding(OE) aisqos(OE) aisqo ---truncated---
{
"cna_assigner": "Linux",
"osv_generated_from": "https://github.com/CVEProject/cvelistV5/tree/main/cves/2025/68xxx/CVE-2025-68174.json"
}