In the Linux kernel, the following vulnerability has been resolved:
PCI: Fix use-after-free of slot->bus on hot remove
Dennis reports a boot crash on recent Lenovo laptops with a USB4 dock.
Since commit 0fc70886569c ("thunderbolt: Reset USB4 v2 host router") and commit 59a54c5f3dbd ("thunderbolt: Reset topology created by the boot firmware"), USB4 v2 and v1 Host Routers are reset on probe of the thunderbolt driver.
The reset clears the Presence Detect State and Data Link Layer Link Active bits at the USB4 Host Router's Root Port and thus causes hot removal of the dock.
The crash occurs when pciehp is unbound from one of the dock's Downstream Ports: pciehp creates a pci_slot on bind and destroys it on unbind. The pci_slot contains a pointer to the pci_bus below the Downstream Port, but a reference on that pci_bus is never acquired. The pci_bus is destroyed before the pci_slot, so a use-after-free ensues when pci_slot_release() accesses slot->bus.
In principle this should not happen because pci_stop_bus_device() unbinds pciehp (and therefore destroys the pci_slot) before the pci_bus is destroyed by pci_remove_bus_device().
However, the stack trace provided by Dennis shows that pciehp is unbound from pci_remove_bus_device() instead of pci_stop_bus_device(). To understand the significance of this, one needs to know that the PCI core uses a two-step process to remove a portion of the hierarchy: it first unbinds all drivers in the sub-hierarchy in pci_stop_bus_device() and then actually removes the devices in pci_remove_bus_device(). There is no precaution to prevent driver binding in between pci_stop_bus_device() and pci_remove_bus_device().
In Dennis' case, it seems removal of the hierarchy by pciehp races with driver binding by pci_bus_add_devices(). pciehp is bound to the Downstream Port after pci_stop_bus_device() has run, so it is unbound by pci_remove_bus_device() instead of pci_stop_bus_device(). Because the pci_bus has already been destroyed at that point, accesses to it result in a use-after-free.
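The ordering problem described above can be illustrated with a small userspace model. This is only a sketch: every name in it (struct model_dev, model_stop(), model_late_bind(), model_remove()) is hypothetical and merely stands in for the roles played by pci_stop_bus_device(), pci_bus_add_devices() and pci_remove_bus_device(). It assumes, as a simplification, that the remove phase tears down the child bus before it unbinds any driver that is still attached.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a Downstream Port and the bus below it. */
struct model_dev {
	bool driver_bound;             /* a driver such as pciehp is attached */
	bool bus_alive;                /* the child bus object still exists */
	bool unbound_after_bus_freed;  /* set when unbind runs too late */
};

/* Phase 1: unbind the driver (the role of pci_stop_bus_device()). */
static void model_stop(struct model_dev *d)
{
	d->driver_bound = false;
}

/* A bind that slips in between the two phases (the racing add path). */
static void model_late_bind(struct model_dev *d)
{
	d->driver_bound = true;
}

/* Phase 2: destroy the child bus, then delete the device.
 * Deleting the device unbinds any driver that is still attached. */
static void model_remove(struct model_dev *d)
{
	assert(d->bus_alive);          /* removing twice would be a bug */
	d->bus_alive = false;          /* child bus goes away first */

	if (d->driver_bound) {
		/* The driver's unbind path now runs without a live bus. */
		d->driver_bound = false;
		d->unbound_after_bus_freed = true;
	}
}
```

Calling model_stop() followed by model_remove() leaves unbound_after_bus_freed clear, whereas inserting model_late_bind() between the two calls sets it. That flag marks the situation in which a driver's unbind path would touch an already destroyed bus.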
One might conclude that driver binding needs to be prevented after pci_stop_bus_device() has run. However, it seems risky that pci_slot points to pci_bus without holding a reference. Solely relying on correct ordering of driver unbind versus pci_bus destruction is certainly not defensive programming.
If pci_slot has a need to access data in pci_bus, it ought to acquire a reference. Amend pci_create_slot() accordingly. Dennis reports that the crash is not reproducible with this change.
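The idea of having the slot hold a reference on its bus can be sketched in userspace as follows. This is a simplified model, not the kernel patch: struct model_bus, struct model_slot and all model_*() helpers are hypothetical names introduced for illustration. In the kernel, pci_bus_get() and pci_bus_put() are the reference helpers for a pci_bus; it is an assumption here that the fix pairs them in pci_create_slot() and pci_slot_release().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the kernel's struct pci_bus / struct pci_slot. */
struct model_bus {
	int refcount;
	int number;
};

struct model_slot {
	struct model_bus *bus;
};

static int model_buses_freed;   /* lets a caller observe when a bus is freed */

static struct model_bus *model_bus_alloc(int number)
{
	struct model_bus *bus = malloc(sizeof(*bus));

	if (!bus)
		return NULL;
	bus->refcount = 1;          /* reference held by the creator */
	bus->number = number;
	return bus;
}

static struct model_bus *model_bus_get(struct model_bus *bus)
{
	assert(bus->refcount > 0);
	bus->refcount++;
	return bus;
}

static void model_bus_put(struct model_bus *bus)
{
	assert(bus->refcount > 0);
	if (--bus->refcount == 0) {
		model_buses_freed++;
		free(bus);
	}
}

/* The slot pins its bus for as long as the slot exists. */
static struct model_slot *model_create_slot(struct model_bus *parent)
{
	struct model_slot *slot = malloc(sizeof(*slot));

	if (!slot)
		return NULL;
	slot->bus = model_bus_get(parent);
	return slot;
}

/* Reads slot->bus as a release callback might, then drops the reference. */
static int model_destroy_slot(struct model_slot *slot)
{
	int number = slot->bus->number;   /* safe: the bus is still pinned */

	model_bus_put(slot->bus);
	free(slot);
	return number;
}
```

With this arrangement the creator may drop its own reference to the bus before the slot is destroyed. The bus memory is released only by the final model_bus_put() inside model_destroy_slot(), so the access to slot->bus remains valid regardless of teardown order.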
Abridged stacktrace:
  pcieport 0000:00:07.0: PME: Signaling with IRQ 156
  pcieport 0000:00:07.0: pciehp: Slot #12 AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+ Interlock- NoCompl+ IbPresDis- LLActRep+
  pci_bus 0000:20: dev 00, created physical slot 12
  pcieport 0000:00:07.0: pciehp: Slot(12): Card not present
  ...
  pcieport 0000:21:02.0: pciehp: pcie_disable_notification: SLOTCTRL d8 write cmd 0
  Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b6b: 0000 [#1] PREEMPT SMP NOPTI
  CPU: 13 UID: 0 PID: 134 Comm: irq/156-pciehp Not tainted 6.11.0-devel+ #1
  RIP: 0010:dev_driver_string+0x12/0x40
  pci_destroy_slot
  pciehp_remove
  pcie_port_remove_service
  device_release_driver_internal
  bus_remove_device
  device_del
  device_unregister
  remove_iter
  device_for_each_child
  pcie_portdrv_remove
  pci_device_remove
  device_release_driver_internal
  bus_remove_device
  device_del
  pci_remove_bus_device (recursive invocation)
  pci_remove_bus_device
  pciehp_unconfigure_device
  pciehp_disable_slot
  pciehp_handle_presence_or_link_change
  pciehp_ist