In the Linux kernel, the following vulnerability has been resolved:
igb: clean up in all error paths when enabling SR-IOV
After commit 50f303496d92 ("igb: Enable SR-IOV after reinit"), removing the igb module could hang or crash (depending on the machine) when the module had been loaded with the max_vfs parameter set to a non-zero value.
In case of one test machine with a dual port 82580, this hang occurred:
[ 232.480687] igb 0000:41:00.1: removed PHC on enp65s0f1
[ 233.093257] igb 0000:41:00.1: IOV Disabled
[ 233.329969] pcieport 0000:40:01.0: AER: Multiple Uncorrected (Non-Fatal) err0
[ 233.340302] igb 0000:41:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fata)
[ 233.352248] igb 0000:41:00.0: device [8086:1516] error status/mask=00100000
[ 233.361088] igb 0000:41:00.0: [20] UnsupReq (First)
[ 233.368183] igb 0000:41:00.0: AER: TLP Header: 40000001 0000040f cdbfc00c c
[ 233.376846] igb 0000:41:00.1: PCIe Bus Error: severity=Uncorrected (Non-Fata)
[ 233.388779] igb 0000:41:00.1: device [8086:1516] error status/mask=00100000
[ 233.397629] igb 0000:41:00.1: [20] UnsupReq (First)
[ 233.404736] igb 0000:41:00.1: AER: TLP Header: 40000001 0000040f cdbfc00c c
[ 233.538214] pci 0000:41:00.1: AER: can't recover (no error_detected callback)
[ 233.538401] igb 0000:41:00.0: removed PHC on enp65s0f0
[ 233.546197] pcieport 0000:40:01.0: AER: device recovery failed
[ 234.157244] igb 0000:41:00.0: IOV Disabled
[ 371.619705] INFO: task irq/35-aerdrv:257 blocked for more than 122 seconds.
[ 371.627489] Not tainted 6.4.0-dirty #2
[ 371.632257] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this.
[ 371.641000] task:irq/35-aerdrv state:D stack:0 pid:257 ppid:2 f0
[ 371.650330] Call Trace:
[ 371.653061] <TASK>
[ 371.655407] __schedule+0x20e/0x660
[ 371.659313] schedule+0x5a/0xd0
[ 371.662824] schedule_preempt_disabled+0x11/0x20
[ 371.667983] __mutex_lock.constprop.0+0x372/0x6c0
[ 371.673237] ? __pfx_aer_root_reset+0x10/0x10
[ 371.678105] report_error_detected+0x25/0x1c0
[ 371.682974] ? __pfx_report_normal_detected+0x10/0x10
[ 371.688618] pci_walk_bus+0x72/0x90
[ 371.692519] pcie_do_recovery+0xb2/0x330
[ 371.696899] aer_process_err_devices+0x117/0x170
[ 371.702055] aer_isr+0x1c0/0x1e0
[ 371.705661] ? __set_cpus_allowed_ptr+0x54/0xa0
[ 371.710723] ? __pfx_irq_thread_fn+0x10/0x10
[ 371.715496] irq_thread_fn+0x20/0x60
[ 371.719491] irq_thread+0xe6/0x1b0
[ 371.723291] ? __pfx_irq_thread_dtor+0x10/0x10
[ 371.728255] ? __pfx_irq_thread+0x10/0x10
[ 371.732731] kthread+0xe2/0x110
[ 371.736243] ? __pfx_kthread+0x10/0x10
[ 371.740430] ret_from_fork+0x2c/0x50
[ 371.744428] </TASK>
The reproducer was a simple script:
#!/bin/sh
# Repeatedly load igb with one VF requested, then unload it again.
for i in $(seq 1 5); do
  modprobe -rv igb
  modprobe -v igb max_vfs=1
  sleep 1
  modprobe -rv igb
done
It turned out that this could only be reproduced on 82580 (quad and dual-port), but not on 82576, i350 and i210. Further debugging showed that igb_enable_sriov()'s call to pci_enable_sriov() is failing, because dev->is_physfn is 0 on 82580.
Prior to commit 50f303496d92 ("igb: Enable SR-IOV after reinit"), igb_enable_sriov() jumped into the "err_out" cleanup branch. After this commit it only returned the error code.
So the cleanup didn't take place, and the stale VF setup left in the igb_adapter structure fooled the igb driver into assuming that VFs had been set up where no VF actually existed.
Fix this problem by cleaning up again if pci_enable_sriov() fails.
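The error-path pattern described above can be modelled in a few lines of userspace C. This is only a sketch for illustration, not the driver code or the actual patch: struct fake_adapter, fake_pci_enable_sriov(), fake_enable_sriov() and the cleanup_on_failure switch are hypothetical names, and the -ENOSYS error code used for the "not a physical function" case is an assumption of the model.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the VF bookkeeping kept in the adapter struct. */
struct fake_adapter {
    int is_physfn;                     /* models dev->is_physfn */
    unsigned int vfs_allocated_count;  /* number of VFs the driver believes exist */
    int *vf_data;                      /* per-VF state, NULL when no VFs are set up */
};

/* Models the PCI core call: fails when the device is not a physical function.
 * The concrete error code is an assumption of this sketch. */
static int fake_pci_enable_sriov(const struct fake_adapter *ad, unsigned int num_vfs)
{
    (void)num_vfs;
    return ad->is_physfn ? 0 : -ENOSYS;
}

/* cleanup_on_failure = 0 models the regressed behaviour (plain return),
 * cleanup_on_failure = 1 models the fixed behaviour (jump to err_out). */
static int fake_enable_sriov(struct fake_adapter *ad, unsigned int num_vfs,
                             int cleanup_on_failure)
{
    int err;

    assert(ad != NULL);

    /* Set up driver-side VF state before asking the PCI core for VFs. */
    ad->vf_data = calloc(num_vfs, sizeof(*ad->vf_data));
    if (!ad->vf_data)
        return -ENOMEM;
    ad->vfs_allocated_count = num_vfs;

    err = fake_pci_enable_sriov(ad, num_vfs);
    if (err) {
        if (!cleanup_on_failure)
            return err;  /* stale VF state is left behind */
        goto err_out;
    }
    return 0;

err_out:
    /* Undo the bookkeeping so later code does not assume VFs exist. */
    free(ad->vf_data);
    ad->vf_data = NULL;
    ad->vfs_allocated_count = 0;
    return err;
}
```

Calling fake_enable_sriov() on an adapter with is_physfn set to 0 returns an error in both modes, but only the cleanup variant leaves vfs_allocated_count at 0. In the other mode the structure still claims one VF, which is the inconsistent state the description blames for the hang on module removal.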
{
"osv_generated_from": "https://github.com/CVEProject/cvelistV5/tree/main/cves/2023/54xxx/CVE-2023-54070.json",
"cna_assigner": "Linux"
}