In the Linux kernel, the following vulnerability has been resolved:

x86/fpu: Clear XSTATE_BV[i] in guest XSAVE state whenever XFD[i]=1

When loading guest XSAVE state via KVM_SET_XSAVE, and when updating XFD in response to a guest WRMSR, clear XFD-disabled features in the saved (or to be restored) XSTATE_BV to ensure KVM doesn't attempt to load state for features that are disabled via the guest's XFD. Because the kernel executes XRSTOR with the guest's XFD, saving XSTATE_BV[i]=1 with XFD[i]=1 will cause XRSTOR to #NM and panic the kernel.

E.g. if fpu_update_guest_xfd() sets XFD without clearing XSTATE_BV:

  ------------[ cut here ]------------
  WARNING: arch/x86/kernel/traps.c:1524 at exc_device_not_available+0x101/0x110, CPU#29: amx_test/848
  Modules linked in: kvm_intel kvm irqbypass
  CPU: 29 UID: 1000 PID: 848 Comm: amx_test Not tainted 6.19.0-rc2-ffa07f7fd437-x86amxnmxfdnoninit-vm #171 NONE
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  RIP: 0010:exc_device_not_available+0x101/0x110
  Call Trace:
   <TASK>
   asm_exc_device_not_available+0x1a/0x20
  RIP: 0010:restore_fpregs_from_fpstate+0x36/0x90
   switch_fpu_return+0x4a/0xb0
   kvm_arch_vcpu_ioctl_run+0x1245/0x1e40 [kvm]
   kvm_vcpu_ioctl+0x2c3/0x8f0 [kvm]
   __x64_sys_ioctl+0x8f/0xd0
   do_syscall_64+0x62/0x940
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
   </TASK>
  ---[ end trace 0000000000000000 ]---

This can happen if the guest executes WRMSR(MSR_IA32_XFD) to set XFD[18] = 1, and a host IRQ triggers kernel_fpu_begin() prior to the vmexit handler's call to fpu_update_guest_xfd().
and if userspace stuffs XSTATE_BV[i]=1 via KVM_SET_XSAVE:

  ------------[ cut here ]------------
  WARNING: arch/x86/kernel/traps.c:1524 at exc_device_not_available+0x101/0x110, CPU#14: amx_test/867
  Modules linked in: kvm_intel kvm irqbypass
  CPU: 14 UID: 1000 PID: 867 Comm: amx_test Not tainted 6.19.0-rc2-2dace9faccd6-x86amxnmxfdnoninit-vm #168 NONE
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  RIP: 0010:exc_device_not_available+0x101/0x110
  Call Trace:
   <TASK>
   asm_exc_device_not_available+0x1a/0x20
  RIP: 0010:restore_fpregs_from_fpstate+0x36/0x90
   fpu_swap_kvm_fpstate+0x6b/0x120
   kvm_load_guest_fpu+0x30/0x80 [kvm]
   kvm_arch_vcpu_ioctl_run+0x85/0x1e40 [kvm]
   kvm_vcpu_ioctl+0x2c3/0x8f0 [kvm]
   __x64_sys_ioctl+0x8f/0xd0
   do_syscall_64+0x62/0x940
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
   </TASK>
  ---[ end trace 0000000000000000 ]---

The new behavior is consistent with the AMX architecture. Per Intel's SDM, XSAVE saves XSTATE_BV as '0' for components that are disabled via XFD (and non-compacted XSAVE saves the initial configuration of the state component):

  If XSAVE, XSAVEC, XSAVEOPT, or XSAVES is saving the state component i, the instruction does not generate #NM when XCR0[i] = IA32_XFD[i] = 1; instead, it operates as if XINUSE[i] = 0 (and the state component was in its initial state): it saves bit i of XSTATE_BV field of the XSAVE header as 0; in addition, XSAVE saves the initial configuration of the state component (the other instructions do not save state component i).

Alternatively, KVM could always do XRSTOR with XFD=0, e.g. by using a constant XFD based on the set of enabled features when XSAVEing for a struct fpu_guest. However, having XSTATE_BV[i]=1 for XFD-disabled features can only happen in the above interrupt case, or in similar scenarios involving preemption on preemptible kernels, because fpu_swap_kvm_fpstate()'s call to save_fpregs_to_fpstate() saves the outgoing FPU state with the current XFD; and that is (on all but the first WRMSR to XFD) the guest XFD.
Therefore, XFD can only go out of sync with XSTATE_BV in the above interrupt case, or in similar scenarios involving preemption on preemptible kernels, and we can consider it (de facto) part of KVM ABI that KVM_GET_XSAVE returns XSTATE_BV[i]=0 for XFD-disabled features.

[Move clea ---truncated---
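
The invariant described above (XSTATE_BV[i] must be 0 whenever XFD[i]=1) can be illustrated with a small userspace model. This is only a sketch, not the kernel patch: struct guest_xstate_model and the model_*() helpers are hypothetical names invented for illustration, and the only facts taken from the text are that the fix masks XFD-disabled bits out of XSTATE_BV on both the WRMSR path and the KVM_SET_XSAVE path, and that the AMX feature in the example is bit 18.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 18 is the feature used in the text's example (XFD[18] = 1). */
#define XFEATURE_BIT_XTILE_DATA  18
#define XFEATURE_MASK_XTILE_DATA (UINT64_C(1) << XFEATURE_BIT_XTILE_DATA)

/*
 * Hypothetical, simplified stand-in for the guest's saved XSAVE header
 * plus the guest's XFD value. Not a kernel structure.
 */
struct guest_xstate_model {
	uint64_t xstate_bv; /* XSTATE_BV field of the XSAVE header */
	uint64_t xfd;       /* guest value of IA32_XFD */
};

/* True if no feature is both marked in XSTATE_BV and disabled via XFD. */
static bool guest_xstate_is_consistent(const struct guest_xstate_model *g)
{
	return (g->xstate_bv & g->xfd) == 0;
}

/*
 * Model of the guest WRMSR path: record the new XFD and clear the
 * newly disabled features from the saved XSTATE_BV.
 */
static void model_update_guest_xfd(struct guest_xstate_model *g, uint64_t xfd)
{
	g->xfd = xfd;
	g->xstate_bv &= ~xfd;
	assert(guest_xstate_is_consistent(g));
}

/*
 * Model of the KVM_SET_XSAVE path: userspace may supply XSTATE_BV bits
 * for XFD-disabled features, so mask them before the state is restored.
 */
static void model_set_xsave(struct guest_xstate_model *g, uint64_t user_xstate_bv)
{
	g->xstate_bv = user_xstate_bv & ~g->xfd;
	assert(guest_xstate_is_consistent(g));
}
```

In this model, calling model_update_guest_xfd() with XFEATURE_MASK_XTILE_DATA on a state whose XSTATE_BV has bit 18 set leaves the other bits untouched and clears only bit 18, which mirrors what the SDM says XSAVE itself would have saved for an XFD-disabled component.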