In the Linux kernel, the following vulnerability has been resolved:
x86/mce: use is_copy_from_user() to determine copy-from-user context
Patch series "mm/hwpoison: Fix regressions in memory failure handling", v4.
This patchset resolves two critical regressions related to memory failure handling that have appeared in the upstream kernel since version 5.17, as compared to 5.10 LTS.
- copyin case: poison found in a user page while the kernel is copying from user space
- instr case: poison found during an instruction fetch in user space
The kernel can recover from poison found while it is executing get_user() or copy_from_user(), provided those call sites get an error return and the kernel returns -EFAULT to the process instead of crashing. More specifically, the MCE handler checks the fixup handler type to decide whether an in-kernel #MC can be recovered. When EX_TYPE_UACCESS is found, the PC jumps to the recovery code specified in _ASM_EXTABLE_FAULT() and -EFAULT is returned to user space.
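The "error return instead of crashing" contract is visible from user space even without a memory error. The sketch below is only an analogy: it triggers the ordinary page-fault path, not a machine check, by asking write() to copy from a buffer mapped with PROT_NONE. The helper name errno_for_bad_user_buffer() is made up for this illustration.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel to copy 16 bytes out of a user
 * buffer it cannot read, and report the errno the process sees.
 * Returns -1 if the setup (pipe or mmap) fails, 0 if the write
 * unexpectedly succeeded, otherwise the errno from write().
 */
int errno_for_bad_user_buffer(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    /* A mapped but inaccessible page: any kernel read of it faults. */
    void *buf = mmap(NULL, 4096, PROT_NONE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    errno = 0;
    ssize_t n = write(fds[1], buf, 16);   /* kernel copies from user here */
    int err = (n < 0) ? errno : 0;

    munmap(buf, 4096);
    close(fds[0]);
    close(fds[1]);
    return err;
}
```

The calling process keeps running and simply observes EFAULT. The recovery path described above aims to give the same outcome when the copy fails because of poison rather than because of a missing mapping.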
If poison is found during an instruction fetch in user space, full recovery is possible: the user process takes a #PF, and Linux allocates a new page and fills it by reading from storage.
Commit 4c132d1d844a ("x86/futex: Remove .fixup usage") introduced a new extable fixup type, EX_TYPE_EFAULT_REG, and later patches changed the extable fixup type for copy-from-user operations from EX_TYPE_UACCESS to EX_TYPE_EFAULT_REG. This broke the previous EX_TYPE_UACCESS handling when poison is found in get_user() or copy_from_user().
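The effect of that type change can be modelled in a few lines of user-space C. This is a simplified sketch, not the kernel's code: the struct layout, the copies_from_user flag, the sample table, and both function names are assumptions made for illustration. The real exception table stores relative offsets, and the patch title suggests the kernel determines copy-from-user context through is_copy_from_user() rather than a stored flag.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel's extable fixup types. */
enum ex_type { EX_TYPE_NONE, EX_TYPE_UACCESS, EX_TYPE_EFAULT_REG };

/* Hypothetical flattened exception-table entry (illustration only). */
struct ex_entry {
    uintptr_t insn;         /* address of the instruction that may fault */
    enum ex_type type;      /* how a fault at that address is fixed up */
    bool copies_from_user;  /* assumption: known per entry in this model */
};

/* Made-up table: one legacy entry and two entries using the newer type. */
const struct ex_entry sample_table[] = {
    { 0x1000, EX_TYPE_UACCESS,    true  },
    { 0x2000, EX_TYPE_EFAULT_REG, true  },
    { 0x3000, EX_TYPE_EFAULT_REG, false },
};
enum { SAMPLE_LEN = sizeof(sample_table) / sizeof(sample_table[0]) };

static const struct ex_entry *ex_lookup(const struct ex_entry *tbl,
                                        size_t n, uintptr_t ip)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].insn == ip)
            return &tbl[i];
    return NULL;
}

/* Type-only policy: recoverable only when the fixup is EX_TYPE_UACCESS. */
bool recoverable_by_type(const struct ex_entry *tbl, size_t n, uintptr_t ip)
{
    const struct ex_entry *e = ex_lookup(tbl, n, ip);
    return e != NULL && e->type == EX_TYPE_UACCESS;
}

/* Context-aware policy: also accept EX_TYPE_EFAULT_REG entries when the
 * faulting access is a copy from user space. */
bool recoverable_by_context(const struct ex_entry *tbl, size_t n, uintptr_t ip)
{
    const struct ex_entry *e = ex_lookup(tbl, n, ip);
    if (e == NULL)
        return false;
    if (e->type == EX_TYPE_UACCESS)
        return true;
    return e->type == EX_TYPE_EFAULT_REG && e->copies_from_user;
}
```

With the sample table, recoverable_by_type() rejects the entry at 0x2000 although it is a user copy, which mirrors the regression; recoverable_by_context() accepts it while still rejecting 0x3000.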
When an uncorrected memory error is consumed, there is a race between the CMCI from the memory controller reporting an uncorrected error with a UCNA signature, and the core reporting an SRAR signature machine check when the data is about to be consumed.
Prior to Icelake, memory controllers reported patrol scrub events that detected a previously unseen uncorrected error in memory by signaling a broadcast machine check with an SRAO (Software Recoverable Action Optional) signature in the machine check bank. This was overkill because the problem is not urgent: no core is on the verge of consuming the bad data. It was also found that multiple SRAO UCEs may cause nested MCE interrupts and finally become an IERR.
Hence, Intel downgraded the machine check bank signature of patrol scrub from SRAO to UCNA (Uncorrected, No Action required), and the signal changed to CMCI. Despite the UCNA signature name, Linux does take an action (in uc_decode_notifier()) to try to offline the page.
Intel platform [1]
Having decided that CMCI/UCNA is the best action for patrol scrub errors, the memory controller uses it for reads too. But the memory controller is executing asynchronously from the core, and can't tell the difference between a "real" read and a speculative read. So it will do CMCI/UCNA if an error is found in any read.
Thus:
1) Core is clever and thinks address A is needed soon, issues a speculative read.
2) Core finds it is going to use address A soon after sending the read request
3) The CMCI from the memory controller is in a race with MCE from the core that will soon try to retire the load from address A.
Quite often (because speculation has got better) the CMCI from the memory controller is delivered before the core is committed to the instruction reading address A, so the interrupt is taken, and Linux offlines the page (marking it as poison).
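The ordering problem can be made concrete with a toy model. This is a sketch under stated assumptions, not kernel code: the event names, struct page_state, and both functions are invented here, and the model only tracks whether the machine check arrives after the page was already marked as poison.

```c
#include <stdbool.h>

/* The two reports that race for the same bad address (names are made up). */
enum mc_event {
    EV_CMCI_UCNA,   /* memory controller: CMCI with UCNA signature */
    EV_MCE_SRAR     /* core: machine check with SRAR signature */
};

struct page_state {
    bool poisoned;          /* page has been marked as poison */
    bool mce_saw_poisoned;  /* the MCE arrived after the page was poisoned */
};

/* Apply one event to the page, in delivery order. */
void deliver(struct page_state *pg, enum mc_event ev)
{
    switch (ev) {
    case EV_CMCI_UCNA:
        /* Linux tries to offline the page, marking it as poison. */
        pg->poisoned = true;
        break;
    case EV_MCE_SRAR:
        /* Record whether the CMCI path got there first. */
        if (pg->poisoned)
            pg->mce_saw_poisoned = true;
        pg->poisoned = true;
        break;
    }
}

/* Deliver both events in the given order and return the final state. */
struct page_state run_order(enum mc_event first, enum mc_event second)
{
    struct page_state pg = { false, false };
    deliver(&pg, first);
    deliver(&pg, second);
    return pg;
}
```

Calling run_order(EV_CMCI_UCNA, EV_MCE_SRAR) models the common case described above: the page ends up poisoned in both orders, but only in this order does the machine check handler encounter a page that is already marked as poison.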
Commit 046545a661af ("mm/hwpoison: fix error page recovered but reported "not ---truncated---