In the Linux kernel, the following vulnerability has been resolved:
vfs: Don't evict inode under the inode lru traversing context
The inode reclaiming process(See function pruneicachesb) collects all reclaimable inodes and mark them with IFREEING flag at first, at that time, other processes will be stuck if they try getting these inodes (See function findinodefast), then the reclaiming process destroy the inodes by function disposelist(). Some filesystems(eg. ext4 with ea_inode feature, ubifs with xattr) may do inode lookup in the inode evicting callback function, if the inode lookup is operated under the inode lru traversing context, deadlock problems may happen.
Case 1: In function ext4evictinode(), the ea inode lookup could happen if ea_inode feature is enabled, the lookup process will be stuck under the evicting context like this:
Then, following three processes running like this:
PA PB echo 2 > /proc/sys/vm/dropcaches shrinkslab prunedcachesb // ireg is added into lru, lru->iea->ireg pruneicachesb listlruwalkone inodelruisolate iea->istate |= IFREEING // set inode state inodelruisolate _iget(ireg) spinunlock(&ireg->ilock) spinunlock(lrulock) rm file A ireg->nlink = 0 iput(ireg) // ireg->nlink is 0, do evict ext4evictinode ext4xattrdeleteinode ext4xattrinodedecrefall ext4xattrinodeiget ext4iget(iea->iino) igetlocked findinodefast _waitonfreeinginode(iea) ----→ AA deadlock disposelist // cannot be executed by pruneicachesb wakeupbit(&iea->istate)
Case 2: In deleted inode writing function ubifsjnlwriteinode(), file deleting process holds BASEHD's wbuf->iomutex while getting the xattr inode, which could race with inode reclaiming process(The reclaiming process could try locking BASEHD's wbuf->io_mutex in inode evicting function), then an ABBA deadlock problem would happen as following:
Then, following three processes running like this:
PA PB PC
echo 2 > /proc/sys/vm/drop_caches
shrink_slab
prune_dcache_sb
// ib and ia are added into lru, lru->ixa->ib->ia
prune_icache_sb
list_lru_walk_one
inode_lru_isolate
ixa->i_state |= I_FREEING // set inode state
inode_lru_isolate
__iget(ib)
spin_unlock(&ib->i_lock)
spin_unlock(lru_lock)
rm file B
ib->nlink = 0
rm file A iput(ia) ubifsevictinode(ia) ubifsjnldeleteinode(ia) ubifsjnlwriteinode(ia) makereservation(BASEHD) // Lock wbuf->iomutex ubifsiget(ixa->iino) igetlocked findinodefast _waitonfreeinginode(ixa) | iput(ib) // ib->nlink is 0, do evict | ubifsevictinode | ubifsjnldeleteinode(ib) ↓ ubifsjnlwriteinode ABBA deadlock ←-----makereservation(BASEHD) disposelist // cannot be executed by pruneicachesb wakeupbit(&ixa->istate)
Fix the possible deadlock by using new inode state flag ILRUISOLATING to pin the inode in memory while inodelruisolate( ---truncated---