Lately, all sorts of Linux-related sites have oohed and aahed (more like whined, actually) about a bug in kernel 6.1 LTS. The bug was reported by Debian, but it’s an upstream bug. This was more than what the French would call a contretemps.

Some quick pointers

Debian micronews, 09 December 2023 19:35:00:

Due to an issue in ext4 with data corruption in kernel 6.1.64-1, we are pausing the 12.3 image release for today while we attend to fixes. Please do not update any systems at this time, we urge caution for users with UnattendedUpgrades configured. Please see bug# 1057843: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1057843

Debian qualifies the bug as grave, despite leading to “non-serious data loss”; found in version linux/6.1.64-1, fixed in version linux/6.1.66-1.

LWN.net: Ext4 data corruption in stable kernels.

Phoronix: Debian 12.3 Delayed Due To An EXT4 Data Corruption Bug Being Addressed.

The Reg: Kernel kerfuffle kiboshes Debian 12.3 release. A mis-merged patch causing corruption on ext4 volumes is to blame.

Funny thing, there are idiots who never understood that this was an upstream issue, not something to blame on Debian. It’s this retard on Phoronix (4159 posts since July 2008, he must be Master Hater):

Ext4 is fine, Debian just happened to break it.

Then this crypto-sucker on The Reg:

Sigh. Another case of “Debian knows better than the upstreams”. Like the “improvements” to ssh key generation which removed almost all entropy.

Please: just leave the software alone, and package it. In particular, give me a kernel that’s as close to the one Linus released as possible.

Oh, those fuckers. If they knew how irresponsibly buggy the kernel is, c/o Linus Torvalds and Greg Kroah-Hartman and the gang… Hopefully, before the end of the year, I’ll show you an incredible case of such irresponsibility.

Meanwhile, Debian 12.4 was released:

Debian 12.4 is released with linux-image-6.1.0-15 (6.1.66-1), along with a few other bug fixes.

WTF, what’s with this dumb versioning scheme in Debian and Ubuntu, why can’t they use the real versions? I mean, the kernel in Debian 12 has to be called “6.1.0-something” and the kernel in Ubuntu 22.04 LTS has to be called “5.15.0-something” regardless of the actual version, because if some idiot sees another number, or if they dumbly pin a kernel version, that would be the end of the world, right?
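For what it’s worth, the real upstream version is still recoverable on a Debian box. As far as I can tell, uname -r reports the ABI name (6.1.0-15-amd64), while uname -v carries the package version (6.1.66-1). Here is a minimal sketch that pulls the upstream number out of such a string. The function name and the sample string are my own assumptions for illustration, not something taken from a real machine or from Debian’s tooling.

```python
import re


def upstream_version(uname_v):
    """Return the upstream kernel version from a Debian-style `uname -v` string.

    Assumes the string contains "Debian X.Y.Z-N", e.g. "Debian 6.1.66-1".
    Returns None when the pattern is not found (non-Debian kernels).
    """
    match = re.search(r"Debian (\d+\.\d+\.\d+)-\d+", uname_v)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Hypothetical sample, modeled on what Debian 12 prints.
    sample = "#1 SMP PREEMPT_DYNAMIC Debian 6.1.66-1 (2023-12-09)"
    print(upstream_version(sample))
```

On a live system you could feed it platform.version() from Python’s standard library, which returns the same text as uname -v.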

But things are not solved for Debian! As I’m writing this, Debian still seems to be lacking 6.1.0-16 aka 6.1.67, which is what they need to have! Why is that so? Because of a second bug, also courtesy of the retarded team that releases the official Linux kernel.

Quick explanation

Courtesy of The Reg, here’s the short version of the story.

■ On November 1, a small ext4 performance-enhancing patch was backported from 6.5 to 6.1. Unfortunately, for this patch to work, another patch should have been backported too, and it wasn’t. Without this other patch, the kernel would corrupt the filesystem in certain cases. Jan Kára (aka Honza), a SUSE kernel developer, reported the issue:

I’ve noticed at least 6.1 is still carrying the problematic commit. Greg, please take out the commit from all stable kernels before 6.5 as soon as possible, we’ll figure out proper backport once user data are not being corrupted anymore.

The issue has been fixed in 6.1.66, released in Debian as 6.1.0-15. As a user of ELRepo’s kernel-lt, I got 6.1.66 after they released it on December 8 (I got it the next day, I believe).

But 6.1.66 includes another bug! On November 24, a Wi-Fi related patch was backported to 6.1, but this one too would have required yet another patch to be backported (you guessed it, it wasn’t). Once this was noticed, the backport was reverted on December 11, which resulted in kernels 6.1.67 and 6.6.6 being released.

The current situation

Debian still doesn’t have 6.1.67, so it’s affected by the Wi-Fi bug. Less damaging, but still.

ELRepo released 6.1.67 for EL9 on December 11, judging by the file date, but I only managed to retrieve it on December 12. They also released 6.6.6 the same day.

Funny thing, this faulty Wi-Fi-related patch really affected me in AlmaLinux 9.3 with ELRepo’s kernel-lt! On my latest and cheapest Acer, NetworkManager was waking up the laptop from sleep, with high CPU usage at that, and once this had happened, the laptop would also fail to shut down. Since I upgraded to 6.1.67, everything has been fine (so far).

TL;DR:
If you’re using kernel 6.1 (kernel-lt in ELRepo), you need to upgrade to 6.1.67 pronto.
If you’re using kernel 6.6 (kernel-ml in ELRepo), you need to upgrade to 6.6.6 asap.
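If you’d rather script the check than eyeball it, here is a minimal sketch that encodes nothing more than the two lines above: anything in the 6.1 series below 6.1.67, or in the 6.6 series below 6.6.6, needs an upgrade. The names are hypothetical, and the other kernel series are simply treated as not covered by this post. On Debian, pass the upstream version (6.1.66), not the 6.1.0-15 package name, or the answer will be wrong.

```python
import re

# First fixed release per series, taken from the TL;DR above.
MINIMUM_FIXED = {
    (6, 1): (6, 1, 67),
    (6, 6): (6, 6, 6),
}


def parse_version(version):
    """Turn '6.1.66-1.el9.elrepo.x86_64' into (6, 1, 66)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"not an x.y.z kernel version: {version!r}")
    return tuple(int(part) for part in match.groups())


def needs_upgrade(version):
    """True if the version is in a listed series and older than its fix."""
    parsed = parse_version(version)
    minimum = MINIMUM_FIXED.get(parsed[:2])
    return minimum is not None and parsed < minimum


if __name__ == "__main__":
    import platform

    release = platform.release()  # same text as `uname -r`
    print(release, "needs upgrade:", needs_upgrade(release))
```

Comparing tuples of integers avoids the classic string-comparison trap where "6.1.7" would sort after "6.1.67".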

Final thoughts

One can trust the Linux kernel only so much, even the LTS line. That’s sad for a project that has 30+ years behind it. I suppose things will get even worse once Linus Torvalds and Greg Kroah-Hartman step down, and I don’t expect them to live forever…

LATE EDIT: Before you go ballistic, check out this one: Data-destroying defect found after OpenZFS 2.2.0 release:

A data-destroying bug has been discovered following the release of OpenZFS 2.2.0 as found in FreeBSD 14 among other OSes.

This file-trashing flaw is believed to be present in multiple versions of OpenZFS, not just version 2.2.0. It was initially thought that a feature new to that release, a feature called block cloning, primarily caused the data loss. However, it now appears, as of 1945 UTC, November 27, that this cloning feature simply exacerbates a previously unknown underlying bug. We’re told the corruption is quite rare in real-world operation.

Part of the ongoing confusion around this data-destroying bug is that it seems to happen with and without block cloning enabled. It’s suggested this may be the fix the file system needs. … Upgrading to OpenZFS 2.2.1 turns off the cloning feature, which may well reduce the chance of you encountering the flaw.

They don’t know shit. “May reduce the chance” is not good enough! And OpenZFS was supposed to be one of the safest file systems, being copy-on-write and having tons of useful features (read: a terrible complexity)! In my opinion (not that anyone would care), copy-on-write file systems, primarily ZFS/OpenZFS in FreeBSD, Btrfs (B-tree File System) in Linux, and ReFS (Resilient File System) in Windows Server, are unsuited for everyday use for these reasons:

  1. Being copy-on-write (CoW), they create fragmentation. When data in a CoW file system needs to be modified, instead of directly overwriting the existing data, the file system writes a modified copy of the data somewhere else. The metadata structures are then updated to point to the new copy. Since changes are made atomically as part of a transaction, either all of them are applied successfully, or none is. It’s only after the new copy is confirmed as successfully written that the old copy can be deallocated (discarded). Because the new copy necessarily lands elsewhere on the disk, a file that started out contiguous ends up scattered. That’s fragmentation, pure and simple.
  2. Being copy-on-write (CoW), they are sensitive to low free disk space. At least 5-10% of the filesystem should be kept free for them to perform as expected, which is ridiculous. I’m not sure about OpenZFS (which is mostly used on servers), but Btrfs is not happy when the disk is almost full.
  3. Being copy-on-write (CoW), no matter what they say about having been tuned to spare the life of SSDs (discard until the next TRIM), they definitely shorten an SSD’s life, and might even perform much worse than other filesystems on an SSD, if they really try not to kill it. Heck, there are even NVMe SSDs that do not work with ZFS!
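To make the fragmentation argument from the first point concrete, here is a toy model, and nothing more than that. It assumes a deliberately naive allocator that always writes the new copy to the next free block at the end of the disk. Real CoW filesystems batch and place writes far more cleverly, so take this as an illustration of the mechanism, not as a description of how ZFS or Btrfs actually allocate. All names are hypothetical.

```python
def count_extents(mapping):
    """Count contiguous physical runs in a logical-to-physical block map."""
    if not mapping:
        return 0
    extents = 1
    for previous, current in zip(mapping, mapping[1:]):
        if current != previous + 1:
            extents += 1
    return extents


class ToyCowFile:
    """A file whose blocks are never overwritten in place."""

    def __init__(self, n_blocks):
        # Freshly written file: logical block i lives in physical block i.
        self.mapping = list(range(n_blocks))
        self.next_free = n_blocks  # naive allocator: append at the end

    def overwrite(self, logical_block):
        # CoW: the new copy goes to a fresh physical block, then the
        # metadata is repointed; the old block becomes free space.
        self.mapping[logical_block] = self.next_free
        self.next_free += 1


if __name__ == "__main__":
    f = ToyCowFile(8)
    print(count_extents(f.mapping))  # 1: one contiguous run
    f.overwrite(3)
    print(count_extents(f.mapping))  # 3: the run is split around block 3
    f.overwrite(5)
    print(count_extents(f.mapping))  # 5
```

With in-place overwrites the map would never change, so the file would stay at one extent no matter how many times it was modified.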

This being said, and journaling aside, I wish someone would extend exFAT into something more suitable as a reliable filesystem. It’s simple and decent. Also, I wish there were more love for JFS, which was my choice in the past (yes, it only journals the metadata, but this keeps it fast, and it doesn’t kill the SSDs). Meanwhile, I have to live with ext4 which, to be frank, I abhor. (I never liked XFS, absolutely never. It also only journals the metadata, yet people prefer it to JFS. In the event of a power failure, it’s said that XFS can leave you with more bogus files than JFS, but XFS is the filesystem of choice in RHEL. Go figure. As for ReiserFS, you know the story. What if Linus Torvalds kills someone, should we stop using Linux altogether, and stop maintaining it?)